Great tools and frameworks are now available that allow any business to apply advanced analytics and machine learning to its data and use the value levers the models surface to drive business results.
While in the past statistical analysis meant using tools like R, Python, and others, several augmented analytics tools now simplify statistical analysis and model development. These tools accelerate the process of running experiments, combine multiple algorithms, and can give a competitive advantage to companies that may not have in-house data science skills.
In my earlier post, I wrote about SAP Analytics Cloud’s Smart Discovery feature, which allows automated machine learning analysis to be done on any kind of data.
In this blog, I am going to compare and contrast different modeling techniques and use my recruitment dataset example to evaluate the pros and cons of each technique.
1. Python-based analysis
This is a coding-based approach; it provides the most flexibility but is also the most complex of the three. In the recruitment example, I used the scikit-learn random forest regression model and compared it to a logistic regression model to predict the outcome for “# of days to hire”.
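To give a feel for what the coding-based approach involves, here is a minimal sketch rather than the actual notebook from my analysis. The data is synthetic, and the column names, effect sizes and hyperparameters are made up for illustration. Note one substitution: the baseline below is a plain linear regression, because scikit-learn's LogisticRegression is a classifier and this sketch treats days to hire as a continuous number.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)
n = 500

# Hypothetical stand-in for the recruitment data: two categorical columns.
job_class = rng.choice(["engineering", "sales", "support"], size=n)
country = rng.choice(["US", "DE", "IN"], size=n)

# Synthetic target (assumption): days to hire depends on job class plus noise.
base_days = {"engineering": 60.0, "sales": 35.0, "support": 25.0}
days_to_hire = np.array([base_days[j] for j in job_class]) + rng.normal(0, 5, size=n)

# One-hot encode the categorical features so both models can use them.
X = OneHotEncoder().fit_transform(np.column_stack([job_class, country])).toarray()

X_train, X_test, y_train, y_test = train_test_split(
    X, days_to_hire, test_size=0.25, random_state=0
)

models = {
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "linear_regression": LinearRegression(),  # swapped-in baseline, see note above
}

# Fit each model on the training split and score it on the held-out split.
mae = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    mae[name] = mean_absolute_error(y_test, model.predict(X_test))
    print(f"{name}: MAE = {mae[name]:.2f} days")
```

Every step here (encoding, splitting, model choice, metric) is a decision you make and maintain yourself, which is where both the flexibility and the complexity of this approach come from.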
2. SAP Analytics Cloud Smart Predict-based analysis
Smart Predict is a coding-free, wizard-based approach to statistical modeling. In this case, I uploaded my recruitment data Excel file and created two regression models, one with all features and one with the most relevant features. You can find more details on how to use Smart Predict here.
Compared to my previous method, this was a much faster approach, but I had to rely on the built-in regression algorithm(s) available. By running multiple iterations of my model with different feature sets and train/test splits, I was able to compare and contrast the overall metrics across models easily.
3. SAP Analytics Cloud Smart Discovery-based analysis
As described in my previous blog, Smart Discovery allows you to explore data, gather insights, identify anomalies and run what-if simulations. As shown below, it is also a coding-free option. Compared to the previous two methods, it abstracts away much of the algorithm work behind the scenes and presents the business user with a visual output of the key insights, influencers, unexpected values and simulations. This is a powerful feature for identifying data patterns and evaluating the critical features that drive your KPIs.
Comparing feature rankings
All three techniques provided similar results. I was able to see that the “Job classification”, “Agency Name”, “Country” and “Promotion Status” columns are critical factors that impact my overall KPI of “# of days to hire”.
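On the Python side, a feature ranking like this can be read off a fitted random forest. The sketch below is illustrative only: the data is synthetic, the snake_case column names are hypothetical stand-ins for the four columns named above, and the effect sizes are invented so that one column clearly dominates. It uses the impurity-based `feature_importances_` attribute of scikit-learn's RandomForestRegressor, which is one of several ways to rank features and not necessarily what the SAP tools compute internally.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import OrdinalEncoder

rng = np.random.default_rng(0)
n = 600

# Hypothetical column names mirroring the four factors discussed above.
columns = ["job_classification", "agency_name", "country", "promotion_status"]
raw = np.column_stack([
    rng.choice(["engineering", "sales", "support"], size=n),
    rng.choice(["agency_a", "agency_b"], size=n),
    rng.choice(["US", "DE", "IN"], size=n),
    rng.choice(["promoted", "not_promoted"], size=n),
])

# Synthetic target (assumption): driven mostly by job classification,
# slightly by agency, and not at all by the other two columns.
job_effect = {"engineering": 60.0, "sales": 35.0, "support": 25.0}
agency_effect = {"agency_a": 0.0, "agency_b": 8.0}
y = (
    np.array([job_effect[j] for j in raw[:, 0]])
    + np.array([agency_effect[a] for a in raw[:, 1]])
    + rng.normal(0, 3, size=n)
)

# Ordinal encoding keeps one model feature per original column,
# so importances map directly back to column names.
X = OrdinalEncoder().fit_transform(raw)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Sort columns from most to least important.
ranking = sorted(
    zip(columns, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Comparing a list like this against the influencers reported by Smart Predict and Smart Discovery is a quick sanity check that the three techniques agree on what drives the KPI.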
All the modeling techniques shown above have their pros and cons. It’s usually a tradeoff between the kind of analysis that you want to perform, the skill set that is available, and the control you want over the various stages of the machine learning model-building process. Nevertheless, the Smart Discovery and Smart Predict features are powerful solutions that put machine learning capabilities in the hands of a data analyst or citizen data scientist.