In this example, let’s look at a recruitment dataset (about 4K rows and 40 columns) where I want to evaluate the factors that affect the # of days it takes to hire a candidate for a position. As a data analyst, I consider this an important metric: depending on the nature of the position, I would like to minimize the days to hire, look for alternate options in other geographies, or evaluate other recruitment agencies.
With SAC’s data wrangling features, I was able to upload this data and clean my columns by converting # and null values to numeric entries for my features. The next step was to quickly assess all the columns that impact my target, # of days to hire. Looking at my dataset and based on my domain knowledge, I can tell that the target is correlated with fields such as country of hire, job agency and job classification. What I need to know is how important these features are and what impact the key features have on my target metric.
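For readers who want to see the cleaning step outside of SAC, here is a minimal Python sketch of the same idea. It is not what SAC runs internally. The column names, the list of missing-value markers and the choice of 0.0 as the fill value are all my assumptions for illustration.

```python
# Markers treated as "missing" (an assumption; adjust to your own export).
MISSING_MARKERS = {"#", "", "null", "NULL", "None"}

def to_numeric(value, fill=0.0):
    """Return value as a float, replacing '#' and null markers with `fill`."""
    if value is None:
        return fill
    text = str(value).strip()
    if text in MISSING_MARKERS:
        return fill
    return float(text)

# Hypothetical raw rows; the column names are illustrative only.
raw_rows = [
    {"days_to_hire": "45", "interviews": "3"},
    {"days_to_hire": "#", "interviews": "null"},
    {"days_to_hire": "120", "interviews": ""},
]

clean_rows = [
    {col: to_numeric(val) for col, val in row.items()} for row in raw_rows
]
```

Whether 0.0, a column average or dropping the row is the right replacement depends on the column, so `fill` is left as a parameter.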
The smart discovery feature ran through its machine learning models and provided me with views including:
1. Discover Key Influencers for my target # of days to hire
Looking at this view, I can see that promotion status is the biggest influencer on my target. This could be because most of the candidates that I am hiring, or hired in the past, are/were internal employees. I also notice that Country, Job classification and Recruitment agencies play a key role in the days it takes to hire. Although this chart only shows correlation and not causation, I can investigate further and look at my metrics for external hiring only, by ignoring the promotion status and external employee columns.
What is important to note is that I was able to view the key influencers without providing any manual input or scripting. The machine learning provides a recommendation, but it is up to me as a data analyst to apply my business knowledge to filter out the noise (in this case, the data could be heavily influenced by a large number of internal candidates).
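To give a feel for what an influencer ranking measures, here is a small Python sketch that scores each categorical column by the share of the target's variance it explains (the correlation ratio, also called eta squared). This is a stand-in technique I picked for illustration, not the model SAC uses, and the rows and column names below are made up.

```python
from collections import defaultdict
from statistics import mean

def eta_squared(categories, values):
    """Share of the target's variance explained by a categorical column (0 to 1)."""
    overall = mean(values)
    total_ss = sum((v - overall) ** 2 for v in values)
    if total_ss == 0:
        return 0.0
    groups = defaultdict(list)
    for cat, val in zip(categories, values):
        groups[cat].append(val)
    between_ss = sum(len(g) * (mean(g) - overall) ** 2 for g in groups.values())
    return between_ss / total_ss

# Hypothetical sample; not taken from the recruitment dataset.
rows = [
    {"promoted": "yes", "country": "UK", "days_to_hire": 20},
    {"promoted": "yes", "country": "China", "days_to_hire": 25},
    {"promoted": "no", "country": "UK", "days_to_hire": 90},
    {"promoted": "no", "country": "China", "days_to_hire": 110},
    {"promoted": "no", "country": "UK", "days_to_hire": 100},
    {"promoted": "yes", "country": "China", "days_to_hire": 30},
]

target = [r["days_to_hire"] for r in rows]
# Rank columns from strongest to weakest association with the target.
ranking = sorted(
    ((col, eta_squared([r[col] for r in rows], target)) for col in ("promoted", "country")),
    key=lambda pair: pair[1],
    reverse=True,
)
```

In this toy sample `promoted` ranks first, which mirrors the situation above: a column can dominate the ranking simply because of how the data is composed, so the score still needs a business reading.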
2. Explore the anomalies for the # of days to hire
As a data analyst, these features help me get insight into my dataset without any coding and without a machine learning or statistical background. The anomalies view helps me decide which areas need attention and makes the findings actionable. In this case, I clearly see that in China and UK there is a large variance in the days to hire for specific positions.
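A rough way to reproduce this kind of "large variance" check by hand is to compute the spread of days to hire per country and position, and flag the groups whose spread is far above the typical one. The sketch below does that with the standard deviation and a median baseline. The records, the factor of 3 and the function names are my own assumptions, and SAC's anomaly detection may work quite differently.

```python
from collections import defaultdict
from statistics import median, stdev

# Hypothetical (country, position, days_to_hire) records.
records = [
    ("China", "Engineer", 30), ("China", "Engineer", 150), ("China", "Engineer", 60),
    ("UK", "Analyst", 20), ("UK", "Analyst", 140), ("UK", "Analyst", 45),
    ("US", "Engineer", 50), ("US", "Engineer", 55), ("US", "Engineer", 52),
    ("India", "Analyst", 40), ("India", "Analyst", 44), ("India", "Analyst", 47),
    ("Germany", "Engineer", 60), ("Germany", "Engineer", 62), ("Germany", "Engineer", 65),
]

def spread_by_group(records):
    """Sample standard deviation of days to hire per (country, position)."""
    groups = defaultdict(list)
    for country, position, days in records:
        groups[(country, position)].append(days)
    # stdev needs at least two observations per group.
    return {key: stdev(vals) for key, vals in groups.items() if len(vals) >= 2}

def flag_high_variance(spreads, factor=3.0):
    """Groups whose spread exceeds `factor` times the median group spread."""
    baseline = median(spreads.values())
    return sorted(key for key, s in spreads.items() if s > factor * baseline)

flagged = flag_high_variance(spread_by_group(records))
```

With this sample, the China and UK groups are flagged because their hiring times range from a few weeks to several months, while the other groups stay within a few days of each other.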
3. Perform a simulation for # of days to hire based on the key influencers.
The simulation view lets me run a what-if analysis by adjusting the key influencing parameters until I arrive at my desired target # of days to hire. Note that I can manually remove columns like gender and ethnicity, or change other parameter values, to reduce the days to hire from 109 to a lower number.
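The idea behind a what-if simulation can be sketched with a very simple additive model: start from the overall average days to hire and add the average effect of each chosen feature value. This is only an illustration of the concept under my own assumptions (a tiny made-up history, two features, no interactions); it is not how SAC builds its simulation.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical hiring history.
history = [
    {"country": "UK", "agency": "A", "days_to_hire": 120},
    {"country": "UK", "agency": "B", "days_to_hire": 100},
    {"country": "US", "agency": "A", "days_to_hire": 80},
    {"country": "US", "agency": "B", "days_to_hire": 60},
]

def fit_additive_effects(rows, features, target):
    """Overall mean plus, per feature value, its average offset from that mean."""
    overall = mean(r[target] for r in rows)
    effects = {}
    for feat in features:
        groups = defaultdict(list)
        for r in rows:
            groups[r[feat]].append(r[target])
        effects[feat] = {level: mean(vals) - overall for level, vals in groups.items()}
    return overall, effects

def simulate(overall, effects, scenario):
    """Predicted days to hire for a scenario such as {'country': 'US', 'agency': 'B'}."""
    return overall + sum(effects[feat][level] for feat, level in scenario.items())

overall, effects = fit_additive_effects(history, ["country", "agency"], "days_to_hire")
baseline = simulate(overall, effects, {"country": "UK", "agency": "A"})
what_if = simulate(overall, effects, {"country": "UK", "agency": "B"})
```

Comparing `baseline` with `what_if` shows how much the estimate moves when only the agency changes, which is the same kind of question I ask when dialing parameters in the simulation view.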
The smart discovery feature is a great asset for any business analyst to explore the data patterns, find unexpected values and arrive at actionable insights.
In my next blog post, I will run a manual, scikit-learn based version of smart discovery in Python and compare its results to the feature ranking provided by SAC smart discovery.