Author

Thierry Brunet

Thierry is a Product Manager for SAP Analytics Cloud’s Smart Predict feature portfolio. He is based in the Paris office. He has a computer science background with a thesis on machine learning and has worked in this domain for over 20 years. When he's not at work, Thierry enjoys hiking and going on road trips on his motorcycle.

The predictive features of SAP Analytics Cloud (SAC) are designed to be used by business analysts. This means that analysts are guided through the predictive workflows and don’t need predictive skills. The predictive outputs are presented in a user-friendly way so that they can be easily interpreted and included in SAP Analytics Cloud stories or planning. You might still have questions about our techniques, or wonder how far you can trust the predictions that are proposed. The objective of this blog is to explain how Smart Predict produces its predictions, and why they are relevant!

In SAP Analytics Cloud, when you create a new Predictive Scenario, there are three choices.

Classification ranks a population relative to an objective. It produces a score that expresses how likely an event is to happen. For example, who among my customers will react positively to my marketing campaign?

Regression is used to find the relationship between variables describing events. For each new event, you get an estimate of the value of the target variable. For example, you predict the price of a house by looking at similar houses.

Time Series are used to forecast the evolution of a measure in the future. For example, which quantity of products should be produced in advance?

This blog is dedicated to Time Series.

I’ll explain the types of problems Time Series Forecasting addresses and what kind of data you need to collect. Then, I’ll illustrate the technique used by Smart Predict through a specific use case.

Which questions? Which data?

Time Series Forecasting is useful for estimating future values of a measure when you have a time dimension available to help you identify a trend. Before going into the details, let’s see what kind of questions the time series forecasting of Smart Predict handles. Here are typical ones:

  1. How will the revenue of my shop evolve over the next month?
  2. What are the expected sales by product and by region for the next weeks?
  3. How will the stock of my products vary in my warehouse over the following weeks?
  4. How will my cash flow evolve during the next quarter?

Now that the type of question is clearly defined, let’s check whether the data you have is usable for Time Series. There are two distinct aspects:

  1. You must have recorded the values that your target variable had in the past and the corresponding dates. This pair (date, target value) is called the signal. It is this signal that will be analyzed by the Time Series Forecasting process of SAC Smart Predict.
  2. Values of other variables taken at the same dates (in the past and in the future) can be included. These explanatory variables are called “Candidate Influencers”. They are used by SAC Smart Predict to refine the analysis of the signal.

Here is some advice to correctly prepare your dataset.

Firstly, ask yourself: how far into the future do I need to project? This is the horizon, that is, the number of predictions you want to make in the future. This number depends directly on the size of your historical data. A ratio of 5:1 between historical points and predicted points is a good rule of thumb to estimate the horizon and get predictions with relevant confidence intervals. This means that if you have 100 historical cases, you can predict 20 values of your target variable in the future. Of course, the length of the horizon also depends on your use case, and you can choose fewer than 20 values. But if you need more, it is better to collect more historical cases.
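The 5:1 rule of thumb above can be written down as a one-line calculation. This is only a sketch of the arithmetic; the function name `max_recommended_horizon` is hypothetical and is not part of Smart Predict.

```python
def max_recommended_horizon(n_history: int, ratio: int = 5) -> int:
    """Largest horizon that keeps `ratio` historical points per predicted point.

    Illustrative helper only: it encodes the 5:1 rule of thumb from the text,
    not a limit enforced by the product.
    """
    if n_history < 0 or ratio <= 0:
        raise ValueError("n_history must be >= 0 and ratio must be > 0")
    return n_history // ratio


# 100 historical cases allow about 20 predictions.
print(max_recommended_horizon(100))
# 6 years of monthly data (72 points) allow about 14 predictions.
print(max_recommended_horizon(72))
```

With six years of monthly history, a 12-month horizon therefore stays within the rule of thumb.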

Fig 1: Signal and predictions on its evolution in the given horizon

Secondly, you need to consider the scale of the predictions. Whether your historical data is captured every month, week, day, hour, minute or even second, the predictions will be produced in the same unit of time. If you record values every month, it is not meaningful to request predictions for the next minutes! The opposite situation may also arise: for technical reasons, data may be recorded every minute by sensors although the minute is not relevant for your use case. In that case you need a coarser unit of time, such as the hour.

Thirdly, you should consider how to aggregate the data into the unit of time you need. For this, you must define an aggregation function. For example, this function calculates one value for the hour from the 60 values measured during the 60 minutes of that hour. It can be the first value, the last value, the middle value or a calculated value (an average, for example, or a more complex formula). Keep the size of the aggregation in mind: a large aggregation may hide information that can then no longer be discovered, which decreases the quality of the predictions. On the other hand, an appropriate aggregation smooths the signal when there is a lot of noise. There is no secret recipe here: you will have to test and experiment to choose the best aggregation function.
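As an illustration of this preparation step (which happens before the data reaches Smart Predict), here is a minimal sketch that aggregates minute-level measurements into one value per hour. The function name `aggregate_by_hour` and the sample values are made up for the example; the aggregation function is passed as a parameter so that first, last or average can be compared.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def aggregate_by_hour(points, func=mean):
    """Aggregate (timestamp, value) pairs into one value per hour.

    `func` receives the chronologically ordered values of one hour and
    returns the single value kept for that hour.
    """
    buckets = defaultdict(list)
    for ts, value in sorted(points, key=lambda p: p[0]):
        hour_start = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour_start].append(value)
    return [(hour, func(values)) for hour, values in sorted(buckets.items())]


# Made-up sensor readings, deliberately out of order.
readings = [
    (datetime(2024, 1, 1, 9, 30), 20.0),
    (datetime(2024, 1, 1, 9, 0), 10.0),
    (datetime(2024, 1, 1, 9, 59), 30.0),
    (datetime(2024, 1, 1, 10, 5), 40.0),
]

hourly_average = aggregate_by_hour(readings)                 # average per hour
hourly_first = aggregate_by_hour(readings, lambda v: v[0])   # first value per hour
hourly_last = aggregate_by_hour(readings, lambda v: v[-1])   # last value per hour
```

Because the result contains exactly one value per hour in chronological order, it also satisfies the "one value per unit of time" requirement.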

Fourthly, you should sort the historical dataset chronologically and clean it so that each unit of time corresponds to only one value of the target variable. Note that Smart Predict automatically sorts the data.

I cannot conclude this section without a word about the candidate influencers mentioned above. They are very useful to improve the detection of the components of the signal. Very often, these variables are meaningful only in your business domain, and you will need to prepare your data to obtain them. Here are examples of such influencers:

  • Specific selling periods for a product
  • Time-limited discount
  • Monthly closing day / Quarterly closing day
  • First day of month / Last day of the month
  • Rank Day of Week 1

The following constraints apply to candidate influencers during the analysis of two components of the signal, trends and cycles (both explained below):

  1. The future values must be known (at least for the expected horizon).
  2. Candidate influencers with ordinal, continuous, and nominal types are used in the detection of trends.
  3. But only candidate influencers with ordinal, and continuous types are used in the detection of cycles.

Use case: Optimize travel costs and expenses

We’ll follow a use case to illustrate this section, using a dataset with Smart Predict to show you how the signal is processed to propose forecasts. This use case is about the travel costs and expenses of a company, which have spun out of control and negatively impacted its P&L analysis and financial performance. The objective of the company is to analyze these costs and understand where they could be reduced, but also to forecast them better to avoid going over budget.

The data collected (see Fig 2) in the past are:

  1. The posting period of travel costs and expenses collected every month from 2013 to 2018. It will be our date variable.
  2. The travel costs and expenses for each posting period. The cost is our target variable, and the pair (posting period, cost) is our signal.
  3. The board area is also recorded because travel costs and expenses are not the same from one board area to another.
  4. A series of influencers like Travel & Expense Budget, Software license sales targets per Board Area per month or Number of existing headcounts and their associated budget.
  5. Finally, there are candidate influencers related to time, like Number of working days in the month, Event during the month, Specific calendar events (summer holidays, Christmas) or specific discount per month.

Fig 2: Dataset of the Travel costs and Expenses use case

The graphical view of the signal for all board areas looks like Fig 3. However, members of each board area have different activities. This influences the number of trips and consequently their costs. If travel costs and expenses can be distinguished and forecasted per board area, the analysis and the forecasts will be more precise from the budget point of view.

Fig 3: Signal for dataset

Time Series Modeling Process

Fortunately, when a predictive scenario is created with Smart Predict, there is an option (see Fig 4) that allows us to split the training dataset by the values of a selected variable.

Fig 4: Segmentation of the dataset on the values of the Board Area

Concretely, what does this mean? In this dataset, there are nine board areas. Smart Predict creates nine subsets of the dataset and generates nine forecasts. So, the predictions are specific to each board area and are not influenced by the others.
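Conceptually, this segmentation amounts to grouping the rows by board area and extracting one signal per group. The sketch below only illustrates the idea: Smart Predict does this internally, and the column names (`BoardArea`, `PostingPeriod`, `TravelCost`) and the values are hypothetical.

```python
from collections import defaultdict


def split_by_segment(rows, segment_key, date_key, target_key):
    """Return one chronologically sorted signal [(date, value), ...] per segment."""
    segments = defaultdict(list)
    for row in rows:
        segments[row[segment_key]].append((row[date_key], row[target_key]))
    return {
        name: sorted(signal, key=lambda point: point[0])
        for name, signal in segments.items()
    }


# Made-up rows with hypothetical column names.
rows = [
    {"BoardArea": "HR", "PostingPeriod": "2013-02", "TravelCost": 120.0},
    {"BoardArea": "Cloud Business Group", "PostingPeriod": "2013-01", "TravelCost": 900.0},
    {"BoardArea": "HR", "PostingPeriod": "2013-01", "TravelCost": 100.0},
]

signals = split_by_segment(rows, "BoardArea", "PostingPeriod", "TravelCost")
# Each signal would then be modeled independently of the others.
for board_area, signal in signals.items():
    print(board_area, signal)
```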

As an example, here are the signals specific for the board area “Cloud Business Group” (Fig 5) and for board area “Global Customer Operations” (Fig 6).

Fig 5: Signal for Cloud Business Group

Fig 6: Signal for Global Customer Operations

Now, a little bit of theory, which is necessary to understand how Smart Predict generates the forecast for the travel costs and expenses dataset.

For each of the board areas, the signal is broken down into four components:

  1. The Trend: this is about finding out where your business is headed, in which direction it tends to go. Is it decreasing? Is it increasing?
  2. The Periodicity patterns, which are Seasonality and Periodic Cycles. These are patterns that are reproduced regularly over time.
  3. The Fluctuation, which models how the value at time “t” depends on previous values.
  4. Finally, the Residuals are what remains of the signal when Trends, Cycles and Fluctuations have been removed. Residuals are considered “white noise”; this means that they are purely random effects.

At the end, Smart Predict combines the models of each component to find the best model.

This process is summarized in Fig 7.

Fig 7: Smart Predict process to handle a time series forecasting signal

Let’s look at each of these four steps of the Smart Predict time series process in more detail.

Trend detection

The first step is to determine the best trend of the signal. The trend is the general direction of the signal or its long-term evolution. To obtain the trend shown in Fig 8, Smart Predict makes eight trend models compete against each other. Note that no trend is selected at this step; the choice is made only after the other components of the signal have been estimated.

Fig 8: Trend of board area Global Customer Operations

These eight trend models fall into two groups of methods:

  • Three stochastic methods, which assume that the forecast at time “t” depends on the past values of the signal (t-1, t-2, etc.)
    • Lag 1 (L1) – the signal shifted one step forward. This is the basic forecast where the predicted observation equals the latest signal observation.
      Trend_t = Y_{t-1}
    • Lag 2 (L2) – the signal shifted two steps forward.
      Trend_t = Y_{t-2}
    • Double Differencing – second-order differencing. The difference between the last two observations is taken into account.
      Trend_t = Y_{t-1} + (Y_{t-1} – Y_{t-2}) = 2 Y_{t-1} – Y_{t-2}
  • Five deterministic methods, which assume that the forecast at time “t” is independent of any past values of the signal. The Smart Predict regression algorithm is used for all of them, each time with a different set of inputs.
    • date: A_0 and A_1 are estimated with Y_t as target variable and Time as input variable.
      Trend_t = A_0 + A_1 Time
    • date, date², sqrt(date): A_0, A_1, A_2 and A_3 are estimated with Y_t as target variable and Time, Time² and Sqrt(Time) as input variables.
      Trend_t = A_0 + A_1 Time + A_2 Time² + A_3 Sqrt(Time)
    • date, candidate influencer variables: A_0, A_1, B_1, B_2, … are estimated with Y_t as target variable and Time and the candidate influencers X_1, X_2, … as input variables.
      Trend_t = A_0 + A_1 Time + B_1 X_1 + B_2 X_2 + …
    • date, date², sqrt(date), candidate influencer variables: a combination of the two previous sets of inputs.
      Trend_t = A_0 + A_1 Time + A_2 Time² + A_3 Sqrt(Time) + B_1 X_1 + B_2 X_2 + …
    • candidate influencer variables: B_1, B_2, … are estimated with Y_t as target variable and the candidate influencers X_1, X_2, … as input variables.
      Trend_t = B_1 X_1 + B_2 X_2 + …
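To make some of these formulas concrete, here is a minimal sketch of the three stochastic trends and of the simplest deterministic one (a line in time). The linear fit below uses ordinary least squares as a stand-in; this is an assumption for illustration, since Smart Predict uses its own regression algorithm. Time is assumed to be the index 0, 1, 2, … of each observation, and all function names are hypothetical.

```python
def lag1(y, t):
    """Lag 1 trend: Trend_t = Y_{t-1}."""
    return y[t - 1]


def lag2(y, t):
    """Lag 2 trend: Trend_t = Y_{t-2}."""
    return y[t - 2]


def double_differencing(y, t):
    """Double differencing trend: Trend_t = 2 * Y_{t-1} - Y_{t-2}."""
    return 2 * y[t - 1] - y[t - 2]


def fit_linear_trend(y):
    """Estimate A0 and A1 in Trend_t = A0 + A1 * Time by ordinary least squares.

    Time is the position of each observation (0, 1, 2, ...).
    """
    n = len(y)
    if n < 2:
        raise ValueError("need at least two observations")
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    a1 = sxy / sxx
    a0 = y_mean - a1 * t_mean
    return a0, a1


signal = [3.0, 5.0, 7.0, 9.0, 11.0]   # made-up signal that grows by 2 each step
next_t = len(signal)                  # first point to forecast

print(lag1(signal, next_t))                 # repeats the last observation
print(double_differencing(signal, next_t))  # extends the last observed step
a0, a1 = fit_linear_trend(signal)
print(a0 + a1 * next_t)                     # value of the fitted line at next_t
```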

Detection of Cycles

The second step consists in determining the best cycle of the signal. To obtain the cycle shown in Fig 9, Smart Predict iterates to detect all possible cycles.

Fig 9: Cycle of Board Area HR

There are two types of cycles:

  1. Periodicity which describes natural events which reproduce themselves at a fixed interval of time called a period.
  2. Seasonality which describes calendar events: dayOfWeek, dayOfMonth, dayOfYear, weekOfYear, monthOfYear, hourOfDay.

Cycles of both types are computed through an encoding of the signal. For periodicity, the encoding is based on a period length. With a period equal to 5, the encoding will be 0, 1, 2, 3 and 4, and the value attached to each position will be the average of the signal observed on every 5th step. Up to 450 different periods are tested automatically by Smart Predict. The encoding for seasonality is based on calendar events.

Cycles are also evaluated on candidate influencers. For this, the encoding depends on the type of the variable.

  • For nominal variables: no encoding done, thus no cycle detected.
  • For ordinal and continuous variables: encoding done on the natural order

The training dataset is split into two subsets. The estimation dataset is used to detect cycles and the validation dataset is used to check the accuracy of the cycles. It is important to reduce the scope of the cyclic search to minimize the computation time, particularly for periodicities. Thus, the maximum length of a period is by default equal to the minimum of 1/12 of the estimation dataset size and 450.
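The periodicity encoding and the default limit on the period length can be sketched as follows. This is an illustration only: the function names are hypothetical, and rounding 1/12 of the estimation size down to an integer is an assumption.

```python
def max_period_length(estimation_size, cap=450):
    """Default upper bound on the period: min(estimation_size / 12, 450).

    Integer division is an assumption made for this sketch.
    """
    return min(estimation_size // 12, cap)


def periodic_profile(signal, period):
    """Average value of the signal at each position 0 .. period-1 of the cycle."""
    sums = [0.0] * period
    counts = [0] * period
    for index, value in enumerate(signal):
        position = index % period      # encoding 0, 1, ..., period-1
        sums[position] += value
        counts[position] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


# With a period of 5, each position receives the average of every 5th value.
values = [1, 2, 3, 4, 5, 3, 4, 5, 6, 7]   # made-up values
print(periodic_profile(values, 5))

# 54 estimation points (75% of 72 months) limit the search to short periods.
print(max_period_length(54))
```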

I want to emphasize that you need enough historical data to detect cycles, particularly cycles with a long period or a long seasonality.

At this step, all the encodings that have been computed are cycle candidates, and it’s time to make them compete to select the best ones. To do this, Smart Predict runs this iterative process:

  1. For each of the eight candidate trends, the signal Y_t is detrended to produce Y_t – Trend_t.
  2. For each candidate cycle Cycle_t:
    1. Measure the link between the detrended signal Y_t – Trend_t and Cycle_t.
    2. If adding Cycle_t improves the forecast (comparison of actual and forecasted values in the validation dataset), keep it and repeat the analysis with Trend_t + Cycle_t.
    3. Otherwise, reject Cycle_t.

The selection process stops when there is no significant cycle to add to the predictive model.
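The selection loop can be sketched as a greedy procedure. The version below is a simplified illustration and not the actual Smart Predict implementation: it handles a single trend, considers only periodic cycles, measures the "link" by averaging the remaining signal per position on the estimation part, and uses the mean absolute error on the validation part as the improvement criterion. All names are hypothetical.

```python
def mean_abs_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def periodic_profile(signal, period):
    """Average of the signal at each position 0 .. period-1."""
    sums, counts = [0.0] * period, [0] * period
    for i, v in enumerate(signal):
        sums[i % period] += v
        counts[i % period] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


def select_cycles(signal, trend, candidate_periods, n_validation):
    """Greedily keep the periods that reduce the error on the validation part."""
    n_est = len(signal) - n_validation
    model = list(trend)                                   # start with the trend only
    best = mean_abs_error(signal[n_est:], model[n_est:])
    kept = []
    for period in candidate_periods:
        # What the current model does not explain yet, on the estimation part.
        remaining = [y - m for y, m in zip(signal[:n_est], model[:n_est])]
        profile = periodic_profile(remaining, period)
        trial = [m + profile[i % period] for i, m in enumerate(model)]
        error = mean_abs_error(signal[n_est:], trial[n_est:])
        if error < best:                                  # the cycle improves the forecast
            model, best = trial, error
            kept.append(period)
        # otherwise the candidate cycle is rejected
    return kept, model


# Made-up signal: a linear trend plus a cycle of period 4.
pattern = [2.0, -1.0, 0.0, -1.0]
trend = [float(i) for i in range(24)]
signal = [t + pattern[i % 4] for i, t in enumerate(trend)]

kept, model = select_cycles(signal, trend, candidate_periods=[3, 4], n_validation=8)
print(kept)
```

On this made-up signal, the period 3 candidate does not reduce the validation error and is rejected, while the period 4 candidate is kept.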

Fluctuation detection

The third step consists in determining fluctuation of the signal. The fluctuation is what is left when the trend and the cycles have been extracted. To obtain the fluctuation shown in Fig 10, Smart Predict creates an auto-regression that uses a window of past data to model the current residue.

Fig 10: Fluctuation of board area Cloud Business Group

At this stage, the initial signal has been broken down and its trend and cycles have been removed: Signal – Trend – Cycles. An auto-regressive model is then computed on what’s left of the signal:

X_t = a_1 X_{t-1} + a_2 X_{t-2} + a_3 X_{t-3} + … + a_p X_{t-p} + e_t

where p is called the order of the auto-regressive model and e_t is the error term. By default, this order is limited to 100.
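To show how such a model is applied, the sketch below evaluates the auto-regressive formula for given coefficients and estimates the coefficient of the simplest case (order 1) by least squares. The estimation method and the function names are assumptions made for illustration; the text does not describe how Smart Predict estimates the coefficients or chooses the order.

```python
def ar_predict(history, coefficients):
    """Next value of an AR(p) model: a_1 * X_{t-1} + ... + a_p * X_{t-p}."""
    p = len(coefficients)
    if len(history) < p:
        raise ValueError("history is shorter than the order of the model")
    # history[-1] is X_{t-1}, history[-2] is X_{t-2}, and so on.
    return sum(a * history[-k] for k, a in enumerate(coefficients, start=1))


def fit_ar1(x):
    """Least-squares estimate of a_1 in X_t = a_1 * X_{t-1} + e_t."""
    numerator = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    denominator = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    return numerator / denominator if denominator else 0.0


# Made-up remainder (signal minus trend minus cycles) that halves at each step.
remainder = [8.0, 4.0, 2.0, 1.0, 0.5]
a1 = fit_ar1(remainder)
print(a1)
print(ar_predict(remainder, [a1]))
```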

At the end, when trend, cycles and fluctuation are removed from the original signal, the only points that remain are the residuals, which are considered noise. Fig 11 shows such residuals.

Fig 11: Residuals of board area Cloud Business Group

Model selection and quality of model

The fourth and last step is the selection of the best model. During the previous three steps, the initial signal Y_t was broken down into trends, cycles, fluctuations and residuals:

Y_t = Trend_t + Cycles_t + Fluctuation_t + Residual_t

The remaining signal after removing the three components represents the residues. After trying several combinations of models, the final selected model is the one whose residues are the closest possible to white noise.

Now that the components of the signal have been determined, Smart Predict combines the models of each component to find the best overall model. To do that, it needs a quality indicator and a validation process.

The historical data is split in two parts: 75% is reserved to study the signal and generate models as seen previously. The remaining 25% constitutes the validation dataset and is used to measure the quality of the candidate models so that the best one can be selected. The validation process compares the actual values of the measure with the values predicted by each model. A measure called Mean Absolute Error (MAE) is computed as the average of the absolute differences between the predictions made by a model and the actual values:

MAE = (1/n) × Σ |Actual_t – Forecast_t|, where the sum runs over the n points of the validation dataset.
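The split and the MAE are simple to reproduce. In the sketch below, the split is assumed to be chronological (the oldest 75% for estimation, the most recent 25% for validation), and the function names and sample values are hypothetical.

```python
def split_history(signal, estimation_share=0.75):
    """Split a chronologically sorted signal into estimation and validation parts."""
    cut = int(len(signal) * estimation_share)
    return signal[:cut], signal[cut:]


def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


history = list(range(72))                      # e.g. 72 monthly points
estimation, validation = split_history(history)
print(len(estimation), len(validation))        # 54 18

# Made-up actual and predicted values on a validation set of three points.
print(mean_absolute_error([10.0, 20.0, 30.0], [12.0, 18.0, 33.0]))
```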

To select the best model, Smart Predict combines these three indicators:

  • The performance measured by the MAE
  • The complexity of the model
  • The horizon.

Note that the MAE is an internal measure and is not surfaced in Smart Predict so far. The quality indicator surfaced in Smart Predict is based on another metric called the Horizon-Wide MAPE, because the MAPE (Mean Absolute Percentage Error) is a kind of standard in the market.

To get it, individual MAPE values are computed first.

Each one measures the accuracy of the model’s forecasts and indicates how much the forecasts differ from the real signal values.

In the input parameters of a forecast, you specify a horizon. This is how far into the future Smart Predict predicts values for you. For each model, Smart Predict computes several individual MAPE values corresponding to the requested horizon and takes their average. This is the Horizon-Wide MAPE.

How should you interpret the Horizon-Wide MAPE? Zero indicates a perfect model. Values above one are open to discussion: you need to compare the model and its forecasts with your business knowledge to decide whether it is accurate enough.
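As a sketch of this calculation, the code below uses the standard definition of the MAPE (the average of |actual – forecast| / |actual|) and averages one MAPE per step of the horizon. How exactly Smart Predict builds the individual MAPE values is not detailed here, so grouping the values per horizon step is an assumption; the function names and sample numbers are hypothetical.

```python
def mape(actual, forecast):
    """Standard MAPE: mean of |actual - forecast| / |actual| (a ratio, not a percent)."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)


def horizon_wide_mape(actual_by_step, forecast_by_step):
    """Average of the individual MAPE values, one per step of the horizon."""
    individual = [mape(a, f) for a, f in zip(actual_by_step, forecast_by_step)]
    return sum(individual) / len(individual)


# Made-up example with a horizon of 2 and two forecasts evaluated per step.
actual_by_step = [[100.0, 200.0], [100.0, 400.0]]
forecast_by_step = [[110.0, 180.0], [150.0, 400.0]]
print(horizon_wide_mape(actual_by_step, forecast_by_step))
```

In this example the first step has an individual MAPE of about 0.1 and the second of about 0.25, so the forecasts further ahead weigh on the overall indicator.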

Using the forecasting model

Now that we know how Smart Predict builds a forecasting model, we can run it on the travel costs and expenses dataset with these input parameters (Fig 12).

Fig 12: Input parameters

You’ll notice that:

  • The forecasting is segmented on the 9 values of the board area (BA), so that 9 predictive models will be generated.
  • The horizon for all the forecasts is 12 months. The unit is the month because the granularity of the training dataset is the month.

Once the training process is complete, an overview lets you see the quality of the models per board area, as shown in Fig 13.

Fig 13: Horizon-Wide MAPE per board area

You see immediately that the forecasting model for board area Global Customer Operations has the best quality. This means that its predictions are expected to have the lowest error rate.

You get the details for this segment when you click on it as shown in Fig 14.

Fig 14: Forecast for segment Global Customer Operations

At the top, you recognize the signal for this board area in green. The 12 forecasted values are shown in blue, together with their minimum and maximum error bounds.

A table of the forecasted values is at the bottom.

Finally, if you click on the apply button, the forecasted values are output into a dataset as shown in Fig 15.

Fig 15: Forecasted values applied and exported into a dataset

It is then up to you to include this dataset in your BI story or your planning.

Conclusion

This is the end of this blog, and I hope you now have a better understanding of how SAC Smart Predict produces relevant forecasts. I also hope these explanations increase your confidence in the product.

Resources to learn more about Smart Predict.

Finally, if you enjoyed this post, I’d be very grateful if you’d help it spread, comment and like. Thank you!
