SAP

Author

Debraj Roy

Debraj Roy is an Analytics Expert within the SAP Analytics Customer Experience Team, focusing on SAP Analytics Cloud and SAP Predictive Analytics, based in Boston, Massachusetts. He works with customers to drive adoption of SAP Analytics Cloud and SAP Predictive Analytics at their organizations. He has supported various SAP technologies, including SAP HANA, Fiori, and GRC. Outside of work, he likes spending time with his family and playing soccer.

You may have seen the R visualization icon in your SAP Analytics Cloud story and wondered what it does, or how it works. For those who don’t know, R is a programming language that is used for data analysis, and it can be integrated with SAP Analytics Cloud to tell richer and more robust stories.

With R, you can:

  • Insert R visualizations into your stories
  • Interact with your visualizations, using controls such as filters
  • Edit your R scripts and preview visualizations
  • Share stories containing R visualizations with other users

Benefits of using R with SAP Analytics Cloud

R is a popular open-source programming language used by many people around the world. To some degree, this makes it likely to have a longer shelf life than more obscure programming languages.

It also means that people all over the world continue to invest time and energy in creating new and interesting types of statistical charts and graphs that you can use to analyze and present your data.

Another benefit of integrating R visualizations with SAP Analytics Cloud is flexibility: you can change the chart type and its characteristics, and depict your information in a variety of ways.

Here are some highlights of the other benefits of using R:

  • R is one of the most widely used languages and environments for statistical analysis
  • Many new research results and algorithms in statistics are published as R packages and can be used and tested as soon as they are available, so you can apply the latest research to your business
  • When seeking help from or collaborating with a statistician, working in a common language is much more efficient for both sides
  • R makes stunning visualizations, and compelling visualizations are essential for engaging stakeholders
  • R and its most widely used IDE, RStudio, are both free

Technical details of R integration in SAP Analytics Cloud

To use R capabilities in SAP Analytics Cloud, your system must be configured to connect to an R runtime environment. Depending on your region, these are your options: 

Self-configured R server: with this method, you connect SAP Analytics Cloud to your own R engine running on a machine in the cloud, and SAP Analytics Cloud sends R scripts to that engine for execution. You are responsible for maintaining the R engine (e.g. installing the packages you want).

R server provided by SAP: this option is an SAP Cloud Platform service that provides a ready-to-use R server runtime to SAP Analytics Cloud and requires no configuration on your part. This environment comes with a list of preinstalled packages. Not every SAP Analytics Cloud tenant supports this runtime, so please confirm with a technical expert whether your tenant is configured with the R server provided by SAP.

The typical data flow for a user’s request to render an R visualization in SAP Analytics Cloud (SAC) is as follows.

For SAP-managed R systems:

  • The data bound to the R widget is sent to the R system for processing and is not persisted there; the SAP R system is stateless, with no persistent storage.
  • Before the data is sent to the R system, it is cached in SAP Analytics Cloud. This cache is cleared at intervals ranging from 30 minutes to 24 hours, so data is never cached for more than a day. This applies to both live and import connections.
  • The data is processed by an R server instance picked from a pool of servers. Once processing is done, the response is sent back to the browser and the R server is returned to the pool.
  • For security reasons, the R server runtime environment is isolated from the external network. For this reason, you cannot install new packages or connect to a database from within your R script.
  • Communication between SAP Analytics Cloud and the R servers uses HTTPS to provide transport layer security for users’ data.

The R server runtime environment service provided by SAP currently runs R version 3.3.4 (Rserve 1.7.3), and comes with pre-installed packages.
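To see what a given runtime provides, a quick check such as the following can be run in an R session, for example on a self-configured R server or on a local installation you want to compare with the SAP-provided runtime. This is a minimal sketch; the package list is our own assumption, based on the packages used in the examples in this post.

```r
# Print the R version of the current session
print(R.version.string)

# Packages used by the examples in this post (assumed list; adjust as needed)
needed <- c("MASS", "ggplot2", "plotly", "RColorBrewer")

# TRUE/FALSE for each package, based on the library paths of this session
installed <- needed %in% rownames(installed.packages())
print(data.frame(package = needed, installed = installed))
```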

How to use R Visualizations within SAP Analytics Cloud

To add R visualizations to a story, you need to have an R server running and connected to SAP Analytics Cloud. This connection is typically handled by an administrator and includes the server or host address, port number, certificate for encryption, and user credentials.

Alternatively, for some data centers (EU, EUDP, AP1, US1, and US2), SAP Analytics Cloud provides a preconfigured, ready-to-use R runtime with preinstalled packages.

R visualization in SAP Analytics Cloud

Within your story, select the + icon on the Insert tab, then choose ‘R Visualization’ from the list. A new tile is added to your page. For a step-by-step walkthrough of how to add R visualizations to your stories in SAP Analytics Cloud, please check out our video tutorial.

Quick Notes:

  1. You can create basic graphs in R quite easily. The key function to know is plot().
  2. plot() accepts many parameters, including x-axis data, y-axis data, x-axis and y-axis labels (xlab, ylab), color (col), and title (main). To create a line graph, set type = "l".
  3. For a boxplot, use the boxplot() function, and for a bar plot, use the barplot() function.
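The notes above can be tried with a few base R calls. The sketch below assumes the MASS package (which provides the Cars93 dataset used later in this post) is installed; the column choices and colors are our own. Each call draws a separate plot, so in an R visualization you would likely keep just the one you want displayed.

```r
cars93 <- MASS::Cars93

# plot(): x data, y data, axis labels (xlab, ylab), color (col) and title (main)
plot(cars93$Weight, cars93$MPG.highway,
     xlab = "Weight (pounds)", ylab = "Highway MPG",
     col = "steelblue", main = "Highway MPG vs. Weight")

# type = "l" joins the points with lines; sort by x first so the line is readable
ord <- order(cars93$Weight)
plot(cars93$Weight[ord], cars93$MPG.highway[ord], type = "l",
     xlab = "Weight (pounds)", ylab = "Highway MPG", main = "Line version")

# boxplot() and barplot() follow the same pattern
bp <- boxplot(MPG.highway ~ Type, data = cars93, main = "Highway MPG by Type")
barplot(table(cars93$Type), main = "Number of Cars by Type")
```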

In particular, the ggplot2 package is quite popular and worth a look for robust visualizations within R. Let’s take a look at the code snippets for some of the visualizations from R that can be used within SAP Analytics Cloud.

Before we begin, note that we will use sample datasets that ship with R to plot the graphs. First, we will use the Cars93 dataset from the MASS package, which contains information on 93 new cars for the 1993 model year.

As a prerequisite, upload the dataset to SAP Analytics Cloud as a model and assign that model as the data source of the R visualization.
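One way to get a sample dataset into SAP Analytics Cloud is to export it from R as a CSV file and upload that file when creating the model. The sketch below assumes a local R session with the MASS package; the output path is a placeholder you would change.

```r
# Copy the Cars93 sample data out of the MASS package
cars93 <- MASS::Cars93

# Write it to a CSV file; the path is a placeholder, change it to a folder you can find
out_file <- file.path(tempdir(), "Cars93.csv")
write.csv(cars93, out_file, row.names = FALSE)

# Quick check that the file reads back with all rows and columns
check <- read.csv(out_file)
print(dim(check))
```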

Histogram

A histogram breaks the data into bins (or breaks) and shows the frequency distribution across those bins. You can change the breaks and see how that affects how easy the visualization is to understand. There are different kinds of histogram plots:

  • Single Variable Plot — in this section, we will look at the distribution of values for one variable in the dataset by creating histograms using ggplot2’s qplot function. We’ll look for any outliers or extraneous values, and help identify any relationships between variables that are worth investigating further.
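To see the effect of the bin setting described above, the sketch below draws the same highway MPG data with two different break settings using base R's hist(). A single number passed to breaks is only a suggestion; hist() rounds it to "pretty" bin edges. With ggplot2, the bins and binwidth arguments of geom_histogram() play the same role.

```r
mpg <- MASS::Cars93$MPG.highway

# Two panels side by side: a coarse and a fine binning of the same data
par(mfrow = c(1, 2))
# breaks = 5 and 20 are suggested bin counts; hist() picks "pretty" edges near them
h_coarse <- hist(mpg, breaks = 5, main = "breaks = 5", xlab = "Highway MPG")
h_fine <- hist(mpg, breaks = 20, main = "breaks = 20", xlab = "Highway MPG")
par(mfrow = c(1, 1))

# Number of bins actually used in each version
print(c(coarse = length(h_coarse$counts), fine = length(h_fine$counts)))
```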

Code snippet showing the frequency distribution of highway miles per gallon:

# Miles Per Gallon
library(ggplot2)
qplot(MASS::Cars93$MPG.highway, xlab = 'Miles Per Gallon', ylab = 'Count',
      main = 'Frequency Histogram: Miles per Gallon')
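Recent ggplot2 releases (3.4.0 and later) deprecate qplot(), so depending on the ggplot2 version on your R server, the snippet above may print a deprecation warning. A sketch of the same histogram written with ggplot() and geom_histogram() follows; the binwidth of 2 MPG and the colors are our own choices.

```r
library(ggplot2)

# Same histogram as the qplot() call, built explicitly with ggplot() + geom_histogram()
p <- ggplot(MASS::Cars93, aes(x = MPG.highway)) +
  geom_histogram(binwidth = 2, fill = "steelblue", color = "white") +
  labs(x = "Miles Per Gallon", y = "Count",
       title = "Frequency Histogram: Miles per Gallon")
print(p)
```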

Code snippet showing the frequency distribution of the number of cylinders:

# Number of Cylinders
library(ggplot2)
qplot(MASS::Cars93$Cylinders, xlab = 'Cylinders', ylab = 'Count',
      main = 'Frequency Histogram: Number of Cylinders')

Viewing the values for Cylinders in tabular format:

> table(MASS::Cars93$Cylinders)
     3      4      5      6      8 rotary
     3     49      2     31      7      1

Because there are relatively few cars with 3, 5, or 8 cylinders or a rotary engine, we remove them; they would only be a distraction in later plots.

library(ggplot2)
# Drop the rare cylinder types, then drop the now-unused factor levels
cars <- MASS::Cars93[!MASS::Cars93$Cylinders %in% c('3', '5', '8', 'rotary'), ]
cars$Cylinders <- droplevels(cars$Cylinders)
qplot(cars$Cylinders, ylab = 'Count', xlab = 'Cylinders')

Both distributions are skewed right, with a longer tail toward the higher end of the scale, and there are many more four-cylinder cars than six- or eight-cylinder cars. Intuition suggests a strong relationship between the number of cylinders and MPG, but single-variable histograms cannot show a correlation, so we will use multi-variable plots and regression analysis to investigate.
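A quick numeric check can make the suspected relationship concrete before moving to multi-variable plots. The sketch below repeats the filtering step from above, then computes the average highway MPG for each remaining cylinder count and a simple correlation; treating the cylinder count as a number for cor() is our own simplification.

```r
# Keep only the 4- and 6-cylinder cars, as in the filtering step above
cars <- MASS::Cars93[!MASS::Cars93$Cylinders %in% c("3", "5", "8", "rotary"), ]
cars$Cylinders <- droplevels(cars$Cylinders)

# Average highway MPG per cylinder count
mpg_by_cyl <- aggregate(MPG.highway ~ Cylinders, data = cars, FUN = mean)
print(mpg_by_cyl)

# Correlation between cylinder count (converted from factor to number) and highway MPG
r_cyl <- cor(as.numeric(as.character(cars$Cylinders)), cars$MPG.highway)
print(r_cyl)
```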

Multi-Variable Plot

In this section, we use more ggplot2 charting techniques to visualize how one variable affects another. We start with how weight affects MPG, using a scatter plot overlaid with a linear best-fit line.

With the code below, we can fit a linear regression line and show the result as a visualization in SAP Analytics Cloud.

library(ggplot2)
ggplot(data = MASS::Cars93, aes(x = Weight, y = MPG.highway)) +
  geom_point() +
  geom_smooth(method = 'lm', formula = y ~ x, se = FALSE) +
  xlab('Weight') +
  ylab('MPG') +
  ggtitle('MPG vs. Weight')

The data shows that weight and MPG are inversely related — as weight increases, MPG decreases.

We can fit a linear model to confirm that the R-squared of the best-fit line is over 65%, as shown below. This means that variation in a car’s weight explains over 65% of the variation in its highway MPG.

The results of the following calculation are visible only in the R console.
> fit = lm(MPG.highway ~ Weight, data=MASS::Cars93)
> summary(fit)

Call:

lm(formula = MPG.highway ~ Weight, data = MASS::Cars93)

Residuals:
Min 1Q Median 3Q Max
-7.650070331368 -1.835905253515 -0.077411039876 1.823530076599 11.617223746283

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 51.601365428935715 1.735549783530243 29.73200 < 2.22e-16 ***
Weight -0.007327059223497 0.000554769961341 -13.20738 < 2.22e-16 ***

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.13893757053 on 91 degrees of freedom
Multiple R-squared: 0.657166485488, Adjusted R-squared: 0.65339908423
F-statistic: 174.434959384 on 1 and 91 DF, p-value: < 2.220446049e-16
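Since the summary output is not part of the chart, one option is to compute R-squared in the script and show it on the plot itself. The sketch below pulls r.squared from summary(fit) and, for this single-predictor model, checks it against the squared correlation between weight and MPG; placing it in the subtitle is our own choice.

```r
library(ggplot2)

fit <- lm(MPG.highway ~ Weight, data = MASS::Cars93)

# R-squared from the model summary; with one predictor it equals the
# squared Pearson correlation between Weight and MPG.highway
r2 <- summary(fit)$r.squared
r2_check <- cor(MASS::Cars93$Weight, MASS::Cars93$MPG.highway)^2

# Show the value on the chart, since console output is not part of the plot
p <- ggplot(MASS::Cars93, aes(x = Weight, y = MPG.highway)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Weight (pounds)", y = "Highway MPG", title = "MPG vs. Weight",
       subtitle = sprintf("R-squared = %.3f", r2))
print(p)
```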

The next plot uses boxplots to show the median and distribution of highway MPG for each type of car in the sample. Boxplots are often used in descriptive data analysis to show the shape of a distribution, its central value, and its variability. In a box-and-whisker plot, the ends of the box are the upper and lower quartiles, so the box spans the interquartile range, and the line inside the box marks the median.

It shows that highway MPG tends to be higher for compact, small, and sporty cars than for the other types.
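The boxplot described above can be drawn with ggplot2's geom_boxplot(). In the sketch below, reordering the car types by median highway MPG with reorder() is our own addition, intended to make the comparison across types easier to read.

```r
library(ggplot2)

cars93 <- MASS::Cars93

# Reorder the Type factor by median highway MPG so the boxes read from low to high
cars93$Type <- reorder(cars93$Type, cars93$MPG.highway, FUN = median)

p <- ggplot(cars93, aes(x = Type, y = MPG.highway)) +
  geom_boxplot() +  # box = interquartile range, middle line = median
  labs(x = "Type", y = "Highway MPG", title = "MPG Comparison by Type")
print(p)

# Medians behind the ordering, listed in the new level order
med <- tapply(cars93$MPG.highway, cars93$Type, median)
print(med)
```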

Correlation Analysis for Multiple Variables using Pairs Chart:

With a pairs chart, you can produce a matrix of scatterplots that helps you understand the correlations between multiple variables.

Code Snippet:

pairs(~ Horsepower + Weight + MPG.highway, data = MASS::Cars93)

The resulting chart shows the pairwise relationships between these variables. For example, cars with higher highway mileage tend to weigh less.
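To put numbers on what the pairs chart shows, cor() returns the Pearson correlation for each pair of variables. A minimal sketch, using the same three columns as the pairs chart; a value near -1 or 1 indicates a strong linear relationship, and the sign gives its direction.

```r
vars <- c("Horsepower", "Weight", "MPG.highway")

# Pearson correlation matrix for the variables in the pairs chart
cm <- cor(MASS::Cars93[, vars])
print(round(cm, 2))
```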

3D scatterplots in R

You can create an interactive 3D scatterplot with Plotly’s R library, which is free and open source. The resulting chart can be rotated with the mouse. In plot_ly(), the x, y, and z arguments map columns to the three axes, and color controls the color of the points.

library(plotly)
# Work on a local copy; a dataset inside a package namespace cannot be modified in place
cars93 <- MASS::Cars93
# Man.trans.avail is a No/Yes factor (is a manual version available?); relabel it for the legend
cars93$Transmission <- factor(cars93$Man.trans.avail, levels = c('No', 'Yes'),
                              labels = c('Automatic only', 'Manual available'))
plot_ly(cars93, x = ~Weight, y = ~Horsepower, z = ~MPG.highway,
        color = ~Transmission, type = 'scatter3d', mode = 'markers')

Here we can see the distribution of MPG, weight, and horsepower in three dimensions.

Bar/Line Charts

Line Chart

This example uses the AirPassengers dataset that ships with R.

As a prerequisite, upload the dataset to SAP Analytics Cloud as a model and assign that model as the data source of the R visualization.
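AirPassengers is a monthly time series object rather than a data frame, so it needs a date column before it can be saved as a table for upload. The sketch below is one way to do that; the column names Month and Passengers and the output path are our own assumptions.

```r
# AirPassengers starts in January 1949; build one row per month
ap <- data.frame(
  Month = seq(as.Date("1949-01-01"), by = "month", length.out = length(AirPassengers)),
  Passengers = as.numeric(AirPassengers)
)

# Output path is a placeholder; change it to a folder you can find
out_file <- file.path(tempdir(), "AirPassengers.csv")
write.csv(ap, out_file, row.names = FALSE)
print(head(ap, 3))
```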

The line chart below shows the increase in monthly airline passengers from 1949 to 1960. Line charts are the usual choice for analyzing a trend over time. They are also suitable for comparing relative changes in quantities across a variable such as time.

plot(AirPassengers, type = "l")  # simple line plot
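Because the monthly series also has a strong seasonal pattern, overlaying a moving average can make the long-term trend easier to see. The sketch below uses stats::filter() to compute a 12-month moving average; the window length and colors are our own choices.

```r
plot(AirPassengers, type = "l", ylab = "Passengers",
     main = "Monthly Airline Passengers, 1949-1960")

# 12-month moving average; sides = 2 uses months on both sides of each point.
# stats:: is explicit because dplyr also exports a function called filter().
trend <- stats::filter(AirPassengers, rep(1 / 12, 12), sides = 2)
lines(trend, col = "red", lwd = 2)
```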

Bar Chart

Bar charts are suitable for comparing totals across several groups, and stacked bar charts show how each total breaks down by category. The examples below use the iris dataset that ships with R.

library(RColorBrewer)  # provides the brewer.pal() color palettes
barplot(iris$Petal.Length)  # simple bar graph, one bar per flower
barplot(iris$Sepal.Length, col = brewer.pal(3, "Set1"))
barplot(table(iris$Species, iris$Sepal.Length), col = brewer.pal(3, "Set1"))  # stacked plot
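For a comparison between groups, a bar chart of a per-group summary is often easier to read than one bar per observation. The sketch below plots the mean petal length for each iris species; like the snippets above, it assumes the RColorBrewer package is installed.

```r
library(RColorBrewer)

# Mean petal length (in cm) for each of the three species
means <- tapply(iris$Petal.Length, iris$Species, mean)
print(round(means, 2))

# One bar per species, one color per bar
barplot(means, col = brewer.pal(3, "Set1"),
        ylab = "Mean petal length (cm)", main = "Petal Length by Species")
```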

Line charts and stacked column charts are also available as native chart types in SAP Analytics Cloud, so you can also upload the dataset directly to SAP Analytics Cloud and build these charts without R.

Mosaic Plot

A mosaic plot displays categorical data efficiently: the area of each tile shows the relative proportion of observations in that combination of categories.

> data(HairEyeColor)
> mosaicplot(HairEyeColor)
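HairEyeColor is a three-way table (hair color, eye color, and sex), so the plot above splits the tiles three ways. To focus on two of the variables, you can sum over the third with margin.table() first, as in this sketch; collapsing the Sex dimension and shading the tiles are our own choices.

```r
# Sum over the Sex dimension to get a Hair x Eye table
hair_eye <- margin.table(HairEyeColor, c(1, 2))
print(hair_eye)

mosaicplot(hair_eye, color = TRUE, main = "Hair and Eye Color")
```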

Heat Map

Heat maps enable exploratory data analysis with two dimensions on the axes and a third dimension shown by color intensity. The heatmap() function requires a numeric matrix, so you need to convert the data frame to a matrix first.

Using the mtcars dataset:
> heatmap(as.matrix(mtcars))
You can also use the image() command for this type of visualization:
> image(as.matrix(mtcars[2:7]))
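The same approach works for the Cars93 data used earlier in this post, as long as only numeric columns go into the matrix. In the sketch below, the choice of columns is our own; scale = "column" standardizes each variable so columns measured in different units can share one color scale.

```r
# Numeric columns only; heatmap() needs a numeric matrix
vars <- c("Price", "MPG.city", "MPG.highway", "Horsepower", "Weight")
m <- as.matrix(MASS::Cars93[, vars])
rownames(m) <- as.character(MASS::Cars93$Make)  # one row label per car

# Standardize each column, then cluster and draw rows and columns
heatmap(m, scale = "column", margins = c(8, 8))
```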

Additional Resources

For more information about R visualizations in SAP Analytics Cloud.
