Data wrangling is a necessary step to ensure the highest quality insights when analyzing your business data. However, data wrangling can be both difficult and time-consuming, especially when it comes to large and complex data sets, or ones containing errors.
The data wrangling process in SAP Analytics Cloud helps you enhance your data faster by using machine learning to suggest Smart Transformations and automate repetitive workflows.
Speed things up with samples
It’s no secret that organizations are collecting more and more data from a variety of sources. But while big data helps organizations uncover better insights, wrangling large volumes of data can take a long time. To help speed things up, we’ve introduced sampling to SAP Analytics Cloud.
Now, when you upload a large data set to the Modeler, your data will automatically be sampled for the wrangling stage.
This sample is a subset of your full data set consisting of 2,000 randomly selected rows. Working on a sample makes the wrangling process more efficient, and any changes you make to the sample are automatically applied to the full data set when you publish it as a model.
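To make the idea concrete, here is a minimal Python sketch of sampling and replaying changes. It is not how SAP Analytics Cloud is implemented. The names `take_sample`, `apply_steps`, and `uppercase_region`, along with the data, are hypothetical; only the 2,000-row sample size comes from the text above.

```python
import random

SAMPLE_SIZE = 2000  # sample size stated in the article


def take_sample(rows, size=SAMPLE_SIZE, seed=None):
    """Return up to `size` randomly selected rows, without replacement."""
    rng = random.Random(seed)
    if len(rows) <= size:
        return list(rows)
    return rng.sample(rows, size)


def apply_steps(rows, steps):
    """Apply each recorded transformation, in order, to every row."""
    result = rows
    for step in steps:
        result = [step(row) for row in result]
    return result


# Hypothetical full data set: 10,000 rows with a lowercase region code.
full_data = [{"id": i, "region": "emea"} for i in range(10_000)]
sample = take_sample(full_data, seed=42)


# A transformation designed while looking only at the sample.
def uppercase_region(row):
    return {**row, "region": row["region"].upper()}


steps = [uppercase_region]
wrangled_sample = apply_steps(sample, steps)  # fast feedback on 2,000 rows
published = apply_steps(full_data, steps)     # same steps replayed on all rows
```

The key design point is that the steps are stored separately from the data, so the same list can be run on the sample first and on the full data set later.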
Get a better view of your data
When it comes to data layouts, we’re typically used to dealing with rows and columns. As such, the default view in the Modeler is a grid view that looks like a familiar spreadsheet. This layout makes sense when you want to explore individual cells in detail, but when you’re dealing with large volumes of data, it can be tricky to get an overview of what you’re working with.
Card view is a different way of looking at large volumes of data. Switch to card view to see a summarized view of your data set.
Each card contains basic details about one column, such as:
- Column type
- Number of unique values (for dimensions)
- Mean value (for measures)
- Data quality indicated by a status bar
When you select a card, you’ll see even more detailed information about a particular column in the details panel.
In the details panel, you can redefine your column type as a measure or dimension. You can add value labels to measures and specify dimension attributes such as description, properties, parent-child hierarchies, and geolocations. Plus, you’ll be able to see any data quality issues within the selected column, along with a visualization of the data distribution.
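The kind of per-column summary a card shows can be approximated in a few lines of Python. This is only an illustrative sketch: the function `summarize_column` is hypothetical, the rule "all numeric values means measure" is a simplification, and treating data quality as the share of non-empty values is an assumption, since the article does not say how the status bar is calculated.

```python
from statistics import mean


def summarize_column(name, values):
    """Build a card-like summary for a single column."""
    present = [v for v in values if v is not None and v != ""]
    # Simplified type detection: a column is a measure if every value is numeric.
    is_measure = bool(present) and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    card = {
        "column": name,
        "type": "measure" if is_measure else "dimension",
        # Assumed quality metric: fraction of rows that have a value.
        "quality": len(present) / len(values) if values else 1.0,
    }
    if is_measure:
        card["mean"] = mean(present)
    else:
        card["unique_values"] = len(set(present))
    return card


# Hypothetical columns, one dimension and one measure.
columns = {
    "country": ["DE", "US", "DE", None],
    "revenue": [100.0, 250.0, 175.0, 75.0],
}
cards = [summarize_column(name, values) for name, values in columns.items()]
for card in cards:
    print(card)
```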
Improve efficiency with Smart Transformations
Data is often collected in a form that is not optimized for analysis. For example, if latitude and longitude are stored in a single column in your data set, you need to split the column before you can create a geolocation from the coordinates. Geolocations are necessary if you want to create insightful and stunning maps during your analysis. In instances like these, you’ll have to transform your data.
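As a rough illustration of the latitude and longitude example, the following Python sketch splits a combined coordinate column into two numeric columns. The function `split_coordinates`, the column names, the comma separator, and the sample values are all assumptions made for the example, not part of the product.

```python
def split_coordinates(value, separator=","):
    """Split a 'latitude,longitude' string into a tuple of two floats."""
    lat_text, lon_text = value.split(separator)  # fails if there are not exactly two parts
    lat, lon = float(lat_text), float(lon_text)  # float() ignores surrounding spaces
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError(f"coordinates out of range: {value!r}")
    return lat, lon


# Hypothetical rows with both coordinates stored in one column.
rows = [
    {"store": "Store A", "coords": "48.8566, 2.3522"},
    {"store": "Store B", "coords": "40.7128, -74.0060"},
]

for row in rows:
    # Replace the combined column with two separate numeric columns.
    row["latitude"], row["longitude"] = split_coordinates(row.pop("coords"))
```

Once the values are in separate numeric columns, they can be used as the inputs for a geolocation.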
SAP Analytics Cloud makes applying transformations easy. The machine learning technology in the platform automatically suggests Smart Transformations based on the context of your selected column(s).
Hover over the Smart Transformation and you’ll see a preview of the result on the grid.
If the transformation doesn’t produce the result you want, you can modify the transformation formula in the Transformation Bar. Another option is to use the Transformation Bar to create your own transformations from scratch.
Available data transformations:
- Delete rows or columns
- Concatenate (combine) columns
- Split column
- Convert column values to uppercase, lowercase, or title case
- Convert values to date, number, or boolean
Easily track and reverse transformations
As you continue wrangling your data, the transformations you make are tracked in your transformation log. To access this log, open the history panel.
In the history panel, you can switch between your column-specific and model-specific transformation logs. If you change your mind about a particular transformation, you can use the log to reverse it, as long as no later transformations depend on it.
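The dependency rule can be pictured with a small Python sketch of a transformation log. This is a hypothetical model, not the product's internal design: the `TransformationLog` class, its methods, and the step descriptions are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LogEntry:
    step_id: int
    description: str
    column: str
    depends_on: set = field(default_factory=set)  # ids of earlier steps this one builds on


class TransformationLog:
    def __init__(self):
        self.entries = []
        self._next_id = 1

    def record(self, description, column, depends_on=()):
        """Add a step to the log and return its id."""
        entry = LogEntry(self._next_id, description, column, set(depends_on))
        self.entries.append(entry)
        self._next_id += 1
        return entry.step_id

    def for_column(self, column):
        """Column-specific view of the log."""
        return [e for e in self.entries if e.column == column]

    def undo(self, step_id):
        """Remove a step, unless a later step depends on it."""
        dependents = [e.step_id for e in self.entries if step_id in e.depends_on]
        if dependents:
            raise ValueError(f"step {step_id} is required by steps {dependents}")
        self.entries = [e for e in self.entries if e.step_id != step_id]


log = TransformationLog()
split_id = log.record("Split 'coords' on ','", column="coords")
geo_id = log.record("Create geolocation", column="coords", depends_on=[split_id])
upper_id = log.record("Convert 'region' to uppercase", column="region")

log.undo(upper_id)  # allowed: nothing depends on this step
# log.undo(split_id) would raise ValueError because the geolocation step depends on it
```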
Validate your changes
Once you’re happy with your data, it’s almost time to create your model. But first, it’s important to validate your data.
This step ensures that the transformations you’ve made to your sample data make sense when applied to the full data set. If a transformation generates any errors, you can continue wrangling to resolve the issue before creating your model.
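Why can a transformation that worked on the sample fail on the full data set? The sample may simply not contain the problematic rows. The Python sketch below illustrates this under simple assumptions; the `validate` function, the `amount_to_number` step, and the data are hypothetical and do not reflect the product's actual validation logic.

```python
def validate(rows, steps):
    """Run every step on every row and collect failures instead of stopping."""
    errors = []
    for index, row in enumerate(rows):
        current = row
        for step in steps:
            try:
                current = step(current)
            except (ValueError, KeyError, TypeError) as exc:
                # Record which row and which step failed, then move to the next row.
                errors.append((index, step.__name__, str(exc)))
                break
    return errors


def amount_to_number(row):
    """Convert the 'amount' column from text to a number."""
    return {**row, "amount": float(row["amount"])}


# Hypothetical full data set: the second row was not part of the sample.
full_data = [{"amount": "10.5"}, {"amount": "n/a"}, {"amount": "7"}]
errors = validate(full_data, [amount_to_number])

for index, step_name, message in errors:
    print(f"row {index}: {step_name} failed ({message})")
```

Collecting all errors in one pass, rather than stopping at the first one, gives you the complete list of rows to fix before you create the model.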
Try Smart Data Wrangling today
Now that you’ve read an overview of the SAP Analytics Cloud smart data wrangling process, it’s time to see it in action. Check out our tutorial video and then try it out for yourself when you sign up for a 30-day trial.