Data Analytics: From Business Case to Model Training
Stage 1: Identify the Business Case
Why does identifying the business case matter? A well-chosen business case can act as a driver and generate business value. Advances in information technology have made large volumes of data available for a company to capture. Many organizations have ongoing processes to capture internal data, and from time to time they also capture external data through surveys or studies. The business case determines which of that data deserves attention: what you measure is what you manage.
Who should identify the business case? Data experts are very good at gathering and exploring the available data, but they often lack the business knowledge needed to identify which cases are worth digitizing. A management perspective is therefore key at this step.
Stage 2: Outline the Data Analytics Model and Variables
Starting from the business problem, formulate the model, the hypotheses and the relevant variables. Identify people who have dealt with the same business problem or question before; their experience can open up new ways of approaching the case.
Data analysis can be divided into quantitative data analysis, which works with numerical data, and qualitative data analysis, which focuses on understanding the content of non-numerical data such as text, images, audio and video.
Regarding quantitative statistical analysis, two broad categories can be distinguished:
- Exploratory data analysis: looks for patterns and relationships in the data.
- Confirmatory data analysis: applies statistical techniques to test whether the data support or contradict hypotheses about a dataset.
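The two categories can be contrasted with a minimal Python sketch. The data, the helper names pearson_r and permutation_p_value, and the choice of a permutation test are all assumptions made for illustration; the source does not prescribe any particular technique. The exploratory part summarizes a relationship without a prior hypothesis, while the confirmatory part tests a stated hypothesis (that two groups have the same mean).

```python
import math
import random
import statistics


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign observations to groups at random
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_permutations


# Exploratory: is there a pattern between ad spend and units sold? (made-up data)
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
units_sold = [12, 15, 19, 22, 24, 30]
r = pearson_r(ad_spend, units_sold)

# Confirmatory: hypothesis "the new store layout does not change daily sales".
old_layout = [20, 22, 19, 24, 25, 23]
new_layout = [30, 31, 29, 33, 32, 34]
p_value = permutation_p_value(old_layout, new_layout)

print(f"correlation r = {r:.3f}")
print(f"permutation p-value = {p_value:.4f}")
```

A small p-value suggests the data are hard to reconcile with the hypothesis; it does not prove the hypothesis false.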
Stage 3: Capture and Prepare Data
Gather the primary and secondary data that measure the predefined variables. It is important to understand what the data costs and to capture only the data that is actually needed.
The cost of data capture. Obtaining good data may be not only difficult but also very expensive. Management must also take into account the possible risks regarding data privacy.
Where does the data come from? How it is captured and sampled can influence the results, so both need to be understood.
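The influence of sampling on results can be illustrated with a small simulation. This is only a sketch under invented assumptions: the population, the two customer channels and their spending levels are hypothetical. It compares a simple random sample with a convenience sample drawn only from customers reachable through one channel.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: 8,000 in-store and 2,000 online customers,
# with online customers spending more on average.
population = (
    [("store", rng.gauss(40, 10)) for _ in range(8000)]
    + [("online", rng.gauss(90, 15)) for _ in range(2000)]
)
true_mean = statistics.fmean([spend for _, spend in population])

# Simple random sample: every customer is equally likely to be selected.
random_sample = rng.sample(population, 500)
random_mean = statistics.fmean([spend for _, spend in random_sample])

# Convenience sample: only customers who can be reached by an online survey.
online_only = [row for row in population if row[0] == "online"]
convenience_sample = rng.sample(online_only, 500)
convenience_mean = statistics.fmean([spend for _, spend in convenience_sample])

print(f"population mean spend:   {true_mean:.1f}")
print(f"random sample mean:      {random_mean:.1f}")
print(f"convenience sample mean: {convenience_mean:.1f}")
```

Both samples have the same size, but only the random sample gives an estimate close to the population mean; the capture method, not the amount of data, drives the difference.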
What types of data exist?
- Structured data: has a predetermined format and length, so it can be stored in relational databases. This makes it easier to integrate into the business information system, because it can be sorted, grouped and organized quickly.
- Unstructured data: has no predetermined format. Extracting the necessary information from it requires a significant investment of resources, including software and algorithms that can go through the data efficiently.
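The difference in effort between the two types can be sketched in Python. The records, the customer messages and the regular expression below are invented for illustration. Structured records can be grouped and sorted directly, whereas free text first needs an extraction step, and that step only finds what its pattern anticipates.

```python
import re
from collections import defaultdict

# Structured: every record has the same fields, so grouping and sorting are direct.
orders = [
    {"region": "North", "amount": 120.0},
    {"region": "South", "amount": 80.0},
    {"region": "North", "amount": 45.5},
]
totals_by_region = defaultdict(float)
for order in orders:
    totals_by_region[order["region"]] += order["amount"]
largest_first = sorted(orders, key=lambda o: o["amount"], reverse=True)

# Unstructured: free text with no fixed fields; information must be extracted.
messages = [
    "Hi, my order #1042 for $59.90 arrived damaged.",
    "Where is order #977? I paid $120 last week.",
    "Great service, thanks!",
]
# Assumed pattern: an order number after '#', later followed by a dollar amount.
pattern = re.compile(r"#(\d+).*?\$(\d+(?:\.\d+)?)")
extracted = []
for text in messages:
    match = pattern.search(text)
    if match:  # messages that do not fit the pattern are skipped
        extracted.append(
            {"order_id": int(match.group(1)), "amount": float(match.group(2))}
        )

print(dict(totals_by_region))
print(extracted)
```

Real unstructured sources usually need far more than one regular expression, which is where the investment mentioned above comes from.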
Stage 4: Data Cleaning
Even structured data needs to be cleaned and checked for possible inaccuracies before it is analyzed. Once the required data is in place, the next step is to find and fix data quality problems that could affect the accuracy of analytics applications.
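A minimal cleaning pass might look like the following Python sketch. The field names, the sample rows and the plausibility range for age are assumptions chosen for illustration. It normalizes formatting, rejects rows with missing or implausible values, and drops duplicates, while keeping a record of why each row was rejected.

```python
raw_rows = [
    {"customer": " Alice ", "country": "es", "age": "34"},
    {"customer": "Bob", "country": "ES", "age": ""},       # missing age
    {"customer": "alice", "country": "ES", "age": "34"},   # duplicate once normalized
    {"customer": "Carol", "country": "FR", "age": "340"},  # implausible age
]


def clean(rows, min_age=0, max_age=120):
    """Return (cleaned_rows, rejected_rows_with_reason)."""
    cleaned, rejected, seen = [], [], set()
    for row in rows:
        # Normalize formatting so equivalent values compare equal.
        name = row["customer"].strip().title()
        country = row["country"].strip().upper()
        age_text = row["age"].strip()

        if not age_text.isdigit():
            rejected.append((row, "missing or non-numeric age"))
            continue
        age = int(age_text)
        if not min_age <= age <= max_age:
            rejected.append((row, "age out of range"))
            continue

        key = (name, country, age)
        if key in seen:
            rejected.append((row, "duplicate"))
            continue
        seen.add(key)
        cleaned.append({"customer": name, "country": country, "age": age})
    return cleaned, rejected


cleaned_rows, rejected_rows = clean(raw_rows)
print(cleaned_rows)
for row, reason in rejected_rows:
    print(f"rejected {row['customer']!r}: {reason}")
```

Whether a problematic row should be dropped, corrected or flagged for review is a business decision; this sketch simply rejects it and reports the reason.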
Stage 5: Analyze the Data
Several analyses on different sets of data are run to improve the initial model until a good fit is found. Combining highly analytical people with business managers is important at this stage.
Three different categories of data analytics applications are distinguished:
- Business Intelligence (BI) and reporting applications: provide business managers with key information in order to monitor and control business operations.
- Data mining applications: a more advanced type of data analytics that involves sorting through large data sets to identify trends, patterns and relationships, or to perform predictive analysis.
- More sophisticated applications such as big data analytics and artificial intelligence.
The model is initially run using analytics software or programming languages. After this first test, the model is revised and tested again. This process, known as “training” the model, continues until the model functions as intended.
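The revise-and-test cycle can be sketched as a short Python loop. Everything specific here is an assumption made for illustration: the straight-line model, the gradient-descent update, the made-up data and the error threshold that stands in for "functions as intended". The source does not name a particular model or algorithm.

```python
def train_linear_model(train, validation, learning_rate=0.02,
                       target_mse=0.001, max_rounds=10000):
    """Fit y = w * x + b, repeating revise/test rounds until the error is low enough."""
    w, b = 0.0, 0.0
    n = len(train)
    for round_number in range(1, max_rounds + 1):
        # Revise: adjust the parameters with one gradient-descent step.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in train) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in train) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b

        # Test: measure the mean squared error on data not used for fitting.
        mse = sum((w * x + b - y) ** 2 for x, y in validation) / len(validation)
        if mse <= target_mse:
            break  # the model now meets the chosen acceptance criterion
    return w, b, mse, round_number


# Made-up data generated from y = 2x + 1.
training_data = [(x, 2 * x + 1) for x in (0, 1, 2, 3, 4, 5)]
validation_data = [(x, 2 * x + 1) for x in (1.5, 2.5, 3.5)]

w, b, mse, rounds = train_linear_model(training_data, validation_data)
print(f"w = {w:.3f}, b = {b:.3f}, validation MSE = {mse:.5f}, rounds = {rounds}")
```

Testing on data held out from fitting is a design choice in this sketch: it checks that the model generalizes rather than memorizing the data it was revised on.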