Data collection is a key component of science, but to gain any insights, carefully collected information needs to be analyzed in one way or another. So, what different types of data analysis are there? Curious to learn more, we talked to Prof. Marina Axelson-Fisk, Professor in Mathematical Statistics at Chalmers University of Technology, who described some of the most common approaches, such as exploratory, descriptive, and predictive analyses, and when they are used.
Qualitative vs Quantitative data analysis
A common categorization often encountered in the context of data and data analysis is that of a quantitative and a qualitative approach. So, what is the difference between these two categories?
Quantitative data – data based on measurements and numbers
In natural sciences, there are essentially two categories of data - quantitative and qualitative, Prof. Axelson-Fisk says. Most often you will probably have quantitative data. This is data that is based on measurements and where you have real numbers that are measuring some process, phenomenon, or variable, Prof. Axelson-Fisk continues. In the case of quantitative data, i.e., data based on measurements, there is a wide range of analysis methods that you can use.
Qualitative data – data that is descriptive rather than numerical
In other scientific fields you may instead have qualitative data, Prof. Axelson-Fisk says. This is data that is not based on measurements and numbers. For example, if you do a survey, you will have a set of answers to analyze. The respondents may have been asked to agree with a certain statement, and the possible answers could be “I disagree completely”, or “I disagree somewhat”, and so on.
The answers are not quantitative, but it is common to label them with numbers. If there are five options, they would be labeled 1 to 5. It is important to remember, though, that these numbers are just labels. There could be a hierarchy and order between the answers, such as ‘good’, ‘better’, ‘best’, but the distance between ‘good’ and ‘better’ and the distance between ‘better’ and ‘best’ are not necessarily the same for a given individual. And they're certainly not the same if we compare you and me, Prof. Axelson-Fisk stresses. So even if the steps between your answers are equidistant, mine might not be, and so on. So, we can only treat the answers as ordered, where ‘best’ is always ‘best’. But you cannot do averages, t-tests, and that type of analysis. There are, however, other methods to use depending on what you want to do.
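As a small illustration of treating such answers as ordered labels rather than measurements, the sketch below summarizes survey responses with counts, the median category, and the most frequent category instead of an average. The response values and the wording of options 3 to 5 are made up for the example; they are not from the interview.

```python
from collections import Counter
from statistics import median_low

# Hypothetical five-point agreement scale. The numbers 1 to 5 are labels
# for ordered categories, not measurements. Only the first two option
# texts appear in the article; the rest are assumed for illustration.
LABELS = {
    1: "I disagree completely",
    2: "I disagree somewhat",
    3: "Neutral",
    4: "I agree somewhat",
    5: "I agree completely",
}


def summarize_ordinal(responses):
    """Summarize ordered categories using only their order and frequency."""
    counts = Counter(responses)
    # median_low always returns one of the observed values, so the result
    # is an actual category rather than a point between two categories.
    middle = median_low(responses)
    most_common = counts.most_common(1)[0][0]
    return {
        "counts": dict(sorted(counts.items())),
        "median": middle,
        "mode": most_common,
    }


# Made-up responses from ten respondents.
responses = [1, 2, 2, 4, 5, 5, 5, 3, 2, 5]
summary = summarize_ordinal(responses)

for label, count in summary["counts"].items():
    print(f"{LABELS[label]}: {count}")
print("Median category:", LABELS[summary["median"]])
print("Most frequent category:", LABELS[summary["mode"]])
```

The median and the counts rely only on the order of the categories, so they stay meaningful even when the steps between answers are not equally large.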
Exploratory vs Explanatory analysis
Another division is into exploratory- and explanatory analysis, Prof. Axelson-Fisk continues. If you have a research question and you have a hypothesis that you want to test, then you do an explanatory analysis where you try to prove or disprove your original thoughts. But if you're new to the area and you want to learn about something, for example learn more about the process, you may not know what the relationships are or what research questions to ask. In this case, you want to find patterns and then you do a so-called exploratory analysis. In this case you collect data quite broadly and try to find various patterns to formulate the research questions, Prof. Axelson-Fisk explains.
Here I must point out that if you do an exploratory analysis and then formulate the research questions, it's very important that you don't do the explanatory analysis on the same data set, because the patterns you have found might be true for that sample, but they may not be true for the full population. So, to prove that hypothesis, you need to draw a new independent dataset and do your explanatory analysis on that, simply because you might otherwise think you have proved something that actually wasn't true. And that's something we want to avoid, Prof. Axelson-Fisk says. In the explanatory analysis you are testing hypotheses and drawing out relationships.
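When collecting a second dataset is not feasible, one common approximation is to set part of the data aside before looking at it, explore only the remainder, and test the resulting hypothesis on the untouched part. This is a substitute for, not the same as, drawing a new independent sample as described above. A minimal sketch, where the function name, the 50/50 split, and the placeholder records are all assumptions made for illustration:

```python
import random


def split_for_exploration(records, explore_fraction=0.5, seed=0):
    """Randomly split records into an exploration set and a confirmation set."""
    shuffled = list(records)               # copy so the input is not modified
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    cut = int(len(shuffled) * explore_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder records; in practice these would be the collected observations.
records = list(range(100))
explore, confirm = split_for_exploration(records)

# Look for patterns in `explore` only, then test the resulting hypothesis
# once on `confirm`, which played no part in forming it.
print(len(explore), len(confirm))
```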
After you have drawn conclusions from the data, there is a wide variety of things that you can do, Prof. Axelson-Fisk says. For example, you can continue with a descriptive analysis. In a descriptive analysis, you visualize your data in various ways, for example using graphs and diagrams. You may also compute averages, variances, and tendencies in various ways. This will give you an overview or summary of your data.
The descriptive analysis is not conclusive, i.e., it will not help you prove the hypothesis, but it's extremely useful both for you and your potential audience to illustrate what is going on. Studying graphical representations of the data is often very useful in both the exploratory and the explanatory analysis, as it may help you illustrate and identify trends, and it may also help you identify problems and errors. It is particularly useful when you present your explanatory results, to visualize them for readers and help them understand what you're talking about, Prof. Axelson-Fisk says.
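A descriptive summary of this kind can be sketched with Python's standard library alone. The measurement values below are invented for the example, and the text histogram is only a stand-in for a proper graph:

```python
from collections import Counter
from statistics import mean, median, stdev, variance


def describe(values):
    """Return a basic descriptive summary of numeric measurements."""
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "variance": variance(values),  # sample variance (divides by n - 1)
        "stdev": stdev(values),        # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def text_histogram(values):
    """Return one line per distinct value, with one '#' per occurrence."""
    counts = Counter(values)
    return [f"{value:>6}: {'#' * counts[value]}" for value in sorted(counts)]


# Made-up measurements for illustration.
measurements = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
stats = describe(measurements)

for name, value in stats.items():
    print(f"{name}: {value}")
print("\n".join(text_histogram(measurements)))
```

Even a crude plot like this can reveal an outlier or a data entry error that the averages alone would hide.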
The types of data analysis can be divided in many ways, and there may be an overlap between some of the different categories, Prof. Axelson-Fisk says. One relevant category is predictive analysis. In this type of analysis, you use historical datasets to draw conclusions and make predictions for the future or for a new dataset. You can also look for causality in relationships. You look for correlations or relations between variables, and then you want to know which variable depends on which, i.e., the direction of the dependency.
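As a rough sketch of the predictive idea, the example below fits a straight line to a few historical observations using ordinary least squares and uses it to predict the next value. The data and function names are invented for illustration, and a straight line is only one of many possible models. Note that the fit itself says nothing about causality; the choice of which variable is treated as dependent is made by the analyst.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predict y for a new x using the fitted line."""
    return intercept + slope * x


# Made-up historical data: five time points and one measured value each.
time_points = [1, 2, 3, 4, 5]
observed = [2.1, 3.9, 6.2, 7.8, 10.0]

slope, intercept = fit_line(time_points, observed)
forecast = predict(6, slope, intercept)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, forecast={forecast:.2f}")
```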
The next step is then the inferential analysis. This is where you infer conclusions from your data, Prof. Axelson-Fisk explains. Here you have lots of statistical methods and hypothesis tests to use. Typically, there are conditions that must be fulfilled for each method to be used, for example the observations must be independent, normally distributed and so on. So, it is important to know what the requirements are for the methods to be used and to make sure that the conditions are met, otherwise you cannot trust the results. If the conditions are not met, there are often ways to transform your data so that it meets the required conditions. And if this is not possible, there are other methods to use - methods with other conditions or less strict conditions, Prof. Axelson-Fisk says.
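To illustrate what transforming the data can look like, the sketch below computes the sample skewness of some strongly right-skewed values before and after a logarithmic transform. Both the data and the choice of a log transform are assumptions for the example. Skewness is only a rough indicator of asymmetry, not a formal test of normality, and a log transform only applies to positive values.

```python
import math
from statistics import mean, pstdev


def skewness(values):
    """Sample skewness: the mean of the cubed standardized deviations.

    Roughly 0 for symmetric data, positive for a long right tail.
    """
    center = mean(values)
    spread = pstdev(values)
    return mean([((v - center) / spread) ** 3 for v in values])


# Made-up, strongly right-skewed positive measurements.
raw = [1, 2, 2, 3, 3, 4, 5, 8, 13, 40]
logged = [math.log(v) for v in raw]

skew_raw = skewness(raw)
skew_logged = skewness(logged)
print(f"skewness before: {skew_raw:.2f}, after log transform: {skew_logged:.2f}")
```

If a transform like this does not bring the data close enough to what a method requires, the alternative mentioned above is to switch to a method with less strict conditions.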
The most common thing to do in inferential analysis is to test different types of hypotheses, Prof. Axelson-Fisk continues. You may want to compare two populations with each other, so you draw samples from each population and then you test if there is a difference between them. You can of course have more than two populations, in which case you can use an ANOVA, or analysis of variance, test for instance. Or you want to test relationships, so you might compute a correlation between variables. Another scenario is if you have a causal relationship where you have one variable depending on another. Then you can do regression, which is like a correlation, but where you actually have the direction of the dependency.
If you have time dependencies, you have longitudinal data that you have measured over time. For instance, I had a master's thesis project on air pollution from cars. The data was collected from various measurement stations in Gothenburg, and these stations measure every ten seconds or something like that. In this situation, there will be a time dependence, meaning that the measurement at one instant depends on the value of the previous one. That is, if you had a high value ten seconds before, it's more likely to be high this time as well. This must be considered, and we have something called time series analysis, which models a linear relationship similar to regression, but where you are looking for trends over time and where you take into account that you have dependencies between measurements.
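Two of these ideas can be sketched with the standard library. The first function compares two samples with a permutation test. This technique is not named in the interview; it is chosen here because it needs no external packages and makes few distributional assumptions. The second function computes the lag-1 autocorrelation, a simple measure of how strongly each measurement depends on the previous one. All data values are invented for illustration.

```python
import random
from statistics import mean


def permutation_test(a, b, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns an approximate p-value: the share of random relabelings whose
    difference in group means is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Adding 1 to both counts keeps the estimate away from exactly zero.
    return (hits + 1) / (n_permutations + 1)


def lag1_autocorrelation(series):
    """Correlation between each measurement and the one just before it."""
    center = mean(series)
    numerator = sum(
        (series[i] - center) * (series[i - 1] - center)
        for i in range(1, len(series))
    )
    denominator = sum((x - center) ** 2 for x in series)
    return numerator / denominator


# Made-up samples from two populations.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.5, 5.3, 5.9]
group_b = [4.2, 4.5, 4.0, 4.8, 4.4, 4.1, 4.6, 4.3]
p_value = permutation_test(group_a, group_b)
print("approximate p-value:", p_value)

# Made-up slowly varying series, mimicking frequent repeated measurements.
series = [10, 11, 12, 13, 14, 15, 14, 13, 12, 11, 10]
r1 = lag1_autocorrelation(series)
print("lag-1 autocorrelation:", round(r1, 2))
```

A clearly positive lag-1 autocorrelation signals that the observations are not independent, which is exactly the situation where methods that assume independence should not be trusted.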
So, there are many different types of hypothesis tests, for example for testing values or averages of a sample against some other population. And then you have many other statistical methods as well, Prof. Axelson-Fisk concludes.
Data analysis basics and how to make the most of the collected data
Listen to the full interview with Prof. Axelson-Fisk to learn more about different data analysis methods and the data analysis process from start to end, including challenges and pitfalls to avoid.
Malin graduated in engineering physics in 2006, where her research focused on the QCM-D technology. Since then, she has been scrutinizing the how’s and why’s of the world in general, and the world of QCM-D in particular.