If you spend a lot of time planning, executing, and evaluating experiments, time efficiency may be important. A method that could help you both save time and resources and give you as accurate results as possible is Design of Experiments, DoE. We were curious to learn more about this method, and had the privilege to talk to Prof. Marina Axelson-Fisk, Professor in Mathematical Statistics at Chalmers University of Technology. In the interview, Prof. Axelson-Fisk shared her knowledge on how to use DoE to efficiently plan your work and to make the most of the time you spend in the lab.
In brief, design of experiments is a way of making the most of your experiments by planning them from beginning to end, Prof. Axelson-Fisk says. It covers everything from data collection and sample sizes to defining the population, getting an idea of the measurement errors, and choosing the analysis methods. With the DoE method, you plan every step of the way before you even begin to collect your data. You can say that it is an umbrella strategy that can help you find answers in an efficient way, Prof. Axelson-Fisk says.
Design of Experiments helps you save time and resources. It makes the analysis more efficient and accurate, and it also makes the data collection more efficient, Prof. Axelson-Fisk says. Additionally, it is an efficient strategy to help you find the answers that you are looking for. If you do not take care, you may actually miss true phenomena. For example, when you measure something to see a relation between variables, there may be other variables that influence your measurements and hide the effects that you are looking for. So, your hypothesis may be true, but you cannot see it because you have not been careful when planning the experiment, Prof. Axelson-Fisk says.
Before you start, you must figure out what factors you believe influence the result, Prof. Axelson-Fisk says. If there are external factors other than those you are interested in, and you do not include these, you may not see the effects of the variable that you are interested in, she explains. Say, for example, that you want to know where the differences in blood pressure in humans come from, and whether or not blood pressure is affected by the weight of the individual. If you only build a model that contains weight plus or minus some error, where the error contains everything else, you may not see the effect of weight because the variation of the error is so big. However, if you include other factors, like gender and age, so that you separate the sources of variation, then you may see that weight is significant. If you disregard all the other factors, your effect may drown in the large variation of these other factors, so to speak, Prof. Axelson-Fisk says.
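The blood pressure example can be written as two competing models. This is only a sketch: the linear form, the symbols, and the choice of gender and age as the extra factors are assumptions made for illustration, since the interview only speaks of "weight plus or minus some error".

```latex
% Model 1: weight only; everything else ends up in the error term
\mathrm{BP}_i = \beta_0 + \beta_1\,\mathrm{weight}_i + \varepsilon_i

% Model 2: other sources of variation are modelled explicitly
\mathrm{BP}_i = \beta_0 + \beta_1\,\mathrm{weight}_i
              + \beta_2\,\mathrm{gender}_i + \beta_3\,\mathrm{age}_i + \varepsilon_i'

% If gender and age explain part of the variation, then typically
\operatorname{Var}(\varepsilon_i') < \operatorname{Var}(\varepsilon_i)
```

In both models the question is whether the weight coefficient differs from zero. With a smaller residual variance, as in the second model, the same weight effect is easier to distinguish from noise.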
The common answer from a statistician is always “it depends”, Prof. Axelson-Fisk says with a smile. How many factors to take into account depends on how large a sample you take, how big the variation is, and how precise you need your results to be. If you take a large enough sample, then you can include as many factors as you want. Typically, you would have two or three factors in the same model. Or you can have a so-called factorial design, which is common, for example, when you want to optimize a process and figure out the optimal settings of various factors. As a first approach, it is common to include a large number of factors, but only at two levels, i.e., two different settings of each factor, to identify which factors influence your process the most. The next step is then to determine the optimal settings of these factors. But in short, the number of factors to include depends on how large a sample you can afford, or how big the variation is in the process, Prof. Axelson-Fisk explains.
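To make the two-level screening idea concrete, here is a minimal Python sketch that lists every run of a full two-level factorial design. The factor names "A", "B" and "C" and the coding of the low and high settings as -1 and +1 are assumptions made for illustration, not something stated in the interview.

```python
from itertools import product

def two_level_design(factors):
    """Return one dict per run, covering every low/high combination.

    Each factor is coded as -1 (low setting) or +1 (high setting).
    """
    return [
        dict(zip(factors, levels))
        for levels in product((-1, +1), repeat=len(factors))
    ]

# Hypothetical factor names, used only for illustration.
factors = ["A", "B", "C"]
design = two_level_design(factors)

for run in design:
    print(run)

# Three factors at two levels each give 2**3 runs.
print(len(design))  # 8
```

The number of runs doubles with every added factor, which is one reason why the number of factors you can include depends on how large a sample you can afford.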
The term ‘sample’ has a very specific meaning for a statistician, Prof. Axelson-Fisk says. The goal is to draw a conclusion about the population as a whole with respect to some feature. Say, for example, that you want to know what the variation of blood pressure is in the human population, and how much of it depends on weight. Usually, you cannot measure the entire population. If you could, then you would not need statistics, because you would already have the answer. In situations where you cannot measure the entire population, you must draw a small sample. This sample should then be as good a representation of the entire population as possible.
In statistics, a sample means that the observations are independent and come from the same distribution, i.e., we draw individuals randomly, independently of each other, and from the same population. It is very important that the sample mirrors the different sources of variation in the population. If there are different subpopulations, resulting in different distributions of the feature of interest, this needs to be taken into account in the sample. For instance, if the basic levels of blood pressure differ between the genders, the sample needs to include the genders in the same proportions as in the population. You need to make sure that the proportions of the different subpopulations in your sample mirror the proportions in the population. If you have inhomogeneities in your population, they must be represented. Basically, the sample must be a good representation of the variation of the population as a whole. Otherwise, your conclusions from the sample will only be true for that specific sample and cannot be extrapolated to the population, Prof. Axelson-Fisk says.
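One way to make the proportions in the sample mirror those in the population is proportional stratified sampling. The sketch below is an illustration under stated assumptions: the population is made up (600 individuals in group "A" and 400 in group "B"), and the function name `proportional_stratified_sample` is hypothetical.

```python
import random
from collections import Counter

def proportional_stratified_sample(population, stratum_of, n, seed=None):
    """Draw about n units so that each stratum keeps its population share.

    stratum_of maps a unit to the subpopulation it belongs to. Rounding
    the per-stratum counts can make the total differ slightly from n.
    """
    rng = random.Random(seed)
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)

    sample = []
    for units in strata.values():
        k = round(n * len(units) / len(population))
        # Random draw without replacement inside the stratum.
        sample.extend(rng.sample(units, k))
    return sample

# Made-up population of (id, group) pairs: 60 % group "A", 40 % group "B".
population = [(i, "A") for i in range(600)] + [(i, "B") for i in range(600, 1000)]

sample = proportional_stratified_sample(population, lambda unit: unit[1], n=50, seed=1)
counts = Counter(group for _, group in sample)
print(counts["A"], counts["B"])  # 30 20
```

Within each group the individuals are still drawn at random, so the sample keeps the 60/40 split without hand-picking anyone.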
Everyone who is doing experiments and who wants to draw proper conclusions from these experiments benefits from using design of experiments, Prof. Axelson-Fisk says. In its simplest form, it simply means that you think your experiment through: what you want to achieve with it and the best way to go about it. Note that there is a difference between exploratory and explanatory analysis. If you are new to a field, or if you are going in a new direction, then perhaps you do not know what hypothesis to test or what relations between variables there are. In this situation, you may collect data without any preconditions and just look for patterns. You do not do any statistical inference on that data, but rather use descriptive statistics to identify patterns and trends in the data. But once you have found some interesting patterns, you cannot do the hypothesis testing on the same data set, since you do not know if the pattern you found is only a random event in your data or if it holds for the whole population. So, you may begin with an exploratory analysis to find potential patterns, hypotheses, or relations. Then, you can do an explanatory analysis. You draw a new sample from the population and test those hypotheses on it, to see if the pattern is in fact present in the population at large and not just some unlucky choice of data, Prof. Axelson-Fisk says.
In brief, you typically begin by having some hypothesis that you would like to test, Prof. Axelson-Fisk says. You want to see if there is a relation between something that you measure and some variables. You decide to test this, so you want to draw a sample. Now, you must determine which population you want to test this on, and what that population looks like. Are there any inhomogeneities? Is it difficult to sample from the entire population? And so on. There is a whole theory on how to select a representative sample. How large a sample you need depends on how big the variation is in the population, and how large an error margin you can allow. For instance, if you want to claim that a medicine has an effect, you may want to be much more certain than if you test something less critical. The more certain you want to be, the larger the sample you should have. So, based on how big a margin you can allow and how big the variance is, you decide on the sample size. There are of course methods for computing this as well, Prof. Axelson-Fisk says. Next, depending on what question you are asking, you must choose an analysis method and a model. Along with the model come conditions that must be fulfilled. For example, common conditions are that the data must be normally distributed, that it should be independently sampled, that the variance must be homogeneous over your sample, and so on, so you need to make sure that your data fulfills the conditions. If the data does not fulfill the conditions, you must either rethink how you sample your data, check whether your data can be transformed to meet the conditions, or choose another method. Then, given the method you have chosen to analyze your data, there are lots of tools to help you do the analysis and draw the conclusions. These are the main steps, Prof. Axelson-Fisk says.
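As an illustration of how the sample size follows from the variation and the error margin, here is a minimal Python sketch of one textbook formula, n = (z·σ / E)², for estimating a mean to within a margin E. It assumes approximately normal data and a standard deviation σ known from earlier work; the numbers are made up, and other questions or models call for other formulas.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(sigma, margin, confidence=0.95):
    """Smallest n for which the confidence interval half-width is <= margin.

    Assumes a known standard deviation sigma and approximately normal data.
    """
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / margin) ** 2)

# Made-up numbers: standard deviation 10 units, allowed margin of 2 units.
print(sample_size_for_mean(sigma=10, margin=2, confidence=0.95))  # 97
print(sample_size_for_mean(sigma=10, margin=2, confidence=0.99))  # 166
```

The output shows the trade-off described above: asking for 99 % instead of 95 % confidence, with the same variation and margin, raises the required sample from 97 to 166.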
Depending on what you want to achieve, there are various types of statistical analysis methods to use. For instance, if you want to compare two populations to see if there is a difference between them, between actual individuals or between methods etc., then you do a so-called two-sample t-test, Prof. Axelson-Fisk says. If you have more than two populations, subpopulations, or groups that you want to compare, then you have analysis of variance (ANOVA). Analysis of variance is an extension of the two-sample t-test to more than two populations. If you want to explicitly determine the relationship between your measurements and some factors, then you may use, for example, linear regression, or other forms of regression. In regression, you fit a model to your data. That model can then be used to test if the factors have an influence on the response, or to predict what response you would get if you made new measurements of your factors, Prof. Axelson-Fisk explains.
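For the simplest case, comparing two groups, the test statistic of a two-sample t-test can be computed with the Python standard library alone. The sketch below uses the pooled-variance (Student) version, which assumes equal variances in the two groups. The measurement values and the names `method_a` and `method_b` are made up for illustration.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t_statistic(a, b):
    """Return (t, degrees of freedom) for a pooled two-sample t-test."""
    n1, n2 = len(a), len(b)
    # Weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / standard_error, n1 + n2 - 2

# Made-up measurements from two hypothetical methods.
method_a = [10.1, 9.8, 10.4, 10.0, 9.9]
method_b = [10.6, 10.9, 10.5, 11.0, 10.7]

t, df = pooled_t_statistic(method_a, method_b)
print(f"t = {t:.2f} with {df} degrees of freedom")
```

To turn the statistic into a p-value, you would compare it with a t-distribution with the given degrees of freedom, using a table or a statistics package.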
In a situation where you would like to optimize the surface uptake of a molecule as a function of, for example, pH and salt concentration, you may use factorial design, where you split your design into factors, Prof. Axelson-Fisk says. You have your measurements, which we call the response, and you have your different factors, in this case pH and salt concentration, that you believe influence the response. If you want to optimize the molecule uptake at the surface, i.e., identify where you get the highest or the lowest response, then you want to find the optimal settings of the two factors. Typically, you determine levels, i.e., fixed values, of these factors. The most common, and most efficient, factorial designs have two levels of each factor, but you could have more. You then randomize the experiment, so you run the experiment on each combination of the factor settings in random order. If you have pH level and salt level, then you choose two pH levels and two salt levels. In the lab you will then have four different combinations of factors, and you will measure your response under these four conditions, Prof. Axelson-Fisk says.
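The randomized run order for the pH and salt example can be sketched in a few lines of Python. The level values (pH 5.0 and 8.0, salt 10 and 150 mM), the `replicates` parameter and the function name are assumptions made for illustration.

```python
import random
from itertools import product

def randomized_runs(ph_levels, salt_levels, replicates=1, seed=None):
    """Return every pH/salt combination, repeated and shuffled into run order."""
    runs = list(product(ph_levels, salt_levels)) * replicates
    # Passing a seed makes the run sheet reproducible.
    random.Random(seed).shuffle(runs)
    return runs

# Hypothetical levels: two pH values and two salt concentrations (mM).
ph_levels = (5.0, 8.0)
salt_levels = (10, 150)

runs = randomized_runs(ph_levels, salt_levels, seed=42)
for number, (ph, salt) in enumerate(runs, start=1):
    print(f"run {number}: pH {ph}, salt {salt} mM")
```

Running the four combinations in random order helps keep drifts over the lab session, such as temperature or reagent ageing, from lining up with one of the factors.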
If you do not have any prior knowledge of the system, say you want to maximize the molecular uptake and you do not know if you need a high salt concentration or a low one, then you must guess which levels to choose, but they should be reasonably far apart, Prof. Axelson-Fisk says. When you begin your experiment, you may, for example, see that a high salt concentration results in a larger molecular uptake than a low concentration. Then you know that you should go for the higher one. Next, you may choose two new levels, or you may use the so-called response surface, a plot of the result as a function of the factors, to guide you in which direction the molecular uptake increases. Now, you will try new levels of your factors, and maybe eventually you will identify the range of values within which you have the maximum molecular uptake. Then you can take smaller steps, i.e., choose levels more densely, to pinpoint the actual optimum, Prof. Axelson-Fisk says.
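One common way to let the first four measurements guide the next choice of levels is to estimate the main effect of each factor and move in the direction of steepest ascent. The sketch below illustrates that technique, which is not necessarily the procedure Prof. Axelson-Fisk has in mind. The response values are made up, and the factors are coded as -1 (low level) and +1 (high level).

```python
from math import hypot

def main_effect(responses, index):
    """Mean response at the high level minus mean response at the low level."""
    high = [y for levels, y in responses.items() if levels[index] == +1]
    low = [y for levels, y in responses.items() if levels[index] == -1]
    return sum(high) / len(high) - sum(low) / len(low)

# Made-up uptake values, keyed by (coded pH level, coded salt level).
responses = {
    (-1, -1): 10.0,
    (+1, -1): 14.0,
    (-1, +1): 20.0,
    (+1, +1): 24.0,
}

ph_effect = main_effect(responses, 0)
salt_effect = main_effect(responses, 1)
print(ph_effect, salt_effect)  # 4.0 10.0

# In coded units the fitted slope is half the effect, because the low and
# high levels are two units apart. The slopes point uphill on the response.
slopes = (ph_effect / 2, salt_effect / 2)
length = hypot(*slopes)
direction = tuple(slope / length for slope in slopes)
print(direction)
```

With these made-up numbers, salt has the larger effect, so the next levels would be moved mostly towards higher salt and somewhat towards higher pH.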
Listen to the interview with Prof. Axelson-Fisk to learn more about how to use design of experiments to efficiently plan your work and to make the most of the time spent in the lab. In the conversation we also talk about the challenges and difficulties that may arise when using DoE, and what pitfalls to look out for.
Malin graduated in engineering physics in 2006, where her research focused on the QCM-D technology. Since then, she has been scrutinizing the how’s and why’s of the world in general, and the world of QCM-D in particular.