Background and Problem Identification
Most social and educational programs are evaluated in one way or another. In this post I would like to focus on repeated measures of the same group of participants or individuals, as opposed to comparisons or tests between different groups.
On many occasions we want to learn what impact an intervention has on attitudes, perceptions, and behavior. To do so, we want to isolate the effect of the specific intervention (the program) and see how it changed those attitudes, perceptions, behaviors, or a specific situation, in order to infer whether the intervention was effective.
Many of us will run a paired t-test or a repeated-measures test. Another common approach is a linear regression model, which tries to predict the change in the dependent variable from a set of control variables. However, here comes the “catch”:
“Regression to the mean (RTM) is a statistical phenomenon that can make natural variation in repeated data look like real change. It happens when unusually large or small measurements tend to be followed by measurements that are closer to the mean.”
(Barnett et al., 2005)
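RTM can occur whether we measure an individual or a group. It is driven by random error: the within-subject variance (how much one person's scores fluctuate between measurements) relative to the between-subject variance (how much people's true scores differ from each other).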
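To see how within- and between-subject variance produce RTM, here is a minimal Python sketch built on assumed, hypothetical numbers. True scores vary between people with SD 8 around a mean of 50. Each measurement adds independent random error with SD 6. People are selected because their first score is above 60. Nothing happens between the two measurements, yet the selected group's average drops. `expected_rtm` is the closed-form expected drop under this normal model: the expected measurement error given that the first score exceeded the cut-off. `simulate_rtm` checks it by Monte Carlo. Both function names and all parameter values are illustrative, not taken from the cited studies.

```python
import math
import random
import statistics


def normal_pdf(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def expected_rtm(mu: float, sigma_b: float, sigma_w: float, cutoff: float) -> float:
    """Expected drop from the first to the second measurement for people
    selected because their first measurement was above `cutoff`
    (normal true scores and errors, no intervention in between)."""
    sigma = math.sqrt(sigma_b ** 2 + sigma_w ** 2)   # total SD of one measurement
    z = (cutoff - mu) / sigma
    c = normal_pdf(z) / (1 - normal_cdf(z))          # mean of a normal tail beyond z
    return sigma_w ** 2 / sigma * c


def simulate_rtm(mu=50.0, sigma_b=8.0, sigma_w=6.0, cutoff=60.0, n=100_000, seed=1):
    """Monte Carlo check: mean drop among people whose first score exceeded `cutoff`."""
    rng = random.Random(seed)
    drops = []
    for _ in range(n):
        true_score = rng.gauss(mu, sigma_b)          # between-subject variation
        first = true_score + rng.gauss(0, sigma_w)   # within-subject random error
        second = true_score + rng.gauss(0, sigma_w)  # same person, nothing changed
        if first > cutoff:
            drops.append(first - second)
    return statistics.mean(drops)


print(round(expected_rtm(50, 8, 6, 60), 2))  # analytic: about 5.49 points
print(round(simulate_rtm(), 2))               # simulated: close to the analytic value
```

Under this model, the spurious drop grows as the within-subject SD grows relative to the between-subject SD. You can pass `sigma_w / math.sqrt(k)` instead of `sigma_w`. That gives the expected drop when people are selected on the average of k baseline measurements and compared with one follow-up, and the drop is smaller. This is one way to see why repeated baselines help.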
A related problem is captured by the concept of “a standard error of measurement (SEM), which refers to the standard deviation of an individual’s observed scores from repeated administrations of a test (or parallel forms of a test) under identical conditions”.
(Koizumi et al., 2015)
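Classical test theory links the SEM to a test's reliability: SEM = SD * sqrt(1 - reliability). Koizumi et al. (2015) also discuss the standard error of the difference between two scores. A common rule of thumb, similar to the reliable change index, treats an individual's change as distinguishable from measurement error only if it exceeds about 1.96 of those standard errors. The sketch below assumes that the errors at pretest and posttest are independent. The SD, the reliability values, and the function names are hypothetical.

```python
import math


def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement (classical test theory): SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)


def se_difference(sem_pre: float, sem_post: float) -> float:
    """Standard error of the difference between two observed scores,
    assuming the two measurement errors are independent."""
    return math.sqrt(sem_pre ** 2 + sem_post ** 2)


def is_reliable_change(pre: float, post: float, sd: float, reliability: float,
                       z_crit: float = 1.96) -> bool:
    """True if |post - pre| exceeds z_crit standard errors of the difference."""
    s = sem(sd, reliability)
    return abs(post - pre) > z_crit * se_difference(s, s)


# Hypothetical test: SD = 10 points, reliability = 0.80 at both time points.
print(round(sem(10, 0.80), 2))                                # 4.47
print(round(se_difference(sem(10, 0.80), sem(10, 0.80)), 2))  # 6.32
print(is_reliable_change(50, 58, sd=10, reliability=0.80))    # False: 8 < ~12.4
print(is_reliable_change(50, 63, sd=10, reliability=0.80))    # True: 13 > ~12.4
```

With a fairly reliable test (0.80), an individual gain of 8 points is still within the range that measurement error alone could produce.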
The problem: changes in the data sometimes DO NOT reflect real change, but merely the correction of an earlier random error.
In other words, we jump too quickly from correlation to causation, without carefully checking whether the relationship really is causal!
Indeed, research on these measurement errors in social and educational settings shows that many apparent changes can be accounted for by RTM or SEM and do not reflect real change (Marsden and Torgerson, 2012; Koizumi et al., 2015).
Solutions and Food for Thought:
Be careful when you aim to predict something. Do not assume a vacuum. On the contrary, plan the study carefully and take into account alternative explanations and different routes of interpretation. There is some good advice on how to reduce the chance that your study’s results will be affected by natural errors such as RTM:
- assign participants randomly to all groups
- make sure groups are the same size
- always include a control group
- control for alternative variables
- use tools with high reliability
- control for background variables and context
Data Collection and Analysis:
- conduct more than one pretest
- collect two or more baseline measurements
- control for baseline differences by adding the baseline (pretest) scores to the model, either in a regression or in an ANCOVA
(Koizumi et al., 2015; Bonate, 2000; Marsden and Torgerson, 2012)
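One common way to control for the baseline is an ANCOVA with the pretest score as a covariate, compared against a randomized control group. The minimal sketch below uses hypothetical numbers. Only low pretest scorers enroll, as in many remedial programs. They are then randomly split into program and control groups, and the program adds a true effect of 3 points. The naive pre-post gain in the program group mixes that effect with RTM. The ANCOVA estimate is the coefficient on group from an ordinary least squares fit of posttest on group and pretest. It is computed here in plain Python with the pooled within-group slope, which gives the same coefficient. It assumes the pretest-posttest slope is the same in both groups, the usual ANCOVA assumption. The function names and parameters are illustrative. In practice a statistics package would be used instead.

```python
import random
import statistics


def simulate_program(n=20_000, effect=3.0, cutoff=45.0, seed=7):
    """Hypothetical trial: only low pretest scorers enroll, then are randomized."""
    rng = random.Random(seed)
    program, control = [], []
    for _ in range(n):
        true_score = rng.gauss(50, 8)                 # between-subject variation
        pre = true_score + rng.gauss(0, 6)            # pretest with random error
        if pre >= cutoff:
            continue                                  # enroll low scorers only
        treated = rng.random() < 0.5                  # random assignment
        post = true_score + rng.gauss(0, 6) + (effect if treated else 0.0)
        (program if treated else control).append((pre, post))
    return program, control


def ancova_effect(program, control):
    """Group coefficient from OLS of post on (group, pre), computed via the
    pooled within-group slope of post on pre."""
    def sums(rows):
        mean_pre = statistics.mean(pre for pre, _ in rows)
        mean_post = statistics.mean(post for _, post in rows)
        sxy = sum((pre - mean_pre) * (post - mean_post) for pre, post in rows)
        sxx = sum((pre - mean_pre) ** 2 for pre, _ in rows)
        return mean_pre, mean_post, sxy, sxx

    pre_t, post_t, sxy_t, sxx_t = sums(program)
    pre_c, post_c, sxy_c, sxx_c = sums(control)
    slope = (sxy_t + sxy_c) / (sxx_t + sxx_c)         # common pre -> post slope
    return (post_t - post_c) - slope * (pre_t - pre_c)


program, control = simulate_program()
naive_gain = statistics.mean(post - pre for pre, post in program)
print(round(naive_gain, 1))                       # inflated: true effect plus RTM
print(round(ancova_effect(program, control), 1))  # close to the simulated effect of 3.0
```

Because assignment is random, a plain comparison of posttest means would also be unbiased here. Adjusting for the pretest usually makes the estimate more precise. If you set `effect=0.0`, the naive gain stays clearly positive while the ANCOVA estimate stays near zero.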
Implications for Program Evaluation
Many social and educational programs seek to change attitudes or perceptions and to help participants gain knowledge in certain areas (such as financial literacy or a second language).
Evaluations of these programs usually focus on measuring perceptions using a before-after design. Most of the time, RTM is not taken into account, and therefore the interpretation of program impact may be wrong. Needless to say, designs without a “before” measurement are worth NOTHING for explaining program impact or change. A second aspect to emphasize is the presence of a control group. It is often very difficult to recruit a group of participants just for the sake of evaluation; however, keep in mind that if you do not, you will never be able to correctly assess either a baseline or a change in your study group.
In short: be cautious, and plan and conduct evaluations carefully, bearing in mind that a change in attitudes, perceptions, behavior, or knowledge can have a variety of explanations, some of which may differ from the intervention you are evaluating.
Subscribe if you liked this post (:
…and feel free to contact me regarding program evaluation consulting projects
Barnett, A. G., Van der Pols, L., Dobson, A. (2005). Regression to the mean: what it is and how to deal with it. International Journal of Epidemiology, 34(1), 215–220.
Bonate, P. L. (2000). Analysis of pretest–posttest designs. Boca Raton, FL.
Guisasola, J., Solbes, J., José-Ignacio, B., Maite, M., Antonio, M. (2009). Students’ understanding of the special theory of relativity and design for a guided visit to a science museum. International Journal of Science Education, 31(15), 2085–2104.
Koizumi, R., In’nami, Y., Azuma, J., Asano, K., Agawa, T., Eberl, D. (2015). Assessing L2 proficiency growth: Considering regression to the mean and the standard error of difference. Shiken, 19(1).
Marsden, E., Torgerson, C. J. (2012). Single group, pre- and post-test research designs: Some methodological concerns. Oxford Review of Education, 38, 583–616.