Week 5: Power and Study Design

Graduate Class: Open and Reproducible Science (syllabus)

This week, we discussed statistical power (the probability of detecting an effect when it in fact exists) and study design. Calibrating power for a given study involves many decisions: the smallest effect size of interest (SESOI), the resources available to the researcher, the study design, and the number of participants to recruit. There are many wonderful resources available to help researchers understand all of this, and we reviewed a subset of them in this class.

In this space I want to focus more specifically on one of the in-class demonstrations regarding low power and overestimating effect sizes. Using the p-hacker Shiny app, we simulated data for an experiment with two conditions containing 20 participants each, across 10 dependent variables (seed set at 108). We assumed that a true effect exists in the population, but that this effect is very small: d = .05. Using the “pwr” package in R, the statistical power of this study is estimated to be .052, meaning there is only a 5.2% chance of obtaining a statistically significant effect. Actually running a study with such low power would of course be a waste of time and resources, but luckily simulating one takes very little time. We did need to run a few “studies” before finding an effect, but the study using seed 108 was a winner. Here are our study “results”:
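The class computed this with R's “pwr” package (`pwr.t.test`). For readers without R, the same number can be reproduced from first principles with SciPy's noncentral t distribution. This is a sketch, assuming a two-sided, two-sample t-test with equal group sizes:

```python
# Reproducing the power estimate from the post (true d = .05, n = 20 per
# condition, alpha = .05) using the noncentral t distribution.
import numpy as np
from scipy import stats

def power_two_sample_t(d, n_per_group, alpha=0.05):
    """Power of a two-sided, two-sample t-test with equal group sizes."""
    df = 2 * n_per_group - 2
    ncp = d * np.sqrt(n_per_group / 2)       # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)  # two-sided critical value
    # probability of landing in either rejection region, given the true effect
    return stats.nct.sf(t_crit, df, ncp) + stats.nct.cdf(-t_crit, df, ncp)

print(round(power_two_sample_t(0.05, 20), 3))  # ~.052, matching pwr's answer
```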


Excellent, we obtained 2 significant effects! With a bit of p-hacking we could likely turn the p-value for the combined DV from .064 (non-significant) to less than .05 (considered significant)*. But for now let’s focus on the results for DV2. The p-value for the F test is .023, a respectable degree of significance one might say. The effect size corresponding to this test statistic (obtained using an online effect size calculator) is d = .751 (95% CI: .109 – 1.39). We have therefore overestimated the true effect size (d = .05) by a massive margin: roughly d = .70. Other simulations with larger true effect sizes and sample sizes continued to show that the true effect size was overestimated, usually by non-trivial amounts. The students were surprised by all of this.
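To make the overestimation concrete, here is a sketch of both steps in Python. Step 1 recovers d from the reported DV2 result the way an online effect size calculator would; the df of 38 is an assumption based on the stated design (2 conditions × 20 participants). Step 2 is a small winner's-curse simulation showing that any significant result from this design must carry an inflated observed effect size.

```python
import numpy as np
from scipy import stats

# Step 1: Cohen's d implied by the reported DV2 result (p = .023).
# df = 38 is assumed from the stated design: two conditions of 20 each.
n1 = n2 = 20
df = n1 + n2 - 2
t = stats.t.ppf(1 - 0.023 / 2, df)   # |t| implied by a two-sided p of .023
d = t * np.sqrt(1 / n1 + 1 / n2)     # convert t to d for independent groups
print(round(d, 2))                   # ~.75, versus a true effect of .05

# Step 2: winner's curse. With n = 20 per group and a true d of .05,
# only wildly inflated observed effects clear the p < .05 bar at all.
rng = np.random.default_rng(108)     # arbitrary seed, echoing the class demo
sig_ds = []
for _ in range(20_000):
    a = rng.normal(0.00, 1.0, n1)    # control
    b = rng.normal(0.05, 1.0, n2)    # treatment: true d = .05
    t_obs, p_obs = stats.ttest_ind(b, a)
    if p_obs < 0.05:
        sig_ds.append(abs(t_obs) * np.sqrt(1 / n1 + 1 / n2))
print(round(float(np.mean(sig_ds)), 2))  # mean observed |d| among "winners"
```

The second print makes the mechanism plain: to reach significance at all with this design, the observed |d| must exceed roughly .64, so the significant “winners” average far above the true .05.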

One goal of this class was to walk students through the decisions that must be made when seriously considering statistical power during the study design phase of research. As part of this goal, I did my best to show them that they *need* to seriously consider statistical power when designing their studies. A goal of the entire course is to convince the students of the importance of openly sharing their decision-making process at all phases of the research.

*Indeed, by removing the highest value in the control condition, the p-value for this interaction drops to .026.
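The footnote's tactic is easy to see on toy numbers. The data below are invented for illustration (not the class's simulated dataset): a single high value in the control group hides the group difference, and deleting that one point flips the test from non-significant to significant.

```python
# Outlier-removal p-hacking on made-up data (NOT the dataset from class):
# one high control value masks the difference; dropping it "finds" an effect.
from scipy import stats

control   = [4.9, 5.1, 5.0, 4.8, 5.2, 5.0, 4.7, 5.3, 5.1, 4.9, 8.0]  # 8.0 is the "inconvenient" point
treatment = [5.5, 5.7, 5.4, 5.6, 5.8, 5.5, 5.3, 5.9, 5.6, 5.4, 5.7]

_, p_before = stats.ttest_ind(treatment, control)
_, p_after  = stats.ttest_ind(treatment, sorted(control)[:-1])  # drop the max

print(p_before > 0.05, p_after < 0.05)  # True True: the "effect" appears only after the deletion
```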
