Thursday 12 March 2020

Do p Values Lose their Meaning in Exploratory Analyses?

In Rubin (2017), I consider the idea that p values lose their meaning (become invalid) in exploratory analyses (i.e., non-preregistered analyses). I argue that this view is correct if researchers aim to control a familywise error rate that includes all of the hypotheses that they have tested, or could have tested, in their study (i.e., a universal, experimentwise, or studywise error rate). In this case, it is not possible to compute the required familywise error rate because the number of post hoc hypotheses that have been tested, or could have been tested, during exploratory analyses in the study is unknown. However, I argue that researchers are rarely interested in a studywise error rate because they are rarely interested in testing the joint studywise hypothesis to which this error rate refers.
For example, imagine that a researcher conducted a study in which they explored the associations between body weight and (1) gender, (2) age, (3) ethnicity, and (4) social class. This researcher is unlikely to be interested in a studywise null hypothesis that can be rejected following a significant result for any of their four tests, because this joint null hypothesis is unlikely to relate to any meaningful theory. Which theory proposes that gender, age, ethnicity, and social class all predict body weight for the same theoretical reason? And, if the researcher is not interested in making a decision about the studywise null hypothesis, then there is no need for them to lower the alpha level (α; the significance threshold) for each of their four tests (e.g., from α = .050 to α = .050/4 or .0125) in order to maintain the Type I error rate for their decision about the studywise hypothesis at α = .050. Instead, the researcher can test each of the four different associations individually (i.e., each at α = .050) in order to make a separate, independent claim about each of four theoretically independent hypotheses (e.g., "male participants weighed more than female participants, p = .021"). By analogy, a woman who takes a pregnancy test does not need to worry about the familywise error rate that either her pregnancy test, her fire alarm, or her email spam filter will yield a false positive result because the associated joint hypothesis is nonsensical.
Sometimes it doesn't make sense to combine different hypotheses as part of the same family!
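The arithmetic behind the body-weight example can be sketched in a few lines of Python. This is only an illustration, not code from Rubin (2017): the function names are hypothetical, and the familywise formula 1 - (1 - α)^m assumes that the four tests are statistically independent and that every null hypothesis is true, which the example itself does not state.

```python
def familywise_error_rate(alpha: float, m: int) -> float:
    """Chance of at least one false positive across m tests.

    Assumption (not from the source): the m tests are statistically
    independent and all m null hypotheses are true.
    """
    return 1 - (1 - alpha) ** m


def bonferroni_alpha(alpha: float, m: int) -> float:
    """Per-test alpha that keeps the familywise rate at or below alpha."""
    return alpha / m


alpha = 0.05  # conventional significance threshold
m = 4         # gender, age, ethnicity, social class

# Familywise rate if all four tests are run at alpha = .05 and a single
# significant result is taken to reject the studywise null hypothesis.
uncorrected = familywise_error_rate(alpha, m)

# Bonferroni-adjusted per-test threshold (.05 / 4) and the familywise
# rate that results from using it.
per_test = bonferroni_alpha(alpha, m)
corrected = familywise_error_rate(per_test, m)

print(f"uncorrected familywise rate: {uncorrected:.4f}")
print(f"per-test alpha after Bonferroni: {per_test:.4f}")
print(f"corrected familywise rate: {corrected:.4f}")
```

The adjustment only matters if the researcher actually wants to make a claim about the studywise null hypothesis. If each association is a separate claim, each test is simply evaluated at α = .050.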
Researchers should only be concerned about the familywise error rate of a set of tests when that set refers to the same theoretically meaningful joint hypothesis. For example, a researcher who undertakes exploratory analyses should be concerned about the familywise error rate for the hypothesis that men weigh more than women if they use four different measures of weight, and they are prepared to accept a single significant difference on any of those four measures as grounds for rejecting the associated joint null hypothesis. In this case, they should reduce their alpha level for each constituent test (e.g., to α/4) in order to maintain their nominal Type I error rate for the joint hypothesis at α. Based on this reasoning, I argue that p values do not lose their meaning in exploratory analyses because (a) researchers are not usually interested in the studywise error rate, and (b) they are able to transparently and verifiably specify and control the familywise error rates for any theoretically meaningful post hoc joint hypotheses about which they make claims.
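As a rough sketch of the decision rule for a theoretically meaningful joint hypothesis, the following Python snippet rejects the joint null hypothesis when any constituent test is significant at α/m. The function name and the p values are hypothetical and chosen purely for illustration; they are not data from any study.

```python
from typing import Sequence, Tuple


def reject_joint_null(p_values: Sequence[float], alpha: float = 0.05) -> Tuple[bool, float]:
    """Decide on a joint null hypothesis using a Bonferroni adjustment.

    The joint null is rejected if at least one constituent p value is at
    or below alpha / m, where m is the number of constituent tests.
    Returns the decision together with the per-test threshold.
    """
    m = len(p_values)
    if m == 0:
        raise ValueError("at least one p value is required")
    threshold = alpha / m
    return any(p <= threshold for p in p_values), threshold


# Hypothetical p values for the gender difference on four weight measures.
weak_evidence = [0.021, 0.200, 0.048, 0.310]
strong_evidence = [0.004, 0.200, 0.048, 0.310]

rejected_weak, threshold = reject_joint_null(weak_evidence)
rejected_strong, _ = reject_joint_null(strong_evidence)

print(f"per-test threshold: {threshold:.4f}")
print(f"reject joint null (first set): {rejected_weak}")
print(f"reject joint null (second set): {rejected_strong}")
```

In the first set, p = .021 would count as significant for a single stand-alone test at α = .050, but it does not fall below the adjusted threshold of .0125, so the joint null hypothesis is retained.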
I also recommend that researchers undertake a few basic open science practices during exploratory analyses in order to alleviate concerns about potential p-hacking: (1) List all of the variables in the research study. (2) Undertake a sensitivity analysis to demonstrate that the research results are robust to alternative analytical approaches. (3) Make the research data and materials publicly available to allow readers to check whether the results for any relevant measures have been omitted from the research report.

For further information, please see:
Rubin, M. (2017). Do p values lose their meaning in exploratory analyses? It depends how you define the familywise error rate. Review of General Psychology, 21, 269-275.