Wednesday, 11 March 2020

The Costs of HARKing: Does it Matter if Researchers Engage in Undisclosed Hypothesizing After the Results are Known?

While no-one's looking, a Texas sharpshooter fires his gun at a barn wall, walks up to his bullet holes, and paints targets around them. When his friends arrive, he points at the targets and claims he’s a good shot (de Groot, 2014; Rubin, 2017b). In 1998, Norbert Kerr discussed an analogous situation in which researchers engage in undisclosed hypothesizing after the results are known, or HARKing. In this case, researchers conduct statistical tests, observe their results (bullet holes), and then construct post hoc hypotheses (paint targets) to fit these results. In their research reports, they then pretend that their post hoc hypotheses are actually a priori hypotheses. This questionable research practice is thought to have contributed to the replication crisis in science (e.g., Shrout & Rodgers, 2018), and it provides part of the rationale for researchers to publicly preregister their hypotheses before they conduct their analyses (Wagenmakers et al., 2012). In a recent BJPS article (Rubin, 2019), I discuss the concept of HARKing from a philosophical standpoint and then undertake a critical analysis of Kerr’s 12 potential costs of HARKing.



Source: Dirk-Jan Hoek. https://www.flickr.com/photos/23868780@N00/7374874302


I begin my article by noting that scientists do not make absolute, dichotomous judgements about theories and hypotheses being “true” or “false.” Instead, they make relative judgements about theories and hypotheses being more or less true than other theories and hypotheses in accounting for certain phenomena. These judgements can be described as estimates of relative verisimilitude (Cevolani & Festa, 2018).

I then note that a HARKer is obliged to provide a theoretical rationale for their secretly post hoc hypothesis in the Introduction section of their research report. Despite being secretly post hoc, this theoretical rationale provides a result-independent basis for an initial estimate of the relative verisimilitude of the HARKed hypothesis. (The rationale is "result-independent" because it doesn't formally refer to the current result. If it did, then the rationale's post hoc status would no longer be a secret!) The current result can then provide a second, epistemically independent basis for adjusting this initial estimate of verisimilitude upwards or downwards (for a similar view, see Lewandowsky, 2019; Oberauer & Lewandowsky, 2019). Hence, readers can estimate the relative verisimilitude of a HARKed hypothesis (a) without taking the current result into account and (b) after taking the current result into account, even if they have been misled about when the researcher deduced the hypothesis. Consequently, readers can undertake a valid updating of the estimated relative verisimilitude of the hypothesis even though, unbeknownst to them, it has been HARKed. Importantly, there's no “double-counting” (Mayo, 2008), “circular reasoning” (Nosek et al., 2018), or violation of the use-novelty principle here (Worrall, 1985, 2014), because the current result has not been used in the formal theoretical rationale for the HARKed hypothesis. It's therefore legitimate to use the current result to change (increase or decrease) the initial estimate of the relative verisimilitude of that hypothesis.

To translate this reasoning to the Texas sharpshooter analogy, it's necessary to distinguish HARKing from p-hacking. If our sharpshooter painted a new target around his stray bullet hole but retained his substantive claim that he's “a good shot,” then he'd be similar to a researcher who conducted multiple statistical tests and then selectively reported only those results that supported their original a priori substantive hypothesis. Frequentist researchers would call this researcher a “p-hacker” rather than a HARKer (Rubin, 2017a, p. 325; Simmons et al., 2011). To be a HARKer, researchers must also change their original a priori hypothesis or create a totally new one. Hence, a more appropriate analogy is to consider a sharpshooter who changes both their statistical hypothesis (i.e., paints a new target around their stray bullet hole) and their broader substantive hypothesis (their claim). Let's call her Jane!

Jane initially believes “I’m a good shot” (H1). However, after missing the target that she was aiming for (T1), she secretly paints a new target (T2) around her bullet hole and declares to her friends: "I'm a good shot, but I can't adjust for windy conditions. I aimed at T1, but there was a 30 mph easterly cross-wind. So, I knew I'd probably hit T2 instead." In this case, Jane has generated a new, post hoc hypothesis (H2) and passed it off as an a priori hypothesis. Note that, unlike our original Texas sharpshooter, Jane isn't being deceptive about her procedure here (i.e., what she actually did): It's true that she aimed her gun at T1. She's only being deceptive about the a priori status of H2, which she secretly developed after she missed T1 (i.e., she's HARKing). Importantly, however, Jane's deception doesn't prevent her friends from making a valid initial estimate of the verisimilitude of her HARKed hypothesis and then updating this estimate based on the location of her bullet hole:

"We know that Jane's always trained indoors. So, it makes sense that she hasn't learned to adjust for windy conditions. We also know that (a) Jane was aiming at T1, and (b) there was a 30 mph easterly cross-wind. Our calculations show that, if someone was a good shot, and they were aiming at T1, but they didn't adjust for a 30 mph easterly cross-wind, then their bullet would hit T2's location. So, our initial estimated verisimilitude for H2 is relatively high. The evidence shows that Jane's bullet did, in fact, hit T2. Consequently, we can tentatively increase our support for H2: Jane appears to be a good shot who can't adjust for windy conditions. Of course, we'd also want to test H2 again by asking Jane to hit targets on both windy and non-windy days!"

We can predict the location of the sharpshooter's bullet hole on the basis of her (secretly HARKed) hypothesis that she is a good shot but cannot adjust for windy conditions. We can then use the location of the bullet hole to increase or decrease our estimated relative verisimilitude for this prediction. Source: https://pixabay.com/photos/woman-rifle-shoot-gun-weapon-2577104/
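For readers who like a numerical illustration, the friends' two-step reasoning can be loosely mimicked with Bayes' rule. This is only an analogy: the post talks about estimates of relative verisimilitude, not probabilities, and Bayes' rule is a stand-in chosen here for illustration, not the formal account given in the paper. The function name `posterior` and all of the numbers (a result-independent starting estimate of 0.6 for H2, and the chances of a hit at T2 with and without H2 being true) are hypothetical. What the sketch shows is structural: the rationale-based starting estimate and the observed result enter as separate inputs, and nothing in the calculation records when H2 was formulated.

```python
# A probabilistic analogy only: Bayes' rule stands in for the informal notion of
# updating estimated relative verisimilitude. All numbers below are hypothetical.

def posterior(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Return P(H | E) from P(H), P(E | H), and P(E | not-H) via Bayes' rule."""
    joint_h = p_e_given_h * prior
    evidence = joint_h + p_e_given_not_h * (1.0 - prior)
    return joint_h / evidence

# Step 1: a result-independent starting estimate for H2, based only on the
# rationale ("Jane has always trained indoors"); the bullet hole is not used here.
prior_h2 = 0.6

# Step 2: how expected a hit at T2 is if H2 holds versus if it does not.
p_t2_if_h2 = 0.8
p_t2_if_not_h2 = 0.1

# Step 3: use the observed hit at T2 exactly once to update the starting estimate.
updated_h2 = posterior(prior_h2, p_t2_if_h2, p_t2_if_not_h2)

# Nothing above records *when* H2 was formulated. Double-counting would mean
# feeding the same result in a second time, which inflates the estimate:
double_counted_h2 = posterior(updated_h2, p_t2_if_h2, p_t2_if_not_h2)

print(f"initial: {prior_h2:.2f}, updated: {updated_h2:.3f}, "
      f"double-counted: {double_counted_h2:.3f}")
```

With these made-up values, the single update raises the estimate for H2 from 0.6 to about 0.92 (exactly 12/13). Feeding the same bullet hole in a second time pushes it higher still, which is the kind of double-counting that HARKing avoids when the current result never enters the rationale behind the starting estimate.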
The second part of my paper provides a critical analysis of Kerr’s (1998) 12 costs of HARKing. For further information, please see:
Rubin, M. (2022). The costs of HARKing. The British Journal for the Philosophy of Science, 73. https://doi.org/10.1093/bjps/axz050
References
Cevolani, G., & Festa, R. (2018). A partial consequence account of truthlikeness. Synthese. http://dx.doi.org/10.1007/s11229-018-01947-3
de Groot, A. D. (2014). The meaning of “significance” for different types of research (E. J. Wagenmakers, D. Borsboom, J. Verhagen, R. Kievit, M. Bakker, A. Cramer, . . . H. L. J. van der Maas, Trans.). Acta Psychologica, 148, 188–194. http://dx.doi.org/10.1016/j.actpsy.2014.02.001
Kerr, N. L. (1998). HARKing: Hypothesizing after the results are known. Personality and Social Psychology Review, 2, 196-217. http://dx.doi.org/10.1207/s15327957pspr0203_4
Lewandowsky, S. (2019). Avoiding Nimitz Hill with more than a little red book: Summing up #PSprereg. https://featuredcontent.psychonomic.org/avoiding-nimitz-hill-with-more-than-a-little-red-book-summing-up-psprereg/
Mayo, D. G. (2008). How to discount double-counting when it counts: Some clarifications. The British Journal for the Philosophy of Science, 59, 857–879.
Nosek, B. A., Ebersole, C. R., DeHaven, A. C., & Mellor, D. T. (2018). The preregistration revolution. Proceedings of the National Academy of Sciences, 115, 2600-2606. http://dx.doi.org/10.1073/pnas.1708274114
Oberauer, K., & Lewandowsky, S. (2019). Addressing the theory crisis in psychology. Psychonomic Bulletin & Review. http://dx.doi.org/10.3758/s13423-019-01645-2
Rubin, M. (2017a). An evaluation of four solutions to the forking paths problem: Adjusted alpha, preregistration, sensitivity analyses, and abandoning the Neyman-Pearson approach. Review of General Psychology, 21, 321-329. http://dx.doi.org/10.1037/gpr0000135
Rubin, M. (2017b). When does HARKing hurt? Identifying when different types of undisclosed post hoc hypothesizing harm scientific progress. Review of General Psychology, 21, 308-320. http://dx.doi.org/10.1037/gpr0000128
Shrout, P. E., & Rodgers, J. L. (2018). Psychology, science, and knowledge construction: Broadening perspectives from the replication crisis. Annual Review of Psychology, 69, 487-510. http://dx.doi.org/10.1146/annurev-psych-122216-011845
Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22, 1359–1366. http://dx.doi.org/10.1177/0956797611417632
Wagenmakers, E. J., Wetzels, R., Borsboom, D., van der Maas, H. L., & Kievit, R. A. (2012). An agenda for purely confirmatory research. Perspectives on Psychological Science, 7, 632-638. http://dx.doi.org/10.1177/1745691612463078
Worrall, J. (1985). Scientific discovery and theory-confirmation. In J. C. Pitt (Ed.), Change and progress in modern science: Papers related to and arising from the Fourth International Conference on History and Philosophy of Science (pp. 301–331). Dordrecht, the Netherlands: Reidel. http://dx.doi.org/10.1007/978-94-009-6525-6_11
Worrall, J. (2014). Prediction and accommodation revisited. Studies in History and Philosophy of Science, 45, 54–61. http://dx.doi.org/10.1016/j.shpsa.2013.10.001