Wednesday, 11 March 2020

The Costs of HARKing: Does it Matter if Researchers Engage in Undisclosed Hypothesizing After the Results are Known?

While no one is looking, a Texas sharpshooter fires his gun at a barn wall. He then walks up to his bullet holes and paints targets around them. When his friends arrive, he points at the targets and claims that he’s a good shot (de Groot, 2014; Rubin, 2017b). In 1998, Norbert Kerr discussed an analogous situation in which researchers engage in undisclosed hypothesizing after the results are known, or HARKing. In this case, researchers conduct statistical tests, observe their research results (bullet holes), and then construct post hoc predictions (paint targets) to fit these results. In their research reports, they then pretend that their post hoc hypotheses are actually a priori hypotheses. This questionable research practice is thought to have contributed to the replication crisis in science (e.g., Shrout & Rodgers, 2018), and it provides part of the rationale for researchers to publicly preregister their hypotheses ahead of conducting their research (Wagenmakers et al., 2012). In a recent BJPS article, I discuss the concept of HARKing from a philosophical standpoint and then undertake a critical analysis of Kerr’s 12 potential costs of HARKing.

Source: Dirk-Jan Hoek.
I begin my article by arguing that scientists do not make absolute, dichotomous judgements about theories and hypotheses being “true” or “false.” Instead, they make relative judgements about theories and hypotheses being more or less true than other theories and hypotheses in accounting for certain phenomena. Such judgements can be described as estimates of relative verisimilitude (Cevolani & Festa, 2018).
I then note that HARKers are obliged to provide a theoretical rationale for each of their secretly post hoc hypotheses in the Introduction sections of their research reports. Despite being secretly post hoc, this theoretical rationale provides a result-independent basis for an initial estimate of the relative verisimilitude of a hypothesis. The reported research results can then provide a second, epistemically independent basis for adjusting this initial estimate (for a similar view, see Lewandowsky, 2019; Oberauer & Lewandowsky, 2019). Hence, readers can estimate the relative verisimilitude of a hypothesis (a) without taking the current result into account and (b) after taking the current result into account, even if they have been misled about when researchers constructed the hypothesis. Consequently, readers are able to undertake a valid counterfactual updating of their estimated relative verisimilitude of a hypothesis even though HARKing has occurred. Importantly, there is no “double-counting” (Mayo, 2008) or violation of the use novelty principle here (Worrall, 1985, 2014), because the current result contributes new information to an initial estimate of relative verisimilitude that has been generated in a result-independent manner.
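As a rough illustration of this two-step updating (not taken from the article), the sketch below uses Bayesian odds as a stand-in for an estimate of relative verisimilitude. That substitution is an assumption made purely for illustration, as are the function name `posterior_odds` and all the numbers. The initial estimate comes only from the theoretical rationale, the result enters only through the update, and nothing in the calculation depends on when the hypothesis was constructed.

```python
import math


def posterior_odds(prior_odds: float,
                   p_result_given_h: float,
                   p_result_given_rival: float) -> float:
    """Update the odds of a hypothesis relative to a rival hypothesis.

    prior_odds: the reader's result-independent estimate, based only on
        the theoretical rationale given in the Introduction.
    p_result_given_h, p_result_given_rival: how strongly each hypothesis
        predicts the reported result.

    There is deliberately no argument for *when* the hypothesis was
    constructed: in this toy model, timing plays no role in the update.
    """
    if prior_odds < 0:
        raise ValueError("prior_odds must be non-negative")
    if not 0 <= p_result_given_h <= 1:
        raise ValueError("p_result_given_h must be a probability")
    if not 0 < p_result_given_rival <= 1:
        raise ValueError("p_result_given_rival must be in (0, 1]")
    return prior_odds * (p_result_given_h / p_result_given_rival)


# (a) Estimate without the current result: the rationale alone makes the
#     hypothesis somewhat more plausible than its rival (illustrative value).
initial = 1.5

# (b) Estimate after taking the current result into account: the result is
#     four times more expected under the hypothesis than under the rival.
updated = posterior_odds(initial, p_result_given_h=0.8, p_result_given_rival=0.2)

print(f"initial odds: {initial}, updated odds: {updated}")
```

The result is counted once, in step (b). Because the initial odds are set without reference to the result, the update adds new information and does not double-count.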
To translate this reasoning to the Texas sharpshooter analogy, it is necessary to distinguish HARKing from p-hacking. If our sharpshooter painted a new target but retained his substantive claim that he is “a good shot,” then he would be similar to a researcher who conducted multiple statistical tests and then selectively reported only those results that supported their original a priori substantive hypothesis. Frequentist researchers would describe this researcher as a “p-hacker” rather than a HARKer (Rubin, 2017b, p. 325; Simmons et al., 2011). To be a HARKer, researchers must also change their original a priori hypothesis or create a totally new one. Hence, a more appropriate analogy is to consider a sharpshooter who changes both their statistical hypothesis (their target's location) and their broader substantive hypothesis (their claim).
For example, another sharpshooter, Jane, might initially believe “I’m a good shot” but, after seeing that she has missed the target that she was aiming for, she secretly paints a target around her stray bullet hole and declares to her friends: “I’m a good shot, but I can’t adjust for windy conditions.” Based on their a priori knowledge about Jane, her friends should be able to form an initial opinion about the verisimilitude of this claim (e.g., “Jane’s always trained indoors. So, we can deduce that she hasn’t learned to adjust for windy conditions”). To support her claim, Jane provides her friends with accurate procedural information about her shot (i.e., open research data and materials), including (a) the direction in which she was aiming her gun when she took the shot and (b) the speed and direction of the wind at the time of her shot. Her friends are then able to combine this procedural information with a priori theoretical information about the way in which gun shots are affected by the wind in order to calculate the predicted location of Jane’s bullet hole in a result-independent manner. They observe that this predicted location matches the location of Jane’s bullet hole and (newly painted) target. Based on this match, they are warranted in increasing their belief in the (secretly HARKed) hypothesis that Jane is a good shot but cannot adjust for windy conditions.
We can predict the location of the sharpshooter's bullet hole on the basis of her (secretly HARKed) hypothesis that she is a good shot but cannot adjust for windy conditions. We can then use the location of the bullet hole to increase or decrease our estimated relative verisimilitude for this prediction. Source:
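The calculation that Jane’s friends make can be sketched in code. This is a toy model, not something from the article: the drift rule (crosswind speed multiplied by flight time), the `Shot` fields, the tolerance, and all the numbers are hypothetical simplifications chosen for illustration, and real ballistics is more involved. The point is only that the predicted location is computed from the procedural information and a priori theory, without looking at the bullet hole, and is compared with the observed hole afterwards.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Shot:
    """Procedural information that Jane discloses about her shot."""
    aim_x_m: float            # horizontal point she aimed at on the wall (metres)
    aim_y_m: float            # vertical point she aimed at on the wall (metres)
    crosswind_m_per_s: float  # crosswind speed, positive = blowing to the right
    flight_time_s: float      # time the bullet spent in the air


def predicted_impact(shot: Shot) -> tuple[float, float]:
    """Predict the bullet hole for a good shot who does not correct for wind.

    Toy assumption: the bullet lands where it was aimed, shifted sideways
    by crosswind speed multiplied by flight time.
    """
    drift_m = shot.crosswind_m_per_s * shot.flight_time_s
    return (shot.aim_x_m + drift_m, shot.aim_y_m)


def prediction_matches(shot: Shot,
                       observed_hole: tuple[float, float],
                       tolerance_m: float = 0.05) -> bool:
    """Check whether the observed hole lies within tolerance of the prediction."""
    pred_x, pred_y = predicted_impact(shot)
    miss_m = math.hypot(observed_hole[0] - pred_x, observed_hole[1] - pred_y)
    return miss_m <= tolerance_m


# The prediction is derived first, from aim and wind alone...
jane = Shot(aim_x_m=0.0, aim_y_m=0.0, crosswind_m_per_s=2.0, flight_time_s=0.25)
print(predicted_impact(jane))

# ...and only then compared with where the bullet hole actually is.
print(prediction_matches(jane, observed_hole=(0.52, 0.01)))
```

A match gives Jane’s friends grounds to increase their belief in her claim, and a clear mismatch would give them grounds to decrease it, whenever she happened to paint the target.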
I should note that my current paper contradicts one of the points that I made in my 2017b article “When does HARKing Hurt?” In that previous article, I argued that a result cannot be used to support a hypothesis if it has already been used to construct that hypothesis. In the current paper, I argue that even a result that has been used to construct a hypothesis can support that hypothesis if the hypothesis can be reconstructed on the basis of a priori theory and evidence that is epistemically independent from that result. The fact that the inspiration for a hypothesis can be, or has been, influenced by a result doesn’t mean that it can’t also be deduced independently of that result. And, if a hypothesis can be deduced independently of a result, then the result can be used to update an initial estimate of relative verisimilitude that is based on that deduction. HARKing is only problematic in the case of ad hoc accommodation, in which the rationale for the hypothesis or model is induced from the current data per se rather than deduced from a priori theory and evidence. In this case, there is no result-independent basis for establishing an initial estimate of relative verisimilitude.
The second part of my paper provides a critical analysis of Kerr’s (1998) 12 costs of HARKing. For further information, please see:
Rubin, M. (2019). The costs of HARKing. The British Journal for the Philosophy of Science.
Cevolani, G., & Festa, R. (2018). A partial consequence account of truthlikeness. Synthese.
de Groot, A. D. (2014). The meaning of “significance” for different types of research (E. J. Wagenmakers, D. Borsboom, J. Verhagen, R. Kievit, M. Bakker, A. Cramer, . . . H. L. J. van der Maas). Acta Psychologica, 148, 188–194.
Kerr, N. L. (1998). HARKing: Hypothesizing after the results are known. Personality and Social Psychology Review, 2, 196–217.
Lewandowsky, S. (2019). Avoiding Nimitz Hill with more than a little red book: Summing up #PSprereg.
Mayo, D. G. (2008). How to discount double-counting when it counts: Some clarifications. The British Journal for the Philosophy of Science, 59, 857–879.
Oberauer, K., & Lewandowsky, S. (2019). Addressing the theory crisis in psychology. Psychonomic Bulletin & Review.
Rubin, M. (2017a). An evaluation of four solutions to the forking paths problem: Adjusted alpha, preregistration, sensitivity analyses, and abandoning the Neyman-Pearson approach. Review of General Psychology, 21, 321–329.
Rubin, M. (2017b). When does HARKing hurt? Identifying when different types of undisclosed post hoc hypothesizing harm scientific progress. Review of General Psychology, 21, 308–320.
Shrout, P. E., & Rodgers, J. L. (2018). Psychology, science, and knowledge construction: Broadening perspectives from the replication crisis. Annual Review of Psychology, 69, 487–510.
Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22, 1359–1366.
Wagenmakers, E. J., Wetzels, R., Borsboom, D., van der Maas, H. L., & Kievit, R. A. (2012). An agenda for purely confirmatory research. Perspectives on Psychological Science, 7, 632–638.
Worrall, J. (1985). Scientific discovery and theory-confirmation. In J. C. Pitt (Ed.), Change and progress in modern science: Papers related to and arising from the Fourth International Conference on History and Philosophy of Science (pp. 301–331). Dordrecht, the Netherlands: Reidel.
Worrall, J. (2014). Prediction and accommodation revisited. Studies in History and Philosophy of Science, 45, 54–61.

Citation: Rubin, M. (2020, March 11). The costs of HARKing: Does it matter if researchers engage in undisclosed hypothesising after the results are known?