Mark Rubin's Research: Two-Sided Significance Tests

In this paper (Rubin, 2022), I make two related points: (1) researchers should halve two-sided p values if they wish to use them to make directional claims, and (2) researchers should not halve their alpha level if they're using two one-sided tests to test two directional null hypotheses.

(1) Researchers should halve two-sided p values when making directional claims

Researchers sometimes conduct two-sided significance tests and then use the resulting two-sided p values to make directional claims. I argue that this approach is inappropriate because two-sided p values refer to non-directional hypotheses, rather than directional hypotheses.

So, for example, if you conduct a two-sided t test and obtain a significant two-sided p value, then your significant result refers to a non-directional null hypothesis (e.g., "men have the same self-esteem as women"), and you should make a corresponding non-directional claim (e.g., "men and women have significantly different self-esteem"). If you wish to make a directional claim (e.g., "men have significantly higher self-esteem than women"), then you should halve your two-sided p value to obtain a one-sided p value.
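To make the halving concrete, here is a minimal sketch in Python. It assumes a z test (a normal approximation) rather than a t test so that it needs only the standard library, and the test statistic of 2.0 is a hypothetical value, not a result from the paper. It also assumes that the observed difference lies in the direction of the claim, which is the case in which the halved two-sided p value equals the one-sided p value.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def two_sided_p(z: float) -> float:
    """p value for the non-directional null (no difference in either direction)."""
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))


def one_sided_p_greater(z: float) -> float:
    """p value for the directional null 'difference <= 0' (claim: difference > 0)."""
    return 1.0 - STD_NORMAL.cdf(z)


# Hypothetical test statistic (men minus women), chosen only for illustration.
z_observed = 2.0

p_two = two_sided_p(z_observed)
p_one = one_sided_p_greater(z_observed)

print(f"two-sided p        = {p_two:.4f}")
print(f"halved two-sided p = {p_two / 2:.4f}")
print(f"one-sided p        = {p_one:.4f}")
```

With a positive test statistic, the last two printed values agree, which is the sense in which halving the two-sided p value gives the p value for the directional claim.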

This first point is important because, if you use a two-sided p value to make a decision about a directional null hypothesis, then (a) your evidence will be weaker than it should be (i.e., your p value will be too large), and (b) your Type II error rate will be higher than necessary. For the same view, please see Georgi Georgiev’s onesided.org website.
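Point (b) can be illustrated with a short calculation. The sketch below again assumes a z test, an alpha level of .05, and a hypothetical true effect of 2.5 standard errors; these numbers are illustrative assumptions only. It compares the chance of missing a true positive difference when the directional claim is based on the two-sided p value with the chance of missing it when the claim is based on the one-sided p value.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()
ALPHA = 0.05
EFFECT = 2.5  # hypothetical true effect in standard-error units (an assumption)

# Critical values for the directional claim "difference > 0".
crit_two_sided = STD_NORMAL.inv_cdf(1 - ALPHA / 2)  # two-sided p < alpha, about 1.96
crit_one_sided = STD_NORMAL.inv_cdf(1 - ALPHA)      # one-sided p < alpha, about 1.645

# Power: probability that the test statistic exceeds the critical value
# when the statistic is normally distributed around EFFECT.
power_using_two_sided_p = 1 - STD_NORMAL.cdf(crit_two_sided - EFFECT)
power_using_one_sided_p = 1 - STD_NORMAL.cdf(crit_one_sided - EFFECT)

# Type II error rate is the complement of power.
type2_two = 1 - power_using_two_sided_p
type2_one = 1 - power_using_one_sided_p

print(f"Type II error rate using the two-sided p value: {type2_two:.3f}")
print(f"Type II error rate using the one-sided p value: {type2_one:.3f}")
```

Under these assumptions, the Type II error rate is lower when the one-sided p value is used, because the critical value that the test statistic must exceed is smaller.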

(2) Researchers should not halve their alpha level when using two one-sided tests

I also argue that, if you use two one-sided tests to test two directional null hypotheses, then it's not necessary to adjust your alpha level to compensate for multiple testing, because your decision about rejecting each directional hypothesis is based on a single test result, rather than multiple test results.

For example, imagine that you use a one-sided test to test the directional null hypothesis that “men have the same or lower self-esteem than women.” In this case, there's no need to lower your alpha level (e.g., from .050 to .025), because your Type I error rate only refers to a single test of a single null hypothesis. It doesn't refer to either (a) the other directional null hypothesis (i.e., “men have the same or higher self-esteem than women”) or (b) the non-directional null hypothesis (i.e., “men have the same self-esteem as women”). Consequently, no alpha adjustment is required. For similar views, please see Georgi Georgiev's piece on this issue and my paper on multiple testing.
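A small simulation may help to show why each one-sided test keeps its own Type I error rate. The sketch below assumes a z test, a true difference of exactly zero (the boundary of both directional null hypotheses), and an unadjusted alpha level of .05 for each test. The seed and the number of simulated studies are arbitrary choices.

```python
import random
from statistics import NormalDist

ALPHA = 0.05
N_SIMS = 100_000
crit = NormalDist().inv_cdf(1 - ALPHA)  # one-sided critical value, about 1.645

rng = random.Random(2022)  # fixed seed so the run is repeatable

reject_greater = 0  # rejections of the null "difference <= 0"
reject_less = 0     # rejections of the null "difference >= 0"
reject_either = 0   # at least one of the two directional nulls rejected

for _ in range(N_SIMS):
    z = rng.gauss(0.0, 1.0)  # test statistic when the true difference is zero
    greater = z > crit
    less = z < -crit
    reject_greater += greater
    reject_less += less
    reject_either += greater or less

rate_greater = reject_greater / N_SIMS
rate_less = reject_less / N_SIMS
rate_either = reject_either / N_SIMS

print(f"Type I error rate, 'difference <= 0' null: {rate_greater:.3f}")
print(f"Type I error rate, 'difference >= 0' null: {rate_less:.3f}")
print(f"Rate of rejecting at least one null:       {rate_either:.3f}")
```

In this sketch, each directional test rejects its own null hypothesis in about 5% of the simulated studies, even though alpha was not halved. The rate of about 10% applies only to rejecting at least one of the two nulls, which corresponds to a decision about the non-directional null hypothesis rather than to either directional claim.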

For further information, please see:

Rubin, M. (2022). That’s not a two-sided test! It’s two one-sided tests! Significance, 19(2), 50-53.

Monday, 4 April 2022
