Support for the Null Hypothesis and Its Misconceptions in Psychology: A Theoretical Introduction to Equivalence Testing

David Lacko; Tomáš Prošek

doi:10.5817/TF2021-14-13648

No. 14 (2021)

Abstract

This theoretical article presents ways of arguing statistically in favour of the null hypothesis. It describes four approaches that can be used for equivalence testing: the two one-sided tests procedure (TOST), the second-generation p-value (SGPV), the Bayes factor (BF), and the region of practical equivalence (ROPE). The article is supplemented with practical illustrations of possible TOST results. It also includes a necessary clarification of the logic of hypothesis testing and of the p-value, and a critical analysis of the advantages and disadvantages of the procedures described.

Keywords:
p-value; equivalence testing; null hypothesis; hypothesis testing; TOST
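
To make the logic of the TOST procedure named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a large-sample normal (z) approximation in place of the t distribution that TOST is usually based on. The function name `tost_two_sample`, the equivalence bounds, and the data are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def tost_two_sample(x, y, low, high, alpha=0.05):
    """Two one-sided tests for the mean difference mean(x) - mean(y).

    Assumption: a normal (z) approximation is used instead of the
    t distribution, so this is only reasonable for fairly large samples.
    `low` and `high` are the equivalence bounds in raw units.
    """
    diff = mean(x) - mean(y)
    # Standard error of the difference, without assuming equal variances.
    se = sqrt(variance(x) / len(x) + variance(y) / len(y))
    z = NormalDist()
    # Test 1: H0 "diff <= low" against H1 "diff > low".
    p_lower = 1 - z.cdf((diff - low) / se)
    # Test 2: H0 "diff >= high" against H1 "diff < high".
    p_upper = z.cdf((diff - high) / se)
    # Equivalence is claimed only if BOTH one-sided tests reject,
    # i.e. if the larger of the two p-values is below alpha.
    p_tost = max(p_lower, p_upper)
    return {
        "diff": diff,
        "p_lower": p_lower,
        "p_upper": p_upper,
        "p_tost": p_tost,
        "equivalent": p_tost < alpha,
    }


# Hypothetical data: two groups of 100 whose means differ by 0.1.
group_a = [10.0 + (i % 5) for i in range(100)]
group_b = [10.1 + (i % 5) for i in range(100)]

wide = tost_two_sample(group_a, group_b, low=-0.5, high=0.5)
narrow = tost_two_sample(group_a, group_b, low=-0.2, high=0.2)
print(wide["equivalent"], narrow["equivalent"])
```

The same observed difference supports equivalence under the wide bounds but not under the narrow ones, which illustrates why the choice of the smallest effect size of interest has to be justified before the test is run.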
