[Comment] Redefine statistical significance

Dienes, Zoltan; Field, Andy; et al.

journal contribution

Posted on 2023-06-09, 07:19; authored by Zoltan Dienes, Andy Field, et al.

The lack of reproducibility of scientific studies has caused growing concern over the credibility of claims of new discoveries based on “statistically significant” findings. There has been much progress toward documenting and addressing several causes of this lack of reproducibility (e.g., multiple testing, P-hacking, publication bias, and under-powered studies). However, we believe that a leading cause of non-reproducibility has not yet been adequately addressed: Statistical standards of evidence for claiming discoveries in many fields of science are simply too low. Associating “statistically significant” findings with P < 0.05 results in a high rate of false positives even in the absence of other experimental, procedural and reporting problems. For fields where the threshold for defining statistical significance is P < 0.05, we propose a change to P < 0.005. This simple step would immediately improve the reproducibility of scientific research in many fields. Results that would currently be called “significant” but do not meet the new threshold should instead be called “suggestive.” While statisticians have known the relative weakness of using P ≈ 0.05 as a threshold for discovery and the proposal to lower it to 0.005 is not new (1, 2), a critical mass of researchers now endorse this change. We restrict our recommendation to claims of discovery of new effects. We do not address the appropriate threshold for confirmatory or contradictory replications of existing claims. We also do not advocate changes to discovery thresholds in fields that have already adopted more stringent standards (e.g., genomics and high-energy physics research; see Potential Objections below). We also restrict our recommendation to studies that conduct null hypothesis significance tests.
We have diverse views about how best to improve reproducibility, and many of us believe that other ways of summarizing the data, such as Bayes factors or other posterior summaries based on clearly articulated model assumptions, are preferable to P-values. However, changing the P-value threshold is simple and might quickly achieve broad acceptance.
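The claim that a P < 0.05 threshold yields a high rate of false positives can be illustrated with a small calculation. The sketch below is not taken from the abstract: the function name `false_positive_risk`, the assumed statistical power of 0.8, and the assumed prior odds of 1:10 that a tested effect is real are all illustrative choices, and different assumptions give different numbers.

```python
def false_positive_risk(alpha, power, prior_odds_h1):
    """Share of 'significant' results that are false positives.

    alpha          -- significance threshold (probability of rejecting a true null)
    power          -- probability of rejecting the null when the effect is real
    prior_odds_h1  -- ASSUMED odds of a real effect to a null effect, e.g. 1/10
    """
    p_h1 = prior_odds_h1 / (1.0 + prior_odds_h1)  # prior probability of a real effect
    p_h0 = 1.0 - p_h1                             # prior probability of the null
    false_pos = alpha * p_h0                      # nulls that reach significance
    true_pos = power * p_h1                       # real effects that reach significance
    return false_pos / (false_pos + true_pos)


# Illustrative assumptions only: power 0.8, prior odds 1:10.
for alpha in (0.05, 0.005):
    risk = false_positive_risk(alpha, power=0.8, prior_odds_h1=1 / 10)
    print(f"alpha={alpha}: false positive risk = {risk:.3f}")
```

Under these assumed inputs, roughly 38% of results significant at 0.05 would be false positives, against roughly 6% at 0.005. Holding power at 0.8 while lowering the threshold implies a larger sample size, which the sketch does not model.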

History

Publication status: Published
File Version: Accepted version
Journal: Nature Human Behaviour
ISSN: 2397-3374
Publisher: Nature Publishing Group
External DOI: https://doi.org/10.1038/s41562-017-0189-z
Volume: 2
Page range: 6-10
Department affiliated with: Psychology Publications
Full text available: Yes
Peer reviewed: Yes
Legacy Posted Date: 2017-07-20
First Open Access (FOA) Date: 2018-03-01
First Compliant Deposit (FCD) Date: 2017-07-20

Keywords: Uncategorised value
Licence: Copyright not evaluated
