Replication, which confirms the accuracy of empirical findings in research studies, is crucial to psychological science (Brandt et al., 2013). For a replication study to be considered successful, it must use conditions similar to those of the initial experiment and yield the same effects (MLP; Klein et al., 2014). Unfortunately, psychological science has been facing a replication crisis in which the replication rate is low. However, some researchers argue that the replication rate may be underestimated and is much higher than what is commonly reported.
In the Open Science Collaboration (OSC, 2015), researchers conducted replications of 100 experimental and correlational studies. The findings were disheartening: although 97% of the original studies reported significant results, only 36% of the replications yielded p values less than .05. Moreover, the effect sizes of the replications were, on average, about half those of the original studies.
What are the potential reasons for the lack of replicability?
The low replicability rate may be related to factors such as small sample sizes, p-hacking (increasing the sample size until the p value becomes significant), the file-drawer phenomenon (not reporting null findings), and publication bias (negative results have a low probability of being published), all of which can lead to a high rate of false positives. The publish-or-perish culture of academia, for example, puts undue pressure on scientists to produce publications with positive results at a very high rate.
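The logic of p-hacking through optional stopping can be illustrated with a small simulation. This is a hypothetical sketch, not an analysis from any cited study: all names and numbers (the batch sizes, the `p_hacked_study` helper) are illustrative assumptions. Data are generated with no true effect at all, yet testing repeatedly after each new batch of observations and stopping as soon as p < .05 inflates the false-positive rate well above the nominal 5%.

```python
import math
import random
import statistics

def two_sided_p(sample):
    """Approximate two-sided p value for a one-sample test of mean = 0,
    using a normal approximation to keep the example dependency-free."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    z = statistics.mean(sample) / se
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def p_hacked_study(start_n=10, step=10, max_n=100):
    """Simulate one 'study' under the null: collect data in batches,
    test after each batch, and stop as soon as the result is significant."""
    data = [random.gauss(0, 1) for _ in range(start_n)]  # true effect is zero
    while len(data) <= max_n:
        if two_sided_p(data) < 0.05:
            return True  # a false positive declared "significant"
        data += [random.gauss(0, 1) for _ in range(step)]
    return False

random.seed(1)
trials = 2000
false_positives = sum(p_hacked_study() for _ in range(trials))
print(f"False-positive rate with optional stopping: {false_positives / trials:.2%}")
```

Even though each individual test nominally controls the error rate at 5%, taking up to ten "peeks" at the growing sample multiplies the chances of crossing the threshold at least once, which is why the simulated rate comes out well above .05.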
The bias against null results further compromises the integrity of the field, because researchers know that such work has a low probability of publication. This may partially explain selective reporting, the tendency to report only significant results: even if a study tests multiple hypotheses, only the ones that generated significant results are presented.
Why is this a problem?
Replication increases the precision of effect-size estimates, establishes the generalizability of effects, and, when a replication does not yield the same results as the original study, provides information about the conditions necessary for the expected effects (Nosek & Lakens, 2014). Given that one goal of research is to yield results that are externally valid, or generalizable across individuals and contexts, findings that do not replicate in other studies can lead to inaccurate implementation of those results. Failure to identify the conditions under which a phenomenon occurs can lead to overgeneralization of results (Henry, MacLeod, Phillips, & Crawford, 2004; MLP; Klein et al., 2014).
What could be leading to the low replicability statistics?
Gilbert and colleagues (2016) present potential reasons for the low replicability of original studies, specifically those examined by the Open Science Collaboration. They argue that one explanation could be error in the methodologies the replications used, rather than the effects of the original studies being irreproducible. The OSC data contained multiple sources of error, including infidelities in the replication methods, that should be taken into account when interpreting the results (Gilbert, King, Pettigrew, & Wilson, 2016). They further argue that the OSC replication studies were underpowered. Finally, Gilbert and colleagues (2016) suggest that the OSC replications may have been biased toward failure; that is, the replicators expected low replicability of the original studies, and their methodologies and results served to confirm that expectation.
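What it means for a replication to be underpowered can also be shown with a hypothetical simulation. The effect size, sample sizes, and `power` helper below are illustrative assumptions, not OSC's actual analysis: when a modest true effect exists, a small sample detects it at p < .05 only a minority of the time, so a failed replication by itself is weak evidence against the effect.

```python
import math
import random
import statistics

def two_sided_p(sample):
    """Approximate two-sided p value for a one-sample test of mean = 0
    (normal approximation, to keep the sketch dependency-free)."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    z = statistics.mean(sample) / se
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def power(effect_size, n, sims=2000):
    """Estimate the probability of obtaining p < .05 when a true effect
    of `effect_size` standard deviations is present."""
    random.seed(0)
    hits = 0
    for _ in range(sims):
        sample = [random.gauss(effect_size, 1) for _ in range(n)]
        if two_sided_p(sample) < 0.05:
            hits += 1
    return hits / sims

p_small = power(0.3, 20)    # small sample: low power, most "replications" fail
p_large = power(0.3, 200)   # larger sample: the same true effect is reliably detected
print(f"Power with n = 20:  {p_small:.2f}")
print(f"Power with n = 200: {p_large:.2f}")
```

The same true effect goes from being missed most of the time to being detected almost always purely as a function of sample size, which is the core of the underpowering critique.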
So, is the field of psychology facing a replication crisis? Lynch and colleagues (2005) argue that non-replication does not imply falsehood. Given that exact replication of studies is impossible, and perhaps even unnecessary, a replication can fail because of differences in the operationalization of variables or in the construct (concept) itself.