The Power of Statistics: A Myth of Objectivity

THURSDAY, 5 MAY 2022

For decades, scientists have applied statistical analyses to provide objectivity to their research. Such objectivity is viewed as a critical attribute of good science, given that any factual scientific statement must be free of personal opinions and biases. However, with the onset of the ‘reproducibility crisis’, scientists and statisticians have been forced to discuss flaws in current approaches to statistical inference and to consider the consequences of demanding absolute objectivity from science.

The reproducibility crisis refers to the fact that the results of many scientific studies are difficult or impossible to reproduce. In 2016, a survey carried out by Nature found that over 70% of scientists had been unable to reproduce the experimental work of others and that over 50% of scientists had even failed to reproduce their own work fully. This poses a dilemma, as the credibility of any theories and findings built upon irreproducible work is also called into question. Indeed, Ioannidis argued in 2005 that “false findings may be the majority or even vast majority of published research claims”. Whilst others feel that this statement is extreme, there is little doubt that this crisis exists and that the incorrect application and interpretation of statistics is one of its main drivers.

Statistics can be simply defined as “the study and presentation of data”. Researchers collect data and apply statistical analyses to interpret it, hoping to draw meaningful conclusions. Unfortunately, where the results of analyses used to be presented beside other evidence to support a rational argument, statistics are now relied upon as the primary determinant of most scientific conclusions. Ultimately, this over-simplification of statistical inference is driven by a desire and perceived need for objectivity. Ironically, however, statistics can never provide an objective answer to any scientific question as the processes by which they are derived and interpreted are themselves subjective.

Despite this, researchers often appear to have only one goal: to establish statistical significance. That is, to carry out an analysis and get back a p-value < 0.05 or < 0.01. Such a p-value means that, if there were no true effect, the probability of collecting data at least as extreme as those observed would be less than 5% or 1%. Scientists will then argue that this statistical significance is enough to draw an objective conclusion. Unfortunately, not only does a p-value threshold represent a subjective cut-off for significance, but it also ignores the specific context of any scientific question. The effect size, for example, is completely disregarded.

A common misconception within statistical inference is that p-values determine the presence and size of an effect, that is, a relationship between two factors of interest. In reality, a p-value greater than the significance threshold does not mean that there is no effect; it means only that the data do not allow chance to be ruled out as an explanation. Similarly, a smaller p-value does not indicate a greater effect size or a stronger relationship. Yet estimating the size of a significant effect is still necessary to judge whether something interesting occurs within a system. Hence, statistics should be viewed as a tool to generate rational arguments instead of a shortcut to a definitive statement.
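The gap between a p-value and an effect size can be made concrete with a small calculation. The sketch below is illustrative only: it assumes a simple two-sided z-test comparing two group means with a known, shared standard deviation, and the two scenarios (their mean differences and sample sizes) are hypothetical numbers chosen for the example, not data from any study.

```python
import math


def two_sided_p_value(mean_diff, sd, n_per_group):
    """Two-sided z-test p-value for a difference between two group means.

    Simplifying assumption: both groups share a known standard deviation
    `sd` and contain `n_per_group` observations each.
    """
    standard_error = sd * math.sqrt(2 / n_per_group)
    z = mean_diff / standard_error
    # For a standard normal Z, P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def cohens_d(mean_diff, sd):
    """Standardised effect size: the mean difference in units of sd."""
    return mean_diff / sd


# Hypothetical scenario A: a tiny effect measured in a very large sample.
d_tiny = cohens_d(mean_diff=0.05, sd=1.0)
p_tiny = two_sided_p_value(mean_diff=0.05, sd=1.0, n_per_group=20_000)

# Hypothetical scenario B: a large effect measured in a small sample.
d_large = cohens_d(mean_diff=0.8, sd=1.0)
p_large = two_sided_p_value(mean_diff=0.8, sd=1.0, n_per_group=10)

print(f"A: effect size d = {d_tiny:.2f}, p = {p_tiny:.2g}")
print(f"B: effect size d = {d_large:.2f}, p = {p_large:.2g}")
```

Under these assumptions, the tiny effect in scenario A falls far below the 0.05 threshold, while the much larger effect in scenario B does not. The p-value here reflects sample size as much as it reflects the effect, which is why the two should be reported together.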

With that said, the question remains as to why we shy away from subjectivity. In 2009, David Weinberger coined the phrase “transparency is the new objectivity”, warning that any claim of objectivity should be supported by sources, disagreements, and personal assumptions. In 2017, this idea was echoed by Gelman and Hennig, who suggested that ‘objectivity’ should be replaced by ‘transparency’, ‘consensus’, ‘impartiality’ and ‘correspondence to observable reality’.

Complete transparency would require the publication of statistical results alongside information relating to how the data were analysed and what assumptions were made. It is also important that both statistically significant and non-significant results are published to allow readers to determine what they deem an ‘interesting result’. Unfortunately, such a paradigm shift first necessitates a substantial change to the system in which science operates.

Currently, scientific journals usually only publish papers in which significance has been demonstrated. Given that publication metrics often determine a scientist's credentials, it is unsurprising that there is pressure to uncover significant results. Not only does this discourage researchers from including non-significant results in their work, but the pressure to perform has been seen to promote the act of data dredging. This process sees scientists exhaustively analyse their data using any possible method until they find a significant p-value. Thus, the actions of journals will play a key role in shaping a new societal norm concerning statistical inference in science. It has been proposed that one way in which journals could contribute is by increasing the number of editors with statistical expertise.
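Why data dredging so reliably produces a significant p-value can be shown with a short simulation. This is a minimal sketch under stated assumptions: there is no true effect at all, the analyses are treated as independent, and each p-value is drawn uniformly between 0 and 1, which is how p-values from a continuous test statistic behave under the null hypothesis. The function names and the figure of 20 analyses are hypothetical choices for illustration.

```python
import random


def chance_of_false_positive(n_tests, alpha=0.05):
    """Probability that at least one of n independent null tests has p < alpha."""
    return 1 - (1 - alpha) ** n_tests


def simulate_dredging(n_tests, alpha=0.05, n_studies=20_000, seed=0):
    """Fraction of simulated studies in which dredging 'finds' significance.

    Assumption: no true effect exists, so each analysis yields a p-value
    drawn uniformly from [0, 1), independently of the others.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        # The researcher stops as soon as any one analysis crosses the threshold.
        if any(rng.random() < alpha for _ in range(n_tests)):
            hits += 1
    return hits / n_studies


analytic = chance_of_false_positive(n_tests=20)
simulated = simulate_dredging(n_tests=20)

print(f"Chance of at least one 'significant' result in 20 analyses: {analytic:.0%}")
print(f"Simulated fraction of studies reporting significance: {simulated:.0%}")
```

With 20 independent analyses at the 0.05 threshold, the chance of at least one spurious 'significant' result is 1 - 0.95^20, or roughly 64%. Real analyses of the same data set are rarely independent, so the exact figure will differ, but the direction of the problem is the same.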

However, journals are not alone in this endeavour. Governments and funding bodies must also use their influence to improve science responsibly. The challenge here lies in establishing implementable policies rather than relying upon recommendations that cannot be easily monitored. An obvious starting point would be integrating the appropriate and transparent use of statistics into current frameworks to monitor scientific integrity in research.

While the problems of statistical inference and poor reproducibility in science are increasingly discussed, solutions remain up for debate. Nevertheless, one thing has become clear – scientists must move away from the unachievable goal of objectivity.

Charlotte Hutchings is a PhD researcher in biochemistry.