Unfortunately, the quality of statistical analyses in most medical research is quite poor, and the impact of this on human health is undoubtedly large due to the erroneous conclusions reached. Here are a few of the reasons for these poor-quality analyses and a few of the most common errors.
Reasons statistical analyses in medical research are often poor quality:
Doctors are usually not experts in statistics (nor should they always be! they have other expertise), and are often intimidated by complex statistical methods. This means that medical journals – including the best ones – insist on statistical methods that will be easily understood by their readership, even when this means that the analyses are not state of the art. I know of no other field in which this is the case. Fields including economics, engineering, physics, ecology/evolution, demography, and sociology all use much more sophisticated statistical approaches than is normal in medical research. A good example of this is the near-complete absence of Bayesian analyses in medical journals.
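To illustrate what is missing, a basic Bayesian analysis of a trial outcome takes only a few lines. The sketch below uses hypothetical counts and a flat Beta(1, 1) prior; it estimates, by Monte Carlo sampling from the two Beta posteriors, the posterior probability that a treatment raises the risk of an adverse event — a direct probability statement of the kind a p-value cannot provide.

```python
import random

# Minimal Bayesian sketch with HYPOTHETICAL counts: 12 adverse events among
# 100 treated patients vs. 6 among 100 controls. With a flat Beta(1, 1)
# prior, the posterior for each arm's event rate is Beta(1 + events,
# 1 + non-events).

def beta_sample(a, b, rng):
    # Sample from Beta(a, b) via two gamma draws.
    x = rng.gammavariate(a, 1.0)
    y = rng.gammavariate(b, 1.0)
    return x / (x + y)

rng = random.Random(42)
events_t, n_t = 12, 100  # treated arm (hypothetical)
events_c, n_c = 6, 100   # control arm (hypothetical)

draws = 100_000
greater = sum(
    beta_sample(1 + events_t, 1 + n_t - events_t, rng)
    > beta_sample(1 + events_c, 1 + n_c - events_c, rng)
    for _ in range(draws)
)
prob = greater / draws
print(f"Posterior P(treated risk > control risk) = {prob:.2f}")
```

With these counts the posterior probability of increased risk comes out around 0.9 — informative, even though a frequentist comparison of 12/100 vs. 6/100 would not reach p < 0.05.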
Doctors need to make clear decisions in their practice, e.g. to treat or not to treat. Their disciplinary training thus encourages them to take a black-and-white view of the world, which leads to difficulties correctly interpreting subtle results. For example, a randomized controlled trial of a new medication may not show a significant increase in the risk of a side effect (p=0.08), but this is more likely to reflect insufficient statistical power than a genuine absence of the side effect. Such studies are not powered to detect rare events, and a p-value above 0.05 in an underpowered trial is not evidence that the side effect poses no risk!
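The power problem can be made concrete with a small simulation. In this sketch (all numbers hypothetical), a drug truly triples a rare side effect from 1% to 3%, yet a trial with 300 patients per arm detects the increase at p < 0.05 well under half the time:

```python
import random
from math import sqrt, erf

def two_prop_p(x1, n1, x2, n2):
    # Two-sided z-test for a difference in proportions (normal approximation).
    p_pool = (x1 + x2) / (n1 + n2)
    if p_pool in (0.0, 1.0):
        return 1.0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = abs(x1 / n1 - x2 / n2) / se
    return 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))

rng = random.Random(1)
n_per_arm, trials = 300, 2000
p_control, p_treat = 0.01, 0.03  # hypothetical: the drug truly triples the risk

significant = 0
for _ in range(trials):
    x_c = sum(rng.random() < p_control for _ in range(n_per_arm))
    x_t = sum(rng.random() < p_treat for _ in range(n_per_arm))
    if two_prop_p(x_t, n_per_arm, x_c, n_per_arm) < 0.05:
        significant += 1

power = significant / trials
print(f"Power to detect a real tripling of the risk: {power:.2f}")
```

Most runs of such a trial would report "no significant increase in side effects" despite a real tripling of the risk — exactly the situation where p > 0.05 must not be read as safety.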
Most of the variables in medical research are defined to correspond to the clinical reality doctors face. For example, diabetes might be diagnosed with the 126 mg/dl blood glucose threshold mentioned in the Our statistical approach section. These variables are rarely the best ones for modeling the true underlying processes.
In most other fields, a single researcher is an expert in both the subject matter and the statistics, and this means that the choice of analysis is made in light of the research question. In medical research, the statistics are mostly farmed out to biostatisticians. Biostatisticians are excellent at statistics, but are often less invested than the first author in getting every detail right, and don’t have full control of the study from the beginning. Their priority is often methodological developments for biostatistics journals. Because a different statistical test asks a different scientific question, many medical studies end up answering questions slightly different from those they intended to ask.
Many pharmaceutical companies and others have a large stake in the outcome of medical research, and this has created a large potential for biased results. As a result, many rules have been put in place to try to prevent manipulation of the research system. These rules include pre-specification of almost all aspects of statistical analysis of randomized controlled trials and some other types of research. Unfortunately, these rules are not sufficient to prevent bias, but have the side effect of preventing many of the most creative and interesting analyses that could be conducted with the data sets.
Following the data is sometimes called “fishing” in a pejorative sense, but this is unjustified. There are risks in fishing that can be taken into account during interpretation, but preventing fishing is a sure-fire way to ensure that the data never tell us anything we didn’t already know. To avoid problems that can arise through fishing, it is sufficient to (a) consider thoroughly the risk of false positives due to multiple testing, and (b) replicate, either in independent data sets or training-test subsets of the original data set.
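Safeguard (b) can be sketched in a few lines. In this hypothetical simulation, twenty pure-noise predictors are "fished" against an outcome in a training half of the data; the strongest training association can look impressive, but the held-out half — the part that counts — shows it for what it is:

```python
import random
from math import sqrt

def pearson(xs, ys):
    # Pearson correlation coefficient, computed from scratch.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

rng = random.Random(7)
n, k = 200, 20
# Outcome and all k predictors are independent noise: nothing real to find.
outcome = [rng.gauss(0, 1) for _ in range(n)]
predictors = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(k)]

half = n // 2
# "Fish" on the training half: keep the strongest association.
train_r = [pearson(p[:half], outcome[:half]) for p in predictors]
best = max(range(k), key=lambda j: abs(train_r[j]))
# Then confirm the winner on the held-out half.
test_r = pearson(predictors[best][half:], outcome[half:])

print(f"Best training correlation (predictor {best}): {train_r[best]:.2f}")
print(f"Same predictor on held-out data: {test_r:.2f}")
```

The training-half winner is the maximum over twenty noisy estimates, so it is biased away from zero; the held-out estimate is unbiased. This is the sense in which replication, not prohibition, is the honest answer to fishing.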
Most doctors and medical researchers take it for granted that RCTs are the gold standard of evidence. However, the theories of a heliocentric solar system, evolution through natural selection, and global warming are all based largely or exclusively on observational rather than experimental data, and all three of these theories are much more widely accepted than any medical conclusion arrived at through RCTs. This discrepancy arises because medical research needs to generate fast results to guide clinicians on what to do now, whereas basic science is slow and involves decades of back and forth before solid proof is reached. In basic science, there is rarely a single definitive study that changes a paradigm; rather, there is a slow accumulation of many types of evidence. Medical research cannot take this time. While it is true that in some contexts RCTs are the preferred methodology, RCTs are very expensive and answer only very narrow questions. If the answer is context-dependent (as is often the case), results of RCTs can even be misleading. A combination of RCTs, observational studies, and studies based on first principles (e.g., biology) is necessary to arrive at a high degree of confidence.
Common errors in medical statistics:
There are very few cases in which a continuous variable (such as age or socioeconomic status) should be cut up into discrete classes, as doing so involves a substantial loss of information. An undergrad intern in this lab, Jean-Louis Barnwell-Ménard, showed that such practices can increase the false positive rates for studies from 5% to 100% under many conditions. Nonetheless, most medical studies categorize continuous variables for no apparent reason other than custom. BMI is one of the most problematic variables in this context.
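The inflation can be reproduced directly. In this hypothetical simulation, the exposure has no effect whatsoever on the outcome; the two merely share a continuous confounder. Adjusting for the confounder as measured keeps the false positive rate near the nominal 5%, while adjusting for a median split of it leaves residual confounding and pushes the rate toward 100%:

```python
import numpy as np

rng = np.random.default_rng(0)

def fp_rate(adjustment, sims=500, n=400):
    # Fraction of simulations where the NULL exposure effect tests "significant".
    false_positives = 0
    for _ in range(sims):
        c = rng.normal(size=n)             # true continuous confounder
        x = 0.7 * c + rng.normal(size=n)   # exposure, correlated with confounder
        y = c + rng.normal(size=n)         # outcome depends ONLY on the confounder
        if adjustment == "continuous":
            adj = c
        else:  # dichotomize the confounder at its median
            adj = (c > np.median(c)).astype(float)
        X = np.column_stack([np.ones(n), x, adj])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        sigma2 = resid @ resid / (n - 3)
        cov = sigma2 * np.linalg.inv(X.T @ X)
        z = beta[1] / np.sqrt(cov[1, 1])   # test the (null) exposure coefficient
        if abs(z) > 1.96:
            false_positives += 1
    return false_positives / sims

fp_cont = fp_rate("continuous")
fp_dich = fp_rate("dichotomized")
print(f"False positive rate, continuous adjustment:   {fp_cont:.2f}")
print(f"False positive rate, dichotomized adjustment: {fp_dich:.2f}")
```

The median split discards most of the confounder's variance, so the part it fails to capture still links exposure to outcome — and with a few hundred subjects that residual confounding is "detected" almost every time.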
Does chemical X increase cancer rates? A typical study might examine 8 common cancer types with small sample sizes and incorrectly conclude that, because no type of cancer was individually associated with chemical X, there is no relationship. However, the impact might depend on the type of cancer. There are plenty of statistical approaches available to combine cancer types into a single analysis without ignoring the particularities of each type. Such considerations are rarely present in medical research.
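A minimal version of the problem: in the hypothetical counts below, none of the 8 cancer types is individually significant in a two-proportion test, yet even a crude pooled analysis of total events across types clearly is. (A real analysis would prefer a stratified or meta-analytic model that preserves each type's particularities; summing events is only the simplest possible illustration.)

```python
from math import sqrt, erf

def two_prop_p(x1, n1, x2, n2):
    # Two-sided z-test for a difference in proportions (normal approximation).
    p_pool = (x1 + x2) / (n1 + n2)
    if p_pool in (0.0, 1.0):
        return 1.0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = abs(x1 / n1 - x2 / n2) / se
    return 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))

n_exposed = n_unexposed = 2000  # hypothetical cohort sizes
# Hypothetical event counts for 8 cancer types:
exposed   = [9, 7, 8, 10, 6, 9, 7, 8]
unexposed = [5, 4, 6, 5, 3, 6, 4, 5]

per_type_p = [two_prop_p(e, n_exposed, u, n_unexposed)
              for e, u in zip(exposed, unexposed)]
pooled_p = two_prop_p(sum(exposed), n_exposed, sum(unexposed), n_unexposed)

print("Per-type p-values:", [round(p, 2) for p in per_type_p])
print("Pooled p-value:   ", round(pooled_p, 3))
```

Eight separate "no association" conclusions would be drawn from data that, taken together, point fairly clearly at an effect of chemical X.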
It is common to see medical research articles that suppose that any p greater than 0.05 implies the absence of an effect, when in fact it merely indicates doubt about the presence of the effect.
This problem is often related to categorization: most medical research uses clinical variables without sufficient consideration of the potential underlying processes they are meant to capture.