The COVID-19 pandemic helped popularize a hair-raising phrase: “do your own research.” Portrayed as a call for self-empowerment, it became the telltale sign of someone who didn’t trust public health authorities, an invitation to go digging for “alternative facts.”
But “doing your own research” doesn’t have to stink of conspiracy ideation. Non-scientists have been throwing themselves into the scientific literature for a long time, whether to understand a new diagnosis or out of sheer curiosity, and the Internet has made this much easier.
Telling non-experts to stay away from papers can be paternalistic. After all, so much of science is funded by citizens. Everyone deserves access to its findings. But developing a nose for bad studies or scientific papers that are simply irrelevant to our lives takes time and expertise. There is a reason why researchers spend years in university honing their craft and deepening their understanding.
Still, there is an imperfect process that non-experts can use to gauge the trustworthiness and relevance of a study. I will focus here on biomedical papers, as this is what I know: studies that apply biology and biochemistry to health and medicine, aiming to understand disease and its causes and to test new ways of diagnosing and treating it.
A caveat: some scientific papers are open access, meaning that anyone with an Internet connection can read them for free. Others are paywalled on journal websites and require a one-time payment or a subscription, which university libraries tend to have. There are ways around that, which have not made science journals happy.
Let’s dig in.
Is it even a study?
The first step in gauging the relevance of a paper is to figure out what kind of paper it is. I am still surprised when a reader sends me an opinion piece or a letter to the editor, labelling it “a study,” as if it provides solid evidence for their position. Scientific journals publish more than just studies. A doctor’s opinion, even one published in The New England Journal of Medicine, can enrich a discussion, but it should not be mistaken for a clinical trial done on 5,000 people. So, tip #1: is it even a study?
Then comes the zoological question: what animal (or part thereof) was tested here? If the answer is “humans,” then the study is more likely to be relevant to us. Very often, however, experiments are conducted in animal models, like mice, hamsters, and worms, or even in cells grown in culture flasks, what are known as in vitro studies (meaning “in glass” or in laboratory glassware, although nowadays this is more likely to be plastic). There is a reason why a very loud and successful Twitter account screams “IN MICE” whenever it shares a media story about a new study that failed to label its animal origin. Transparency here is important.
I want to make it clear that a study in mice is not a “bad study”; the problem is that people usually search the scientific literature for answers that are relevant to their own lives. We are not large mice, nor are we simply piles of cells. In vitro studies and experiments done in laboratory animals are very useful first steps, but they are not sufficient to endorse a new treatment in humans, for example.
Next comes the question: was this an observational study or an experimental one? As the names suggest, in the former, scientists watch what naturally happens; in the latter, they intervene. In an observational study, for example, scientists might compare the age at death of people who eat a lot of fruits and vegetables with that of people who don’t. The problem here is that any difference reported may also be explained by other dissimilarities between the two groups. The people who eat more fruits and vegetables may be better off financially, for instance, and may have access to better healthcare or less stressful jobs. Squeezing strong conclusions out of a single observational study is rarely warranted.
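A toy simulation can make this confounding problem concrete. The sketch below is a hypothetical model, not real data. In it, wealth raises both the odds of eating lots of produce and the age at death, while produce itself is assumed to have no effect at all. The function name `simulate` and every number in it are made up for illustration.

```python
import random
import statistics

def simulate(n=10_000, randomized=False, seed=1):
    """Hypothetical toy model: wealth drives both produce intake and lifespan.
    Produce has NO effect on lifespan here. Returns the gap in mean age at
    death between produce eaters and everyone else."""
    rng = random.Random(seed)
    eaters, others = [], []
    for _ in range(n):
        wealthy = rng.random() < 0.5
        if randomized:
            # Experimental design: a coin flip decides who eats produce,
            # independent of wealth.
            eats_produce = rng.random() < 0.5
        else:
            # Observational design: wealthier people choose produce more often.
            eats_produce = rng.random() < (0.8 if wealthy else 0.2)
        # Age at death depends only on wealth (plus noise), never on produce.
        age_at_death = rng.gauss(82 if wealthy else 76, 5)
        (eaters if eats_produce else others).append(age_at_death)
    return statistics.mean(eaters) - statistics.mean(others)

print(f"observational gap: {simulate():.1f} years")
print(f"randomized gap:    {simulate(randomized=True):.1f} years")
```

In this made-up world, the observational comparison should show produce eaters living roughly three and a half years longer, even though produce does nothing by construction. Assigning produce by coin flip, as a randomized trial would, should shrink that gap toward zero.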
Clinical trials are examples of experimental studies, but even there, not all trials are as relevant as we might think. Before approval, clinical trials typically proceed in three phases. Phase I trials are concerned with safety. They give a small number of human participants increasing doses of the drug being tested, for example, and watch for side effects. A successful phase I trial does not automatically mean a drug is efficacious. For that, we need a phase II trial, which looks at efficacy, and a phase III trial, in which the new drug is compared to standard of care or placebo. Only when an intervention successfully passes through this gauntlet can the proper committees consider its approval.
This initial classification of a paper (expert opinion vs. actual study; in vitro work vs. animal experimentation vs. human study; observational vs. experimental evidence) can be checked against a pyramid of evidence to see whether it sits near the bottom or near the top. Pyramids of evidence (plural, as there are many models) are attempts at sorting weaker forms of scientific evidence, like an individual expert’s opinion, toward the bottom of a pyramid and the best forms, like a meta-analysis, toward the top. They are not perfect and have been criticized. Indeed, a strong clinical trial is actually superior to a bad meta-analysis of many poorly done studies, even though meta-analyses are often placed at the very apex of the pyramid. Also, sometimes a clinical trial would be unethical, so observational research is what we have to rely on. Judgement is required, but these pyramids should help a non-expert figure out what type of study tends to provide more reliable evidence.
Now that you know what kind of study it is, let’s scan the paper for red flags.
Turning the dial down
Start by looking for conflicts of interest. On the first page of a paper, you can typically see the affiliations of the authors. If they all work for a company that sells a supplement, and the paper supposedly proves the supplement is beneficial, it’s time to be a bit skeptical. There is also a section at the very end of the paper that lists the conflicts of interest the authors disclosed, such as receiving funding from or sitting on the board of a pharmaceutical company.
Nuance is important, though. I don’t think of any conflict of interest, or any single one of the other red flags described below, as a reason to toss a paper out. Conflicts of interest are bound to arise, especially as public universities encourage researchers to seek out private financing. Rather, I think of trust in a study as a dial. When I see a red flag, I turn the dial down a bit. If I see enough red flags, the dial is turned all the way down to zero. If I see signs of trust, I crank the dial higher up.
Conflicts of interest can also be harder to see. If a systematic review (a review of all the studies done to answer a specific question) or a meta-analysis (the pooling of results from multiple studies to arrive at our best estimate) is done by people who stand to benefit from a positive result, we should also have our fingers on the dial. A meta-analysis of a questionable intervention that flies in the face of science, carried out by that intervention’s own practitioners, should arouse suspicion.
Another important question: where was this study published? Some journals are predatory: they exist solely to make money and will publish almost anything. Identifying these can be challenging, but we should keep in mind that the mere existence of a journal is no proof of its legitimacy. Was the study published in a journal that specializes in a pseudoscience, like homeopathy or energy healing? These fake sciences have a long history of conducting loose studies that favour chance results and steering clear of more robust study designs. Turn that dial down.
Was the paper peer reviewed? In other words, did the journal’s editor send the manuscript to fellow scientists in the field for critical evaluation before publishing it? Peer review is not perfect, but it’s better than nothing. The pandemic saw an explosion of pre-prints, manuscripts posted online before peer review in an effort to share discoveries faster. With a pre-print, turn that dial down.
Is it a pilot study? Research costs money. Often, a researcher will trial an idea with a small pilot study and publish its results. The phrase “pilot study” should be in the title and abstract. Pilot studies tend to be done in too few people to be relevant. Turn that dial down.
Then we get into the meat of the paper: an introduction that sets up what we know and which questions are to be answered; a materials and methods section detailing what was done; the results; and a discussion, which weaves in interpretations. Is the sample size really small? It can be hard to gauge, but if the study was done on five people, you’ll want to turn that confidence dial down. If an intervention is tested (such as giving people a drug or supplement), was there a control group that did not receive it? If not, turn the dial way down. Were participants randomized to the intervention and control groups? If not, the dial goes down. Studies are often designed to answer a single question, but researchers tend to ask a constellation of secondary questions as well. Watch out for studies where that main question (often called the primary outcome or endpoint) is negative, but where one of many secondary questions turns up positive and is sold as a really important finding. That’s not what the study was originally designed to answer, and confirmation of this result will be needed.
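Why do five participants deserve so little trust? The sketch below assumes a hypothetical drug with zero true effect and made-up outcome scores (mean 50, standard deviation 10). It reruns the same trial thousands of times and counts how often chance alone produces an impressive-looking gap. The function name `apparent_effects` and all parameters are illustrative assumptions.

```python
import random
import statistics

def apparent_effects(n_per_group, trials=2_000, seed=7):
    """Run many hypothetical trials of a drug with ZERO true effect and
    return the observed treatment-minus-control difference in each one."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(trials):
        # Both arms draw from the same made-up distribution (mean 50, SD 10),
        # so any difference between them is pure chance.
        treated = [rng.gauss(50, 10) for _ in range(n_per_group)]
        control = [rng.gauss(50, 10) for _ in range(n_per_group)]
        diffs.append(statistics.mean(treated) - statistics.mean(control))
    return diffs

for n in (5, 500):
    diffs = apparent_effects(n)
    share_big = sum(abs(d) >= 5 for d in diffs) / len(diffs)
    print(f"n={n:>3} per group: {share_big:.0%} of null trials show a gap of 5+ points")
```

With five people per group, roughly four trials in ten should show a gap of five points or more between a useless drug and its control. With 500 per group, essentially none should.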
Remember that the more tests a scientist conducts, the more likely they are to get a false positive. There are ways to adjust for this (such as the Bonferroni correction), but they are not perfect. Another thing to keep in mind: are the results clinically significant? A new supplement to help people with insomnia might deliver statistically significant results, but are they clinically relevant if they only help people fall asleep on average ten minutes faster? Turn that dial down.
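The arithmetic behind piling-up false positives is short. Suppose each test uses the usual 0.05 significance threshold and nothing real is going on. Then the chance of at least one false positive across m independent tests is 1 - (1 - 0.05)^m. The Bonferroni correction simply divides the threshold by m. This sketch assumes the tests are independent, which real outcomes often are not, and the function names are my own.

```python
def family_wise_error(m, alpha=0.05):
    """Chance of at least one false positive across m independent tests
    when every null hypothesis is true and each test uses cutoff alpha."""
    return 1 - (1 - alpha) ** m

def bonferroni_threshold(m, alpha=0.05):
    """Per-test p-value cutoff that keeps the family-wise error at or below alpha."""
    return alpha / m

for m in (1, 5, 20):
    raw = family_wise_error(m)
    corrected = family_wise_error(m, bonferroni_threshold(m))
    print(f"{m:>2} tests: {raw:.1%} uncorrected, {corrected:.1%} with Bonferroni")
```

Twenty independent outcomes tested at 0.05 give roughly a 64% chance that at least one comes up “significant” by luck alone. Testing each at 0.05/20 = 0.0025 brings that back under 5%. The trade-off is that Bonferroni is conservative and can also hide real effects.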
Finally, the limitations of a study can usually be found toward the end of the discussion section. Researchers are encouraged to point out the weaknesses in their study: too few participants, a sample of convenience that may not reflect reality, a follow-up with the participants that should have been longer, etc. And as we finish reading that paper, it is important to remember that, despite what the authors wrote and how this paper was sold to you by whoever cited it on social media, science usually grows in tiny steps, not massive revolutions. One study is unlikely to shake an entire field’s foundation.
Know your limits
Assessing the worth and rigour of a scientific paper is hard. Even expert scientists have difficulty doing it in their own field, which is partly why retractions happen: peer reviewers failed to spot important reasons why the paper should never have been published in the first place. Researchers also make many choices when collecting and analyzing data, and they can abuse that flexibility until a result looks statistically significant, a practice called p-hacking. This makes assessing their work even more challenging.
As we read the scientific literature to “do our own research,” we also need to guard against motivated reasoning. We can be very kind to a bad study whose conclusion we find comforting. We have to be open-minded but, as Professor Walter Kotschnig said, not so open-minded that “your brains fall out.”
Critically appraising a study is an activity best done in groups, such as journal clubs in universities. This is why a site like PubPeer, which hosts comments on papers after they are published, is so useful, and I would invite you to search for a paper’s title on PubPeer to see what other scientists have said about it. Some scientists have also made it their mandate to call attention to bad studies on social media. This is not an exhaustive list, but I benefit from the expertise of data detectives such as Elisabeth Bik, Nick Brown, and Gideon Meyerowitz-Katz. It’s also worth keeping an eye on Retraction Watch, which unearths stories of scientific misconduct and incompetence.
Doing our own research requires keeping our fingers on the trust and relevance dials and tuning them up or down as needed. It also benefits from us admitting the limits of our expertise and lending an ear to what the experts have to say.
- Figuring out the trustworthiness and relevance of a scientific paper first requires identifying what kind of study it is (if it even is a study), which helps us know if the evidence is likely to be strong or weak
- There are red flags that should reduce our trust in the evidence presented in a paper, such as the absence of a control group, a very small number of research participants, and a spotlighting of a positive secondary result when the main outcome the study was designed to look at was negative
- Evaluating the worth of a paper can be helped by having many scientists look at it, which is why data detectives who spend their spare time denouncing bad papers are helpful, as well as websites like PubPeer and Retraction Watch