It’s often said that every family has its secret — Uncle Fred’s fondness for the horses, Cousin Bertha’s promiscuity, etc. — whatever it is that ‘we don’t talk about.’ If that’s true the scientific community is no exception. For us the unutterable is ‘reproducibility’ — meaning you’ve done an experiment, new in some way, but the key questions are: ‘Can you do it again with the same result?’ and, even more important: ‘Can someone else repeat it?’
Once upon a time in my lab we had a standing joke: whoever came bounding along shouting about a new result would be asked ‘How reproducible is it?’ Reply: ‘100%!’ Next question: ‘How often have you done the experiment?’ Reply: ‘Once!!’ Boom, boom!!!
Not a rib-tickler, but it did point to the knottiest problem in biological science, namely that when you start tinkering with living systems you’re never fully in control.
How big is the problem?
But, as dear old Bob once put it, The Times They Are a-Changin’. Our problem was highlighted in the cancer field by the Californian biotechnology company Amgen, which announced in 2012 that, over a 10-year period, they’d selected 53 ‘landmark’ cancer papers and failed to replicate 47 of them! Around the same time a study by Bayer HealthCare found that only about one in four published studies on potential drug targets was sufficiently strong to be worth following up.
More recently the leading science journal Nature found that almost three quarters of over 1,500 research scientists surveyed had tried to replicate someone else’s experiment and failed. It gets worse! More than half of them owned up to having failed to repeat one of their own experiments! Hooray! We have a result!! If you can’t repeat your own experiment either you’re sloppy (i.e., you haven’t done exactly what you did the first time) or you’ve highlighted the biological variability in the system you’re studying.
If you want an example of biological variation you need look no further than human conception and live births. Somewhere in excess of 50% of fertilized human eggs don’t make it to birth. In other words, if you do a ‘thought experiment’ in which a group of women carry some sort of gadget that flags when one of their eggs is fertilized, only between one in two and one in five of those ‘flagged’ will actually produce an offspring.
However you look at it, whether it’s biological variation, incompetence or plain fraud, we have a problem and Nature’s survey revealed that, to their credit, the majority of scientists agreed that there was a ‘significant crisis.’
Figure: results of the Nature survey, from Baker (2016).
Predictably, but disturbingly for us in the biomedical fields, the greatest confidence in published results was shown by the chemists and physicists whereas only 50% of data in medicine were thought to be reproducible. Oh dear!
Tackling the problem in cancer
The idea behind the Reproducibility Project was to take 50 cancer papers published in leading journals and to attempt to replicate their key findings in the most rigorous manner. The number was reduced from 50 to 29 papers due to financial constraints and other factors, but the aim remains to find out what affects the robustness of experimental results in preclinical cancer research.
It is a formidable project. Before even starting an experiment, the replication teams devised detailed plans based on the original reports and, as the result of many hours’ effort, came up with a strategy that both they and the original experimenters considered the best they could carry out. The protocols were then peer reviewed and the replication plans were published before the studies began.
Just to give an idea of the effort involved, a typical replication plan comprises many pages of detailed protocols describing reagents, cells and (where appropriate) animals to be used, statistical analysis and any other relevant items, as well as incorporating the input from referees.
The whole endeavor is, in short, a demonstration of scientific practice at its best.
How are we doing?
The critical numbers are that, of the 10 replications reported so far, 6 ‘substantially reproduced’ the original findings, although in 4 of these some results could not be replicated. In the other 4 replications the original findings were not reproduced.
The first thing to say is that a 60% rate of ‘substantial’ successful replication is a major improvement on the 11% to 25% obtained by the biotech companies. The most obvious explanation is that the massive, collaborative effort to tighten up the experimental procedures paid dividends.
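For anyone who wants to check the arithmetic behind those percentages, here is a minimal sketch in Python. The counts come straight from the figures quoted in this post; the function name `replication_rate` is my own invention for illustration, and the Bayer figure is only the approximate ‘one in four’ mentioned above, not an exact count.

```python
def replication_rate(replicated: int, attempted: int) -> float:
    """Return the percentage of attempted studies that were replicated."""
    if attempted <= 0:
        raise ValueError("attempted must be positive")
    return 100.0 * replicated / attempted


# Amgen: 53 'landmark' papers, 47 failed to replicate, so 6 succeeded.
amgen_rate = replication_rate(53 - 47, 53)

# Bayer: "about one in four" was worth following up (approximate, as quoted).
bayer_rate = replication_rate(1, 4)

# Reproducibility Project: 6 of 10 replications 'substantially reproduced'.
project_rate = replication_rate(6, 10)

print(f"Amgen:   {amgen_rate:.0f}%")    # 11%
print(f"Bayer:   {bayer_rate:.0f}%")    # 25%
print(f"Project: {project_rate:.0f}%")  # 60%
```

Bear in mind that 6 out of 10 is a small sample, so the 60% figure is a rough guide rather than a precise estimate.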
The second point to note is that even when a replication attempt fails it cannot be concluded that the original data were wrong. The discrepancy may merely have highlighted how fiendishly tricky biological experimentation can be. The problem is that with living systems, be they cells or animals, you never have complete control. Ask anyone who has a cat.
However, a more likely cause of discrepancies between experiments than biological variation is human variation, aka personal bias.
This may come as a surprise to some but, rather than being ‘black and white’, much of scientific interpretation is subjective. Try as I might, can I be sure that in, say, counting stained cells I don’t include some marginal ones because that fits my model? OK: the solution to that is to get someone else to do the count ‘blind’, but I suspect that quite often that’s not done. However, there are even trickier matters. I do half a dozen repeats of an experiment and one gives an odd result (i.e., differs from the other five). Only I can really go through everything involved (from length of coffee breaks to changes in reagent stocks) and decide if there are strong enough grounds to ignore it. I do my best to avoid personal bias but … scientists are only human (fact!).
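To make the idea of a ‘blind’ count concrete, here is one way it might be set up, sketched in Python. This is purely an illustration and not a protocol from any of the studies discussed: the function names (`blind_labels`, `unblind`), the sample names and the code format are all hypothetical. The point is simply that the person counting sees only opaque codes, while the key linking codes to real samples is held by someone else until the counting is finished.

```python
import random


def blind_labels(sample_ids, seed=None):
    """Give each sample an opaque code; return (codes, key).

    The key maps code -> real sample id and should be kept away from
    whoever does the counting until all counts are recorded.
    """
    rng = random.Random(seed)
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)  # randomise which sample gets which code
    codes = [f"S{i:03d}" for i in range(1, len(shuffled) + 1)]
    key = dict(zip(codes, shuffled))
    return codes, key


def unblind(counts_by_code, key):
    """Translate counts recorded against codes back to real sample ids."""
    return {key[code]: count for code, count in counts_by_code.items()}


# Hypothetical sample names, for illustration only.
samples = ["control_1", "control_2", "treated_1", "treated_2"]
codes, key = blind_labels(samples, seed=1)

# The counter records results against codes only (made-up numbers).
recorded = {code: 0 for code in codes}
results = unblind(recorded, key)
```

None of this removes the judgement calls about odd results, but it does take the counter’s expectations out of the counting itself.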
A closer look at failure
One of the failed replications is a particularly useful illustration for this blog. The replication study tackled a 2012 report that bacterial infection (specifically a bacterium, Fusobacterium nucleatum, that occurs naturally in the human oral cavity) is present in human colon cancers but not in non-cancerous colon tissues. It hit the rocks. They couldn’t detect F. nucleatum in most tumour samples and, when they did, the number of bugs was not significantly different to that in adjacent normal tissue.
Quite by chance, a few months ago, I described some more recent research into this topic in Hitchhiker or Driver?
I thought this was interesting because it showed that not only was F. nucleatum part of the microbiome of bowel cancer but that when tumour cells spread to distant sites (i.e., underwent metastasis) the bugs went along for the ride — raising the key question of whether they actually helped the critical event of metastasis.
So this latest study was consistent with the earlier result and extended it — indeed they actually showed that antibiotic treatment to kill the bugs slowed the growth of human tumour cells in mice.
Where does that leave us?
Well, helpfully, the Reproducibility Project also solicits comments from independent experts to help us make sense of what’s going on. Step forward Cynthia Sears of The Johns Hopkins Hospital. She takes the view that, although the Replication Study didn’t reproduce the original results, the fact that numerous studies have already found an association between F. nucleatum and human colon cancer means there probably is one — consistent with the work described in Hitchhiker or Driver?
One possible explanation for the discrepancy is that the original report studied colon tissue pairs (i.e., tumour and tumour-adjacent tissues) from colon cancer patients but did not report possibly relevant factors like age, sex and ethnicity of patients. In contrast, the replication effort included samples from patients with cancer (tumour and adjacent tissue) and non-diseased control tissue samples from age-, sex- and ethnicity-matched individuals.
So we now know, as Dr. Sears helpfully remarks, that the association between F. nucleatum bugs and human colon cancer is more complicated than it first appeared! Mmm. And, just in case you were in any doubt, she points out that we need to know more about the who (which Fusobacterium species: there are 12 of them known), the where (where in the colon, where in the world) and the how (the disease mechanisms).
Can we do better?
In the light of all that, the obvious question is: what can we do about the number of pre-clinical studies that are difficult if not impossible to reproduce? Answer, I think: not much. That may sound defeatist but it seems to me a realistic response. There’s no way we could put in place the rigorous scrutiny of the Reproducibility Project across even a fraction of cancer research projects. The best we can do is make researchers as aware as possible of the problems, encourage them towards the very best practices, and assume that, in the end, the solid results will emerge and the rest will fall by the wayside.
Looking at the sharp end, it’s worth noting that, if you accept that some of the variability in pre-clinical experiments is down to the biological variation we mentioned above, it would at least be consistent with the wide range of patient responses to some cancer treatments. The reason for that, as Cynthia Sears didn’t quite put it, is that we just don’t know enough about how the humans we’re tinkering with actually work.
Baker, M. (2016). Is There a Reproducibility Crisis? Nature 533, 452–454.
Jarvis, G.E. (2017). Early embryo mortality in natural human reproduction: What the data say [version 2; referees: 1 approved, 2 approved with reservations]. F1000Research 5, 2765. doi:10.12688/f1000research.8937.2.
Baker, M. and Dolgin, E. (2017). Cancer reproducibility project releases first results. Nature 541, 269–270. doi:10.1038/541269a.
Begley, C.G. and Ellis, L.M. (2012). Drug development: Raise standards for preclinical cancer research. Nature 483, 531–533.
Prinz, F., Schlange, T. and Asadullah, K. (2011). Believe it or not: how much can we rely on published data on potential drug targets? Nature Rev. Drug Discov. 10, 712.