It’s great that newspapers carry the number of science items they do but, as regular readers will know, there’s nothing like the typical cancer headline to get me squawking ‘No it isn’t!’ Step forward The Independent with the latest: “Major breakthrough in cancer care … groundbreaking international collaboration …”
Let’s be clear: the subject usually is interesting. In this case it certainly is and it deserves better headlines.
So what has happened?
A big flurry of research papers has just emerged from a joint project of the National Cancer Institute and the National Human Genome Research Institute to make something called The Cancer Genome Atlas (TCGA). This massive initiative is, of course, an offspring of the Human Genome Project, the first full sequencing of the 3,000 million base-pairs of human DNA, completed in 2003. The intervening 15 years have seen a technical revolution, perhaps unparalleled in the history of science, such that now genomes can be sequenced in an hour or two for a few hundred dollars. TCGA began in 2006 with the aim of providing a genetic database for three cancer types: lung, ovarian, and glioblastoma. Such was its success that it soon expanded to a vast, comprehensive dataset of more than 11,000 cases across 33 tumour types, describing the variety of molecular changes that drive the cancers. The upshot is now being called the Pan-Cancer Atlas, or PanCan Atlas for short.
What do we need to know?
Fortunately not much of the humungous amounts of detail but the scheme below gives an inkling of the scale of this wonderful endeavour — it’s from a short, very readable summary by Carolyn Hutter and Jean Claude Zenklusen.
The first point is obvious: sequencing 11,000 paired tumour and normal tissue samples produced mind-boggling masses of data. 2.5 petabytes, in fact. If you have to think twice about your gigas and teras, 1 PB = 1,000,000,000,000,000 B, i.e. 10¹⁵ B or 1000 terabytes. A PB is sometimes called, apparently, a quadrillion bytes and, as the scheme helpfully notes, you’d need over 200,000 DVDs to store it.
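For anyone who wants to check the sums, here is a minimal sketch of the arithmetic. It assumes decimal (SI) units and a standard single-layer DVD holding 4.7 GB; that capacity is my assumption, not a figure from the paper, and the function name is purely illustrative.

```python
# Back-of-envelope check of the storage figures, using decimal (SI) units.
PETABYTE = 10**15             # 1 PB = 10^15 bytes = 1000 terabytes
TERABYTE = 10**12
DVD_CAPACITY = 4_700_000_000  # assumed: single-layer DVD, 4.7 GB


def dvds_needed(n_bytes, capacity=DVD_CAPACITY):
    """Whole number of discs needed to hold n_bytes (rounded up)."""
    return -(-n_bytes // capacity)  # ceiling division on integers


tcga_bytes = 25 * PETABYTE // 10  # 2.5 PB, kept as an exact integer

print("2.5 PB in terabytes:", tcga_bytes // TERABYTE)
print("DVDs for 1 PB:", dvds_needed(PETABYTE))
print("DVDs for 2.5 PB:", dvds_needed(tcga_bytes))
```

On those assumptions a single petabyte fills roughly 213,000 discs and the full 2.5 PB rather more than half a million, so "over 200,000" holds either way.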
The figure of seven data types refers to the variety of information accumulated in these studies: mutations that affect genes, epigenetic changes (DNA methylation), RNA and protein expression, duplication or deletion of stretches of DNA (copy number variation), and so on.
After which it’s worth pausing for a moment to contemplate the effort and organization involved in collecting 11,000 paired samples, sequencing them and analyzing the output. It’s true that sequencing itself is now fairly routine, but that’s still an awful lot of experiments. But think for even longer about what’s gone into making some kind of sense of the monstrous amount of data generated.
And it’s important because?
The findings confirm a trend that has begun to emerge over the last few years, namely that the classification of cancers is being redefined. Traditionally they have been grouped on the basis of the tissue of origin (breast, bowel, etc.) but this will gradually be replaced by genetic grouping, reflecting the fact that seemingly unrelated cancers can be driven by common pathways.
The most encouraging thing to come out of the analysis of the genetic changes driving these tumours is that, for about half of the tumours, potential treatments are already available. That’s quite a surprise, but it doesn’t mean that hitting those targets will actually work as anti-cancer strategies. Nevertheless, it’s a cheering point that the output of this phenomenal project may, as one of the papers noted, serve as a launching pad for real benefit in the not too distant future.
What should science journalists do to stop upsetting me?
Read the papers they comment on rather than simply relying on press releases, never use the words ‘breakthrough’ or ‘groundbreaking’ and grasp the point that science proceeds in very small steps, not always forward, governed by available methods. This work is quite staggering for it is on a scale that is close to unimaginable and, in the end, it will lead to treatments that will affect the lives of almost everyone — but it is just another example of science doing what science does.
Hutter, C. and Zenklusen, J.C. (2018). The Cancer Genome Atlas: Creating Lasting Value beyond Its Data. Cell 173, 283–285.
Hoadley, K.A. et al. (2018). Cell-of-Origin Patterns Dominate the Molecular Classification of 10,000 Tumors from 33 Types of Cancer. Cell 173, 291–304.
Hoadley, K.A. et al. (2014). Multiplatform Analysis of 12 Cancer Types Reveals Molecular Classification within and across Tissues of Origin. Cell 158, 929–944.