Junk Store Opened: Millions of Bargains

Many moons ago, when I was nobbut a lad and sequencing the human genome was 30 years away, we nevertheless knew that there was something very odd about our genetic code. We knew there were three thousand million base pairs but that only a tiny fraction of them (a few percent) was necessary to encode all the proteins found in our bodies. What was the rest doing? As a sort of explanation two terms came into vogue: ‘selfish DNA’ (meaning stuff that just reproduced itself because it was there) and ‘junk DNA’ (meaning everything that didn’t code for proteins).

One of the few predictions I’ve made that turned out to be right was embodied in a refusal to use either term – and if there’s anyone who can recall anything of my supervisions (that is, what the rest of the world calls tutorials) they might back me up on this. It’s true that, as time went by, we increasingly appreciated that non-coding DNA is important in controlling whether individual genes are switched on or off – that is, whether they make RNA and from that protein, according to sequences embedded in the DNA, or whether they make nothing.

Ewan's scheme

However, getting a real grip on what all that seemingly spare DNA is doing has turned out to be so challenging that it is only now, 10 years after the first human sequence was produced, that we have hard data to go on. That unveiling has come from a follow-up called the ENCODE (Encyclopedia Of DNA Elements) programme – an international cooperative of extraordinary scale, with its heart at The Sanger Centre just outside Cambridge and with its head one Ewan Birney. Birney is a computational biologist – a new breed of scientist whose strength lies in bringing to bear methods that make sense of the vast amounts of data generated by current DNA sequencing techniques.

A glance at the summary of what ENCODE involved suggests that, in the unlikely event of his getting bored with science, Birney would make a pretty good fist of being Secretary-General of the United Nations. I’d like to try and persuade you that scientists are wonderful and lofty forms of our species but, alas, in fact they are generally ambitious, driven, self-centred, ruthless and intolerant. To make matters worse, quite a few are very smart. To get nearly 500 of the world’s best to sink self-interest and focus on one aim, in a multi-national, multi-lingual, multi-racial collaboration that requires rigorous assessment of data and in which the scope for individual glory is almost negligible, might well qualify as the greatest feat of man-management in the history of the human race.

So Birney’s a star but what did the world get for its money? The short answer is that we now know that, far from being ‘junk’, most of our DNA – over 80% – does something useful. Whilst only 1.6% carries protein-coding genes, much of the rest is important in regulating the activity of proteins generated from coding genes. The regulatory activity comes in the form of RNA: as we noted just now, DNA makes RNA makes protein – and the DNA sequences involved are called genes. But there’s a second class of genes, ones whose DNA sequence is transcribed into RNA – but then things stop. The RNA doesn’t go on to direct the making of proteins but rather goes off and regulates, well, almost everything. So this second group are non-coding genes – because they don’t ‘make’ proteins.
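
For readers who like to see an idea spelled out, here is a toy sketch in Python of the difference between the two classes of gene. It is an illustration under loud assumptions, not a model of real biology: the function names (`transcribe`, `translate`, `express`) and the example sequence are invented for this post, the codon table holds just four of the 64 real codons, and the `coding` flag stands in for everything the cell actually does to decide whether an RNA gets translated.

```python
# Toy illustration only: real transcription and translation are far more involved.
# A tiny slice of the genetic code (4 of the 64 codons), enough for the example.
CODON_TABLE = {
    "AUG": "M",   # methionine, the usual start signal
    "UUU": "F",   # phenylalanine
    "GGC": "G",   # glycine
    "UAA": None,  # a stop codon: translation ends here
}


def transcribe(dna: str) -> str:
    """DNA makes RNA: copy the coding-strand sequence, swapping T for U."""
    return dna.upper().replace("T", "U")


def translate(rna: str) -> str:
    """RNA makes protein: read three bases at a time until a stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid is None:
            break
        protein.append(amino_acid)
    return "".join(protein)


def express(dna: str, coding: bool):
    """Return (rna, protein). For a non-coding gene things stop at the RNA."""
    rna = transcribe(dna)
    protein = translate(rna) if coding else None
    return rna, protein


# Hypothetical 12-base 'gene', used both ways for comparison.
print(express("ATGTTTGGCTAA", coding=True))
print(express("ATGTTTGGCTAA", coding=False))
```

Both calls produce the same RNA; only the first goes on to a (very short) protein. In the second the RNA is the end product, which is the sense in which a non-coding gene doesn't 'make' a protein.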

How does the RNA of non-coding genes work? Well, in essence by sticking to other RNAs and to proteins themselves. What ENCODE has revealed is a panoply of types of RNA that comes in a wide range of sizes and has a finger in almost every bit of the cellular pie. So these varied RNAs act as cellular controllers at many levels and because cancers result from the subversion of normal control you would correctly guess that mutations in non-coding genes can be every bit as important as those that affect protein function directly.

Does this help in dealing with cancer and are there any bargains in the junk store? The short-term answers are ‘no’ and ‘lots – in theory’. As units of this army of RNAs help to control how we work normally, they also can go wrong – become mutated – so we have a new set of potential players in the cancer game. Detecting when individual RNAs join in won’t be so difficult: the real cancer challenge now is not target-spotting, it’s making the bullets to hit the targets.


Maher, B. (2012). ENCODE: The human encyclopaedia. Nature 489, 46-48.

Birney, E. (2012). The making of ENCODE: Lessons for big-data projects. Nature 489, 49-51.