It’s widely known that cancers reflect cellular control going awry as a result of change in our genetic material — DNA. Beyond surgery and radiotherapy, cancer treatment uses drugs that either kill cells non-specifically or target mutated proteins. The latter give specificity for tumour cells but currently there are few such drugs. For mutations that inactivate tumour suppressor genes we have as yet no treatment, although one hope is that we will be able to replace these damaged genes with normal versions.
But there’s a problem with drug or gene replacement tactics for any genetic disease because, fundamentally, we don’t understand what we’re doing. Ideally we’d introduce the offending mutation into humans, look at its effect then follow up with our therapy and track what happens. We can’t do that, of course, and, although we can do equivalent experiments in model organisms like the fruit fly or the mouse, models are not the same as humans.
A quite different approach is to note that under the cloak of evolution Nature has been doing these experiments for us. That’s to say, natural human genetic variation has given rise to a vast array of mutations across the population and all we need to do is find them and see what effect they have had on the biology of the individual. The “all” in the previous sentence is a weighty word because to sift out these variants requires DNA sequencing on a grand scale. Fortunately, as followers of this blog will know, such power in the shape of massively parallel sequencing is now available (see Family Tree of Breast Cancer).
The Genome Aggregation Database (gnomAD) has just published (May 2020) its latest efforts in the shape of DNA sequences of 125,748 exomes (protein-coding DNA) and 15,708 whole human genomes. It’s a simply staggering achievement, the aim being to find out what the differences between our individual genetic codes mean in terms of our health. These variants are the differences that make individual genomes unique and they include single nucleotide polymorphisms (‘SNPs’, pronounced ‘snips’ — one nucleotide (base) differing from the reference DNA sequence), insertions (additional nucleotides inserted in a DNA sequence), deletions (missing nucleotides), substitutions (multiple nucleotides altered relative to the reference sequence) and structural variants (large sections of a chromosome or entire chromosomes duplicated, deleted or rearranged).
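For readers who like to see definitions made concrete, the small-scale variant types listed above can be told apart just by comparing the reference stretch of DNA with the altered one. What follows is a minimal sketch, not the gnomAD pipeline: the function name `classify_variant` and the simple length-based rules are assumptions made for illustration, and structural variants are left out because spotting them needs far more than a comparison of two short strings.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Label a small variant by comparing reference and altered bases.

    Illustrative rules only (an assumption, not the gnomAD method):
    the labels follow the plain-language definitions in the text.
    """
    if ref == alt:
        return "no change"
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"            # one base differs from the reference
    if len(ref) == len(alt):
        return "substitution"   # several bases altered, same length
    if len(alt) > len(ref):
        return "insertion"      # extra bases added
    return "deletion"           # bases missing


# Toy examples, not real variants
for ref, alt in [("A", "G"), ("AC", "GT"), ("A", "ATG"), ("ATG", "A")]:
    print(ref, "->", alt, ":", classify_variant(ref, alt))
```

Real variant callers work from aligned sequencing reads and quality scores, so this only shows the bookkeeping behind the names.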
Cataloguing genetic variation in humans. The genome aggregation database (gnomAD) includes 15,708 whole-genome sequences and 125,748 exomes and the study catalogued the complete range of naturally occurring DNA variants.
Representation of 141,456 human DNA sequences. This way of presenting a vast amount of data is called UMAP (Uniform Manifold Approximation and Projection): the sequence of each individual is a dot, the individuals comprising six global and eight sub-continental ancestries. The pseudo colours mark clusters of related DNA sequence. Note that this ‘map’ does not relate to location: it is merely a visual representation of a lot of data. The horizontal bar indicates the number of individuals by population and sub-population in the gnomAD study with the same colour code as in the upper figure. From Karczewski et al. 2020.
It turns out that there are rather a lot of variants. After filtering to minimise the errors that come with high-throughput sequencing, nearly 15 million high-quality variants were identified in the exome dataset and 230 million in the whole genome screens. In the protein-coding sequences alone there were over 400,000 variants predicted to block the function of the protein.
Where is this massive study taking us?
These naturally arising mutations provide a potentially valuable window on our genomes that we can look through to answer our title question: which of our genes are essential for survival and which can we manage without?
What gnomAD did was to construct a ‘spectrum of tolerance’ to inactivation for each protein-coding gene in the human genome: in effect, a ranking of genes by how well we cope when they are knocked out. This is potentially important because, for example, if a gene that is not essential for life acquires a disease-causing mutation, blocking the gene might cure the disease without killing the patient.
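One simple way to picture such a spectrum is to compare, gene by gene, how many inactivating variants are actually seen in the population with how many would be expected by chance. A gene with far fewer than expected is probably one we cannot do without. The sketch below assumes that idea and nothing more: the gene names and counts are invented toy values, `tolerance_score` is a hypothetical helper, and the study's real statistics are more involved than a plain ratio.

```python
def tolerance_score(observed: int, expected: float) -> float:
    """Ratio of observed to expected inactivating variants.

    Near 0: the gene rarely survives being knocked out (intolerant).
    Near 1: knock-outs turn up about as often as chance predicts (tolerant).
    """
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return observed / expected


# Invented toy numbers: gene -> (observed, expected). Not real data.
toy_genes = {
    "GENE_A": (2, 40.0),
    "GENE_B": (35, 38.0),
    "GENE_C": (12, 30.0),
}

# Order genes from least tolerant to most tolerant
spectrum = sorted(toy_genes, key=lambda g: tolerance_score(*toy_genes[g]))

for gene in spectrum:
    print(gene, round(tolerance_score(*toy_genes[gene]), 2))
```

In this toy ranking GENE_A sits at the intolerant end and GENE_B at the tolerant end, which is the kind of ordering that would flag GENE_B as a safer target for a drug that blocks it.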
The clearest example of using natural in vivo models of human gene inactivation to inform therapeutic strategy has come from the LRRK2 gene. Variants in LRRK2 can change the activity of the protein it encodes so as to significantly increase the risk of Parkinson’s disease. From the gnomAD screen it turned out that variants in LRRK2 that blocked its normal activity were not strongly associated with evident disease. In other words, we can do without LRRK2, and if it picks up a harmful mutation we can try to knock it out, secure in the knowledge that it’s not essential for survival.
So thank you Nature for doing the experiment we can’t do — tinkering with our own genes to see what happens.