Category Archives: Articles by Exploreable (Ankur Chakravarthy)

Articles written by Ankur Chakravarthy

A very simple introduction to machine learning…

Machine learning, quite simply, refers to the use of algorithms to make predictions and solve problems involving the properties of things and how those properties relate to an outcome.

The nature of problems

Properties themselves can take a variety of forms. Sometimes they are categories, for instance whether a person has a disease or holds a higher education qualification; sometimes they are continuous, such as height or weight. The same is true of outcomes: you could ask whether someone is rich or poor based on their qualifications, or you could predict a continuous outcome such as overall income.

The outcomes being predicted are called response variables. The measurements taken on samples, or the groupings samples are put into, are called features. Machine learning is in essence the practice of building models that predict the response using rules fitted to the features.

Problems where the response variable consists of groups are called classification problems, because things are being classified. Where the outcome is continuous, it is a regression problem. The kinds of classification or regression rules available depend on the nature of the data. Some methods, like linear regression, can handle both categorical and continuous features, whereas others require features to be converted to continuous variables first. There is simply a bewildering array of models that can be applied.
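To make the distinction concrete, here is a minimal sketch in plain Python using made-up toy numbers rather than real data. It fits a one-feature least-squares line for a continuous response (regression) and a nearest-centroid rule for a grouped response (classification). The function names are hypothetical and chosen for illustration; real work would normally use an established library.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares with one feature: returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def fit_centroids(xs, labels):
    """Nearest-centroid rule: remember the mean feature value of each class."""
    return {c: mean(x for x, lab in zip(xs, labels) if lab == c) for c in set(labels)}


def predict_class(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda c: abs(x - centroids[c]))


# Regression: the response is continuous (toy numbers).
slope, intercept = fit_line([1, 2, 3, 4], [2.0, 4.0, 6.0, 8.0])
print(slope, intercept)  # 2.0 0.0

# Classification: the response is a group label (toy numbers).
centroids = fit_centroids([1.0, 1.2, 3.8, 4.1], ["low", "low", "high", "high"])
print(predict_class(centroids, 1.5))  # low
```

In both cases the model is a rule learned from the features. The only difference is whether the response it predicts is a number or a group.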

The underlying assumption

The effectiveness of machine learning is predicated on there being non-random patterns in the data with respect to the features being measured. If there is no link between the features you've chosen to build a model on and the problem you are trying to address, you will see lousy performance.

Measuring performance and overfitting

For classification problems, performance is often measured in terms of accuracy; if there are only two classes, measures like sensitivity and specificity are also commonly used. For regression problems, performance is often measured by how much of the variation in the outcome the model explains, and by how large the error is between the predicted and the actual outcome.
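The measures above can be computed directly from lists of actual and predicted outcomes. The sketch below assumes a two-class problem where the positive class is labelled 1. For regression it uses R-squared (the share of variation explained) and root-mean-square error (RMSE), which is one common choice among several. The function names and toy inputs are hypothetical.

```python
import math


def classification_metrics(actual, predicted, positive=1):
    """Accuracy, sensitivity and specificity for a two-class problem."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of real positives that were detected
        "specificity": tn / (tn + fp),  # share of real negatives correctly called negative
    }


def regression_metrics(actual, predicted):
    """R-squared (variation explained) and root-mean-square error."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_y) ** 2 for a in actual)
    return {
        "r_squared": 1 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / len(actual)),
    }


cls = classification_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 0, 0, 1, 0, 0])
reg = regression_metrics([3.0, 5.0, 7.0], [2.0, 5.0, 8.0])
print(cls)
print(reg)
```

In the classification example, two of the three real positives are detected (sensitivity 2/3). Four of the five real negatives are correctly rejected (specificity 0.8), for an overall accuracy of 0.75. This shows why accuracy alone can hide how a model treats each class.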

It is important that performance be measured in a way that minimises the effect of overfitting. Overfitting refers to a model fitting your data too well, including its inconsistencies, so that it learns quirks present within the dataset it was trained on but not in data outside it. This can give you a spurious measure of how good your model is and make it look better than it actually is.
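Both overfitting and the underlying assumption described earlier can be seen in a small simulation. This is a minimal sketch that assumes a 1-nearest-neighbour classifier, chosen purely for illustration. It is trained on a feature that is pure random noise, with coin-flip labels. The model reproduces its training labels perfectly, yet performs at roughly chance level on fresh data, because there was never a real pattern to learn.

```python
import random


def nearest_neighbour_predict(train_x, train_y, x):
    """1-nearest-neighbour: copy the label of the closest training point."""
    best = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[best]


def accuracy(train_x, train_y, xs, ys):
    """Fraction of samples whose predicted label matches the true label."""
    hits = sum(nearest_neighbour_predict(train_x, train_y, x) == y for x, y in zip(xs, ys))
    return hits / len(xs)


rng = random.Random(0)  # fixed seed so the run is reproducible
# The feature is pure noise and the labels are coin flips: there is no real pattern.
train_x = [rng.random() for _ in range(200)]
train_y = [rng.randint(0, 1) for _ in range(200)]
test_x = [rng.random() for _ in range(200)]
test_y = [rng.randint(0, 1) for _ in range(200)]

train_acc = accuracy(train_x, train_y, train_x, train_y)
test_acc = accuracy(train_x, train_y, test_x, test_y)
print(f"training accuracy: {train_acc:.2f}")  # 1.00, each point is its own nearest neighbour
print(f"test accuracy: {test_acc:.2f}")  # expected to land near 0.50, i.e. chance
```

Judged on the training data alone, this model would look flawless, which is exactly the spurious measure of performance that held-out data guards against.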

To account for this, it is ideal to train on one portion of the data and test on another portion that the training process never saw. You can do this either by leaving out a portion at the outset or by using an altogether independent dataset. Alternatively, you can use cross-validation: a certain percentage of samples is held out, the model is trained on the remaining data, and the held-out samples are used as a test set. This is repeated over and over to get more accurate estimates of performance.
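The k-fold version of this scheme can be sketched as follows. This is an illustrative implementation under simple assumptions: a toy threshold model on one feature, accuracy as the score, and k = 5. The helper names are hypothetical, and established machine learning libraries provide tested versions of the same idea.

```python
import random
from statistics import mean


def k_fold_indices(n, k, seed=0):
    """Shuffle the sample indices and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(xs, ys, fit, score, k=5, seed=0):
    """Hold out each fold in turn, train on the rest, score on the held-out fold."""
    scores = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(score(model, [xs[i] for i in held_out], [ys[i] for i in held_out]))
    return mean(scores), scores


def fit_threshold(xs, ys):
    """Toy model: a cut-off halfway between the two class means."""
    m0 = mean(x for x, y in zip(xs, ys) if y == 0)
    m1 = mean(x for x, y in zip(xs, ys) if y == 1)
    return (m0 + m1) / 2


def threshold_accuracy(threshold, xs, ys):
    """Fraction of samples where (x > threshold) matches the 0/1 label."""
    return sum(int(x > threshold) == y for x, y in zip(xs, ys)) / len(xs)


xs = list(range(10)) + list(range(100, 110))  # two well-separated groups
ys = [0] * 10 + [1] * 10
mean_acc, per_fold = cross_validate(xs, ys, fit_threshold, threshold_accuracy, k=5)
print(mean_acc)  # 1.0, since every held-out sample lies on the correct side
```

Every sample is used for testing exactly once, and never by a model that saw it during training. Averaging over the folds gives a steadier estimate than a single split would.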

Where can I learn more about machine learning?

There is a very nice introductory LinkedIn tech talk here: http://www.youtube.com/watch?v=wjTJVhmu1JM

And here is a full set of lectures that are a treat


I might do a set of posts looking at very particular applications in the next few months; until then, feel free to knock yourself out.

How HPV driven cancers get their mutations…

Hi there!

It’s been a long time since I last blogged, but that is because I’ve been swimming around in data. That work incidentally led to the findings published in this paper, which I will describe in this post.

HPV and the link to cancer.

HPV (human papillomaviruses) are a family of viruses that infect keratinocytes (skin cells) lining the outside of the body and its inner cavities. Some of them just cause warts (including genital warts), but some are capable of driving the formation of cancer. These types, called “high-risk” strains, are the ones targeted for prevention by HPV vaccines.

High-risk HPV strains differ from low-risk strains in their cancer-causing ability because of proteins they make during their life cycle. Cells need to be actively dividing for HPV to replicate. To bring this about, the virus uses two proteins, called E6 and E7, to block and degrade two proteins in human cells, called TP53 and pRb, which are potent tumour suppressors (genes that prevent tumour formation).

Normally, E6 and E7 are only active for a brief while during the virus’ life cycle, which culminates in the production of more viruses that restart the cycle all over again. Before HPV-driven cancers form, however, something very strange happens: by complete accident, the viral genome gets integrated into the human DNA of infected cells, or infected cells get locked into a state where E6 and E7 are produced all the time. Suddenly you’ve got cells with TP53 and pRb permanently switched off, and these cells can grow abnormally. We see this when cervical scrapings from women are examined and “dysplastic” cells that have grown clumpy and abnormal turn up.

However, these dysplastic cells are not cancerous and haven’t acquired all the hallmarks of cancer. For that to happen, the genes in dysplastic cells need additional changes to their DNA sequence (mutations) that confer those properties. Tobacco smoke is a well-known cause of mutations, but for quite a while it had been an open question where HPV-driven tumours got their mutations from.

Suspicions are aroused: could the APOBEC family of proteins be making these mutations? 

One of my major research interests is finding out which genes are expressed more, and which are turned off, in HPV-driven cancers. When defining a signature for these tumours, I compared them to normal tissue and to HPV-negative tumours arising in the same tissue. Cervical cancers are nearly all HPV-driven, whereas head and neck cancers can be caused either by HPV or by chronic tobacco and alcohol exposure. One of the genes I found expressed at high levels in HPV-positive tumours was APOBEC3B.

APOBEC3B is one of many proteins in the APOBEC family of cytosine deaminases. These act on RNA, or on DNA when it is in a single-stranded state, and take part in the body’s immune response against viruses by messing up viral RNA/DNA. They work by changing cytosine, one of the four bases that make up DNA, into uracil (a base normally found only in RNA). That site then ends up as a thymine or a guanine, two other DNA bases. If you get lots of these changes in viral DNA, you fundamentally break the virus so it can’t do any of the things it usually does. It had been known for a while that HPV with messed-up DNA, bearing patterns of change associated with APOBEC proteins, could be found in precancerous lesions.

This led us to wonder whether APOBEC proteins could accidentally change human DNA just as they change viral DNA, and thereby generate the DNA sequence changes needed to cause cancer. Around the time we started wondering this, a couple of papers came out showing human cancers in which the mutations looked as though they were being generated by APOBEC enzymes, very likely APOBEC3B. We could tell it was likely APOBEC3B because it is known to change cytosines that are preceded by a thymine and followed by a guanine, adenine or thymine. So the sequences TCA, TCG and TCT would be converted to TGA/TTA, TGG/TTG and TGT/TTT respectively. There is an alternative process that can also generate TCG->TGG/TTG mutations, so to measure APOBEC activity specifically we used only the others. In the paper we referred to these as TCW->TKW mutations, where K = G or T and W = A or T.
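The motif rule described above can be written as a small function. This is a minimal sketch, not the pipeline used in the paper, and the function names are hypothetical. It assumes each single-base substitution is supplied as its three-base reference context (the mutated base plus one neighbour on each side) together with the alternative base. Mutations centred on G or A are flipped to the opposite strand, since a TCW motif on one strand reads WGA on the other.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def is_tcw_to_tkw(context, alt):
    """True if a substitution is TCW->TKW (W = A or T, K = G or T).

    context: reference bases at positions -1, 0, +1 around the mutation, e.g. "TCA".
    alt: the base found at position 0 in the tumour.
    Mutations centred on G or A are re-expressed on the opposite strand, so a
    TCW motif on the minus strand (WGA on the plus strand) is also caught.
    """
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or len(alt) != 1:
        raise ValueError("expected a 3-base context and a single alternative base")
    if context[1] in "GA":
        context, alt = revcomp(context), revcomp(alt)
    return context[:2] == "TC" and context[2] in "AT" and alt in "GT"


def tcw_fraction(mutations):
    """Fraction of (context, alt) substitutions that match the TCW->TKW pattern."""
    if not mutations:
        return 0.0
    return sum(is_tcw_to_tkw(c, a) for c, a in mutations) / len(mutations)


# Hypothetical substitutions from one tumour.
muts = [("TCA", "T"), ("TCT", "G"), ("ACG", "T"), ("TGA", "A")]
print(tcw_fraction(muts))  # 0.75
```

TCG contexts are rejected because the base after C must be A or T. This mirrors the exclusion described above, which leaves out the sequences the alternative process can also produce.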

Those previous papers also noted that cervical cancers had lots of mutations showing the APOBEC signature, but the question remained: was this down to the tissue being the cervix, or to these tumours being HPV+? We decided to look at head and neck cancers as well. There we could compare HPV+ and HPV- tumours arising in similar tissues to see whether there was truly an association with HPV, and hence we did the work reported in the paper…

HPV positive tumours have a vastly higher fraction of mutations belonging to the APOBEC signature.

First, we looked at levels of APOBEC mutagenesis, and at how much of all the mutations in these tumours could be attributed to it. We used publicly available data for 40 HPV+ and 253 HPV- head and neck tumours, and took multiple approaches. We looked at TCW->TKW mutations. We broke down all the mutations in these tumours into patterns of mutations, as was done by researchers at the Sanger Institute. We also looked for local enrichment of the TCW->TKW mutation pattern. All the approaches showed the same thing: HPV+ tumours had a vastly higher proportion of mutations most likely caused by APOBEC enzymes.

Figure 1: APOBEC mutations are highly enriched in HPV+ HNSCs

Multiple measures of APOBEC activity showed a strong association with HPV status but not with age or smoking. APOBEC, age and smoking were the three processes we identified as driving the signatures using the Sanger Institute’s approach. The further the values are shifted to the right, the stronger the association with the factor listed on the left.

We found signatures previously associated with APOBEC, smoking and age, and showed that APOBEC activity was not associated with the latter two, as expected. Having identified an association with HPV-driven tumours, we wanted to know whether this was a general antiviral response or something HPV-specific. So we looked at patterns of mutations in liver cancers caused by hepatitis B and C viruses, and found no evidence that APOBEC-mediated mutations were significantly enriched in these tumours.

Of drivers and passengers

Most tumours have hundreds to thousands of mutations, but only a few actively contribute to acquiring and maintaining the hallmarks of cancer. We had found high proportions of APOBEC-mediated mutations in HPV-driven cancers across the exome (all protein-coding genes). We then asked whether this enrichment was maintained when we restricted our search to genes previously known to drive cancer, or to genes sharing features associated with drivers, such as being mutated more often than expected by chance. Our analyses confirmed that APOBEC-mediated mutations were again enriched in HPV+ head and neck and cervical cancers compared with HPV- HNSCs.

Figure 2

Differences between HPV-negative HNSCC and HPV+ tumours (HNSCC and cervical cancer) are maintained both across all protein-coding genes (whole exome) and among likely driver mutations (MutSig).

Then we looked at which driver genes were most often mutated by APOBEC proteins, and found a gene called PIK3CA (a component of the protein complex PI3 kinase) towards the very top of the list. PIK3CA has previously been reported as vital to sustaining many HPV-positive tumours in particular, and head and neck cancers in general, and drugs targeting it are being developed. Interestingly, in the HPV+ tumours 22 of the 25 PIK3CA mutations recorded were of the APOBEC type, whereas this wasn’t the case for the HPV-negative tumours.

This led to yet another question: can the level of APOBEC activity explain the preference for APOBEC-type mutations in HPV-positive tumours? For driver genes, two things may govern which mutations we see. One is how much of a growth advantage a mutation in the driver gene gives the cell. The other is the mutational process that creates the mutation in the first place. My supervisor, Tim Fenton, had previously worked on PI3 kinases and knew that PIK3CA mutations regularly occur in one of two regions. He realised that one of these regions contained TCW sequences that APOBEC proteins could act on, while the other did not.

The PIK3CA gene makes a protein called p110-alpha, and proteins have distinct structural elements called domains. One region, the helical domain, is often mutated at two TCW sequences, while the other, the kinase domain, is not mutated at TCW sequences. Both kinds of mutation confer a similar growth advantage, and across multiple tumour types you tend to see roughly a 50-50 split between the two. This let us account for growth advantage and directly test whether two things were linked: APOBEC activity, which we had already measured across all protein-coding genes, and a preference for APOBEC-induced mutations in the helical domain.
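Measuring the helical-versus-kinase skew for each tumour type comes down to counting hotspot mutations. This is a minimal sketch that assumes the commonly reported PIK3CA hotspots: E542K and E545K in the helical domain, and H1047R/H1047L in the kinase domain. The post itself does not name these residues. The sketch also assumes each mutation is supplied as a (tumour type, protein change) pair. The tumour-type labels in the example are placeholders, not data.

```python
from collections import defaultdict

# Assumed hotspot residues (widely reported; not named in the post itself).
HELICAL_HOTSPOTS = {"E542K", "E545K"}   # helical domain
KINASE_HOTSPOTS = {"H1047R", "H1047L"}  # kinase domain


def helical_share_by_type(records):
    """records: iterable of (tumour_type, protein_change) pairs for PIK3CA mutations.

    Returns, for each tumour type with at least one hotspot mutation, the fraction
    of hotspot mutations that fall in the helical domain. Non-hotspot mutations
    are ignored.
    """
    counts = defaultdict(lambda: [0, 0])  # tumour type -> [helical, kinase]
    for tumour_type, change in records:
        if change in HELICAL_HOTSPOTS:
            counts[tumour_type][0] += 1
        elif change in KINASE_HOTSPOTS:
            counts[tumour_type][1] += 1
    return {t: h / (h + k) for t, (h, k) in counts.items()}


# Placeholder records, purely to show the input shape.
records = [
    ("type_A", "E545K"), ("type_A", "E542K"), ("type_A", "H1047R"),
    ("type_B", "H1047R"), ("type_B", "E545K"), ("type_B", "H1047L"),
    ("type_B", "N345K"),  # not a hotspot in this sketch, so it is skipped
]
print(helical_share_by_type(records))
```

A value near 0.5 corresponds to the roughly even split seen across tumour types overall. Values well above 0.5 indicate a skew towards the helical domain.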

Since PIK3CA is mutated in many types of cancer, I was able to take data from The Cancer Genome Atlas project. For each tumour type, I measured how strongly mutations were skewed towards the helical domain rather than the kinase domain, and looked at that tumour type's APOBEC activity. The results were quite robust: the higher the APOBEC activity in a cancer type, the stronger the preference for helical domain over kinase domain mutations.
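The relationship described above, between median APOBEC activity and the preference for helical domain mutations, can be summarised with a rank correlation. This is a minimal sketch of Spearman's rank correlation in plain Python. Choosing a rank-based measure here is an assumption, and the numbers in the example are invented for illustration rather than taken from the paper. In practice, `scipy.stats.spearmanr` does the same job and also reports a p-value.

```python
from statistics import mean


def ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            result[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation coefficient (assumes neither input is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def spearman(xs, ys):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Invented per-tumour-type values, purely to show the call.
median_tcw_fraction = [0.05, 0.12, 0.20, 0.41]
helical_proportion = [0.35, 0.40, 0.55, 0.80]
print(round(spearman(median_tcw_fraction, helical_proportion), 6))  # 1.0
```

Working on ranks means the measure only asks whether tumour types with more APOBEC activity also tend to have a stronger helical preference. It does not assume the relationship is a straight line.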

Figure 3

Figure 3. A: tumour types are arranged from left to right by median APOBEC activity, and as you move from left to right, helical domain mutations (black bars) become strongly preferred over kinase domain mutations (yellow bars). B: plotting the median TCW->TKW fraction (APOBEC activity) against the proportion of PIK3CA mutations that are helical hotspot mutations shows a strong correlation.

So yeah, people had been wondering why, in bladder cancers for example, there is such a strong preference for helical hotspot mutations; these analyses basically addressed that long-standing question.

Explanatory factors

So the one other thing we did was to look at what might be driving this process. Surprisingly, we found no correlation between APOBEC activity and how much E6 and E7 was being expressed in these tumours. Nor, for that matter, did we find one between APOBEC activity and APOBEC3B gene expression. We did, however, find a strong link with the total number of mutations these tumours had. This work has led us to hypothesise that something like HPV-induced DNA damage drives the process, by generating the substrate for APOBEC3B to act upon.

Conclusion

Our work suggests that HPV-positive tumours evolve along a particular trajectory. They incorporate HPV DNA into their own, leading to sustained E6/E7 expression, and this is followed by APOBEC activity until a driver mutation occurs. After that, clones expand and show the APOBEC signature when their DNA is sequenced. In HPV-negative HNSCC, smoking and alcohol do this job instead. Where PIK3CA is the gene mutated, HPV-positive tumours tend to have helical domain hotspot mutations, because APOBEC proteins are responsible for them…

Additional stuff

The journal did a Q&A that expands on some of the work in the paper, and you can find it here.

There is a press release from UCL here.

 

Putting stuff into context – on the TRAIL of a new science news story on the BBC

Right, so this time I have a few things to add to a news story currently doing the rounds: http://www.bbc.co.uk/news/health-25625934

Quote…

The most dangerous and deadly stage of a tumour is when it spreads around the body.

Scientists at Cornell University, in the US, have designed nanoparticles that stay in the bloodstream and kill migrating cancer cells on contact.

They said the impact was “dramatic” but there was “a lot more work to be done”.

One of the biggest factors in life expectancy after being diagnosed with cancer is whether the tumour has spread to become a metastatic cancer.

“About 90% of cancer deaths are related to metastases,” said lead researcher Prof Michael King.

On the trail

The team at Cornell devised a new way of tackling the problem.

They attached a cancer-killing protein called Trail, which has already been used in cancer trials, and other sticky proteins to tiny spheres or nanoparticles.

When these sticky spheres were injected into the blood, they latched on to white blood cells.

Tests showed that in the rough and tumble of the bloodstream, the white blood cells would bump into any tumour cells which had broken off the main tumour and were trying to spread.

The report in Proceedings of the National Academy of Sciences showed the resulting contact with the Trail protein then triggered the death of the tumour cells.

Prof King told the BBC: “The data shows a dramatic effect: it’s not a slight change in the number of cancer cells.

“The results are quite remarkable actually, in human blood and in mice. After two hours of blood flow, they [the tumour cells] have literally disintegrated.”

He believes the nanoparticles could be used before surgery or radiotherapy, which can result in tumour cells being shed from the main tumour.

It could also be used in patients with very aggressive tumours to prevent them spreading.

However, much more safety testing in mice and larger animals will be needed before any attempt at a human trial is made.

So far the evidence suggests the system has no knock-on effect for the immune system and does not damage other blood cells or the lining of blood vessels.

But Prof King cautioned: “There’s a lot of work to be done. Various breakthroughs are needed before this could be a benefit to patients.”

Just a few things, though. Firstly, the spread of cancer cells throughout the body may be a very early event in tumour evolution, as suggested by studies in animal models: http://www.jci.org/articles/view/43424. More worryingly, in cells with activating mutations in a gene called KRAS, TRAIL can actually lead to more invasive disease and earlier death due to metastasis: http://www.ncbi.nlm.nih.gov/pubmed/20188103. Several mechanisms of resistance to TRAIL have also been documented: http://www.ncbi.nlm.nih.gov/pubmed/22206047

So the promise highlighted in the article makes me immensely skeptical.

Planes are made of metal, but this doesn’t mean you can fly on bauxite.

OK, pardon the weird title please. The internet is home to a lot of rubbish about cancer research, and amongst this set of kooky beliefs is the notion that hemp oil/marijuana can cure cancer. The article that has drawn my attention this time round is http://www.destructionofamerica.4t.com/whats_new_3.html

I quote

I didn’t stutter, Cannabis IE “evil drug Marijuana” hemp cures several kinds of brain tumors as well as other cancers. The FDA has known for over twenty tears and hid it from us!

New research shows that marijuana components fight an aggressive form of brain cancer. And the media says – nothing, again.

Combining the two most common cannabinoid compounds in Cannabis may boost the effectiveness of treatments to inhibit the growth of brain cancer cells and increase the number of brain cancer cells that die off. That’s the finding of a new study published in the latest issue of the journal Molecular Cancer Therapeutics.

Marijuana components have been found to inhibit the growth of the most common, and aggressive form of brain tumor, a glioblastoma, according to a study published in the January 6 issue of Molecular Cancer Therapeutics.

The study was done at the California Pacific Medical Center by researchers who combined a non-psychoactive ingredient of marijauna, cannabidiol (CBD), with Δ9-tetrahyrdocannabinol (Δ9-THC), the primary psychoactive ingredient in Cannabis. The findings demonstrated the inhibitory effect of these two ingredients on brain cancer cells when used together.

“Our study not only suggests that combining these two compounds creates a synergistic effect,” says Sean McAllister, Ph.D., a scientist at CPMCRI and the lead author of the study. “but it also helps identify molecular mechanisms at work here, and that may lead to more effective treatments for glioblastoma and potentially other aggressive cancers.”

“Previous studies had shown that Δ9-THC was effective in inhibiting brain cancer growth in cell cultures and in animal models and prompted a small clinical trial in Spain. There is also evidence that other compounds in Cannabis might prove effective against tumors, but limited scientific evidence is available,” the report stated.

President Reagan & Bush tried to have studies done in Virginia destroyed to hide the facts but fortunately some survived. There are OVER 50,000 products that can be made with hemp conservatively. Yet we have been lied to by the authorities for nearly a hundred years and saw them destroy one of the most profitable commercial industry known to man.

How many people have died that could still be alive? How many have suffered needless pain cannabis could relieve? Call your congress persons today TOLL FREE  800 833 6354 tell them you know the truth and they will be FIRED if they don’t change the law!

Right, time to go fallacy-fishing.

[1] That certain cannabinoid compounds can inhibit the growth of glioblastoma cell lines in no way means that marijuana cures cancer. By the same logic, the effectiveness of penicillin would mean that Penicillium notatum (the fungus penicillin is produced from) cures bacterial infections.

[2] Note that we’re talking about cell lines and animal models at best. Inhibiting the growth of cell lines and xenografts is a preliminary step on the way to developing new cancer therapies. Neither the work in the paper nor the studies it cites in any way support the notion that these compounds are curative.

[3] Indeed, the very person who wrote the codswallop above failed to read the very excerpts he quotes.

“Our study not only suggests that combining these two compounds creates a synergistic effect,” says Sean McAllister, Ph.D., a scientist at CPMCRI and the lead author of the study. “but it also helps identify molecular mechanisms at work here, and that may lead to more effective treatments for glioblastoma and potentially other aggressive cancers.”

“Previous studies had shown that Δ9-THC was effective in inhibiting brain cancer growth in cell cultures and in animal models and prompted a small clinical trial in Spain. There is also evidence that other compounds in Cannabis might prove effective against tumors, but limited scientific evidence is available,” the report stated.

In what parallel universe is the notion that marijuana cures cancer, or the idea that the data above imply the FDA admits as much, anything less than a brazen display of intellectual contortionism that would put the poriferan equivalent of Houdini to shame?!

A Bird’s Eye View of Cancer Research…

I often get asked what cancer is when people find out I do cancer research for a living, and the whys and wherefores inevitably follow in conversation. The complexities of the disease often mirror the complexities of the bodies it plagues. So I decided it might be good to write a few things down that people could be pointed to, to make things a little more lucid, and to compile resources people could delve into if they so desired. So here it is…

Omnis cellula e cellula

All known living organisms are made of cells. In most cases, one cell on its own is an organism, obtaining food from the environment, breathing, growing, multiplying and carrying out a whole assortment of other life processes that are of interest to biologists, but I digress.

Cancer is fundamentally a disease of organisms made of communities of cells. These cells too do all of the above, but some of the more complex multicellular organisms show specialisation (brains and lungs and guts and genitals), all for the same purposes: to obtain energy, to stay alive and to reproduce. Not that there is some grand predisposition to doing this with foresight. It is merely that those that stumble onto what passes the examinations posed by the brutal machinations of nature get to go on.

Starting from one cell, multicellular organisms expand to have several tissues and a whole paraphernalia of different types of cells. Some cells die out. Some stay on but don’t divide unless required to make more cells. Some double in number until they stumble across a battery of conditions: other cells or other molecules that tell them to stop growing, or that induce them to multiply in a frenzy but then stop when balance has been restored.

The behaviour of cells, of the organs they form, of the organ systems they make up and of the organisms themselves emerges from interactions between two things. One is the environment, made up both of other organisms and of things that appear prosaic but nonetheless significantly influence the fates of organisms. The other is the delightfully messy workings of the molecules within cells. These include the DNA that contains all of a cell’s genetic material, which simply must be passed on from generation to generation for the species to survive.

Dawkins, in his magnum opus “The Selfish Gene”, popularised the notion of organisms serving as mere vehicles for the continued survival of genes. I shall go one step further and put it to you that it isn’t just individual genes that are selfish: it is entire genomes (the collection of all the DNA in a cell or organism). Promiscuity is favoured by nature when rivals are less promiscuous and when it is possible to brutally stifle threats posed by the competition. Cancer is essentially this taken to a horrifying extreme: genomes find ways to be malignantly selfish at the expense of the other cells that are also integral to the survival of the organism, bringing with it much suffering and often death…for alas! Cancer cells lack the foresight to know that their fate is tied inextricably to that of their hosts.

Of DNA, RNA and proteins…

Genomes are made of DNA, which is the medium by which information gets passed on from generation to generation. The exception is a few viruses, though we don’t really consider viruses to be living things, because they cannot reproduce on their own. To carry out their functions, cells orchestrate a variety of biochemical activities through RNA transcribed from DNA. This RNA can itself affect other RNA or, for some genes, gets turned into proteins. RNA, proteins and DNA then interact with the outside environment, and with other DNA and proteins, to give rise to the chemistry that leads to the formation of cells and organisms. This complexity is described elsewhere on the blog [1] [2] [3], and I will add links and notes at the end so you can explore further if it interests you.

On the road towards understanding cancer…we discover cancer causing genes have cellular origins. 

Towards the dusk of the first decade of the 1900s, Peyton Rous made a discovery that would shape perspectives on cancer research for a long time to come. He found that cancers in chickens could be transmitted like other known viral diseases, and the causative agent he isolated came to be known as Rous Sarcoma Virus (RSV). This immediately led people to believe that cancer was essentially a viral disease. That changed when Michael Bishop and Harold Varmus made a revolutionary discovery: the genes that caused cancer had a cellular origin. At some point the virus had, by sheer accident, incorporated such a gene while packaging itself up. It had no problem transmitting the gene because it could spread before the chickens died of cancer.

They got their hands on two strains of RSV, one with cancer-causing properties and one without, so the viral gene that caused cancer could be identified as the one present in the former but not the latter. They made an RNA probe to look for similar genes in cells. It bound to the DNA of cells from different species, and in every species it bound at the same place in the genome. So the gene that had been rampaging through chickens via the virus was also found in normal cells. When it was switched on at very high levels by lots of replicating viruses, it caused cells to lose control of how they grew and to take part in a frenetic orgy of cellular division: it was, in effect, an oncogene.

Then they looked elsewhere for genes with similar properties and began to identify more and more. In cancer cells, these genes had defects in their DNA that affected the function of the proteins they produced, and those defects drove the cells to rapidly increase in number. At the same time, people began to find genes whose proteins, when altered, lost the ability to stop cells from dividing uncontrollably. These came to be known as tumour suppressors [4], and one of the breakthroughs in learning about their function was the elucidation of Knudson’s “Two-hit” hypothesis…

Knudson’s two-hit theory of cancer causation was proposed to explain hereditary and sporadic retinoblastoma. It was later borne out by the discovery of tumour suppressors, specifically a gene called Rb1.

Looking at genes implicated as either oncogenes or tumour suppressors, people began to stumble across changes in DNA sequence, relative to the sequences in normal cells, that affected the proteins the genes made. These mutations, as we call them, established the foundations of cancer genetics and genomics.

The hallmarks of cancer.

As people found more and more oncogenes and tumour suppressors they wondered what cancer was, for here was a set of diseases stemming from various tissue types that all appeared to consist of cells that grew rapidly and in many cases spread through the body, albeit at different rates. A seminal paper by Hanahan and Weinberg defined the hallmarks of cancer – traits that any disease that qualifies as a cancer *must* possess…

The Hallmarks of Cancer and examples of potential therapeutic methods to target them. From Hanahan and Weinberg’s seminal paper ‘The Hallmarks of Cancer: The Next Generation’, link in references.

These include:
the ability to multiply abnormally without requiring external signals, and to ignore the external signals that stop normal cells from dividing;
the failure to undergo apoptosis (a form of cell death);
immortalisation, which is the ability to divide indefinitely in permissive conditions, unlike normal cells;
angiogenesis, where tumours induce the formation of blood vessels to establish a blood supply;
and finally, and most critically, metastasis: the ability to spread through the body and colonise other sites, which is incidentally what is thought to kill patients.
A recent update to the classical set of hallmarks described above brought three new hallmarks into the fray: altered cell metabolism (changes in how cancer cells generate energy); inflammation (a molecular response to wounds and injuries in normal cells that goes wrong and promotes cancer metastasis); and genome instability (being prone to mutations and other structural aberrations that generate the complexities of cancer genomes, which I describe later [5])…

More than just mutations, and how we came to find out…

People studying families of proteins called transcription factors noticed that these could fundamentally alter how the RNA of different genes in the genome was produced: both when it was produced and how much. This could in turn affect other proteins that controlled how cells divided and interacted with their environment. In some cases, transcription factors were themselves found to be mutated. One example is p53, known as the guardian of the genome because of its critical role in stopping errant cells from progressing to cancer [6], which explains why so many tumours modify p53 function to get round it. With this came the idea that cancers would show differences in gene expression relative to normal tissue, and that these differences would contribute to acquiring the hallmarks of cancer. See [7] for a description of microarrays and case studies of how looking at expression profiles helped us understand cancers.

People also realised that you could get changes in expression patterns independently of transcription factors. Cancer cells are host to a wide variety of large structural distortions of the genome: compared to normal cells, which have 46 chromosomes, cancer cells accumulate a variety of aberrations, ranging from small deletions and duplications of bits of chromosomes to gains and losses of whole chromosomes, or in some cases whole sets of chromosomes (the ubiquitously used cancer cell line HeLa has 88 chromosomes).

Changes in the copy number of genes through these aberrations could also affect gene expression profiles. Finally, people had been studying epigenetic processes, through which cells inherit and modify expression patterns without changes in DNA sequence, for instance through DNA methylation and hydroxymethylation or through modifications of the histones around which DNA is wound [8]. They began to develop techniques to characterise epigenetic changes in tumours, and we therefore ended up with a whole panel of analyses we could do on tumours.

The Cancer Genome Atlas

While several other tumour sequencing projects were underway at places like the Wellcome Trust Sanger Institute, The Cancer Genome Atlas (TCGA) really set things going with its project on the deadly brain cancer glioblastoma multiforme, combining sequencing for mutations, microarray analysis of copy number variation and gene expression, and microarray analysis of DNA methylation. They essentially found that there are four groups of glioblastomas based on patterns in gene expression, and were able to correlate these with different cellular origins and with different ways in which those expression profiles were achieved [9].


Heatmap from one of the first TCGA papers that profiled glioblastoma. Four subtypes of glioma were described based on expression profiles. They were able to classify samples in an independent dataset and also versions of glioblastomas grown in mice (technically called xenografts).
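Classifying samples from an independent dataset into known expression subtypes is a standard machine-learning task. One simple way to do it, not necessarily the method used in [9], is a nearest-centroid classifier: average the expression profiles of each known subtype, then assign each new sample to whichever average it is closest to. Here is a minimal Python sketch with hypothetical genes, values and subtype labels.

```python
import math

def centroids(samples, labels):
    """Average expression profile (centroid) for each subtype label."""
    sums, counts = {}, {}
    for profile, label in zip(samples, labels):
        acc = sums.setdefault(label, [0.0] * len(profile))
        for i, value in enumerate(profile):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def classify(profile, centroid_map):
    """Assign a new profile to the subtype whose centroid is nearest (Euclidean distance)."""
    return min(centroid_map, key=lambda label: math.dist(profile, centroid_map[label]))

# Hypothetical training data: expression of three genes in four already-labelled tumours.
training = [[5.0, 1.0, 1.0], [6.0, 1.0, 2.0], [1.0, 5.0, 1.0], [1.0, 6.0, 2.0]]
labels = ["subtype_1", "subtype_1", "subtype_2", "subtype_2"]

model = centroids(training, labels)
print(classify([5.0, 2.0, 1.0], model))  # subtype_1
```

The same trained centroids could then be applied to xenograft samples or a second patient cohort, which is the spirit of the validation described in the caption.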

They also found a subset of tumours with very high levels of DNA methylation, linked to mutations in a gene called IDH1. Mutant IDH1 directly causes this excess methylation, as demonstrated in a landmark paper in the journal Nature, in which researchers introduced mutant IDH1 into astrocytes and showed that it induced high methylation levels like those seen in gliomas of that type…

Since then, the TCGA has published multiple studies on breast cancer, ovarian cancer, renal clear cell carcinoma, lung adenocarcinoma and colorectal cancer, to name a few [10], and is collecting data for more tumour types; a pan-cancer analysis of 12 of these datasets was released recently [11]. Alongside the TCGA, the even more ambitious International Cancer Genome Consortium (ICGC) aims to widely expand the scope and size of the kind of approaches taken by the TCGA, profiling the most striking molecular features of tumours and then relating them to clinical information.

Things are not quite so simple – the problem of heterogeneity.

One would think that by understanding the make-up of tumours and figuring out what drives them, it would be easy to target altered genes, proteins and pathways with specific drugs to achieve cures. However, tumours have such unstable genomes, and often contain so many cells by the time they’re detected, that they are capable of a great degree of evolution, which may be reflected in resistance to the drugs used to target them. Indeed, studies starting two years ago began to show two features of tumours. Firstly, they evolved through time, and therapy often influenced which properties dominated a patient’s disease at relapse: either the dominant clone before treatment acquired new mutations and evolved to resist therapy, or a previously minor clone expanded.

At around the same time, evidence was found that strongly supported the second feature: cancers evolve not only in a linear manner but also in parallel, along separate branches. Both those studies were carried out on leukaemias, and the same was then shown to be true of solid tumours. An analysis of a kidney tumour looking at multiple regions of the primary tumour and of metastases (new outgrowths derived from cells that had spread from the primary tumour) highlighted branched evolution, and also observed that different parts of the primary tumour showed different patterns of gene expression associated with survival [12].

Intratumour heterogeneity is extensive in kidney cancer and sequencing multiple biopsies enabled reconstruction of evolutionary patterns.
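The logic behind reconstructing evolution from multiple biopsies can be sketched very simply: mutations found in every region probably arose early, before the tumour diversified (the "trunk"), while mutations found in only some regions arose later, on separate branches. Here is a toy Python version with hypothetical region names and mutation IDs; real analyses also have to account for sequencing depth, tumour purity and the fraction of cells carrying each mutation.

```python
def split_trunk_and_branches(regions):
    """
    regions: {region_name: set of mutations detected in that region}
    Returns (truncal, branch): truncal mutations are present in every region;
    branch maps each remaining mutation to the sorted list of regions carrying it.
    """
    truncal = set.intersection(*regions.values())
    branch = {}
    for name, mutations in regions.items():
        for mutation in mutations - truncal:
            branch.setdefault(mutation, []).append(name)
    return truncal, {m: sorted(names) for m, names in branch.items()}

# Hypothetical example: two regions of a primary tumour (R1, R2) and one metastasis (M1).
regions = {
    "R1": {"m1", "m2", "m3"},
    "R2": {"m1", "m2", "m4"},
    "M1": {"m1", "m4", "m5"},
}
truncal, branch = split_trunk_and_branches(regions)
print(sorted(truncal))  # ['m1']
print(branch["m4"])     # ['M1', 'R2']
```

In this toy data, m4 being shared by R2 and the metastasis hints that the metastasis branched off from the lineage that also gave rise to R2.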

A recent study looking at multiple regions from a series of glioblastomas found the same striking pattern: all four of the glioma expression subtypes discovered by the TCGA were represented across these tumours, with different regions of the same tumour falling into different subtypes [13]. These studies have made one thing abundantly clear: understanding and classifying tumours into subgroups may be of limited utility when the range of evolutionary invention achieved by tumours permits them to acquire different patterns of alterations, mostly in response to therapy. We will indubitably learn a lot from large-scale analyses of the kinds already being carried out. Only recently have we begun to uncover which processes might contribute to the formation of mutations, and how to find mutational signatures in all the data being generated by sequencing tumour after tumour after tumour, so chances are we will soon have a comprehensive collection of molecular profiles to tuck into, on an unprecedented scale.


Finding chinks and causes for optimism…

Another way weaknesses might be found in tumours involves approaches based on what we call synthetic lethality and collateral lethality. Synthetic lethality is when a gene becomes essential only because another gene is mutated or otherwise altered; while that other gene is intact, the first one can be lost without harm. A classic example of this is PARP inhibition. PARP is an enzyme that repairs breaks in DNA, but cells can manage without it if the BRCA genes are intact. A significant proportion of ovarian and breast cancers, especially those that run in families, show a characteristic loss of BRCA1 or BRCA2, and this makes them especially vulnerable to the blockade of PARP.

Explanation of synthetic lethality to PARP inhibitors. People who have lost one copy of BRCA in their normal cells can present with tumours that have lost both. Normal cells still have BRCA to compensate when PARP is blocked but cancer cells don’t, so blocking PARP can kill cancer cells while sparing normal cells.
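Synthetic lethality boils down to a simple logical rule: a cell dies only when both of two compensating routes are lost. A deliberately oversimplified Boolean model in Python captures it. Treating BRCA status and PARP activity as on/off switches is an assumption for illustration, not a model of the real biochemistry.

```python
from itertools import product

def viable(brca_functional: bool, parp_active: bool) -> bool:
    """Toy rule: a cell copes with DNA damage if at least one of the two routes still works."""
    return brca_functional or parp_active

# Tabulate all four combinations of BRCA status and PARP activity.
for brca, parp in product([True, False], repeat=2):
    print(f"BRCA functional={brca!s:<5}  PARP active={parp!s:<5}  viable={viable(brca, parp)}")
```

A tumour cell that has lost BRCA and is treated with a PARP inhibitor corresponds to the last row, the only non-viable one, whereas a normal cell in a BRCA carrier still has a working copy and survives the same drug.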

One way of finding synthetic lethal interactions is to combine knockdown experiments, in which small RNA sequences are introduced into cells to block and degrade the RNA of target genes so that no protein is made from it, with mutation, expression and other “-omics” data, as we call them. Even without -omics data, knockdown experiments themselves can reveal genes that become required specifically in cancer, and in that way we can identify targets for chemists to develop specific drugs against. Of course, the other approach would be to target things that appear to transcend cancers: examples include targeting a protein called CD47, which cancer cells appear to express universally to avoid being destroyed by the immune system, or using drugs that target fundamentally common features of tumours such as DNA methylation [14].
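Combining a knockdown screen with mutation data can be as simple as asking which knockdowns hurt mutant cell lines much more than wild-type ones. Here is a minimal Python sketch, assuming viability has already been measured as the fraction of cells surviving each knockdown in each cell line. The genes, cell lines, numbers and the 0.3 cut-off are all hypothetical, and a real screen would use replicates and statistics rather than a simple difference of means.

```python
from statistics import mean

def synthetic_lethal_candidates(viability, mutant_lines, min_difference=0.3):
    """
    viability: {knocked_down_gene: {cell_line: fraction of cells surviving}}
    mutant_lines: set of cell lines carrying the mutation of interest.
    Returns (gene, difference) pairs for knockdowns that are at least `min_difference`
    more lethal in mutant lines than in wild-type lines, most selective first.
    """
    hits = []
    for gene, per_line in viability.items():
        mutant = [v for line, v in per_line.items() if line in mutant_lines]
        wild_type = [v for line, v in per_line.items() if line not in mutant_lines]
        if not mutant or not wild_type:
            continue
        difference = mean(wild_type) - mean(mutant)
        if difference >= min_difference:
            hits.append((gene, round(difference, 3)))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)

# Hypothetical screen: L1 and L2 carry the mutation, L3 and L4 do not.
viability = {
    "GENE_P": {"L1": 0.2, "L2": 0.3, "L3": 0.9, "L4": 0.8},   # kills mutant lines selectively
    "GENE_Q": {"L1": 0.7, "L2": 0.8, "L3": 0.75, "L4": 0.8},  # little effect anywhere
    "GENE_R": {"L1": 0.1, "L2": 0.1, "L3": 0.1, "L4": 0.2},   # kills every line: essential, not selective
}
print(synthetic_lethal_candidates(viability, {"L1", "L2"}))  # [('GENE_P', 0.6)]
```

GENE_R shows why the comparison matters: knocking it down kills every line, so it looks lethal but offers no selectivity for cancer cells carrying the mutation.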

Finally, knowledge of tumour evolution itself may be employed to find weaknesses. As described elsewhere on the blog, the mechanism of resistance may itself expose tumours to new weaknesses, and exploiting this could be as simple as withdrawing the drug (rather like suddenly releasing jammed brakes while the driver is still pressing hard on the accelerator to compensate for them), as discussed here [15].

Dealing with heterogeneity may be made possible by getting around resistance mechanisms, in which cancers find alternative pathways to get where they need to be to survive and expand. The idea is to hit points that are altered in cancer but have no known alternatives through which tumours can reroute their functioning. Indeed, this has been shown experimentally by targeting Myc, which is a potent oncogene when activated, and by targeting BIM in glioblastoma, where tumour cells evolve resistance by finding other ways to prevent BIM from being turned on when the pathway they usually use to do this is blocked with drugs.

For all the complexities of cancer, there might still be ways for us to figure out how to target and attack it successfully, and one of the keys to that, I think, is the sense of scale and community that marks cancer research projects these days. I think the enduring impact of the Human Genome Project [16] was not just the sequencing of the human genome, but ensuring that the data were openly accessible to anybody who wanted to use them for their research or look at them out of general interest. The TCGA and ICGC have put similar policies in place to govern how their data are accessed, and by allowing researchers to integrate the research they do locally with data that wouldn’t have been generated without big projects like these, it is possible to achieve so much more. And maybe we’ll figure out what cancer is, and then determine what it can and can’t be, someday, soon…
References!

Links to posts on the workings of DNA, RNA and proteins
[1] http://exploreable.wordpress.com/2011/02/19/the-central-dogma-of-molecular-biology/ (Central Dogma of Molecular Biology)

[2] http://exploreable.wordpress.com/2011/05/16/from-dna-to-rna-the-process-of-transcription/ (Transcription)

[3] http://exploreable.wordpress.com/2011/01/14/right-just-as-promised-here-comes-rna-interference/ (Includes descriptions of microRNAs, RNA species that can block other RNAs from being converted to protein as per the central dogma, thereby affecting gene expression).

Oncogenes, Tumour Suppressors and the Hallmarks of Cancer
[4] https://exploreable.wordpress.com/2011/10/05/oncological-complications-models-surrounding-tumour-suppression/ (Contains an exposition of the two-hit theory of cancer causation and links to further material on the topic, as well as nuance about how some tumour suppressors behave differently)

[5] http://www.cell.com/retrieve/pii/S0092867411001279 (Updated version of the classic paper; The Hallmarks of Cancer by Hanahan and Weinberg. May be paywalled).

[6] http://exploreable.wordpress.com/2011/11/03/the-p53-barcode-a-brief-introduction/ (Explains how p53 functions in different contexts, basically)

Understanding Cancers and large scale analyses
[7] http://www.nature.com/scitable/topicpage/genetic-diagnosis-dna-microarrays-and-cancer-1017 (Nature Scitable article on gene expression and cancer).

[8] http://exploreable.wordpress.com/2012/01/22/an-introduction-to-epigenetics/ (An introduction to epigenetic processes).

[9] http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2818769/ (The TCGA glioblastoma paper that documented four expression subtypes).

[10] https://tcga-data.nci.nih.gov/docs/publications/ (A list of papers from The Cancer Genome Atlas. Most are openly accessible should you want to read them, but they are science-heavy and really written for people with an in-depth understanding of cancer research. You may be able to search for materials and commentary related to them to get a more popular perspective; also look for TCGA press releases on the same site).

[11] http://blogs.nature.com/freeassociation/2013/09/focus-tcga-pan-cancer.html (Blogpost on the Nature blogging network containing links to further material, commentary and analysis papers from the TCGA pan-cancer project.)

Heterogeneity, synthetic lethality and the future

[12] http://exploreable.wordpress.com/2012/09/22/a-very-short-introduction-to-intratumour-heterogeneity/ (Blogpost on intratumour heterogeneity)

[13] http://www.pnas.org/content/110/10/4009.full (Paper documenting intratumour heterogeneity in glioblastoma multiforme)

[14] http://exploreable.wordpress.com/2013/02/24/more-is-not-always-merrier-methylation-version/ (Blogpost discussing broad spectrum effects of low doses of the DNA methylation blocker decitabine).

[15] http://exploreable.wordpress.com/2013/06/21/a-window-into-acquired-resistance-to-targeted-therapies-through-the-eyes-of-a-mek-inhibitor/ (Blogpost exploring reports of how cancer cells evolved resistance to a drug and how this could be used to target the tumour).

[16] http://exploreable.wordpress.com/2011/05/03/the-story-of-the-human-genome-project-a-short-narration/ (Long blogpost on the Human Genome Project, written by yours truly and therefore recommended if you are a glutton for a few thousand words more having worked your way through to the end of the article).

That’s all from me now!

Cheers,
Exploreable

Genetics and Epigenetics combine to deadly effect in Pediatric Glioblastoma.

Hi,

As I’ve mentioned several times before, cancer involves a combination of genetic and epigenetic changes that result in alterations of cell signalling and gene expression patterns that go on to establish the hallmarks of cancer.

The recent availability of a wealth of mutational data has led to the identification of recurrent mutations in multiple genes that read, write and modify chromatin marks, highlighting a direct link between cancer genetics and epigenetics [1]. In most of these cases, though, we’ve observed changes in the enzymes that mediate epigenetic processes, ranging from mutation to amplification, overexpression or silencing.

Enzymes always have substrates, and perhaps not altogether surprisingly, a very recent discovery found mutations in paediatric glioma in the primary substrate of EZH2 [2]: histone H3, on whose tail EZH2 trimethylates lysine 27, a modification that represses gene expression.

Histone 3 is one of the four core histones that make up the nucleosomes around which DNA is wrapped. In humans, two genes produce a variant of histone 3 called histone 3.3, and the authors of this paper found mutations in one of those genes that converted lysine 27 to a methionine (K27M). The mutations were heterozygous (the other copy of H3K27 was intact), so the authors went on to investigate what the mutation did to H3K27 epigenetic modifications in neurospheres they established from patient tissue, compared with adult glioma and a normal neural cell line of the same differentiation status.
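The notation packs a lot in: "H3K27me3" means trimethylation (me3) of the lysine (K) at position 27 of histone H3, and "K27M" means that lysine has been replaced by a methionine (M). One wrinkle worth knowing is that histone residues are conventionally numbered without the initial methionine, which is removed from the mature protein, so the same mutation can also appear as K28M when counted from the start codon. Here is a small Python sketch using the first 30 residues of the H3 tail (identical in H3.1 and H3.3); the helper functions are illustrative and not from the paper.

```python
# First 30 residues of the histone H3 N-terminal tail (identical in H3.1 and H3.3),
# numbered by histone convention, i.e. without counting the initiator methionine.
H3_TAIL = "ARTKQTARKSTGGKAPRKQLATKAARKSAP"

def residue_at(seq, position):
    """1-based residue lookup using histone numbering."""
    return seq[position - 1]

def substitute(seq, position, new_residue):
    """Return a copy of `seq` carrying a single amino-acid substitution."""
    return seq[:position - 1] + new_residue + seq[position:]

def describe_substitutions(reference, variant, offset=0):
    """List substitutions like 'K27M'; offset=1 numbers from the start codon instead."""
    return [
        f"{ref}{i + 1 + offset}{alt}"
        for i, (ref, alt) in enumerate(zip(reference, variant))
        if ref != alt
    ]

mutant = substitute(H3_TAIL, 27, "M")
print(describe_substitutions(H3_TAIL, mutant))            # ['K27M']
print(describe_substitutions(H3_TAIL, mutant, offset=1))  # ['K28M']
```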

They found global reductions in dimethylated and trimethylated H3K27, and subsequently found evidence that this was not attributable to changes in the levels of Ezh2 and Suz12, components of the PRC2 complex that mediates H3K27 methylation and silencing. Much to their surprise, other H3 marks, including H3K27 acetylation, didn’t differ significantly, which is in direct opposition to the findings in [3], even if that study was based on mutant Ezh2 rather than a mutant histone.

They then had to confirm that the reduction was actually due to the mutated histone variant. To do this, they expressed the mutant gene in 293T cells at very low levels, and also established cell lines carrying either another previously known histone 3 mutation (not at lysine 27) or a version of H3.1 with a K27 mutation. They again found dramatic reductions in H3K27me3 levels, and were able to validate these results in human astrocyte cultures and in murine embryonic fibroblasts, suggesting that a tissue-independent mechanism of H3K27me3 reduction was at work.

The introduction of H3.1 and H3.3 K27 mutants is associated with global reductions in H3K27me3 and H3K27me2 levels by western blotting and fluorescence microscopy. Transfection experiments in MEFs reveal reductions associated solely with K27 mutants, and also that these reductions are gradual (bar graph and fluorescence micrographs at the bottom).

They then carried out ChIP-seq and gene expression experiments to understand what altered levels of H3K27me3 did to the gene expression profiles of the mutant cell lines they had developed. By doing ChIP-seq on both Ezh2 and H3K27me3, they observed no significant reductions in Ezh2 peaks, but did find regions where local levels of H3K27me3 were higher than in normal neural stem cells. They confirmed this using ChIP-qPCR on two other patient samples, and also found that the Ezh2 and H3K27me3 peaks (the regions of maximum binding or concentration of whichever target they had done ChIP on) overlapped strongly in the mutant cells but not in normal neural stem cells, suggesting that enrichment in those areas might be down to the mutant histones trapping Ezh2.
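Asking whether Ezh2 and H3K27me3 peaks overlap more in mutant cells than in normal neural stem cells boils down to comparing sets of genomic intervals. Here is a minimal Python sketch with hypothetical peak coordinates. In practice one would use dedicated tools such as bedtools and a proper statistical comparison against the control cells; the all-against-all loop here is only suitable for small examples.

```python
def overlaps(a, b):
    """True if two half-open intervals (chrom, start, end) overlap on the same chromosome."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def fraction_overlapping(peaks, reference_peaks):
    """Fraction of `peaks` that overlap at least one interval in `reference_peaks`."""
    if not peaks:
        return 0.0
    hits = sum(any(overlaps(p, r) for r in reference_peaks) for p in peaks)
    return hits / len(peaks)

# Hypothetical peak calls (chromosome, start, end) from one cell type.
h3k27me3_peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 150), ("chr2", 900, 1000)]
ezh2_peaks = [("chr1", 150, 250), ("chr2", 100, 120), ("chr3", 900, 1000)]

print(fraction_overlapping(h3k27me3_peaks, ezh2_peaks))  # 0.5
```

Running the same function on peaks from mutant cells and from control cells, and comparing the two fractions, mirrors the kind of comparison described above.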

In order to confirm Ezh2 was actually co-localising more with the mutant histone than the wild-type histone, they pulled all three down with specific antibodies to check if more Ezh2 was pulled down with the mutant histone, and indeed, this is exactly what they found.

Genes that had gained H3K27me3 specifically in mutant glioma cell lines were also found to carry H3K4me3, marking them as what are commonly known as “bivalent” genes: genes that have both activating and repressive chromatin marks, are poised to swing either way, and are responsible for driving tissues to mature and differentiate [4]. Using RNA-seq, they found that the expression of these genes was far lower than in normal stem cells. They do admit that adult neural stem cells are far from the ideal controls, and that transfecting proper neurons with mutant histones would help consolidate the findings further.
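The logic of that last analysis (find genes that gained the repressive mark, check which of them also carry the activating mark, then compare their expression) is essentially a series of set operations. Here is a toy Python version, assuming each mark has already been reduced to a set of genes with a peak near their promoter, which is a simplification; gene names and expression values are hypothetical.

```python
def gained_bivalent(k27me3_mutant, k27me3_control, k4me3_mutant):
    """Genes that gained H3K27me3 in mutant cells and also carry H3K4me3 there."""
    gained = k27me3_mutant - k27me3_control
    return gained & k4me3_mutant

def expressed_lower(genes, mutant_expr, control_expr):
    """Subset of `genes` whose expression is lower in mutant than in control cells."""
    return {g for g in genes if mutant_expr[g] < control_expr[g]}

# Hypothetical gene sets and expression values.
k27me3_mutant = {"G1", "G2", "G3", "G4"}
k27me3_control = {"G4"}
k4me3_mutant = {"G1", "G2", "G5"}

bivalent = gained_bivalent(k27me3_mutant, k27me3_control, k4me3_mutant)
mutant_expr = {"G1": 2.0, "G2": 1.0}
control_expr = {"G1": 10.0, "G2": 8.0}

print(sorted(bivalent))                                              # ['G1', 'G2']
print(sorted(expressed_lower(bivalent, mutant_expr, control_expr)))  # ['G1', 'G2']
```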

Finally, they conclude that the lack of histone mutations in adult glioma might have to do with the context in which paediatric and adult gliomas develop, the former arising in a developing brain very early in life.

So, yeah, I think that is a cool paper.

References

[1] http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3396881/
[2] http://genesdev.cshlp.org/content/27/9/985.long
[3] http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2926606/
[4] http://www.ncbi.nlm.nih.gov/pubmed/16630819

Cheers,
Ankur Chakravarthy.

CRISPR based genome editing – the future of molecular biology.

A vector for genome editing using CRISPR contains the aforementioned elements. We put in an insert targeting our gene of interest using the enzyme BbsI, which cuts DNA at precise sites marked by the red arrows in the magnified area, and then insert two single-stranded DNA oligos into that region. When we express the vector in mammalian cells, it produces Cas9 (or one of its variants), needed for editing, as well as the RNA that guides Cas9 to our gene of interest.

It isn’t often that I make such seemingly outlandish claims in the title of a blog-post, but this particular technology, CRISPR-based genome editing, I believe deserves the hype.

CRISPR stands for Clustered Regularly Interspaced Short Palindromic Repeats. What that really means is a long stretch of DNA made of short, similar repeating sequences grouped together and separated by unique spacer sequences. In bacteria, these arrays serve as part of an adaptive immune system that learns to recognise viruses and inactivate viral DNA.

Basically, the system cuts up viral DNA into fragments and inserts those fragments between the repeats of a CRISPR sequence, which we call a CRISPR array, and the bacteria then begin to pump out large amounts of RNA encoding this sequence. Other enzymes in the CRISPR pathway then use the RNA carrying the viral insert to find matching viral DNA and trigger its degradation, which is neat.
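The acquire, transcribe and recognise cycle described above can be mimicked with plain strings. This is only a toy model in Python: the repeat and viral sequences are made up, recognition is reduced to exact substring matching, and the real enzymology (spacer selection, RNA processing, nuclease activity) is left out. New spacers go at the front of the array here, reflecting the fact that spacers are usually added at the leader end of the array, so the newest one comes first.

```python
# Toy string model of a CRISPR array. The repeat and viral sequences are made up.
REPEAT = "GTTTCAAT"

def acquire_spacer(array, spacer, repeat=REPEAT):
    """Add a new spacer, plus a fresh copy of the repeat, at the front of the array."""
    return repeat + spacer + array

def crrnas(array, repeat=REPEAT):
    """'Transcribe' the array and cut it at every repeat, keeping the spacers as guides."""
    return [piece for piece in array.split(repeat) if piece]

def recognises(guides, dna):
    """True if any guide sequence matches a stretch of the incoming DNA."""
    return any(guide in dna for guide in guides)

virus_a = "ACGTACCGGATTCAGGCTAACG"
virus_c = "GGATCCTTGAAACCCTAG"
virus_b = "TTGACCATGGCAAGTCCA"  # never encountered before

array = REPEAT                                 # an array with no spacers yet
array = acquire_spacer(array, virus_a[4:16])   # infection by virus A
array = acquire_spacer(array, virus_c[2:14])   # later infection by virus C

guides = crrnas(array)
print(guides)                                                    # ['ATCCTTGAAACC', 'ACCGGATTCAGG']
print(recognises(guides, virus_a), recognises(guides, virus_b))  # True False
```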

What is neater is that we’re now able to use a modified form of this system to edit genes in mammalian cells (including human cells). Instead of putting a viral insert into the CRISPR array, we can put in a short stretch of DNA (around 20 bases for the commonly used Cas9) that targets our gene of interest, and express RNA from it using some pretty simple genetic engineering (see figure for a description).
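For the widely used Cas9 from Streptococcus pyogenes (SpCas9), the targeting sequence is typically 20 nucleotides long and must sit immediately next to a short "NGG" motif in the genome called the PAM (protospacer adjacent motif). Choosing a target therefore starts with scanning the gene for such sites. Here is a minimal Python sketch, assuming SpCas9 and ignoring everything else that goes into good guide design (GC content, off-target matches, position within the gene); the target sequence is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def find_spcas9_sites(seq, guide_length=20):
    """
    Find every guide_length-nt protospacer immediately followed by an NGG PAM,
    on both strands. Returns (strand, start, protospacer, pam) tuples; `start`
    is a 0-based index into the strand that was scanned.
    """
    seq = seq.upper()
    sites = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for i in range(guide_length, len(s) - 2):
            pam = s[i:i + 3]
            if pam[1:] == "GG":
                sites.append((strand, i - guide_length, s[i - guide_length:i], pam))
    return sites

# Hypothetical target: a 20-nt stretch followed by a TGG PAM.
target = "ACGTACGTACGTACGTACGT" + "TGG" + "AAAA"
for site in find_spcas9_sites(target):
    print(site)  # ('+', 0, 'ACGTACGTACGTACGTACGT', 'TGG')
```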

When expressed in cells, this can then result in Cas9 inducing a double-strand break, or a nick in one strand, in the gene of interest, depending on which variant of Cas9 we use. The really cool thing about this is that we can then either wreck the gene using non-homologous end joining or spike in a mutant sequence using homologous recombination, the two main ways in which double-strand breaks are repaired. In non-homologous end joining, the broken ends are trimmed and joined back together, which often loses sequence at the site of the break. If instead we supply a template carrying our mutation near the site of the break, flanked by sequences that match those on either side of the break closely enough, it can be copied in place of the broken region (see figure 2).


Mammalian double-strand break (DSB) repair. DNA DSBs are predominantly repaired by either non-homologous end-joining (NHEJ) or homologous recombination (HR) [156]. NHEJ rejoins broken DNA ends, and often requires trimming of DNA before ligation can occur. This can lead to loss of genetic information. In NHEJ, the broken DNA ends are bound by the KU70/KU80 heterodimer, which orchestrates the activity of other repair factors and recruits the phosphatidylinositol 3-kinase DNA-PKcs/PRKDC. DNA-PKcs phosphorylates and activates additional repair proteins, including itself and the ARTEMIS/DCLRE1C nuclease. ARTEMIS and/or the heterotrimeric MRE11-RAD50-NBN complex are thought to process the DNA ends prior to ligation. The DNA ends are joined by the activity of polymerases and a ligase complex consisting of XRCC4, XLF/NHEJ1 and LIG4. In contrast to NHEJ, HR is an error-free repair pathway that utilizes a sister chromatid, present only in the S- or G2-cell cycle phase, as template to repair DSBs. HR is initiated by DNA end-resection, involving the MRE11-RAD50-NBN complex and several accessory factors including nucleases. The MRE11-RAD50-NBN complex also recruits the phosphatidylinositol 3-kinase ATM, which phosphorylates histone H2AX and many other proteins involved in repair and checkpoint signaling. Single-stranded DNA generated by DNA end-resection is bound by RPA, which is subsequently replaced by RAD51. RAD51 promotes the invasion of the single-stranded DNA to a homologous double-stranded DNA template, leading to synapsis, novel DNA synthesis, strand dissolution, and repair. Many more proteins are involved in both NHEJ and HR, which are not depicted here for clarity, as they are not referred to in the main text. For details, see recent reviews by Lieber [81] and San Filippo et al. [80].

Lans et al. Epigenetics & Chromatin 2012, 5:4. doi:10.1186/1756-8935-5-4
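The two repair outcomes can be contrasted in a toy Python model. The deletion size, cut position and homology-arm length below are arbitrary assumptions, and real end-joining products vary from cell to cell. The point is only that a small deletion that is not a multiple of three shifts the reading frame and tends to wreck the gene, whereas a donor template with matching sequence on both sides of the change lets recombination write a specific edit in.

```python
def nhej_outcome(sequence, cut_site, deleted_bases):
    """Toy end joining: lose `deleted_bases` bases just upstream of a cut before index `cut_site`."""
    return sequence[:cut_site - deleted_bases] + sequence[cut_site:]

def causes_frameshift(deleted_bases):
    """Deletions that are not a multiple of three shift the reading frame downstream."""
    return deleted_bases % 3 != 0

def design_donor(sequence, edit_position, new_base, arm_length=40):
    """
    Toy repair template: left homology arm + replacement base + right homology arm.
    edit_position is 0-based; arms are clipped at the ends of `sequence`.
    """
    if not 0 <= edit_position < len(sequence):
        raise ValueError("edit_position lies outside the sequence")
    left = sequence[max(0, edit_position - arm_length):edit_position]
    right = sequence[edit_position + 1:edit_position + 1 + arm_length]
    return left + new_base + right

gene = "AAAACCCCGGGGTTTT"  # hypothetical stretch of a gene
print(nhej_outcome(gene, cut_site=8, deleted_bases=2))                  # AAAACCGGGGTTTT
print(causes_frameshift(2), causes_frameshift(3))                       # True False
print(design_donor(gene, edit_position=8, new_base="T", arm_length=4))  # CCCCTGGGT
```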
This technique has earned some rave reviews recently, and one of the really cool things is that you can express multiple guide RNAs (crRNAs, the RNAs carrying the CRISPR targeting sequence that guides Cas9) from a single vector. It’s even been used to generate mice carrying multiple mutations in one step, which I think is remarkably cool (http://www.sciencedirect.com/science/article/pii/S0092867413004674).

Generating mutant versions of genes is one of the things I will be doing in the next few months of my PhD, and I must say these are very exciting times indeed to be a molecular biologist – there is something very exquisite about being able to not just turn off genes temporarily but to delete them or to edit their sequence permanently – it is the sort of stuff that enables us to ask what specific mutations in a gene mean for the development of cancer and I see it contributing to some very good research in the years to come.

Cheers,
Ankur “Exploreable” Chakravarthy.

Update – There’s a new paper out in Nature Biotechnology showing that conventional CRISPR-based systems (the version that induces double strand breaks) can result in promiscuous off-target mutagenesis because the DNA-gRNA coupling can tolerate mismatches. http://www.nature.com/nbt/journal/vaop/ncurrent/full/nbt.2623.html?WT.mc_id=TWT_NatureBiotech

It will be of interest to see if nickase (single strand break inducing Cas9) variants of CRISPR result in higher fidelity because nicks should be repaired unless there is enough mutant template for that to be knocked in by recombination instead.
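The worry raised by the update is easy to see in a small Python sketch: if Cas9 tolerates a few mismatches between the guide and the DNA, then any PAM-adjacent site that differs from the guide at only one or two positions is a potential off-target. The scan below assumes an SpCas9-style NGG PAM, checks one strand only, uses a shortened hypothetical guide and treats every mismatch equally. These are all simplifications; it illustrates the idea rather than predicting real off-targets.

```python
def mismatches(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def candidate_sites(guide, genome, max_mismatches=3):
    """
    Scan one strand of `genome` for guide-length windows followed by an NGG PAM and
    report (start, site, mismatch count) for those within the mismatch tolerance.
    """
    n = len(guide)
    hits = []
    for i in range(len(genome) - n - 2):
        site, pam = genome[i:i + n], genome[i + n:i + n + 3]
        if pam[1:] == "GG":
            mm = mismatches(guide, site)
            if mm <= max_mismatches:
                hits.append((i, site, mm))
    return hits

guide = "ACGTACGTAC"  # shortened, hypothetical guide for readability
genome = "TT" + "ACGTACGTAC" + "AGG" + "CC" + "ACGTTCGTAC" + "TGG" + "AAA"
for hit in candidate_sites(guide, genome):
    print(hit)
# (2, 'ACGTACGTAC', 0)   the intended target
# (17, 'ACGTTCGTAC', 1)  a potential off-target site with one mismatch
```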