
How HPV driven cancers get their mutations…

Hi there!

It’s been a long time since I last blogged, but that is because I’ve been swimming round in data, which has incidentally led to the findings that were published in this paper, which I will describe in this post.

HPV and the link to cancer.

HPVs (human papillomaviruses) are a family of viruses that infect keratinocytes (skin cells) that line the outside of the body and the inner cavities – some of them just cause warts (including genital warts) but some of them are capable of driving the formation of cancer. These types, which are called “high-risk” strains, are the ones that are targeted for prevention by HPV vaccines.

High-risk HPV strains differ from low-risk strains in terms of cancer-causing ability because of proteins they make during their life cycle. Cells need to be actively dividing to permit HPV replication, and to keep them dividing the virus uses two proteins, called E6 and E7, to block and degrade two proteins in human cells, called TP53 and pRb, which are two potent tumour suppressors (genes that prevent tumour formation).

Normally, E6 and E7 are only active for a brief while during the virus’ life cycle, which culminates in the production of more viruses that restart the cycle all over again, but before HPV driven cancers form something very strange happens; by complete accident the viral genome gets inserted and integrated into human DNA in infected cells, or infected cells get locked into a state where E6 and E7 are produced all the time. Suddenly you’ve got cells with TP53 and pRb off all the time, leaving behind cells that can grow abnormally. We see this when women have cervical scrapings looked at, and see “dysplastic” cells that have grown clumpy and abnormal.

However, these dysplastic cells are not cancerous – they haven’t acquired all the hallmarks of cancer. For this to happen there need to be additional changes to the DNA sequence (mutations) of the genes in dysplastic cells that can confer those properties. Tobacco smoke is a well-known example of something that causes mutations, but for quite a while it had been an open question as to where HPV-driven tumours got their mutations from.

Suspicions are aroused: could the APOBEC family of proteins be making these mutations? 

One of my major research interests is to see which genes are expressed more and which genes are turned off in HPV driven cancers. When defining a signature for these tumours I compared them to normal tissue and to HPV negative tumours that arise in the same tissue (while cervical cancers usually all tend to be HPV-driven, head and neck cancers can be caused either by HPV or by chronic tobacco and alcohol exposure), and one of the genes that I found expressed at high levels in HPV-positive tumours was APOBEC3B.

APOBEC3B is one of many proteins in the APOBEC family of cytosine deaminases. These act on either RNA or DNA when it is in a single-stranded state, and take part in the body’s immune response against viruses by messing up the viruses’ RNA/DNA. They work by changing cytosine, one of the four bases that make up DNA, to uracil (a base that is normally only found in RNA), which then gets converted to a thymine or a guanine (two other bases that make up DNA). If you get lots of these changes in viral DNA you fundamentally break the viruses so they can’t do any of the things they usually do, and it had been known for a while that you could find HPV with messed up DNA in precancerous lesions, with patterns of change associated with APOBEC proteins.

This led us to wonder if APOBEC proteins could end up accidentally changing human DNA just like they would change viral DNA, and therefore generate the DNA sequence changes necessary to cause cancer. At the same time as we started wondering this, a couple of papers came out showing that there were human cancers in which mutations looked like they were being generated by APOBEC enzymes, very likely APOBEC3B. (We could tell it was likely APOBEC3B because it is known to change cytosines that are preceded by a thymine and followed by guanine, adenine or thymine, so if the sequence was TCA, TCG or TCT it would be converted to TGA/TTA, TGG/TTG or TGT/TTT.) There is an alternative process that can also generate TCG->TGG/TTG mutations, so in order to specifically measure APOBEC activity we ended up using the others, which we referred to in the paper as TCW->TKW (where K = G or T and W = A or T).
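As a rough illustration of the TCW->TKW definition, here is a minimal Python sketch that checks whether a single-base substitution fits the pattern and works out what fraction of a list of mutations does. The function names and the (context, alt) input format are made up for this post and are not taken from the paper. The handling of mutations reported as a G on the reference strand (where the cytosine sits on the opposite strand) is an assumption about how one would usually do this, not a description of our actual pipeline.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def is_tcw_to_tkw(context, alt):
    """Check whether a substitution matches TCW->TKW (W = A or T, K = G or T).

    context: three bases on the reference strand, mutated base in the middle.
    alt: the base that replaces the middle base.
    """
    context = context.upper()
    alt = alt.upper()
    if len(context) != 3 or alt not in COMPLEMENT:
        raise ValueError("expected a 3-base context and a single A/C/G/T alt base")
    # Assumption: a mutated G means the cytosine is on the opposite strand,
    # so flip both the context and the alt base before testing the pattern.
    if context[1] == "G":
        context = reverse_complement(context)
        alt = COMPLEMENT[alt]
    return (
        context[0] == "T"
        and context[1] == "C"
        and context[2] in "AT"   # W: excludes TCG, which another process can mutate
        and alt in "GT"          # K
    )


def tcw_fraction(mutations):
    """Fraction of (context, alt) substitutions that are TCW->TKW."""
    if not mutations:
        return 0.0
    hits = sum(is_tcw_to_tkw(context, alt) for context, alt in mutations)
    return hits / len(mutations)
```

For example, `is_tcw_to_tkw("TCA", "T")` is true, `is_tcw_to_tkw("TCG", "T")` is false because TCG is deliberately left out, and `is_tcw_to_tkw("TGA", "A")` is true because it is the same TCA->TTA change read from the other strand.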

Those previous papers also noted that cervical cancers had lots of mutations that showed the APOBEC signature, but the question remained – was this down to it being the cervix? or was it down to these tumours being HPV+? We decided to take a look in head and neck cancers as well where we could compare HPV+ and HPV- tumours that arose in similar tissues to see if there was truly an association with HPV, and hence we did the work reported in the paper…

HPV positive tumours have a vastly higher fraction of mutations belonging to the APOBEC signature.

First, we looked at levels of APOBEC mutagenesis and at what proportion of all the mutations in tumours was attributable to it, using publicly available data for 40 HPV+ head and neck tumours and 253 HPV- head and neck tumours. To do this we used multiple approaches – including looking at TCW->TKW mutations, trying to break down all the mutations we see in these tumours into patterns of mutations, as was done by these people at the Sanger Institute, and also looking at enrichment for the TCW->TKW mutation pattern locally. All the approaches we used showed the same thing – HPV+ tumours had a vastly higher proportion of mutations most likely caused by APOBEC enzymes.

Figure 1: APOBEC mutations are highly enriched in HPV+ HNSCs

Multiple measures of APOBEC activity showed a strong association with HPV status but not age or smoking; APOBEC, age and smoking were the three processes we identified as driving the signatures using the Sanger Institute’s approach. The more the numbers are shifted to the right the stronger the association with the factor listed on the left. 

We found signatures previously associated with APOBEC, smoking and age, and showed that APOBEC activity was not associated with the latter two, which was as expected. Having identified an association with HPV driven tumours we wanted to know if this was a general antiviral response or something HPV specific…so we took a look at patterns of mutations in liver cancers caused by hepatitis B and C viruses and found no evidence for APOBEC mediated mutations being significantly enriched in these tumours.

Of drivers and passengers

Most tumours have hundreds or thousands of mutations, but only a few actively contribute to the acquisition and maintenance of the hallmarks of cancer. So, having initially identified high proportions of APOBEC-mediated mutations in HPV driven cancers when looking across the exome (all protein coding genes in general), we decided to ask if the enrichment we saw in all genes was also maintained when we restricted our searching to genes known previously to drive cancer or those that share features associated with drivers, like being mutated at a frequency greater than expected by chance. Our analyses confirmed that APOBEC-mediated mutations were again enriched in the HPV+ head and neck and cervical cancers compared to the HPV- HNSCs.


Differences between HPV negative HNSCC and HPV+ tumours (HNSCC and Cervical cancer) are maintained when looking at all protein-coding genes (whole exome) and likely driver mutations (MutSig).

Then we went on to look at which driver genes happened to be most mutated by APOBEC proteins, and found a gene called PIK3CA (one of the components of a protein complex called PI3 kinase) towards the very top of the list. PIK3CA has previously been reported as being vital to the sustenance of many HPV positive tumours in particular and head and neck cancers in general, and drugs are being developed to target it. Interestingly, we observed that in the HPV+ tumours 22/25 PIK3CA mutations recorded were of the APOBEC type, while this wasn’t the case for the HPV negative tumours.

This then led to yet another question – can the levels of APOBEC activity explain a preference for APOBEC mutations in HPV-positive tumours? Now for driver genes there are two things that may govern what kinds of mutations we see – how much of a growth advantage a mutation in a driver gene gives that cell and the mutation itself. My supervisor, Tim Fenton, who worked on PI3 kinases previously, knew that there were two regions in PI3 kinase amongst which mutations regularly occurred (one or the other) and then realised that one of them contained a TCW sequence that APOBEC proteins could act on while the other one did not.

The PIK3CA gene makes a protein called p110-alpha, and proteins have distinct elements in their structure, called domains. One region, called the helical domain, is often mutated at two TCW sequences, while the other region, called the kinase domain, is mutated at a site that is not a TCW sequence. Both kinds of mutation confer a similar growth advantage, and if you look across multiple tumour types, overall you tend to see a 50-50 split between the two. This enabled us to account for growth advantage and directly see if APOBEC activity, which we had already measured by looking at all protein-coding genes, and a preference for APOBEC-induced mutations in the helical domain, were linked.

Since PIK3CA is mutated in multiple types of cancers, I was able to grab some data from The Cancer Genome Atlas project and measure how strongly there was a skew towards acquiring helical domain mutations compared to the kinase domain mutations and just look at what APOBEC activity looked like in each of those types of tumours. The results were quite robust – the higher the APOBEC activity in a cancer type, the stronger the preference for helical domain mutations compared to kinase domain mutations.
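To make the "skew" a bit more concrete, here is a minimal Python sketch of how one might score the preference for helical over kinase domain hotspot mutations in a set of tumours. The hotspot names used (E542K and E545K for the helical domain, H1047R for the kinase domain) are the commonly cited PIK3CA hotspots and are my assumption here, since this post doesn't name them; the function name and input format are also invented for illustration.

```python
# Assumed hotspot definitions (not spelled out in the post itself).
HELICAL_HOTSPOTS = {"E542K", "E545K"}
KINASE_HOTSPOTS = {"H1047R"}


def helical_fraction(protein_changes):
    """Proportion of PIK3CA hotspot mutations that fall in the helical domain.

    protein_changes: iterable of amino-acid change strings, e.g. "E545K".
    Non-hotspot mutations are ignored. Returns None if there are no
    hotspot mutations at all, so callers can skip that tumour type.
    """
    changes = list(protein_changes)
    helical = sum(1 for change in changes if change in HELICAL_HOTSPOTS)
    kinase = sum(1 for change in changes if change in KINASE_HOTSPOTS)
    total = helical + kinase
    if total == 0:
        return None
    return helical / total
```

A value near 0.5 would correspond to the 50-50 split described above, while values closer to 1 indicate a preference for helical domain mutations. Computing this per tumour type and setting it against the median TCW->TKW fraction is the kind of comparison being described.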


Figure 3. A – as you move from left to right (tumour types are arranged from left to right based on median APOBEC activity), you see helical domain mutations (black bars) become strongly preferred compared to kinase domain mutations (yellow bars). B – plotting the median TCW->TKW fraction (APOBEC activity) against the proportion of PIK3CA mutations that are helical hotspot mutations shows a strong correlation.

So yeah, people had been wondering why in bladder cancers, for example, you saw such a strong preference for helical hotspot mutations – we basically addressed that long-standing question with these analyses.

Explanatory factors

So the one other thing we did was to look at what might be driving this process. Surprisingly, we found no correlation between how much E6 and E7 was being expressed in these tumours and APOBEC activity, or for that matter between APOBEC3B gene expression and APOBEC activity, but we did find a strong link with how many mutations in total these tumours had. The work has led us to hypothesize that it may be something like DNA damage induced by HPV, which generates the substrate for APOBEC3B to act upon, that drives the process.


Our work suggests that HPV positive tumours evolve along a trajectory where they incorporate HPV DNA into their own, leading to sustained E6/E7 expression, followed by APOBEC activity until a driver mutation occurs, after which clones expand and show the APOBEC signature when their DNA is sequenced. In HPV negative HNSCC, smoking and alcohol do this job. And if PIK3CA is the gene mutated, the HPV positive tumours tend to have helical domain hotspot mutations because APOBEC proteins are responsible for them…

Additional stuff

The journal did a Q&A that expands on some of the work in the paper, and you may find it here.

There is a press release from UCL here.



Genetics and Epigenetics combine to deadly effect in Pediatric Glioblastoma.


As I’ve mentioned several times before, cancer involves a combination of genetic and epigenetic changes that result in alterations of cell signalling and gene expression patterns that go on to establish the hallmarks of cancer.

The recent availability of a wealth of mutational data has led to the identification of recurrent mutations in multiple genes that read, write and modify chromatin marks, serving to highlight a direct link between cancer genetics and epigenetics [1]. In most of these cases though, we’ve observed changes in the enzymes that mediate epigenetic processes, ranging from mutation to amplification/overexpression/silencing.

Enzymes always have substrates, and perhaps not altogether surprisingly, a very recent discovery in paediatric glioma found mutations in the primary substrate of EZH2 [2], the enzyme that trimethylates lysine 27 on the tail of histone H3, a mark that represses gene expression.

Histone 3 is one of the four core histones that comprise the nucleosomes round which DNA is wrapped, and in humans there are two genes that produce a variant of Histone 3, called Histone 3.3, and the authors of this paper found mutations that converted lysine 27 in one of those genes to a methionine. The mutations were heterozygous (the other H3k27 was intact) and so they consequently went on to investigate what the mutation did to H3k27 epigenetic modifications in neurospheres they established from patient tissue, compared to adult glioma and a normal neural cell line of the same differentiation status.

They found global reductions in dimethylated and trimethylated H3k27, and subsequent evidence that this was not attributable to changes in the levels of Ezh2 and Suz12, which are components of the PRC2 complex that mediates H3k27 methylation and silencing. Much to their surprise, they note other H3 marks, including H3k27 acetylation, didn’t differ significantly, which is in direct opposition to findings in [3], even if that was based on mutant Ezh2 as opposed to a mutant histone.

They then had to confirm the reduction was actually due to the mutated histone variant, and to do this they expressed the mutant gene in 293T cells at very low levels, and also established cell lines that contained another previously known histone 3 mutation, albeit not at lysine 27 (K27), and also a version of H3.1 that had a K27 mutation. They again found dramatic reductions in H3k27me3 levels, and were able to validate these results in human astrocyte cultures and in murine embryonic fibroblasts, suggesting a tissue independent mechanism of reduction of H3k27me3 levels was at work.

The introduction of H3.1 and H3.3 K27 mutants is associated with global reductions in H3k27me3 and me2 levels by western blotting and fluorescence microscopy. Transfection experiments in MEFs reveal reductions associated solely with K27 mutants, and also that the reduction is gradual (bar graph and fluorescence micrographs at the bottom).

They then carried out ChIP-seq and gene expression experiments to understand what altered levels of H3k27me3 did to the gene expression profiles of the mutant cell lines they had developed. By doing ChIP-seq on both Ezh2 and H3k27me3, they observed that there were no significant reductions in Ezh2 peaks, but that there were regions where the local levels of H3k27me3 were higher than in normal neural stem cells. They confirmed this using ChIP-qPCR on two other patient samples, and also found that the Ezh2 and H3k27me3 peaks (indicating maximum binding/concentration of whichever one they’d done ChIP on) overlapped strongly in the mutant cells but not in normal neural stem cells, suggesting enrichment in those areas might be down to the mutant histones trapping Ezh2.

In order to confirm Ezh2 was actually co-localising more with the mutant histone than the wild-type histone, they pulled all three down with specific antibodies to check if more Ezh2 was pulled down with the mutant histone, and indeed, this is exactly what they found.

Genes that had gained H3k27me3 specifically in mutant glioma cell lines were found to be associated with H3k4me3 as well, marking what are commonly known as “bivalent” genes, which have both activating and repressive chromatin marks, are poised to swing either way, and are responsible for driving tissues to mature and differentiate [4]. They found using RNA-seq that the expression of these genes was far lower than in normal stem cells. They do admit that adult neuronal stem cells are far from the ideal controls to use, and that transfecting proper neurons with mutant histones would help consolidate findings further.

Finally, they conclude that the lack of histone mutations in adult glioma might have to do with the context in which paediatric and adult gliomas develop, with the former being in context of a developing brain very early in life.

So, yeah, I think that is a cool paper.


[1] http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3396881/
[2] http://genesdev.cshlp.org/content/27/9/985.long
[3] http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2926606/
[4] http://www.ncbi.nlm.nih.gov/pubmed/16630819

Ankur Chakravarthy.

CRISPR based genome editing – the future of molecular biology.

A vector for genome editing using CRISPR contains the elements shown. To put in an insert targeting our gene of interest, we use the enzyme BbsI, which cuts DNA at precise sites marked by the red arrows in the magnified area. We then take two single-stranded DNA oligos and insert them into the region, and when we express the vector in mammalian cells it produces Cas9 or its variants, needed for editing, as well as the RNA that guides Cas9 to our gene of interest.

It isn’t often that I make such seemingly outlandish claims in the title of a blog-post, but this particular technology, CRISPR-based genome editing, I believe deserves the hype.

CRISPR stands for Clustered Regularly Interspaced Short Palindromic Repeats – what that really means is a long sequence of DNA made of similar repeating sequences grouped together. In bacteria, they serve as part of an immune system that evolves to recognise viruses and inactivate viral DNA.

Basically, the system cuts up viral DNA into fragments and puts those fragments into the middle of a CRISPR sequence, which we call a CRISPR array, and bacteria then begin to pump out large amounts of RNA encoding this sequence. Other enzymes in the CRISPR pathway go on to find viral DNA based on the CRISPR with the viral DNA insert and then trigger its degradation – which is neat.

What is neater is that we’re now able to use this in a modified form to edit genes in mammalian cells (including human cells). Instead of putting in a viral insert into the CRISPR array, we can put in a 30-base-pairs long stretch of DNA that targets our genes of interest and express RNA from this using some pretty simple genetic engineering (See figure for a description please).

When expressed in cells this can then result in Cas9 inducing double strand breaks, or a nick in one strand, in the gene of interest, based on what variant of Cas9 we use. The really cool thing about this is we can then either wreck the gene using non-homologous end joining or spike in a mutant sequence using homologous recombination, which are two methods by which double strand breaks are repaired. In non-homologous end joining the region with the double strand break is cut out and the sequences on either side are brought together, resulting in the likely loss of the sequence containing the break. If we instead supply a template that carries a mutation near the site of the break, along with sequences of DNA that match the sequences on either side of the break well enough, it gets incorporated in place of the broken region (see figure 2).

Mammalian double-strand break (DSB) repair. DNA DSBs are predominantly repaired by either non-homologous end-joining (NHEJ) or homologous recombination (HR) [156]. NHEJ rejoins broken DNA ends, and often requires trimming of DNA before ligation can occur. This can lead to loss of genetic information. In NHEJ, the broken DNA ends are bound by the KU70/KU80 heterodimer, which orchestrates the activity of other repair factors and recruits the phosphatidylinositol 3-kinase DNA-PKcs/PRKDC. DNA-PKcs phosphorylates and activates additional repair proteins, including itself and the ARTEMIS/DCLRE1C nuclease. ARTEMIS and/or the heterotrimeric MRE11-RAD50-NBN complex are thought to process the DNA ends prior to ligation. The DNA ends are joined by the activity of polymerases and a ligase complex consisting of XRCC4, XLF/NHEJ1 and LIG4. In contrast to NHEJ, HR is an error-free repair pathway that utilizes a sister chromatid, present only in the S- or G2-cell cycle phase, as template to repair DSBs. HR is initiated by DNA end-resection, involving the MRE11-RAD50-NBN complex and several accessory factors including nucleases. The MRE11-RAD50-NBN complex also recruits the phosphatidylinositol 3-kinase ATM, which phosphorylates histone H2AX and many other proteins involved in repair and checkpoint signaling. Single-stranded DNA generated by DNA end-resection is bound by RPA, which is subsequently replaced by RAD51. RAD51 promotes the invasion of the single-stranded DNA to a homologous double-stranded DNA template, leading to synapsis, novel DNA synthesis, strand dissolution, and repair. Many more proteins are involved in both NHEJ and HR, which are not depicted here for clarity, as they are not referred to in the main text. For details, see recent reviews by Lieber [81] and San Filippo et al. [80].

Lans et al. Epigenetics & Chromatin 2012 5:4 doi:10.1186/1756-8935-5-4

This technique has earned some rave reviews recently, and one of the really cool things is you can express multiple guide RNAs (gRNAs, the RNAs carrying the targeting sequence that guides Cas9) from a single vector. It’s even been used to generate mice carrying multiple mutations in one step – which I think is remarkably cool (http://www.sciencedirect.com/science/article/pii/S0092867413004674).

Generating mutant versions of genes is one of the things I will be doing in the next few months of my PhD, and I must say these are very exciting times indeed to be a molecular biologist – there is something very exquisite about being able to not just turn off genes temporarily but to delete them or to edit their sequence permanently – it is the sort of stuff that enables us to ask what specific mutations in a gene mean for the development of cancer and I see it contributing to some very good research in the years to come.

Ankur “Exploreable” Chakravarthy.

Update – There’s a new paper out in Nature Biotechnology showing that conventional CRISPR-based systems (the version that induces double strand breaks) can result in promiscuous off-target mutagenesis because the DNA-gRNA coupling can tolerate mismatches. http://www.nature.com/nbt/journal/vaop/ncurrent/full/nbt.2623.html?WT.mc_id=TWT_NatureBiotech
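To give a feel for what "tolerating mismatches" means in practice, here is a minimal Python sketch that slides a guide sequence along a stretch of DNA and reports every position that matches with at most a chosen number of mismatches. This is a toy illustration, not the method used in the paper: the function names are invented, it only looks at one strand, and it ignores any other sequence requirements a real Cas9 target site would have.

```python
def count_mismatches(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return sum(x != y for x, y in zip(a, b))


def candidate_sites(guide, sequence, max_mismatches):
    """Find windows in `sequence` within `max_mismatches` of `guide`.

    Returns a list of (start, window, mismatches) tuples, with 0-based
    start positions, in the order they occur along the sequence.
    """
    guide = guide.upper()
    sequence = sequence.upper()
    size = len(guide)
    hits = []
    for start in range(len(sequence) - size + 1):
        window = sequence[start:start + size]
        distance = count_mismatches(guide, window)
        if distance <= max_mismatches:
            hits.append((start, window, distance))
    return hits
```

With `max_mismatches=0` only the intended site comes back; raising it shows how quickly near-matches elsewhere in a sequence start to look like potential targets, which is the worry with off-target cutting.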

It will be of interest to see if nickase (single strand break inducing Cas9) variants of CRISPR result in higher fidelity because nicks should be repaired unless there is enough mutant template for that to be knocked in by recombination instead.