Category Archives: Science Primers

Introductions to areas of science and essays about scientific topics.

A very simple introduction to machine learning…

Machine learning, quite simply, refers to the use of algorithms that learn from data to make predictions and solve problems – given measurements of the properties of things, predict some outcome of interest.

The nature of problems

Properties can take a variety of forms. Sometimes they are categories – whether a person has a disease, or has a higher education qualification, for instance – and sometimes they are continuous, like height or weight. The same goes for outcomes: you could ask whether someone is rich or poor based on their qualifications, or predict a continuous outcome, such as overall income.

The outcomes that are subject to prediction are called response variables, and the measurements or groupings that describe samples are called features. Machine learning is, in essence, the practice of building models that predict the response using rules fit on the features.

Problems where the response variable consists of groups are called classification problems, because things are being classified; where the outcome is continuous, it is a regression problem. The kinds of classification or regression rules available depend on the nature of the data – some methods, like linear regression, can handle both categorical and continuous features, whereas others require everything to be converted to continuous variables first. There is a bewildering array of models that can be applied.
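As a toy illustration of the two kinds of problem (all numbers made up for the sake of example, using numpy), here is a minimal sketch: a 1-nearest-neighbour rule classifying a categorical response, and a least-squares line predicting a continuous one.

```python
import numpy as np

# Classification: predict a category. A 1-nearest-neighbour rule assigns
# the label of the closest training example (illustrative numbers only).
heights = np.array([150, 160, 170, 180, 190])        # feature (continuous)
labels = ["short", "short", "tall", "tall", "tall"]  # response (categorical)

def classify(height):
    return labels[int(np.argmin(np.abs(heights - height)))]

# Regression: predict a continuous response (income, arbitrary units)
# from a continuous feature (years of education) with a fitted line.
years_education = np.array([10, 12, 14, 16, 18])
income = np.array([20, 25, 30, 38, 45])
slope, intercept = np.polyfit(years_education, income, 1)

def predict_income(years):
    return slope * years + intercept

print(classify(166))       # a class label
print(predict_income(15))  # a number on a continuous scale
```

The same features can serve either kind of problem; what changes is whether the model's output is a group label or a number.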

The underlying assumption

The effectiveness of machine learning is predicated on there being non-random patterns in the data with respect to the features being measured – if there is no link between the features you’ve chosen to build a model with and the problem you are trying to address, you will see lousy performance.

Measuring performance and overfitting

For classification problems, performance is often measured in terms of accuracy – and if there are only two classes, measures like sensitivity and specificity are often used as well. In regression problems, performance is usually measured by how much of the variation in the outcome the model explains, and by how large the error is between the predicted and actual outcomes.
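These metrics are simple to compute directly. A sketch with made-up predictions: counting up the confusion matrix for a two-class problem, then computing R-squared (variation explained) for a regression.

```python
# Two-class metrics from raw predictions (1 = has disease, 0 = healthy).
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

tp = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))  # true positives
tn = sum(a == 0 and p == 0 for a, p in zip(actual, predicted))  # true negatives
fp = sum(a == 0 and p == 1 for a, p in zip(actual, predicted))  # false positives
fn = sum(a == 1 and p == 0 for a, p in zip(actual, predicted))  # false negatives

accuracy    = (tp + tn) / len(actual)  # fraction of all calls that are right
sensitivity = tp / (tp + fn)           # fraction of true cases detected
specificity = tn / (tn + fp)           # fraction of healthy cases cleared

# Regression: R-squared, the fraction of variation in the outcome explained.
y      = [2.0, 4.0, 6.0, 8.0]
y_pred = [2.5, 3.5, 6.5, 7.5]
mean_y = sum(y) / len(y)
ss_res = sum((a - p) ** 2 for a, p in zip(y, y_pred))  # unexplained error
ss_tot = sum((a - mean_y) ** 2 for a in y)             # total variation
r_squared = 1 - ss_res / ss_tot

print(accuracy, sensitivity, specificity, r_squared)
```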

It is important that performance be measured in a way that minimises overfitting. Overfitting occurs when a model fits your data too well, learning quirks and inconsistencies present within your dataset that do not hold outside it. This can give you a spurious measure of how good your model is, making it look better than it actually is.
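A quick way to see overfitting in action is to fit an overly flexible model to noisy data. In this illustrative sketch (synthetic data, not any particular analysis), a degree-7 polynomial passes through every noisy training point, so its training error is essentially zero, yet it does worse on fresh data drawn from the same straight-line process.

```python
import numpy as np

rng = np.random.default_rng(0)

# The true relationship is a straight line; measurements carry a little noise.
x_train = np.linspace(0.0, 1.0, 8)
y_train = 2 * x_train + rng.normal(0, 0.1, 8)
x_test = np.linspace(0.05, 0.95, 8)          # fresh data, same process
y_test = 2 * x_test + rng.normal(0, 0.1, 8)

def mse(coeffs, x, y):
    # Mean squared error of a polynomial model on a dataset.
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

simple = np.polyfit(x_train, y_train, 1)   # matches the true structure
overfit = np.polyfit(x_train, y_train, 7)  # memorises every noisy point

# The overfit model looks "perfect" on its own data but stumbles on new data.
print("simple :", mse(simple, x_train, y_train), mse(simple, x_test, y_test))
print("overfit:", mse(overfit, x_train, y_train), mse(overfit, x_test, y_test))
```

Judged on training error alone, the degree-7 model would appear to be the better one; the held-out points reveal otherwise.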

To account for this, it is ideal to train on one portion of the data and test on another that the training process never saw – either by leaving out a portion at the outset, or by using an altogether independent dataset. A related approach is cross-validation: a certain percentage of samples is held out, the model is trained on the remaining data, and the held-out portion is used as a test set; this is repeated over and over to get more accurate estimates of performance.
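The cross-validation loop described above can be sketched in a few lines (a hand-rolled k-fold split on synthetic data, numpy only): shuffle the sample indices, split them into k folds, and for each fold train on the other k−1 and record the error on the held-out fold.

```python
import numpy as np

def cross_validate(x, y, k=5, seed=42):
    # Shuffle sample indices and split them into k folds; for each fold,
    # train on the remaining k-1 folds and measure error on the held-out one.
    indices = np.arange(len(x))
    np.random.default_rng(seed).shuffle(indices)
    folds = np.array_split(indices, k)
    errors = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        coeffs = np.polyfit(x[train_idx], y[train_idx], 1)  # train
        preds = np.polyval(coeffs, x[test_idx])             # predict held-out
        errors.append(float(np.mean((preds - y[test_idx]) ** 2)))
    return errors  # one error estimate per held-out fold

# Synthetic data: a straight line plus noise.
x = np.linspace(0, 10, 50)
y = 3 * x + 1 + np.random.default_rng(0).normal(0, 1, 50)
fold_errors = cross_validate(x, y, k=5)
print(sum(fold_errors) / len(fold_errors))  # averaged performance estimate
```

Because every sample gets a turn in the test set, the averaged error is a far less flattering (and far more honest) estimate than the error on the training data itself.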

Where can I learn more about machine learning?

There is a very nice introductory LinkedIn tech talk here

And here is a full set of lectures that are a treat


I might do a set of posts looking at very particular applications in the next few months; until then, feel free to knock yourself out.

A Bird’s Eye View of Cancer Research…

I often get asked what cancer is when people find out I do cancer research for a living, and the whys and the wherefores inevitably follow in conversation. The complexities of the disease often mirror the complexities of the bodies it plagues, so I decided it might be good to write a few things down that people could be pointed to, in an effort to make things a little more lucid, and also to serve as a compilation of resources people could delve into if they so desired. So here it is…

Omnis cellula e cellula

All known living organisms are made of cells. In most cases one cell on its own is an organism, obtaining food from the environment, breathing, growing, multiplying, and carrying out a whole assortment of other life processes that are of interest to biologists – but I digress.

Cancer is fundamentally a disease of organisms that are made of communities of cells – these cells too do all of the above, but the more complex varieties of multicellular organisms show specialisation – brains and lungs and guts and genitals – all for the same purposes: to obtain energy, to stay alive and to reproduce. Not that there is some grand predisposition to doing this with foresight; merely that those that stumble onto what is passable in the examinations posed by the brutal machinations of nature get to go on.

Starting from one cell, multicellular organisms expand to have several tissues – a whole paraphernalia of different types of cells. Some cells die out; some stay on but don’t divide unless required to make more cells; some double in number until they stumble across a battery of conditions – other cells, or molecules that tell them to stop growing, or that induce them to multiply in a frenzy but then stop when balance has been restored.

This behaviour of cells – and of the organs they form, the organ systems those organs comprise, and the organisms themselves – emerges from interactions between the environment (other organisms, and things that appear prosaic but are nonetheless significant influences on the fates of organisms) and the delightfully messy workings of the molecules within cells, starting with the DNA that contains all the genetic material of a cell, which simply must be passed on from generation to generation to facilitate survival of the species.

Dawkins, in his magnum opus “The Selfish Gene”, popularised the notion of organisms serving as mere means for the continued survival of genes, but I shall go one step further and put it to you that it isn’t just individual genes that are selfish – it is entire genomes (the collection of all the DNA in a cell or organism). Promiscuity is favoured by nature when rivals are less promiscuous and when it is possible to brutally stifle threats posed by the competition, and cancer is essentially this taken to a horrifying extreme: genomes find ways to be malignantly selfish at the expense of the other cells that are also integral to the survival of the organism, bringing with it much suffering and often death… for alas! Cancer cells lack the foresight to know that their fate is tied inextricably to that of their hosts.

Of DNA, RNA and proteins…

Genomes are made of DNA, and this is the medium by which information is passed on from generation to generation – except in the case of a few viruses, though we don’t really consider viruses to be living things because they cannot reproduce on their own. To put that information to use, cells orchestrate a variety of biochemical functions through the medium of RNA, which is transcribed from DNA and can itself affect other RNA or, for some genes, get translated into proteins. RNA, proteins and DNA then interact with the outside environment, other DNA and other proteins to give rise to the chemistry that leads to the formation of cells and organisms. This complexity is described elsewhere on the blog [1] [2] [3], and I will add links and notes at the end so you’ll be able to explore further if it interests you.

On the road towards understanding cancer… we discover cancer-causing genes have cellular origins.

Towards the dusk of the first decade of the 1900s, Peyton Rous made a discovery that would shape perspectives on cancer research for a long, long time to come – he found that cancers in chickens could be transmitted akin to other known viral diseases, and the causative agent he isolated came to be known as Rous Sarcoma Virus (RSV). This immediately led people to believe that cancer was essentially a viral disease, until Michael Bishop and Harold Varmus made a revolutionary discovery: the genes that caused cancer had a cellular origin. At some point the virus had, by sheer accident, incorporated this gene while packaging itself up, and had no problem transmitting it because it could spread before the chickens died of cancer.

They got their hands on two strains of RSV, one with cancer-causing properties and one without, so the viral gene that caused cancer could be identified as the one present in the former but not the latter. They made an RNA probe for this gene and used it to look for similar genes in cells, and found that it bound to the DNA of cells from different species – and in every species it happened to be found in the same place in the genome (genomes are made of DNA). The gene that had been rampaging through chickens when transmitted by the virus was also found in normal cells, and when switched on at very high levels by lots of replicating viruses, it caused cells to lose control of how they grew and take part in a frenetic orgy of cellular division. It was, in effect, an oncogene.

They then looked elsewhere for genes with similar properties and began to identify more and more, which in cancer cells carried defects in their DNA that affected the function of the proteins they produced and consequently launched cells into rapidly increasing their numbers. On the other hand, people began to find proteins which, when altered, lost the ability to stop cells from dividing uncontrollably; the genes encoding them came to be known as tumour suppressors [4], and one of the breakthroughs in learning about their function came with the elucidation of Knudson’s “two-hit” hypothesis…

Knudson’s “two-hit” theory of cancer causation came about with the discovery of tumour suppressors, specifically a gene called Rb1.

Looking at genes implicated as either oncogenes or tumour suppressors people began to stumble across changes in DNA sequence compared to the sequences in normal cells that affected the proteins the genes made. These mutations, as we describe them, established the foundations of cancer genetics and genomics.

The hallmarks of cancer.

As people found more and more oncogenes and tumour suppressors they wondered what cancer was, for here was a set of diseases stemming from various tissue types that all appeared to consist of cells that grew rapidly and in many cases spread through the body, albeit at different rates. A seminal paper by Hanahan and Weinberg defined the hallmarks of cancer – traits that any disease that qualifies as a cancer *must* possess…

The Hallmarks of Cancer and examples of potential therapeutic methods to target them. From Hanahan and Weinberg’s seminal paper ‘The Hallmarks of Cancer: The Next Generation’, link in references.

These include the ability to multiply abnormally without requiring external signals (and, if external signals that stop normal cells from dividing are present, to pay them no heed); failure to undergo apoptosis (a form of cell death); immortalisation, the ability to divide indefinitely in permissive conditions, unlike normal cells; angiogenesis, where tumours induce the formation of blood vessels so they can establish a blood supply; and finally, and most critically, metastasis – the ability to spread and colonise other sites in the body, which is incidentally what is thought to kill patients.
There was a recent update to the classical set of hallmarks described above, and three new hallmarks entered the fray: altered cell metabolism (changes in how cancer cells generate energy), inflammation (a molecular response to wounds and injuries in normal tissue that goes wrong and promotes cancer metastasis) and genome instability (being prone to mutations and other structural aberrations that generate the complexities of cancer genomes, which I describe later) [5]…

More than just mutations, and how we came to find out…

People who had been studying families of proteins called transcription factors noticed that they could fundamentally alter the way the RNA of different genes in the genome was produced – they could alter when it was produced, and how much of it. This could in turn affect other proteins that controlled how cells divided and interacted with the environment. In some cases transcription factors were themselves found to be mutated – such as p53, also known as the guardian of the cell because of its critical role in stopping errant cells from progressing to cancer [6], which explains why so many tumours modify p53 function to get round it. With this came the idea that cancers would exhibit differences in gene expression relative to normal tissue, and that this would contribute to the achievement of the hallmarks of cancer. See [7] for a description of microarrays and case studies of how looking at expression profiles helped in understanding cancers.

People also realised that you could get changes in expression patterns independent of transcription factors… Cancer cells are host to a wide variety of large structural distortions of the genome: compared to normal cells, which have 46 chromosomes, cancer cells accumulate a variety of aberrations, ranging from small deletions and duplications of bits of chromosomes to gains and losses of whole chromosomes – or, in some cases, whole sets of chromosomes (the ubiquitously used cancer cell line HeLa has 88 chromosomes).

Changes in the copy number of genes through these aberrations can also affect gene expression profiles. Finally, people who had been studying epigenetic processes – by which cells inherit expression patterns and modify them through modifications of DNA without changes in sequence, such as DNA methylation and hydroxymethylation, or of the histones around which DNA is wound [8] – began to develop techniques to characterise epigenetic changes in tumours. We therefore ended up with a whole panel of analyses we could perform on tumours.

The Cancer Genome Atlas

While several other tumour sequencing projects were underway at the likes of the Wellcome Trust Sanger Institute, The Cancer Genome Atlas really set things in motion with its project on the deadly brain cancer glioblastoma multiforme, combining sequencing for mutations, microarray analysis for copy number variation and gene expression, and microarray analysis for DNA methylation. They essentially found that there are four groups of glioblastomas based on patterns of gene expression, and were able to correlate these with different cellular origins and different ways in which those expression profiles were achieved [9].


Heatmap from one of the first TCGA papers that profiled glioblastoma. Four subtypes of glioma were described based on expression profiles. They were able to classify samples in an independent dataset and also versions of glioblastomas grown in mice (technically called xenografts).

They also found a subset of tumours with very high levels of methylation, driven by mutations in a gene called IDH1 that lead to excess methylation as a direct consequence – as demonstrated in a landmark paper in the journal Nature, where the authors introduced mutant IDH1 into astrocytes and showed it induced high methylation levels like those seen in gliomas of that type…

Since then, the TCGA has published multiple studies on breast cancer, ovarian cancer, renal clear cell carcinoma, lung adenocarcinoma and colorectal cancer, to name a few [10], and is collecting data for more tumour types; from the 12 datasets available so far, a pan-cancer analysis was released recently [11]. This has inspired the formation of the even more ambitious International Cancer Genome Consortium (ICGC), which aims to widely expand the scope and scale of the approaches taken by the TCGA, profiling the most striking molecular features of tumours and relating them to clinical information.

Things are not quite so simple – the problem of heterogeneity.

One would think that by understanding the make-up of tumours and figuring out what drives them, it would be easy to target altered genes, proteins and pathways with specific drugs and achieve cures. However, tumours have such unstable genomes, and often contain so many cells by the time they are detected, that they are capable of a great degree of evolution, which may manifest as resistance to the drugs used to target them. Indeed, studies from the last couple of years began to reveal two features of tumours. The first is that they evolve through time, and that therapy often influences which properties dominate a patient’s disease at relapse: either the most dominant clone before treatment acquired new mutations and evolved to resist therapy, or a previously minor clone expanded.

At around the same time, evidence was found to strongly support the notion that cancers not only evolve in a linear manner but can also evolve in parallel. Both of those studies were carried out on leukaemias, and the same was later shown to be true of solid tumours. An analysis of a kidney tumour looking at multiple regions of the primary tumour and its metastases (new outgrowths derived from cells that had spread from the primary tumour) highlighted branched evolution, and also showed that different parts of the primary tumour had different gene expression patterns associated with survival [12].

Intratumour heterogeneity is extensive in kidney cancer and sequencing multiple biopsies enabled reconstruction of evolutionary patterns.

A recent study looking at multiple regions from a series of glioblastomas found the same striking pattern – within a single tumour there was evidence of all four of the glioma expression subtypes discovered by the TCGA [13]. These studies have made one thing abundantly clear: understanding and classifying tumours into subgroups may be of limited utility when the range of evolutionary invention achieved by tumours permits them to acquire different patterns of alterations, often in response to therapy. We will learn a lot, indubitably, from large-scale analyses of the kinds already being carried out, and only recently have we begun to uncover what processes might contribute to the formation of mutations, and how to find mutational signatures in all the data being generated by sequencing tumour after tumour after tumour. Chances are we will soon have a comprehensive collection of molecular profiles to tuck into, on an unprecedented scale.

Finding chinks and causes for optimism…

Another way weaknesses might be found in tumours involves approaches based on what we call synthetic lethality and collateral lethality. Synthetic lethality is when one gene becomes essential only because another gene has been mutated or otherwise altered; it was not essential while the other gene was intact. A classic example is PARP inhibition. PARP is an enzyme that repairs breaks in DNA, but it can be dispensed with if the BRCA genes are intact. A significant proportion of ovarian and breast cancers, especially those that run in families, show a characteristic loss of BRCA1 or BRCA2, and this makes them especially vulnerable to the blockade of PARP.

Explanation of synthetic lethality to PARP inhibitors. People with one copy of BRCA lost in their normal cells can present with tumours that have lost both copies. Normal cells have BRCA to compensate for the loss of PARP activity but cancer cells don’t, so blocking PARP can kill the cancer cells while sparing normal cells.

One way of finding synthetic lethal interactions is to combine knockdown experiments – in which small RNA sequences are introduced into cells to block and degrade the RNA of target genes so that no protein is made from it – with mutation, expression and other “-omics” data, as we call them. Even without -omics data in attendance, knockdown experiments themselves can reveal genes that come to be required specifically in cancer, and in that way we can identify targets against which chemists can develop specific drugs. Of course, the other approach is to target things that appear to transcend cancers; examples include targeting CD47, a protein that cancer cells appear to universally express to avoid being destroyed by the immune system, or using drugs that target fundamentally common features of tumours, such as DNA methylation [14].

Finally, knowledge of tumour evolution itself may be employed to find weaknesses. As described elsewhere on the blog, the mechanism of resistance may itself predispose tumours to new weaknesses, and exploiting this could be as simple as withdrawing the drug (like suddenly letting go of jammed brakes while the driver still has a foot hard on the accelerator to compensate), as discussed here [15].

Dealing with heterogeneity may also be possible by bypassing resistance mechanisms – in which cancers find alternative pathways to get where they need to be to survive and expand – by hitting points that are altered in cancer but for which tumours have no known alternative routes. Indeed, this has been shown experimentally by targeting Myc, which when activated is a potent oncogene, and by targeting BIM in glioblastoma, where tumour cells evolve resistance by finding other ways to prevent BIM from being turned on when the pathway they usually use is blocked with drugs.

For all the complexities of cancer, there might still be ways in which we will figure out how to target and attack tumours successfully, and one of the keys to that, I think, is the sense of scale and community that marks cancer research projects these days. I think the enduring impact of the Human Genome Project [16] was not just the sequencing of the human genome, but ensuring that the data was openly accessible to anybody who wanted to use it for their research or look at it out of general interest. The TCGA and ICGC have put similar policies in place to govern how their data is accessed, and by allowing researchers to integrate the research they do locally with data that would never have been generated without big projects like these, it is possible to achieve so much more. And maybe we’ll figure out what cancers are, and what they can and can’t be, someday, soon…

Links to posts on the workings of DNA, RNA and proteins. [1]  (Central Dogma of Molecular Biology)

[2] (Transcription)

[3] (Includes descriptions of microRNAs, RNA species that can block other RNAs from being converted to protein as per the central dogma, thereby affecting gene expression).

Oncogenes, Tumour Suppressors and the Hallmarks of Cancer
[4] (Contains an exposition of the two-hit theory of cancer causation and links to further material on the topic, as well as nuance about how some tumour suppressors behave differently)

[5] (Updated version of the classic paper; The Hallmarks of Cancer by Hanahan and Weinberg. May be paywalled).

[6] (Explains how p53 functions in different contexts, basically)

Understanding Cancers and large scale analyses
[7] (Nature Scitable article on gene expression and cancer).

[8] (An introduction to epigenetic processes).

[9] (The TCGA glioblastoma paper that documented four expression subtypes).

[10] (a list of papers from The Cancer Genome Atlas, most papers are openly accessible and readable should you want to, but they’re science heavy and really written for people with an in-depth understanding of cancer research. You may be able to search for materials and commentary related to them to get a more popular perspective, also look for TCGA press releases on the same site).

[11] (Blogpost on the Nature blogging network containing links to further material, commentary and analysis papers from the TCGA pan-cancer project.)

Heterogeneity, synthetic lethality and the future

[12] (Blogpost on intratumour heterogeneity)

[13] (Paper documenting intratumour heterogeneity in glioblastoma multiforme)

[14] (Blogpost discussing broad spectrum effects of low doses of the DNA methylation blocker decitabine) .

[15] (Blogpost exploring reports of how cancer cells evolved resistance to a drug and how this could be used to target the tumour).

[16] (Long blogpost on the Human Genome Project, written by yours truly and therefore recommended if you are a glutton for a few thousand words more having worked your way through to the end of the article).

That’s all from me now!


CRISPR-based genome editing – the future of molecular biology.

A vector for genome editing using CRISPR contains the aforementioned elements. We put in an insert targeting our gene of interest using the enzyme BbsI, which cuts DNA at precise sites marked by the red arrows in the magnified area. We then anneal two single-stranded DNA oligos and insert them into the region, and when we express the vector in mammalian cells it produces Cas9 (or its variants), needed for editing, as well as the RNA that guides Cas9 to our gene of interest.

It isn’t often that I make such seemingly outlandish claims in the title of a blog-post, but this particular technology, CRISPR-based genome editing, I believe deserves the hype.

CRISPR stands for Clustered Regularly Interspaced Short Palindromic Repeats – what that really means is a long sequence of DNA made of similar repeating sequences grouped together. These repeats form part of a bacterial immune system that evolves to recognise viruses and inactivate viral DNA.

Basically, the system cuts viral DNA into fragments and puts those fragments into the middle of a CRISPR sequence, which we call a CRISPR array; the bacteria then begin to pump out large amounts of RNA encoding this sequence. Other enzymes in the CRISPR pathway use the RNA carrying the viral insert to find matching viral DNA and trigger its degradation – which is neat.

What is neater is that we’re now able to use this in a modified form to edit genes in mammalian cells (including human cells). Instead of putting a viral insert into the CRISPR array, we can put in a 30-base-pair stretch of DNA that targets our gene of interest and express RNA from it using some pretty simple genetic engineering (see the figure for a description).

When expressed in cells, this can result in Cas9 inducing double-strand breaks, or a nick in one strand, depending on which variant of Cas9 we use, in the gene of interest. The really cool thing is that we can then either wreck the gene using non-homologous end joining or spike in a mutant sequence using homologous recombination – the two methods by which double-strand breaks are repaired. In non-homologous end joining, the region containing the double-strand break is trimmed and the sequences on either side are brought together, resulting in the likely loss of the sequence containing the break. In homologous recombination, if we provide a template carrying a mutation near the site of the break, flanked by sequences that match the DNA on either side of the break well enough, that template gets incorporated in place of the broken region instead (see figure 2).

Mammalian double-strand break (DSB) repair. DNA DSBs are predominantly repaired by either non-homologous end-joining (NHEJ) or homologous recombination (HR) [156]. NHEJ rejoins broken DNA ends, and often requires trimming of DNA before ligation can occur. This can lead to loss of genetic information. In NHEJ, the broken DNA ends are bound by the KU70/KU80 heterodimer, which orchestrates the activity of other repair factors and recruits the phosphatidylinositol 3-kinase DNA-PKcs/PRKDC. DNA-PKcs phosphorylates and activates additional repair proteins, including itself and the ARTEMIS/DCLRE1C nuclease. ARTEMIS and/or the heterotrimeric MRE11-RAD50-NBN complex are thought to process the DNA ends prior to ligation. The DNA ends are joined by the activity of polymerases and a ligase complex consisting of XRCC4, XLF/NHEJ1 and LIG4. In contrast to NHEJ, HR is an error-free repair pathway that utilizes a sister chromatid, present only in the S- or G2-cell cycle phase, as template to repair DSBs. HR is initiated by DNA end-resection, involving the MRE11-RAD50-NBN complex and several accessory factors including nucleases. The MRE11-RAD50-NBN complex also recruits the phosphatidylinositol 3-kinase ATM, which phosphorylates histone H2AX and many other proteins involved in repair and checkpoint signaling. Single-stranded DNA generated by DNA end-resection is bound by RPA, which is subsequently replaced by RAD51. RAD51 promotes the invasion of the single-stranded DNA to a homologous double-stranded DNA template, leading to synapsis, novel DNA synthesis, strand dissolution, and repair. Many more proteins are involved in both NHEJ and HR, which are not depicted here for clarity, as they are not referred to in the main text. For details, see recent reviews by Lieber [81] and San Filippo et al. [80].

 Lans et al. Epigenetics & Chromatin 2012 5:4   doi:10.1186/1756-8935-5-4
This technique has earned some rave reviews recently, and one of the really cool things is that you can express multiple guide RNAs (the RNA containing the CRISPR array sequence that guides Cas9) from a single vector – it has even been used to generate mice carrying multiple mutations in one step, which I think is remarkably cool.

Generating mutant versions of genes is one of the things I will be doing in the next few months of my PhD, and I must say these are very exciting times indeed to be a molecular biologist – there is something very exquisite about being able to not just turn off genes temporarily but to delete them or to edit their sequence permanently – it is the sort of stuff that enables us to ask what specific mutations in a gene mean for the development of cancer and I see it contributing to some very good research in the years to come.

Ankur “Exploreable” Chakravarthy.

Update – There’s a new paper out in Nature Biotechnology showing that conventional CRISPR-based systems (the version that induces double strand breaks) can result in promiscuous off-target mutagenesis because the DNA-gRNA coupling can tolerate mismatches.

It will be of interest to see if nickase (single strand break inducing Cas9) variants of CRISPR result in higher fidelity because nicks should be repaired unless there is enough mutant template for that to be knocked in by recombination instead.

A window into acquired resistance to targeted therapies – through the eyes of a MEK inhibitor.


Cancer cells, like all other cells in multicellular organisms, are often dependent on inputs from the environment outside the cell for signals that drive growth and survival, among other things. Since abnormal growth and a failure to die like normal cells do are hallmarks of cancer, it makes sense to try and block signalling pathways that contribute to these features using drugs specific to the proteins in these pathways. This is the fundamental premise behind targeted therapies.

The work I’m going to focus on in this article has to do with the inhibition of MEK, a kinase that connects external signals to a set of transcription factors promoting the expression of genes related to cell growth and survival.

The MAP kinase signalling pathway. External growth factor receptor kinases are coupled to the transcription of genes promoting cell survival and proliferation by means of the Ras-Raf-MEK-ERK signalling cascade. Kinases are proteins that add an inorganic phosphate group to other proteins or, in some cases, lipids.

This pathway is of interest because B-Raf carries a very particular mutation, V600E, in the majority of malignant melanomas – something so characteristic of the disease that there is a drug specifically targeting the mutant version of the protein, though responses are often short-lived because cells learn to get round the blockade. Interestingly, this pathway is also involved in colorectal cancer, often through the same mutations or a mutation in the protein that acts upstream of B-Raf, called K-Ras, which is a very potent oncogene.

Simon Cook and his group at the Babraham Institute set out to figure out how cancer cells that depend on this pathway come to acquire resistance to a MEK inhibitor (MEK being downstream of both B-Raf and K-Ras). To do this, they took two colorectal cancer cell lines with a B-Raf mutation and two with a K-Ras mutation and cultured them in the presence of ever-increasing concentrations of a MEK inhibitor, AZD6244. Fundamentally, they found that resistant cells acquired amplifications in the copy number of B-Raf or K-Ras, which served to maintain the same signal intensity downstream of MEK in the presence of the drug as in its absence.

Resistance to a MEK inhibitor in a B-Raf mutant cell line is explained by amplification of B-Raf at the DNA level, which is reflected at the protein level and in increased activation of ERK1/2 (P-ERK1/2). The graph at the bottom right shows that knocking down B-Raf levels using RNA interference reverses resistance to the MEK inhibitor.

They also found that in the K-Ras mutant cell lines, amplification of K-Ras was to blame for the phenotype. The problem is that Ras is a pain in the rear to develop drugs against; with B-Raf, though, inhibitors are available, and it should be possible to resensitise resistant cells to the MEK inhibitor by hitting the pathway at both points.

The trouble with this MEK inhibitor is that the major response is cell cycle arrest rather than cell death, so it would be sensible to see whether, with ERK activation already dampened by MEK inhibition, the proportion of cancer cells that actually fuck off and die – instead of just waiting for the drug to wear off – can be increased.

Apoptosis is a process mediated by a balance of pro-apoptotic and anti-apoptotic proteins; when a threshold of dominance by pro-apoptotic proteins is reached, it sets off a cascade of signalling events that leads to the destruction of the cell. One of the reasons cell cycle arrest is favoured over cell death is the hyperactivity of anti-apoptotic proteins such as Bcl-2. Proteins carrying a BH3 domain can bind to and disable Bcl-2, rendering cells much more susceptible to apoptosis if the MEK pathway is hit, so they looked at combining a drug that mimics the BH3 domain (a BH3 mimetic) with the MEK inhibitor, and promptly found that apoptosis was greatly enhanced and the emergence of resistance delayed.

Finally, of course, it is worth considering that in malignant melanoma, drug holidays – periods where treatment is not administered – have been shown to reverse resistance, well in line with what we’d expect. The overexpression of oncogenes is associated with oncogene-induced senescence, and whatever maintains the activity of the pathway in the presence of the drug might overactivate it when the drug is taken away – like a car crashing if you suddenly took the brakes off while the pedal was still pressed as hard as when driving with the brakes on. This means that drug resistance is favoured only because of the selective pressure imposed by the drug, and may actually be a detriment in competition with drug-sensitive cells when the drug is absent. Take the drug away when the most dependent cells form the bulk of the tumour, and they really do crash dramatically.

That’s all from me until next time.


PS – additional papers…



Why torties are…well…torties.

Hi there,

Tortoiseshell cats are quite unique, and there is an epigenetic reason for the characteristic speckling and mottling of their coats. Tortoiseshell cats also happen to be almost exclusively female, owing to the nature of the epigenetic process at work.

The coat colours of cats are determined by genes localised to the X chromosome, and just like humans, female cats have two X chromosomes while male cats have one. Here’s the neat trick – female torties express different copies of the coat-colour gene in different cells, at random – a hallmark of a process called X-inactivation.


X-inactivation is actually extremely important in preventing the overexpression of X-linked genes in female animals – to maintain levels of gene expression similar to those seen in males, only one copy of the X chromosome must be expressed, despite there being two. X-chromosome inactivation is how cells achieve this.

The process itself involves the production of long non-coding RNAs from a gene cluster on the X chromosome that we call the X-inactivation centre, the most famous of which is Xist. Xist spreads along the X chromosome it inactivates and recruits other factors that lock in epigenetic modifications, such as DNA methylation, to silence expression of that particular X chromosome permanently. X-inactivation does not happen in males, though, because two X-inactivation centres have to “kiss” to trigger the cascade of changes that shuts all but one X chromosome down – and males have only one.

Here’s another cool thing – if you read Xist from the opposite strand, it encodes a long non-coding RNA called Tsix (clever name – see if you can figure it out), and where Tsix is expressed, Xist isn’t. You can even induce inactivation of other chromosomes by introducing copies that have been spiked with the X-inactivation centre.

Here’s a cool video explaining the process…

It is all very intriguing because, as far as we know, the X-inactivation centre is composed entirely of non-coding RNAs. There’s more to biology than proteins (prejudice declared – I don’t like protein work much). You can read a lot of the backstory and study details at Scitable here.


Oh, trouble, stemness is thy name.

OK, just a few days ago, three major papers turned up that put a previously controversial idea about the way tumours are organised on a formidable footing, at least in cases of melanomas, gliomas and colorectal tumours in a mouse model.

We have long been looking at how many cells are required to transmit a tumour from one mouse to another, isogenic (genetically identical) mouse. Experiments in the fifties and early sixties led to the observation that you had to inject a certain number of cells – far greater than one – to induce a tumour in half the injected animals. However, there was also the fact that cancers were known to be clonal (originating from one cell) to contend with.

Putting the two ideas together led to the simple conclusion that not all transplanted tumour cells could induce the disease – you had to introduce so many tumour cells that at least one of them would be a cancer “stem cell” capable of inducing the tumour. This led to the formulation of the Cancer Stem Cell Hypothesis – that within a tumour there is a small subpopulation of cells that can, if not eliminated completely, lead to a return of the disease. The concept had become enshrined in the principles of radiotherapy, but of course questions were raised in the molecular biology community, especially with respect to the presence of such cells in solid tumours. Sceptics took it upon themselves to point out that immune rejection or loss of viability could just as well explain why not all cells successfully transmitted tumours.

The onus, then, was on proponents of the CSC hypothesis to show that there was such a thing as a resident subpopulation of cancer stem cells that led to recurrences. That evidence, it would appear, has finally come up, and in exquisite detail.

Cédric Blanpain and his group used a model of chemical carcinogenesis in mice in which a chemical called DMBA is used to initiate a skin tumour and a substance called TPA is used to encourage the growth of the tumour at the site treated with DMBA. Eventually, that process results in a benign tumour called a papilloma which, after persistent TPA treatment, turns malignant. Papillomas are composed of a mass of terminally differentiated cells and a population of undifferentiated, basal-epithelial cells; the number of the former stays fairly constant throughout, while the latter expands.

Design of the tamoxifen (TAM)-inducible construct used to visualise clonal expansion. As you can see, expression of the Yellow Fluorescent Protein is conditional on treatment with tamoxifen, and individual basal cells can easily be identified. From Blanpain et al. (see text for link).

To address whether there was a distinct CSC population maintaining the tumour, they used a very clever genetic engineering method to label individual cells – a construct that expressed Yellow Fluorescent Protein when the cells carrying it were exposed to tamoxifen. They found that some of the labelled cells in the basal compartment expanded and also formed the non-basal, differentiated structures in the tumour. In effect, they demonstrated that a subpopulation of cells could give rise to the entire heterogeneity of tissue in the tumour – clear evidence for stemness in a population of tumour cells. They additionally found that the stem cell compartment in the tumours gave rise to progenitors capable of multiple fates, and finally to differentiated cells.
They were able to use microscopy to evaluate quantitatively what was happening, and found that CSCs were dividing twice a day – extremely quickly compared to progenitors, which divided once in two days. Also, three weeks after tamoxifen treatment, only 20% of the originally YFP-expressing cells were still expressing it (the basal cells were selectively labelled initially by using a basal-specific promoter to drive gene expression).

Then of course we’ve had two more major studies. Parada and his group, again publishing in Nature, showed that something similar was going on in glioblastoma multiforme, which has an abysmal prognosis. His group found that chemotherapy could wipe out most of the non-stem compartment, but there was always a recurrence, driven by a stem-like compartment of cells that escaped the effects of said chemotherapy. I find the work in question all the more intriguing because they started off with the hypothesis that these tumours are actually driven by modified versions of human adult neural stem cells in the subventricular zone of the brain.

That hypothesis was of course well grounded in evidence – they had identified a combination of mutations that always resulted in tumours (using conditional knock-outs of genes whose mutation was known to be essential in glioblastoma), and had been able to track those initiating cells to that location. Exploiting this, they used a transgenic construct encoding Green Fluorescent Protein and a thymidine kinase (TK) driven by a Nestin promoter, which is active in adult neural stem cells (but not differentiated ones). TK-expressing cells die in the presence of ganciclovir if they are cycling, and this allows them to be ablated.

Temozolomide treatment kills proliferating cells, and this leads to subsequent repopulation, mediated by previously quiescent CSCs kicking off into division. Reference – Parada et al. (see text for link).

When they treated glioblastomas in these mice, they found that temozolomide (TMZ), an agent used to kill glioblastoma cells in clinical practice, was able to kill the rapidly proliferating mass of cells. Combining it with ganciclovir enhanced cell kill by eliminating some of the stem cell compartment as well, but recurrence, they postulated, was inevitable, because they found most of these cells to be quiescent (resting) and thus immune to drugs that hit proliferating cells (like TMZ). Having eliminated proliferating cells, though, the question was what would drive repopulation – would this be a random occurrence, with any of the remaining cells kicking off, or would the postulated CSCs be responsible? Their reasoning was that the uptake of CldU and IdU – thymidine analogues incorporated only by proliferating cells – would be strongly biased towards the GFP-expressing CSC compartment if that were the source of repopulation following a pulse of TMZ treatment, and they promptly found that this was indeed the case.

Treatment of mice carrying the Nestin-TK-GFP construct with ganciclovir results in dramatically improved survival that is not seen in mice without the construct treated with the drug, or in mice carrying the construct but not treated with it. In some mice surviving after 10 weeks of treatment, tumours had shrunk into low-grade lesions following the elimination of the CSC compartment.

And then came their pièce de résistance – they showed that the only way to ensure long-term survival in glioblastoma-afflicted mice was to eliminate the stem cell compartment altogether with ganciclovir treatment. This led to massive survival benefits in GCV-treated mice, and in some cases their brains harboured only low-grade lesions, benign vestiges of what had been aggressive, malignant disease. A clincher, if ever there was one, for the CSC hypothesis.

The third paper, which I will not bore you with now, can be read here; it features Hans Clevers and his group’s work showing that colorectal adenomas also depend on a CSC population that phenocopies normal intestinal crypt stem cells in terms of known surface markers. The close similarity of these CSCs to their normal adult counterparts worries me greatly, for it opens up the possibility that CSCs may be normal adult stem cells going rogue following a series of hits. Any therapy designed to hit them must also take care not to hit the normal stem cell compartments in exposed tissues.

Of course, there are still other solid tumours for which the CSC model has to be verified, but given that we have exquisite approaches like those described above to tease these cells out, evidence either way shouldn’t be long in coming.

What this means for therapy…
We will clearly have to make eliminating cancer stem cells a priority in dealing with cancer; while radiotherapy has the inherent potential to eliminate these cells, most chemotherapy probably does not, and even with the advent of targeted therapies we will need to ensure that we don’t leave stem cells behind.
Some thinking has already headed this way, with the understanding that stem-like cells, as CSCs are otherwise known, may be sensitive to inhibitors of the Sonic hedgehog pathway, which is hyperactivated in them; there is some preclinical evidence suggesting this may be worth investigating…

That is all from me this time round.


Bisulfite Sequencing – Interrogating CpG Methylation at single base pair resolution.

Hi all, just a quick post here.

In previous posts, I have discussed methods of probing DNA methylation with methylation arrays and MeDIP-chip. Those methods are genome-wide but not quite whole-genome. The highest-resolution method available (and, not surprisingly, the most expensive and lowest-throughput) is a whole-genome method called bisulfite sequencing.

Bisulfite sequencing is used to map out DNA methylation at single base pair resolution. The technical challenge here is rather apparent – one must be able to differentiate unmethylated DNA from methylated DNA.

To do this, a process called bisulfite conversion is utilised. DNA is treated with sodium or potassium bisulfite, which converts unmethylated cytosines to uracil while leaving methylated cytosines (5-methylcytosines) intact. During subsequent amplification and sequencing, uracils read as thymines. This makes identifying methylation changes relatively straightforward using bog-standard bioinformatics.
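The chemistry above can be mimicked in silico. Here is a minimal sketch (the function name and inputs are my own, purely for illustration) that takes a sequence and a set of methylated cytosine positions and returns what the read would look like after bisulfite treatment and amplification:

```python
def bisulfite_convert(seq, methylated_positions):
    """Return the sequence as it would read after bisulfite treatment:
    unmethylated C -> U -> (reads as) T; methylated C is protected."""
    out = []
    for i, base in enumerate(seq):
        if base == "C" and i not in methylated_positions:
            out.append("T")  # unmethylated C deaminated to uracil, sequenced as T
        else:
            out.append(base)  # methylated C and all other bases unchanged
    return "".join(out)

# The C at position 2 is methylated and survives; the C at position 5 converts.
print(bisulfite_convert("AACGTCA", {2}))  # -> AACGTTA
```

Note that only the methylated cytosine remains a C in the output, which is exactly the signal the downstream analysis exploits.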

In effect, we are looking for the cytosines that have been converted to thymines through bisulfite-induced uracil intermediates. To do this, two versions of a reference genome are created in silico – one with all cytosines converted to thymines, and one with all guanines converted to adenines – and each read is aligned to both. For reads that align uniquely, every position where the reference carries a cytosine is then inspected: a thymine in the read indicates an unmethylated cytosine, while a retained cytosine indicates a methylated one.
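Once a read has been aligned, the per-base calling step is simple to sketch. The following toy function (my own illustration, not a real pipeline; real tools also handle the reverse strand via the G-to-A reference and ignore sequencing errors via quality filters) compares an aligned read against the reference and calls each reference cytosine:

```python
def call_methylation(reference, read):
    """Given a reference and a bisulfite read aligned to it (same length,
    same strand), return {position: True/False} for every reference C:
    True = methylated (C retained), False = unmethylated (C -> T)."""
    calls = {}
    for i, (ref_base, read_base) in enumerate(zip(reference, read)):
        if ref_base == "C":
            if read_base == "C":
                calls[i] = True   # protected from conversion: methylated
            elif read_base == "T":
                calls[i] = False  # converted via uracil: unmethylated
            # anything else would be a mismatch/sequencing error: no call
    return calls

ref  = "ACGTCGA"
read = "ACGTTGA"  # the C at position 4 was converted; the one at 1 was not
print(call_methylation(ref, read))  # -> {1: True, 4: False}
```

In practice this is done over millions of reads, and the fraction of reads calling each cytosine methylated gives a quantitative methylation level per site.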


Representation of methylation variation as identified by bisulfite sequencing a given locus. (Reference – )

The costs of this are extremely high, and one may question whether whole-genome methods are necessary when only a small number of genes are differentially expressed; I, for one, am of the opinion that it may be wiser to couple analysis of differentially expressed genes with amplification of targeted regions and bisulfite sequencing on a targeted basis, instead of a generalised whole-genome search.

Further reading.

[1] Targeted bisulfite sequencing reveals changes in DNA methylation associated with nuclear reprogramming. – example of targeted bisulfite sequencing.

[2] De novo quantitative bisulfite sequencing using the pyrosequencing technology. – Paper documenting the combination of bisulfite conversion with pyrosequencing, which is ideal for short read length sequencing. (cost per base is high)

[3] A good review of the technique, protocols and associated challenges may be found here –

That’s all from me for this post, a rather short one, I’m afraid.