On the difficulty of curing cancers.

In my experience, conspiracy theories about supposedly suppressed cures for cancer abound on the internet, and these usually take the form of “there is money for big pharma in suppressing cures” or “there is too much money in cancer research to permit the development of cures”. There are multiple problems not only with these notions but also with the very idea of a single cure for cancer.

To be fair, we already cure a significant proportion of cancers, usually through surgery or radiotherapy, and by at least some definitions these would qualify as single cures. It is mostly in the setting of metastatic disease that challenges remain: chemotherapy, while curative for some cancer patients (or many, for certain cancer types), may at best be palliative for others, and it is in this area that we have had a string of disappointing failures in cancer therapy.

What I am going to do next is examine some of the claims above in more detail and introduce (some of) the relevant findings from the field.

The claim that “there is money for big pharma in suppressing cures”.

Quite simply, the problem we have with cancer drugs, including old ones that are off-patent and many of the new targeted ones, is that they all seem to work for a while and then eventually fail, for biological reasons that I will explain later. Some of these drugs are also not approved by NICE in the UK, for instance, because the benefit of the limited time for which they work is deemed not to justify the cost to the healthcare system. This suggests that developing drugs that are not effective enough is a massive commercial disincentive for pharma companies.

Second, dead patients don’t make profits for the pharma companies involved; if anything, pharma companies would depend on drugs that keep patients alive for long periods of time. That it is so difficult to find drugs that can reliably do this actually points to an underlying biological problem, the same one that makes the notion of a single cure outside surgery or radiotherapy so flawed.

Third, there are instances where some cancers can be targeted effectively, with a good probability of cure. Even where these often-curative therapies were developed in academia, pharma companies have been more than willing to invest in them and commercialise the platform. This suggests that, should the option arise, even big pharmaceutical companies will invest in cures, not least because cures are significantly more likely to be profitable once approved by socialised healthcare systems: their greater benefit relative to cost means they can expect approval even with a significantly higher price tag than a cheaper but much less effective drug.


An example of this is CAR T-cell therapy for some B-cell leukaemias and lymphomas, which effectively eliminated the cancer in about half of the patients treated, a population whose disease had stopped responding to, or had never responded to, every prior therapy ( https://www.cancer.gov/news-events/cancer-currents-blog/2017/yescarta-fda-lymphoma ).

The problem with the notion that there is a single cure for cancer.

First, it is imperative to understand what cancer is. Far from being a single disease, it is a collection of diseases that share similar hallmarks, namely: the ability to multiply without requiring external growth signals; the ability to ignore the external signals that stop normal cells from dividing; the failure to undergo apoptosis (a form of cell death); immortalisation, the ability to divide indefinitely under permissive conditions, unlike normal cells; angiogenesis, in which tumours induce the formation of blood vessels so they can establish a blood supply; and finally, and most critically, metastasis, the ability to spread through the body and colonise other sites, which is thought to be what kills patients.

Recent additions to this set include altered cell metabolism (changes in how cancer cells generate energy); inflammation (a molecular response to wounds and injuries in normal cells that goes wrong and promotes cancer metastasis); and genome instability (a proneness to mutations and other structural aberrations that generate the complexity of cancer genomes). (Reference – http://www.cell.com/cell/fulltext/S0092-8674(11)00127-9 )

More to the point, different cancer patients acquire these hallmarks through distinct mutations or epigenetic changes, to the point that two people’s cancers may be entirely different, as the genomic profiling efforts of The Cancer Genome Atlas have described well (list of TCGA publications here: https://cancergenome.nih.gov/publications ). This is akin to how two different infections may be caused by two different germs altogether, and what works to treat one may not work for the other. This is fundamentally incompatible with one cure for all cancers, just as you cannot have one drug for all microbial infections.

To summarise –
*because* there are many ways to make a cancer, and *because* there is much variation both within and across individual cancers, it is nigh on impossible to have a single cure for cancer outside surgery and radiotherapy. As a case in point, even cancer vaccine development now seeks to exploit this variation, by tailoring vaccines individually to the antigenic mutations present in each patient (https://www.nature.com/articles/nature23093).


The problem of intratumour heterogeneity

Differences between cancers in different patients are just one part of the problem when it comes to the notion of a single cure for cancer. The other problem is intratumour heterogeneity: differences between the cells within a single tumour. Because cancers are inherently genomically unstable, new mutations and other alterations keep arising in the descendants of the original cancer cell as the tumour grows. As a consequence, many of these genomic alterations are found only in subsets of the cells within a tumour.
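To make the clonal versus subclonal distinction concrete, here is a toy simulation in Python. It is only a sketch under simplifying assumptions of my own: every cell divides once per generation, no cell dies, and each daughter cell picks up one brand-new mutation with a fixed, hypothetical probability mutation_prob. Real tumours are far messier. Mutation 0 stands for the founding event in the original cancer cell.

```python
import random


def grow_tumour(generations, mutation_prob, seed=0):
    """Toy tumour growth: every cell divides once per generation (no cell death),
    and each daughter independently gains one brand-new mutation with
    probability `mutation_prob` (a hypothetical parameter).
    Returns a list of cells, each represented as a frozenset of mutation IDs."""
    rng = random.Random(seed)
    cells = [frozenset({0})]  # mutation 0: the founding event in the original cancer cell
    next_id = 1
    for _ in range(generations):
        daughters = []
        for cell in cells:
            for _ in range(2):
                mutations = cell
                if rng.random() < mutation_prob:
                    # New mutation: inherited only by this daughter's descendants.
                    mutations = cell | {next_id}
                    next_id += 1
                daughters.append(mutations)
        cells = daughters
    return cells


def mutation_fractions(cells):
    """Fraction of cells in the tumour carrying each mutation."""
    counts = {}
    for cell in cells:
        for m in cell:
            counts[m] = counts.get(m, 0) + 1
    return {m: c / len(cells) for m, c in counts.items()}


cells = grow_tumour(generations=10, mutation_prob=0.1)
fractions = mutation_fractions(cells)
clonal = sorted(m for m, f in fractions.items() if f == 1.0)
subclonal = sorted(m for m, f in fractions.items() if f < 1.0)
print(f"{len(cells)} cells, clonal mutations: {clonal}, subclonal mutations: {len(subclonal)}")
```

Only the founding mutation ends up in every cell. Every mutation acquired later is confined to the descendants of the cell it arose in, which is why a drug aimed at such an alteration can only ever reach part of the tumour.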

The implications of this are two-fold. First, if a cancer is treated with a drug that only targets an alteration found in a proportion of its cells, the drug will be ineffective against the rest of the tumour; as the sensitive cells are eliminated and the untargeted cells take over, the therapy will fail and the tumour will expand. One of the major issues with many of our current drugs is that they are very good at blocking their targets, but not all cancer cells can be targeted.
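The same logic can be sketched as a deterministic toy model in Python. The starting cell counts, kill fraction and growth factor below are hypothetical numbers chosen for illustration, not clinical estimates, and the model ignores resistance arising during treatment, spatial structure and the immune system.

```python
def simulate_therapy(sensitive, resistant, kill_fraction, growth_factor, cycles):
    """Deterministic toy model of a tumour under therapy.
    Each cycle the drug kills `kill_fraction` of the sensitive cells (resistant
    or untargeted cells are untouched), then all surviving cells multiply by
    `growth_factor`. Returns (sensitive, resistant) counts after each cycle."""
    history = []
    for _ in range(cycles):
        sensitive = sensitive * (1 - kill_fraction) * growth_factor
        resistant = resistant * growth_factor
        history.append((sensitive, resistant))
    return history


# Hypothetical starting point: a billion sensitive cells and a thousand resistant ones.
history = simulate_therapy(sensitive=1e9, resistant=1e3, kill_fraction=0.9, growth_factor=2, cycles=10)
for cycle, (s, r) in enumerate(history, start=1):
    print(f"cycle {cycle:2d}: total {s + r:14.0f} cells, resistant fraction {r / (s + r):.3f}")
```

In this run the tumour shrinks for several cycles, which is what a response looks like clinically, and then regrows almost entirely from cells the drug could never touch.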

Second, there is the problem of evolving resistance: if a small proportion of cells within the tumour carry other alterations that confer resistance to the drug, they take over the tumour as the drug kills off the sensitive cells, producing a tumour that is, in the end, resistant to the drug (and this is the major problem behind the failure of targeted therapies). Indeed, to quote McGranahan and Swanton ( http://www.cell.com/cell/pdf/S0092-8674(17)30066-1.pdf ):

“During the selection pressures of targeted therapies, parallel evolution driving polyclonal-acquired drug resistance has been frequently documented. For example, in 13 out of 16 patients with BRAF mutant melanomas with resistance to RAF inhibition, multiple parallel mechanisms of resistance were observed (Shi et al., 2014). Likewise, following EGFR monoclonal antibody therapy, multiple KRAS mutations have been observed in circulating free DNA (Bettegowda et al., 2014; Misale et al., 2012). One patient acquired a codon 12 KRAS, codon 61 KRAS, and a codon 61 NRAS mutation together with a BRAF codon 600 mutation following acquired resistance to EGFR monoclonal antibody therapy that were not detectable prior to therapy (Bettegowda et al., 2014). Following acquired resistance to a PI3K alpha inhibitor, Juric et al. (2015) found parallel evolution of six distinct PTEN aberrations across 10 metastatic sites on the background of a clonal single copy PTEN deletion, reminiscent of second hit tumor suppressor gene loss following an early clonal event witnessed in breast and renal cancers”.


This means that a given drug will often not be effective *enough* against an individual cancer to eradicate it completely, and that is the challenge we are constantly up against.


The Illusion of Merit and the Privilege of Caste – Why Affirmative Action is Justified.

Content note – discrimination, especially caste-based.

A barebones introduction to the caste system

India has a long history of casteist oppression, and the practice still appears to be rampant, with nearly 1 in 4 people across the country admitting to practising untouchability.

For anyone who is not familiar with the caste system: caste is an oppressive social force that imposes a hereditary social and professional hierarchy, with the caste traditionally associated with priests (the Brahmins) at the top of the social pecking order, and a group that was (and, importantly, still is) considered untouchable bearing the brunt of oppression from those higher up the caste hierarchy.

The correct label for the group oppressed by untouchability is Dalits, for that is the identity they have themselves assumed, rather than another, traditionally used derogatory term imposed on them by the upper-caste hegemony led by the Brahmins.

I believe I have a duty to talk about this by virtue of my upper-caste upbringing. The people in my family who remain religious still retain their caste identity, and it is something that even I, an atheist, will never be able to get rid of, simply because my name is so stereotypically upper-caste. This relationship between name, caste identity and privilege matters for reasons that will become apparent in subsequent paragraphs.

Before we move on – a bit about privilege.

The meaning of privilege, and common misconceptions

If you are already aware of the concept of privilege – feel free to skip this section.

Nowt gets people tangled up as much as the notion of privilege. Privilege is in effect the absence of a certain kind of oppression: if one is rich, one has class privilege, in that one does not have to face the issues that come from being poor; if one is white in a society where white people occupy positions of institutional power, one has white privilege, in that one does not have to struggle with the issues that come from being non-white.

Note that there is a reason I just brought up positions of power: being a majority in terms of population does not always guarantee being on the right side of the power balance, as the history of Apartheid in South Africa shows. More pertinently to this article, Brahmins are overrepresented in positions of power and yet make up only a small fraction of the population.

Brahmins comprise a minority that nonetheless comes out on top in terms of caste privilege.

(Figure from Outlook India, accessed here: http://www.outlookindia.com/article/brahmins-in-india/234783; figures represent percentages of each state’s population.)

There are many different forms of privilege that intersect, and groups defined by the intersection of multiple oppressed categories experience, on average, more discrimination as a consequence.

For example, people who are black, poor, disabled, women and transgender are collectively, on average, worse off because they lack white, class, able-bodied, male and cis privilege. Again, there is a very important reason for talking about groups: not every person within an oppressed group is going to experience the same detrimental outcomes.

Nor is every person outside that group going to experience better outcomes than those within it. However, when the groups are compared, one group will have worse outcomes than the other. This is just like how, in my field, you can have two different types of cancer, one that responds well to therapy and one that does not: some patients in the good-response group will still do badly, but members of the poor-response group have a greater probability of a bad response.

The existence of various forms of oppression almost trivially implies that, given the varied nature of discrimination and of the situations in which it happens, privilege isn’t a monolithic thing. For example, as a brown guy I do not have white privilege over a white woman, and yet I have male privilege over her. That she has white privilege does not nullify the fact that I have male privilege; both play out in different ways in different situations.

The point I am trying to make here is that it can be a crappy idea to engage in whataboutery about a different, parallel form of oppression in a brazen display of fondness for tangents. Parallels usually only matter when they converge on common detrimental outcomes or patterns of discrimination, often intersectionally.

Anyway, that said and done, I want to talk about a rather popular idea that, from what I have seen both around me and on various fora discussing social justice issues along the axis of caste, pervades upper-caste opposition to affirmative action; why some really good research shows that idea to be mistaken; and, in the process, why names matter. For those unacquainted with the term, quotas instituted through affirmative action are called “reservations” in Indian English parlance.

The idea of a meritocracy and a popular cultural reflection thereof

There are multiple caricatures of affirmative action, shared amongst opponents of reservations for Dalits, that conflate merit with achievement and quotas with incompetence. Below I offer a rather ridiculous specimen that I last saw shared amongst certain relatives.

“Wipro chairman Mr. Azim Prem ji’s comment on reservation: Good one..read on…. I think we should have job reservations in all the fields. I completely support the PM and all the politicians for promoting this. Let’s start the reservation with our cricket team. We should have 10 percent reservation for Muslims. 30 percent for OBC, SC /ST like that. Cricket rules should be modified accordingly. The boundary circle should be reduced for an SC/ST player. The four hit by an OBC player should be considered as a six and a six hit by a OBC player should be counted as 8 runs. An OBC player scoring 60 runs should be declared as a century. We should influence ICC and make rules so that the pace bowlers like Shoaib Akhtar should not bowl fast balls to our OBC player. Bowlers should bowl maximum speed of 80 kilometer per hour to an OBC player. Any delivery above this speed should be made illegal. Also we should have reservation in Olympics. In the 100 meters race, an OBC player should be given a gold medal if he runs 80 meters. There can be reservation in Government jobs also. Let’s recruit SC/ST and OBC pilots for aircrafts which are carrying the ministers and politicians (that can really help the country.. ) Ensure that only SC/ST and OBC doctors do the operations for the ministers and other politicians. (Another way of saving the country..) Let’s be creative and think of ways and means to guide INDIA forward… Let’s show the world that INDIA is a GREAT country. Let’s be proud of being an INDIAN.. *SHARE THIS if u are against 🚫 reservation .”

The key supposed “points” one gets from reading the above are, firstly, that people should start on a level playing field regardless of the ground realities of the oppression they face on account of various identities; secondly, that it is somehow ridiculous to have quotas with different requirements for different people; and thirdly, that people who gain access to opportunities through affirmative action are somehow incompetent.

There is also the brazenly casteist accusation that people from scheduled castes/tribes/other backward castes (all categories that experience casteist oppression and therefore have quotas under Indian policymaking) are incompetent at whatever they do, just because they have been afforded opportunities through reservations.

In the minds of reservation opponents, accomplishments against the backdrop of a supposedly level playing field are a reflection of merit, which in turn lends justification to the argument that reservations would be unnecessary if Dalits were not so incompetent.

This brings me to the crux of my post: to show evidence that this perception of merit is in fact an illusion that masks an inherently unfair game, in which continued casteist discrimination is passed off as merit. Before I talk about the evidence, though, I wish to make another point with a metaphor: the trope above tries to paint reducing the burden on some participants as fundamentally unfair on the rest, without acknowledging that some of those participants have already been beaten up, so they cannot compete on a level playing field.

A level playing field is only appropriate when people have a level starting point, and the evidence is clear that caste disproportionately tracks with worse outcomes. The analogy of a sporting contest is also nonsensical here, sensu stricto: cricket matches are manifestly *not* analogous to egalitarianism in Indian society. Promoting social equality is an axiomatic principle espoused in the Constitution of India, whereas equality of outcome in cricket matches is not held to be ideal in any axiomatic sense.

The private sector and evidence for casteism in a supposed meritocratic setting

One place where reservations do not apply, to the best of my knowledge, is the private sector in India. This, then, offers a fertile testing ground for the hypothesis that outcomes in the private sector are strictly a matter of merit. The thing about caste and religion in India is that names by themselves are very powerful markers of both, and this lends itself to some very powerful and important research.

If merit were all that mattered when applying for jobs, then the name itself (and, by proxy, caste and/or religion) should have no influence on outcomes. If you carry out the experiment across lots of recruiters, with lots of CVs identical in every way except for the caste or religion the name implicitly conveys, any differences you see can only be due to discrimination, since everything else has been controlled for.

So Professor Sukhadeo Thorat and colleagues carried out a study of whether names made a difference to interview call-ups from recruiters. Using a sample of up to 4808 CVs sent in response to 548 job postings, they found that CVs with SC/ST or Muslim names attracted remarkably fewer interview call-ups, despite every other detail on the CVs being similar to those with upper-caste names.

They found that applicants with Muslim names were called up at a rate two-thirds lower than upper-caste applicants, and those with SC/ST names at a rate a third lower: a difference larger than that between underqualified upper-caste CVs and adequately qualified Dalit CVs, and so big that it would happen fewer than 2-3 times out of every 100 times the experiment was done if there were no discrimination.
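To see what “would happen fewer than 2-3 times out of every 100 times the experiment was done if there were no discrimination” means in practice, here is a minimal permutation test in Python. The call-back counts are made-up, hypothetical numbers, not the study’s data, and a permutation test is simply one standard way of asking the question; I am not claiming it is the method the authors used. The idea: if names made no difference, the group labels would be arbitrary, so shuffling them should often reproduce a gap as large as the one observed.

```python
import random


def permutation_p_value(hits_a, n_a, hits_b, n_b, n_perm=2000, seed=1):
    """One-sided permutation test for a difference in call-back rates.
    Pools all outcomes, reshuffles the group labels n_perm times, and returns
    the fraction of shuffles whose gap (rate_a - rate_b) is at least as large
    as the observed one."""
    rng = random.Random(seed)
    total_hits = hits_a + hits_b
    outcomes = [1] * total_hits + [0] * (n_a + n_b - total_hits)
    observed_gap = hits_a / n_a - hits_b / n_b
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(outcomes)
        shuffled_a = sum(outcomes[:n_a])  # call-backs landing in "group A" purely by chance
        gap = shuffled_a / n_a - (total_hits - shuffled_a) / n_b
        if gap >= observed_gap:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_perm


# Hypothetical counts: 500 CVs per name group, identical except for the name.
p_gap = permutation_p_value(hits_a=50, n_a=500, hits_b=25, n_b=500)
p_no_gap = permutation_p_value(hits_a=50, n_a=500, hits_b=50, n_b=500)
print(f"p with a gap: {p_gap:.4f}, p with no gap: {p_no_gap:.4f}")
```

A small p-value (the fraction of shuffles producing a gap at least as large as the real one) says the observed gap would rarely arise if names had no influence; with no real gap, shuffles reproduce it easily.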

The study, in other words, revealed rampant discrimination at the very first stage of the job application process, in a supposedly meritocratic environment. That study is available here

In additional studies, they showed that being Dalit was associated with reduced access to work opportunities and promotions, as well as lower pay, despite equal qualifications, which is yet more testament to the fact that all the clamouring about merit is a convenient smokescreen that obscures discrimination and masks the brutal existence of caste as an oppressive force. That study, incidentally, is here.

Then there is also this study, whose abstract is fairly self-explanatory: http://www.epw.in/caste-and-economic-discrimination/where-path-leads.html

This study attempts to trace the differential pathways that dalit and non-dalit students from comparable elite educational backgrounds traverse in their journey from college to work. While the training they receive in the university world is quite comparable, dalit students lack many advantages that turn out to be crucial in shaping their employment outcomes. Dalit students support the affirmative action policy completely, which allows them to break their traditional marginality. Our findings suggest that social and cultural capital (the overlapping of caste, class, family background and networks) matter a great deal in the urban, highly skilled, formal and allegedly meritocratic private sector jobs, where hiring practices are less transparent than appear at first sight.

So the point is that even people who get into a great university still experience different career trajectories and challenges simply by virtue of caste privilege or the lack thereof, again in a supposed meritocracy. The lack of transparency, and the difficulty of addressing individual instances of discrimination, are why affirmative action is justified.

The influence of caste is measurable, as is the population size of oppressed groups, and reservations offer an easy and pragmatic way of acting on that quantified extent of discrimination, ensuring that the pre-existing bias against victims of caste discrimination is eliminated.

So if you, like me, have caste privilege, I suggest you’d be doing a good thing by not buying into ridiculous, casteist, anti-reservation rhetoric, and by talking to people who oppose reservations about the reality of caste discrimination, the illusion of merit, and why reservations continue to be an essential component in the battle against discrimination. There are other misconceptions regarding caste and reservations that I have seen, but they will be the subject of another post altogether.

On Randomness, Determinism, False Dichotomies and Cancer

Before I start – a short summary

[1] A recent paper attributed a large proportion of the variation in cancer incidence across different tissues to the number of stem cell divisions in them, and to stochastic errors in cell division.

[2] The paper grouped tumour types with known external causes as “deterministic” and those without as “stochastic”.

[3] I have seen people who are hostile to the notion of stochasticity in cancer postulate other deterministic factors, implicitly assuming that what appears stochastic is really a deterministic process with as-yet-undiscovered causes.

[4] Here I explain why processes with known causes are still stochastic, which leads to my gripe both with the misunderstanding that has permeated discussion of the paper and with the iffy notion, in the paper itself, of grouping tumours into stochastic and deterministic ones. My assertion is that even those cancers strongly driven by external carcinogens involve randomness/stochasticity.


Sooo, last week, a paper was published in the journal Science that linked the number of stem cell divisions in normal tissues to the incidence of cancer in those tissues. It turns out that the more stem cell divisions a tissue undergoes, the more prone it is to developing cancer across populations.

The paper can be found here, and wherever I quote without further reference, the quote is from this paper: http://www.sciencemag.org/content/347/6217/78

To quote, the abstract reads…


Some tissue types give rise to human cancers millions of times more often than other tissue types. Although this has been recognized for more than a century, it has never been explained. Here, we show that the lifetime risk of cancers of many different types is strongly correlated (0.81) with the total number of divisions of the normal self-renewing cells maintaining that tissue’s homeostasis. These results suggest that only a third of the variation in cancer risk among tissues is attributable to environmental factors or inherited predispositions. The majority is due to “bad luck,” that is, random mutations arising during DNA replication in normal, noncancerous stem cells. This is important not only for understanding the disease but also for designing strategies to limit the mortality it causes.

Much of the reaction to the paper I’ve seen in the cybersphere involves a fundamental misunderstanding of the processes that drive cancer: far too many people think that things cause cancers deterministically, and even the authors, in Figure 2 of the paper, group cancers into stochastic and deterministic ones, conveying the impression that some cancers are caused and others are due to chance. There are several good summaries for laypeople already on the web, ranging from the almost always excellent David Gorski’s post http://www.sciencebasedmedicine.org/is-cancer-due-mostly-to-bad-luck/ , to PZ Myers’ explanation of the paper http://freethoughtblogs.com/pharyngula/2015/01/03/cancer-bad-genes-or-bad-luck/ . David’s post in particular describes the trainwreck that the media’s misinterpretation of stochastic errors as “bad luck” has led to.

To summarise all of that: the paper says that differences in the incidence of cancers amongst different tissues can mostly be explained by the number of stem cell divisions, and that known environmental factors and genetic predisposition explain only about a third of why different tissues get cancers at different rates. The authors postulate that mutations accumulate with the number of stem cell divisions because of stochastic, or chance, errors in cell division.
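For what it’s worth, the step from a correlation of 0.81 to “only a third” appears to be the usual squared-correlation argument: a correlation coefficient r means a straight-line fit explains r² of the variance (on whatever scale the correlation was computed), so 0.81² ≈ 0.66 is explained by stem cell divisions and roughly a third is left over. The Python sketch below, using a few hypothetical data points, just checks that r² equals the fraction of variance explained by a least-squares line; it is my illustration, not the paper’s analysis.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def r_squared_of_line_fit(xs, ys):
    """Fraction of the variance in ys explained by an ordinary least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


xs = [1, 2, 3, 4, 5]  # hypothetical toy data
ys = [2, 4, 5, 4, 5]
print(pearson_r(xs, ys) ** 2, r_squared_of_line_fit(xs, ys))  # the two agree (up to rounding)

r = 0.81
print(f"explained: {r**2:.2f}, unexplained: {1 - r**2:.2f}")  # explained: 0.66, unexplained: 0.34
```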

This led David Colquhoun to note on Twitter that a lot of the opposition to this finding seemed to come from people who opposed the role of chance in driving cancers, and he is right about the amazing indignation of those reacting with hostility to the role of chance, for reasons I will explain shortly.



Additionally, some people posited that it couldn’t be chance, and that there must just be some undiscovered latent factor or factors; and so, in what passed for discourse amongst those who did protest too much, a dichotomy was set up between stochastic (assumed to have no cause) and deterministic (assumed to have causes).

Where the paper gets it right…

Coming to the paper itself, there are bits I like. It is quite elegant evidence for the role of stem cell divisions in driving the evolution of cancer. In some cases tumours can be latent for a long, long time before they present clinically; previous studies have reported a case of latency in lung cancer of up to two decades before the tumour showed up. It turns out you need multiple mutations to go from a normal cell to a cancerous cell, and cellular lineages that persist longer (and have more divisions) are obviously likelier to acquire the full complement of mutations.
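The “more divisions, more chances to collect every hit” argument can be written down as a toy multi-hit model. Assume, purely for illustration, that a lineage needs one mutation in each of k independent driver genes, and that each gene is hit with a small, hypothetical probability p per division. After n divisions a given gene has been hit with probability 1 - (1 - p)^n, so all k have been hit with probability (1 - (1 - p)^n)^k, which rises steeply with n. This sketch ignores repair, selection and cell death, and the numbers are not estimates for any real tissue.

```python
def prob_all_hits(divisions, hits_needed, p_per_division):
    """Toy multi-hit model (hypothetical; ignores repair, selection and cell death):
    probability that one lineage has at least one mutation in each of
    `hits_needed` independent driver genes after `divisions` divisions,
    when each gene is hit with probability `p_per_division` per division."""
    p_gene_hit = 1 - (1 - p_per_division) ** divisions
    return p_gene_hit ** hits_needed


for n in (10, 100, 1_000, 10_000):
    print(f"{n:>6} divisions: P(all 3 hits) = {prob_all_hits(n, 3, 1e-3):.2e}")
```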

What I disagree with in the paper is the authors lumping everything that is not attributable to external factors together as the product of stochastic errors in cell division.

Importantly, my gripe is not that they attribute it to a stochastic process because, as I shall explain, almost all mutations, even those with known causes, are still random. My problem is with them putting it down to cell division and DNA replication: it turns out there are loads of internal cellular processes, neither inherited nor the products of environmental factors, that can generate mutations. Of course, all of these are still stochastic, but phrasing everything as errors in cell division is something I find too vague.

Some of these internal processes produce age-related mutations, which arise from the spontaneous deamination of methylated cytosines at CpG dinucleotides (giving C to T changes) and by and large make up a mutational signature found ubiquitously across cancers of different types. http://www.sanger.ac.uk/about/press/2013/130814.html

Wondering what a mutational signature is? Well, DNA is made of 4 bases, and when you compare a tumour’s DNA with that of normal tissue to find the mutations (DNA sequence changes), you can also look at the sequence surrounding each mutated base. It turns out that certain processes generate mutations in characteristic sequence contexts; I’ve blogged about this earlier in the context of APOBEC enzymes, which accidentally mutate human DNA and can potentially cause cancer.
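As a rough illustration, this is how one might describe a single substitution by its sequence context in Python, using the common convention of reporting the mutated base as a pyrimidine (C or T) together with one neighbouring base on each side, e.g. T[C>T]A. The function name and input format are my own; real signature analyses work on whole catalogues of mutations called from sequencing data.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def pyrimidine_context(reference, position, alt):
    """Describe a single-base substitution as 5'-base[REF>ALT]3'-base with the
    reference base reported as a pyrimidine (C or T). `position` is 0-based and
    must have a neighbouring base on each side."""
    if not 0 < position < len(reference) - 1:
        raise ValueError("need one flanking base on each side")
    left, ref, right = reference[position - 1 : position + 2]
    if ref == alt:
        raise ValueError("not a substitution")
    if ref in "AG":
        # Purine reference: describe the same event on the opposite strand,
        # which means complementing every base and swapping the flanks.
        left, ref, alt, right = (
            right.translate(COMPLEMENT),
            ref.translate(COMPLEMENT),
            alt.translate(COMPLEMENT),
            left.translate(COMPLEMENT),
        )
    return f"{left}[{ref}>{alt}]{right}"


print(pyrimidine_context("GATCAG", 3, "T"))  # T[C>T]A
print(pyrimidine_context("CTGAT", 2, "A"))   # the same event seen from the other strand: T[C>T]A
```

Tallying these labels over every mutation in a tumour gives the kind of context profile in which signatures show up.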

So, the point I am trying to make here is that not every mutation that isn’t caused by inheritance or exposure to external carcinogens is down to errors in DNA replication during cell division, unless you use that term for all internally generated mutations, in which case it loses nuance. Indeed, there is extensive documentation of internal mutagenic processes, the repair pathways that deal with the lesions they produce, and so on and so forth… http://www.nature.com/nrg/journal/v15/n9/full/nrg3729.html

However, this is the important bit: even if the causes are known, whether external or internal, this does not mean the resulting mutations are deterministic. I reiterate: mutations with known causes are still random, and chance still plays a massive role.

How can things have causes and yet be random?!

The trouble is that the popular use of the word “random” differs from its scientifically and statistically rigorous usage. In common parlance, people often assume that random means “for no reason” or “with no cause”, as in “She turned up wearing a hat, totally random, blud”.

In science, it means “following a probability distribution”, where uncertainty is involved. This is why we talk of risks. Smoking causes lung cancer, there is a mutational signature associated with smoking, and many of the mechanisms by which cigarette smoke induces mutations are well known. However, the relationship between smoking and lung cancer is not deterministic, i.e., not everyone who smokes heavily gets lung cancer; what smoking does is increase one’s chances of getting lung cancer. This is why the relationship between lung cancer and smoking is stochastic (chance is involved).
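A minimal Python simulation makes the distinction between “has a cause” and “is deterministic” explicit. The two lifetime risks below are hypothetical numbers for illustration, not real estimates for smokers and non-smokers: exposure raises the probability of disease, but who actually gets it is still a draw from a probability distribution.

```python
import random

# Hypothetical lifetime risks, chosen for illustration only.
RISK_EXPOSED = 0.15
RISK_UNEXPOSED = 0.01


def count_cases(n_people, risk, rng):
    """Each person independently develops the disease with probability `risk`."""
    return sum(rng.random() < risk for _ in range(n_people))


rng = random.Random(2015)
n = 10_000
exposed_cases = count_cases(n, RISK_EXPOSED, rng)
unexposed_cases = count_cases(n, RISK_UNEXPOSED, rng)
print(f"exposed: {exposed_cases}/{n} developed the disease")
print(f"unexposed: {unexposed_cases}/{n} developed the disease")
```

Exposure clearly shifts the rate, yet most exposed simulated people never develop the disease and some unexposed ones do, which is what a stochastic cause looks like.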

Likewise, APOBEC enzymes can cause very specific mutations and have a specific mutational signature associated with them: they change C to T or C to G when there is a T before the C and an A or T after it, i.e., TCW -> TTW or TGW (where W stands for A or T). However, the mutations induced by APOBEC enzymes are still random.
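To make the motif concrete, here is a short Python sketch that finds TCW sites on one strand of a sequence and checks whether a given substitution fits the APOBEC pattern. It is a simplification of my own: it only scans the strand given (a real analysis would also consider the reverse complement, where the motif reads WGA), and a match says nothing about whether APOBEC actually caused a particular mutation.

```python
import re


def tcw_sites(seq):
    """0-based positions of the C in every TCW motif (W = A or T) on the strand given.
    The zero-width lookahead lets overlapping motifs all be found."""
    return [m.start() + 1 for m in re.finditer(r"(?=TC[AT])", seq)]


def is_apobec_like(seq, position, alt):
    """True if changing the base at `position` to `alt` is a C>T or C>G change in a TCW context."""
    return position in tcw_sites(seq) and alt in ("T", "G")


seq = "GGTCAGTCTTCC"
print(tcw_sites(seq))                 # [3, 7]
print(is_apobec_like(seq, 3, "T"))    # True: TCA -> TTA
print(is_apobec_like(seq, 10, "T"))   # False: TCC is not a TCW site
```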

How can this be the case?

Well, it turns out that for cancers to develop, you need mutations or epigenetic changes in certain types of genes, namely those that control the ability of a cell (or, in this case, a lineage of cells) to acquire the hallmarks of cancer (i.e., the ability to grow without external stimulation, to escape cell death, to escape the immune system, et cetera), and not all regions of the human genome harbour genes that can cause cancer. So there are, for instance, plenty of TCW sites in the genome that cannot affect cancer-associated genes even if they pick up an APOBEC-induced mutation.

So while we know that a molecule of APOBEC can act upon a TCW site to mutate it at a given rate, which TCW site in the genome gets mutated is down to chance, and whether the resulting lesion gets repaired also involves a chance element; this is how even a factor with a well-defined mode of action can still make random mutations.
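The two chance elements in that paragraph, which TCW site gets hit and whether the lesion is repaired, can be sketched as a toy simulation in Python. Everything here is hypothetical: a random 100,000-base “genome”, an arbitrary 1,000-base stretch standing in for a cancer gene, an equal chance of hitting every TCW site, and a made-up repair probability p_repair.

```python
import random
import re


def simulate_apobec_hits(seq, region, n_lesions, p_repair, seed):
    """Toy model: each lesion lands on a TCW site chosen uniformly at random
    and is then repaired with probability `p_repair` (hypothetical value).
    Returns (unrepaired mutations, how many fell inside `region`, number of TCW sites)."""
    rng = random.Random(seed)
    sites = [m.start() + 1 for m in re.finditer(r"(?=TC[AT])", seq)]
    start, end = region
    fixed = in_region = 0
    for _ in range(n_lesions):
        site = rng.choice(sites)        # chance: which TCW site is hit
        if rng.random() < p_repair:     # chance: whether the lesion is repaired
            continue
        fixed += 1
        if start <= site < end:
            in_region += 1
    return fixed, in_region, len(sites)


genome_rng = random.Random(42)
genome = "".join(genome_rng.choice("ACGT") for _ in range(100_000))  # hypothetical random "genome"
gene = (50_000, 51_000)                                              # hypothetical "cancer gene" stretch

for seed in range(5):
    fixed, in_gene, n_sites = simulate_apobec_hits(genome, gene, n_lesions=5_000, p_repair=0.9, seed=seed)
    print(f"seed {seed}: {fixed} unrepaired mutations, {in_gene} inside the gene (of {n_sites} TCW sites)")
```

Different seeds give different numbers of surviving mutations and different hits inside the “gene”, even though the mechanism and its rates are identical every time.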

On top of this, chance is involved in which combinations of mutations occur in a cell or its lineage: whether the right combination of cancer-causing mutations happens is still a matter of chance, and so is whether the lineage evolves sufficiently to evade the immune system. Chance is everywhere; cancer evolution is, fundamentally, a stochastic phenomenon.

Additionally, mutations happen at different sites in the genome with different probabilities, but whether a given mutation lands in a cancer-related gene, and so gives the cells that carry it a selective advantage, is a matter of chance. This is why cancers contain both driver mutations, which confer growth advantages, and passenger mutations, which don’t.

This leads me to my main bone of contention with the paper, along with the iffy statistics in its second figure: the authors group tumours into stochastic and deterministic classes.

They also make this clanger:

We refer to the tumors with relatively high ERS as D-tumors (D for deterministic; blue cluster in Fig. 2) because deterministic factors such as environmental mutagens or hereditary predispositions strongly affect their risk. We refer to tumors with relatively low ERS as R-tumors (R for replicative; green cluster in Fig. 2) because stochastic factors, presumably related to errors during DNA replication, most strongly appear to affect their risk.

It turns out that they are all stochastic: for instance, exactly where in the genome smoke-induced carcinogens cause mutations is down to chance. Smoke and other environmental mutagens are not deterministic factors, and neither are any of the internal mutational processes.

A very simple introduction to machine learning…

Machine learning quite simply refers to the use of algorithms to make predictions and solve problems based on the properties of things and how those properties relate to an outcome.

The nature of problems

Properties themselves can take a variety of forms. Sometimes they are categories, such as whether a person has a disease or has a higher education qualification; sometimes they are continuous, such as height or weight. Outcomes can be of either kind too: you could predict whether someone is rich or poor based on whether they have qualifications (a categorical outcome), or predict their overall income (a continuous outcome).

The outcomes being predicted are called response variables, and the measurements or groupings that describe each sample are called features. Machine learning is in essence the practice of building models that predict the response using rules fitted to the features.

Problems where the response variable consists of groups are called classification problems, because things are being classified; where the outcome is continuous, it is a regression problem. The kinds of classification or regression rules available depend on the nature of the data: some methods, like linear regression, can handle both categorical features (as indicator variables) and continuous ones, whereas others require categorical features to be converted into numbers first. There is simply a bewildering array of models that can be applied.
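The conversion mentioned above is usually done with indicator (“dummy” or one-hot) variables: one 0/1 column per category. Here is a minimal Python sketch; the qualification labels are made up for illustration. When fitting a linear regression with an intercept, one of the indicator columns is normally dropped to avoid redundancy, which this sketch leaves to the caller.

```python
def one_hot(values):
    """Turn a list of category labels into 0/1 indicator columns, one per distinct category."""
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories] for value in values]
    return categories, rows


# Hypothetical 'highest qualification' feature for four people.
qualifications = ["degree", "none", "degree", "diploma"]
columns, encoded = one_hot(qualifications)
print(columns)   # ['degree', 'diploma', 'none']
print(encoded)   # [[1, 0, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0]]
```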

The underlying assumption

The effectiveness of machine learning is predicated on there being non-random patterns in the data with respect to the features being considered. If there is no link between the features you’ve chosen to build a model on and the problem you are trying to address, you will see lousy performance.
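A quick way to see this assumption at work is to give the same simple classifier two features: one that depends on the class label and one that is pure noise. The Python sketch below uses synthetic data and a nearest-centroid rule chosen purely for illustration (the function names are hypothetical). On held-out samples the informative feature should score well above chance, while the noise feature hovers around 50%.

```python
import random


def make_data(n, seed=0):
    """Synthetic samples: (informative feature, noise feature, class label)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        informative = 2.0 * label + rng.gauss(0, 1)  # shifted upwards for class 1
        noise = rng.gauss(0, 1)                      # unrelated to the label
        rows.append((informative, noise, label))
    return rows


def nearest_centroid_accuracy(train, test, feature):
    """Fit one centroid (mean) per class on a single feature, then score accuracy on held-out rows."""
    centroids = {}
    for cls in (0, 1):
        values = [row[feature] for row in train if row[2] == cls]
        centroids[cls] = sum(values) / len(values)
    correct = 0
    for row in test:
        predicted = min(centroids, key=lambda cls: abs(row[feature] - centroids[cls]))
        correct += predicted == row[2]
    return correct / len(test)


data = make_data(2000)
train, test = data[:1000], data[1000:]
acc_informative = nearest_centroid_accuracy(train, test, feature=0)
acc_noise = nearest_centroid_accuracy(train, test, feature=1)
print(f"informative feature: {acc_informative:.2f}, noise feature: {acc_noise:.2f}")
```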

Measuring performance and overfitting

Performance is often measured in terms of accuracy for classification problems; if there are only two classes, measures like sensitivity (the fraction of true positives that are detected) and specificity (the fraction of true negatives that are correctly rejected) are often used as well. In regression problems, performance is often measured by how much of the variation in the outcome is explained by a model, and by the size of the error between the predicted and actual outcomes.
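For reference, here is how these measures can be computed from scratch in Python. It is a minimal sketch: the two-class convention (1 = positive, 0 = negative) and the toy inputs are assumptions for illustration, and it doesn’t guard against empty classes. For regression it reports R² (the share of variation explained) and the root-mean-square error (RMSE).

```python
import math


def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for a two-class problem (1 = positive, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # share of real positives that were detected
        "specificity": tn / (tn + fp),  # share of real negatives correctly rejected
    }


def regression_metrics(y_true, y_pred):
    """R^2 (share of variance explained) and RMSE (typical size of the prediction error)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return {"r2": 1 - ss_res / ss_tot, "rmse": math.sqrt(ss_res / len(y_true))}


print(classification_metrics([1, 1, 1, 0, 0], [1, 1, 0, 0, 1]))
# accuracy 0.6, sensitivity 2/3, specificity 0.5
print(regression_metrics([1, 2, 3, 4], [1, 2, 3, 5]))
# r2 0.8, rmse 0.5
```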

It is important that performance be measured in a way that minimises the effect of overfitting. Overfitting refers to a model fitting your data too well, quirks and noise included, so that it learns patterns present within that particular dataset but not outside it. This can give you a spurious measure of how good your model is and make it look better than it actually is.
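Overfitting is easy to demonstrate with a 1-nearest-neighbour classifier, which simply memorises its training data. In the synthetic Python example below, the labels are coin flips, so there is nothing real to learn; the model still looks perfect on the data it was trained on, and falls back to roughly chance on data it hasn’t seen. The data and helper names are made up for illustration.

```python
import random


def one_nn_predict(train, x):
    """1-nearest-neighbour: return the label of the closest training point."""
    return min(train, key=lambda row: abs(row[0] - x))[1]


def accuracy(train, rows):
    """Fraction of rows whose label the 1-NN model predicts correctly."""
    return sum(one_nn_predict(train, x) == y for x, y in rows) / len(rows)


rng = random.Random(42)
# Labels are coin flips, independent of the feature: there is no real pattern.
data = [(rng.random(), rng.randint(0, 1)) for _ in range(400)]
train, test = data[:200], data[200:]

train_acc = accuracy(train, train)  # scored on the very data it memorised
test_acc = accuracy(train, test)    # scored on data it has never seen
print(f"training accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```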

To account for this, it is ideal to train on one portion of the data and test on another portion that the training process never saw, either by holding out a portion at the outset or by using an altogether independent dataset. Alternatively, you can use cross-validation: a certain percentage of samples is held out, the model is trained on the remaining data, and the held-out samples are used as a test set; this is repeated over and over again with different held-out portions to get a more reliable estimate of performance.
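Cross-validation as described above takes only a few lines. This Python sketch implements k-fold cross-validation, where the data are split into k roughly equal parts and each part takes a turn as the held-out test set, around a deliberately simple threshold classifier. The data are synthetic, and the classifier and function names are made up for illustration.

```python
import random


def k_fold_splits(n, k, seed=0):
    """Shuffle sample indices 0..n-1 and yield (train, test) index lists for each of k folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def cross_validate(xs, ys, fit, predict, k=5, seed=0):
    """Mean held-out accuracy over k folds; every sample is used for testing exactly once."""
    scores = []
    for train, test in k_fold_splits(len(xs), k, seed):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        correct = sum(predict(model, xs[i]) == ys[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores), scores


def fit_midpoint(xs, ys):
    """Toy classifier: threshold halfway between the two class means of a single feature."""
    mean0 = sum(x for x, y in zip(xs, ys) if y == 0) / ys.count(0)
    mean1 = sum(x for x, y in zip(xs, ys) if y == 1) / ys.count(1)
    return (mean0 + mean1) / 2, mean1 > mean0


def predict_midpoint(model, x):
    """Predict class 1 on the side of the threshold where class 1's mean lies."""
    threshold, class1_is_higher = model
    return int((x > threshold) == class1_is_higher)


# Synthetic data: the feature is shifted upwards for class 1.
rng = random.Random(3)
ys = [rng.randint(0, 1) for _ in range(500)]
xs = [2.0 * y + rng.gauss(0, 1) for y in ys]
mean_acc, per_fold = cross_validate(xs, ys, fit_midpoint, predict_midpoint, k=5)
print(f"5-fold CV accuracy: {mean_acc:.2f}")
```

Averaging over the folds smooths out the luck of any single split, which is why it gives a more reliable estimate than one hold-out set of the same size.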

Where can I learn more about machine learning?

There is a very nice introductory LinkedIn tech talk here: http://www.youtube.com/watch?v=wjTJVhmu1JM

And here is a full set of lectures that are a treat


I might do a set of posts looking at very particular applications in the next few months, until then, feel free to knock yourself out.

How HPV driven cancers get their mutations…

Hi there!

It’s been a long time since I last blogged, but that is because I’ve been swimming around in data, which has incidentally led to the findings published in this paper, which I will describe in this post.

HPV and the link to cancer.

HPVs (human papillomaviruses) are a family of viruses that infect keratinocytes, the cells that line the outside of the body (the skin) and its inner cavities. Some of them just cause warts (including genital warts), but some are capable of driving the formation of cancer. These types, called “high-risk” strains, are the ones targeted for prevention by HPV vaccines.

High-risk HPV strains differ from low-risk strains in their cancer-causing ability because of proteins they make during their life cycle. Cells need to be actively dividing to permit HPV replication, and to achieve this the virus uses two proteins, called E6 and E7, to block and degrade two proteins in human cells, TP53 and pRb, which are potent tumour suppressors (genes that prevent tumour formation).

Normally, E6 and E7 are only active for a brief while during the virus’s life cycle, which culminates in the production of more viruses that restart the cycle all over again. Before HPV-driven cancers form, however, something very strange happens: by complete accident, the viral genome gets inserted and integrated into the human DNA of infected cells, or infected cells get locked into a state where E6 and E7 are produced all the time. Suddenly you’ve got cells with TP53 and pRb switched off all the time, which can grow abnormally. We see this when women’s cervical scrapings are examined and “dysplastic” cells that have grown clumpy and abnormal are found.

However, these dysplastic cells are not cancerous and haven’t acquired all the hallmarks of cancer. For that to happen, there need to be additional changes to the DNA sequence (mutations) of genes in dysplastic cells that can confer those properties. Tobacco smoke is a well-known cause of mutations, but for quite a while it had been an open question where HPV-driven tumours got their mutations from.

Suspicions are aroused: could the APOBEC family of proteins be making these mutations? 

One of my major research interests is finding out which genes are switched on more and which are turned off in HPV-driven cancers. When defining a signature for these tumours, I compared them to normal tissue and to HPV-negative tumours that arise in the same tissue (while cervical cancers almost all tend to be HPV-driven, there are head and neck cancers caused by HPV and others caused by chronic tobacco and alcohol exposure), and one of the genes I found expressed at high levels in HPV-positive tumours was APOBEC3B.

APOBEC3B is one of many proteins in the APOBEC family of cytosine deaminases. These act on RNA, or on DNA when it is in a single-stranded state, and take part in the body’s immune response against viruses by messing up viral RNA/DNA. They work by changing cytosine, one of the four bases that make up DNA, to uracil (a base normally found only in RNA), which then ends up being replaced by a thymine or a guanine (two other DNA bases). So if you get lots of these changes in viral DNA, you fundamentally break it so that it can’t do any of the things it usually does, and it had been known for a while that HPV with messed-up DNA, showing patterns of change associated with APOBEC proteins, could be found in precancerous lesions.

This led us to wonder whether APOBEC proteins could end up accidentally changing human DNA, just as they change viral DNA, and thereby generate the DNA sequence changes needed to cause cancer. Around the time we started wondering this, a couple of papers came out showing that some human cancers had mutations that looked like they were being generated by APOBEC enzymes, very likely APOBEC3B. (We could tell it was likely APOBEC3B because it is known to change cytosines that are preceded by a thymine and followed by a guanine, adenine or thymine, so TCA, TCG or TCT would be converted to TGA/TTA, TGG/TTG or TGT/TTT respectively.) There is an alternative process that can also generate TCG->TGG/TTG mutations, so to measure APOBEC activity specifically we used only the others, which we referred to in the paper as TCW to TKW (TCW->TKW, where K = G or T and W = A or T).
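As a rough illustration of how mutations can be sorted into this class, here is a Python sketch. It assumes each single-base substitution is already described by its reference trinucleotide (the mutated base plus one neighbour on each side) and the alternative base, a hypothetical input format; real pipelines have to look these contexts up in the reference genome. Mutations reported on a G (or A) are flipped to the opposite strand, so that a TGA->TAA change, for example, is counted as TCA->TTA on the other strand. The function names are made up for illustration, not taken from the paper.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Sequence of the opposite DNA strand, read in its own 5'->3' direction."""
    return seq.translate(COMPLEMENT)[::-1]


def pyrimidine_view(context, alt):
    """Re-express a substitution so the mutated reference base is C or T, flipping strands if needed."""
    if context[1] in "GA":
        return reverse_complement(context), alt.translate(COMPLEMENT)
    return context, alt


def is_tcw_to_tkw(context, alt):
    """True for C->T or C->G at TCA/TCT sites (TCW -> TKW; K = G or T, W = A or T)."""
    ctx, alt = pyrimidine_view(context, alt)
    return ctx[0] == "T" and ctx[1] == "C" and ctx[2] in "AT" and alt in "GT"


def tcw_tkw_fraction(mutations):
    """Share of substitutions, given as (trinucleotide, alt) pairs, that match TCW -> TKW."""
    if not mutations:
        return 0.0
    return sum(is_tcw_to_tkw(ctx, alt) for ctx, alt in mutations) / len(mutations)


# Hypothetical mutations: (reference trinucleotide, alternative base at the middle position).
example = [("TCA", "T"), ("TCG", "T"), ("ACA", "T"), ("TGA", "C")]
print(tcw_tkw_fraction(example))  # 0.5: TCA->TTA and TGA->TCA (i.e. TCA->TGA on the other strand)
```

Note that TCG->TTG is deliberately not counted, mirroring the exclusion of TCG sites described above.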

Those previous papers also noted that cervical cancers had lots of mutations showing the APOBEC signature, but the question remained: was this down to it being the cervix, or to these tumours being HPV+? We decided to look at head and neck cancers as well, where we could compare HPV+ and HPV- tumours arising in similar tissues to see whether there was truly an association with HPV, and hence we did the work reported in the paper…

HPV positive tumours have a vastly higher fraction of mutations belonging to the APOBEC signature.

First, we looked at levels of APOBEC mutagenesis, and at how much of all the mutations in tumours could be attributed to it, using publicly available data for 40 HPV+ and 253 HPV- head and neck tumours. To do this we used multiple approaches: counting TCW->TKW mutations; breaking down all the mutations we see in these tumours into patterns (signatures) of mutations, as was done by these people at the Sanger Institute; and looking at local enrichment for the TCW->TKW mutation pattern. All the approaches showed the same thing: HPV+ tumours had a vastly higher proportion of mutations most likely caused by APOBEC enzymes.
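The post doesn’t spell out how the local enrichment was calculated, so the following is only one common way of framing such a statistic, assumed here for illustration: compare how often mutated cytosines sit in TCW motifs with how often cytosines in the surrounding sequence do. A ratio well above 1 suggests the mutations favour TCW sites beyond what sequence composition alone would predict. The function names, the choice of window and the counts in the example are all hypothetical.

```python
import re


def count_context(window):
    """Count cytosines and TCW motifs on both strands of a sequence window.

    A C on the opposite strand shows up as a G on this one, and a TCW motif
    on the opposite strand shows up as WGA (AGA or TGA) on this one.
    """
    n_c = window.count("C") + window.count("G")
    n_tcw = len(re.findall(r"(?=TC[AT])", window)) + len(re.findall(r"(?=[AT]GA)", window))
    return n_tcw, n_c


def tcw_enrichment(n_tcw_mutations, n_c_mutations, n_tcw_context, n_c_context):
    """(share of mutated C's in TCW motifs) / (share of C's in the surrounding sequence in TCW motifs)."""
    observed = n_tcw_mutations / n_c_mutations
    expected = n_tcw_context / n_c_context
    return observed / expected


# Hypothetical tumour: 30 of its 40 mutated cytosines sit in TCW motifs, while
# only 100 of the 1,000 cytosines in the windows around them do.
print(tcw_enrichment(30, 40, 100, 1000))
```

Here the observed share (0.75) is about seven and a half times the expected share (0.1), which under this framing would point towards APOBEC-style mutagenesis.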

Figure 1: APOBEC mutations are highly enriched in HPV+ HNSCs

Multiple measures of APOBEC activity showed a strong association with HPV status but not with age or smoking; APOBEC, age and smoking were the three processes we identified as driving the signatures using the Sanger Institute’s approach. The further the numbers are shifted to the right, the stronger the association with the factor listed on the left.

We found signatures previously associated with APOBEC, smoking and age, and showed that APOBEC activity was not associated with the latter two, as expected. Having identified an association with HPV-driven tumours, we wanted to know whether this was a general antiviral response or something HPV-specific, so we looked at patterns of mutations in liver cancers caused by hepatitis B and C viruses and found no evidence that APOBEC-mediated mutations were significantly enriched in these tumours.

Of drivers and passengers

Most tumours have hundreds to thousands of mutations, but only a few actively contribute to the acquisition and maintenance of the hallmarks of cancer. So, having identified high proportions of APOBEC-mediated mutations in HPV-driven cancers across the exome (all protein-coding genes), we asked whether that enrichment was maintained when we restricted our search to genes previously known to drive cancer, or to genes sharing features associated with drivers, such as being mutated more often than expected by chance. Our analyses confirmed that APOBEC-mediated mutations were again enriched in the HPV+ head and neck and cervical cancers compared to the HPV- HNSCs.


Differences between HPV negative HNSCC and HPV+ tumours (HNSCC and Cervical cancer) are maintained when looking at all protein-coding genes (whole exome) and likely driver mutations (MutSig).

Then we looked at which driver genes were most mutated by APOBEC proteins, and found a gene called PIK3CA (which encodes one of the components of a protein complex called PI3 kinase) towards the very top of the list. PIK3CA has previously been reported as vital to the sustenance of many HPV-positive tumours in particular, and of head and neck cancers in general, and drugs are being developed to target it. Interestingly, in the HPV+ tumours 22 of the 25 PIK3CA mutations recorded were of the APOBEC type, while this wasn’t the case for the HPV-negative tumours.

This led to yet another question: can the levels of APOBEC activity explain a preference for APOBEC-type mutations in HPV-positive tumours? For driver genes, two things may govern what kinds of mutations we see: how much of a growth advantage a mutation gives the cell, and how likely that mutation is to arise in the first place. My supervisor, Tim Fenton, who had worked on PI3 kinases previously, knew that there were two regions of PI3 kinase in which mutations regularly occurred (usually in one or the other), and realised that one of them contained a TCW sequence that APOBEC proteins could act on while the other did not.

The PIK3CA gene makes a protein called p110-alpha, and proteins have distinct elements in their structure, called domains. One region, the helical domain, is often mutated at two TCW sequences, while the other, the kinase domain, is not. Both mutations confer a similar growth advantage, and if you look across multiple tumour types, you tend to see an overall 50-50 split between the two. This enabled us to account for growth advantage and directly test whether APOBEC activity (which we had already measured across all protein-coding genes) and a preference for APOBEC-induced mutations in the helical domain were linked.

Since PIK3CA is mutated in multiple types of cancer, I was able to grab some data from The Cancer Genome Atlas project, measure how strongly each cancer type was skewed towards helical domain mutations compared to kinase domain mutations, and compare that with APOBEC activity in the same cancer types. The results were quite robust: the higher the APOBEC activity in a cancer type, the stronger the preference for helical domain mutations over kinase domain mutations.
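The post doesn’t say which correlation statistic was used for this comparison, so the Python sketch below simply shows one reasonable choice, a Spearman rank correlation, applied to per-cancer-type summaries: the median TCW->TKW fraction, and the share of PIK3CA hotspot mutations that fall in the helical domain. The numbers in the usage example are invented to exercise the code and are not data from the paper or from TCGA; the function names are hypothetical.

```python
import math


def ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient (undefined if either input is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def helical_fraction(n_helical, n_kinase):
    """Share of hotspot mutations in the helical domain rather than the kinase domain."""
    return n_helical / (n_helical + n_kinase)


# Invented per-cancer-type values (NOT real data): median TCW->TKW fraction,
# and counts of (helical, kinase) hotspot PIK3CA mutations.
apobec = [0.02, 0.05, 0.10, 0.30]
helical = [helical_fraction(h, k) for h, k in [(5, 6), (7, 6), (9, 5), (12, 2)]]
rho = spearman(apobec, helical)
print(f"Spearman rho: {rho:.2f}")
```

A rank-based statistic is a natural fit here because it only asks whether cancer types with more APOBEC activity also tend to have a higher helical share, without assuming the relationship is a straight line.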


Figure 3. A – tumour types are arranged from left to right by median APOBEC activity; as you move from left to right, helical domain mutations (black bars) become strongly preferred over kinase domain mutations (yellow bars). B – plotting the median TCW->TKW fraction (APOBEC activity) against the proportion of PIK3CA mutations that are helical hotspot mutations shows a strong correlation.

So yeah, people had been wondering why, in bladder cancers for example, there was such a strong preference for helical hotspot mutations; these analyses basically addressed that long-standing question.

Explanatory factors

The one other thing we did was to look at what might be driving this process. Surprisingly, we found no correlation between APOBEC activity and how much E6 and E7 was being expressed in these tumours, or for that matter between APOBEC activity and APOBEC3B gene expression, but we did find a strong link with how many mutations in total these tumours had. This has led us to hypothesise that something like HPV-induced DNA damage, which generates the substrate for APOBEC3B to act upon, may be driving the process.


Our work suggests that HPV-positive tumours evolve along a trajectory in which they incorporate HPV DNA into their own, leading to sustained E6/E7 expression, followed by APOBEC activity until a driver mutation occurs; the resulting clones then expand and show the APOBEC signature when their DNA is sequenced. In HPV-negative HNSCC, smoking and alcohol do this job instead. And if PIK3CA is the gene mutated, HPV-positive tumours tend to have helical domain hotspot mutations because APOBEC proteins are responsible for them…

Additional stuff

The journal did a Q&A that expands on some of the work in the paper, and you may find it here.

There is a press release from UCL here.


Putting stuff into context – on the TRAIL of a new science news story on the BBC

Right, so this time I have a few things to add to a news story currently doing the rounds: http://www.bbc.co.uk/news/health-25625934


The most dangerous and deadly stage of a tumour is when it spreads around the body.

Scientists at Cornell University, in the US, have designed nanoparticles that stay in the bloodstream and kill migrating cancer cells on contact.

They said the impact was “dramatic” but there was “a lot more work to be done”.

One of the biggest factors in life expectancy after being diagnosed with cancer is whether the tumour has spread to become a metastatic cancer.

“About 90% of cancer deaths are related to metastases,” said lead researcher Prof Michael King.

On the trail

The team at Cornell devised a new way of tackling the problem.

They attached a cancer-killing protein called Trail, which has already been used in cancer trials, and other sticky proteins to tiny spheres or nanoparticles.

When these sticky spheres were injected into the blood, they latched on to white blood cells.

Tests showed that in the rough and tumble of the bloodstream, the white blood cells would bump into any tumour cells which had broken off the main tumour and were trying to spread.

The report in Proceedings of the National Academy of Sciences showed the resulting contact with the Trail protein then triggered the death of the tumour cells.

Prof King told the BBC: “The data shows a dramatic effect: it’s not a slight change in the number of cancer cells.

“The results are quite remarkable actually, in human blood and in mice. After two hours of blood flow, they [the tumour cells] have literally disintegrated.”

He believes the nanoparticles could be used before surgery or radiotherapy, which can result in tumour cells being shed from the main tumour.

It could also be used in patients with very aggressive tumours to prevent them spreading.

However, much more safety testing in mice and larger animals will be needed before any attempt at a human trial is made.

So far the evidence suggests the system has no knock-on effect for the immune system and does not damage other blood cells or the lining of blood vessels.

But Prof King cautioned: “There’s a lot of work to be done. Various breakthroughs are needed before this could be a benefit to patients.”

Just a few things, though. Firstly, the spread of cancer cells throughout the body may be a very early event in tumour evolution, as suggested by studies in animal models http://www.jci.org/articles/view/43424 . More worryingly, in cells that have activating mutations in a gene called KRAS, TRAIL can actually lead to more invasive disease and earlier death due to metastasis http://www.ncbi.nlm.nih.gov/pubmed/20188103 . Several mechanisms of resistance to TRAIL have also been documented http://www.ncbi.nlm.nih.gov/pubmed/22206047

So I am immensely skeptical of the promise highlighted in the article.

Planes are made of metal, but this doesn’t mean you can fly on bauxite.

OK, pardon the weird title please. The internet is home to a lot of rubbish about cancer research, and amongst this set of kooky beliefs is the notion that hemp oil/marijuana can cure cancer. The article that has drawn my attention this time round is http://www.destructionofamerica.4t.com/whats_new_3.html

I quote:

I didn’t stutter, Cannabis IE “evil drug Marijuana” hemp cures several kinds of brain tumors as well as other cancers. The FDA has known for over twenty tears and hid it from us!

New research shows that marijuana components fight an aggressive form of brain cancer. And the media says – nothing, again.

Combining the two most common cannabinoid compounds in Cannabis may boost the effectiveness of treatments to inhibit the growth of brain cancer cells and increase the number of brain cancer cells that die off. That’s the finding of a new study published in the latest issue of the journal Molecular Cancer Therapeutics.

Marijuana components have been found to inhibit the growth of the most common, and aggressive form of brain tumor, a glioblastoma, according to a study published in the January 6 issue of Molecular Cancer Therapeutics.

The study was done at the California Pacific Medical Center by researchers who combined a non-psychoactive ingredient of marijauna, cannabidiol (CBD), with Δ9-tetrahyrdocannabinol (Δ9-THC), the primary psychoactive ingredient in Cannabis. The findings demonstrated the inhibitory effect of these two ingredients on brain cancer cells when used together.

“Our study not only suggests that combining these two compounds creates a synergistic effect,” says Sean McAllister, Ph.D., a scientist at CPMCRI and the lead author of the study. “but it also helps identify molecular mechanisms at work here, and that may lead to more effective treatments for glioblastoma and potentially other aggressive cancers.”

“Previous studies had shown that Δ9-THC was effective in inhibiting brain cancer growth in cell cultures and in animal models and prompted a small clinical trial in Spain. There is also evidence that other compounds in Cannabis might prove effective against tumors, but limited scientific evidence is available,” the report stated.

President Reagan & Bush tried to have studies done in Virginia destroyed to hide the facts but fortunately some survived. There are OVER 50,000 products that can be made with hemp conservatively. Yet we have been lied to by the authorities for nearly a hundred years and saw them destroy one of the most profitable commercial industry known to man.

How many people have died that could still be alive? How many have suffered needless pain cannabis could relieve? Call your congress persons today TOLL FREE  800 833 6354 tell them you know the truth and they will be FIRED if they don’t change the law!

Right, time to go fallacy-fishing.

[1] That certain cannabinoid compounds can inhibit the growth of glioblastoma cell lines in no way means that marijuana cures cancer, any more than the effectiveness of penicillin means that Penicillium notatum (the fungus penicillin is produced from) cures bacterial infections.

[2] Note that we’re talking about cell lines and animal models at best. Inhibiting the growth of cell lines and xenografts is a preliminary step on the way to developing new cancer therapies; neither the work in the paper nor the studies it cites in any way support the notion that these compounds are curative.

[3] Indeed, the very person who wrote the codswallop above failed to read the excerpts he quotes himself:

“Our study not only suggests that combining these two compounds creates a synergistic effect,” says Sean McAllister, Ph.D., a scientist at CPMCRI and the lead author of the study. “but it also helps identify molecular mechanisms at work here, and that may lead to more effective treatments for glioblastoma and potentially other aggressive cancers.”

“Previous studies had shown that Δ9-THC was effective in inhibiting brain cancer growth in cell cultures and in animal models and prompted a small clinical trial in Spain. There is also evidence that other compounds in Cannabis might prove effective against tumors, but limited scientific evidence is available,” the report stated.

In what parallel universe is the notion that marijuana cures cancer, or the idea that the data above implies the FDA admits it, anything less than a brazen display of intellectual contortionism that would put the poriferan equivalent of Houdini to shame?!