Monthly Archives: May 2012

Bisulfite Sequencing – Interrogating CpG Methylation at single base pair resolution.

Hi all, just a quick post here.

In previous posts, I have discussed methods of probing DNA methylation with methylation arrays and MeDIP-Chip. Those methods are genome wide but not quite whole genome. The highest resolution method available (and not surprisingly the most expensive and lowest-throughput) is a whole-genome method called bisulfite sequencing.

Bisulfite sequencing is used to map out DNA methylation at single base pair resolution. The technical challenge here is rather apparent – one must be able to differentiate unmethylated DNA from methylated DNA.

To do this, a process called bisulfite conversion is used. DNA is treated with sodium or potassium bisulfite, which converts unmethylated cytosines to uracil while leaving methylated cytosines (5-methylcytosines) intact. During the subsequent PCR amplification and sequencing, uracils are read as thymines. This makes identifying methylation changes relatively straightforward using bog-standard bioinformatics.
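The conversion chemistry described above can be mimicked in a few lines of Python. This is only a toy sketch of the idea, not a model of the real reaction: the function name `bisulfite_convert`, the example sequence and the set of methylated positions are all made up for illustration, and only one strand is considered.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate bisulfite treatment followed by PCR on a single strand.

    seq: DNA string (A, C, G, T).
    methylated_positions: set of 0-based indices of methylated cytosines
        (a hypothetical input; in reality this is what we want to find out).
    Unmethylated C -> U -> read as T; methylated C stays C.
    """
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            out.append("T")  # unmethylated cytosine ends up read as thymine
        else:
            out.append(base)  # methylated cytosines and other bases are unchanged
    return "".join(out)


original = "ACGTCCGA"
converted = bisulfite_convert(original, methylated_positions={1})
print(original)   # ACGTCCGA
print(converted)  # ACGTTTGA
```

Only the cytosine at position 1 survives, because it is the only one marked as methylated in this made-up example.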

In effect, we are looking at which particular cytosines have been converted to thymines through bisulfite-induced uracil intermediates. To do this, two versions of a reference genome are created digitally: one with all cytosines converted to thymines, and one with all guanines changed to adenines. The reads, with their own cytosines converted in the same way, are aligned to both. Reads that align uniquely then have their previously removed cytosines restored and are compared against the original reference. Reference cytosines that appear as thymine in a read indicate unmethylated cytosines; cytosines that still read as cytosine after conversion were methylated.
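As a rough illustration of the comparison step, here is a minimal Python sketch. It assumes a lot: the read is taken to be already aligned to the forward strand at a known offset with no indels, and the function names (`convert_reference`, `call_methylation`) are hypothetical rather than taken from any real aligner. Real pipelines handle both strands, mismatches and coverage across many reads.

```python
def convert_reference(ref):
    """Return the two in-silico converted references (C->T and G->A)."""
    ref = ref.upper()
    return ref.replace("C", "T"), ref.replace("G", "A")


def call_methylation(ref, read, offset):
    """Classify each reference cytosine covered by one forward-strand read.

    Assumes the read is already aligned at `offset` with no indels.
    Returns {reference_position: "methylated" or "unmethylated"}.
    """
    calls = {}
    for i, read_base in enumerate(read.upper()):
        pos = offset + i
        if ref[pos].upper() != "C":
            continue  # only reference cytosines are informative here
        if read_base == "C":
            calls[pos] = "methylated"    # survived bisulfite conversion
        elif read_base == "T":
            calls[pos] = "unmethylated"  # C -> U -> T
        # any other base is a mismatch or sequencing error and is skipped
    return calls


ref = "ACGTCCGA"
read = "ACGTTTGA"  # a bisulfite-treated read from this locus (made-up example)

c_to_t, g_to_a = convert_reference(ref)
# A fully C->T-converted copy of the read matches the C->T reference,
# which is what makes alignment possible regardless of methylation state.
assert read.replace("C", "T") == c_to_t

print(call_methylation(ref, read, offset=0))
```

In this toy case the cytosine at position 1 is called methylated, and those at positions 4 and 5 are called unmethylated.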


Representation of methylation variation as identified by bisulfite sequencing a given locus. (Reference – )

Whole-genome bisulfite sequencing is extremely expensive, and one may ask whether whole-genome methods are necessary when only a small number of genes are differentially expressed. I, for one, am of the opinion that it may be wiser to couple analysis of differentially expressed genes with amplification of targeted regions, and to carry out bisulfite sequencing on a targeted basis instead of a generalised whole-genome search.

Further reading.

[1] Targeted bisulfite sequencing reveals changes in DNA methylation associated with nuclear reprogramming. – example of targeted bisulfite sequencing.

[2] De novo quantitative bisulfite sequencing using the pyrosequencing technology. – Paper documenting the combination of bisulfite conversion with pyrosequencing, which is well suited to short read lengths (though the cost per base is high).

[3] A good review of the technique, protocols and associated challenges may be found here –

That’s all from me for this post, a rather short one, I’m afraid.



An Introduction to Epigenome Wide Association Studies.

Hello everyone, it’s been a long time since the last post…

In previous posts here, I have talked about epigenetics and the involvement of processes of this kind in disease. The feasibility of methylation markers recently received a further boost with the validation of intragenic DNA methylation in the ATM gene in blood cells as a risk marker for breast cancer [1].

Basic Premises-what, why and how?
This of course raises the question: what strategies can be used to develop epigenetic markers for complex diseases efficiently and at a large scale? Inspiration has come from Genome-Wide Association Studies (GWASs) [2], which have established themselves as a reliable approach to identifying markers for diseases using high-throughput sequencing and related techniques such as SNP arrays. In GWASs, large populations are split into cohorts, those with a phenotype (e.g. a disease) and those without, and are genotyped or sequenced to see if particular genetic variants appear at a higher frequency in the cohort expressing the phenotype than expected by chance alone. Well-designed studies with good statistical power can certainly help identify reliable markers.
The scale such studies can take is well illustrated by cases such as this [3], where 14,000 cases of common diseases were profiled against 3,000 normal controls to identify disease-associated markers using an Affymetrix GeneChip mapping array.
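To make "a higher frequency than expected by chance alone" a little more concrete, here is a minimal sketch of one common way to test a single variant: a chi-square test on a 2x2 table of allele counts in cases versus controls. The counts are invented for illustration, the function name `allelic_chi_square` is hypothetical, and real GWASs add quality control, covariates and correction for testing very many variants.

```python
import math


def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele-count table.

    Returns (chi2, p_value). For 1 degree of freedom the upper-tail
    probability is erfc(sqrt(chi2 / 2)).
    """
    a, b, c, d = case_alt, case_ref, ctrl_alt, ctrl_ref
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Made-up allele counts: the variant allele is somewhat more common in cases.
chi2, p = allelic_chi_square(case_alt=600, case_ref=1400,
                             ctrl_alt=500, ctrl_ref=1500)
print(f"chi2 = {chi2:.2f}, p = {p:.2g}")
```

With identical allele frequencies in both cohorts the statistic is 0 and the p-value is 1, which is the "no association" case.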

Often, genetic disorders are picked up by familial clustering, and the genetic nature of a disease is often described by its heritability: the proportion of variation in phenotype explicable by inheritance alone [4]. The problem is that GWAS hits do not account for all of the heritability attributed to a phenotype; this is known as the problem of missing heritability [5].

In some diseases, like schizophrenia, monozygotic twins are often discordant for the disease. Both twins carry the same genetic risk, yet not everyone who carries risk markers develops the disease. One hypothesis to explain this invokes epigenetics. Epigenomic profiling using Illumina 27k methylation arrays revealed methylation changes in genes associated with neural function and development to be significantly associated with bipolar disorder and schizophrenia in discordant twins [6]. The authors then validated their findings in postmortem brain tissue samples from 45 people, both affected and unaffected.

Approaches and Challenges to successful EWASs.

Section reference – Rakyan et al, Nature [7]
Approaches in Epigenome Wide Association Studies. Citation - Rakyan et al [7]

The beautiful thing about discordant twin studies is that you can look at phenotypic variation that is due to epigenetic differences alone (discordant twins share a genotype and, to a large extent, an environment, but vary in phenotype and in epigenotype). However, this approach is limited and rather difficult to use on a large scale, because sufficient numbers of discordant twin pairs are hard to find.

The way around this problem would be to combine MZ twin discordance studies with large-scale non-twin cohorts. This approach would control for the influence of genotype in the first phase and enable validation of epigenetic markers in the second.
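For the twin phase, the natural comparison is within each pair, since the co-twin acts as a genotype-matched control. A minimal sketch, assuming we have a methylation level (between 0 and 1) at one locus for the affected and unaffected twin of each pair, is a paired t statistic on the within-pair differences. The numbers below are invented and the function name `paired_t_statistic` is hypothetical; this is not the analysis used in any of the cited studies.

```python
import math
import statistics


def paired_t_statistic(affected, unaffected):
    """Mean within-pair difference and paired t statistic at one locus.

    affected[i] and unaffected[i] are the methylation levels of the two
    twins in pair i. Needs at least two pairs with non-identical differences.
    """
    diffs = [a - u for a, u in zip(affected, unaffected)]
    mean_diff = statistics.mean(diffs)
    std_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff, mean_diff / std_error


# Made-up methylation levels for five discordant twin pairs.
affected_twins = [0.62, 0.58, 0.71, 0.66, 0.60]
unaffected_twins = [0.50, 0.52, 0.60, 0.59, 0.51]

mean_diff, t_stat = paired_t_statistic(affected_twins, unaffected_twins)
print(f"mean difference = {mean_diff:.3f}, t = {t_stat:.2f}")
```

Loci that stand out in such a within-pair comparison would then be the candidates to validate in the larger non-twin cohort.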

However, there is still one major issue with EWASs. With GWASs, unless you are dealing with somatic mutations, you know that mutations, SNPs and other disease-associated variants precede the appearance of the disease (they are inherited and present in the genome from the moment of fertilisation, remember?). Epigenetic changes, however, may be a consequence of the disease, a cause of the disease, or simply a by-product of a genetic variant that causes the disease. The third possibility can be accounted for by the approach mentioned above, which enables epigenetic diversity to be coupled with genetic diversity, but telling the first two apart needs prospective studies in combination with the other two strategies.

In prospective studies, the epigenomes of people are profiled regularly before a phenotype manifests; in this way, it is possible to tell whether an epigenetic change precedes the disease or is a consequence of it. Integrating studies of this sort with the aforementioned approaches can help establish the epigenetic basis of disease.

Ironing stuff out – other challenges.

One of the other challenges is environmental confounding: you can account for genetics all right, but what about the environment? This could potentially be handled using the same strategies used to account for the problem in GWASs. There is also the issue of quantifying methylation levels. Unlike a genetic variant, which you either have or you don't, methylation at a locus can be extremely variable, so arbitrary categories (low, intermediate and high) and relative comparisons (hypo- and hypermethylation) are used. How biologically relevant this stratification is could be important in interpreting the results of such EWASs.

Illumina methylation arrays – core concept. There are two types of beads for each locus, one for methylated variants, the other for unmethylated variants. A methylated locus only binds to methylated beads and is then bound to a fluorophore for visualisation. The same thing happens with unmethylated DNA and unmethylated beads with another fluorophore. The resulting fluorophore images can help describe loci as being methylated or not and serve as methylation maps for further analysis.
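The two bead signals are usually summarised per locus as a beta value, the methylated intensity as a fraction of the total intensity, with a small constant offset (100 by default in Illumina's definition) to stabilise the ratio when both signals are weak. The sketch below computes it and then bins it into low, intermediate and high. The cut-offs of 0.2 and 0.8 are an arbitrary assumption chosen purely for illustration, which is exactly the kind of arbitrary stratification discussed above; the function names are hypothetical.

```python
def beta_value(methylated, unmethylated, offset=100):
    """Beta value for one locus: M / (M + U + offset).

    Negative (background-subtracted) intensities are floored at zero.
    """
    m = max(methylated, 0)
    u = max(unmethylated, 0)
    return m / (m + u + offset)


def stratify(beta, low=0.2, high=0.8):
    """Bin a beta value; the 0.2 and 0.8 thresholds are illustrative only."""
    if beta < low:
        return "low"
    if beta > high:
        return "high"
    return "intermediate"


# Made-up (methylated, unmethylated) intensities for three loci.
for m_signal, u_signal in [(4900, 0), (2000, 2000), (0, 4900)]:
    b = beta_value(m_signal, u_signal)
    print(f"M={m_signal}, U={u_signal}: beta={b:.2f} ({stratify(b)})")
```

Shifting either threshold moves loci between categories, which is why the biological relevance of the chosen bins matters when interpreting results.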

The advent of sequencing technologies that can read off methylation marks without prior bisulfite conversion [8] may further improve the resolution of EWASs. A short-term stopgap may be Illumina 450K methylation arrays [9], which use a combination of array technology and bisulfite conversion to produce high-resolution methylation maps.

Other examples: identification of early-warning markers that precede diagnosis of type 1 diabetes; a specific methylation marker for susceptibility to obesity and type 2 diabetes, uncovered by an integrated analysis; locating methylation changes associated with aging.