Bisulfite Sequencing Methods

Adam Nunn

Overview

Though it is by no means the only biological mechanism within the domain of epigenetics, DNA methylation is among the most prevalent and the most studied throughout the field. In earlier chapters, we discussed the underlying molecular processes and the ecological consequences of differential methylation within species, but how exactly do we detect these differences? One technique at the forefront of epigenetic research is bisulfite sequencing: an adaptation of next-generation sequencing that produces genome-wide methylation profiles at single-nucleotide resolution.

The technique, devised by Frommer et al. [Frommer 1992] and refined for modern sequencing platforms by Lister et al. [Lister 2008] and Cokus et al. [Cokus 2008], involves treating DNA extracted from test samples with sodium bisulfite, a deaminating agent that mediates the conversion of unmethylated cytosine nucleotides into uracil. Cytosine bases that carry methyl groups (e.g. 5-methylcytosine, 5-hydroxymethylcytosine) are unaffected by the treatment and remain in their original, unconverted state. Because uracil residues are subsequently replaced by thymine during the PCR step of standard DNA sequencing, bisulfite-treated samples can be subjected to standard sequencing protocols, and the resulting reads carry epigenetic information. Once the samples are treated and sequenced, the research question effectively shifts from a biological concern to a computational one, at least until the results require interpretation.
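The conversion logic described above can be sketched in a few lines of Python. This is a minimal illustration, not a description of any particular tool: the function names `bisulfite_convert` and `call_methylation`, the toy sequence, and the chosen methylated position are all hypothetical, and the sketch assumes a single strand, complete conversion, and a read that is already aligned to its reference without gaps.

```python
def bisulfite_convert(sequence, methylated_positions=frozenset()):
    """Simulate bisulfite treatment followed by PCR on a single DNA strand.

    Unmethylated C becomes U (bisulfite) and is then read as T (PCR);
    a C at a methylated position stays C. All other bases are unchanged.
    """
    converted = []
    for index, base in enumerate(sequence.upper()):
        if base == "C" and index not in methylated_positions:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def call_methylation(reference, read):
    """Compare a converted read to its reference, position by position.

    Assumes the read is already aligned to the reference without gaps.
    Only reference cytosines are informative: C in the read means the
    base was protected (methylated), T means it was converted.
    """
    calls = {}
    for index, (ref_base, read_base) in enumerate(zip(reference.upper(), read.upper())):
        if ref_base != "C":
            continue
        if read_base == "C":
            calls[index] = "methylated"
        elif read_base == "T":
            calls[index] = "unmethylated"
    return calls


# Toy example: only the cytosine at index 1 is assumed to be methylated.
reference = "ACGTCCGA"
read = bisulfite_convert(reference, methylated_positions={1})
print(read)                               # ACGTTTGA
print(call_methylation(reference, read))  # {1: 'methylated', 4: 'unmethylated', 5: 'unmethylated'}
```

Real data add complications that the sketch ignores, such as incomplete conversion, sequencing errors, and reads that originate from the opposite strand.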

In standard sequencing, the next step is to align the sequencing reads to a reference genome assembly. Alignment places each read at the position in the reference where its sequence fits best; this differs from de novo assembly, which uses overlaps between reads to puzzle as many reads as possible into longer sequence fragments, so-called contigs. Alignment presents some issues when handling bisulfite data, as thymine residues can no longer be considered entirely independent of cytosine. Read alignment algorithms usually operate on scoring matrices, which assign an overall score to the alignment of two sequences based on the number and position of matches, mismatches, insertions, and deletions between nucleotides. The problem is that reference cytosines can conceptually match thymines in bisulfite-treated reads, but not vice versa. Existing algorithms are often not built to handle this asymmetry between bases, so the solution is either to adapt such tools further or to use algorithms designed specifically for bisulfite data. Several tools now represent each category: Bismark [Krueger 2011] and BWA-meth [Pederson 2014] adapt the popular standard aligners bowtie [Langmead 2013] and BWA [Li 2010], whereas software such as segemehl [Otto 2012] and ERNE-BS5 [Prezza 2012] can interpret bisulfite reads in their own right.
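To make the asymmetry concrete, the following sketch scores a read against a reference with a matching rule in which a reference C accepts a read T, but a reference T does not accept a read C. It is a deliberately naive, ungapped search over a toy sequence and does not reproduce the algorithm of any of the tools named above; the function names and sequences are hypothetical, and only reads derived from the forward strand are considered.

```python
def bisulfite_match(ref_base, read_base):
    """Return True if read_base is compatible with ref_base after bisulfite treatment."""
    if ref_base == read_base:
        return True
    # Asymmetric rule: a reference C may appear as T in the read, never the reverse.
    return ref_base == "C" and read_base == "T"


def count_mismatches(reference_window, read, match=bisulfite_match):
    """Count positions where the read is incompatible with the reference window."""
    return sum(not match(r, q) for r, q in zip(reference_window, read))


def best_ungapped_hit(reference, read):
    """Slide the read along the reference without gaps.

    Returns (start, mismatches) for the first placement with the fewest
    bisulfite-aware mismatches, or None if the read does not fit.
    """
    best = None
    for start in range(len(reference) - len(read) + 1):
        window = reference[start:start + len(read)]
        mismatches = count_mismatches(window, read)
        if best is None or mismatches < best[1]:
            best = (start, mismatches)
    return best


reference = "TTGACGTCCGATGG"
read = "ACGTTTGA"  # derived from reference[3:11] with two unmethylated cytosines converted

print(bisulfite_match("C", "T"))           # True
print(bisulfite_match("T", "C"))           # False
print(best_ungapped_hit(reference, read))  # (3, 0)

# A conventional exact-match rule would report two mismatches at the same position.
print(count_mismatches(reference[3:11], read, match=lambda r, q: r == q))  # 2
```

Practical aligners must also handle insertions, deletions, reads from the reverse strand (where the conversion appears as G to A), and genome-scale indexing, none of which is attempted here.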

The principles of bisulfite sequencing notwithstanding, another important consideration when designing such an experiment is the strategy chosen for library preparation. As in next-generation sequencing generally, sequencing depth and coverage are very important for bisulfite sequencing: we seek to maximize coverage relative to the scope of the questions we want to answer, within the practical limitations of the study, such as cost and time. For example, does your study seek to investigate genome-wide methylation patterns, or is it enough to focus on a reduced subset of the DNA? Here we consider Whole-Genome Bisulfite Sequencing (WGBS), Reduced-Representation Bisulfite Sequencing (RRBS), and variations of these methods. In particular, what implications do such protocols have for data robustness, and how should we adjust the procedure to improve quality control during the downstream analyses?
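As a rough planning aid, the trade-off between depth and cost can be estimated with the usual approximation that mean depth equals the number of reads multiplied by the read length, divided by the size of the sequenced target. The sketch below applies that approximation; the function names and all numbers are hypothetical and chosen only for illustration, and the estimate ignores unmapped reads, duplicates, and uneven coverage.

```python
import math


def expected_depth(read_count, read_length, target_size):
    """Approximate mean depth: total sequenced bases divided by target size."""
    return read_count * read_length / target_size


def reads_needed(target_depth, read_length, target_size):
    """Approximate number of reads required to reach a given mean depth."""
    return math.ceil(target_depth * target_size / read_length)


# Hypothetical values: a 100 Mb genome sequenced with 150 bp reads.
genome_size = 100_000_000
read_length = 150

print(reads_needed(30, read_length, genome_size))            # 20000000
print(expected_depth(20_000_000, read_length, genome_size))  # 30.0
```

For a reduced-representation design, `target_size` would be the size of the retained fraction rather than the whole genome, which is why the same number of reads yields a higher depth over a smaller subset of the DNA.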

The chapter covers various technical concerns of bisulfite sequencing, from DNA extraction and library preparation to sequencing itself and the downstream extraction of methylated positions. Because the bioinformatic principles determine whether the data are valid for answering the questions posed by the study, considering them a priori is fundamental to the successful outcome of any such experiment. Finally, we discuss the limitations of bisulfite sequencing and briefly suggest alternative methods that might be used to address them.
