General Sequencing Guidelines


Where can you find up-to-date publications?

 

For the most current information, we recommend searching the latest sequencing publications on the Illumina publication list.

 

Illumina Publication Search Page

Publication Review Page

Sequencing Methods Review –  200 pages of fantastic methods and techniques!
Find library prep method descriptions, pros and cons, publication summaries, and references.

Cancer and Immune System Research Review
Advances in high-throughput sequencing have dramatically improved our knowledge of the cancer genome and the intracellular mechanisms involved in tumor progression and response to treatment. While the primary focus to date has been on the cancer cell, this technology can also be used to understand the interaction of the tumor cells and the cells in the surrounding tumor microenvironment.

Immunology Research Review
Repertoire sequencing has enabled researchers to identify unique receptor variants found in individuals with susceptibility to hematological malignancies, autoimmune diseases, and allergen response.

Metagenomics Review

Metagenomics refers to the study of genomic DNA obtained from microorganisms that cannot be cultured in the laboratory. Recent technical improvements allow nearly complete genome assembly from individual microbes directly from environmental samples or clinical specimens, without the need to develop cultivation methods. This accumulation of sequence information has greatly expanded the appreciation of the dynamic nature of microbial populations and their impact on the environment and human health. This document highlights recent publications that demonstrate the use of Illumina sequencing technologies in metagenomics.

Plus reviews of the following: Viral Detection, Genetic Disease, Oncology, cancer, Infectious Disease, and Microbial Genomics.

 

 

Do you need Single-End (SE) or Paired-End (PE) reads?

 

A single-end (SE) read starts at the end of the sequencing primer, and the instrument reads each incorporated nucleotide as the strand extends toward the opposite end of the fragment, stopping after the number of cycles you have specified. SE reads are typically adequate for tag counting, such as differential gene expression or ChIP-seq.

 

Paired-end (PE) sequencing starts with the first read just as in SE; a second primer then anneals, and synthesis proceeds from the opposite end of the fragment back toward the read 1 primer location. It is important to understand that you cannot bioinformatically treat the two reads of a pair as separate events: the pair works together to provide additional positional information. You would most likely use PE sequencing for methylation studies, RNA-seq when you are interested in splice variants, and SNP identification.

 

 

What length read do you need?

 

The number of cycles you specify determines the read length in nucleotides. For example, a 50-cycle SE run yields one set of 50-nucleotide reads. A 50×50 PE run (100 cycles in total) yields two sets of paired reads: 50-base forward and 50-base reverse. If you are interested in RNA-seq profiling or other counting experiments (e.g. ChIP-seq), 50-cycle single-end runs will most likely be sufficient. These are the cheapest runs and provide enough length to map to a reference for counting. Longer reads or paired-end reads provide more information about alternative splicing, are useful in methylation studies, and can help if you would like to know the full length of each insert in the library.

 

 

RNA-seq Recommendations

 

Recommendations for RNA-seq are a bit more complicated and should be based on the experimental objective. The read requirement for whole genome sequencing is much easier to calculate, since the reads come from a library of randomly fragmented genomic inserts. RNA-seq libraries, on the other hand, are generated by synthesizing cDNA from RNA transcripts. Thus, the sequencing outcome of a conventional RNA-seq library depends on the expression level of each gene: highly expressed genes yield more reads, whereas genes expressed at low levels yield fewer reads. At a basic level, a researcher should decide whether they are interested only in Differential Gene Expression (DGE) or also want alternative splicing and positional information.

 

For large genomes such as human or mouse, we recommend the following (in increasing order of depth):

 

Level of inquiry                       Reads (millions)   Type of run
Highly expressed genes (genotyping)    5-10               Single End, 50 cycles
Differential Gene Expression           15-30              Single End, 50 cycles
Rare Transcripts / De Novo             100                Single End, 50 cycles
Positional information                 100                Paired End, 50 x 50
Alternative Splicing                   100                Paired End, 100 x 100

** These estimates are for poly(A) selection. Ribosomal RNA removal may require more reads due to the presence of non-coding RNA.
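
As a rough planning aid, the table above can be turned into a lookup that estimates how many samples fit on one run. This is only a minimal sketch: the dictionary keys and the function name samples_per_run are hypothetical names chosen for illustration, the upper end of each range is used as a conservative target (an assumption, not a rule from the table), and the 300M-read yield in the usage line is the Rapid Run figure quoted in the Coverage section.

```python
# Recommended depth per sample in millions of reads, taken from the table above.
# Where the table gives a range, the upper bound is used (a conservative assumption).
RECOMMENDED_READS_M = {
    "highly_expressed": 10,
    "differential_expression": 30,
    "rare_transcripts": 100,
    "positional": 100,
    "alternative_splicing": 100,
}


def samples_per_run(level: str, run_yield_millions: float) -> int:
    """Return how many samples fit in one run at the recommended depth."""
    if level not in RECOMMENDED_READS_M:
        raise ValueError(f"unknown level of inquiry: {level!r}")
    return int(run_yield_millions // RECOMMENDED_READS_M[level])


if __name__ == "__main__":
    # Hypothetical run yielding 300 million reads.
    print(samples_per_run("differential_expression", 300))  # 10
```

The estimate assumes reads are split evenly across samples, which real multiplexed runs only approximate, so leaving some headroom is sensible.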

 

Coverage

 

The easiest way to define coverage is as the average number of times each base is read during a sequencing run. The Lander/Waterman equation is commonly used to express this:

 

C = LN/G

where:

C = coverage
L = read length (in bases)
N = number of reads
G = number of bases in the sample's haploid genome (for whole genome sequencing); this can be replaced with the total number of bases in an exome
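
The equation above can be sketched in a few lines of Python. The function name coverage and its parameter names are illustrative choices rather than part of any sequencing software, and the result is a theoretical average that assumes every read maps to the target.

```python
def coverage(read_length_bp: float, num_reads: float, genome_size_bp: float) -> float:
    """Average coverage C = L * N / G (Lander/Waterman)."""
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    return read_length_bp * num_reads / genome_size_bp


if __name__ == "__main__":
    # 200 million read pairs of 150 + 150 bases on a ~3e9-base human genome.
    print(coverage(300, 200e6, 3e9))  # 20.0
```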

 

Example:

 

The question may be, “How many reads do I need to get 20X coverage for a single human sample on a 150×150 run?” Rearranging the equation above to solve for N answers this:

Human haploid genome size is ~3×10^9 bp

150 bp + 150 bp = 300 bp for each read pair

N = (CG)/L     or        (20 × 3×10^9 bp) / (300 bp) = 2×10^8 read pairs           or 200M read pairs

 

Or perhaps you want to use Rapid Run Mode, which typically yields 300M reads, but do not know how many cycles (read length) you need to reach 100X coverage of an exome.

 

L = (CG)/N     or        (100X coverage × 66 Mb) / (300M reads) = 22 cycles

(100 × 66×10^6) / (300×10^6) = 22

 

Thus, for one sample you could do a 50-cycle Rapid Run and get around 200X coverage on a human exome (50 × 300×10^6 / 66×10^6 ≈ 227X). Or you could add a second sample to the run and still obtain 100X coverage for each.
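
The two rearrangements used in these examples can be checked with a short script. This is a sketch under simplifying assumptions: the function names are hypothetical, reads are assumed to be split evenly between samples, and every read is assumed to land on the target, so real runs will come in somewhat lower.

```python
import math


def reads_needed(target_coverage: float, genome_size_bp: float, read_length_bp: float) -> int:
    """N = C * G / L, rounded up to a whole read (or read pair)."""
    return math.ceil(target_coverage * genome_size_bp / read_length_bp)


def cycles_needed(target_coverage: float, genome_size_bp: float, num_reads: float) -> int:
    """L = C * G / N, rounded up to a whole cycle."""
    return math.ceil(target_coverage * genome_size_bp / num_reads)


def coverage_per_sample(read_length_bp: float, run_reads: float,
                        genome_size_bp: float, num_samples: int = 1) -> float:
    """Coverage per sample when a run's reads are split evenly across samples."""
    return read_length_bp * (run_reads / num_samples) / genome_size_bp


if __name__ == "__main__":
    # 20X human genome with 150 + 150 bases per read pair.
    print(reads_needed(20, 3_000_000_000, 300))
    # 100X of a 66 Mb exome from a 300M-read run.
    print(cycles_needed(100, 66_000_000, 300_000_000))
    # 50-cycle run with one sample, then with two samples sharing the run.
    print(round(coverage_per_sample(50, 300_000_000, 66_000_000, 1), 1))
    print(round(coverage_per_sample(50, 300_000_000, 66_000_000, 2), 1))
```

Rounding up in reads_needed and cycles_needed is a deliberate choice so that the target coverage is met rather than narrowly missed.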