
RNASeq Gene Models

Human RNASeq Gene Models

RNASeq data from Illumina's Human BodyMap 2.0 project have been used to generate gene models for human. The data, generated on HiSeq 2000 instruments in 2010, cover 16 human tissue types: adrenal, adipose, brain, breast, colon, heart, kidney, liver, lung, lymph, ovary, prostate, skeletal muscle, testes, thyroid, and white blood cells. For each tissue, we aligned the raw reads to the genome and then linked exons into tissue-specific transcript models using the reads that span an exon-exon boundary.

You can view these data in the Region in Detail view. Click on `Configure this page' and choose `RNASeq models' at the left of the main panel. Enable any or all of the 54 tracks and then close the configuration panel. Of the 54 available tracks, 18 are tissue `gene model' tracks, 18 are `intron' tracks, and 18 are BAM files.

The `gene model' track shows a transcript model. The `intron' track shows how many raw reads aligned across an exon-exon junction. The taller the intron block, the more reads support that junction, which suggests that the transcript isoform is more highly expressed. When read coverage is high, the exon-intron structure shown in the `gene model' track has a good chance of being correct. When read coverage is very low, it is not always possible to build a full-length transcript model.
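As an illustration of what an `intron' track summarises, the sketch below counts how many alignments span each exon-exon junction by reading the `N` (skipped region) operations in SAM-style CIGAR strings. It is a minimal example, not the Ensembl pipeline's code: the function names and the `(start, cigar)` input format are assumptions made for this example.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def introns_from_alignment(start, cigar):
    """Yield (intron_start, intron_end) for each N operation in a CIGAR string.

    `start` is the 1-based leftmost reference position of the alignment;
    the yielded coordinates are 1-based and inclusive.
    """
    pos = start
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            yield (pos, pos + length - 1)
        if op in "MDN=X":  # operations that consume reference bases
            pos += length

def count_junction_reads(alignments):
    """Count spliced alignments per intron.

    `alignments` is an iterable of (start, cigar) pairs (hypothetical input
    format chosen for this sketch).
    """
    counts = Counter()
    for start, cigar in alignments:
        counts.update(introns_from_alignment(start, cigar))
    return counts

if __name__ == "__main__":
    reads = [(100, "50M200N50M"), (120, "30M200N70M"), (100, "100M")]
    for intron, n in count_junction_reads(reads).items():
        print(intron, n)
```

An unspliced read such as `100M` contributes nothing, so only reads that cross a junction add to the height of an intron block.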

From Ensembl release 70 onwards, BAM files for these data are also available and can be downloaded from our FTP site.

Zebrafish RNASeq Gene Models

This is an experimental set of gene models produced using paired-end Illumina RNASeq data from the Wellcome Trust Sanger Institute Zebrafish Transcriptome Sequencing Project (ref: ERP000016). Please see our publication (PMID: 22798491) for more details on our RNASeq pipeline and the zebrafish genebuild on Zv9.

The models are produced by a two-step alignment process using Exonerate. First, a local genomic alignment is created and collapsed into alignment blocks that roughly correspond to exons. Read pairing information is used to group these exons into approximate transcript structures (proto-transcripts). Second, reads are realigned to the proto-transcripts using a splice model and a short word length to create a set of spliced alignments representing canonical and non-canonical introns.
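The collapsing step in the first alignment pass can be pictured as merging overlapping read alignments into blocks. The sketch below shows one simple way to do this with interval merging. It is an assumption-laden illustration rather than the pipeline's implementation: the function name, the `(start, end)` input format, and the `max_gap` parameter are hypothetical.

```python
def collapse_to_blocks(alignments, max_gap=0):
    """Merge overlapping or adjacent (start, end) alignments into blocks.

    Coordinates are 1-based and inclusive. Alignments separated by at most
    `max_gap` uncovered bases are merged into the same block
    (`max_gap` is a hypothetical tuning parameter for this sketch).
    """
    blocks = []
    for start, end in sorted(alignments):
        if blocks and start <= blocks[-1][1] + max_gap + 1:
            # Overlaps or touches the current block: extend it.
            blocks[-1][1] = max(blocks[-1][1], end)
        else:
            # Starts beyond the current block: open a new one.
            blocks.append([start, end])
    return [tuple(block) for block in blocks]

if __name__ == "__main__":
    reads = [(100, 150), (140, 200), (500, 560), (201, 220)]
    print(collapse_to_blocks(reads))
```

Each resulting block is only an approximation of an exon; precise boundaries come from the later spliced realignment.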

Gene models are created by combining the proto-transcripts with the spliced reads to create all possible variants. Only the variant with the most read support is displayed.
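Choosing the displayed variant amounts to ranking candidates by read support. The sketch below shows the idea under a stated assumption: support is taken to be the total number of spliced reads across a variant's introns, which is a scoring rule chosen for this example and not necessarily the one the pipeline uses. The function names and data layout are also hypothetical.

```python
def read_support(variant, intron_counts):
    """Total spliced-read count over the introns of one variant.

    `variant` is a tuple of (start, end) introns; `intron_counts` maps an
    intron to the number of reads confirming it. Summing the counts is an
    assumed scoring rule for illustration only.
    """
    return sum(intron_counts.get(intron, 0) for intron in variant)

def best_variant(variants, intron_counts):
    """Return the variant with the highest read support.

    On a tie, the variant that appears first in `variants` is returned.
    """
    return max(variants, key=lambda v: read_support(v, intron_counts))

if __name__ == "__main__":
    counts = {(150, 349): 12, (400, 499): 3, (400, 599): 9}
    candidates = [
        ((150, 349), (400, 499)),
        ((150, 349), (400, 599)),
    ]
    print(best_variant(candidates, counts))
```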

Intron Supporting Features represent the collapsed set of spliced reads used to inform the gene models. The features show the number of reads from each tissue that confirm a particular intron. Not all introns show expression in all tissues. Also, not all of the intron features were used in the gene models shown.

The intron track can be configured to use a variable height display where the height of the feature varies in accordance with the number of reads supporting the intron (up to a maximum of 50 reads). This display also highlights non-canonical splices in red.
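The variable height display can be thought of as a capped mapping from read count to feature height. The sketch below illustrates this with several assumptions that are not taken from the Ensembl code: the scaling is linear, the maximum height is a hypothetical 40 pixels, only GT/AG splice sites are treated as canonical, and the string "default" stands in for whatever colour canonical introns are drawn in. Only the cap of 50 reads and the red highlight come from the description above.

```python
MAX_READS = 50  # read count at which the feature height saturates

def intron_display(read_count, donor, acceptor, max_height_px=40):
    """Return (height_in_pixels, colour) for an intron feature.

    Linear scaling, `max_height_px`, and the GT/AG-only definition of
    "canonical" are assumptions made for this sketch.
    """
    capped = min(max(read_count, 0), MAX_READS)
    height = round(max_height_px * capped / MAX_READS)
    canonical = (donor.upper(), acceptor.upper()) == ("GT", "AG")
    colour = "default" if canonical else "red"
    return height, colour

if __name__ == "__main__":
    print(intron_display(25, "GT", "AG"))
    print(intron_display(120, "AT", "AC"))
```

Because of the cap, an intron supported by 120 reads is drawn at the same height as one supported by 50.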

RNASeq Gene Models in other species

Our RNASeq pipeline has been applied to a number of species using a method similar to the one used for zebrafish. For each tissue or individual, the raw reads are aligned to the genome using BWA. This step allows us to quickly identify regions of the genome that are actively transcribed. We use the results from all tissues to create one set of alignment blocks roughly corresponding to exons. Read pairing information is used to group exons into one set of approximate transcript structures called proto-transcripts. Next, both the pooled (merged) reads and the reads from each tissue that were partially mapped by BWA are realigned to the proto-transcripts using Exonerate and a short word length to create a merged or tissue-specific set of spliced alignments representing canonical and non-canonical introns. For more details on RNASeq data in other species, please visit the species pages or contact us via the link at the bottom right of this page.
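The grouping of exon blocks into proto-transcripts by read pairing, described above, can be sketched as finding connected components: two blocks belong to the same group if some read pair has one mate in each. The example below uses a small union-find structure for this. It is an illustrative assumption about how such grouping could be done, not the pipeline's code, and the function name and input format are hypothetical.

```python
def group_blocks_by_pairs(n_blocks, pair_links):
    """Group exon blocks that are connected by read pairs.

    `n_blocks` is the number of exon blocks, indexed 0..n_blocks-1.
    `pair_links` is an iterable of (block_a, block_b) index pairs, one per
    read pair whose mates fall in two different blocks.
    Returns a sorted list of groups, each a sorted list of block indices.
    """
    parent = list(range(n_blocks))

    def find(i):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in pair_links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller index as the root of the merged group.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    groups = {}
    for i in range(n_blocks):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

if __name__ == "__main__":
    links = [(0, 1), (1, 2), (3, 4)]
    print(group_blocks_by_pairs(5, links))
```

Blocks with no linking read pair remain as single-block groups.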