The Ensembl Annotation Process
The Genome Assemblies page explains where we obtain our genome assemblies, how the sequence data for these assemblies are structured, and how we represent these data in Ensembl.
Protein-coding gene annotation
Protein-coding genes are annotated automatically by Ensembl's genebuild pipeline. All transcripts are based on mRNA and protein sequences in public scientific databases.
See the annotation article for more about the Ensembl genebuild pipeline, gene names and annotation.
Low-coverage genomes are annotated using a modified pipeline which attempts to locate genes across multiple scaffolds.
EST-based genes are predicted and displayed on the website but are not included in the final gene set.
Paired-end Illumina RNA-seq data have been used to generate transcript models for many species including human, zebrafish and pig.
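The annotated protein-coding genes for a region can also be listed programmatically. The following is a minimal sketch, not part of the genebuild pipeline itself. It assumes the public Ensembl REST server at rest.ensembl.org and its overlap/region endpoint with the feature and biotype parameters; the helper names (overlap_url, fetch_genes, gene_labels) are hypothetical and chosen for illustration only.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public Ensembl REST server; adjust if you use a mirror.
REST_SERVER = "https://rest.ensembl.org"


def overlap_url(species, region, biotype="protein_coding"):
    """Build an overlap/region URL asking for genes of a single biotype."""
    path = "/overlap/region/{}/{}".format(
        urllib.parse.quote(species),
        urllib.parse.quote(region, safe=":-"),
    )
    query = urllib.parse.urlencode({"feature": "gene", "biotype": biotype})
    return REST_SERVER + path + "?" + query


def fetch_genes(species, region, biotype="protein_coding"):
    """Query the server (requires network access) and return the decoded JSON list."""
    request = urllib.request.Request(
        overlap_url(species, region, biotype),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def gene_labels(records):
    """Return (stable ID, display name) pairs; the name may be absent for some genes."""
    return [(record["id"], record.get("external_name")) for record in records]
```

For example, gene_labels(fetch_genes("human", "7:140424943-140624564")) would return the stable IDs and names of the protein-coding genes overlapping that region, provided the server is reachable. The region shown is only an illustrative coordinate range.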