What do the different biotypes in Ensembl mean?
The Ensembl automatic annotation system classifies genes and transcripts into biotypes including: protein_coding, pseudogene, processed_pseudogene, miRNA, rRNA, scRNA, snoRNA, snRNA.
For human, mouse and selected other species, we incorporate manual annotation from Havana. For genes and transcripts that include manual annotation, we display the manually assigned biotype. The full list of Havana biotypes can be found here.
The biotypes can be grouped into four categories: protein coding, pseudogene, long noncoding and short noncoding. Examples of biotypes in each group are as follows:
- Protein coding: IG_C_gene, IG_D_gene, IG_gene, IG_J_gene, IG_LV_gene, IG_M_gene, IG_V_gene, IG_Z_gene, nonsense_mediated_decay, nontranslating_CDS, non_stop_decay, polymorphic, polymorphic_pseudogene, protein_coding, TR_C_gene, TR_D_gene, TR_gene, TR_J_gene, TR_V_gene
- Pseudogene: disrupted_domain, IG_C_pseudogene, IG_J_pseudogene, IG_pseudogene, IG_V_pseudogene, processed_pseudogene, pseudogene, transcribed_processed_pseudogene, transcribed_unitary_pseudogene, transcribed_unprocessed_pseudogene, translated_processed_pseudogene, TR_J_pseudogene, TR_pseudogene, TR_V_pseudogene, unitary_pseudogene, unprocessed_pseudogene
- Long noncoding: 3prime_overlapping_ncrna, ambiguous_orf, antisense, antisense_RNA, lincRNA, ncrna_host, non_coding, processed_transcript, retained_intron, sense_intronic, sense_overlapping
- Short noncoding: miRNA, miRNA_pseudogene, misc_RNA, misc_RNA_pseudogene, Mt_rRNA, Mt_tRNA, Mt_tRNA_pseudogene, ncRNA, ncRNA_pseudogene, rRNA, rRNA_pseudogene, scRNA, scRNA_pseudogene, snlRNA, snoRNA, snoRNA_pseudogene, snRNA, snRNA_pseudogene, tRNA, tRNA_pseudogene
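If you need to apply this grouping to your own data, a simple lookup table is usually enough. The sketch below is only an illustration: the group keys, the `BIOTYPE_GROUPS` and `GROUP_OF` names and the `biotype_group` helper are hypothetical and not part of any Ensembl API, and the biotype lists are copied from the examples above, so they may not cover every biotype you encounter.

```python
# Illustrative sketch: names here are hypothetical, not an Ensembl API.
# The biotype lists are copied from the example groups listed above.
BIOTYPE_GROUPS = {
    "protein_coding": """
        IG_C_gene IG_D_gene IG_gene IG_J_gene IG_LV_gene IG_M_gene IG_V_gene
        IG_Z_gene nonsense_mediated_decay nontranslating_CDS non_stop_decay
        polymorphic polymorphic_pseudogene protein_coding TR_C_gene TR_D_gene
        TR_gene TR_J_gene TR_V_gene
    """.split(),
    "pseudogene": """
        disrupted_domain IG_C_pseudogene IG_J_pseudogene IG_pseudogene
        IG_V_pseudogene processed_pseudogene pseudogene
        transcribed_processed_pseudogene transcribed_unitary_pseudogene
        transcribed_unprocessed_pseudogene translated_processed_pseudogene
        TR_J_pseudogene TR_pseudogene TR_V_pseudogene unitary_pseudogene
        unprocessed_pseudogene
    """.split(),
    "long_noncoding": """
        3prime_overlapping_ncrna ambiguous_orf antisense antisense_RNA lincRNA
        ncrna_host non_coding processed_transcript retained_intron
        sense_intronic sense_overlapping
    """.split(),
    "short_noncoding": """
        miRNA miRNA_pseudogene misc_RNA misc_RNA_pseudogene Mt_rRNA Mt_tRNA
        Mt_tRNA_pseudogene ncRNA ncRNA_pseudogene rRNA rRNA_pseudogene scRNA
        scRNA_pseudogene snlRNA snoRNA snoRNA_pseudogene snRNA
        snRNA_pseudogene tRNA tRNA_pseudogene
    """.split(),
}

# Reverse index so each biotype maps directly to its group.
GROUP_OF = {
    biotype: group
    for group, biotypes in BIOTYPE_GROUPS.items()
    for biotype in biotypes
}


def biotype_group(biotype):
    """Return the group for a biotype, or None if it is not listed above."""
    return GROUP_OF.get(biotype)


if __name__ == "__main__":
    for name in ("protein_coding", "lincRNA", "snoRNA", "not_a_biotype"):
        print(name, "->", biotype_group(name))
```

Returning None for an unlisted biotype, rather than guessing from the name, keeps the lookup faithful to the lists: for example, polymorphic_pseudogene is grouped under protein coding and miRNA_pseudogene under short noncoding, which a name-based rule would get wrong.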