This page gives an overview of the information available at the gene level and is composed of three sections.
At the top, the page shows the gene name and Ensembl gene ID, the full description of the gene, its synonyms, its genomic location and strand, INSDC coordinates, and its number of transcripts.
The sections that follow are the Transcript Table, the Summary (with links to external databases), and the Gene Diagram.
The Transcript Table shows each splice variant of a gene, i.e. protein-coding and non-coding transcripts, together with transcript and translation length, biotype, CCDS and RefSeq IDs, and APPRIS and TSL flags. This table is hidden by default. Each transcript is given an Ensembl Transcript ID, which is unique and stable.
The Summary provides additional information and links to external databases:
- Name - from official gene nomenclature committees such as HGNC (for human) and MGI (for mouse)
- CCDS - coding sequence IDs from the Consensus Coding Sequence Set
- UniProtKB - protein IDs from UniProtKB that match one of the translations of this gene
- RefSeq - Gene ID from Entrez Gene that matches the Ensembl gene
- LRG - IDs from the Locus Reference Genomic (LRG) project matching the Ensembl gene
- Ensembl version - versioning of the Ensembl gene ID
- GRCh37 assembly - (for human only) with genomic coordinates and links to the Location and Gene views of the gene on the previous human assembly
- Gene type - The gene type includes both status (e.g. known) and biotype (e.g. protein coding)
- Annotation method - Ensembl automatic annotation, HAVANA manual annotation, or a merge of the two (for human, mouse, zebrafish, pig, and rat)
- Alternative genes - IDs from the HAVANA project that match the Ensembl gene
The Gene Diagram depicts the gene and all its transcripts in the context of the genome. The image can be configured to add or remove data tracks.
Transcripts are drawn as boxes for exons and connecting lines for introns. Filled boxes show coding sequence, and empty boxes show UTRs (untranslated regions). Transcripts drawn above the blue bar (i.e. the contig) are on the forward strand, whereas transcripts below it are on the reverse strand.
Transcripts are represented by different colours:
- Red or gold transcripts are protein coding. Gold transcripts are identical between the annotation from the Ensembl automatic pipeline and the manual annotation from HAVANA
- Blue, pink or grey transcripts are noncoding. Go to the transcript summary help page for more information
The number next to the transcript name, e.g. MYO6-001 and MYO6-201, tells you whether the transcript was manually annotated by HAVANA (numbers beginning with 0) or annotated by the Ensembl automatic pipeline (numbers beginning with 2). Gold transcripts begin with 0.
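The numbering rule above can be sketched in a few lines of Python. This is only an illustration of the convention as described on this page: the function name `annotation_source` and the returned labels are hypothetical, and names that do not follow the `GENE-NNN` pattern are rejected, not guessed at.

```python
import re


def annotation_source(transcript_name: str) -> str:
    """Classify a transcript name such as 'MYO6-001' by its numeric suffix.

    Hypothetical helper: it only encodes the rule stated on this page
    (suffix starting with 0 = HAVANA manual, starting with 2 = Ensembl automatic).
    """
    # Split the name into the gene symbol and the trailing number.
    match = re.fullmatch(r"(?P<gene>.+)-(?P<number>\d+)", transcript_name)
    if match is None:
        raise ValueError(f"not a GENE-NNN transcript name: {transcript_name!r}")

    first_digit = match.group("number")[0]
    if first_digit == "0":
        return "HAVANA manual annotation"
    if first_digit == "2":
        return "Ensembl automatic annotation"
    # Any other leading digit is not covered by the rule described here.
    return "unknown"


print(annotation_source("MYO6-001"))
print(annotation_source("MYO6-201"))
```

With the two example names from the text, MYO6-001 is reported as manually annotated and MYO6-201 as automatically annotated.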
The status indicates the type of evidence that supports the annotation:
- Known: the gene has been previously reported in other external databases, such as Entrez and HGNC. The cDNA supporting this annotation should be from the same species as the genome
- Known by Projection: the annotation of the gene is based on the annotation of its homologues in well-studied genomes. The homology is determined during the Ensembl comparative analyses
- Novel: the gene is not previously known in Entrez, HGNC or MGI, for example, and therefore has not been given an official name. It can be supported by cross-species mRNAs or ESTs
- Merged: genes with at least one merged (gold) transcript between the automatic and manual annotation
The biotype is an indicator of biological significance for genes.
If a gene has been manually annotated (i.e. in human, mouse, zebrafish, pig, and rat), we use the biotypes assigned by the HAVANA team. The full list of biotypes is available on the VEGA website.
Biotypes can be grouped into protein coding, pseudogene, long noncoding and short noncoding.
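As a rough illustration of this grouping, the sketch below maps a handful of biotype labels to the four groups named above. The mapping is an assumption made for the example: it covers only a few common labels and is not the full list maintained by HAVANA, and the names `BIOTYPE_GROUPS` and `biotype_group` are hypothetical.

```python
# Illustrative, incomplete mapping from biotype label to group.
# The selection of labels is an assumption for this sketch only.
BIOTYPE_GROUPS = {
    "protein_coding": "protein coding",
    "processed_pseudogene": "pseudogene",
    "unprocessed_pseudogene": "pseudogene",
    "lincRNA": "long noncoding",
    "antisense": "long noncoding",
    "miRNA": "short noncoding",
    "snRNA": "short noncoding",
    "snoRNA": "short noncoding",
}


def biotype_group(biotype: str) -> str:
    """Return the group for a biotype, or 'ungrouped' if it is not in the sketch."""
    return BIOTYPE_GROUPS.get(biotype, "ungrouped")


for label in ("protein_coding", "miRNA", "processed_pseudogene"):
    print(label, "->", biotype_group(label))
```

A lookup table keeps the example easy to extend; any biotype missing from it is reported as ungrouped instead of being assigned a group by guesswork.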