Gene summary

We provide displays at two levels:

  • Transcript views, which provide information specific to an individual transcript, such as the cDNA and CDS sequences and protein domain annotation.
  • Gene views, which provide displays for data associated at the gene level, such as orthologues.

This view is a gene-level view. To access the transcript-level displays, select a Transcript ID in the transcript table, then navigate to the information you want using the menu at the left-hand side of the page. To return to gene-level information, click on the Gene tab in the menu bar at the top of the page.

Example page

Listed at the top of this page are the gene symbol, name, source of the name, and location (chromosome and base-pair coordinates). The gene name refers to the HGNC symbol, for human, or the best match to a species-specific protein or mRNA.

Ensembl gene IDs begin with 'ENSG' for human. A three-letter code is inserted for other species (for example, ENSMUSG is a mouse gene). The 11-digit number following this prefix is unique for that gene, and stable (unchanging) unless the gene model changes drastically. Transcripts manually curated by VEGA/HAVANA are assigned an Ensembl Transcript ID when they are imported into the Ensembl transcript set.
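
The ID structure described above can be sketched in Python. This is a minimal illustration, not an official Ensembl tool: the helper name parse_stable_id and the regular expression are assumptions based only on the pattern given here ('ENS', an optional three-letter species code, a feature letter, and 11 digits). Real IDs may carry a version suffix or a species code of a different length, which this sketch does not handle.

```python
import re

# Assumed pattern, taken from the description on this page:
# 'ENS' + optional three-letter species code + feature letter + 11 digits.
STABLE_ID = re.compile(r"^ENS([A-Z]{3})?([A-Z])(\d{11})$")

# Only the feature letters mentioned on this page are named here.
FEATURE_TYPES = {"G": "gene", "T": "transcript"}


def parse_stable_id(stable_id):
    """Split a stable ID into species code, feature type and number (hypothetical helper)."""
    match = STABLE_ID.match(stable_id)
    if match is None:
        raise ValueError(f"not a recognised stable ID: {stable_id!r}")
    species_code, feature_letter, number = match.groups()
    return {
        "species_code": species_code,  # None means no code was inserted (human)
        "feature": FEATURE_TYPES.get(feature_letter, "other"),
        "number": number,
    }
```

For example, parse_stable_id("ENSMUSG00000017167") reports the species code "MUS" and the feature "gene", while a human ID such as "ENSG00000139618" has no species code.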

TRANSCRIPT TABLE

The transcript table shows all protein-coding transcripts (splice variants), translations, and non-coding transcripts annotated within the gene. For more information about the transcript table and/or biotypes, see the transcript summary help page or the Ensembl glossary.

ADDITIONAL GENE INFORMATION

Below the transcript table is more information about the gene. This information includes:

  • Name - The HGNC name (for human), or the best match to a known gene name in a public database.
  • Synonyms - Other gene names used for this particular gene.
  • CCDS - If the gene has transcripts in the Consensus Coding Sequence Set, the CCDS IDs will be listed.
  • LRG - If there exists a stable Locus Reference Genomic (LRG) framework for this gene, the identifier and a link to the display are listed. For more information on the LRG project, please visit www.lrg-sequence.org.
  • Gene type - The gene type includes both status and biotype. These properties are explained further down this page.
  • Prediction method - Indicates if automatic annotation and/or manual curation was used to determine transcripts belonging to this gene.
  • Alternative genes - Matching gene IDs from the VEGA/HAVANA project.

TRANSCRIPT DIAGRAM

All transcripts for a gene (including protein-coding splice-variants and non-coding transcripts) are listed in the table, and drawn in the diagram. Click on any Ensembl transcript ID in the transcript table (ENST, for human), or click any transcript in the diagram, to select one particular transcript.

Individual transcripts for a gene are drawn as boxes for exons and connecting lines for introns. Filled or darkened boxes show coding sequence, and empty boxes show UTR (Untranslated Region). Transcripts drawn above the chromosome (blue bar) are on the forward strand, while transcripts below are on the reverse strand.

  • Red or gold transcripts are protein coding. Gold transcripts and those with a CCDS have coding sequences that are well-supported and are unlikely to change. Gold transcripts are identical between manual curation from the VEGA/HAVANA project and the Ensembl automatic annotation pipeline. They are available for human, mouse, zebrafish, pig, and rat.
  • If you work with human, depending on factors such as the cell type or tissue type you are working in, you may need to use one of the protein-coding transcripts that is not in these 'reviewed' sets (i.e. neither a transcript with a CCDS ID nor a gold transcript). Please see the general identifiers link at the left of the transcript tab. This will show you matching IDs in other databases, and may help you decide on a transcript.

  • Blue, pink or grey transcripts are noncoding. Go to the transcript summary help page for more information.

The number next to the transcript name lets you know if the transcript came from Havana manual curation (numbers beginning with 0, e.g. MYO6-001) or Ensembl automatic annotation (numbers beginning with 2, e.g. MYO6-201). Merged transcripts begin with 0.
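
The naming rule above can be expressed as a short Python sketch. The function name annotation_source and the returned labels are illustrative assumptions. Because merged transcripts also begin with 0, the name alone cannot separate them from Havana-only transcripts, so the sketch reports them together.

```python
def annotation_source(transcript_name):
    """Guess the annotation source from a transcript name such as 'MYO6-001'.

    Hypothetical helper; it applies only the numbering rule described on this page.
    """
    _, separator, suffix = transcript_name.rpartition("-")
    if not separator or not suffix.isdigit():
        raise ValueError(f"no numeric suffix in {transcript_name!r}")
    if suffix.startswith("0"):
        # Havana manual curation and merged transcripts share this prefix.
        return "Havana manual curation or merged"
    if suffix.startswith("2"):
        return "Ensembl automatic annotation"
    # The page does not describe any other leading digit.
    return "unknown"
```

With the examples from the text, annotation_source("MYO6-001") falls in the Havana or merged group and annotation_source("MYO6-201") in the Ensembl automatic annotation group.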

STATUS

Genes can be classified according to their status, which indicates the type of evidence that supports the annotation:

  • Known gene has at least one transcript with a sequence match in a sequence repository external to Ensembl for the same species.
  • Known by Projection refers to genes that are homologous, based on Ensembl comparative analysis, to genes with Known status in another species (usually human genes).
  • Novel gene contains only transcripts that have a sequence match outside Ensembl for an alternate species. (Can be read as novel gene or transcript for this species).
  • Merged gene has at least one merged (gold) transcript. A merged transcript is a case where an identical sequence to the Ensembl prediction has been determined by the VEGA/HAVANA project. Two distinctions are made. If the coding sequence (CDS) is the same, but the UTR differs, the link to the Havana ID (OTT...) will read: "Havana transcript having same CDS". If the entire transcript (both UTR and CDS) is the same in Havana and Ensembl, the link to the Havana transcript will read: "Transcript having exact match between Ensembl and Havana". Merged genes are only available for human, mouse, and zebrafish.
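
The two link texts for merged transcripts follow a simple decision rule, sketched below in Python. The function havana_link_text and its boolean parameters are hypothetical and exist only to illustrate the rule; the two strings are the link texts quoted above. Returning None when the CDS differs is an assumption, since such a transcript would not count as merged.

```python
def havana_link_text(same_cds, same_utr):
    """Return the link text for a merged transcript (illustrative sketch only)."""
    if not same_cds:
        # Assumption: without an identical CDS the transcript is not merged.
        return None
    if same_utr:
        # Both UTR and CDS agree between Havana and Ensembl.
        return "Transcript having exact match between Ensembl and Havana"
    # CDS agrees but the UTR differs.
    return "Havana transcript having same CDS"
```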

BIOTYPE

Genes can be classified according to their biotype, an indicator of biological significance.

For human, mouse, zebrafish, pig, and rat, we incorporate manual annotation from HAVANA. Where a gene or transcript has been manually annotated, we use the manually assigned biotype. The full list of biotypes used by HAVANA is here.

Biotypes can be grouped into protein coding, pseudogene, long noncoding and short noncoding.

Finding biotype groupings

If you see a biotype in Ensembl and are not sure which biotype group it belongs to, you can check this by connecting to the latest ensembl_production database.

For Ensembl release 73, connect to the database "ensembl_production_73", for example:

mysql -uanonymous -P3306 -hensembldb.ensembl.org -Densembl_production_73 -e "select distinct(name),biotype_group from biotype where db_type like '%core%' and is_current=1 order by biotype_group,name;"
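
To check a different release, only the database name in the command above changes. The Python sketch below builds the same mysql command line for a given release number and can run it through the mysql command-line client. The function names are assumptions made for illustration; the host, user, port and query are copied from the command above, and the database for a given release is assumed to follow the same "ensembl_production_<release>" naming.

```python
import subprocess

# Query copied from the command shown on this page.
BIOTYPE_QUERY = (
    "select distinct(name),biotype_group from biotype "
    "where db_type like '%core%' and is_current=1 "
    "order by biotype_group,name;"
)


def biotype_group_command(release):
    """Build the mysql argument list for one release (hypothetical helper)."""
    return [
        "mysql",
        "-uanonymous",
        "-P3306",
        "-hensembldb.ensembl.org",
        f"-Densembl_production_{int(release)}",
        "-e",
        BIOTYPE_QUERY,
    ]


def list_biotype_groups(release):
    """Run the query and return the raw tab-separated output.

    Needs the mysql client installed and network access to the public server.
    """
    result = subprocess.run(
        biotype_group_command(release),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling list_biotype_groups(73) runs the same query as the command above. Passing the arguments as a list avoids shell quoting problems with the SQL string.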

For more about the Ensembl gene set, Vega/Havana, and the GENCODE set, please see the gene annotation articles.

Note: Links at the left of the page are for gene-related information only. For more specific information on the transcript level, such as the cDNA or protein sequence, click on a transcript (either an ENST ID in the table, or a transcript in the diagram). This will open the transcript tab.