Coelacanth assembly and gene annotation

Name: Ensembl Coelacanth Gene Set
Creator: Ensembl
License: https://www.apache.org/licenses/LICENSE-2.0
Keywords: genebuild, transcripts, transcription, alignment, loci

Assembly

This is the preliminary display of the Coelacanth (Latimeria chalumnae) genome produced by the Broad Institute. Illumina technology was used to produce this high-quality draft. The whole-genome shotgun data were assembled with Allpaths. The genome is 2.9 Gb in length, composed of 291,828 contigs with an N50 value of 12.6 kb and 22,818 scaffolds with an N50 value of 765 kb.
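The N50 values quoted above summarise how contiguous an assembly is: N50 is the length of the sequence at which, counting from the longest sequence downwards, the running total first reaches half of the total assembled length. The sketch below illustrates that definition only. The function name n50 and the toy lengths are made up for illustration; they are not the real coelacanth contig lengths, and this is not the code used by the Broad Institute or Ensembl.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sort the lengths from longest to shortest, accumulate them, and
    return the length at which the running total first reaches half
    of the total assembled length.
    """
    if not lengths:
        raise ValueError("no lengths given")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths, for illustration only.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # prints 70: 80 + 70 = 150, half of the 300 total
```

A larger N50 means that more of the assembly sits in long sequences, which is why the scaffold N50 (765 kb) is so much larger than the contig N50 (12.6 kb).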

Gene annotation

The Ensembl genome annotation pipeline was used to identify genes. Models built from Coelacanth proteins and cDNAs were given priority over predictions from other vertebrate species. A further 6,671 transcript models built from paired-end Illumina RNA-Seq data were added to the gene set where they contributed a novel model or splice variant. RNA-Seq data were also used to add UTRs to models that were not built from species-specific evidence. The total gene set contains 19,697 protein-coding genes, with a further 2,894 ncRNAs and 141 pseudogenes.

Detailed information on genebuild (PDF)

More information

General information about this species can be found in Wikipedia.

Statistics

Summary

Assembly	LatCha1, INSDC Assembly GCA_000225785.1, Sep 2011
Base Pairs	2,860,591,921
Golden Path Length	2,860,591,921
Annotation provider	Ensembl
Annotation method	Full genebuild
Genebuild started	Sep 2011
Genebuild released	Oct 2011
Genebuild last updated/patched	Nov 2012
Database version	115.1

Gene counts

Coding genes	19,569
Non coding genes	2,918
Small non coding genes	2,859
Misc non coding genes	59
Pseudogenes	141
Gene transcripts	26,660

Coding gene: a gene/transcript that contains an open reading frame (ORF).
Pseudogene: a gene that has homology to known protein-coding genes but contains a frameshift and/or stop codon(s) that disrupt the ORF. Thought to have arisen through duplication followed by loss of function.
Transcript: the operational unit of a gene. In a genomic context, transcripts consist of one or more exons, with adjoining exons separated by introns. The exons and introns are transcribed and the introns are then spliced out. Transcripts may or may not encode a protein.
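As a quick sanity check on the gene counts, the figures can be added up in a few lines. This is only a sketch of the arithmetic: the variable names are illustrative, and treating the transcript count as covering coding genes, non coding genes and pseudogenes together is an assumption that the page does not state.

```python
# Counts copied from the gene counts table.
coding_genes = 19_569
non_coding_genes = 2_918
small_non_coding_genes = 2_859
misc_non_coding_genes = 59
pseudogenes = 141
gene_transcripts = 26_660

# The two non coding subclasses should add up to the non coding total.
assert small_non_coding_genes + misc_non_coding_genes == non_coding_genes

# Assumption: the transcript count spans all three gene classes.
total_genes = coding_genes + non_coding_genes + pseudogenes
transcripts_per_gene = gene_transcripts / total_genes

print(total_genes)                     # prints 22628
print(round(transcripts_per_gene, 2))  # prints 1.18
```

Under that assumption, the annotation averages a little over one transcript per gene.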

Other

Genscan gene predictions	103,879
