Armadillo assembly and gene annotation

Name: Ensembl Armadillo Gene Set
Creator: Ensembl
License: https://www.apache.org/licenses/LICENSE-2.0
Keywords: genebuild, transcripts, transcription, alignment, loci

Assembly

This is the new release of the draft assembly of the armadillo (Dasypus novemcinctus) genome, Dasnov3.0, provided by the Baylor College of Medicine in Dec 2011. There are 46,558 scaffolds composed of 314,971 contigs, with a scaffold N50 of 1,717,291 bp and a contig N50 of 26,277 bp. The N50 is the length such that 50% of the assembled genome lies in blocks of that length or longer.
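The N50 definition above can be illustrated with a minimal sketch. The function name and the toy lengths are made up for illustration and are not Dasnov3.0 data; the code simply sorts the block lengths from longest to shortest and returns the length at which the running total first reaches half of the assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest block down until half of the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy scaffold lengths (hypothetical values, for illustration only).
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))  # 70: the blocks of length 80 and 70 cover 150 of 300
```

The same function would apply to scaffold or contig lengths read from an assembly file, giving the scaffold N50 and contig N50 respectively.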

Aim

The aim is to increase our understanding of functional elements, especially in the human genome. Dasypus novemcinctus is of particular interest to developmental biologists owing to its habit of producing litters of four genetically identical young, and it is also the animal model for leprosy.

Other assemblies

ARMA (Ensembl release 54)

Gene annotation

Dasnov3.0 was annotated using the standard Ensembl gene annotation system, incorporating RNAseq data provided by the Broad Institute. Gene models are based on:

Models built from armadillo RNAseq data using our in-house RNAseq pipeline
Genewise alignments of UniProt protein sequences from mammal species
Exonerate alignments of Ensembl human proteins from Ensembl release 71

Protein-coding models were extended into their untranslated regions using RNAseq models. In addition to the coding transcript models, non-coding RNAs and pseudogenes were annotated.

RNASeq data set

In addition to the Ensembl gene set, we produced RNAseq-based gene models and an indexed BAM file for each sample used by the RNAseq pipeline and also for the merged data from all tissues. Each RNAseq-based gene model represents only the best supported transcript model. We ran BLASTp with these transcript models against UniProt proteins in order to annotate the open reading frame. The best BLAST hit is displayed as transcript supporting evidence.

The tissue-specific sets of transcript models built using our RNAseq pipeline are as follows:

Tissue	Gene models (Set 1)	Gene models (Set 2)
Ascending Colon	17046	16937
Cerebellum with brainstem	17289	17405
Heart	14723	14786
Kidney	16047	16041
Liver	15643	15856
Lung	17891	18020
rt Quadricep	14450	14569
Spleen	16988	17093
Merged	21177

The RNAseq data were used to add UTRs to protein-coding models. Additional splice junctions (introns) may have been identified by our pipeline and not included in the best supported transcript model. We therefore provide users with the full set of introns identified by our RNAseq pipeline to enable further analysis. These introns were identified by searching for reads that splice when mapped to the genome.
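As a rough illustration of how spliced reads reveal introns, the sketch below derives intron coordinates from SAM-style alignments, where a spliced read carries an `N` operation in its CIGAR string. This is not the Ensembl RNAseq pipeline; the function names, the scaffold name, and the toy reads are hypothetical, and a real analysis would read the alignments from the BAM files rather than from a hand-written list.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def introns_from_alignment(start, cigar):
    """Return 1-based inclusive (intron_start, intron_end) pairs for one read.

    start: leftmost mapped reference position (SAM POS field, 1-based).
    cigar: SAM CIGAR string, e.g. "50M1200N50M".
    """
    introns = []
    pos = start
    for count, op in CIGAR_OP.findall(cigar):
        count = int(count)
        if op == "N":
            # A skipped reference region: the read splices across it.
            introns.append((pos, pos + count - 1))
        if op in "MDN=X":
            # Only these operations advance along the reference.
            pos += count
    return introns


def count_intron_support(alignments):
    """Count how many reads support each (sequence, start, end) intron."""
    support = Counter()
    for seq_name, start, cigar in alignments:
        for intron_start, intron_end in introns_from_alignment(start, cigar):
            support[(seq_name, intron_start, intron_end)] += 1
    return support


# Toy alignments (hypothetical): two spliced reads and one unspliced read.
toy_reads = [
    ("scaffold_1", 1000, "50M1200N50M"),
    ("scaffold_1", 1020, "30M1200N70M"),
    ("scaffold_1", 3000, "100M"),
]
for intron, n_reads in count_intron_support(toy_reads).items():
    print(intron, n_reads)
```

Counting supporting reads per intron is one simple way to rank splice junctions that were left out of the best supported transcript model.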

Detailed information on genebuild (PDF)

More information

General information about this species can be found in Wikipedia.

Statistics

Summary

Assembly	Dasnov3.0, INSDC Assembly GCA_000208655.2, Dec 2011
Base Pairs	3,631,522,711
Golden Path Length	3,631,522,711
Annotation provider	Ensembl
Annotation method	Mixed strategy build
Genebuild started	Jun 2013
Genebuild released	Dec 2013
Genebuild last updated/patched	May 2016
Database version	116.3

Gene counts

Coding genes	22,711
Non coding genes	9,163
Small non coding genes	5,510
Long non coding genes	3,179
Misc non coding genes	474
Pseudogenes	1,500
Gene transcripts	37,723

Coding genes: genes or transcripts that contain an open reading frame (ORF).
Pseudogenes: genes that have homology to known protein-coding genes but contain a frameshift and/or stop codon(s) that disrupt the ORF. They are thought to have arisen through duplication followed by loss of function.
Gene transcripts: a transcript is the operational unit of a gene. In a genomic context, transcripts consist of one or more exons, with adjoining exons separated by introns. The exons and introns are transcribed and the introns are then spliced out. Transcripts may or may not encode a protein.
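
As a quick consistency check on the gene counts listed here, the small, long and miscellaneous non-coding categories should sum to the non-coding total. The sketch below uses only the figures from this page; the dictionary keys are illustrative names, not Ensembl identifiers.

```python
# Gene counts copied from the table on this page.
gene_counts = {
    "non_coding": 9163,
    "small_non_coding": 5510,
    "long_non_coding": 3179,
    "misc_non_coding": 474,
}

# Sum the three non-coding subcategories.
subtotal = (
    gene_counts["small_non_coding"]
    + gene_counts["long_non_coding"]
    + gene_counts["misc_non_coding"]
)

print(subtotal)                                # 9163
print(subtotal == gene_counts["non_coding"])   # True
```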

Other

Genscan gene predictions	64,600
