Armadillo assembly and gene annotation


This is the new release of the draft assembly of the armadillo (Dasypus novemcinctus) genome, Dasnov3.0, provided by the Baylor College of Medicine in December 2011. It comprises 46,558 scaffolds made up of 314,971 contigs, with a scaffold N50 of 1,717,291 bp and a contig N50 of 26,277 bp. The N50 is the length such that 50% of the assembled genome lies in blocks of that length or longer.
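The N50 definition above can be made concrete with a short calculation. This is a minimal sketch that works on a plain list of sequence lengths; the toy lengths are invented for illustration and are not taken from Dasnov3.0. Sorting from longest to shortest and stopping once half the total bases are covered is the standard way to apply the definition.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a set of sequence lengths.

    N50 is the length L such that blocks of length >= L together
    contain at least half of all assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("need at least one length")
    half = sum(ordered) / 2
    running = 0
    # Accumulate bases from the longest block downwards.
    for length in ordered:
        running += length
        if running >= half:
            return length
    return ordered[-1]  # not reached for non-negative lengths


if __name__ == "__main__":
    toy_contigs = [100, 80, 50, 30, 20, 10]  # hypothetical lengths
    print(n50(toy_contigs))  # 80
```

Applied to scaffold lengths this gives the scaffold N50, and applied to contig lengths it gives the contig N50, which is why the two figures quoted for the assembly differ so much.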


The aim of sequencing this genome is to improve our understanding of functional elements, especially in the human genome. Dasypus novemcinctus is of particular interest to developmental biologists because it habitually produces litters of four genetically identical young, and it is also an animal model for leprosy.

The genome assembly represented here corresponds to GenBank Assembly ID GCA_000208655.2.


Gene annotation

Dasnov3.0 was annotated using the standard Ensembl gene annotation system, incorporating RNAseq data provided by the Broad Institute. Gene models are based on:

  • Models built from armadillo RNAseq data using our in-house RNAseq pipeline
  • Genewise alignments of UniProt protein sequences from mammal species
  • Exonerate alignments of Ensembl human proteins from Ensembl release 71

Protein-coding models were extended into their untranslated regions (UTRs) using the RNAseq models. In addition to the coding transcript models, non-coding RNAs and pseudogenes were annotated.

RNAseq data set

In addition to the Ensembl gene set, we produced RNAseq-based gene models and an indexed BAM file for each sample used by the RNAseq pipeline, and also for the merged data from all tissues. Each RNAseq-based gene model represents only the best-supported transcript model. We aligned these transcript models against UniProt proteins with BLASTp to annotate their open reading frames, and the best BLAST hit is displayed as supporting evidence for the transcript.
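Choosing a best hit per transcript from BLASTp results might look like the sketch below. It assumes the results are in BLAST+ tabular format (`-outfmt 6`, default twelve columns) and that "best" means highest bit score with ties broken by lowest e-value; the page does not say which criterion Ensembl actually used, and the transcript and protein IDs in the example are hypothetical.

```python
import csv
import io
from typing import Dict, Tuple

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular_text: str) -> Dict[str, Tuple[str, float, float]]:
    """Map each query ID to (subject ID, bit score, e-value) of its best hit.

    Assumption: best = highest bit score, ties broken by lowest e-value.
    """
    best: Dict[str, Tuple[str, float, float]] = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        rec = dict(zip(FIELDS, row))
        query = rec["qseqid"]
        score = float(rec["bitscore"])
        evalue = float(rec["evalue"])
        current = best.get(query)
        # Tuple comparison: higher score first, then smaller e-value.
        if current is None or (score, -evalue) > (current[1], -current[2]):
            best[query] = (rec["sseqid"], score, evalue)
    return best


# Hypothetical BLASTp output for two RNAseq transcript models.
EXAMPLE = (
    "tx1\tUNIPROT_A\t90.0\t300\t30\t0\t1\t300\t1\t300\t1e-150\t520\n"
    "tx1\tUNIPROT_B\t70.0\t280\t84\t2\t5\t284\t3\t282\t1e-90\t310\n"
    "tx2\tUNIPROT_C\t55.0\t120\t54\t1\t10\t129\t20\t139\t1e-20\t95\n"
    "tx2\tUNIPROT_D\t56.0\t120\t53\t1\t10\t129\t20\t139\t1e-25\t95\n"
)

if __name__ == "__main__":
    for query, hit in sorted(best_hits(EXAMPLE).items()):
        print(query, hit)
```

Bit score is used as the primary key here because, unlike the e-value, it does not depend on the size of the database searched.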

The tissue-specific sets of transcript models built using our RNAseq pipeline are as follows:

Tissue                       Number of gene models
                             Set 1      Set 2
Ascending colon              17,046     16,937
Cerebellum with brainstem    17,289     17,405
Right quadriceps             14,450     14,569

The RNAseq data were used to add UTRs to protein-coding models. Our pipeline may have identified additional splice junctions (introns) that are not included in the best-supported transcript model, so we provide the full set of introns identified by the RNAseq pipeline to enable further analysis. These introns were identified by searching for reads that are split across an intron when mapped to the genome.
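The split-read search described above can be illustrated with a small sketch; it is not the Ensembl pipeline. It assumes alignments have already been read from a SAM/BAM file into hypothetical `(chromosome, position, CIGAR)` tuples, and treats each `N` operation in a CIGAR string (defined by the SAM specification as a region skipped on the reference) as a candidate intron, counting how many reads support each one.

```python
import re
from collections import Counter
from typing import Iterable, Iterator, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference

Intron = Tuple[str, int, int]


def introns_from_alignment(chrom: str, pos: int, cigar: str) -> Iterator[Intron]:
    """Yield (chrom, start, end) for every 'N' gap in a CIGAR string.

    pos is the 1-based leftmost mapping position (SAM column 4);
    returned coordinates are 1-based and inclusive.
    """
    ref = pos
    for length_str, op in CIGAR_OP.findall(cigar):
        length = int(length_str)
        if op == "N":
            yield (chrom, ref, ref + length - 1)
        if op in REF_CONSUMING:
            ref += length


def count_introns(alignments: Iterable[Tuple[str, int, str]]) -> Counter:
    """Count supporting reads per intron across all alignments."""
    counts: Counter = Counter()
    for chrom, pos, cigar in alignments:
        counts.update(introns_from_alignment(chrom, pos, cigar))
    return counts


if __name__ == "__main__":
    reads = [  # hypothetical alignments
        ("chr1", 100, "50M200N50M"),
        ("chr1", 120, "30M200N70M"),
        ("chr1", 100, "100M"),  # unspliced read: no intron
    ]
    print(count_introns(reads))
```

Two reads with different start positions report the same intron here, which is how read support for a junction accumulates.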

More information

General information about this species can be found on Wikipedia.



Assembly: Dasnov3.0, INSDC Assembly GCA_000208655.2, Dec 2011
Database version: 79.3
Base pairs: 3,299,882,059
Golden Path Length

The golden path is the length of the reference assembly: the sum of the lengths of all top-level sequences in the seq_region table, omitting redundant regions such as haplotypes and PARs (pseudoautosomal regions).
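The golden-path rule can be expressed as a simple sum. This is a minimal sketch using a hypothetical, simplified `SeqRegion` record rather than the real Ensembl schema: haplotypes are modelled as a flag, and PAR duplication as a count of bases already counted on another sequence. The toy regions are invented and include cases (chromosomes, haplotypes) that may not apply to a scaffold-level assembly such as Dasnov3.0.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class SeqRegion:
    """Hypothetical stand-in for a seq_region row."""
    name: str
    length: int
    top_level: bool
    haplotype: bool = False  # alternative copy of part of the reference
    par_bases: int = 0       # bases duplicated from a PAR counted elsewhere


def golden_path_length(regions: Iterable[SeqRegion]) -> int:
    """Sum top-level lengths, skipping haplotypes and PAR duplicates."""
    total = 0
    for region in regions:
        if not region.top_level or region.haplotype:
            continue  # not part of the non-redundant reference
        total += region.length - region.par_bases
    return total


TOY_REGIONS = [
    SeqRegion("chrX", 1000, top_level=True),
    SeqRegion("chrY", 500, top_level=True, par_bases=100),
    SeqRegion("hap1", 300, top_level=True, haplotype=True),
    SeqRegion("contig_1", 50, top_level=False),
]

if __name__ == "__main__":
    print(golden_path_length(TOY_REGIONS))  # 1400
```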

Genebuild by: Ensembl
Genebuild method: Mixed strategy build
Genebuild started: Jun 2013
Genebuild released: Dec 2013
Genebuild last updated/patched: Dec 2013

Gene counts

Coding genes

Genes and/or transcripts that contain an open reading frame (ORF).
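What "contains an open reading frame" means can be shown with a simple scan. This sketch finds the longest ATG-to-stop ORF on the forward strand of a single sequence; it ignores the reverse strand, alternative start codons and splicing, so it is a toy illustration rather than how Ensembl defines coding genes.

```python
from typing import Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the longest ATG...stop ORF on the forward strand.

    Coordinates are 0-based and end-exclusive, and include the stop codon.
    Returns None if no complete ORF exists.
    """
    seq = seq.upper()
    best: Optional[Tuple[int, int]] = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a reading frame at the first ATG
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if best is None or end - start > best[1] - best[0]:
                    best = (start, end)
                start = None  # close it at the first in-frame stop
    return best


if __name__ == "__main__":
    print(longest_orf("CCATGAAATTTTAGCC"))  # (2, 14)
```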

Non-coding genes: 5,984
Small non-coding genes

Small non-coding genes are usually fewer than 200 bases long. They may be transcribed but are not translated. In Ensembl, genes with the following biotypes are classed as small non-coding genes: miRNA, miscRNA, rRNA, scRNA, snlRNA, snoRNA, snRNA, and the pseudogenic forms of these biotypes. Most small non-coding genes in Ensembl are annotated automatically by our ncRNA pipeline. Note that tRNAs are annotated separately using tRNAscan; they are included as 'simple features', not genes, because they are not annotated using aligned sequence evidence.
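The classification rule above is based on biotype, not on length, and can be sketched as a set lookup. The biotype spellings are copied from the list above, but the exact strings in an Ensembl database may differ, and the `_pseudogene` suffix for the pseudogenic forms is an assumption made for this illustration.

```python
SMALL_NCRNA_BIOTYPES = {
    "miRNA", "miscRNA", "rRNA", "scRNA", "snlRNA", "snoRNA", "snRNA",
}
PSEUDOGENE_SUFFIX = "_pseudogene"  # assumed naming of pseudogenic forms


def is_small_noncoding(biotype: str) -> bool:
    """Return True if a gene biotype is classed as small non-coding."""
    base = biotype
    if base.endswith(PSEUDOGENE_SUFFIX):
        base = base[: -len(PSEUDOGENE_SUFFIX)]  # map pseudogenic form to parent biotype
    return base in SMALL_NCRNA_BIOTYPES


if __name__ == "__main__":
    for b in ["snoRNA", "snoRNA_pseudogene", "protein_coding", "tRNA"]:
        print(b, is_small_noncoding(b))
```

`tRNA` returns False, consistent with tRNAs being stored as simple features rather than genes.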

Misc non-coding genes: 474

Pseudogenes

A pseudogene shares an evolutionary history with a functional protein-coding gene but has been mutated through evolution to contain frameshifts and/or stop codons that disrupt the open reading frame.
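The two kinds of disruption named above, frameshifts and premature stop codons, can be checked in a simplified way. This sketch assumes the input sequence starts at the first base of the start codon, in the same frame as the functional parent gene; real pseudogene annotation compares the sequence against an alignment to the parent protein, so this is only a toy check.

```python
from typing import List

STOP_CODONS = {"TAA", "TAG", "TGA"}


def orf_disruptions(cds: str) -> List[str]:
    """List simple signs that a putative coding sequence is disrupted."""
    cds = cds.upper()
    problems: List[str] = []
    if len(cds) % 3 != 0:
        problems.append("length not a multiple of 3 (possible frameshift)")
    # Scan every complete codon except the final one for an in-frame stop.
    last = len(cds) - len(cds) % 3 - 3
    for i in range(0, last, 3):
        if cds[i:i + 3] in STOP_CODONS:
            problems.append(f"premature stop codon at position {i + 1}")
    return problems


if __name__ == "__main__":
    print(orf_disruptions("ATGAAATTTTAA"))  # []
    print(orf_disruptions("ATGTAATTTTAA"))  # ['premature stop codon at position 4']
```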

Gene transcripts: 34,035

A transcript is the nucleotide sequence resulting from the transcription of genomic DNA into mRNA. One gene can have several transcripts, or splice variants, resulting from alternative splicing of its exons.


Genscan gene predictions: 64,600