Long-tailed chinchilla (ChiLan1.0)

Long-tailed chinchilla assembly and gene annotation

Assembly

The ChiLan1.0 assembly was submitted by the Broad Institute on 2012/08/28. The assembly is at the scaffold level, consisting of 81,656 contigs assembled into 2,839 scaffolds. The N50 size is the length such that 50% of the assembled genome lies in blocks of the N50 size or longer. The contig N50 is 61,105 bp, while the scaffold N50 is 21,893,125 bp.
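The N50 definition above can be computed by sorting blocks (contigs or scaffolds) from longest to shortest and accumulating their lengths until at least half of the total assembled length is reached. The sketch below is illustrative only: the function name `n50` and the toy lengths are hypothetical, not ChiLan1.0 data, and tools differ slightly in how they treat the exact 50% boundary.

```python
def n50(lengths):
    """Return the N50 of a collection of block lengths (in bp).

    N50 is the length L such that blocks of length >= L together
    contain at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest block down, accumulating assembled bases.
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid floating-point halves.
        if 2 * running >= total:
            return length


# Toy example: total = 30 bp; 10 + 6 = 16 bp is the first running sum >= 15 bp.
print(n50([2, 3, 4, 5, 6, 10]))  # 6
```

Applied to contig lengths this gives the contig N50; applied to scaffold lengths it gives the scaffold N50, which is why the scaffold figure is far larger.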

Gene annotation

The long-tailed chinchilla (Chinchilla lanigera), also called the Chilean, coastal, or common chinchilla, or the lesser chinchilla, is one of two species of rodents in the genus Chinchilla, the other being Chinchilla chinchilla. Wild populations of C. lanigera occur in Aucó, near Illapel, IV Región, Chile (31°38’S, 71°06’W), in Reserva Nacional Las Chinchillas, and in La Higuera, about 100 km (62 mi) north of Coquimbo (29°33’S, 71°04’W). Chilean chinchillas were reported from Talca (35°30’S), Chile, with a range reaching north to Peru and eastward from the Chilean coastal hills throughout low mountains. By the mid-19th century, Chilean chinchillas were not found south of the Choapa River.

The gene annotation process was carried out using a combination of protein-to-genome alignments, annotation mapping from a suitable reference species, and RNA-seq alignments (where RNA-seq data with appropriate metadata were publicly available). For each candidate gene region, a selection process was applied to choose the most appropriate set of transcripts based on evolutionary distance, experimental evidence for the source data, and the quality of the alignments. Small ncRNAs were obtained using a combination of BLAST and Infernal/RNAfold. Pseudogenes were identified by looking for genes with a large percentage of non-biological introns (introns shorter than 10 bp), genes covered in repeats, or single-exon genes for which evidence of a functional multi-exon paralog was found elsewhere in the genome. lincRNAs were identified from RNA-seq data as transcripts in which no evidence of protein homology or protein domains could be found.
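The three pseudogene criteria described above can be expressed as a simple rule-based check. This is a minimal sketch, not Ensembl's genebuild code: the `GeneModel` fields and the function name are hypothetical, and because the text only says "a large percentage" of short introns and "covered in repeats", the two threshold constants below are illustrative assumptions. Only the 10 bp intron cut-off comes from the text.

```python
from dataclasses import dataclass
from typing import List

SHORT_INTRON_BP = 10             # from the text: non-biological introns are < 10 bp
MAX_SHORT_INTRON_FRACTION = 0.5  # ASSUMPTION: stand-in for "a large percentage"
MAX_REPEAT_COVERAGE = 0.8        # ASSUMPTION: stand-in for "covered in repeats"


@dataclass
class GeneModel:
    """Hypothetical summary of one candidate gene model."""
    intron_lengths: List[int]      # intron lengths in bp (empty for single-exon genes)
    repeat_coverage: float         # fraction of the gene overlapping repeats, 0..1
    has_multi_exon_paralog: bool   # functional multi-exon paralog found elsewhere

    @property
    def exon_count(self) -> int:
        # A linear gene model has one more exon than it has introns.
        return len(self.intron_lengths) + 1


def is_likely_pseudogene(gene: GeneModel) -> bool:
    # Criterion 1: a large share of the introns are too short to be biological.
    if gene.intron_lengths:
        short = sum(1 for n in gene.intron_lengths if n < SHORT_INTRON_BP)
        if short / len(gene.intron_lengths) > MAX_SHORT_INTRON_FRACTION:
            return True
    # Criterion 2: the gene is largely covered by repeats.
    if gene.repeat_coverage >= MAX_REPEAT_COVERAGE:
        return True
    # Criterion 3: single-exon gene with a functional multi-exon paralog elsewhere,
    # which suggests a processed (retrotransposed) copy.
    if gene.exon_count == 1 and gene.has_multi_exon_paralog:
        return True
    return False


print(is_likely_pseudogene(GeneModel([5, 3, 200], 0.1, False)))  # True
print(is_likely_pseudogene(GeneModel([500, 800], 0.1, False)))   # False
```

Each criterion is checked independently, so a gene model is flagged as soon as any one of them applies.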

In accordance with the Fort Lauderdale Agreement, please check the publication status of the genome/assembly before publishing any genome-wide analyses using these data.

More information

General information about this species can be found in Wikipedia.

Statistics

Summary

Assembly: ChiLan1.0, INSDC Assembly GCA_000276665.1, May 2012
Base Pairs: 2,390,868,971
Golden Path Length: 2,390,868,971
Annotation provider: Ensembl
Annotation method: Full genebuild
Genebuild started: Nov 2016
Genebuild released: Jul 2017
Genebuild last updated/patched: Jul 2017
Database version: 111.1

Gene counts

Coding genes: 17,809
Non-coding genes: 11,170
Small non-coding genes: 3,127
Long non-coding genes: 7,065
Misc non-coding genes: 978
Pseudogenes: 282
Gene transcripts: 39,338
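The three non-coding subcategories in the gene counts account exactly for the non-coding total. A quick arithmetic check, using only the figures listed above (the variable names are illustrative):

```python
# Non-coding gene counts for ChiLan1.0 as listed in the statistics above.
non_coding_subcategories = {
    "small_non_coding": 3_127,
    "long_non_coding": 7_065,
    "misc_non_coding": 978,
}
NON_CODING_TOTAL = 11_170

subtotal = sum(non_coding_subcategories.values())
print(subtotal)  # 11170
print(subtotal == NON_CODING_TOTAL)  # True
```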

Other

Genscan gene predictions: 55,839