Amazon molly assembly and gene annotation

Assembly

The Amazon molly (Poecilia formosa) genome sequence was produced in October 2013 by the Aquatic Genome Models Consortium.

The genome is 1 Gb in length, consisting of 3,985 toplevel sequences, all of which are unplaced scaffolds (from 31,058 contigs). The N50 of the contigs of the submitted assembly is 57.47 Kb and the N50 of the scaffolds is 1.574 Mb. The N50 size is the length such that 50% of the assembled genome lies in blocks of the N50 size or longer.
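The N50 definition above can be illustrated with a minimal sketch. The function name `n50` and the toy scaffold lengths are hypothetical and chosen only for illustration; they are not taken from the Amazon molly assembly, and this is not necessarily how the submitted statistics were computed.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    The N50 is the length L such that sequences of length >= L
    together cover at least 50% of the total assembled length.
    """
    lengths = list(lengths)
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating length
    # until at least half of the total has been covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy example (hypothetical scaffold lengths, total = 300).
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))  # 80 + 70 = 150 covers half of 300, so N50 is 70
```

Comparing `2 * running` against `total` keeps the arithmetic in integers and avoids rounding issues when the total length is odd.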

The genome assembly represented here corresponds to GenBank Assembly ID GCA_000485575.1.

Gene annotation

The gene set was built using a mixed approach. Because few species-specific sequences existed, but RNASeq data for Amazon molly were available from Washington University, the final gene set comprises models based on orthologous proteins from the vertebrate division of UniProtKB, the longest translations of some stickleback gene models from Ensembl 73, and models built from RNASeq data.

A total of 8,162 gene models were made exclusively from RNASeq data. The data were also used to add UTRs to gene models. The total gene set contains 23,615 protein-coding genes, with a further 679 ncRNAs and 60 pseudogenes.

More information

General information about this species can be found in Wikipedia.

Statistics

Summary

Assembly: Poecilia_formosa-5.1.2, INSDC Assembly GCA_000485575.1, Oct 2013
Base Pairs: 714,197,265
Golden Path Length: 748,923,461
Annotation provider: Ensembl
Annotation method: Full genebuild
Genebuild started: Nov 2013
Genebuild released: Jul 2014
Genebuild last updated/patched: Aug 2014
Database version: 90.512

Gene counts

Coding genes: 23,615
Non coding genes: 679
Small non coding genes: 665
Misc non coding genes: 14
Pseudogenes: 60
Gene transcripts: 31,637

Other

Genscan gene predictions: 45,660
