Platyfish assembly and gene annotation

Assembly

Platyfish are members of the Poeciliidae family. The genus Xiphophorus comprises 27 described species of platyfish and swordtails, found from northeastern Mexico (Coahuila) as far south as Honduras, a span of 2200 km. The Xiphophorus maculatus fish used for genome sequencing was a female of the 104th inbred generation from the X. maculatus/Jp 163 A line, which is maintained at The Xiphophorus Genetic Stock Center.

This release features the first preliminary assembly of the X. maculatus/Jp 163 A genome, XipMac4.4.2, provided by The Genome Institute, Washington University School of Medicine. This whole genome shotgun assembly was produced from two independent assemblies, each built with all sequence data (~19.6X total sequence coverage), one using the Newbler algorithm and one using PCAP. The two were then merged using graph accordance. The genome assembly displayed here comprises 20,640 scaffolds with an N50 of 1.3 Mb. These scaffolds have been assembled from a set of 67,070 contigs with an N50 of 22.3 kb. The final ungapped sequence length is 653 Mb.

The genome assembly represented here corresponds to GenBank Assembly ID GCA_000241075.1.
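The scaffold and contig N50 values quoted above summarise how contiguous the assembly is: N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembled length. The following is a minimal sketch of that calculation, not the code used by the assembly provider. The function name `n50` and the toy lengths are illustrative assumptions, not real platyfish data.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    of the sequence at which the running total first reaches half of the
    summed length of all sequences.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare doubled running total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Hypothetical scaffold lengths, chosen only to make the arithmetic easy.
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))
```

With the toy lengths above the total is 300, and the two longest sequences (80 + 70) already reach 150, so the reported N50 is 70.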

Gene annotation

XipMac4.4.2 was annotated using the Ensembl genebuild pipeline. Gene models are based on:

  • Genewise alignments of UniProt protein sequences from platyfish
  • Models built from platyfish RNASeq data using our in-house RNASeq pipeline
  • Genewise alignments of UniProt protein sequences from other fish and vertebrate species
  • Exonerate alignments of Ensembl Stickleback and Zebrafish proteins from Ensembl release 65

Protein-coding models were extended into their untranslated regions using RNASeq models. In addition to the coding transcript models, non-coding RNAs and pseudogenes were annotated.

More information

General information about this species can be found in Wikipedia.

Statistics

Summary

Assembly: Xipmac4.4.2, INSDC Assembly GCA_000241075.1, Jan 2012
Database version: 76.1
Base Pairs: 652,815,383
Golden Path Length: 729,664,433
Genebuild by: Ensembl
Genebuild method: Full genebuild
Genebuild started: Jan 2012
Genebuild released: Sep 2012
Genebuild last updated/patched: Apr 2013
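
The Base Pairs figure in the summary matches the 653 Mb ungapped sequence length given in the assembly description, while the Golden Path Length is larger. The sketch below works out the difference, assuming (this is an interpretation, not something stated on this page) that the Golden Path Length is the total scaffold length including gaps, so that the difference is gap sequence.

```python
# Figures copied from the summary statistics on this page.
base_pairs = 652_815_383          # "Base Pairs" (ungapped sequence)
golden_path_length = 729_664_433  # "Golden Path Length"

# Assumption: the difference between the two figures is gap sequence
# inside scaffolds. This is an interpretation, not stated on the page.
gap_bases = golden_path_length - base_pairs
gap_fraction = gap_bases / golden_path_length

print(f"gap bases: {gap_bases:,}")
print(f"gap fraction: {gap_fraction:.1%}")
```

Under that assumption, 76,849,050 positions, or about 10.5% of the golden path, would be gaps.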

Gene counts

Coding genes

Genes and/or transcripts that contain an open reading frame (ORF).

20,379

Small non coding genes

Small non coding genes are usually fewer than 200 bases long. They may be transcribed but are not translated. In Ensembl, genes with the following biotypes are classed as small non coding genes: miRNA, miscRNA, rRNA, scRNA, snlRNA, snoRNA, snRNA and tRNA, and also the pseudogenic forms of these biotypes. The majority of the small non coding genes in Ensembl are annotated automatically by our ncRNA pipeline.

372

Pseudogenes

A pseudogene shares an evolutionary history with a functional protein-coding gene, but it has been mutated through evolution to contain frameshifts and/or stop codons that disrupt the open reading frame.

28

Gene transcripts

Nucleotide sequence resulting from the transcription of the genomic DNA to mRNA. One gene can have different transcripts or splice variants resulting from the alternative splicing of different exons in genes.

20,854

Other

Genscan gene predictions: 71,805