The Nile tilapia (Oreochromis niloticus) genome was produced by the Broad Institute. Illumina technology was used to produce this high-quality draft. The whole-genome shotgun data were assembled with ALLPATHS-LG. The assembly comprises 77,578 contigs with an N50 of 29.5 kb and 13,517 scaffolds with an N50 of 2.8 Mb.
The genome assembly represented here corresponds to GenBank Assembly ID GCA_000188235.1.
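The N50 values quoted for the contigs and scaffolds summarise assembly contiguity: N50 is the length L such that sequences of length L or longer together cover at least half of the total assembled bases. The following is a minimal sketch of that calculation, not the code used to produce the figures above; the function name `n50` and the toy lengths are illustrative assumptions.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bases)."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    # Walk from the longest sequence downwards, accumulating bases.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which we reach half the total is the N50.
        if running >= half_total:
            return length


# Toy example (hypothetical lengths, not tilapia data):
# total = 32, half = 16, and the two longest sequences (8 + 8) reach it.
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))  # 8
```

Applied to real data, the input would be the list of contig or scaffold lengths taken from the assembly FASTA or its index.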
The gene set was built using a mixed approach. First, the Ensembl pipeline was used to generate 195,657 models from orthologous vertebrate proteins in UniProtKB with a protein existence level of 1 or 2. Then, because species-specific sequences were lacking and RNA-Seq data were available for tilapia, we used 700M paired-end reads sequenced by the Broad Institute. The RNA-Seq data cover 11 tissue types: blood, brain, embryo, eye, heart, kidney, liver, muscle, ovary, skin and testis. We pooled the tissues to avoid creating too many fragmented models. Using the RNA-Seq pipeline, we created 40,899 models from the pooled set. By combining the orthologous set, the RNA-Seq set and the output of our ncRNA pipeline, we built the final gene set: 21,437 protein-coding gene models, 22 pseudogenes, 3 retrotransposed genes and 821 non-coding RNAs.
RNA-Seq data set
In addition to the main set, we predicted gene models for each tissue type using the RNA-Seq pipeline. We ran BLASTp with these models against UniProt proteins of protein existence level 1 or 2 to confirm the open reading frame (ORF). The best BLAST hit is displayed as transcript supporting evidence.
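As an illustration of the "best BLAST hit" step, the sketch below picks one hit per query from BLAST tabular output (`-outfmt 6`, whose default columns end with `evalue` and `bitscore`). It is not the Ensembl pipeline code: the text does not say how "best" is defined, so ranking by highest bit score is an assumption, and the function name `best_hits` and the sample identifiers are hypothetical.

```python
import csv
import io


def best_hits(tabular_text):
    """Map each query ID to its (subject ID, e-value, bit score) best hit.

    Expects BLAST tabular output with the 12 default columns:
    qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore
    """
    best = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        # Skip blank lines and comment lines (present with -outfmt 7).
        if not row or row[0].startswith("#"):
            continue
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        current = best.get(query)
        # Assumption: "best" means the highest bit score for that query.
        if current is None or bitscore > current[2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Hypothetical hits for two tissue-specific models.
rows = [
    ["model_1", "PROT_A", "90.0", "100", "10", "0", "1", "100", "1", "100", "1e-30", "120.0"],
    ["model_1", "PROT_B", "99.0", "100", "1", "0", "1", "100", "1", "100", "1e-50", "200.0"],
    ["model_2", "PROT_C", "80.0", "50", "10", "0", "1", "50", "1", "50", "1e-5", "45.5"],
]
sample = "\n".join("\t".join(r) for r in rows)
for query, (subject, evalue, bitscore) in best_hits(sample).items():
    print(query, subject, evalue, bitscore)
```

Ranking by lowest e-value instead would only require changing the comparison; the two criteria usually agree for hits against the same database.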
The tissue-specific sets of transcript models built using our RNA-Seq pipeline are as follows:
|Tissue||Number of gene models|
General information about this species can be found in Wikipedia.
|Assembly||Orenil1.0, INSDC Assembly GCA_000188235.1, Jan 2011|
|Golden Path Length||927,383,394 bp|
|Annotation method||Mixed strategy build|
|Genebuild started||May 2011|
|Genebuild released||Mar 2012|
|Genebuild last updated/patched||Oct 2016|
|Non coding genes||5,626|
|Small non coding genes||808|
|Long non coding genes||4,805|
|Misc non coding genes||13|
|Genscan gene predictions||51,668|