C. savignyi assembly and gene annotation


The genome of a single Ciona savignyi individual from San Francisco Bay was shotgun-sequenced by the Broad Institute and assembled using Arachne2. The Sidow lab at Stanford used this as the basis for the assembly presented here.

The assembly consists of 374 Reftigs totalling 174 Mb, with a contig N50 of 141 kb and a Reftig N50 of 1,800 kb.
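N50 is a length-weighted summary statistic: sort the sequences from longest to shortest and report the length at which the running total first reaches half of the total assembly size. Applied to contig lengths it gives the contig N50; applied to Reftig lengths it gives the Reftig N50. The minimal Python sketch below illustrates the calculation; the function name `n50` and the toy lengths are illustrative only and are not taken from the C. savignyi assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total length.
    """
    sorted_lengths = sorted(lengths, reverse=True)  # longest first
    total = sum(sorted_lengths)
    running = 0
    for length in sorted_lengths:
        running += length
        if 2 * running >= total:  # running total has reached half the assembly
            return length
    raise ValueError("lengths must be a non-empty collection of positive integers")


# Toy lengths in kb (made up, not C. savignyi data)
print(n50([100, 200, 300, 400, 500]))  # 400
```

Here the total is 1,500 kb; the two longest sequences (500 + 400 = 900 kb) are the first to pass the 750 kb halfway mark, so the N50 is 400 kb.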

Gene annotation

The standard Ensembl mammalian pipeline was modified for annotation of the Ciona savignyi genome, owing to the lack of genomic information from closely related species. Thus, in addition to aligning known Ciona proteins to the sequence (as per the standard pipeline), we aligned Ciona-specific cDNA and EST sequences against the genome, and then used these in conjunction with protein data from other species to build additional gene models.

More information

General information about this species can be found in Wikipedia.



Assembly: CSAV 2.0, Oct 2005
Database version: 79.2
Base pairs: 177,003,750
Golden path length

The golden path length is the length of the reference assembly: the sum of the lengths of all top-level sequences in the seq_region table, omitting any redundant regions such as haplotypes and PARs (pseudoautosomal regions).
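To illustrate this definition only, the Python sketch below sums the lengths of top-level regions while skipping redundant ones. The `SeqRegion` record, its fields and the region names are hypothetical simplifications; they do not mirror the real Ensembl seq_region schema, which stores top-level status and redundancy in its own way.

```python
from dataclasses import dataclass


@dataclass
class SeqRegion:
    # Hypothetical, simplified record; NOT the real Ensembl seq_region schema
    name: str
    length: int        # length in base pairs
    top_level: bool    # part of the top-level coordinate system
    redundant: bool    # duplicates reference sequence (e.g. a haplotype or PAR)


def golden_path_length(regions):
    """Sum the lengths of top-level regions, skipping redundant ones."""
    return sum(r.length for r in regions if r.top_level and not r.redundant)


# Toy data with hypothetical names and lengths
regions = [
    SeqRegion("reftig_1", 1_800_000, top_level=True, redundant=False),
    SeqRegion("reftig_2", 900_000, top_level=True, redundant=False),
    SeqRegion("hap_1", 50_000, top_level=True, redundant=True),        # excluded: redundant
    SeqRegion("contig_7", 141_000, top_level=False, redundant=False),  # excluded: not top-level
]
print(golden_path_length(regions))  # 2700000
```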

Genebuild by: Ensembl
Genebuild method: Full genebuild
Genebuild started: Apr 2006
Genebuild released: Jun 2006
Genebuild last updated/patched: Apr 2013

Gene counts

Coding genes

Genes and/or transcripts that contain an open reading frame (ORF).

Non-coding genes: 340
Small non-coding genes

Small non-coding genes are usually fewer than 200 bases long. They may be transcribed but are not translated. In Ensembl, genes with the following biotypes are classed as small non-coding genes: miRNA, miscRNA, rRNA, scRNA, snlRNA, snoRNA and snRNA, as well as the pseudogenic forms of these biotypes. Most small non-coding genes in Ensembl are annotated automatically by our ncRNA pipeline. tRNAs, by contrast, are annotated separately using tRNAscan and are included as 'simple features' rather than genes, because they are not annotated using aligned sequence evidence.

Misc non-coding genes: 7

Pseudogenes

A pseudogene shares an evolutionary history with a functional protein-coding gene but has been mutated through evolution to contain frameshifts and/or stop codons that disrupt the open reading frame.

Gene transcripts: 20,711

A transcript is the nucleotide sequence resulting from the transcription of genomic DNA into mRNA. One gene can have several transcripts, or splice variants, resulting from alternative splicing of its exons.


FGENESH gene predictions: 13,464
Genefinder gene predictions: 12,480
Genscan gene predictions: 12,655
SNAP gene predictions: 35,571
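The coding-gene and pseudogene definitions above both hinge on an intact open reading frame. The Python sketch below illustrates that idea only; it is not how Ensembl classifies genes. `longest_orf` scans the three forward-strand frames for ATG-to-stop stretches using the standard genetic code, and `looks_disrupted` (a hypothetical helper) flags a coding sequence whose length is not a multiple of three (a possible frameshift) or that contains an in-frame stop codon before its final codon. Real pseudogene annotation compares a locus against a homologous functional protein, so these checks are rough signals at best.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def longest_orf(seq):
    """Return (start, end) of the longest ATG-to-stop ORF on the forward strand.

    `end` is exclusive and includes the stop codon; returns None if no ORF is found.
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens an ORF in this frame
            elif start is not None and codon in STOP_CODONS:
                if best is None or i + 3 - start > best[1] - best[0]:
                    best = (start, i + 3)
                start = None  # look for the next ATG after this stop
    return best


def looks_disrupted(cds):
    """Crude check: length not a multiple of 3, or an in-frame stop before the last codon."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        return True  # possible frameshift
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    return any(codon in STOP_CODONS for codon in codons[:-1])


print(longest_orf("CCATGAAATTTTAGCC"))    # (2, 14)
print(looks_disrupted("ATGAAATTTTAA"))     # False
print(looks_disrupted("ATGAAATAGTTTTAA"))  # True: premature TAG
print(looks_disrupted("ATGAAATTTTA"))      # True: length 11
```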