Platypus assembly and gene annotation


The platypus (Ornithorhynchus anatinus) genome of a female nicknamed "Glennie" (collected at the Upper Barnard River on Glen Rock Station, New South Wales) was sequenced to a total of 6x whole-genome coverage. Our sequencing strategy combined whole-genome shotgun plasmid, fosmid and BAC end sequences. The combined sequence reads were assembled using the PCAP software (Genome Res. 13(9):2164-70, 2003). This draft sequence assembly, submitted to GenBank, is referred to as Ornithorhynchus_anatinus-5.0. The database now contains the longer-range mapping of the sequence onto Ultracontigs and Chromosomes. Although some of the Supercontigs are mapped to chromosomes, these represent only 21% of the platypus DNA, so we have not emphasised a chromosomal view of platypus for the current release. Future improvements to the platypus draft sequence assembly will depend on the availability of funding and on improvements to existing assembler software. Funding for the sequencing of the platypus genome was provided by the National Human Genome Research Institute (NHGRI), National Institutes of Health (NIH).

The genome assembly represented here corresponds to GCF_000002275.2.

Gene annotation

The gene set for platypus was built using a modified version of the standard Ensembl genebuild pipeline, using available cDNA evidence to add UTRs and improve the protein-based gene models. However, this initial gene set was limited by the lack of species-specific evidence. The gene models were assessed by generating sets of potential orthologs to genes from other mammalian species and chicken. Potentially missing and partial gene predictions were identified by examining the orthologs, and exonerate was used to align orthologous human and chicken peptides in order to build new gene models. We have now extended the initial gene set using recently released cDNA data from 454 sequencing, plus additional annotation from the Oxford Functional Genomics group. These data have enabled us both to refine existing models and to add further transcripts.

More information

General information about this species can be found in Wikipedia.



Assembly: OANA5, INSDC Assembly GCF_000002275.2, Dec 2005
Database version: 80.1
Base Pairs: 1,917,748,604
Golden Path Length

The golden path is the length of the reference assembly. It is the sum of the lengths of all top-level sequences in the seq_region table, omitting any redundant regions such as haplotypes and PARs (pseudoautosomal regions).

Genebuild by: Ensembl
Genebuild method: Full genebuild
Genebuild started: Jan 2007
Genebuild released: Aug 2007
Genebuild last updated/patched: Aug 2012
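
As a rough illustration of the golden path definition, the sketch below sums the lengths of top-level sequences while leaving out redundant sequence. It is a minimal sketch under stated assumptions, not Ensembl's implementation. The `SeqRegion` class, its `is_redundant` flag (for alternate haplotypes) and its `par_spans` field (for pseudoautosomal spans already counted on another chromosome) are hypothetical stand-ins for what the seq_region table and related tables record.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SeqRegion:
    """Hypothetical, simplified stand-in for one top-level seq_region."""
    name: str
    length: int
    # True for redundant sequences such as alternate haplotypes (assumed flag).
    is_redundant: bool = False
    # 1-based inclusive spans already represented elsewhere, e.g. PARs (assumed field).
    par_spans: List[Tuple[int, int]] = field(default_factory=list)


def golden_path_length(regions: List[SeqRegion]) -> int:
    """Sum top-level lengths, skipping redundant regions and PAR spans."""
    total = 0
    for region in regions:
        if region.is_redundant:
            continue  # a haplotype duplicates reference sequence, so skip it entirely
        par_bp = sum(end - start + 1 for start, end in region.par_spans)
        total += region.length - par_bp
    return total


# Hypothetical toy assembly, not real platypus data.
regions = [
    SeqRegion("chrA", 1000),
    SeqRegion("chrB", 500, par_spans=[(1, 100)]),
    SeqRegion("alt_hap1", 300, is_redundant=True),
]
print(golden_path_length(regions))  # 1400
```

In this toy example the 100 bp PAR span on chrB and the 300 bp alternate haplotype are excluded. The result is therefore 1,000 + 400 = 1,400 bp, rather than the 1,800 bp of raw top-level sequence.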

Gene counts

Coding genes

Genes and/or transcripts that contain an open reading frame (ORF).

Non-coding genes: 3,871
Small non-coding genes

Small non-coding genes are usually fewer than 200 bases long. They may be transcribed but are not translated. In Ensembl, genes with the following biotypes are classed as small non-coding genes: miRNA, miscRNA, rRNA, scRNA, snlRNA, snoRNA, snRNA, and also the pseudogenic form of these biotypes. Most small non-coding genes in Ensembl are annotated automatically by our ncRNA pipeline. Note that tRNAs are annotated separately using tRNAscan. tRNAs are included as 'simple features', not genes, because they are not annotated using aligned sequence evidence.

Misc non-coding genes: 27

Pseudogenes

A pseudogene shares an evolutionary history with a functional protein-coding gene, but it has been mutated through evolution to contain frameshifts and/or stop codons that disrupt the open reading frame.

Gene transcripts: 28,002

A transcript is the nucleotide sequence resulting from the transcription of genomic DNA into mRNA. One gene can have several transcripts, or splice variants, produced by alternative splicing of its exons.


Genscan gene predictions: 133,723
Short Variants: 1,487,771
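
The small non-coding classification described above is essentially a biotype lookup. The following is a minimal sketch that assumes genes are given as (gene ID, biotype) pairs. It also assumes that the pseudogenic form of a biotype is written with a `_pseudogene` suffix, which is an illustrative convention and not a confirmed Ensembl naming rule. The helper names are hypothetical. tRNAs are deliberately left out of the set because they are stored as simple features rather than genes.

```python
from typing import Iterable, Tuple

# Biotypes listed in the text as small non-coding (tRNA excluded on purpose).
SMALL_NC_BIOTYPES = {"miRNA", "miscRNA", "rRNA", "scRNA", "snlRNA", "snoRNA", "snRNA"}

# Assumed suffix marking the pseudogenic form of a biotype (hypothetical convention).
PSEUDOGENE_SUFFIX = "_pseudogene"


def is_small_noncoding(biotype: str) -> bool:
    """Return True if the biotype, or the base of its pseudogenic form, is small non-coding."""
    if biotype.endswith(PSEUDOGENE_SUFFIX):
        biotype = biotype[: -len(PSEUDOGENE_SUFFIX)]
    return biotype in SMALL_NC_BIOTYPES


def count_small_noncoding(genes: Iterable[Tuple[str, str]]) -> int:
    """Count (gene_id, biotype) pairs whose biotype is classed as small non-coding."""
    return sum(1 for _, biotype in genes if is_small_noncoding(biotype))


# Hypothetical example records, not real platypus annotation.
example_genes = [
    ("gene1", "snoRNA"),
    ("gene2", "protein_coding"),
    ("gene3", "rRNA_pseudogene"),
    ("gene4", "tRNA"),
]
print(count_small_noncoding(example_genes))  # 2
```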
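
The coding-gene and pseudogene definitions both depend on whether an open reading frame is intact. The sketch below uses a hypothetical helper, `orf_status`, that applies simplified rules to a single spliced coding sequence. A length that is not a multiple of 3 is treated as a frameshift. An in-frame stop codon before the final codon is treated as a premature stop. Real pseudogene annotation compares a candidate against its functional parent gene, so this only illustrates the definition and is not a pseudogene caller.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def orf_status(cds: str) -> str:
    """Classify a spliced coding sequence with simplified ORF rules (hypothetical helper).

    Returns:
      "frameshift"      length is not a multiple of 3
      "premature_stop"  an in-frame stop codon appears before the last codon
      "no_orf"          does not start with ATG or does not end with a stop codon
      "intact"          ATG ... stop with no internal in-frame stop
    """
    seq = cds.upper()
    if len(seq) % 3 != 0:
        return "frameshift"
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        return "premature_stop"
    if not codons or codons[0] != "ATG" or codons[-1] not in STOP_CODONS:
        return "no_orf"
    return "intact"


for example in ["ATGGCCTAA", "ATGTAAGCCTAA", "ATGGCTAA"]:
    print(example, orf_status(example))
```

The checks run in a fixed order, so a sequence with both a frameshift and an internal stop is reported as a frameshift. A fuller approach would report every defect it finds.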