Platypus assembly and gene annotation

Assembly

The platypus (Ornithorhynchus anatinus) genome of a female nicknamed "Glennie" (collected at the Upper Barnard River on Glen Rock Station, New South Wales) was sequenced to a total of 6x whole genome coverage. The sequencing strategy combined whole genome shotgun plasmid, fosmid and BAC end sequences. The combined sequence reads were assembled using the PCAP software (Genome Res. 13(9):2164-70, 2003). The draft sequence assembly submitted to GenBank is referred to as Ornithorhynchus_anatinus-5.0.

The database now contains the longer range mapping of the sequence onto Ultracontigs and Chromosomes. Although some of the Supercontigs are mapped to chromosomes, these represent only 21% of the platypus DNA, so we have not emphasised a chromosomal view of platypus for the current release. Future improvements to the platypus draft sequence assembly will depend on the availability of funding and on improvements to existing assembler software.

Funding for the sequencing of the platypus genome was provided by the National Human Genome Research Institute (NHGRI), National Institutes of Health (NIH).

The genome assembly represented here corresponds to GCF_000002275.2.

Gene annotation

The gene set for platypus was built using a modified version of the standard Ensembl genebuild pipeline, using available cDNA evidence to add UTRs and improve the protein-based gene models. However, this initial gene set was limited by the lack of species-specific evidence. The gene models were assessed by generating sets of potential orthologs to genes from other mammalian species and chicken. Potentially missing predictions and partial gene predictions were identified by examining the orthologs, and exonerate was used to align orthologous human and chicken peptides in order to build new gene models. We have now extended the initial gene set using recently released cDNA data from 454 sequencing, plus additional annotation from the Oxford Functional Genomics group. These data have enabled us both to clarify existing models and to add further transcripts.

More information

General information about this species can be found in Wikipedia.

Statistics

Summary

Assembly: OANA5, INSDC Assembly GCF_000002275.2, Dec 2005
Database version: 75.1
Base Pairs: 1,917,748,604
Golden Path Length: 2,073,148,626
Genebuild by: Ensembl
Genebuild method: Full genebuild
Genebuild started: Jan 2007
Genebuild released: Aug 2007
Genebuild last updated/patched: Aug 2012
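
The two length figures in the Summary differ because, as Ensembl generally reports them, Base Pairs counts sequenced bases while Golden Path Length also spans the gaps in the assembly. The following is a minimal Python sketch, assuming that interpretation holds for this release (the variable names are illustrative and not part of any Ensembl API), that estimates how much of the golden path is gap:

```python
# Figures copied from the Summary above.
base_pairs = 1_917_748_604          # sequenced bases
golden_path_length = 2_073_148_626  # assembly length, assumed to include gaps

# Difference between the two is taken to be gap sequence (an assumption).
gap_bases = golden_path_length - base_pairs
gap_fraction = gap_bases / golden_path_length

print(f"Gap bases: {gap_bases:,}")
print(f"Gap fraction of golden path: {gap_fraction:.1%}")
```

Under this assumption, about 155 million positions, or roughly 7.5% of the golden path, are gaps rather than sequenced bases.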

Gene counts

Coding genes

Genes and/or transcripts that contain an open reading frame (ORF).

21,698

Short non coding genes

Short non coding genes are usually fewer than 200 bases long. They may be transcribed but are not translated. In Ensembl, genes with the following biotypes are classed as short non coding genes: miRNA, miscRNA, rRNA, tRNA, ncRNA, scRNA, snlRNA, snoRNA, snRNA, and also the pseudogenic forms of these biotypes. The majority of the short non coding genes in Ensembl are annotated automatically by our ncRNA pipeline.

3,871

Pseudogenes

A pseudogene shares an evolutionary history with a functional protein-coding gene, but it has been mutated through evolution to contain frameshift and/or stop codon(s) that disrupt the open reading frame.

547

Gene transcripts

Nucleotide sequence resulting from the transcription of genomic DNA to mRNA. One gene can have several transcripts, or splice variants, resulting from the alternative splicing of its exons.

28,002
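
The gene counts can be combined into a rough transcripts-per-gene figure. The sketch below assumes that the transcript total covers coding genes, short non coding genes and pseudogenes together, which this page does not state explicitly; the dictionary keys are illustrative labels only.

```python
# Counts copied from the Gene counts section.
gene_counts = {
    "coding": 21_698,
    "short_non_coding": 3_871,
    "pseudogene": 547,
}
transcript_count = 28_002

# Assumption: every transcript belongs to one of the three categories above.
total_genes = sum(gene_counts.values())
transcripts_per_gene = transcript_count / total_genes

print(f"Total genes: {total_genes:,}")
print(f"Mean transcripts per gene: {transcripts_per_gene:.2f}")
```

A mean close to one transcript per gene is consistent with the annotation text, which notes that the initial gene set was limited by the lack of species-specific evidence.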

Other

Genscan gene predictions: 133,723
Short Variants: 1,487,771
