News for Mouse Ensembl Release 80 (May 2015)

News categories

New web displays and tools

New export options for comparative views

As part of our ongoing upgrade of the export interface, Release 80 includes the following new features:


Sequence export has been added for both orthologues and paralogues. Unaligned gene sequences can be exported in FASTA format, whilst multiple alignments can be exported in all our usual BioPerl formats (including FASTA); both types can be exported as DNA or amino acids.

Note that at the moment you cannot filter the paralogues by species, as that option is not available within the paralogue page itself.

Gene Trees

Previously, individual nodes of the tree could only be exported in FASTA and Newick formats. These links have been replaced with a single link that opens the export interface, so you can choose any of the formats available for the full tree export.

OrthoXML filtering

Exports in orthoXML now honour the current page settings, allowing you to export only the homologues you see.

New styles for BigWig files on karyotype

Last release we added the ability to display your BigWig data on whole chromosomes or the entire karyotype, as line graphs or histograms. In release 80 we have added two new track styles for line graphs:

1 - "raw mean", i.e. mean values scaled relative to one another, instead of relative to the maximum value in that region (the default setting)

2 - "mean with whiskers", which combines the line graph with "whiskers" to display the minimum and maximum values within each bin.

To use either style, go to the track in the "Configure this page" screen, and choose from the style dropdown.
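As an illustration of what the "mean with whiskers" style summarises, the minimal Python sketch below computes the mean, minimum and maximum for each bin of a list of scores. The function name, bin size and sample values are hypothetical, and this is not the code used by the Ensembl website.

```python
from statistics import mean


def summarise_bins(values, bin_size):
    """Split values into consecutive bins and return (mean, min, max) per bin.

    The mean would be drawn as the line graph; min and max are the whiskers.
    """
    bins = [values[i:i + bin_size] for i in range(0, len(values), bin_size)]
    return [(mean(b), min(b), max(b)) for b in bins]


# Hypothetical per-base scores covering two bins of four values each.
scores = [2, 4, 6, 8, 1, 3, 10, 20]
summary = summarise_bins(scores, 4)

for bin_mean, bin_min, bin_max in summary:
    print(f"mean={bin_mean} whiskers=({bin_min}, {bin_max})")
```

Two bins can have similar means but very different whiskers, which is the information this style adds to the plain line graph.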

New userdata track type: long-range interactions

We are very pleased to announce that Ensembl now supports long-range pairwise interaction data, which can be drawn as arcs on Region in Detail. Scores are indicated using a grey-to-black gradient, and labels can be displayed by selecting the appropriate track style from the configuration menu.

Initially we are supporting the two formats developed by WashU for their Epigenomics browser: a simple text-based format which can be pasted into the form or uploaded from your computer, and a richer tabix-indexed format that must be attached via a URL. More information on both formats can be found in our online documentation.

We hope to support more formats in the future, so please let us know which formats you are currently using!

New species, assemblies and genebuilds

Mouse: update to Ensembl-Havana GENCODE gene set

Updated Ensembl-Havana mouse gene set. This gene set is a merge of complete Ensembl gene models and the latest Havana gene annotation. All CCDS genes are included in this gene set.

Vega Mouse annotation updated

Manual annotation of mouse from Havana has been updated and contains the data released in Vega 60.

Vega Zebrafish annotation updated

Manual annotation of zebrafish from Havana has been updated and contains the data released in Vega 59.

New variation data

dbSNP142 import for mouse

The mouse GRCm38 variation database will be updated to dbSNP142.

API and schema changes

start_lost to replace initiator_codon_variant consequence type

We will replace the use of initiator_codon_variant with the more specific start_lost. The difference between the two is largely semantic.

The new term protein_altering_variant will be used for variants within the protein which are not better described by any of its child terms.
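Scripts that filter on consequence terms may need to handle data produced before and after this change. A minimal Python sketch, assuming consequence terms are held as plain strings; the mapping table and function name are hypothetical and not part of the Ensembl API:

```python
# Hypothetical lookup of consequence terms renamed in release 80.
RENAMED_TERMS = {"initiator_codon_variant": "start_lost"}


def normalise_consequences(terms):
    """Return the terms with any retired name replaced by its new name."""
    return [RENAMED_TERMS.get(term, term) for term in terms]


old_style = ["initiator_codon_variant", "missense_variant"]
new_style = normalise_consequences(old_style)
print(new_style)
```

Terms that are not in the mapping pass through unchanged, so the helper can be applied to data from either release.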

API: new method to get the multiple alignment of several homologues

The new method is GeneTree::get_alignment_of_homologues($ref_member)

Other updates


Schema version update

79 -> 80

API: new methods to fetch data from a Gene / Transcript / DnaFrag

To improve the usability of our API, we'll add methods to fetch the data directly from the Core objects, without having to create Members and DnaFrags. Members and DnaFrags will still be used to represent the data on our side, though.

D.rer GRCz10 pairwise alignments and syntenies

  • lastz D.rer vs T.rub (on D.rer)
  • lastz D.rer vs T.nig (on D.rer; used to be on T.nig)
  • lastz D.rer vs G.acu (on D.rer; used to be on G.acu)
  • lastz D.rer vs (on D.rer)
  • lastz D.rer vs O.nil (on D.rer)
  • lastz D.rer vs X.mac (on D.rer)
  • lastz D.rer vs L.ocu (on D.rer)
  • lastz D.rer vs G.mor (on D.rer)
  • lastz D.rer vs A.mex (on D.rer)
  • lastz D.rer vs P.for (on D.rer)
  • lastz D.rer vs (on
  • lastz D.rer vs (on
  • lastz D.rer vs M.mus (on M.mus)
  • lastz D.rer vs X.tro (on D.rer)
  • lastz D.rer vs L.cha (on D.rer)
  • lastz D.rer vs P.mar (on D.rer)
  • lastz D.rer vs C.sav (on D.rer)
  • lastz D.rer vs (on D.rer)

Synteny maps will be generated when both species have their karyotype stored in the database.

R.nor Rnor_v6.0 pairwise alignments and syntenies

  • lastz R.nor vs M.mus (on M.mus) + synteny
  • lastz R.nor vs (on + synteny

D.rer GRCz10 multiple alignments

  • 5-way fish EPO alignments
  • 11-way fish EPO-2X alignments

R.nor Rnor_v6.0 multiple alignments

  • 17-way eutherian EPO alignments
  • 39-way eutherian EPO-2X alignments
  • 23-way amniota MercatorPecan alignments

We will also regenerate the "Age Of Base" human track from the new 17-way EPO MSA.


External database references update

Xrefs update for:

danio_rerio (zebrafish), homo_sapiens (human), rattus_norvegicus (rat), mus_musculus (mouse), gasterosteus_aculeatus (stickleback), latimeria_chalumnae (coelacanth), ciona_savignyi, anas_platyrhynchos (duck), tupaia_belangeri (tree shrew), erinaceus_europaeus (hedgehog), echinops_telfairi (tenrec), ictidomys_tridecemlineatus (squirrel), oryctolagus_cuniculus (rabbit), pelodiscus_sinensis (softshell turtle), ficedula_albicollis (flycatcher), papio_anubis (olive baboon)

Ensembl VM Build

The Ensembl Virtual Machine appliance will be updated to version 80.

patch_79_80_a.sql - schema_version update

Update schema_version in meta table to 80.
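The effect of a schema_version patch can be sketched as follows. This is an illustration only: it uses an in-memory SQLite database as a stand-in for MySQL, a simplified meta table, and an UPDATE statement that is an assumption about what such a patch does rather than the contents of the patch file.

```python
import sqlite3

# In-memory stand-in for an Ensembl database (the real databases are MySQL).
conn = sqlite3.connect(":memory:")

# Simplified meta table; column names follow the key/value layout described above.
conn.execute(
    "CREATE TABLE meta ("
    "meta_id INTEGER PRIMARY KEY, species_id INTEGER, "
    "meta_key TEXT, meta_value TEXT)"
)
conn.execute(
    "INSERT INTO meta (species_id, meta_key, meta_value) "
    "VALUES (NULL, 'schema_version', '79')"
)

# Assumed form of the update: bump the stored schema version to 80.
conn.execute("UPDATE meta SET meta_value = '80' WHERE meta_key = 'schema_version'")

version = conn.execute(
    "SELECT meta_value FROM meta WHERE meta_key = 'schema_version'"
).fetchone()[0]
print(version)
```

Checking this value before running release 80 code against a database is a quick way to confirm that the patches have been applied.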

Stable ID lookup

Stable ID lookup provided for REST services

Includes lookup for RefSeq and CCDS entries

patch_79_80a.sql - schema_version update in production db

Update schema_version in production database to 80.

patch_79_80_a.sql - schema_version update in ontology db

Update schema_version in meta table to 80.


increase length of dbprimary_acc column in xref table


increase length of synonym column in seq_region_synonym table


increase length of value column in genome_statistics table


patch_79_80_c - stable_id changed to varchar

The regulatory_feature.stable_id field was changed from an int to a varchar. API support was implemented to handle this.

Micro Array Mapping

Microarray updates

Microarray mapping was carried out for species with updated gene builds:

  • Rat
  • Zebrafish

Added the missing transcript annotations for:

  • HuEx-1_0-st-v2
  • HuGene-1_0-st-v1
  • HuGene-2_0-st-v1

Added new Human Affymetrix array:

  • HTA-2_0 

Human Segmentation adjacent feature merge

Adjacent segmentation features with the same segmentation classification were merged into a single feature.
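The merge can be pictured with the short Python sketch below. It assumes features are (start, end, classification) tuples with inclusive coordinates, sorted by start on a single sequence region; the function name and classification labels are hypothetical, and this is not the pipeline code used for the release.

```python
def merge_adjacent(features):
    """Merge features that abut end-to-start and share a classification."""
    merged = []
    for start, end, label in features:
        previous = merged[-1] if merged else None
        if previous and previous[2] == label and previous[1] + 1 == start:
            # Extend the previous feature instead of adding a new one.
            merged[-1] = (previous[0], end, label)
        else:
            merged.append((start, end, label))
    return merged


segments = [
    (1, 200, "promoter"),
    (201, 400, "promoter"),   # adjacent, same class: merged with the first
    (401, 600, "enhancer"),   # adjacent, different class: kept separate
    (801, 1000, "enhancer"),  # same class but not adjacent: kept separate
]
result = merge_adjacent(segments)
print(result)
```

Only features that both touch and share a classification are combined, so gaps between features are preserved.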

patch_79_80_b dbfile_registry unique key

A unique key patch was applied to the dbfile_registry table.


A matrix method has been added to BindingMatrix to store the matrix array. This will be used to generate the frequency string.

patch_79_80_a.sql - schema_version update



Mouse: updated cDNA alignments

A new cDNA database will be created for release 80: the latest set of cDNAs for mouse from the European Nucleotide Archive and NCBI RefSeq will be aligned to the current genome using Exonerate.

Mouse: updated RefSeq gene import

The imported RefSeq gene set was updated in the mouse otherfeatures database. Please note that RefSeq annotates gene models on cDNA sequence and not on the reference genome, so translations of RefSeq transcripts made from the reference genome may contain stop codons.

Updated mouse otherfeatures db: New CCDS import

This release of the mouse gene set also includes nn,nnn transcript models as part of an updated version (March 2015) of CCDS


EMBL and Genbank Dumps

EMBL and Genbank dumps for all species.

Ensembl 80 mart databases

  • Ensembl Genes 80
    • Added Protein domains start and end attributes (protein-based coordinate)
    • Renamed "ENCODE region" filter to "ENCODE Pilot Regions" and added a link to the publication
    • Renamed the following UniProt filters and attributes:
      • "UniProt/Swissprot ID" to "UniProt/Swissprot Accession"
      • "UniProt/TrEMBL ID" to "UniProt/TrEMBL Accession"
      • "UniProt Genename ID" to "UniProt Gene Name"
      • "Uniprot Genename Transcript Name" to "Uniprot Transcript Name"
    • Updated rat (Rnor_6.0) and zebrafish (GRCz10) assemblies
    • New Phenotype source filter in the "Phenotype" filter section
    • Renamed "APPRIS principal isoform annotation" filter and attributes to "APPRIS annotation"
  • Ensembl Variation 80
    • New Phenotype source filter in the "General variation" filter section
    • Updated rat (Rnor_6.0) and zebrafish (GRCz10) assemblies
  • Ensembl Regulation 80
  • Vega 60
    • Updated rat (Rnor_6.0) and zebrafish (GRCz10) assemblies
    • Added Protein domains start and end attributes (protein-based coordinate)

External reference projection

Gene ontology (GO) identifiers and gene name projection to all species.

FASTA & GTF dumps

FASTA & GTF dumps for all the species

New Ensembl BioMart documentation

We now have brand new Ensembl BioMart documentation. We have re-organised and updated the existing pages, and added the following new ones:

  • Combining multiple species datasets
  • BiomaRt, Bioconductor R package

  • BioMart perl API

  • BioMart RESTful access (Perl and wget)

You can find the new documentation on the following page:


Phenotype data updates

  • Human phenotype data will be updated from different sources including NHGRI-EBI GWAS, OMIM, ClinVar, UniProt, DDG2P and Decipher.
  • OMIA data for Cow, Dog, Zebrafish, Horse, Cat, Chicken, Macaque, Turkey, Sheep and Chimpanzee
  • RGD data for Rat
  • AnimalQTL for Cow, Horse, Chicken, Pig
  • ZFIN for Zebrafish
  • EuroPhenome, 3i, IMPC, MGP for Mouse


Deprecation of Sanger::Graphics

To improve long-term maintainability of the Perl GD drawing code, in release 79 we moved all necessary functionality from the Sanger::Graphics namespace into EnsEMBL::Draw. All Sanger::Graphics modules have been deprecated as of release 80, and they will be removed in release 82.

This change should only affect developers whose code calls methods directly from Sanger::Graphics instead of inheriting from EnsEMBL::Draw modules.

Note that a few helper modules, including ColourMap, have been moved into EnsEMBL::Draw::Utils.

New web dependency - ensembl-io

Starting with release 80, the Ensembl webcode will have an additional dependency: ensembl-io, our Git repository for file-parsing code. This can be checked out from GitHub the same way as our other repos, and is also included in the "web" group of repos used by ensembl-git-tools, so that the command

git ensembl --clone web

will automatically clone ensembl-io in addition to existing web code.

Initially this dependency will only affect variation data, but the plan is to integrate the new parsers into other areas of the website such as user uploads. Deprecation of the old parser modules will be announced in due course.

Gene Expression Atlas Widget

The Gene Expression Atlas widget has been embedded in Ensembl. You can now view where the gene is expressed anatomically (where exactly in the species) and also which experiment it is associated with.

The code has been added to the widgets plugin.

GlyphSets at risk

The following GlyphSets are unused in the core Ensembl webcode. Third-party integrators should be aware that they are at risk of deletion in a future release. If you have any use for the following GlyphSets, please contact the Ensembl team.

_text, ctcf, fg_wiggle, GlyphSet_feature, histone_modifications, ld2, lsv_variations, missing, P_protdas, P_separator, preliminary, restrict, simple_histogram, tsv_missing, urlfeature, Vrefseqs

Tools (BLAST & BLAT) gap initiation update

Gap initiation and extension penalties have been updated to reflect the correct order, which depends on the matrix (BLOSUM or PAM). The same has been done for BLASTN, where the order depends on the match and mismatch scores.

Download the ncRNA secondary-structure as SVG

You can now download the ncRNA secondary structure view as SVG. A link has been added at the top of the view.

Track label improvements for images

Some tracks in images now appear within sections, grouping common tracks within a category.

Each section is identified by a heading underlined in a certain colour, and each track within that section by the same colour being used on its left-hand side.

Also, some tracks now have labels within the image itself, to allow particularly long labels. These in-image labels can be configured on or off via the configuration panel.

Initially these features are primarily targeted at trackhub support, but they will be increasingly used for other tracks as opportunities are identified.