Ensembl News for Release 78 (December 2014)


News categories

New web displays and tools

Updated export for homologues and trees (Mouse)

Phase 3 of our export update includes the remaining comparative genomics views, as follows.

Orthologues and Paralogues

Orthologues and paralogues can now be exported, in OrthoXML format only, using the blue download button above the summary table.

In addition, homologue alignments - both cDNA and protein - are now available in the following formats:

  • Mega
  • MSF
  • Nexus
  • OrthoXML
  • Pfam
  • Phylip
  • PSI
  • Stockholm

To find this export, go to an Orthologue or Paralogue page, click on 'Alignment (protein)' or 'Alignment (cDNA)' in the table and then click on the blue button labelled 'Export alignment'.

Gene trees

The pages 'Gene tree (text)' and 'Gene tree (alignment)' have been removed and replaced with a single export interface. Click on the download icon in the blue bar above the gene tree image to open the export panel.

Available formats:

  • Mega
  • MSF
  • NHX
  • Newick
  • Nexus
  • OrthoXML
  • PhyloXML
  • Pfam
  • Phylip
  • PSI
  • Stockholm
  • Text tree
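
As an illustration of what can be done with an exported tree, the sketch below lists the leaf labels in a Newick string. The example tree and the function name `newick_leaves` are made up for illustration; real exports contain Ensembl identifiers, and this simple pattern does not handle quoted labels or NHX annotations.

```python
import re
from typing import List

def newick_leaves(newick: str) -> List[str]:
    """Return leaf labels from a plain Newick string (no quoted labels, no NHX tags)."""
    # A leaf label directly follows '(' or ',' and runs up to ':', ',', ')' or ';'.
    return re.findall(r"[(,]\s*([^\s():,;]+)", newick)

# Hypothetical three-leaf tree with branch lengths.
example = "((geneA:0.1,geneB:0.2):0.05,geneC:0.3);"
print(newick_leaves(example))
```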

Gene gain/loss trees

This new option allows you to export gene gain/loss trees in PhyloXML format.

Protein families

Another new export option is the ability to export all the protein alignments for a family. From the Gene Family page, click on a family stable ID and on the following page you will see a blue 'Export family alignment' button above the table of proteins.

Note: this function is only available for families with two or more genes.

Changes to legacy export interface

The PhyloXML and OrthoXML options have been removed from the list on the 'Export Data' popup panel, as they are no longer supported. Please export these formats from the relevant pages.

RNASeq track display (Mouse)

RNASeq tracks can now be displayed as either a "wiggle track" summary (the default) or as individual features; the summary version uses a BigWig file created from the BAM file. This speeds up the display and allows us to show this information over larger regions than is possible using the raw data.
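
The idea behind a summary track can be sketched as follows: per-base coverage is reduced to one value per fixed-width bin, so a large region needs far fewer points to draw. This is only an illustration in plain Python. It does not read BAM or BigWig files, and the function name `summarise_coverage` and the bin width are assumptions, not Ensembl's implementation.

```python
from typing import List

def summarise_coverage(coverage: List[int], bin_size: int) -> List[float]:
    """Average per-base coverage over consecutive bins of bin_size bases."""
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    means = []
    for start in range(0, len(coverage), bin_size):
        window = coverage[start:start + bin_size]  # last bin may be shorter
        means.append(sum(window) / len(window))
    return means

# Hypothetical per-base read depth over a 10 bp region.
per_base = [0, 2, 4, 4, 6, 6, 2, 0, 0, 1]
print(summarise_coverage(per_base, 5))
```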

Improved RTF export (Mouse)

We have now integrated many of the options from the "Configure This Page" interface, e.g. variation filtering, into the RTF export so that any changes you have made to the page will be reflected in your RTF download. You can also change these options in the export interface just before you download, but these changes will not be copied back to the page itself.

New species, assemblies and genebuilds

Vega Mouse annotation updated (Mouse)

Manual annotation of mouse from Havana has been updated and contains the data released in Vega 58.

API and schema changes

Variation API changes (Mouse)

  • Add a new object "Source" handling the source data used in several Variation API objects.

Adjust Unmapped_reason table (Mouse)

Change unmapped_reason_id from smallint to int to match Regulation requirements

Other updates

Update to Ensembl-Havana mouse merge (Mouse)

Updated Ensembl-Havana mouse gene set. This gene set is a merge of complete Ensembl gene models and the latest Havana gene annotation. All CCDS genes are included in this gene set.

The mouse GRCm38.p3 gene annotation is also included:

The patches for GRCm38.p3 were annotated using a combination of annotation projected from the primary assembly and annotation derived from cDNA and protein alignment evidence. Annotation of the patches is stored in the core database.

Mouse: assembly updated to GRCm38.p3 (Mouse)

The mouse genome assembly was updated to GRCm38.p3 and the assembly information in all mouse databases has been altered accordingly. This minor assembly update contains 17 assembly patches. The DNA sequence for the primary assembly (chromosomes, unlocalized scaffolds and unplaced scaffolds) remains unchanged.

Updated mouse otherfeatures db: New CCDS import (Mouse)

This release of the mouse gene set also includes nn,nnn transcript models as part of an updated version (Month 2014) of CCDS

Mouse: GRCm38.p3 Karyotype Bands (Mouse)

Karyotype bands were updated in regions overlapping patches

Mouse: updated cDNA alignments (Mouse)

A new cDNA database was created for release 78: the latest set of cDNAs for mouse (as of Month 2014) from the European Nucleotide Archive and NCBI RefSeq (release nn) were aligned to the current genome using Exonerate.

Ensembl 78 mart databases (Mouse)

  • Ensembl Genes 78
    • Updated mouse assembly to GRCm38.p3
    • Added "Transcript length" in the Features, Structures and Sequences pages of the attribute section
    • Removed DN/DS attributes for species that don't have the data
    • Added "Transcript Support Level (TSL)" to the attribute section
    • Added "GENCODE basic annotation" to the attribute section
    • Added "APPRIS principal isoform annotation" to the attribute section
    • Added "Transcript Type" to the filter section
    • Added QTL data for Horse
    • Renamed "Gene biotype" and "Transcript biotype" to "Gene type" and "Transcript type" in the attribute section
  • Ensembl Variation 78
    • Updated mouse assembly to GRCm38.p3
    • Added "Chromosome position end (bp)" in filter and attribute sections (Short Variation datasets only)
    • Added band filter to all the structural variation datasets
    • Added marker filter to all the structural variation datasets
    • Added default attributes for all the structural variation datasets
    • Renamed the following attributes
      • "1000 genomes global MAF (ALL)" to "1000 genomes global Minor Allele Frequency (ALL)"
      • "1000 genomes global MAC (ALL)" to "1000 genomes global Minor Allele Count (ALL)"
      • "Position on chromosome (bp)" to "Chromosome position start (bp)"
      • "sequence region start (bp)" to "Chromosome position start (bp)" (structural variation datasets only)
      • "sequence region end (bp)" to "Chromosome position end (bp)" (structural variation datasets only)
  • Ensembl Regulation 78
    • Updated mouse assembly to GRCm38.p3
  • Vega 58
    • Updated mouse assembly to GRCm38.p3
    • Added "Transcript Type" to the filter section

Merge species: updated RefSeq gene import (Mouse)

RefSeq GFF3 annotation was added to the otherfeatures databases. Please note that RefSeq annotates gene models on cDNA sequence and not on the reference genome, so translations of RefSeq transcripts made from the reference genome may contain stop codons. This release also includes Immunoglobulin genes, tRNAscan-SE source data and MT genes.

Micro-array mapping (Mouse)

All species with a genome assembly or transcript update had the appropriate alignments and xrefs redone.

This includes a correction to the Human array xrefs, where some AFFY_ST array xrefs were missing and other array formats had release 76 xrefs.

EMBL and Genbank Dumps (Mouse)

EMBL and Genbank dumps for all species.

External reference projection (Mouse)

Gene ontology (GO) identifiers and gene name projection to all species.

FASTA & GTF dumps (Mouse)

FASTA & GTF dumps for all the species

External database references update (Mouse)

Xrefs update for:

human, mouse, rat, cod, turkey, cat, marmoset, guinea pig, platyfish, tetraodon, orangutan, microbat, nile tilapia, gibbon, ferret, cave fish, spotted gar.

Stable ID lookup (Mouse)

Stable ID lookup provided for REST services

Includes lookup for RefSeq and CCDS entries
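
A minimal sketch of a stable ID lookup through the REST service is shown below. It assumes the public server at https://rest.ensembl.org and its `lookup/id` endpoint; the example identifier and the helper names `lookup_url` and `lookup` are illustrative, and whether a given RefSeq or CCDS identifier resolves depends on the server.

```python
import json
import urllib.request
from urllib.parse import quote

REST_SERVER = "https://rest.ensembl.org"  # assumption: public REST server

def lookup_url(stable_id: str) -> str:
    """Build the lookup URL for one stable ID, asking for a JSON response."""
    return f"{REST_SERVER}/lookup/id/{quote(stable_id, safe='')}?content-type=application/json"

def lookup(stable_id: str) -> dict:
    """Fetch the lookup record as a dict (requires network access)."""
    with urllib.request.urlopen(lookup_url(stable_id)) as response:
        return json.load(response)

# Only the URL is built here; call lookup(...) to query the server.
print(lookup_url("ENSG00000157764"))
```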

patch_77_78_b.sql - source column size increase (Mouse)

New RefSeq data requires a larger source column in the gene and transcript tables.

patch_77_78_a.sql - schema_version update in ontology db (Mouse)

Update schema_version in meta table to 78.

patch_77_78a.sql - schema_version update in production db (Mouse)

Update schema_version in production database to 78.

LRG Import (Mouse)

Importing the latest version of Locus Reference Genomic dataset

Database schema change (Mouse)

  • Add a column "copy_number" in the table "structural_variation" to store the number of copies for the CNV, at the supporting evidence level.
  • Delete the table "study_variation"
  • Update the index "type_val_idx" in the "attrib" table by extending the indexed size for the "value" column (currently limited to 40 characters).

REST support for multiple sequences (Mouse)

The sequence endpoint should accept requests for multiple sequences in a POST request.
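
A request for several sequences at once might be built as in the sketch below. It assumes the public server at https://rest.ensembl.org and a `sequence/id` endpoint that takes a JSON body with an `ids` list; the identifiers and the function name `build_sequence_request` are illustrative.

```python
import json
import urllib.request
from typing import List

REST_SERVER = "https://rest.ensembl.org"  # assumption: public REST server

def build_sequence_request(ids: List[str]) -> urllib.request.Request:
    """Prepare a POST request asking for the sequences of several stable IDs."""
    body = json.dumps({"ids": ids}).encode("utf-8")
    return urllib.request.Request(
        f"{REST_SERVER}/sequence/id",
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )

request = build_sequence_request(["ENSG00000157764", "ENSG00000248378"])
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To send it (requires network access): urllib.request.urlopen(request)
```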

ProteinTrees and homologies (Mouse)

GeneTrees (protein-coding) with new/updated genebuilds and assemblies

 -- all-vs-all blastp (ncbi-blast-2.2.27+)
 -- Clustering using hcluster_sg
 -- Multiple sequence alignments using MCoffee (Version_9.03.r1318) or Mafft (mafft-7.017)
 -- Phylogenetic reconstruction using TreeBeST
 -- Homology inference
 -- Pairwise gene-based dN/dS scores for high coverage species pairs only (both on orthologues and paralogues) (codeml/PAML v4.3)
 -- GeneTree stable ID mapping
 -- Per family gene dynamics using CAFE (v2.2)

Protein Families (Mouse)

Updated MCL families including all Ensembl transcript isoforms (including human non-reference haplotypes) and newest UniProt Metazoa.

 -- Getting distances by NCBI BlastP (v.2.2.28+)
 -- Clustering by MCL (v.12-135)
 -- Multiple Sequence Alignments with MAFFT (v.7.113)
 -- Family stable ID mapping

ncRNAtrees and homologies (Mouse)

 -- Classification based on Rfam models (v11.0)
 -- Multiple sequence alignments with Infernal
 -- Phylogenetic reconstruction using RAxML
 -- Phylogenetic reconstruction using FastTree2 and RAxML-Light for very big families
 -- Additional multiple sequence alignments with Prank (w/ genomic flanks)
 -- Additional phylogenetic reconstruction using PhyML and NJ
 -- Phylogenetic tree merging using TreeBeST
 -- Per family gene dynamics using CAFE
 -- Homology inference
 -- Secondary structure plots

Compara dumps (Mouse)

EMF / Fasta / OrthoXML / PhyloXML dumps for ProteinTrees
EMF / Fasta / OrthoXML / PhyloXML dumps for ncRNAtrees
PhyloXML dumps for CAFE ProteinTrees
PhyloXML dumps for CAFE ncRNAtrees

Removal of obsolete tables from ensembl_website (Mouse)

The following tables have been dropped from the ensembl_website database as they are no longer in use - their contents were moved to either ensembl_archive or ensembl_production many releases ago:

  • ens_release
  • item_species
  • news_category
  • news_item
  • release_species
  • species

In addition, table help_record_link has been dropped as it was never used - the help_link table is used to associate URLs with help_record entries.

The associated queries have also been removed from EnsEMBL::Web::DBSQL::WebsiteAdaptor.

New tables (Mouse)

We are adding two new tables that will be used in the future to store an HMM classification of the proteins

New synteny analyses (Mouse)

New synteny data for:

  • Chicken vs Zebrafinch
  • Human vs Orangutan
  • Chicken vs Opossum

New speciestree view (Mouse)

It is now possible to view the Ensembl species tree in a new interactive view. In previous releases the tree could only be downloaded as a PDF; you can now view it here:

http://www.ensembl.org/info/about/species.html  (and click the link "View the full Ensembl species tree")

You can switch between radial and vertical views, and between the NCBI and Ensembl trees.

The following browsers support the new view:

IE 9 onwards; Firefox 31 onwards; Chrome 31 onwards; Safari 5.1 onwards

For older browsers, a link to download the tree as a PDF is provided.

patch_77_78_a.sql - schema_version update (Mouse)

Update schema_version in meta table to 78.

patch_77_78_b.sql - unmapped_reason_id (Mouse)

Change unmapped_reason_id from smallint to int

Rendering of BigBed files (Mouse)

We have updated our BigBed parser to reflect changes in the UCSC bedToBigBed script, specifically the naming of the last field as 'chromStarts' in AutoSQL rather than the 'blockStarts' specified in the online documentation.

This allows us to correctly render features that contain this information, i.e. as a series of joined blocks instead of a single alignment.
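
How the block fields turn one feature into a series of joined blocks can be sketched as below. The example assumes a 12-column BED line in which the last two fields hold the block sizes and the block starts (named 'chromStarts' or 'blockStarts', as discussed above), with starts given relative to the feature start. The function name `bed_blocks` and the example line are illustrative, and this is not the Ensembl parser.

```python
from typing import List, Tuple

def bed_blocks(bed_line: str) -> List[Tuple[int, int]]:
    """Return absolute (start, end) pairs for each block of a BED12 line."""
    fields = bed_line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError("expected at least 12 tab-separated BED fields")
    chrom_start = int(fields[1])
    block_count = int(fields[9])
    # Sizes and starts are comma-separated and may carry a trailing comma.
    sizes = [int(x) for x in fields[10].split(",") if x]
    starts = [int(x) for x in fields[11].split(",") if x]
    if len(sizes) != block_count or len(starts) != block_count:
        raise ValueError("blockCount does not match the block size/start lists")
    # Block starts are offsets relative to chromStart.
    return [(chrom_start + s, chrom_start + s + size) for s, size in zip(starts, sizes)]

# Hypothetical feature with two blocks.
line = "chr1\t1000\t5000\tfeature1\t0\t+\t1000\t5000\t0\t2\t500,400,\t0,3600,"
print(bed_blocks(line))
```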

We have also updated our online documentation about the BED format.

patch_77_78_c.sql (Mouse)

Integrate schema changes to support Regulation requirements

patch_77_78_c.sql - master_unmapped_reason (Mouse)

master_unmapped_reason patched to match datatype with core schema.

Retirement of archive 65 (Mouse)

This release cycle we will be retiring archive 65 (December 2011) in accordance with our three-year rolling retirement policy. The data will remain available on our public database server; only the web interface will be removed.

Pre Species on the species list page (Mouse)

Species that are available on both Pre and Ensembl now have two rows. One row shows the species with the Ensembl assembly; the row with the faded species image shows the Pre assembly.


Public MySQL server for GRCh37.ensembl.org resources (Mouse)

The public MySQL server for the GRCh37 databases (ensembldb.ensembl.org port 3337) will contain two full sets of databases - one on the current schema and one on the previous schema.
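
For reference, a command line for connecting to this server could be assembled as in the sketch below. The host and port are those given above; the `anonymous` user name is an assumption about the public read-only account, and the helper name `mysql_command` is illustrative.

```python
from typing import List

HOST = "ensembldb.ensembl.org"
PORT = 3337          # GRCh37 server port given above
USER = "anonymous"   # assumption: public read-only account

def mysql_command(database: str = "") -> List[str]:
    """Build the argument list for the mysql command-line client."""
    args = ["mysql", "-h", HOST, "-P", str(PORT), "-u", USER]
    if database:
        args.append(database)  # optional database name to open directly
    return args

print(" ".join(mysql_command()))
```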

VEP : quick preview of results (Mouse)

When pasting compatible data (all formats apart from Pileup) into the input box for VEP you will be presented with a blue 'Quick results for Variant' box. Choosing this will generate a quick preview of the consequences, limited to running the first Variation in the list against the Ensembl Transcript data set. If you wish to view the complete set of results for all Variations in your file then you can still submit the job to our Tools queue as usual.

patch_77_78_a.sql - schema_version update (Mouse)

Update schema_version in meta table to 78.

Ensembl VM Build (Mouse)

The Ensembl Virtual Machine appliance will be updated to version 78.