Ensembl News for Release 77 (October 2014)


New interface for alignment export (Zebra Finch)

For Phase 2 of our export upgrade, we have ported the genomic alignment pages to the new export interface. We have also upgraded our version of BioPerl to 1.6.1, which allows us to support the latest write_aln options in Bio::AlignIO. Export file formats supported in release 77 are:

  • Mega
  • MSF
  • Nexus
  • Pfam
  • Phylip
  • PSI
  • RTF (text alignment views only)
  • Stockholm*

This change affects the following pages:

  • Location -> Alignments (image)
  • Location -> Alignments (text)
  • Gene -> Genomic alignments
  • Variation -> Phylogenetic context

The old "Export data" button has been disabled on these pages and replaced with a new button within the page itself. The Alignments image has a download icon on its toolbar, and the text-based alignments use the same "Download sequence" button as the pages upgraded in release 76.

* Please note that sites still using the old BLAST interface depend on BioPerl 1.2.3 and therefore cannot offer Stockholm export.
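For developers wondering how an exported alignment reaches Bio::AlignIO, here is a minimal sketch. It assumes a Bio::SimpleAlign object is already in hand. The %ALIGNIO_FORMAT table and both subroutine names are hypothetical and are not taken from the Ensembl web code. Only Bio::AlignIO->new(-file, -format), write_aln and close come from BioPerl itself.

```perl
use strict;
use warnings;

# Hypothetical mapping from the format labels listed above to Bio::AlignIO
# format identifiers (BioPerl uses lowercase driver names). RTF is absent
# on purpose: it is only offered for text alignment views and is not a
# Bio::AlignIO format.
my %ALIGNIO_FORMAT = (
  Mega      => 'mega',
  MSF       => 'msf',
  Nexus     => 'nexus',
  Pfam      => 'pfam',
  Phylip    => 'phylip',
  PSI       => 'psi',
  Stockholm => 'stockholm',   # requires BioPerl 1.6.1, per the note above
);

# Returns the Bio::AlignIO format name for a label, or undef if unsupported.
sub alignio_format {
  my ($label) = @_;
  return $ALIGNIO_FORMAT{$label};
}

# Writes a Bio::SimpleAlign object to $path in the requested format.
# Bio::AlignIO is loaded at run time, so this file compiles without BioPerl.
sub write_alignment {
  my ($aln, $label, $path) = @_;
  my $format = alignio_format($label)
    or die "No Bio::AlignIO format for '$label'\n";
  require Bio::AlignIO;
  my $out = Bio::AlignIO->new(-file => ">$path", -format => $format);
  $out->write_aln($aln);
  $out->close;
  return $path;
}
```

Keeping the label-to-driver mapping in one table means a site on an older BioPerl only needs to drop entries such as Stockholm. The writer itself does not need to change.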

New icons on transcript table (Zebra Finch)

Two new icons have been added to the transcript table, showing confidence levels from different projects.

See the following news items for more details on this data:

  • APPRIS tags
  • Transcript Support Levels

Gene expression page has moved (Zebra Finch)

We have decided to move the Gene Expression table to our per-species documentation, as the data is not applicable to every gene. You can now find the list of tissues and associated datasets on the homepage of every species with an RNASeq database, e.g. Homo_sapiens/Info/Expression.

Display of alternative alleles (Zebra Finch)

We have added a new Gene panel that shows a table of manually curated alternative alleles, for example a gene on the human GRCh38 reference assembly and its counterpart on a GRC haplotype patch. The table also provides links to Region Comparison, preconfigured to show the regions of interest.

Links between alternative alleles have also been added to Gene Summary panels and to the menus that appear when mousing over a gene on a location view.

Other updates

dbSNP SubSNP ids no longer held as synonyms (Zebra Finch)

dbSNP SubSNP ids will no longer be held as synonyms. They will be retained in allele records and retrieval of variants by SubSNP id through the web site will still be supported.

Ensembl 77 mart databases (Zebra Finch)

  • Ensembl Genes 77
    • Added new species Vervet-AGM (Chlorocebus sabaeus)
    • Added "Associated Gene Name" (internal name "external_gene_name") in the id list limit filter section
    • Renamed "Reference ID" to "Variation Name" and "Source description" to "Variation source description" in the Variation attribute section for germline and somatic. 
    • Renamed internal names:
      • For the "Variation name" attribute from "external_id" to "variation_name"
      • For the "Variation Source" attribute from "source_name" to "germ_line_variation_source"
      • For the "Variation name" somatic attribute from "somatic_reference_id" to "somatic_variation_name"
  • Ensembl Variation 77
    • Optimized the somatic and germline structural variation templates
    • Removed "Phenotype significance" (internal name "phenotype_significance") from the filter and attribute sections
  • Vega 57
    • Added new species Rat (Rattus norvegicus)
    • Added "Associated Gene Name" (internal name "external_gene_name") in the id list limit filter section

Human: updated cDNA alignments (Zebra Finch)

A new cDNA database was created for e77. The latest set of human cDNAs (as of August 2014) from the European Nucleotide Archive and NCBI RefSeq (release nn) was aligned to the current genome using Exonerate.

Mouse: updated cDNA alignments (Zebra Finch)

A new cDNA database was created for e77. The latest set of mouse cDNAs (as of August 2014) from the European Nucleotide Archive and NCBI RefSeq (release nn) was aligned to the current genome using Exonerate.

Update Sequence Ontology terms (Zebra Finch)

We are updating the following Sequence Ontology terms:

  • nc_transcript_variant will be updated to non_coding_transcript_variant
  • non_coding_exon_variant will be updated to non_coding_transcript_exon_variant
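Scripts that filter saved consequence data on the old term names will need to recognise the new ones. Here is a minimal sketch of that migration. It assumes consequences arrive as a comma-separated string, as in a VEP Consequence column. The subroutine names are hypothetical, and only the two renames listed above are applied.

```perl
use strict;
use warnings;

# Old-to-new Sequence Ontology consequence terms, as listed above.
my %SO_TERM_RENAME = (
  nc_transcript_variant   => 'non_coding_transcript_variant',
  non_coding_exon_variant => 'non_coding_transcript_exon_variant',
);

# Maps a list of consequence terms to their release 77 names;
# terms that were not renamed are returned unchanged.
sub update_so_terms {
  my @terms = @_;
  return map { exists $SO_TERM_RENAME{$_} ? $SO_TERM_RENAME{$_} : $_ } @terms;
}

# Convenience wrapper for a comma-separated consequence field.
sub update_consequence_field {
  my ($field) = @_;
  return join ',', update_so_terms(split /,/, $field);
}
```

For example, `update_consequence_field('nc_transcript_variant,intron_variant')` yields `non_coding_transcript_variant,intron_variant`.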

Improved GeneAdaptor::fetch_nearest_Gene_by_Feature() (Zebra Finch)

We are improving our code for retrieving the nearest gene to any Ensembl feature.

patch_76_77a.sql - schema_version update in production db (Zebra Finch)

Update schema_version in production database to 77.

Stable ID lookup (Zebra Finch)

Stable ID lookup is now provided through the REST service, including lookup of RefSeq and CCDS entries.
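Here is a minimal client sketch for an Ensembl stable ID lookup. It assumes the public server at https://rest.ensembl.org and its GET /lookup/id/:id endpoint. It uses only core Perl modules (HTTP::Tiny, JSON::PP). Over HTTPS, HTTP::Tiny also needs IO::Socket::SSL installed. The subroutine names are hypothetical. The sketch does not cover the RefSeq or CCDS lookups mentioned above.

```perl
use strict;
use warnings;
use HTTP::Tiny;
use JSON::PP;

# Assumed public REST endpoint; adjust for a local mirror.
my $SERVER = 'https://rest.ensembl.org';

# Builds the URL for a stable ID lookup (GET /lookup/id/:id).
sub lookup_url {
  my ($stable_id) = @_;
  die "Empty stable ID\n" unless defined $stable_id && length $stable_id;
  return "$SERVER/lookup/id/$stable_id?content-type=application/json";
}

# Performs the lookup and returns the decoded JSON as a hash reference.
sub lookup_stable_id {
  my ($stable_id) = @_;
  my $response = HTTP::Tiny->new->get(lookup_url($stable_id));
  die "Lookup failed: $response->{status} $response->{reason}\n"
    unless $response->{success};
  return decode_json($response->{content});
}
```

Calling `lookup_stable_id('ENSG00000139618')` should return a hash with fields such as biotype and seq_region_name. Keeping URL construction separate from the HTTP call makes the request easy to check offline.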

patch_76_77_a.sql - schema_version update in ontology db (Zebra Finch)

Update schema_version in meta table to 77.

External database references update (Zebra Finch)

Xrefs have been updated for the following species:

human, rat, anolis, cow, dog, Ciona intestinalis, horse, chicken, gorilla, macaque, opossum, platypus, medaka, sheep, chimp, lamprey, pig, zebra finch, shrew

Introduce new variation class (Zebra Finch)

Add genetic_marker as a new variation class.

Add variation_attrib table (Zebra Finch)

Add variation_attrib table.

For now this will be used by Ensembl Genomes to link homeologous variants on polyploid genomes, although the main Ensembl project may find uses for it in future.

FASTA & GTF dumps (Zebra Finch)

FASTA and GTF dumps for all species.

External reference projection (Zebra Finch)

Projection of Gene Ontology (GO) identifiers and gene names to all species.

EMBL and Genbank Dumps (Zebra Finch)

EMBL and Genbank dumps for all species.

8-way primate EPO multiple alignments (Zebra Finch)

 callithrix jacchus
 chlorocebus sabaeus
 gorilla gorilla
 homo sapiens
 macaca mulatta
 pan troglodytes
 papio anubis
 pongo abelii

17-way mammal EPO multiple alignments (Zebra Finch)

 bos taurus
 callithrix jacchus
 canis familiaris
 chlorocebus sabaeus
 equus caballus
 felis catus
 gorilla gorilla
 homo sapiens
 macaca mulatta
 mus musculus
 oryctolagus cuniculus
 ovis aries
 pan troglodytes
 papio anubis
 pongo abelii
 rattus norvegicus
 sus scrofa

39-way mammal low-coverage EPO multiple alignments (Zebra Finch)

 ailuropoda melanoleuca
 bos taurus
 callithrix jacchus
 canis familiaris
 cavia porcellus
 chlorocebus sabaeus
 choloepus hoffmanni
 dasypus novemcinctus
 dipodomys ordii
 echinops telfairi
 equus caballus
 erinaceus europaeus
 felis catus
 gorilla gorilla
 homo sapiens
 ictidomys tridecemlineatus
 loxodonta africana
 macaca mulatta
 microcebus murinus
 mustela putorius furo
 mus musculus
 myotis lucifugus
 nomascus leucogenys
 ochotona princeps
 oryctolagus cuniculus
 otolemur garnettii
 ovis aries
 pan troglodytes
 papio anubis
 pongo abelii
 procavia capensis
 pteropus vampyrus
 rattus norvegicus
 sorex araneus
 sus scrofa
 tarsius syrichta
 tupaia belangeri
 tursiops truncatus
 vicugna pacos

Citation data (Zebra Finch)

Citation data will be updated from Europe PMC and UCSC. Cited variants will now be flagged when they fail standard QC filters but will still be displayed in the usual tracks.

A new column 'display' will be added to the variation table to facilitate this.

23-way amniota-pecan multiple alignments (Zebra Finch)


Primate ancestral alleles (Zebra Finch)


Compara dumps (Zebra Finch)

EMF / Fasta / OrthoXML / PhyloXML dumps for ProteinTrees
EMF / Fasta / OrthoXML / PhyloXML dumps for ncRNAtrees
PhyloXML dumps for CAFE ProteinTrees
PhyloXML dumps for CAFE ncRNAtrees
EMF and MAF dumps for epo_39_eutherian_mammals
MAF dumps for epo_8_primates
EMF and MAF dumps for pecan_23_amniota
MAF dumps for epo_17_eutherian_mammals
BED files for constrained elements
MAF dumps for H.sap-C.sab LASTZ

ncRNAtrees and homologies (Zebra Finch)

Classification based on Rfam models (v11.0)
Multiple sequence alignments with Infernal
Phylogenetic reconstruction using RAxML
Phylogenetic reconstruction using FastTree2 and RAxML-Light for very big families
Additional multiple sequence alignments with Prank (with genomic flanks)
Additional phylogenetic reconstruction using PhyML and NJ
Phylogenetic tree merging using TreeBeST
Per-family gene dynamics using CAFE
Homology inference
Secondary structure plots

Protein Families (Zebra Finch)

Updated MCL families, including all Ensembl transcript isoforms (including those on human non-reference haplotypes) and the newest UniProt Metazoa.

 -- Getting distances by NCBI BlastP (v.2.2.28+)
 -- Clustering by MCL (v.12-135)
 -- Multiple Sequence Alignments with MAFFT (v.7.113)
 -- Family stable ID mapping

ProteinTrees and homologies (Zebra Finch)

GeneTrees (protein-coding) with new/updated genebuilds and assemblies

 -- all-vs-all blastp (ncbi-blast-2.2.27+)
 -- Clustering using hcluster_sg
 -- Multiple sequence alignments using MCoffee (Version_9.03.r1318) or Mafft (mafft-7.017)
 -- Phylogenetic reconstruction using TreeBeST
 -- Homology inference
 -- Pairwise gene-based dN/dS scores for high coverage species pairs only (both on orthologues and paralogues) (codeml/PAML v4.3)
 -- GeneTree stable ID mapping
 -- Per-family gene dynamics using CAFE (v2.2)

Retirement of archive 64 (Zebra Finch)

This release cycle we will be retiring archive 64 (September 2011) in accordance with our three-year rolling retirement policy. The data will remain available on our public database server; only the web interface will be removed.

Add Sequence Ontology type to regulatory feature consequences (Zebra Finch)

The SO type for each regulatory feature overlapped by a variant will be reported both on the Variation web page and the VEP results.

BioPerl upgrade (Zebra Finch)

Important: This change only affects developers of Ensembl-powered websites, not general users of our code.

In release 77 we have upgraded to BioPerl 1.6.1 in order to support all the latest features of Bio::AlignIO. This version has long been supported by the Ensembl API, but the old BLAST interface depended on features in version 1.2.3, so we were unable to upgrade the web servers until the new BLAST interface was released in Ensembl 76. Developers therefore have two options:

1. If you are still using the old BLAST interface, do not upgrade to BioPerl 1.6.1. However, if you upgrade your site to Ensembl 77, you will need to configure the export interface to use the formats available in the older version of Bio::AlignIO.

Create a file in your plugin with namespace EnsEMBL::Web::Component::DataExport::Alignments and add the following method:

package EnsEMBL::Web::Component::DataExport::Alignments;

sub alignment_formats {
  ### Configure this list to match what's available
  ### in the installed version of BioPerl
  my $self = shift;
  return qw(CLUSTALW FASTA Mega MSF Nexus Pfam Phylip PSI Selex);
}

1;

2. If you don't use BLAST or have upgraded to the new interface, you should upgrade BioPerl and reconfigure your BioPerl location if it has changed. For example, if you have not installed it in /path/to/ensembl/bioperl-live, check your SiteDefs.pm file for the line:

  $SiteDefs::BIOPERL_DIR                    = '/path/to/bioperl-live';

patch_76_77_a.sql - schema_version update (Zebra Finch)

Update schema_version in meta table to 77.

Ensembl VM Build (Zebra Finch)

The Ensembl Virtual Machine appliance will be updated to version 77.

New RNASeq data matrix configuration (Zebra Finch)

The configuration panel for RNASeq data has been changed to a matrix, making it easy to turn related tracks on or off together.