News for Ensembl Release 86 (October 2016)

New web displays and tools

BigWig Manhattan plots (all species)

In this release, we've added the ability to display your bigWig data as a Manhattan plot.

Simply select 'Manhattan plot' from the track style list after loading.

Orthologue Table redesign (all species)

The orthologue table has been redesigned slightly with the aim of improving usability. It also accommodates three new metrics representing the confidence of our orthologue predictions: Gene Order Conservation (GOC) Score, Whole Genome Alignment (WGA) Coverage, and High Confidence.

Variation track style consolidation (all species)

The code for drawing our SNP and indel variation tracks has been consolidated so that tracks will appear the same across all views.

Summary of changes

  • Deletions are drawn as a block spanning the deleted area, with the addition of a downwards-pointing triangle symbol superimposed.
  • Insertions are drawn the same way as before, but are enlarged at smaller scales for better visibility.
  • Variation tracks on Protein Summary are drawn the same way as on other images, instead of as tiny squares and triangles, and can thus be configured with different styles, e.g. with labels.

In addition to the visual changes, we have also fixed some bugs with variant placement on Protein Summary.

New comparative genomic views for strains (all species)

The mouse strains resources (mouse reference assembly, mouse strains and rat reference assembly) have a separate left hand menu for their own comparative genomics resources which includes gene trees, orthologues and paralogues.

New species selector (all species)

A new species selector user interface has been designed for BLAST, Region Comparison and Alignment Selector, with the following features:

  • Search
  • Division based species selection
  • Add/remove/sort species

Mouse strain variant table (Mouse)

A new location display, "Strain table", visible for Mouse location displays, shows variants between mouse strains within the region in a tabular form.

GTEx beta value display (Human)

The GTEx per-tissue variant expression data for genes can now be configured to display beta-distribution values, instead of log-p-value figures. This is configured by altering the track renderers. These tracks have also been moved out of "Other regulatory regions" into their own section, directly under the general Regulation heading.

New species, assemblies and genebuilds

Mouse Strains (all species)

Annotation and assemblies for 16 mice, produced by the Mouse Genomes Project, have been added. De novo assemblies for each strain were built from a mixture of short- and long-range Illumina libraries, optical maps, and third generation sequencing. Genes were annotated primarily by projection of the GENCODE gene set from GRCm38 to each strain. The projected annotation was refined with strain-specific RNA-seq data. The RNA-seq data were also used to find novel annotations.

We have annotated additional genomic features including repeats, CpG islands, predicted promoter regions and BLAST alignments of UniProt proteins. We have also annotated protein domains using InterProScan.

The collection comprises one outgroup mouse (Mus spretus), three Mus musculus subspecies (castaneus, musculus and domesticus) and twelve strains of Mus musculus. The full list can be seen below:

Mus musculus 129S1/SvImJ

Mus musculus A/J

Mus musculus AKR/J

Mus musculus BALB/cJ

Mus musculus C3H/HeJ

Mus musculus C57BL/6NJ

Mus musculus CBA/J

Mus musculus DBA/2J

Mus musculus FVB/NJ

Mus musculus LP/J

Mus musculus NOD/ShiLtJ

Mus musculus NZO/HlLtJ

Mus musculus castaneus CAST/EiJ

Mus musculus domesticus WSB/EiJ

Mus musculus musculus PWK/PhJ

Mus spretus SPRET/EiJ

You can access the strains here.

Chicken new assembly and gene set (Chicken)

A new genebuild on the chicken assembly, Gallus_gallus-5.0

Macaque new assembly and genebuild (Macaque)

A new gene set for the macaque assembly Mmul_8.0.1

Mouse lemur new assembly and genebuild (Mouse Lemur)

A new gene set for the mouse lemur assembly Mmur_2.0

Zebrafish: update to Ensembl-Havana merged gene set (Zebrafish)

Updated Ensembl-Havana zebrafish gene set. This gene set is a merge of complete Ensembl gene models and the latest Havana gene annotation. 

Vega Mouse annotation updated (Mouse)

Manual annotation of mouse from Havana has been updated and contains the data released in Vega 66

Vega Zebrafish annotation updated (Zebrafish)

Manual annotation of zebrafish from Havana has been updated and contains the data released in Vega 66

Mouse: update to Ensembl-Havana GENCODE gene set (Mouse)

Updated Ensembl-Havana mouse gene set. This gene set is a merge of complete Ensembl gene models and the latest Havana gene annotation. All CCDS genes are included in this gene set.

Other updates

Compara

ProteinTrees and homologies (all species)

GeneTrees (protein-coding) with new/updated genebuilds and assemblies

 -- all-vs-all blastp (ncbi-blast-2.2.30+)
 -- Clustering using hcluster_sg
 -- Multiple sequence alignments using MCoffee (Version_9.03.r1318) or Mafft (mafft-7.221)
 -- Phylogenetic reconstruction using TreeBeST
 -- Homology inference
 -- Pairwise gene-based dN/dS scores for high coverage species pairs only (both on orthologues and paralogues) (codeml/PAML v4.3)
 -- GeneTree stable ID mapping
 -- Per family gene dynamics using CAFE (v2.2)
 -- Computation of pairwise gene-order conservation score
 -- Comparison of orthologies with whole-genome alignments

Protein Families (all species)

Updated MCL families including all Ensembl transcript isoforms (including human non-reference haplotypes) and the newest UniProt Metazoa.

 -- Getting distances by NCBI BlastP (v.2.2.30+)
 -- Clustering by MCL (v.14-137)
 -- Multiple Sequence Alignments with MAFFT (v.7.221)
 -- Family stable ID mapping

Schema version update (all species)

85 -> 86

Compute wga_coverage scores for homologies (all species)

Add whole genome alignment coverage for each homology as a quality measure

G.gal pairwise alignments (Chicken)

Recompute all LASTZ alignments for chicken:

  • human
  • mouse
  • opossum
  • anole lizard
  • zebra finch
  • flycatcher
  • duck
  • turkey
  • chinese softshell turtle
  • xenopus
  • zebrafish
  • c. savignyi

Mouse lemur v human LASTZ (Mouse Lemur)

Recompute pairwise alignment

Macaque v human LASTZ (Macaque)

Recompute pairwise alignment

Remove the "species_tree" column from the species_tree_root table (all species)

The "species_tree" column was used to keep a stringified version of the tree (in newick) but the API is in fact able to generate a customised newick string using the species_tree_node table
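To illustrate why a stored string is redundant, a Newick string can be rebuilt from parent/child rows alone. The sketch below is not the Compara API. The row layout (node_id, parent_id, name, distance) is a simplified assumption about what a node table holds, and the sample rows are invented for illustration.

```python
from collections import defaultdict


def rows_to_newick(rows):
    """Build a Newick string from (node_id, parent_id, name, distance) rows.

    parent_id is None for the root. The column layout is an assumption made
    for this sketch, not the exact species_tree_node schema.
    """
    children = defaultdict(list)
    info = {}
    root = None
    for node_id, parent_id, name, distance in rows:
        info[node_id] = (name, distance)
        if parent_id is None:
            root = node_id
        else:
            children[parent_id].append(node_id)

    def render(node_id):
        name, distance = info[node_id]
        kids = children.get(node_id, [])
        label = name or ""
        if kids:
            # Internal node: children in parentheses, then the node label.
            label = "(" + ",".join(render(k) for k in kids) + ")" + label
        if distance is not None:
            label += ":" + format(distance, "g")
        return label

    return render(root) + ";"


# Invented example rows; names and branch lengths are illustrative only.
example_rows = [
    (1, None, "Amniota", None),
    (2, 1, "gallus_gallus", 0.3),
    (3, 1, "Primates", 0.1),
    (4, 3, "homo_sapiens", 0.02),
    (5, 3, "macaca_mulatta", 0.03),
]
newick = rows_to_newick(example_rows)
```

Because the string is derived on demand, the output format (with or without branch lengths or internal labels, for example) can be changed without touching the stored data.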

Update syntenies (all species)

Syntenies need to be recomputed for the new assemblies:

  • chicken
  • mouse lemur
  • macaque

Recompute multiple alignments (all species)

EPOs containing updated species should be recomputed:

  • sauropods
  • primates
  • mammals

Pecan alignments:

  • vertebrates (only to update new assemblies)

Recompute low coverage multiple alignments (all species)

EPO LOW COVERAGE alignments need to be updated:

  • sauropods
  • mammals

ncRNAtrees and homologies (all species)

  • Classification based on Rfam models (v12.0)
  • Multiple sequence alignments with Infernal
  • Phylogenetic reconstruction using RAxML
  • Phylogenetic reconstruction using FastTree2 and RAxML-Light for very big families
  • Additional multiple sequence alignments with Prank (w/ genomic flanks)
  • Additional phylogenetic reconstruction using PhyML and NJ
  • Phylogenetic tree merging using TreeBeST
  • Per family gene dynamics using CAFE
  • Homology inference
  • Secondary structure plots

Schema: new species_tree_root_id column in the gene_tree_root table (all species)

This column will link a gene-tree to the species-tree it's been reconciled with. This way, it will be obvious which gene-trees are reconciled with which species-trees, especially given the fact that intermediate trees of the same family can have different reconciliation parameters

Schema: new high_confidence column in the homology table (all species)

New column to define a subset of high-confidence orthologues based on various QC steps

Schema: homology stats (number of orthologues, etc) moved to a separate table (all species)

The new table is gene_member_hom_stats

Core

EBeye Dumps (all species)

Generate search indexes to be used at the EBI.

Ensembl VM Build (all species)

The Ensembl Virtual Machine appliance will be updated to version 85.

External database references update (multiple species)

Xref updates for: homo_sapiens (human), mus_musculus (mouse), danio_rerio (zebrafish), erinaceus_europaeus (hedgehog), ciona_savignyi (Sea squirt), sorex_araneus (European shrew), petromyzon_marinus (sea lamprey), oryzias_latipes (Japanese medaka), taeniopygia_guttata (zebra finch), chlorocebus_sabaeus (African green monkey), anolis_carolinensis (green anole), ornithorhynchus_anatinus (platypus), otolemur_garnettii (small-eared galago), ovis_aries (sheep)

LRG Import (all species)

Importing the latest version of Locus Reference Genomic dataset.

patch_85_86_a.sql - schema_version update (all species)

Update schema_version in meta table to 86.

patch_85_86_a.sql - schema_version update in ontology db (all species)

Update schema_version in meta table to 86.

patch_85_86_a.sql - schema_version update in production db (all species)

Update schema_version in production database to 86.

REST server upgrade (all species)

REST server updated to latest API and data

Stable ID lookup (all species)

Stable ID lookup provided for REST services.

Includes lookup for RefSeq and CCDS entries.
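A minimal sketch of querying a stable ID over REST follows. It assumes the public server at https://rest.ensembl.org and its `lookup/id` endpoint; the helper names are hypothetical, and whether a given RefSeq or CCDS accession resolves through this particular endpoint is not confirmed by the text above.

```python
import json
import urllib.parse
import urllib.request

# Assumed public REST server; adjust if you run your own instance.
REST_SERVER = "https://rest.ensembl.org"


def lookup_url(stable_id, server=REST_SERVER):
    """Return the URL of the 'lookup/id' endpoint for one stable ID."""
    quoted = urllib.parse.quote(stable_id, safe="")
    return f"{server}/lookup/id/{quoted}?content-type=application/json"


def lookup(stable_id, server=REST_SERVER, timeout=30):
    """Fetch the lookup record as a dict (performs a network request)."""
    url = lookup_url(stable_id, server)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Usage (not run here because it needs network access):
#   record = lookup("ENSG00000157764")
#   print(record.get("species"), record.get("object_type"))
```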

Regulation

Reprocess data from ENCODE project for mouse (Mouse)

The peak calling component of the Ensembl Regulation Sequencing Analysis pipeline has been improved. It now follows a well-defined approach for calling narrow and broad peaks; as a result, all the existing mouse data in Ensembl's Regulation database needs to be reprocessed.

Micro-array mapping (Mouse)

Updates of the links from the probes to the transcripts.

Please note that this update is only done for the reference genome, not the 16 new strains.

Patches (all species)

patch_85_86_a.sql
-- schema version update 

patch_85_86_b.sql
-- Drop tables epigenome_lineage and lineage - Not used anymore

patch_85_86_c.sql
-- Add column (production name) to feature_type table

patch_85_86_d.sql
-- Add new columns (read_length, is_paired_end, paired_with, file_size) to input_subset table to accommodate paired-end data

patch_85_86_e.sql
-- Add QC tables

Reprocess human regulation data (Human)

In the last release (e85), 19 histone modifications were missing from our database (Known Bug). We will add these and redo the peak calling for all data currently in our database using our Ensembl Regulation Sequencing Analysis pipeline.

Genebuild

Anolis lizard lincRNA (Anole lizard)

lincRNAs for anole lizard will be added to the core database.

Flycatcher lincRNA (Flycatcher)

lincRNAs for flycatcher will be added to the core database.

Cave fish lincRNA (Cave fish)

lincRNAs for cave fish will be added to the core database.

Human: updated cDNA alignments (Human)

A new cdna database will be created for e86: The latest set of cDNAs for human (as of July 2016) from the European Nucleotide Archive and NCBI RefSeq will be aligned to the current genome using Exonerate.

Mouse: updated cDNA alignments (Mouse)

A new cdna database will be created for e86: The latest set of cDNAs for mouse (as of July 2016) from the European Nucleotide Archive and NCBI RefSeq will be aligned to the current genome using Exonerate.

New zebrafish otherfeatures database (Zebrafish)

Zebrafish-specific cDNA and ESTs have been aligned to GRCz10. These are made available through the website and otherfeatures database.

Zebrafish: updated RefSeq gene import (Zebrafish)

The imported RefSeq gene set was updated in the zebrafish otherfeatures database. Please note that RefSeq annotates gene models on cDNA sequence and not on the reference genome, so when users translate the RefSeq transcripts off the reference genome, the translations may contain stop codons.

Human: updated RefSeq gene import (Human)

The imported RefSeq gene set was updated in the human otherfeatures database. Please note that RefSeq annotates gene models on cDNA sequence and not on the reference genome, so when users translate the RefSeq transcripts off the reference genome, the translations may contain stop codons.

Mouse: updated RefSeq gene import (Mouse)

The imported RefSeq gene set was updated in the mouse otherfeatures database. Please note that RefSeq annotates gene models on cDNA sequence and not on the reference genome, so when users translate the RefSeq transcripts off the reference genome, the translations may contain stop codons.

Human/mouse/zebrafish: Ensembl-to-RefSeq comparison attributes (Zebrafish, Human, Mouse)

For each Ensembl transcript present in the human and mouse core db, a comparison is carried out with all overlapping RefSeq transcripts from the otherfeatures db.

Up to five comparisons are carried out (depending on whether the models are non-coding or coding):

1) Check if all exon coordinates match (all transcripts)

2) Check if transcript sequences match (all transcripts)

3) Check if the CDS exon coordinates match (coding transcripts only)

4) Check if the CDS sequences match (coding transcripts only)

5) Check if the translation sequences match (coding transcripts only)

For non-coding models, if comparisons (1) and (2) are a match then the transcripts are considered to match on the whole transcript level and the Ensembl transcript is given an attribute to say there is a match on the whole transcript level.

For coding models, if all five comparisons are true then the Ensembl transcript is given an attribute to say there is a match on the whole transcript level. Failing that, if comparisons (3), (4) and (5) are true, the Ensembl transcript is given an attribute to say there is a match on the CDS level.

The stable ids of any matching RefSeq transcripts will be stored in the value field of the Ensembl transcript attribute.
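The decision rules above can be sketched as a small classifier. This is an illustration, not the genebuild pipeline code: the `Model` class, its field names, the example identifiers and sequences are all hypothetical, the label `cds` is this sketch's own name for a match limited to checks (3) to (5), and treating a coding/non-coding pair as "no match" is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Exon = Tuple[int, int]  # (start, end) genomic coordinates


@dataclass
class Model:
    """Minimal transcript model; all field names are hypothetical."""
    stable_id: str
    exons: List[Exon]
    sequence: str
    cds_exons: Optional[List[Exon]] = None  # None for non-coding models
    cds_sequence: Optional[str] = None
    translation: Optional[str] = None

    @property
    def is_coding(self) -> bool:
        return self.cds_exons is not None


def compare(ensembl: Model, refseq: Model) -> Optional[str]:
    """Return 'whole_transcript', 'cds' or None for one transcript pair."""
    if ensembl.is_coding != refseq.is_coding:
        return None  # assumption: mixed coding/non-coding pairs never match
    # Checks (1) and (2): exon coordinates and transcript sequence.
    whole = ensembl.exons == refseq.exons and ensembl.sequence == refseq.sequence
    if not ensembl.is_coding:
        return "whole_transcript" if whole else None
    # Checks (3), (4) and (5): CDS exons, CDS sequence, translation.
    cds = (ensembl.cds_exons == refseq.cds_exons
           and ensembl.cds_sequence == refseq.cds_sequence
           and ensembl.translation == refseq.translation)
    if whole and cds:
        return "whole_transcript"
    return "cds" if cds else None


def matching_refseq_ids(ensembl: Model, overlapping: List[Model]) -> Dict[str, List[str]]:
    """Collect stable IDs of matching RefSeq models, grouped by match level."""
    matches: Dict[str, List[str]] = {}
    for refseq in overlapping:
        level = compare(ensembl, refseq)
        if level:
            matches.setdefault(level, []).append(refseq.stable_id)
    return matches


# Invented toy data; sequences are not meant to agree with the coordinates.
ens = Model("ENST_EXAMPLE", [(100, 200), (300, 400)], "ACGTACGT",
            [(150, 200), (300, 350)], "ATGTAA", "M")
rs_whole = Model("NM_EXAMPLE_1", [(100, 200), (300, 400)], "ACGTACGT",
                 [(150, 200), (300, 350)], "ATGTAA", "M")
rs_cds = Model("NM_EXAMPLE_2", [(90, 200), (300, 400)], "GGACGTACGT",
               [(150, 200), (300, 350)], "ATGTAA", "M")
rs_none = Model("NM_EXAMPLE_3", [(100, 200)], "ACGT", [(150, 200)], "ATG", "M")
result = matching_refseq_ids(ens, [rs_whole, rs_cds, rs_none])
```

The returned lists of RefSeq stable IDs correspond to what would be stored in the value field of the attribute.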

Mouse lemur otherfeatures database (Mouse Lemur)

A new gene set for the mouse lemur assembly, Mmur_2.0, requires new core, rnaseq and otherfeatures databases. Species-specific ESTs and cDNAs were aligned to the genome and alignments are available through the website or otherfeatures database.

Chicken otherfeatures database (Chicken)

A new genebuild on the chicken assembly, Gallus_gallus-5.0, requires chicken core, rnaseq and otherfeatures databases. Chicken-specific cDNAs and ESTs were aligned to the chicken genome and are made available through the Ensembl website and the chicken otherfeatures database.

Chicken RNAseq and Bam files (Chicken)

In addition to the gene annotation for Gallus_gallus-5.0, an rnaseq database will be released where users can view BAM files and transcript models for different tissues.

Macaque otherfeatures database (Macaque)

A new genebuild on the macaque assembly, Mmul_8.0.1, requires macaque core, rnaseq and otherfeatures databases.  Macaque-specific cDNAs and ESTs were aligned to the macaque genome and are made available through the Ensembl website and the macaque otherfeatures database.

Macaque RNAseq and Bam files (Macaque)

In addition to the gene annotation for macaque, an rnaseq database will be released where users can view BAM files and transcript models for different tissues.

Mouse lemur RNASeq and Bam files (Mouse Lemur)

In addition to the gene annotation for mouse lemur, an rnaseq database will be released where users can view BAM files and transcript models for different tissues.

Updated mouse otherfeatures db: New CCDS import (Mouse)

This release of the mouse gene set also includes 24,826 transcript models as part of an updated version (July 2016) of CCDS

Human: transcript attributes for Refseq-genomic-to-mRNA comparison (Human)

Transcript attributes will be added for the refseq_import geneset in the human otherfeatures db. Each refseq_import transcript will have an attribute to denote whether the genomic sequence that the transcript covers matches the mRNA sequence that the transcript is based on (the sequences present in the RefSeq mRNA file).

A perfect match is defined as an alignment across the entirety of both sequences that contains no mismatches or indels. If there is initially a mismatch, the RefSeq mRNA will go through polyA clipping and the sequences will be compared again to see whether a perfect match is possible after polyA clipping.

Transcripts that do not have a perfect match between the mRNA and the genomic sequence will get additional attributes to define which regions (5' UTR, CDS, 3' UTR, or 'whole transcript' if there is no CDS defined) do not align perfectly, along with a summary of the information in the alignment (match, mismatch, indel count, total indel length).
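The two-stage check can be sketched as below. This is a simplified illustration rather than the pipeline's code: it compares strings directly instead of running an aligner, and it models polyA clipping as ignoring a trailing run of A bases in the mRNA that extends past the genomic sequence. The clipping rule actually used is not described here and may differ; the function name and return labels are hypothetical.

```python
def classify_match(genomic: str, mrna: str) -> str:
    """Classify an mRNA against the genomic sequence of its transcript.

    Returns 'perfect_match', 'perfect_match_after_polyA_clipping'
    or 'mismatch'.
    """
    genomic, mrna = genomic.upper(), mrna.upper()
    # Stage 1: identical sequences mean no mismatches and no indels.
    if genomic == mrna:
        return "perfect_match"
    # Stage 2: accept the mRNA if it only differs by a trailing polyA tail.
    tail = mrna[len(genomic):]
    if tail and mrna.startswith(genomic) and set(tail) == {"A"}:
        return "perfect_match_after_polyA_clipping"
    return "mismatch"


# Invented toy sequences for illustration.
exact = classify_match("ACGTTGCA", "ACGTTGCA")
clipped = classify_match("ACGTTGCA", "ACGTTGCAAAAAA")
differs = classify_match("ACGTTGCA", "ACGATGCAAAAA")
```

A fuller version would go on to report which region (5' UTR, CDS or 3' UTR) carries the differences, which needs a real alignment rather than a string comparison.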

Production

Ensembl 86 mart databases (all species)

  • Ensembl Genes 86
  • Mouse Genes 86
    • New gene mart containing the 16 Mouse strains
  • Ensembl Variation 86
  • Ensembl Regulation 86
  • Vega 66

External reference projection (all species)

Gene ontology (GO) identifiers and gene name projection to all species.

EMBL, GenBank, FASTA, GTF, GFF3, RDF, TSV Dumps (all species)

EMBL, GenBank, FASTA, GTF, GFF3, RDF and TSV dumps for all species.

Variation

dbSNP update for Chicken, Cow, Horse and Zebra Finch (Cow, Horse, Chicken, Zebra Finch)

Chicken will be updated to dbSNP version 147.

Horse, cow and zebra finch will be updated to dbSNP version 148.

Structural variants (Cow, Dog, Horse, Human)

  • Added new studies from DGVa
  • Updated some of the existing studies from DGVa

Phenotype data updates (all species)

  • Updated Human phenotype data from different sources including NHGRI-EBI GWAS, OMIM, ClinVar, UniProt, Orphanet and GOA.
  • Other GOA data for several species
  • OMIA data for several species
  • AnimalQTL data for several species
  • RGD data for Rat
  • ZFIN data for Zebrafish
  • IMPC data for Mouse
  • MGI data for Mouse

HGMD (Human)

HGMD data will be updated to version 2016.2 (June 2016)

Patches (all species)

patch_85_86_a.sql - update schema version

patch_85_86_a.sql - add qualifier & index to phenotype_ontology_accession

patch_85_86_a.sql - add index on study.external_reference

 

Web

Session code rewrite (all species)

The way the Ensembl webcode interacts with the session and accounts databases and saves user configurations has been changed. The code interacting with the MySQL database has been removed, and the webcode now uses the ensembl-orm API to access the session database.

Parts of ensembl-webcode where major changes are expected are Apache Handlers, Controllers, ImageConfig and ViewConfig.

Further description can be found on GitHub

New views added for mobile site (all species)

You can now view details of Transcripts on the mobile site, for example the Summary page, and sequence of Exons, cDNAs and proteins. We also added the gene sequence display to the site.

To view the sequence pages, please click on the left hand arrow icon which will open a menu allowing you to choose the relevant view you want.

Mobile site redirection (all species)

Users on mobile phones will be automatically redirected to the mobile version of the site when they visit www.ensembl.org for the first time. They can choose to view the desktop site by clicking on the link at the bottom of the screen.

Mouse strain search (Mouse)

The addition of mouse strain data is accompanied by changes to the search interface to allow this data to be found but without overwhelming the results in other circumstances.

By default, only reference mouse hits are shown. Faceting on mouse also shows all hits on strains, along with a further faceting option to refine that selection.

Retirement of archive 73 (all species)

This release cycle we will be retiring archive 73 (September 2013) in accordance with our three-year rolling retirement policy. The data will remain available on our public database server; only the web interface will be removed.

Future Plans

Read about our future plans on our blog!