News for Ensembl Release 83 (December 2015)

News categories

New web displays and tools

Filtering Variants by MAF (all species)

The variants displayed on all sequence markup views can be filtered by minor allele frequency (MAF), allowing you to show or hide variants according to a range of frequencies (between 0.01% and 10%). This filtering is off by default; to enable it, go to 'Configure this page' and choose the value you want from the 'Hide variants by frequency (MAF)' drop-down menu.
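As a rough illustration of what a frequency filter of this kind does, here is a minimal Python sketch. It is not the website's implementation: the `Variant` record, the `hide_by_maf` function and the choice to keep variants that have no frequency data are all assumptions made for the example.

```python
from typing import List, NamedTuple, Optional


class Variant(NamedTuple):
    """Hypothetical variant record; MAF is a fraction (0.01 means 1%)."""
    name: str
    maf: Optional[float]  # None when no frequency data is available


def hide_by_maf(variants: List[Variant], threshold: float,
                hide_common: bool = True) -> List[Variant]:
    """Return the variants that stay visible after MAF filtering.

    hide_common=True  hides variants with MAF above the threshold.
    hide_common=False hides variants with MAF at or below the threshold.
    Assumption: variants without a MAF are always kept.
    """
    visible = []
    for variant in variants:
        if variant.maf is None:
            visible.append(variant)
        elif hide_common and variant.maf > threshold:
            continue
        elif not hide_common and variant.maf <= threshold:
            continue
        else:
            visible.append(variant)
    return visible


example = [Variant("var_a", 0.0005), Variant("var_b", 0.2), Variant("var_c", None)]
rare_only = hide_by_maf(example, threshold=0.01, hide_common=True)
common_only = hide_by_maf(example, threshold=0.01, hide_common=False)
```

With the example data, hiding common variants leaves `var_a` and `var_c`, while hiding rare variants leaves `var_b` and `var_c`.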

Advanced Filtering and Counts on Variant table (all species)

The functionality of the recently reimplemented variant table has been further expanded to allow a wider range of filtering options. Filtering can now be applied by Minor Allele Frequency, SIFT and PolyPhen scores, Clinical Significance, Consequence Type and many other columns, using buttons along the top of the variant table.

For many of these filters, useful preset combinations of options are available within the popup, allowing rapid configuration of more complex filters.

In addition, row counts for each consequence type have been re-added to the existing Consequence Type filter. These are displayed in the popup that appears once the filter button has been pressed.

Manhattan plot track for LD (all species)

This new linkage disequilibrium (LD) track is focused on a variant and displays the linked variants surrounding the focus variant. The track displays a Manhattan plot, using the r2 and D prime values (from 0 to 1) on the Y axis.

The new track is accessible from the Variation Linkage disequilibrium page, through the links in the new column "Linked variants (image)", for example:

http://www.ensembl.org/Homo_sapiens/Variation/HighLD?v=rs1333049
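For readers unfamiliar with the two statistics plotted on the Y axis, the sketch below computes r2 and D prime for a pair of biallelic variants using the standard textbook definitions. It is only an illustration: the function name and its inputs (haplotype and allele frequencies) are assumptions, and it does not describe how the values shown on the website are actually calculated.

```python
from typing import Tuple


def ld_stats(p_ab: float, p_a: float, p_b: float) -> Tuple[float, float]:
    """Return (r2, D prime) for two biallelic sites.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a:  frequency of allele A at the first site
    p_b:  frequency of allele B at the second site
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both sites must be polymorphic")

    # Raw disequilibrium coefficient.
    d = p_ab - p_a * p_b

    # Squared correlation between the two sites.
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))

    # Normalise |D| by its maximum possible value given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max

    return r2, d_prime


complete_ld = ld_stats(p_ab=0.5, p_a=0.5, p_b=0.5)
no_ld = ld_stats(p_ab=0.25, p_a=0.5, p_b=0.5)
```

Both values lie between 0 and 1, which matches the Y axis range described above.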

ID History converter (all species)

The ID History Converter has been incorporated into the Ensembl Tools infrastructure. This allows you to save jobs and access them later in a similar manner to how you would for tools such as BLAST.

Genoverse version updated (all species)

The version of Genoverse used for the overview panel on Location View has been updated to version 2.3.

New species, assemblies and genebuilds

Update to Ensembl-Havana human GENCODE gene set (release 24) (Human)

Updated Ensembl-Havana gene set (GENCODE release 24). This gene set is a merge of complete Ensembl gene models and the latest Havana gene annotation. All CCDS genes are included in this gene set.

The human GRCh38.p5 gene annotation is also included:

The patches for GRCh38.p5 were annotated using a combination of manual annotation, annotation projected from the primary assembly and annotation derived from cDNA and protein alignment evidence. Annotation of the patches is stored in the core database.

Mouse: update to Ensembl-Havana GENCODE gene set (Mouse)

Updated Ensembl-Havana mouse gene set. This gene set is a merge of complete Ensembl gene models and the latest Havana gene annotation. All CCDS genes are included in this gene set.

Rat: update to Ensembl-Havana gene set (Rat)

Updated Ensembl-Havana rat gene set. This gene set is a merge of complete Ensembl gene models and the latest Havana gene annotation.

Vega Mouse annotation updated (Mouse)

Manual annotation of mouse from Havana has been updated and contains the data released in Vega 63.

Vega Human annotation updated (Human)

Manual annotation of human from Havana has been updated and contains the data released in Vega 63.

Vega Rat annotation updated (Rat)

Manual annotation of rat from Havana has been updated and contains the data released in Vega 63.

New variation data

Chicken and pig dbSNP 145 update (Chicken, Pig)

The chicken and pig databases have been updated to use dbSNP 145.

API and schema changes

Schema changes (all species)

  • New 'ExAC' evidence (column evidence_attribs) in the variation and variation_feature tables.
  • Remove the column 'validation_status' from the variation and variation_feature tables.

Other updates

Compara

M.mus Vs C.intestinalis synteny

The LastZ alignments were recomputed in release 82, hence we are going to recompute the synteny.

The syntenies for Ciona_intestinalis vs human, mouse and zebrafish are going to be deleted from release 83 because they have a coverage of <1%.

Protein Families (all species)

Updated MCL families including all Ensembl transcript isoforms (including human non-reference haplotypes) and newest UniProt Metazoa.

 -- Getting distances by NCBI BlastP (v.2.2.30+)
 -- Clustering by MCL (v.14-137)
 -- Multiple Sequence Alignments with MAFFT (v.7.221)
 -- Family stable ID mapping

Schema version update (all species)

81 -> 82

Compara dumps (all species)

  • EMF / Fasta / OrthoXML / PhyloXML dumps for ProteinTrees + PhyloXML dumps for CAFE ProteinTrees
  • EMF / Fasta / OrthoXML / PhyloXML dumps for ncRNAtrees + PhyloXML dumps for CAFE ncRNAtrees

ncRNAtrees and homologies (all species)

Classification based on Rfam models (v11.0)
Multiple sequence alignments with Infernal
Phylogenetic reconstruction using RAxML
Phylogenetic reconstruction using FastTree2 and RAxML-Light for very big families
Additional multiple sequence alignments with Prank (w/ genomic flanks)
Additional phylogenetic reconstruction using PhyML and NJ
Phylogenetic tree merging using TreeBeST
Per family gene dynamics using CAFE
Homology inference
Secondary structure plots

ProteinTrees and homologies (all species)

GeneTrees (protein-coding) with new/updated genebuilds and assemblies

 -- all-vs-all blastp (ncbi-blast-2.2.30+)
 -- Clustering using hcluster_sg
 -- Multiple sequence alignments using MCoffee (Version_9.03.r1318) or Mafft (mafft-7.221)
 -- Phylogenetic reconstruction using TreeBeST
 -- Homology inference
 -- Pairwise gene-based dN/dS scores for high coverage species pairs only (both on orthologues and paralogues) (codeml/PAML v4.3)
 -- GeneTree stable ID mapping
 -- Per family gene dynamics using CAFE (v2.2)

Rename some misnamed MLSSs

The LastZ MLSSs should include the reference species in their name, e.g. "H.sap-P.anu lastz-net (on H.sap)" instead of "H.sap-P.anu lastz-net".
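The naming convention can be sketched in a few lines of Python. The helper below is hypothetical (it is not part of the Compara API) and simply appends the reference species to a name that lacks it, following the example given above.

```python
def add_reference_suffix(mlss_name: str, reference: str) -> str:
    """Append ' (on <reference>)' to an MLSS name unless it is already there.

    Hypothetical helper for illustration only.
    """
    suffix = f" (on {reference})"
    if mlss_name.endswith(suffix):
        return mlss_name  # already follows the convention
    return mlss_name + suffix


renamed = add_reference_suffix("H.sap-P.anu lastz-net", "H.sap")
```

Calling the helper a second time on its own output leaves the name unchanged.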

Make the consensus description of Families more readable

Many families have strings like "ECO: |RULEBASE:" or "ECO: |EMBL: 1" in their description. These probably come from new tags that UniProt has added to the descriptions of the proteins themselves. We need to remove those strings.
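A clean-up of this kind could look roughly like the Python sketch below. The regular expression only covers the two shapes quoted above (an "ECO:" tag, a pipe, a source name and an optional number); the full grammar of the UniProt tags is not described here, so treat both the pattern and the function name as assumptions.

```python
import re

# Matches e.g. "ECO: |RULEBASE:" and "ECO: |EMBL: 1".
# Assumption: tags always look like ECO: |<SOURCE>: with an optional number.
_ECO_TAG = re.compile(r"ECO:\s*\|\s*[A-Za-z]+:(?:\s*\d+\b)?")


def clean_family_description(description: str) -> str:
    """Remove ECO-style tags and collapse the whitespace left behind."""
    without_tags = _ECO_TAG.sub(" ", description)
    return " ".join(without_tags.split())


# The descriptions below are made-up examples, not real family descriptions.
cleaned_rulebase = clean_family_description("PROTEIN KINASE ECO: |RULEBASE: DOMAIN")
cleaned_embl = clean_family_description("ZINC FINGER ECO: |EMBL: 1")
```

Descriptions that contain no such tag pass through unchanged, apart from whitespace normalisation.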

API: removal of deprecated methods (all species)

These methods were deprecated and scheduled for deletion:

  • NCBITaxon::ensembl_alias()
  • NCBITaxon::short_name()

Core

Ensembl VM Build (all species)

The Ensembl Virtual Machine appliance will be updated to version 83.

LRG Import (Human)

Import of the latest version of the Locus Reference Genomic (LRG) dataset.

patch_82_83_a.sql - schema_version update (all species)

Update schema_version in meta table to 83.

patch_82_83a.sql - schema_version update in production db (all species)

Update schema_version in production database to 83.

Stable ID lookup (all species)

Stable ID lookup is provided for REST services, including lookup for RefSeq and CCDS entries.

External database references update (multiple species)

Xrefs update for:

mus_musculus (mouse), homo_sapiens (human), rattus_norvegicus (rat), callithrix_jacchus (marmoset), pongo_abelii (orangutan), cavia_porcellus (guinea pig), xiphophorus_maculatus (platyfish), oreochromis_niloticus (tilapia), sus_scrofa (pig), mustela_putorius_furo (ferret), nomascus_leucogenys (gibbon), lepisosteus_oculatus (spotted gar), astyanax_mexicanus (cave fish), takifugu_rubripes (fugu), xenopus_tropicalis (xenopus), sarcophilus_harrisii (Tasmanian devil)

Added start/end parameters for sequence endpoint (REST) (all species)

Allows a sequence to be restricted to the specified start and end coordinates.
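As a hedged illustration, the Python sketch below builds a request URL for the sequence endpoint with `start` and `end` parameters. It only constructs the URL and makes no network call. The server address, the `content-type` query parameter and the `build_sequence_url` helper are assumptions for the example; please check the REST documentation for the exact meaning of the coordinates before relying on it.

```python
from urllib.parse import quote, urlencode

# Assumption: public REST server address.
REST_SERVER = "https://rest.ensembl.org"


def build_sequence_url(stable_id: str, start: int, end: int) -> str:
    """Build a sequence/id URL restricted to the given start and end."""
    if start > end:
        raise ValueError("start must not be greater than end")
    query = urlencode({
        "start": start,
        "end": end,
        "content-type": "text/plain",  # assumption: plain-text output
    })
    return f"{REST_SERVER}/sequence/id/{quote(stable_id)}?{query}"


url = build_sequence_url("ENSG00000072501", start=10, end=50)
```

The resulting URL can be passed to any HTTP client, for example `urllib.request.urlopen(url)`.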

RDF dump (all species)

RDF dumps for all species.

Regulation

TarBase v7.0 (Human, Mouse)

Updated TarBase to the current v7.0.

Genebuild

Human: updated cDNA alignments (Human)

A new cDNA database was created for e83: the latest set of cDNAs for human (as of October 2015) from the European Nucleotide Archive and NCBI RefSeq (release 70) was aligned to the current genome using Exonerate.

Mouse: updated cDNA alignments (Mouse)

A new cDNA database was created for e83: the latest set of cDNAs for mouse (as of October 2015) from the European Nucleotide Archive and NCBI RefSeq was aligned to the current genome using Exonerate.

Updated mouse otherfeatures db: New CCDS import (Mouse)

This release of the mouse gene set also includes 24,831 transcript models as part of an updated version (September 2015) of CCDS.

Human: GRCh38.p5 Karyotype Bands (Human)

Karyotype bands were updated in regions overlapping patches.

Updated human otherfeatures db: New CCDS import (Human)

This release of the human gene set also includes 31,357 transcript models as part of an updated version (September 2015) of CCDS.

Human: RefSeq-to-Ensembl model comparison (Human)

For each refseq_import transcript model present in the human otherfeatures db, a comparison is carried out with all overlapping Ensembl transcript models from the core db.

Initially the models are compared on the whole transcript level: all exons are compared in terms of genomic coordinates, and the transcript sequences of the two models are also compared.

For non-coding models, if both of these comparisons match then the models are considered to match on the whole transcript level and the RefSeq model is given an attribute to say there is a match on the whole transcript level. If no overlapping Ensembl model meets the criteria the RefSeq model is given a transcript attribute to denote this.

For models where a CDS is defined there is an extra level of comparison. The coding exon coordinates, CDS and translation sequences of both models are also compared.

If all exon coordinates (coding and non-coding) and the transcript, CDS and translation sequences match, then the RefSeq model is given an attribute to say there is a match on the whole transcript level.

Failing this, a comparison is done on the coding exon coordinates, CDS and translation sequences only. If these comparisons match, the RefSeq transcript is given an attribute to denote that there is a match on the CDS level only.

If there are still no matching Ensembl transcripts at this point the RefSeq transcript is given an attribute to denote that there is no matching Ensembl model.

All matching Ensembl models have their stable ids listed in the value field of the corresponding transcript attribute for the RefSeq model.
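The decision procedure described above can be sketched in Python as follows. This is an illustration of the logic as written in these notes, not the genebuild code: the `Model` class, the attribute labels and the stable IDs in the example are all hypothetical. The text does not say how a non-coding RefSeq model is compared with a coding Ensembl model, so the sketch assumes only exons and transcript sequence are checked in that case.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Exons = Tuple[Tuple[int, int], ...]  # (start, end) genomic coordinates


@dataclass(frozen=True)
class Model:
    """Hypothetical transcript model used only for this sketch."""
    stable_id: str
    exons: Exons
    transcript_seq: str
    coding_exons: Optional[Exons] = None  # None when no CDS is defined
    cds_seq: Optional[str] = None
    translation_seq: Optional[str] = None


def _whole_match(a: Model, b: Model) -> bool:
    return a.exons == b.exons and a.transcript_seq == b.transcript_seq


def _cds_match(a: Model, b: Model) -> bool:
    return (a.coding_exons is not None
            and a.coding_exons == b.coding_exons
            and a.cds_seq == b.cds_seq
            and a.translation_seq == b.translation_seq)


def compare(refseq: Model, overlapping: List[Model]) -> Tuple[str, List[str]]:
    """Return (match level, stable IDs of the matching Ensembl models)."""
    coding = refseq.coding_exons is not None
    if coding:
        whole = [m.stable_id for m in overlapping
                 if _whole_match(refseq, m) and _cds_match(refseq, m)]
    else:
        whole = [m.stable_id for m in overlapping if _whole_match(refseq, m)]
    if whole:
        return "whole_transcript", whole
    if coding:
        cds_only = [m.stable_id for m in overlapping if _cds_match(refseq, m)]
        if cds_only:
            return "cds_only", cds_only
    return "no_match", []


refseq = Model("refseq_x", ((1, 10), (20, 30)), "ACGT", ((5, 10), (20, 25)), "ATG", "M")
same = Model("ens_a", ((1, 10), (20, 30)), "ACGT", ((5, 10), (20, 25)), "ATG", "M")
utr_differs = Model("ens_b", ((1, 12), (20, 30)), "ACGTT", ((5, 10), (20, 25)), "ATG", "M")
result = compare(refseq, [same, utr_differs])
```

In the example, `ens_a` matches at the whole transcript level, so the CDS-only match with `ens_b` is not reported; it would be reported if `ens_a` were absent.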

Human: transcript attributes for RefSeq-genomic-to-mRNA comparison (Human)

Transcript attributes will be added for the refseq_import geneset in the human otherfeatures db. Each refseq_import transcript will have an attribute to denote whether the genomic sequence that the transcript covers matches the mRNA sequence that the transcript is based on (the sequences present in the RefSeq mRNA file).

A perfect match is defined as an alignment across the entirety of both sequences that contains no mismatches or indels. If initially there is a mismatch, the RefSeq mRNA will go through polyA clipping and the sequences will be compared again to see if a perfect match is possible after polyA clipping.

Transcripts that do not have a perfect match between the mRNA and the genomic sequence will get additional attributes to define which regions (5' UTR, CDS, 3' UTR, or 'whole transcript' if there is no CDS defined) do not align perfectly, along with a summary of the information in the alignment (match, mismatch, indel count, total indel length).
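The two-step comparison (direct comparison, then a second attempt after polyA clipping) can be sketched in Python as below. This is a simplification under stated assumptions: a perfect alignment is reduced to string equality, and polyA clipping is modelled as dropping a trailing run of A bases from the mRNA. The clipping procedure actually used is not described here, and the function name and return labels are hypothetical.

```python
def compare_genomic_to_mrna(genomic: str, mrna: str) -> str:
    """Classify how a transcript's genomic sequence matches its mRNA.

    Returns one of:
      "perfect"                  identical sequences
      "perfect_after_polya_clip" identical once a trailing polyA run is
                                 removed from the mRNA
      "imperfect"                anything else
    """
    genomic = genomic.upper()
    mrna = mrna.upper()

    if genomic == mrna:
        return "perfect"

    # Assumption: the polyA tail is whatever the mRNA has beyond the genomic
    # sequence, and it must consist of A bases only.
    tail = mrna[len(genomic):]
    if mrna.startswith(genomic) and tail and set(tail) == {"A"}:
        return "perfect_after_polya_clip"

    return "imperfect"


direct = compare_genomic_to_mrna("ACGTA", "ACGTA")
clipped = compare_genomic_to_mrna("ACGTA", "ACGTAAAAA")
mismatch = compare_genomic_to_mrna("ACGT", "ACCT")
```

A fuller version would also report which regions fail to align and the match, mismatch and indel counts, which requires a real alignment rather than string comparison.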

Human: updated RefSeq gene import (Human)

The imported RefSeq gene set was updated in the human otherfeatures database. Please note that RefSeq annotates gene models on cDNA sequence and not on the reference genome, so when RefSeq transcripts are translated from the reference genome, the translations may contain stop codons.

Mouse: updated RefSeq gene import (Mouse)

The imported RefSeq gene set was updated in the mouse otherfeatures database. Please note that RefSeq annotates gene models on cDNA sequence and not on the reference genome, so when RefSeq transcripts are translated from the reference genome, the translations may contain stop codons.

Production

EMBL and GenBank Dumps (all species)

EMBL and GenBank dumps for all species.

Ensembl 83 mart databases (all species)

  • Ensembl Genes 83
    • Retirement of the "flagged variants" (http://www.ensembl.org/Help/Glossary?id=533)
    • Renamed "Phase" attribute to "start phase" and added new "end phase" attribute to structure and sequence sections.
    • Human assembly updated from GRCh38.p3 to GRCh38.p5
  • Ensembl Variation 83
    • Retirement of the genotype and strain data. This will result in the removal of the "Strain Variants", "Strain - Other Variants (Indels, Multiple Nucleotide Polymorphisms)" and "Compare Strain genotypes" attribute sections.
    • Retirement of the "flagged variants" (http://www.ensembl.org/Help/Glossary?id=533)
    • Human assembly updated from GRCh38.p3 to GRCh38.p5
  • Ensembl Regulation 83
  • Vega 63
    • Renamed "Phase" attribute to "start phase" and added new "end phase" attribute to structure and sequence sections.
    • Human assembly updated from GRCh38.p3 to GRCh38.p5

External reference projection (all species)

Gene ontology (GO) identifiers and gene name projection to all species.

FASTA & GTF dumps (all species)

FASTA & GTF dumps for all species.

Variation

Phenotype data updates (multiple species)

  • Human phenotype data has been updated from different sources including NHGRI-EBI GWAS, OMIM, ClinVar, UniProt, Orphanet and GOA.
  • New Human phenotype association data: Cancer Gene Census
  • Other GOA data for Cow, Dog, Sheep, Zebrafish, Chicken, Macaque, Turkey
  • OMIA data for Cow, Dog, Horse, Sheep
  • RGD data for Rat
  • AnimalQTL for Cow, Horse, Chicken, Pig, Sheep
  • IMPC data for Mouse

Structural variants (multiple species)

  • Added new studies from DGVa
  • Updated some of the existing studies from DGVa

Move VEP_plugins git repository to Ensembl organisation (all species)

The VEP_plugins repo (https://github.com/ensembl-variation/VEP_plugins) was created before Ensembl had a git organisation.

The repo has been moved to the Ensembl organisation to aid discovery. Old links should continue to work.

COSMIC data update (Human)

The cancer data from COSMIC version 74 has been imported.

This import excludes the COSMIC alleles, populations and mutation types.

dbSNP 145 rsIDs mapping (Gibbon)

The gibbon database has been updated to include the rsIDs from dbSNP 145 for the variants submitted by Ensembl.

ExAC data (Human)

  • New "ExAC" variation set available
  • New "ExAC" track available on the website
  • New evidence type of "ExAC" added

Web

Improved image export (all species)

As part of our ongoing upgrade of the website export functions, we have a new Image Export "wizard" in place of the old popup menu. This can still be found via the 'picture' icon in the blue bar above images, but it now opens a window with a form where you can choose from a selection of preset options or configure a custom output.

[screenshot]

Note that we have removed GFF export from this menu, in preparation for an upcoming revamp of the more general feature-and-sequence export. You can still export genes and a few other tracks from most images by clicking the 'Export data' button in the lefthand menu.

Retirement of archive 69 (all species)

This release cycle we will be retiring archive 69 (October 2012) in accordance with our three-year rolling retirement policy. The data will remain available on our public database server; only the web interface will be removed.

Ontology view moved (all species)

The Ontology view has been moved from the transcript panel to the gene panel, with tables for the different ontologies (Biological Process, Molecular Function etc.) being shown in separate views. The tables have an additional column showing the stable IDs of the transcript(s) to which the term has been mapped. An example is:

http://www.ensembl.org/Homo_sapiens/Gene/Ontologies/biological_process?db=core;g=ENSG00000072501;oid=1;r=X:53374149-53422728

Mobile site updates (all species)

The following updates to the mobile site have been released:

  • Improved behaviour when being redirected from www.ensembl.org to m.ensembl.org, for example informative messaging if the requested view is not available
  • Higher resolution share icon
  • Bug fixes: (i) added ability to close the "Switch to mobile site" banner; (ii) logging in no longer redirects you to the desktop site

Vector image export for pairwise interaction data (all species)

Drawing of arcs has now been implemented in our PDF and SVG drawing code, which means that these data can now be displayed in all export formats.

Future Plans

Read about our future plans on our blog!