Ensembl News for Release 78 (December 2014)

New web displays and tools

Updated export for homologues and trees (all species)

Phase 3 of our export update includes the remaining comparative genomics views, as follows.

Orthologues and Paralogues

Orthologues and paralogues can now be exported as OrthoXML only, using the blue download button above the summary table.
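OrthoXML is plain XML, so an exported file can be read with standard tooling. Below is a minimal sketch using Python's `xml.etree.ElementTree`. It assumes the OrthoXML 2011 namespace (`http://orthoXML.org/2011/`) and the usual `gene` / `orthologGroup` / `geneRef` layout. The sample document, its identifiers and the helper name are hypothetical, not taken from a real Ensembl export:

```python
import xml.etree.ElementTree as ET

# OrthoXML 2011 namespace (assumed; check the root element of your exported file).
OX = "http://orthoXML.org/2011/"
NS = {"ox": OX}

# Hypothetical minimal OrthoXML document: two genes in one ortholog group.
SAMPLE = """<orthoXML xmlns="http://orthoXML.org/2011/" version="0.3" origin="example" originVersion="1">
  <species name="Homo sapiens" NCBITaxId="9606">
    <database name="example" version="1">
      <genes><gene id="1" protId="HUMAN_PROT_1"/></genes>
    </database>
  </species>
  <species name="Mus musculus" NCBITaxId="10090">
    <database name="example" version="1">
      <genes><gene id="2" protId="MOUSE_PROT_1"/></genes>
    </database>
  </species>
  <groups>
    <orthologGroup id="g1">
      <geneRef id="1"/>
      <geneRef id="2"/>
    </orthologGroup>
  </groups>
</orthoXML>"""


def ortholog_groups(xml_text):
    """Return one list of protein IDs per top-level orthologGroup."""
    root = ET.fromstring(xml_text)
    # Map the document-internal gene id to its protein identifier.
    prot_by_id = {g.get("id"): g.get("protId") for g in root.iter(f"{{{OX}}}gene")}
    groups = []
    for grp in root.find("ox:groups", NS).findall("ox:orthologGroup", NS):
        # iter() also descends into any nested paralogGroup/orthologGroup elements.
        refs = grp.iter(f"{{{OX}}}geneRef")
        groups.append([prot_by_id[r.get("id")] for r in refs])
    return groups


print(ortholog_groups(SAMPLE))
```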

In addition, homologue alignments - both cDNA and protein - are now available in the following formats:

  • CLUSTALW
  • FASTA
  • Mega
  • MSF
  • Nexus
  • OrthoXML
  • Pfam
  • Phylip
  • PSI
  • Stockholm

To find this export, go to an Orthologue or Paralogue page, click on 'Alignment (protein)' or 'Alignment (cDNA)' in the table and then click on the blue button labelled 'Export alignment'.

Gene trees

The pages 'Gene tree (text)' and 'Gene tree (alignment)' have been removed and replaced with a single export interface. Click on the download icon in the blue bar above the gene tree image to open the export panel.

Available formats:

  • CLUSTALW
  • FASTA
  • Mega
  • MSF
  • NHX
  • Newick
  • Nexus
  • OrthoXML
  • PhyloXML
  • Pfam
  • Phylip
  • PSI
  • Stockholm
  • Text tree
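Of these formats, Newick is the simplest to post-process. Below is a minimal sketch that extracts leaf labels from a Newick string in left-to-right order. The function name and example tree are hypothetical. It assumes unquoted labels and no bracketed comments, so NHX annotations and quoted names need a proper tree library:

```python
import re


def newick_leaf_names(newick):
    """Return leaf labels from a Newick string, left to right.

    A leaf label is text that directly follows '(' or ',' and runs up to the
    next ':', ',', ')', ';' or '['. Labels after ')' belong to internal nodes
    and are skipped. Quoted labels and bracketed comments are not handled.
    """
    return [m.group(1).strip() for m in re.finditer(r"[(,]\s*([^(),:;\[]+)", newick)]


tree = "((human:0.1,chimp:0.2)primates:0.05,mouse:0.3);"
print(newick_leaf_names(tree))
```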

Gene gain/loss trees

This new option allows you to export gene gain/loss trees in PhyloXML format.

Protein families

Another new export option is the ability to export all the protein alignments for a family. From the Gene Family page, click on a family stable ID and on the following page you will see a blue 'Export family alignment' button above the table of proteins.

Note: this function is only available for families with two or more genes.

Changes to legacy export interface

The PhyloXML and OrthoXML options have been removed from the list on the 'Export Data' popup panel, as they are no longer supported there. Please export these formats from the relevant pages.

RNASeq track display (all species)

RNASeq tracks can now be displayed as either a "wiggle track" summary (the default) or as individual features; the summary version uses a BigWig file created from the BAM file. This speeds up the display and allows us to show this information over larger regions than is possible using the raw data.
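The sketch below illustrates what the summary track holds. It reduces read alignments to mean depth per fixed-width bin, the kind of pre-computed summary a BigWig file stores. This is not Ensembl's pipeline. The function name, the half-open 0-based intervals and the bin size are assumptions for illustration:

```python
def binned_mean_coverage(reads, region_start, region_end, bin_size):
    """Mean read depth per bin over [region_start, region_end).

    reads: iterable of (start, end) half-open, 0-based alignment intervals.
    The last bin may be shorter than bin_size.
    """
    n_bins = -(-(region_end - region_start) // bin_size)  # ceiling division
    bounds = [
        (region_start + b * bin_size, min(region_start + (b + 1) * bin_size, region_end))
        for b in range(n_bins)
    ]
    covered = [0] * n_bins  # covered bases per bin
    for start, end in reads:
        # Clip the read to the region; skip reads that fall outside it.
        s, e = max(start, region_start), min(end, region_end)
        if s >= e:
            continue
        first, last = (s - region_start) // bin_size, (e - 1 - region_start) // bin_size
        for b in range(first, last + 1):
            lo, hi = bounds[b]
            covered[b] += min(e, hi) - max(s, lo)
    # Mean depth = covered bases / bin width.
    return [covered[b] / (hi - lo) for b, (lo, hi) in enumerate(bounds)]


print(binned_mean_coverage([(0, 10), (5, 15)], 0, 20, 10))
```

A viewer can then draw one value per bin instead of one glyph per read. This is why the summary scales to regions where drawing raw reads would not.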

Improved RTF export (all species)

We have now integrated many of the options from the "Configure This Page" interface, e.g. variation filtering, into the RTF export so that any changes you have made to the page will be reflected in your RTF download. You can also change these options in the export interface just before you download, but these changes will not be copied back to the page itself.

New species, assemblies and genebuilds

Vega Mouse annotation updated (Mouse)

Manual annotation of mouse from Havana has been updated and contains the data released in Vega 58.

Vega pig annotation updated (Pig)

Manual annotation of pig from Havana has been updated and contains the data released in Vega 57.

New alignments

GRC alignments (Human)

GRC alignments between the primary assembly and the alternate loci have been added.

API and schema changes

Variation API changes (all species)

  • Add a new object "Source" handling the source data used in several Variation API objects.

Adjust Unmapped_reason table (all species)

Change unmapped_reason_id from smallint to int to match Regulation requirements.
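The reason for widening the column is range. A MySQL `SMALLINT` occupies 2 bytes and an `INT` 4 bytes, so the largest ID each can hold differs by several orders of magnitude. The sketch below computes both ranges. The release note does not say whether the column is signed or unsigned, so both cases are supported. The helper names are hypothetical:

```python
# Storage size in bytes of MySQL integer column types.
MYSQL_INT_BYTES = {"TINYINT": 1, "SMALLINT": 2, "MEDIUMINT": 3, "INT": 4, "BIGINT": 8}


def int_range(type_name, unsigned=True):
    """Return the (min, max) value a MySQL integer column of this type can hold."""
    bits = 8 * MYSQL_INT_BYTES[type_name]
    if unsigned:
        return 0, 2 ** bits - 1
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def fits(value, type_name, unsigned=True):
    """True if value can be stored without overflowing the column type."""
    lo, hi = int_range(type_name, unsigned)
    return lo <= value <= hi


print(int_range("SMALLINT"), int_range("INT"))
```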

Other updates

Update to Ensembl-Havana mouse merge (Mouse)

Updated Ensembl-Havana mouse gene set. This gene set is a merge of complete Ensembl gene models and the latest Havana gene annotation. All CCDS genes are included in this gene set.

The mouse GRCm38.p3 gene annotation is also included:

The patches for GRCm38.p3 were annotated using a combination of annotation projected from the primary assembly and annotation derived from cDNA and protein alignment evidence. Annotation of the patches is stored in the core database.

Mouse: assembly updated to GRCm38.p3 (Mouse)

The mouse genome assembly was updated to GRCm38.p3 and the assembly information in all mouse databases has been altered accordingly. This minor assembly update contains 17 assembly patches. The DNA sequence for the primary assembly (chromosomes, unlocalized scaffolds and unplaced scaffolds) remains unchanged.

Ensembl 78 mart databases (all species)

  • Ensembl Genes 78
    • Updated mouse assembly to GRCm38.p3
    • Added "Transcript length" in the Features, Structures and Sequences pages of the attribute section
    • Removed dN/dS attributes for species that don't have the data
    • Added "Transcript Support Level (TSL)" to the attribute section
    • Added "GENCODE basic annotation" to the attribute section
    • Added "APPRIS principal isoform annotation" to the attribute section
    • Added "Transcript Type" to the filter section
    • Added QTL data for Horse
    • Renamed "Gene biotype" and "Transcript biotype" to "Gene type" and "Transcript type" in the attribute section
  • Ensembl Variation 78
    • Updated mouse assembly to GRCm38.p3
    • Added "Chromosome position end (bp)" in filter and attribute sections (Short Variation datasets only)
    • Added band filter to all the structural variation datasets
    • Added marker filter to all the structural variation datasets
    • Added default attributes for all the structural variation datasets
    • Renamed the following attributes
      • "1000 genomes global MAF (ALL)" to "1000 genomes global Minor Allele Frequency (ALL)"
      • "1000 genomes global MAC (ALL)" to "1000 genomes global Minor Allele Count (ALL)"
      • "Position on chromosome (bp)" to "Chromosome position start (bp)"
      • "sequence region start (bp)" to "Chromosome position start (bp)" (structural variation datasets only)
      • "sequence region end (bp)" to "Chromosome position end (bp)" (structural variation datasets only)
  • Ensembl Regulation 78
    • Updated mouse assembly to GRCm38.p3
  • Vega 58
    • Updated mouse assembly to GRCm38.p3
    • Added "Transcript Type" to the filter section

Updated mouse otherfeatures db: New CCDS import (Mouse)

This release of the mouse gene set also includes nn,nnn transcript models as part of an updated version (Month 2014) of CCDS

Mouse: GRCm38.p3 Karyotype Bands (Mouse)

Karyotype bands were updated in regions overlapping patches.

Merge species: updated RefSeq gene import (all species)

RefSeq GFF3 annotations have been added to the otherfeatures databases. Please note that RefSeq annotates gene models on cDNA sequence rather than on the reference genome, so translations of RefSeq transcripts taken from the reference genome may contain stop codons. This release also includes immunoglobulin genes, tRNAscan-SE source data and MT genes.
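The stop-codon caveat arises because the genome sequence can differ from the cDNA that RefSeq annotated. A single-base difference can turn a sense codon into a stop. Below is a minimal sketch of the check, using the standard genetic code (NCBI translation table 1). The sequences and function names are hypothetical:

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds):
    """Translate a CDS in frame 0; unknown codons become 'X', a partial last codon is dropped."""
    cds = cds.upper()
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3))


def has_internal_stop(cds):
    """True if a stop ('*') occurs before the final codon."""
    return "*" in translate(cds)[:-1]


# Hypothetical example: the genomic copy carries AAA -> TAA, creating an early stop.
print(translate("ATGAAATAG"), has_internal_stop("ATGTAAAAATAG"))
```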

Mouse: updated cDNA alignments (Mouse)

A new cDNA database was created for e78: the latest set of cDNAs for mouse (as of Month 2014) from the European Nucleotide Archive and NCBI RefSeq (release nn) was aligned to the current genome using Exonerate.

Human: updated cDNA alignments (Human)

A new cDNA database was created for e78: the latest set of cDNAs for human (as of Month 2014) from the European Nucleotide Archive and NCBI RefSeq (release nn) was aligned to the current genome using Exonerate.

Corrected FANTOM 5 mappings (Human)

A reported bug in the mapping of FANTOM5 features onto GRCh38 has been corrected.

Micro-array mapping (all species)

All species that had a genome assembly or transcript update have had the appropriate alignments and xrefs redone.

This includes a correction to the human array xrefs, where some AFFY_ST array xrefs were missing and other array formats had release 76 xrefs.

EMBL and GenBank dumps (all species)

EMBL and GenBank dumps for all species.

External reference projection (all species)

Projection of Gene Ontology (GO) identifiers and gene names to all species.

FASTA & GTF dumps (all species)

FASTA & GTF dumps for all species.

External database references update (all species)

Xrefs update for:

human, mouse, rat, cod, turkey, cat, marmoset, guinea pig, platyfish, tetraodon, orangutan, microbat, Nile tilapia, gibbon, ferret, cave fish, spotted gar.

Stable ID lookup (all species)

Stable ID lookup is provided for REST services.

This includes lookup for RefSeq and CCDS entries.
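Below is a minimal sketch that builds a request for the REST API's `lookup/id` endpoint using Python's standard library. It assumes the current public server `https://rest.ensembl.org` and JSON output. The request is only constructed, not sent, and the helper name is hypothetical:

```python
import urllib.request

REST_SERVER = "https://rest.ensembl.org"  # assumed public REST server


def lookup_request(stable_id):
    """Build (but do not send) a GET request for lookup/id, asking for JSON."""
    url = f"{REST_SERVER}/lookup/id/{stable_id}?content-type=application/json"
    return urllib.request.Request(url, method="GET")


req = lookup_request("ENSG00000157764")
print(req.get_method(), req.full_url)
```

To run the lookup, pass the request to `urllib.request.urlopen` and decode the body with `json.load`. `ENSG00000157764` is the human BRAF gene.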

patch_77_78_b.sql - source column size increase (all species)

New Refseq data requires a larger source column in the gene and transcript tables.

patch_77_78_a.sql - schema_version update in ontology db (all species)

Update schema_version in meta table to 78.
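In practice, a patch like this amounts to one `UPDATE` on the `meta` table. The sketch below reproduces it against an in-memory SQLite database with a simplified `meta` table. The real databases are MySQL with a stricter schema. Recording the applied patch as a `patch` meta entry follows the usual Ensembl convention but should be treated as an assumption, and the helper name is hypothetical:

```python
import sqlite3


def apply_schema_version_patch(conn, new_version, patch_name):
    """Set schema_version in meta and record which patch was applied (assumed convention)."""
    conn.execute(
        "UPDATE meta SET meta_value = ? WHERE meta_key = 'schema_version'",
        (str(new_version),),
    )
    conn.execute(
        "INSERT INTO meta (species_id, meta_key, meta_value) VALUES (NULL, 'patch', ?)",
        (f"{patch_name}|schema_version",),
    )
    conn.commit()


# Simplified stand-in for an Ensembl meta table, starting at schema version 77.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE meta (meta_id INTEGER PRIMARY KEY, species_id INTEGER, "
    "meta_key TEXT NOT NULL, meta_value TEXT NOT NULL)"
)
conn.execute("INSERT INTO meta (species_id, meta_key, meta_value) VALUES (NULL, 'schema_version', '77')")
apply_schema_version_patch(conn, 78, "patch_77_78_a.sql")
print(conn.execute("SELECT meta_key, meta_value FROM meta").fetchall())
```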

patch_77_78a.sql - schema_version update in production db (all species)

Update schema_version in production database to 78.

LRG Import (all species)

Imported the latest version of the Locus Reference Genomic (LRG) dataset.

Import COSMIC variants (Human)

Imported COSMIC version 71 and remapped the data to GRCh38.

Phenotype data updates (Human)

Human phenotype data have been updated from sources including NHGRI GWAS, OMIM, ClinVar, UniProt and Decipher.

Database schema change (all species)

  • Add a column "copy_number" in the table "structural_variation" to store the number of copies for the CNV, at the supporting evidence level.
  • Delete the table "study_variation"
  • Update the index "type_val_idx" in the "attrib" table by extending the indexed size for the "value" column (currently limited to 40 characters).

REST support for multiple sequences (all species)

The sequence endpoint now accepts requests for multiple sequences in a single POST request.
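Below is a sketch of such a request. It assumes the `POST sequence/id` endpoint with a JSON body of the form `{"ids": [...]}` and a `type` parameter (`genomic`, `cdna`, `cds` or `protein`). The server URL and helper name are assumptions, and the request is built but not sent:

```python
import json
import urllib.request


def multi_sequence_request(stable_ids, seq_type="genomic", server="https://rest.ensembl.org"):
    """Build (but do not send) one POST request for several sequences."""
    body = json.dumps({"ids": list(stable_ids)}).encode("utf-8")
    return urllib.request.Request(
        f"{server}/sequence/id?type={seq_type}",
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


req = multi_sequence_request(["ENSG00000157764", "ENSG00000248378"])
print(req.get_method(), req.full_url, req.data)
```

Batching the IDs into one POST saves a round trip per sequence compared with issuing one GET per ID.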

Vervet-AGM Maker gene annotation for Otherfeatures (Vervet-AGM)

A gene set made for Vervet-AGM by WashU using the Maker software package has been added to the database.

C. elegans annotation update (WS245) (Caenorhabditis elegans)

Gene set and other annotations updated to data from WormBase release WS245.

ProteinTrees and homologies (all species)

GeneTrees (protein-coding) with new/updated genebuilds and assemblies

 -- all-vs-all blastp (ncbi-blast-2.2.27+)
 -- Clustering using hcluster_sg
 -- Multiple sequence alignments using MCoffee (Version_9.03.r1318) or Mafft (mafft-7.017)
 -- Phylogenetic reconstruction using TreeBeST
 -- Homology inference
 -- Pairwise gene-based dN/dS scores for high coverage species pairs only (both on orthologues and paralogues) (codeml/PAML v4.3)
 -- GeneTree stable ID mapping
 -- Per family gene dynamics using CAFE (v2.2)

Protein Families (all species)

Updated MCL families including all Ensembl transcript isoforms (including human non-reference haplotypes) and the newest UniProt Metazoa.

 -- Getting distances by NCBI BlastP (v.2.2.28+)
 -- Clustering by MCL (v.12-135)
 -- Multiple Sequence Alignments with MAFFT (v.7.113)
 -- Family stable ID mapping

ncRNAtrees and homologies (all species)

 -- Classification based on Rfam models (v11.0)
 -- Multiple sequence alignments with Infernal
 -- Phylogenetic reconstruction using RAxML
 -- Phylogenetic reconstruction using FastTree2 and RAxML-Light for very big families
 -- Additional multiple sequence alignments with Prank (w/ genomic flanks)
 -- Additional phylogenetic reconstruction using PhyML and NJ
 -- Phylogenetic tree merging using TreeBeST
 -- Per family gene dynamics using CAFE
 -- Homology inference
 -- Secondary structure plots

Compara dumps (all species)

 -- EMF / FASTA / OrthoXML / PhyloXML dumps for ProteinTrees
 -- EMF / FASTA / OrthoXML / PhyloXML dumps for ncRNAtrees
 -- PhyloXML dumps for CAFE ProteinTrees
 -- PhyloXML dumps for CAFE ncRNAtrees

HGMD data update (Human)

Import of the latest release of public HGMD data (version 2014.2 from June 2014) and remapping to GRCh38

Add HumanCoreExome chip (Human)

We have added the HumanCoreExome chip to our database. All variants located on the chip have been added to the HumanCoreExome variant set.

Update data from Animal QTL database (Cow, Horse, Chicken, Pig, Sheep)

We have updated our data from the Animal Quantitative Trait Loci (QTL) Database.

Removal of obsolete tables from ensembl_website (all species)

The following tables have been dropped from the ensembl_website database as they are no longer in use - their contents were moved to either ensembl_archive or ensembl_production many releases ago:

  • ens_release
  • item_species
  • news_category
  • news_item
  • release_species
  • species

In addition, table help_record_link has been dropped as it was never used - the help_link table is used to associate URLs with help_record entries.

The associated queries have also been removed from EnsEMBL::Web::DBSQL::WebsiteAdaptor.

New tables (all species)

We are adding two new tables that will be used in the future to store an HMM classification of proteins.

New synteny analyses (all species)

New synteny data for:

  • Chicken vs Zebrafinch
  • Human vs Orangutan
  • Chicken vs Opossum

New speciestree view (all species)

It is now possible to view the Ensembl species tree with this new interactive view. In previous releases the tree was only available as a PDF download, but you can now view it here:

http://www.ensembl.org/info/about/species.html  (click the link "View the full Ensembl species tree")

You can switch between radial and vertical layouts, and view either the NCBI or the Ensembl tree.

The following browsers support the new view:

IE 9 onwards; Firefox 31 onwards; Chrome 31 onwards; Safari 5.1 onwards

Older browsers will instead show a link to download the tree as a PDF.

patch_77_78_a.sql - schema_version update (all species)

Update schema_version in meta table to 78.

patch_77_78_b.sql - unmapped_reason_id (all species)

Change unmapped_reason_id from smallint to int.

Rendering of BigBed files (all species)

We have updated our BigBed parser to reflect changes in the UCSC bedToBigBed script, specifically the naming of the last field as 'chromStarts' in AutoSQL rather than the 'blockStarts' specified in the online documentation.

This allows us to correctly render features that contain this information, i.e. as a series of joined blocks instead of a single alignment.
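Below is a minimal sketch of the parser change. It reads block starts from `chromStarts` when present, falls back to `blockStarts`, and converts the relative offsets into absolute blocks. The dict-of-strings record layout and the function name are assumptions, not the internals of Ensembl's parser:

```python
def bed_blocks(fields):
    """Return absolute (start, end) blocks for one BED12-style record.

    fields: dict of AutoSQL field name -> string value. Block starts are read
    from 'chromStarts' (the name bedToBigBed emits) or else 'blockStarts'
    (the name used in the BED documentation).
    """
    chrom_start = int(fields["chromStart"])
    starts_raw = fields.get("chromStarts", fields.get("blockStarts"))
    if starts_raw is None or "blockSizes" not in fields:
        # No block information: draw the feature as a single contiguous alignment.
        return [(chrom_start, int(fields["chromEnd"]))]
    # BED lists are comma-separated and often end with a trailing comma.
    sizes = [int(x) for x in fields["blockSizes"].split(",") if x]
    starts = [int(x) for x in starts_raw.split(",") if x]
    # Block starts are offsets relative to chromStart.
    return [(chrom_start + s, chrom_start + s + size) for s, size in zip(starts, sizes)]


record = {"chromStart": "1000", "chromEnd": "1500", "blockSizes": "100,200,", "chromStarts": "0,300,"}
print(bed_blocks(record))
```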

We have also updated our online documentation about the BED format.

Structural variations (Human)

Added new studies from DGVa and updated existing ones.

patch_77_78_c.sql (all species)

Integrated schema changes to support Regulation requirements.

Multiple assembly support for REST sequence endpoints (Human)

For human, sequence is now available for previous assemblies as well as the current one.
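Below is a sketch of building such a request URL. It assumes that the `sequence/region` endpoint's `coord_system_version` parameter selects the assembly; check the REST documentation before relying on it. The helper name and example region are hypothetical:

```python
from urllib.parse import urlencode


def region_sequence_url(region, assembly=None, species="human", server="https://rest.ensembl.org"):
    """Build a sequence/region URL, optionally pinned to an assembly (assumed parameter)."""
    params = {"content-type": "text/plain"}
    if assembly is not None:
        # Assumed parameter name for requesting a non-default assembly.
        params["coord_system_version"] = assembly
    return f"{server}/sequence/region/{species}/{region}?{urlencode(params)}"


print(region_sequence_url("X:1000000..1000100:1", "GRCh37"))
```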

patch_77_78_c.sql - master_unmapped_reason (all species)

The master_unmapped_reason table has been patched to match the datatype used in the core schema.

Retirement of archive 65 (all species)

This release cycle we will be retiring archive 65 (December 2011) in accordance with our three-year rolling retirement policy. The data will remain available on our public database server; only the web interface will be removed.

Pre Species on the species list page (all species)

Species that are available on both Ensembl and Pre now appear in two rows: one shows the species with the Ensembl assembly, and the row with the faded species image shows the Pre assembly.

http://www.ensembl.org/info/about/species.html

Public MySQL server for GRCh37.ensembl.org resources (all species)

The public MySQL server for the GRCh37 databases (ensembldb.ensembl.org port 3337) will contain two full sets of databases - one on the current schema and one on the previous schema.

VEP : quick preview of results (all species)

When you paste compatible data (any format except Pileup) into the VEP input box, you will be presented with a blue 'Quick results for Variant' box. Choosing this generates a quick preview of the consequences, limited to running the first variant in the list against the Ensembl Transcript data set. To view the complete set of results for all variants in your file, you can still submit the job to our Tools queue as usual.

Ensembl VM Build (all species)

The Ensembl Virtual Machine appliance will be updated to version 78.
