News for Platyfish Ensembl Release 80 (May 2015)

News categories

New web displays and tools

New export options for comparative views

As part of our ongoing upgrade of the export interface, Release 80 includes the following new features:

Homologues

Sequence export has been added for both orthologues and paralogues. Unaligned gene sequences can be exported in FASTA format, whilst multiple alignments can be exported in all our usual BioPerl formats (including FASTA); both types can be exported as DNA or amino acids.

Note that at the moment you cannot filter the paralogues by species, as that option is not available within the paralogue page itself.

Gene Trees

Previously, individual nodes of the tree could only be exported in FASTA and Newick formats. These links have been replaced with one that opens the export interface, so you can choose any of the formats available for the full tree export.

OrthoXML filtering

Exports in orthoXML now honour the current page settings, allowing you to export only the homologues you see.

New styles for BigWig files on karyotype

Last release we added the ability to display your BigWig data on whole chromosomes or the entire karyotype, as line graphs or histograms. In release 80 we have added two new track styles for line graphs:

1 - "raw mean", i.e. mean values scaled relative to one another, instead of relative to the maximum value in that region (the default setting)

2 - "mean with whiskers", which combines the line graph with "whiskers" to display the minimum and maximum values within each bin.

To use either style, go to the track in the "Configure this page" screen, and choose from the style dropdown.
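The binning behind these styles can be pictured with a small sketch. It is an illustration only, not the Ensembl drawing code: the function names `bin_summary` and `scale_to_max`, the equal-sized bin layout and the sample values are all assumptions. Each bin is reduced to its mean, minimum and maximum; the minimum and maximum are what the "whiskers" would show, and `scale_to_max` illustrates scaling the means relative to the largest value in the region.

```python
from statistics import mean


def bin_summary(values, n_bins):
    """Split values into n_bins contiguous bins and return
    a (mean, minimum, maximum) tuple for each non-empty bin."""
    if n_bins <= 0:
        raise ValueError("n_bins must be positive")
    size, extra = divmod(len(values), n_bins)
    summaries, start = [], 0
    for i in range(n_bins):
        # The first `extra` bins take one additional value each.
        end = start + size + (1 if i < extra else 0)
        chunk = values[start:end]
        if chunk:
            summaries.append((mean(chunk), min(chunk), max(chunk)))
        start = end
    return summaries


def scale_to_max(means):
    """Scale bin means relative to the largest mean in the region."""
    peak = max(means)
    return [m / peak if peak else 0.0 for m in means]


scores = [1, 3, 2, 8, 4, 4]          # made-up BigWig scores
bins = bin_summary(scores, 3)
print(bins)                           # per-bin mean, min and max
print(scale_to_max([b[0] for b in bins]))
```

Keeping the minimum and maximum alongside the mean is what lets a single track show both the trend and the spread within each bin.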

New userdata track type: long-range interactions

We are very pleased to announce that Ensembl now supports long-range pairwise interaction data, which can be drawn as arcs on Region in Detail. Scores are indicated using a grey-to-black gradient, and labels can be displayed by selecting the appropriate track style from the configuration menu.

Initially we are supporting the two formats developed by WashU for their Epigenomics browser: a simple text-based format which can be pasted into the form or uploaded from your computer, and a richer tabix-indexed format that must be attached via a URL. More information on both formats can be found in our online documentation.

We hope to support more formats in the future, so please let us know which formats you are currently using!

New species, assemblies and genebuilds

Vega Zebrafish annotation updated

Manual annotation of zebrafish from Havana has been updated and contains the data released in Vega 59.

API and schema changes

API: new method to get the multiple alignment of several homologues

The new method is GeneTree::get_alignment_of_homologues($ref_member).

Other updates

Compara

Schema version update

79 -> 80

API: new methods to fetch data from a Gene / Transcript / DnaFrag

To improve the usability of our API, we will add methods to fetch the data directly from the Core objects, without having to create Members and DnaFrags. However, Members and DnaFrags will still be used to represent the data on our side.

D.rer GRCz10 pairwise alignments and syntenies

  • lastz D.rer vs T.rub (on D.rer)
  • lastz D.rer vs T.nig (on D.rer; used to be on T.nig)
  • lastz D.rer vs G.acu (on D.rer; used to be on G.acu)
  • lastz D.rer vs O.lat (on D.rer)
  • lastz D.rer vs O.nil (on D.rer)
  • lastz D.rer vs X.mac (on D.rer)
  • lastz D.rer vs L.ocu (on D.rer)
  • lastz D.rer vs G.mor (on D.rer)
  • lastz D.rer vs A.mex (on D.rer)
  • lastz D.rer vs P.for (on D.rer)
  • lastz D.rer vs G.gal (on G.gal)
  • lastz D.rer vs H.sap (on H.sap)
  • lastz D.rer vs M.mus (on M.mus)
  • lastz D.rer vs X.tro (on D.rer)
  • lastz D.rer vs L.cha (on D.rer)
  • lastz D.rer vs P.mar (on D.rer)
  • lastz D.rer vs C.sav (on D.rer)
  • lastz D.rer vs C.int (on D.rer)

Synteny maps will be generated when both species have their karyotype stored in the database.

R.nor Rnor_v6.0 pairwise alignments and syntenies

  • lastz R.nor vs M.mus (on M.mus) + synteny
  • lastz R.nor vs H.sap (on H.sap) + synteny

D.rer GRCz10 multiple alignments

  • 5-way fish EPO alignments
  • 11-way fish EPO-2X alignments

R.nor Rnor_v6.0 multiple alignments

  • 17-way eutherian EPO alignments
  • 39-way eutherian EPO-2X alignments
  • 23-way amniota MercatorPecan alignments

We will also regenerate the "Age Of Base" human track from the new 17-way EPO MSA.

Core

Ensembl VM Build

The Ensembl Virtual Machine appliance will be updated to version 80.

patch_79_80_a.sql - schema_version update

Update schema_version in meta table to 80.
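As a rough picture of what a schema_version patch does, the sketch below bumps the value in a toy meta table. It is not the patch file itself: it uses an in-memory SQLite database with only two columns, whereas Ensembl databases run on MySQL and the real meta table has more columns. The helper names `bump_schema_version` and `get_schema_version` are hypothetical.

```python
import sqlite3


def bump_schema_version(conn, new_version):
    """Set the schema_version entry in the meta table."""
    conn.execute(
        "UPDATE meta SET meta_value = ? WHERE meta_key = 'schema_version'",
        (str(new_version),),
    )
    conn.commit()


def get_schema_version(conn):
    """Return the schema_version as an int, or None if it is not set."""
    row = conn.execute(
        "SELECT meta_value FROM meta WHERE meta_key = 'schema_version'"
    ).fetchone()
    return int(row[0]) if row else None


# Toy stand-in for a release 79 database (assumption: simplified table).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE meta (meta_key TEXT, meta_value TEXT)")
conn.execute("INSERT INTO meta VALUES ('schema_version', '79')")

bump_schema_version(conn, 80)
print(get_schema_version(conn))
```

Checking the stored value before running release 80 code against a database is a quick way to confirm that the patches have been applied.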

Stable ID lookup

A stable ID lookup is provided for REST services.

It includes lookup for RefSeq and CCDS entries.
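For readers who reach the lookup through the public REST server, a request might look like the sketch below. This rests on assumptions: the note above does not name an endpoint, so the use of `lookup/id` on `https://rest.ensembl.org` is our reading, and the helper names `lookup_url` and `lookup` are hypothetical. The gene ID is the one used in the Expression Atlas example elsewhere in these notes.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

# Assumption: the public Ensembl REST server.
REST_SERVER = "https://rest.ensembl.org"


def lookup_url(stable_id, server=REST_SERVER):
    """Build the URL for a stable ID lookup."""
    return f"{server}/lookup/id/{quote(stable_id, safe='')}"


def lookup(stable_id, server=REST_SERVER, timeout=30):
    """Fetch the lookup record as a dict (needs network access)."""
    request = Request(
        lookup_url(stable_id, server),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


print(lookup_url("ENSG00000174485"))
# record = lookup("ENSG00000174485")  # uncomment to query the server
```

The URL builder is kept separate from the network call so that it can be checked without contacting the server.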

patch_79_80a.sql - schema_version update in production db

Update schema_version in production database to 80.

patch_79_80_a.sql - schema_version update in ontology db

Update schema_version in meta table to 80.

patch_79_80_b.sql

Increase the length of the dbprimary_acc column in the xref table.

patch_79_80_c.sql

Increase the length of the synonym column in the seq_region_synonym table.

patch_79_80_d.sql

Increase the length of the value column in the genome_statistics table.

Production

EMBL and Genbank Dumps

EMBL and Genbank dumps for all species.

Ensembl 80 mart databases

  • Ensembl Genes 80
    • Added Protein domains start and end attributes (protein-based coordinate)
    • Renamed "ENCODE region" filter to "ENCODE Pilot Regions", added a link to the publication (http://www.genome.gov/26525202)
    • Renamed the following UniProt filters and attributes
      • "UniProt/Swissprot ID" to "UniProt/Swissprot Accession"
      • "UniProt/TrEMBL ID" to "UniProt/TrEMBL Accession"
      • "UniProt Genename ID" to "UniProt Gene Name"
      • "Uniprot Genename Transcript Name" to "Uniprot Transcript Name"
    • Updated rat (Rnor_6.0) and zebrafish (GRCz10) assemblies
    • New Phenotype source filter in the "Phenotype" filter section
    • Renamed "APPRIS principal isoform annotation" filter and attributes to "APPRIS annotation"
  • Ensembl Variation 80
    • New Phenotype source filter in the "General variation" filter section
    • Updated rat (Rnor_6.0) and zebrafish (GRCz10) assemblies
  • Ensembl Regulation 80
  • Vega 60
    • Updated rat (Rnor_6.0) and zebrafish (GRCz10) assemblies
    • Added Protein domains start and end attributes (protein-based coordinate)

External reference projection

Gene ontology (GO) identifiers and gene name projection to all species.

FASTA & GTF dumps

FASTA & GTF dumps for all the species

New Ensembl BioMart documentation

We now have brand new Ensembl BioMart documentation. We have re-organised and updated it, and added the following new pages:

  • Combining multiple species datasets
  • biomaRt, Bioconductor R package
  • BioMart Perl API
  • BioMart RESTful access (Perl and wget)

You can find the new documentation on the following page: http://www.ensembl.org/info/data/biomart/index.html
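As a taste of the RESTful access route, the sketch below builds a BioMart XML query and the URL that would submit it. It is a minimal illustration under assumptions, not an excerpt from the documentation: the martservice address, the platyfish dataset name `xmaculatus_gene_ensembl` and the two attribute names are our choices, and the helpers `build_query` and `query_url` are hypothetical.

```python
from urllib.parse import urlencode
from xml.etree import ElementTree as ET

# Assumption: the martservice address on the main Ensembl site.
MART_SERVICE = "http://www.ensembl.org/biomart/martservice"


def build_query(dataset, attributes, filters=None):
    """Return a BioMart XML query string for one dataset."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "0",
        "uniqueRows": "1",
    })
    ds = ET.SubElement(query, "Dataset",
                       {"name": dataset, "interface": "default"})
    for name, value in (filters or {}).items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


def query_url(xml, service=MART_SERVICE):
    """Return a URL that submits the XML query to the mart service."""
    return f"{service}?{urlencode({'query': xml})}"


xml = build_query("xmaculatus_gene_ensembl",
                  ["ensembl_gene_id", "external_gene_name"])
print(query_url(xml))
```

The printed URL can be passed to wget or opened in a browser; nothing is sent to the server by the sketch itself.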

Web

Deprecation of Sanger::Graphics

To improve long-term maintainability of the Perl GD drawing code, in release 79 we moved all necessary functionality from the Sanger::Graphics namespace into EnsEMBL::Draw. All Sanger::Graphics modules have been deprecated as of release 80, and they will be removed in release 82.

This change should only affect developers whose code calls methods directly from Sanger::Graphics instead of inheriting from EnsEMBL::Draw modules.

Note that a few helper modules, including ColourMap, have been moved into EnsEMBL::Draw::Utils.
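Developers who want to check whether their own plugin code is affected could scan it for references to the deprecated namespace. The sketch below is one hypothetical way to do that and is not an Ensembl tool: the function name `find_deprecated_uses` and the sample Perl source (including the package name `My::GlyphSet::example`) are made up for illustration, and the scan is a plain text match that does not skip comments or POD.

```python
import re

# Matches Sanger::Graphics and any module beneath it.
DEPRECATED = re.compile(r"\bSanger::Graphics(?:::\w+)*")


def find_deprecated_uses(source):
    """Return (line_number, module_name) for each Sanger::Graphics
    reference found in a string of Perl source."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        for match in DEPRECATED.finditer(line):
            hits.append((number, match.group(0)))
    return hits


# Made-up plugin code used only to exercise the scan.
sample = """package My::GlyphSet::example;
use base qw(Sanger::Graphics::GlyphSet);
use Sanger::Graphics::ColourMap;
use EnsEMBL::Draw::Utils::ColourMap;
"""

for number, name in find_deprecated_uses(sample):
    print(f"line {number}: {name}")
```

Any line reported this way is a candidate for switching to the equivalent EnsEMBL::Draw module before the old namespace is removed.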

New web dependency - ensembl-io

Starting with release 80, the Ensembl webcode will have an additional dependency: ensembl-io, our Git repository for file-parsing code. This can be checked out from GitHub the same way as our other repos, and is also included in the "web" group of repos used by ensembl-git-tools, so that the command

git ensembl --clone web

will automatically clone ensembl-io in addition to existing web code.

Initially this dependency will only affect variation data, but the plan is to integrate the new parsers into other areas of the website such as user uploads. Deprecation of the old parser modules will be announced in due course.

Gene Expression Atlas Widget

The Gene Expression Atlas widget has been embedded in Ensembl. You can now view where a gene is expressed anatomically (that is, where exactly in the species) and which experiments it is associated with.

www.ensembl.org/Homo_sapiens/Gene/ExpressionAtlas?g=ENSG00000174485;r=15:65658046-65792293 

The code has been added to the widgets plugin.

GlyphSets at risk

The following GlyphSets are unused in the core Ensembl webcode. Third-party integrators should be aware that they are at risk of deletion in a future release. If you have any use for the following GlyphSets, please contact the Ensembl team.

_text, ctcf, fg_wiggle, GlyphSet_feature, histone_modifications, ld2, lsv_variations, missing, P_protdas, P_separator, preliminary, restrict, simple_histogram, tsv_missing, urlfeature, Vrefseqs

Tools (BLAST & BLAT) gap initiation update

Gap initiation and extension penalties have been updated to reflect the correct order, which depends on the matrix (BLOSUM or PAM). The same has been done for BLASTN, where the order depends on the match and mismatch scores.
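To illustrate how gap penalties are tied to the scoring matrix, the sketch below keeps a small lookup of (gap initiation, gap extension) pairs. The values are the NCBI BLAST+ protein defaults as we understand them, not settings taken from the Ensembl tools configuration, and the names `DEFAULT_GAP_COSTS` and `gap_costs` are hypothetical.

```python
# Assumption: NCBI BLAST+ default (initiation, extension) penalties
# for a few protein matrices; not read from Ensembl's own settings.
DEFAULT_GAP_COSTS = {
    "BLOSUM45": (15, 2),
    "BLOSUM62": (11, 1),
    "BLOSUM80": (10, 1),
    "PAM30": (9, 1),
    "PAM70": (10, 1),
}


def gap_costs(matrix):
    """Return the (initiation, extension) pair for a matrix name."""
    try:
        return DEFAULT_GAP_COSTS[matrix.upper()]
    except KeyError:
        raise ValueError(f"no gap costs known for matrix {matrix!r}") from None


for name in ("BLOSUM62", "PAM30"):
    initiation, extension = gap_costs(name)
    print(f"{name}: initiation={initiation}, extension={extension}")
```

Because only certain penalty pairs are valid for each matrix, a form that offers them has to change its choices whenever the matrix selection changes.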

Download the ncRNA secondary-structure as SVG

You can now download the ncRNA secondary structure view as SVG. A link has been added at the top of the view.

Track label improvements for images

Some tracks in images now appear within sections, grouping common tracks within a category.

Each section is identified by a heading underlined in a certain colour, and each track within that section by the same colour being used on its left-hand side.

Also, some tracks now have labels within the image itself, to allow particularly long labels. These in-image labels can be configured on or off via the configuration panel.

Initially these features are primarily targeted at trackhub support, but they will be increasingly used for other tracks as opportunities are identified.