News for Ensembl Release 73 (September 2013)

Compara

Pairwise alignments (all species)

Lastz alignments:

  • Flycatcher vs Human
  • Flycatcher vs Chicken
  • Duck vs Human
  • Duck vs Chicken

Lastz patch alignments:

  • human_ref vs human_patches
  • human haplotype alignments for high coverage
  • remove DELETED or UPDATED pairwise alignment patches from the release database

ProteinTrees and homologies (all species)

 

GeneTrees (protein-coding) with new/updated genebuilds and assemblies

  • all-vs-all blastp (ncbi-blast-2.2.27+)
  • Clustering using hcluster_sg
  • Multiple sequence alignments using MCoffee (Version_9.03.r1318) or Mafft (mafft-7.017)
  • Phylogenetic reconstruction using TreeBeST
  • Homology inference
  • Pairwise gene-based dN/dS scores for high coverage species pairs only (both on orthologues and paralogues) (codeml/PAML v4.3)
  • GeneTree stable ID mapping
  • Per family gene dynamics using CAFE (v2.2)

ncRNAtrees and homologies (all species)

  • Classification based on Rfam models (v11.0)
  • Multiple sequence alignments with Infernal
  • Phylogenetic reconstruction using RAxML
  • Phylogenetic reconstruction using FastTree2 and RAxML-Light for very big families
  • Additional multiple sequence alignments with Prank (w/ genomic flanks)
  • Additional phylogenetic reconstruction using PhyML and NJ
  • Phylogenetic tree merging using TreeBeST
  • Per family gene dynamics using CAFE (v2.2)
  • Homology inference
  • Secondary structure plots

Protein Families (all species)

Updated MCL families including all Ensembl transcript isoforms (including human non-reference haplotypes) and the latest UniProt Metazoa.

  • Getting distances by NCBI BlastP (v.2.2.27+)
  • Clustering by MCL (v.12-135)
  • Multiple Sequence Alignments with MAFFT (v.7.017)
  • Family stable ID mapping

Compara dumps (all species)

 

  • EMF dumps for ProteinTrees
  • EMF dumps for ncRNAtrees
  • OrthoXML dumps for ProteinTrees
  • OrthoXML dumps for ncRNAtrees
  • PhyloXML dumps for ProteinTrees
  • PhyloXML dumps for ncRNAtrees

API/schema changes (all species)

  • Protein-tree pipeline: switch to NCBI-blast

  • API: drop support for deprecated methods in: MethodLinkSpeciesSet, GeneTreeNode, AlignedMemberAdaptor, SequenceAdaptor, GenomeDB

  • API: remove deprecated objects / adaptors: ProteinTreeAdaptor, NCTreeAdaptor, Subset(Adaptor)

Core

LRG Import (all species)

Import of the latest version of the Locus Reference Genomic (LRG) dataset.

Stable ID cleanup (all species)

Removal of confusing stable id history to provide a more useful lookup

Stable ID lookup (all species)

Stable ID lookup provided for REST services

Misc-scripts cleanup (all species)

A number of files and directories have been removed from misc-scripts in the ensembl checkout.

Some have been moved to ensembl-production; others are deprecated and no longer maintained.

patch 72_73_a.sql - schema version update for ontology (all species)

Update schema_version in ontology meta table to 73.

External database references update (multiple species)

Xref updates for

  • homo sapiens
  • danio rerio
  • pan troglodytes
  • otolemur garnettii
  • gadus morhua
  • dasypus novemcinctus
  • taeniopygia guttata
  • ictidomys tridecemlineatus
  • pelodiscus sinensis
  • canis familiaris
  • sus scrofa

Ensembl VM Build (all species)

The Ensembl Virtual Machine appliance will be updated to version 73.

patch 72_73_a.sql - schema version update (all species)

Update schema_version in meta table to 73.

Object_xref enum expansion (all species)

Object_xref ensembl_object_type expanded to include marker

patch 72_73a.sql - schema version update for production (all species)

Update schema_version in production database meta table to 73.

Genebuild

Update to Ensembl-Havana GENCODE gene set (release 18) (Human)

Updated Ensembl-Havana gene set (GENCODE release 18). This gene set is a merge of complete Ensembl gene models and the latest Havana gene annotation. All CCDS genes are included in this gene set.

The human GRCh37.p12 gene annotation is also included:

The patches for GRCh37.p12 were annotated using a combination of manual annotation, annotation projected from the primary assembly and annotation derived from cDNA and protein alignment evidence. Annotation of the patches is stored in the core database.

Human: assembly updated to GRCh37.p12 (Human)

The human genome assembly was updated to GRCh37.p12 and the assembly information in all human databases has been altered accordingly. This minor assembly update contains 194 assembly patches. The DNA sequence for the primary assembly (chromosomes 1-22, X, Y, unlocalized scaffolds and unplaced scaffolds) remains unchanged.

Havana merge for Zebrafish (Zebrafish)

An updated set of Zebrafish genes will be released. There hasn't been a new Ensembl genebuild but the Havana annotations have been updated and we therefore re-ran the merge of the two gene sets.

Human: updated RefSeq gene import (Human)

The imported RefSeq gene set was updated in the human otherfeatures database. Please note that RefSeq annotates gene models on cDNA sequence and not on the reference genome, so when users translate the RefSeq transcripts from the reference genome sequence, the translations may contain stop codons.

Duck genebuild (Duck)

Duck is a new species in Ensembl as of release 73. Here we provide gene annotation on the genome assembly BGI_duck_1.0. This assembly and annotation have been available through our Pre! site for some time and are now available through our main site.

Mouse: updated RefSeq gene import (Mouse)

The imported RefSeq gene set was updated in the mouse otherfeatures database. Please note that RefSeq annotates gene models on cDNA sequence and not on the reference genome, so when users translate the RefSeq transcripts from the reference genome sequence, the translations may contain stop codons.

Human: GRCh37.p12 Karyotype Bands (Human)

Karyotype bands were updated in regions overlapping patches

Updated human otherfeatures db: New CCDS import (Human)

This release of the human gene set also includes 27,747 transcript models as part of an updated version (Jul 2013) of CCDS

Flycatcher new assembly and genebuild (Flycatcher)

The flycatcher assembly FicAlb_1.4 was released. We have produced new gene annotation on this assembly.

Flycatcher Otherfeatures database (Flycatcher)

The longest Ensembl translations from chicken and zebra finch have been aligned to the flycatcher assembly FicAlb_1.4 to generate gene models, which are available through the website and the otherfeatures database.

Flycatcher RNASeq database and Bam files (Flycatcher)

In addition to the gene annotation for FicAlb_1.4, an rnaseq database will be released where users can view BAM files and transcript models for a number of tissues including embryo, liver, heart, kidney and brain.

Updated mouse otherfeatures db: New CCDS import (Mouse)

This release of the mouse gene set also includes 22,983 transcript models as part of an updated version (Jul 2013) of CCDS

Mouse: updated cDNA alignments (Mouse)

A new cdna database was created for release 73: the latest set of cDNAs for mouse (as of July 2013) from the European Nucleotide Archive and NCBI RefSeq (release 59) was aligned to the current genome using Exonerate.

Human: updated cDNA alignments (Human)

A new cdna database was created for release 73: the latest set of cDNAs for human (as of July 2013) from the European Nucleotide Archive and NCBI RefSeq (release 56) was aligned to the current genome using Exonerate.

Rabbit gene set updated using RNAseq data (Rabbit)

RNAseq data has been used to update the protein-coding gene set.

Vega Human annotation updated (Human)

Manual annotation of human from Havana has been updated and contains the data released in Vega release 53.

Vega Zebrafish annotation updated (all species)

Manual annotation of zebrafish from Havana has been updated and contains the data released in Vega 53.

Rabbit RNASeq database and Bam files (Rabbit)

In addition to the new core DB for Rabbit, an rnaseq database will be released where users can view BAM files and transcript models for a number of tissues including testis, liver, heart, kidney and brain.

Fixed broken RNASeq FTP links from the Location view (multiple species)

Fixed the broken FTP links in the analysis descriptions for RNASeq alignments.

Production

Ensembl 73 mart databases (all species)

  • Ensembl Genes 73
    • Updated Human assembly to GRCh37.p12
    • Added new species Duck (Anas platyrhynchos) and Flycatcher (Ficedula albicollis)
    • Added new "QTL chromosome name" and "QTL region" filters in the region section for Rat, Cow and Pig.
    • Added "EntrezGene transcript name" in the filter and attribute sections for several species
  • Ensembl Variation 73
    • Added new "QTL chromosome name" and "QTL region" filters in the region section for Rat, Cow and Pig.
    • Updated the human somatic variation database to COSMIC 65 data
    • Added a new "year" attribute in the Variation citation section
    • Removed "phenotype name" from the attribute section
  • Vega 53
    • Updated Human assembly to GRCh37.p12
  • Ensembl Regulation 73

Splicing events (all species)

The ASTD project computationally predicted genes for each release, in a similar way to Ensembl, with a focus on alternative mRNA structures (splicing events, poly(A) sites, TSS) and features (ppt, exon-exon junction types).

Since 2010, the storage and display of this information has been an integral part of Ensembl for the following species:

  • Homo sapiens
  • Mus musculus
  • Rattus norvegicus
  • Danio rerio
  • Caenorhabditis elegans
  • Drosophila melanogaster

These data have been updated for this release for the species listed above.

FASTA & GTF dumps (all species)

FASTA & GTF dumps for all the species

External reference projection (all species)

Gene ontology (GO) identifiers and gene name projection to all species.

EMBL and Genbank Dumps (all species)

EMBL and Genbank dumps for all species.

New ensembl-production directory in CVS (all species)

A new ensembl-production directory has been added to CVS and will be included in the release 73 branch. It mainly contains production pipeline code for internal use that has been moved from the misc-scripts directory in the ensembl-core checkout.

Variation

New dbSNP imports (Zebrafish, Chicken, Rat, Pig)

dbSNP Build 138 data will be imported

Import of genotyping chip assay lists (Cow, Horse, Chicken)

Variant lists from the Affymetrix Axiom Chicken Genotyping Array and the Illumina EquineSNP50, BovineHD, BovineLD and BovineSNP50 arrays will be imported and made available as tracks in the browser and variation_sets for API access.

Add new value to evidence classification (Human)

A new evidence value 'ESP' will be added to our current list of classifications for summarising the data supporting a variant. The new evidence value indicates that the variant was discovered in the NHLBI GO Exome Sequencing Project.

Schema changes (Human)

  • Add column phased_gt to the genotype tables to indicate that data is phased
  • Add column year to the publication table to store the year of publication

Update ESP data (Human)

Data from the NHLBI GO Exome Sequencing Project (ESP) have been updated to EVS-v.0.0.20.

Import HGMD-PUBLIC (Human)

Import the HGMD-PUBLIC data from release 2013.2, including regulatory data.

Import COSMIC variants (Human)

Import COSMIC version 65.

Structural variations (Human, Mouse)

  • Update existing studies
  • Import new studies

Odds ratio data (Human)

We add odds ratio data from the NHGRI GWAS catalog.

PhenCode import (Human)

Variant data from the PhenCode project will be imported and presented as a new variation_set and track

Mouse phenotype data (Mouse)

We add phenotype data for mouse from EuroPhenome, International Mouse Phenotyping Consortium and WTSI Mouse Genetics Project.

dbGaP phenotype data (Human)

Phenotype data from dbGaP associated with dbSNP variants have been added.

Web

New search engine (all species)

In release 73 we will be switching over to using the Solr search engine, which builds on our existing Lucene search as follows:

  • Faceted searching: restrict an existing search by species or category
  • Google-style search listings
  • Highlighting of your search term in results
  • Suggestions of similar terms (in case you mistyped a word)
  • Autocomplete for "real words" (e.g. enzyme names)
  • Preview of top result

Additional enhancements may be introduced in future releases.

Alternative display styles for assembly exceptions (all species)

In order to accommodate the large number of assembly exceptions in Human, we have created alternative display styles for this track on the overview image of the whole chromosome on Location pages.

By default, all exceptions of the same type (e.g. MHC regions) will be collapsed into a single feature, but you will be able to revert to the expanded view using the same controls as for other tracks, i.e. the "gear" icon on the blue toolbar, or the popup menu on the track name at the left-hand side of the image.

Database ensembl_website has been split into two (all species)

For better maintenance of our archive system, some of the tables from the database ensembl_website_73 have been moved into a new database, ensembl_archive_73, and the corresponding webcode updated. This change will allow us in future to connect all archive sites via a symlink to a single database with up-to-date information on which other archives are up and running, so that we can link between them.

The new database will be needed if you are setting up a local mirror of Ensembl and wish to link to our archives, but is not otherwise required.

New "Quick Guide" pages (all species)

For release 73 we have added more pages that give short tips on how to use some of Ensembl's most popular features. These pages can be accessed via boxes on the main home page or can be found in the relevant section of 'Help and Documentation'.

New topics include:

Change to EnsEMBL::Web::Configuration child modules (all species)

If you use custom versions of EnsEMBL::Web::Configuration::<Object>.pm in your plugins to alter the available components, please be aware that we have made the key "summary" a reserved word*. You will therefore need to change your code as per the following example:

  # Change components of the existing Summary node (old form, key "summary")
  my $node = $self->get_node('Summary');
  $node->set('components', [qw(
    summary      EnsEMBL::Web::Component::Gene::GeneSummary
    transcripts  EnsEMBL::Web::Component::Gene::TranscriptsImage
  )]);

to

  # Change components of the existing Summary node (new form, key "gene_summary")
  my $node = $self->get_node('Summary');
  $node->set('components', [qw(
    gene_summary  EnsEMBL::Web::Component::Gene::GeneSummary
    transcripts   EnsEMBL::Web::Component::Gene::TranscriptsImage
  )]);

* This was needed in order to configure the chromosome image on Location pages
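
If a plugin overrides several nodes, the same rename could be applied programmatically rather than by hand. The following is a minimal sketch only: `rename_reserved_component_key` is a hypothetical helper and not part of the Ensembl webcode, and it assumes the components list is the flat key/module list used in the example above.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption for illustration, not Ensembl API):
# takes the flat key/module list passed to $node->set('components', [...])
# and returns a copy in which the reserved key 'summary' is renamed.
sub rename_reserved_component_key {
  my ($components, $new_key) = @_;
  $new_key = 'gene_summary' unless defined $new_key;

  die "components list must have an even number of elements\n"
    if @$components % 2;

  my @renamed;
  # The list alternates key, module, key, module, ...
  for (my $i = 0; $i < @$components; $i += 2) {
    my ($key, $module) = @{$components}[$i, $i + 1];
    $key = $new_key if $key eq 'summary';
    push @renamed, $key, $module;
  }
  return \@renamed;
}

my $old = [qw(
  summary      EnsEMBL::Web::Component::Gene::GeneSummary
  transcripts  EnsEMBL::Web::Component::Gene::TranscriptsImage
)];

my $new = rename_reserved_component_key($old);
print join(' ', @$new), "\n";
```

The returned array reference could then be passed to `$node->set('components', ...)` in place of the hand-edited list. Only keys are compared, so module names that happen to contain "summary" are left untouched.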

Future Plans

Read about our future plans on our blog!