News for Ensembl Release 65 (December 2011)

News categories

Compara

Pairwise alignments (all species)

 

  • [ ] human vs chimpanzee LastZ alignments
  • [ ] human vs bushbaby LastZ alignments
  • [ ] cod vs Danio rerio LastZ alignments
  • [ ] cod vs stickleback LastZ alignments
  • [ ] human vs cod Translated Blat alignments

Multiple alignments (all species)

  • [ ] 6way-primate epo alignments to incorporate new chimpanzee 
  • [ ] 12way-mammal-epo alignments to incorporate new chimpanzee
  • [ ] 19way-amniota-pecan alignments to incorporate new chimpanzee
  • [ ] 35way-mammal low-coverage-epo alignments (new chimpanzee and bushbaby)
  • [ ] 6way-fish-epo alignments (addition of cod) (cancelled)

Syntenies (all species)

[ ] human chimpanzee synteny

ProteinTrees and homologies (all species)

GeneTrees (protein-coding) with new/updated genebuilds and assemblies

  • Clustering using hcluster_sg
  • Multiple sequence alignments using MCoffee or Mafft
  • Phylogenetic reconstruction using TreeBeST
  • Homology inference
  • Pairwise gene-based dN/dS scores for high coverage species pairs only (both on orthologues and paralogues)
  • GeneTree stable ID mapping

ncRNAtrees and homologies (all species)

  • Classification based on Rfam model
  • Multiple sequence alignments with Infernal
  • Phylogenetic reconstruction using RAxML
  • Phylogenetic reconstruction using FastTree2 and RAxML-Light for very big families
  • Additional multiple sequence alignments with Prank (w/ genomic flanks)
  • Additional phylogenetic reconstruction using PhyML and NJ
  • Phylogenetic tree merging using TreeBeST
  • Homology inference

Protein Families (all species)

Updated MCL families including all Ensembl transcript isoforms (including human non-reference haplotypes) and the newest UniProt Metazoa.

  • Clustering by MCL
  • Multiple Sequence Alignments with MAFFT
  • Family stable ID mapping

Compara dumps (all species)

  • [ ] EMF dumps for 19 way PECAN multiple alignments
  • [ ] BED files for 19 way GERP constrained elements
  • [ ] EMF dumps for 12 way EPO multiple alignments
  • [ ] EMF dumps for 6way EPO multiple alignments
  • [ ] EMF dumps for 35 way low-coverage alignments
  • [ ] BED files for 35 way low-coverage alignments
  • [ ] EMF dumps for 6way EPO fish multiple alignments
  • [ ] BED dumps for 6way EPO fish multiple alignments
  • [ ] Data dumps for ProteinTrees
  • [ ] Data dumps for ncRNAtrees
  • [ ] OrthoXML dumps for ProteinTrees
  • [ ] OrthoXML dumps for ncRNAtrees
  • [ ] maybe PhyloXML dumps for ProteinTrees ?
  • [ ] maybe PhyloXML dumps for ncRNAtrees ?
  • [ ] Ancestral sequences for primates

API/schema changes (all species)

[ ] changes in the schema: adding tables for tree properties and node properties to make tag storage and extraction more efficient

API/schema changes (all species)

[ ] left_index and right_index now start from 1 for every tree. This should avoid the deadlocks on lr_index_offset
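As a rough illustration of per-tree nested-set numbering, the sketch below assigns left_index and right_index depth-first, restarting at 1 for each tree. The `Node` class and `index_tree` function are hypothetical names used only for this example and are not part of the Compara API. Presumably, restarting at 1 means no shared offset has to be coordinated between trees.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical tree node carrying nested-set indexes."""
    name: str
    children: List["Node"] = field(default_factory=list)
    left_index: int = 0
    right_index: int = 0


def index_tree(root: Node, start: int = 1) -> int:
    """Assign nested-set indexes depth-first and return the next free index."""
    counter = start
    root.left_index = counter
    counter += 1
    for child in root.children:
        counter = index_tree(child, counter)
    root.right_index = counter
    return counter + 1


# Two independent trees: each one is numbered from 1,
# so neither depends on where the other's numbering ended.
tree_a = Node("a", [Node("a1"), Node("a2")])
tree_b = Node("b", [Node("b1")])
for tree in (tree_a, tree_b):
    index_tree(tree)
```

With this numbering a node's descendants are exactly the nodes whose left_index lies between its own left_index and right_index within the same tree.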

API/schema changes (all species)

[ ] bugfix: Storing several lost_taxon_id for each node

API/schema changes (all species)

[ ] rationalise the use of tags in protein_tree_tag:

API/schema changes (all species)

[ ] DnaFragRegion and SyntenyRegion should not be inheriting from NestedSet

Web (all species)

[ ] Statistics of protein trees and homologies on the web (either static files, or dynamic page + table)

Web (all species)

[ ] OrthoXML export function (for any gene tree)

Web (all species)

[ ] Tree support information (which ones of the 5 initial trees are supporting the current node) (+corresponding field in the schema)

Web (all species)

[ ] Information about lost taxa

Core

Merge of stable_id tables with their object tables (all species)

patch_64_65_b.sql

Schema patch to merge stable_id tables with exon, gene, operon, operon_transcript, translation and transcript tables

Changes to the core API, stable_id mapping scripts, xref_mapping, test modules.
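The effect of such a merge can be sketched on a toy in-memory SQLite database. This is an illustration only, not the contents of patch_64_65_b.sql: the reduced column set, the example identifiers and the use of SQLite rather than MySQL are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Toy 'before' layout: stable IDs live in a separate table.
CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, biotype TEXT);
CREATE TABLE gene_stable_id (gene_id INTEGER, stable_id TEXT, version INTEGER);
INSERT INTO gene VALUES (1, 'protein_coding'), (2, 'pseudogene');
INSERT INTO gene_stable_id VALUES (1, 'GENE0001', 3), (2, 'GENE0002', 1);

-- Merge: add the columns to the object table, copy the values, drop the old table.
ALTER TABLE gene ADD COLUMN stable_id TEXT;
ALTER TABLE gene ADD COLUMN version INTEGER;
UPDATE gene SET
    stable_id = (SELECT s.stable_id FROM gene_stable_id s WHERE s.gene_id = gene.gene_id),
    version   = (SELECT s.version   FROM gene_stable_id s WHERE s.gene_id = gene.gene_id);
DROP TABLE gene_stable_id;
""")

# After the merge, a stable ID lookup no longer needs a join.
rows = conn.execute(
    "SELECT gene_id, stable_id, version FROM gene ORDER BY gene_id"
).fetchall()
```

Queries that previously joined the object table to its stable_id table would need to read the columns from the object table directly after a change of this kind.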

Xref projections (all species)

Gene ontology (GO) identifiers and gene names will be projected for all species. The projection code will be modified to replace UniProt IDs with display labels where applicable.

LRG Import (Human)

Import of new LRG sequences

Memory leak fixes (all species)

A number of memory leaks in the Ensembl API, caused by circular references and unintentional object usage, have been found and fixed.

Schema change; new enum for xref.info_type (all species)

The enum field `xref`.`info_type` has been given a new value called CHECKSUM to support xrefs assigned on the basis of a checksum mapping.
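The general idea of a checksum mapping can be sketched as follows: two sequences are treated as the same record when their checksums match, and the resulting xref is labelled CHECKSUM. The hash function (MD5), the function names and the example identifiers are assumptions for illustration; the notes above do not say how Ensembl computes or stores the checksums.

```python
import hashlib
from typing import Dict, List


def sequence_checksum(sequence: str) -> str:
    """Checksum of a sequence, ignoring case (MD5 is an assumed choice)."""
    return hashlib.md5(sequence.upper().encode("ascii")).hexdigest()


def checksum_xrefs(local_seqs: Dict[str, str],
                   external_index: Dict[str, str]) -> List[dict]:
    """Assign an external accession to each local sequence with an identical checksum."""
    xrefs = []
    for stable_id, seq in local_seqs.items():
        accession = external_index.get(sequence_checksum(seq))
        if accession is not None:
            xrefs.append({
                "stable_id": stable_id,
                "accession": accession,
                "info_type": "CHECKSUM",  # the new enum value described above
            })
    return xrefs


# Hypothetical data: one external record and two local proteins.
external_index = {sequence_checksum("MKTAYIAK"): "EXT0001"}
local_seqs = {"PROT_A": "mktayiak", "PROT_B": "MKTAYIAR"}
xrefs = checksum_xrefs(local_seqs, external_index)
```

Only PROT_A receives an xref here, because PROT_B differs from the external sequence by one residue and therefore has a different checksum.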

Region Report tool (all species)

A new tool plus supporting libraries for creating summaries of requested chromosomal regions.

External database references (all species)

Human, Mouse, Pig, Ciona intestinalis, Cow, Platypus, Chicken, Dog, Horse and Tetraodon external database references have been updated.

 

Splicing events (multiple species)

The ASTD project computationally predicted genes in a similar way to Ensembl every release, with a focus on alternative mRNA structures (splicing events, poly(A) sites, TSS) and features (ppt, exon-exon junction types).

Since 2010, the storage and display of this alternative splicing information has been an integral part of Ensembl for the following species:
    - Homo sapiens,
    - Mus musculus,
    - Rattus norvegicus,
    - Danio rerio,
    - Caenorhabditis elegans,
    - Drosophila melanogaster

These data have been updated for this release for the species listed above.

Retrieval of a Transcript's Gene (all species)

Support has been added to retrieve the Bio::EnsEMBL::Gene from an instance of Bio::EnsEMBL::Transcript.

Support for Slice Retrieval by a Location String (all species)

Bio::EnsEMBL::DBSQL::SliceAdaptor can now retrieve a Slice for a toplevel location string.
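To illustrate what resolving a location string involves, here is a minimal parsing sketch. It assumes strings of the form name:start-end or a bare name, with optional commas in the numbers. This format, the `Location` type and the `parse_location` function are assumptions for the example; it is not the Perl API and the formats the adaptor actually accepts may differ.

```python
import re
from typing import NamedTuple, Optional


class Location(NamedTuple):
    seq_region: str
    start: Optional[int]
    end: Optional[int]


# Assumed format: "name" or "name:start-end", commas allowed in numbers.
_LOCATION = re.compile(r"^(?P<name>[^:]+)(?::(?P<start>[\d,]+)-(?P<end>[\d,]+))?$")


def parse_location(location: str) -> Location:
    """Split a location string into sequence region name, start and end."""
    match = _LOCATION.match(location.strip())
    if match is None:
        raise ValueError(f"Unrecognised location string: {location!r}")
    name = match.group("name")
    if match.group("start") is None:
        return Location(name, None, None)  # whole sequence region
    start = int(match.group("start").replace(",", ""))
    end = int(match.group("end").replace(",", ""))
    if start > end:
        raise ValueError(f"Start is after end in {location!r}")
    return Location(name, start, end)


region = parse_location("6:1,000,000-1,200,000")
whole = parse_location("X")
```

A real implementation would also have to check the parsed name against the toplevel sequence regions in the database, which this sketch does not attempt.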

EnsemblGenomes

Updated Drosophila melanogaster gene models (Fruitfly)

 

FlyBase gene models based on release 5.39 (FB2011_07)

Regulation database will be updated

Variation database will be updated

 

Updated Drosophila melanogaster Regulation database (Fruitfly)

Regulation database updated to reflect new gene models.

Updated Drosophila melanogaster Variation database (Fruitfly)

Variation database updated to reflect transcript model changes.

Funcgen

Regulatory Genome Segmentation (Human)

Data and API support have been added for genome segmentation data, based on a combination of ChromHMM and Segway analyses from the ENCODE project.

Segmentation feature tracks are now available in the Regulation section of the configuration panel under 'Regulatory features'.

 

Experiment View (Human, Mouse)

An Experiment view has been developed to improve access to and visualisation of experimental metadata. This will include archive IDs and source projects used as input for the Human and Mouse Regulatory builds.

New Mouse Regulatory Build (Mouse)

  • H3K4me3, Oct4, Rbbp5, Wdr5 ChIP-Seq for ES cell-line from Ang et al. (2011)
  • H3K4me4 ChIP-Seq for MEL cell-line, from ENCODE

Updated Microarray Probe Mappings (all species)

Microarray probe/probeset mappings have been updated for:

  • Human
  • Chimp
  • Danio rerio

 

Reorganized and Updated documentation (all species)

Documentation regarding Regulation data, sources and methodology was reorganized and updated.

Database schema patches (all species)

  • patch_64_65_a: schema version update
  • patch_64_65_b: Add analysis_id to feature_type to support SegmentationFeature states
  • patch_64_65_c: add hermaphrodite as a gender
  • patch_64_65_d: add SegmentationFeature table for the new Segmentation tracks
  • patch_64_65_e: force regulatory_attribute type to be either 'motif' or 'annotated'
  • patch_64_65_f: Add segmentation as an input_set type
  • patch_64_65_g: Table options clean up

Genebuild

New Chimpanzee assembly (Chimpanzee)

The first genebuild on the new chimpanzee assembly, CHIMP2.1.4.

New bushbaby assembly (Bushbaby)

The bushbaby gene annotation in e65 is based on the high coverage assembly OtoGar3 provided by the Broad Institute.

The gene set for bushbaby was generated using bushbaby and primate proteins as well as human Ensembl translations.

The final gene set consists of 19506 protein_coding genes, 1151 pseudogenes and 7276 ncRNAs.

More detailed information can be found here.

New species: Atlantic cod (all species)

The Atlantic cod (Gadus morhua) has been added to Ensembl. The genebuild involved a combination of annotation approaches: the standard genebuild procedure and whole-genome alignment and projection from stickleback.

Human cDNA update (Human)

cDNA alignments for human have been updated using the most up-to-date set of cDNAs from the European Nucleotide Archive and NCBI RefSeq.

Update human otherfeatures db: new CCDS import (Human)

Update to CCDS set for human

Projection of annotation to GRC assembly patches (Human)

Annotation from the primary assembly is projected to the assembly patches. The projected annotation is then supplemented with annotation based on evidence alignment. This annotation is stored in the human otherfeatures database.

Update to Ensembl-Havana GENCODE gene set (release 10) (Human)

Updated Ensembl-Havana gene set (GENCODE release 10) based on updated Ensembl gene set and latest Havana gene annotation.

 

Vega human annotation updated (Human)

Manual annotation of human from Havana has been updated. The data represent the annotation presented in Vega release 45.

Mouse cDNA update (Mouse)

The latest set of cDNAs for mouse (as of 14/OCT/2011) from the European Nucleotide Archive and NCBI RefSeq was aligned to the current genome using Exonerate. There are 4,216 new cDNAs and a total of 34,664 new alignments for Ensembl 65.

Zebrafish VEGA Merge (Zebrafish)

Manual annotation of zebrafish from Vega has been updated. This represents the annotation presented in Vega release 45.

Zebrafish Markers (Zebrafish)

Zebrafish SATMAP markers have been given a separate track so they can be easily distinguished from the other markers.

Vega zebrafish annotation updated (Zebrafish)

Manual annotation of zebrafish from Havana has been updated. The data represent the annotation presented in Vega release 45.

MT annotation for anole, elephant, panda, rabbit and turkey (Anole lizard, Elephant, Rabbit, Panda, Turkey)

MT sequences have been added to the main assembly.

Annotation for those sequences has also been provided

Flagging obsolete UniProt proteins (all species)

Flagging obsolete UniProt proteins used as supporting evidence for the transcripts and the exons

Flagging obsolete Ensembl proteins (all species)

Flagging obsolete Human Ensembl proteins used as supporting evidence for the transcripts and the exons

Analyses updates (all species)

Update in logic_names and descriptions for a more consistent system across species

Pfam version numbers removed (Cow, Mouse, Chimpanzee)

Pfam hit names currently include version numbers, which will be removed.

Assembly name update (Lamprey)

Assembly name for lamprey changed from Petromyzon_marinus_7.0 to Pmarinus_7.0

MT annotation for Tasmanian Devil (Tasmanian devil)

MT sequences have been added to the main assembly.

Annotation for those sequences has also been provided

 

Karyotype bands for patches (Human)

Store karyotype bands on the patches so they can be displayed in the browser.

Production

Ensembl 65 mart databases (all species)

Ensembl Genes 65:

Added new species Atlantic Cod (Gadus morhua)

Updated Fruitfly to FlyBase gene models based on release 5.39 (BDGP5)
Updated Chimpanzee assembly to CHIMP2.1.4, Bushbaby assembly to OtoGar3 and updated the Lamprey assembly name to Pmarinus_7.0

The Pfam IDs have had the version removed so that filters function correctly
The Gene Ontology and GOSlim links have been updated
dN and dS values have been added for paralogs
"Microarray" attribute section has been given the new title "Microarray probes/probesets" and all microarray filters and attributes have been labelled with either Probeset or Probe to be more informative
Added exon strand to the attributes
Transcript events have been updated for human, mouse, rat, Danio rerio, C. elegans and Drosophila melanogaster


Ensembl Variation 65:

Updated human to dbSNP 134
Updated the human somatic variation database to COSMIC 55 data
Structural variation datasets have been updated to include data for multiple mouse strains along with additional structural variation arising from the schema modifications
Added minor allele, minor allele frequency, minor allele count and clinical_significance to the attributes

Ensembl Regulation 65:

New "Multiple Chromosome Region" filters have been added to allow for the selection of features from specified locations.

Vega 45:

Human and Danio manual annotation has been updated to represent the annotation presented in Vega release 45.

Variation

Human dbSNP 134 import (Human)

Import of dbSNP Build 134 for human.

 

Import new data types:

  • Global minor allele frequencies
  • Clinical significance
  • Suspect variants (will be failed with a new reason code)

Co-locating dbSNP variants will not be merged.

 

 

Schema changes (all species)

Changes in the structural variation tables:

  • Added features for the supporting evidence
  • Merged the structural_variation and supporting_structural_variation tables
  • Added phenotype and sample information
  • Added a failed_structural_variation table
  • Created a table to link the structural variants to their supporting evidence

Changes in the genotype tables: rebuilt most of the genotype tables

 

Changes to support the new data from dbSNP:

  • Added columns minor_allele, minor_allele_freq and minor_allele_count to the variation table
  • Added clinical_significance_attrib_id column to the variation table, and added new attributes to the attrib table to identify clinical significance under the attrib_type 'dbsnp_clin_sig'
  • Added a new failed description to identify variants marked as suspect by dbSNP (failed_description_id = 16)
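How the three new minor allele values relate to one another can be shown with a small sketch. It assumes the minor allele is taken to be the second most frequent allele among the observed chromosomes; that definition, the function name and the example data are assumptions for illustration, and the values in the variation table are imported from dbSNP rather than computed like this.

```python
from collections import Counter
from typing import Iterable, Optional


def minor_allele_summary(alleles: Iterable[str]) -> Optional[dict]:
    """Return minor allele, frequency and count, or None if only one allele is seen."""
    counts = Counter(alleles)
    if len(counts) < 2:
        return None
    total = sum(counts.values())
    # Sort by descending count, then allele string, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    allele, count = ranked[1]  # second most frequent allele (assumed definition)
    return {
        "minor_allele": allele,
        "minor_allele_freq": count / total,
        "minor_allele_count": count,
    }


# Hypothetical observations: 6 x A, 3 x G, 1 x T across 10 chromosomes.
summary = minor_allele_summary(list("AAAAAAGGGT"))
```

In this example the summary reports G as the minor allele with a count of 3 and a frequency of 0.3.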

 

Structural variations (Dog, Human, Mouse, Pig)

Updates data and adds new studies

Remapping Chimpanzee variations (Chimpanzee)

Remaps the chimp variations to the new assembly (CHIMP2.1.4).

Phenotype annotations (Human)

Updates from the following sources:

  • NHGRI GWAS catalog
  • EGA
  • OMIM
  • UniProt

Ancestral alleles (Chimpanzee, Orangutan)

Added ancestral alleles using Compara alignments

Import COSMIC release 55 (Human)

Import COSMIC's latest release

Protein function predictions for new human genes (Human)

We will do an 'update' run of the protein function prediction pipeline to compute predictions for new and updated human transcripts.

 

We will also attempt a complete new run using Compara alignments in place of SIFT and PolyPhen's own alignment pipelines. Depending on how this goes, we may release either this set or the update set described above.

 

The attempt to use Compara alignments didn't work out this release (for a number of reasons), so we're going with the previous approach. We will investigate this further for future releases.

 

LRG variation (Human)

Submitted variation data for the gene CYBB (LRG_53)

Web

Segmentation Features (Human)

A segmentation track is now displayed in the functional genomics display. It is also available on the location and gene displays.

Saving configurations (all species)

You can now save configurations by clicking the "Save as..." button below the navigation menu when configuring a page.

Saved configurations can be activated by going to the "Manage Configurations" tab, or using the "Load configurations" button.

 

You can also group configurations from across the site into sets, to make it possible to activate them all at once.

Manage data interface changes (all species)

The Manage data interface has been given an overhaul to improve consistency of display and functionality (sharing, saving, renaming, deleting) across all sources of data.

Adding custom data (all species)

When you add data to Ensembl, it is now only turned on for the page you are currently using, rather than for all pages on the site.

Future Plans

Read about our future plans on our blog!