Headlines

News categories

New web displays and tools

Gencode Basic Renderer (Human, Mouse)

A new renderer, GENCODE Basic, has been added for GENCODE. The GENCODE Basic set comprises only a subset of the transcripts, i.e. fragments and other problematic biotypes are excluded from the basic set.

New VEP interface (all species)

The VEP web interface has been completely overhauled and now offers:

  • new results page with summary charts, interactive filtering and more
  • more options and configuration
  • 750 variant limit removed - limits are now imposed only on uploaded file size
  • VEP now runs on a job submission system

Highlighting the current feature (all species)

In Region in detail and similar views, we currently highlight the feature you have come from so that it's easier to find amongst all the tracks. However, if you would like to turn off this highlighting, e.g. in order to have a cleaner image to export, you can now do so via the control panel.

You'll find the option under 'Information and decorations', labelled 'Highlight current feature'.

New species, assemblies and genebuilds

Merged genes and transcripts can be fetched using 'source' column (Zebrafish, Human, Mouse, Pig)

From this release we will introduce a new use for the gene.source column and the new transcript.source column. These columns will now indicate whether genes have been annotated by both Ensembl and Havana ('ensembl_havana'), Ensembl only ('ensembl'), or Havana only ('havana'). This will feed into BioMart to make it easier for users to fetch genes and transcripts from only the annotation sources they are interested in. In release 74 and earlier releases, this information could be found using the analysis.logic_name. Note: An additional source, 'insdc', is used for genes and transcripts on the mitochondrial chromosome because they are imported from the MT GenBank file.
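
As a rough illustration of how the source values could be used to select annotation, the sketch below builds a toy in-memory SQLite table and filters it by source. The table layout, the placeholder stable IDs and the helper names `build_toy_db` and `fetch_by_source` are assumptions made up for this example; they are not the real Ensembl core schema or API.

```python
import sqlite3

# Toy rows: (stable_id, source). The IDs are placeholders, not real Ensembl IDs.
ROWS = [
    ("GENE_A", "ensembl_havana"),  # annotated by both Ensembl and Havana
    ("GENE_B", "ensembl"),         # Ensembl only
    ("GENE_C", "havana"),          # Havana only
    ("GENE_MT", "insdc"),          # mitochondrial import
]


def build_toy_db():
    """Create an in-memory table that loosely mimics a gene table with a source column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE gene (stable_id TEXT PRIMARY KEY, source TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO gene VALUES (?, ?)", ROWS)
    return conn


def fetch_by_source(conn, source):
    """Return the stable IDs of all genes with the given source value."""
    cur = conn.execute(
        "SELECT stable_id FROM gene WHERE source = ? ORDER BY stable_id",
        (source,),
    )
    return [row[0] for row in cur]


conn = build_toy_db()
merged_genes = fetch_by_source(conn, "ensembl_havana")
```

The same pattern would apply to a transcript table with its own source column.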

New C.elegans core database (WS240) (Caenorhabditis elegans)

The C.elegans reference annotation will be updated to WormBase version WS240. The reference genome will remain the same (version WBcel235). 

Mouse RNAseq database (Mouse)

This is an RNAseq database for mouse. It contains models that were built using Sanger RNAseq data.

Vega Zebrafish annotation updated (all species)

Manual annotation of zebrafish from Havana has been updated and contains the data released in Vega 55.

New variation data

New variation database for Turkey (Turkey)

The first Ensembl variation database will be created for turkey using dbSNP 139.

New alignments

New track - Age of Base (Human)

In release 75 we have added a new track for human, showing the timing of the most recent mutation as determined by inter-species whole genome alignments. You can find the track in the comparative genomics menu under "Conservation regions" (or search for "age of base").

Each base pair in which the human reference genome differs by substitution from one of its inferred ancestral genomes is coloured in either grey (event prior to the primate branch), blue (primate specific), red (human-specific, fixed variant), or yellow (human-specific segregating variant, i.e. SNP). Clicking on a mutation position reveals the sub-tree of species which have inherited the same mutation from their common ancestor. It also reveals a score that represents the age of the mutation in arbitrary units, and determines the intensity of the colouring. The more recent the mutation, the lower the score and the darker the colour.
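
The colour scheme described above can be sketched as a simple lookup plus a shading function. The category keys, the function names and the linear scaling of score to darkness are assumptions for illustration only; the track's actual scoring and rendering are not specified here.

```python
# Colours as listed in the track description; the dictionary keys are hypothetical labels.
CATEGORY_COLOURS = {
    "pre_primate": "grey",          # event prior to the primate branch
    "primate_specific": "blue",
    "human_fixed": "red",           # human-specific, fixed variant
    "human_segregating": "yellow",  # human-specific segregating variant (SNP)
}


def colour_for(category):
    """Return the base colour for a mutation category."""
    return CATEGORY_COLOURS[category]


def darkness(score, max_score):
    """Map an age score to a darkness value between 0.0 and 1.0.

    Lower scores (more recent mutations) give darker colours.
    Linear scaling is an assumption, not the track's documented behaviour.
    """
    if max_score <= 0:
        raise ValueError("max_score must be positive")
    clamped = min(max(score, 0), max_score)
    return 1.0 - clamped / max_score
```

For example, with a hypothetical maximum score of 100, a mutation scored 25 would be drawn darker than one scored 75.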

Note that this is a beta version of the track - if you find it useful, please let us know!

Other updates

Human: updated RefSeq gene import (Human)

The imported RefSeq gene set was updated in the human otherfeatures database. Please note that RefSeq annotates gene models on cDNA sequence and not on the reference genome, so when users translate RefSeq transcripts off the reference genome, the translations may contain stop codons.
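
To check whether a translation taken off the reference genome contains such stop codons, something like the following sketch could be used. It applies the standard genetic code to a coding sequence and reports the positions of any stops before the final codon. The function names are hypothetical and the example does not use the Ensembl API.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate a coding sequence; unknown codons become 'X', trailing bases are ignored."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return "".join(
        CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3)
    )


def internal_stops(cds):
    """Return 0-based protein positions of stop codons other than a terminal one."""
    protein = translate(cds)
    body = protein[:-1] if protein.endswith("*") else protein
    return [i for i, aa in enumerate(body) if aa == "*"]
```

A sequence such as `ATGTAAGCCTGA` would be reported as having an internal stop at position 1, whereas `ATGGCCTAA` has none.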

Havana merge for Zebrafish (Zebrafish)

An updated set of Zebrafish genes is included in this release. There hasn't been a new Ensembl genebuild but the Havana annotations have been updated and we therefore re-ran the merge of the two gene sets.

This is also the first merge using a new merge pipeline that has been in development since the beginning of the year. The new software delivers slightly different results, but the bulk of the annotations are the same. The biggest difference is an increase in merged protein coding genes by about 960 (compared to running the older pipeline on the same input data).

Ensembl 75 mart databases (all species)

  • Ensembl Genes 75
    • Renamed Saccharomyces cerevisiae assembly from EF4 to R64-1-1
    • Added new Transcript source filter and attribute for all the species
    • Added new filter and attribute for VEGA protein ID and WormBase Gene Sequence-name accession.
    • Added new variation species turkey (Meleagris gallopavo)
    • Renamed "Protein domains" filter and attribute sections to "Protein domains and families".
  • Ensembl Variation 75
    • Added new variation species turkey (Meleagris gallopavo)
    • Renamed Saccharomyces cerevisiae assembly from EF4 to R64-1-1

Mouse: updated cDNA alignments (Mouse)

A new cdna database was created for e75: The latest set of cDNAs for mouse (as of January 2013) from the European Nucleotide Archive and NCBI RefSeq (release 62) were aligned to the current genome using Exonerate.

Updated human otherfeatures db: New CCDS import (Human)

This release of the human gene set also includes 29,033 transcript models as part of an updated version (December 2013) of CCDS

Updated mouse otherfeatures db: New CCDS import (Mouse)

This release of the mouse gene set also includes 23,084 transcript models as part of an updated version (Dec 2013) of CCDS

Human: updated cDNA alignments (Human)

A new cdna database was created for e75: The latest set of cDNAs for human (as of December 2013) from the European Nucleotide Archive and NCBI RefSeq (release nn) were aligned to the current genome using Exonerate.

EMBL and Genbank Dumps (all species)

EMBL and Genbank dumps for all species.

External reference projection (all species)

Gene ontology (GO) identifiers and gene name projection to all species.

FASTA & GTF dumps (all species)

FASTA & GTF dumps for all the species

Splicing events (multiple species)

The ASTD project computationally predicted genes in a similar way to Ensembl every release with a focus on alternative mRNA structures (splicing events, poly(A) sites, TSS) and features (ppt, exon-exon junction types)

Since 2010, the storage and display of this alternative information has been an integral part of Ensembl for the following species:

  • Homo sapiens,
  • Mus musculus,
  • Rattus norvegicus,
  • Danio rerio,
  • Caenorhabditis elegans,
  • Drosophila melanogaster

These data have been updated for this release for the species listed above.

New dbSNP imports (Dog, Horse, Opossum, Platypus, Zebra Finch)

Dog, horse, opossum, platypus and zebra finch will be updated to dbSNP 139.

SIFT protein impact predictions for horse (Horse)

SIFT predictions will be available for the Ensembl horse proteome.

Phenotype data updates (Human, Mouse)

Human phenotype data will be updated from sources including ClinVar and Decipher.

Mouse phenotype data from IMPC will be updated.

Citation data update (Human)

Human variation citation data will be updated from EPMC and UCSC.

New C.elegans funcgen database (Caenorhabditis elegans)

Included are probe mappings for ten arrays: one Affymetrix, six Agilent, and three custom arrays.

Structural variations (Human, Mouse, Pig)

DGVa data will be updated and new studies imported

PolyPhen update (Human)

PolyPhen predictions will be updated using code version 2.2.2, release 405 and the latest available databases.

HGMD data update (Human)

The latest release of public HGMD data (version 2013.3 from September 2013) will be imported

NHLBI ESP data update (Human)

Human NHLBI ESP data will be updated to version v.0.0.22.

Saccharomyces cerevisiae (Saccharomyces cerevisiae)

Updated assembly name from EF4 to R64-1-1 to match the official INSDC name. Added lift over mappings

Correcting spelling mistakes in sheep (Sheep)

Changing Merino to Gansu fine wool sheep

Correcting spelling mistakes in the tissue samples

LRG Import (all species)

Importing the latest version of Locus Reference Genomic dataset

Gene and Transcript Adaptor support for fetch_all_by_Source() (all species)

GeneAdaptor and TranscriptAdaptor will support the retrieval of their respective feature objects by the new source column

SQLite Support (all species)

The Ensembl core API will support SQLite databases. This work has been contributed by the Anacode team at the WTSI.

result_set.name unique (patch_74_75_b) (Human, Mouse)

The name field of the result set table now has a unique key, and the names have been updated by appending the relevant analysis logic name, in line with the other set tables.
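
The renaming step can be pictured with the small sketch below, which appends an analysis logic name to each result set name and checks that the resulting names are unique. The underscore separator, the function name and the example names are assumptions; the actual patch may format names differently.

```python
def rename_with_logic_name(result_sets):
    """Append the analysis logic name to each result set name.

    result_sets is a list of (name, logic_name) pairs. Returns the new names
    in order and raises ValueError if any new name is still duplicated,
    which is what a unique key on the name field would reject.
    """
    new_names = [f"{name}_{logic_name}" for name, logic_name in result_sets]
    seen = set()
    for new_name in new_names:
        if new_name in seen:
            raise ValueError(f"duplicate result set name: {new_name}")
        seen.add(new_name)
    return new_names
```

Two result sets that share a name but come from different analyses end up with distinct names after this step.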

input_subset.analysis_id (patch_74_75_c) (all species)

An analysis_id has been added to the input_subset table, which will mirror the input_set.analysis_id. 

Consequently, InputSubset has been changed to inherit from Set, and the feature_type validation in the Set subclass constructors has been moved to the Set constructor.

This work is a prerequisite to retiring the InputSet class/table.

InputSet retired (patch_74_75_d) (all species)

The InputSet class has been retired and ResultSet will be used directly instead.  The result_set_input table has been patched to replace input_set entries with input_subset entries.  The ResultSet classes have been updated to make the association of dbfile_registry_entry record optional.

The InputSet class and table will remain in the schema until all dependent code has been migrated to the new usage model.

Array size (all species)

The Array size attribute and associated methods/constructor parameters have been deprecated or removed.

patch_74_75a.sql - schema_version update in production db (all species)

Update schema_version in meta table to 75.

patch_74_75_c.sql (all species)

Adding a new table genome_statistics

Populated during the production run, it contains basic statistics such as the number of genes, the length of the genome or the number of predictions.

Microarray mapping (all species)

Microarray mapping has been updated for those species with new genome assemblies, new gene builds or new arrays.

patch_74_75_e.sql (all species)

Attrib related tables do not allow duplicates

Unique key constraints added to enforce this
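
The effect of such a constraint can be demonstrated with a toy SQLite table. The table name, the columns chosen for the unique key and the helper `insert_attribs` are assumptions for illustration; the real patch targets the MySQL core schema and may define the keys differently.

```python
import sqlite3


def insert_attribs(rows):
    """Insert rows into a toy attrib table with a unique key.

    Returns (number of rows stored, list of rows rejected as duplicates).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE gene_attrib (
            gene_id        INTEGER NOT NULL,
            attrib_type_id INTEGER NOT NULL,
            value          TEXT    NOT NULL,
            UNIQUE (gene_id, attrib_type_id, value)
        )
        """
    )
    rejected = []
    for row in rows:
        try:
            conn.execute("INSERT INTO gene_attrib VALUES (?, ?, ?)", row)
        except sqlite3.IntegrityError:
            # The unique key refuses an exact duplicate of an existing row.
            rejected.append(row)
    stored = conn.execute("SELECT COUNT(*) FROM gene_attrib").fetchone()[0]
    conn.close()
    return stored, rejected
```

Inserting the same (gene_id, attrib_type_id, value) triple twice stores it once and reports the second attempt as rejected.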

External database references update (multiple species)

Xrefs update for:

human, sloth, sea squirt, chicken, kangaroo rat, stickleback, coelacanth, wallaby, pika, medaka, hyrax, megabat, fugu, tarsier, alpaca, dolphin, tree shrew, zebrafish

Experiment FeatureType and CellType (patch_74_75_f) (all species)

The experiment table has had additional feature_type_id and cell_type_id fields added, and the associated API classes have been updated. This is in line with the current usage within the analysis pipelines and is to prevent the experiment class being used as a study where many feature/cell types can be associated.

ProteinTrees and homologies (all species)

 

GeneTrees (protein-coding) with new/updated genebuilds and assemblies

  • all-vs-all blastp (ncbi-blast-2.2.28+)
  • Clustering using hcluster_sg
  • Multiple sequence alignments using MCoffee (Version_9.03.r1318) or Mafft (mafft-7.113)
  • Phylogenetic reconstruction using TreeBeST
  • Homology inference
  • Pairwise gene-based dN/dS scores for high coverage species pairs only (both on orthologues and paralogues) (codeml/PAML v4.3)
  • GeneTree stable ID mapping
  • Per family gene dynamics using CAFE (v2.2)

ncRNAtrees and homologies (all species)

 

  • Classification based on Rfam models (v11.0)
  • Multiple sequence alignments with Infernal
  • Phylogenetic reconstruction using RAxML
  • Phylogenetic reconstruction using FastTree2 and RAxML-Light for very big families
  • Additional multiple sequence alignments with Prank (w/ genomic flanks)
  • Additional phylogenetic reconstruction using PhyML and NJ
  • Phylogenetic tree merging using TreeBeST
  • Per family gene dynamics using CAFE
  • Homology inference
  • Secondary structure plots

Protein Families (all species)

 

Updated MCL families including all Ensembl transcript isoforms (including human non-reference haplotypes) and newest Uniprot Metazoa.

  • Getting distances by NCBI BlastP (v.2.2.28+)
  • Clustering by MCL (v.12-135)
  • Multiple Sequence Alignments with MAFFT (v.7.113)
  • Family stable ID mapping

Compara dumps (all species)

 

  • Data dumps for ProteinTrees
  • Data dumps for ncRNAtrees
  • OrthoXML dumps for ProteinTrees
  • OrthoXML dumps for ncRNAtrees
  • PhyloXML dumps for ProteinTrees
  • PhyloXML dumps for ncRNAtrees

API/schema changes (all species)

 

  • Extend genome_db table (and the corresponding API) with two extra fields (has_karyotype and is_high_coverage)
  • Annotation of web display information in species_tree_node entries instead of species_set_tags

Saccharomyces cerevisiae - Otherfeatures (Saccharomyces cerevisiae)

Updated assembly name from EF4 to R64-1-1 to match the official INSDC name. Added lift over mappings

Link from Region in Detail to individual exons (all species)

The popup menu that appears when you click on a transcript now includes a link to the Exon table if you click on an individual exon, and the exon you clicked on is shown in bold on the table. Note that this link will only appear when you are zoomed in enough for the click coordinates to clearly identify a single exon.
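
One plausible way to decide whether a click identifies a single exon is sketched below: the click is treated as a small base-pair range (wider when zoomed out), and a link is offered only if exactly one exon overlaps it. The function name, the data layout and this decision rule are assumptions, not the actual web code.

```python
def exon_under_click(exons, click_start, click_end):
    """Return the ID of the single exon overlapping the clicked range, or None.

    exons is a list of (exon_id, start, end) with inclusive coordinates.
    click_start and click_end give the base-pair range covered by the click;
    the more zoomed out the view, the wider this range becomes.
    """
    hits = [
        exon_id
        for exon_id, start, end in exons
        if start <= click_end and end >= click_start
    ]
    # Only an unambiguous hit counts; zero or several overlaps give no link.
    return hits[0] if len(hits) == 1 else None
```

With exons at 100-200 and 300-400, a click covering 150-150 resolves to the first exon, while a click covering 190-310 overlaps both and yields no exon link.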

Ghrelin gene added (Chinese softshell turtle)

New data has been provided by the P. sinensis community that has allowed us to annotate the ghrelin gene.

Retirement of archive 61 (all species)

This release cycle we will be retiring archive 61 (Feb 2011), in accordance with our three-year rolling retirement policy. The data will remain available on our public database server; only the web interface will be removed.

Remove read_coverage table (all species)

The read_coverage table and associated API support will be removed.

There are only a few individuals across our resequencing data with read coverage data, much of which has been remapped between assemblies and may no longer be reliable.

Removing this will speed up code and clean up some of the web displays.

patch_74_75_a.sql - schema_version update in ontology db (all species)

Update schema_version in meta table to 75.

patch_74_75b.sql - longer code in attrib_type (all species)

'code' column in master_attrib_type table expanded

patch_74_75_f.sql - longer code (all species)

'code' column in attrib_type table longer

Retirement of ensembl-draw (all species)

The ensembl-draw repository has been merged with ensembl-webcode.

The files that were in ensembl-draw can now be found in ensembl-webcode/modules/Sanger/Graphics and ensembl-webcode/modules/Bio/EnsEMBL.

Documentation move (all species)

To aid internal management of git permissions, we will be moving ensembl-webcode/htdocs/info into a separate public plugin, docs. No page URLs will change, but external developers will need to enable this plugin in order to display documentation for the API, etc on an Ensembl-powered website.

ensembl-webcode directory (all species)

Web code is now stored inside the top level directory ensembl-webcode.

A new variable has been added to SiteDefs, called $ENSEMBL_WEBROOT, which has the value of "$ENSEMBL_SERVERROOT/ensembl-webcode", and is used when locating files and directories that are inside ensembl-webcode. $ENSEMBL_SERVERROOT remains unchanged.

New TarBase microRNA target sites (Human, Mouse)

Our conservative MiRanda miRNA targets set, which is no longer maintained, will be replaced by predictions from Diana TarBase:

    http://diana.imis.athena-innovation.gr/DianaTools/index.php?r=site/index

TarBase v6.0 targets are a drop-in replacement for the MiRanda targets as ExternalFeatures. Separate adaptor classes will follow in r76.

TarBase analysis added (Human, Mouse)

Added TarBase_v6.0 to analysis table

Stable ID lookup (all species)

Stable ID lookup provided for REST services
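
A minimal sketch of calling a stable ID lookup over REST from Python is shown below. It assumes the current public server address `https://rest.ensembl.org` and its `/lookup/id/` endpoint, which may differ from the address in use at the time of this release; the function names are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

# Assumption: the current public REST server; adjust if you use another instance.
REST_SERVER = "https://rest.ensembl.org"


def lookup_url(stable_id, server=REST_SERVER):
    """Build the lookup URL for a stable ID, escaping any unsafe characters."""
    return f"{server}/lookup/id/{quote(stable_id, safe='')}"


def lookup(stable_id, server=REST_SERVER, timeout=30):
    """Fetch the lookup record for a stable ID as a dictionary (needs network access)."""
    request = Request(
        lookup_url(stable_id, server),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `lookup("ENSG00000157764")` (an arbitrary example ID) would return the JSON record for that ID if the server is reachable.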

Removal of "default action" (all species)

We have removed the web behaviour whereby invalid URLs for genomic views were silently redirected to the default view for that gene/location/etc. This was causing issues with some scripts connecting to the website, including our own selenium testing. Invalid URLs now show a custom 404 component within the standard page template.

Search enhancements (all species)

Ensembl search has a number of improvements including (i) more extensive highlighting of the search term on the results page, (ii) improved ordering of results, and (iii) better handling of non-alphanumeric characters in search queries.

Deleted transcripts with stop codons (Marmoset)

We deleted 15 single-transcript genes and an additional 6 transcripts from multi-transcript genes where the translation had stop codons.

Deleted ENSRNOG00000042244 (Rat)

One gene, reported by a user, has been deleted from the rat gene set.

patch_74_75_a.sql - schema_version update (all species)

Update schema_version in meta table to 75.

patch_74_75_b.sql - transcript source (all species)

Adding the source column to the transcript table

Deleted transcript ENSSSCT00000011005 (Pig)

Transcript ENSSSCT00000011005 was deleted. There remains an overlapping transcript within the same gene that has been manually annotated by Havana.

patch_74_75_d.sql - default source for transcripts (all species)

The new Transcript source column required a default value, that is now set to ensembl.

Archive of previous news

Future Plans

Read about our future plans on our blog!