
Ensembl News for Release 77 (October 2014)

New web displays and tools

New interface for alignment export (all species)

For Phase 2 of our export upgrade, we have ported the genomic alignment pages to the new export interface. We have also upgraded our version of BioPerl to 1.6.1, which allows us to support the latest write_aln options in Bio::AlignIO. Export file formats supported in release 77 are:

  • CLUSTALW
  • FASTA
  • Mega
  • MSF
  • Nexus
  • Pfam
  • Phylip
  • PSI
  • RTF (text alignment views only)
  • Stockholm*

This change affects the following pages:

  • Location -> Alignments (image)
  • Location -> Alignments (text)
  • Gene -> Genomic alignments
  • Variation -> Phylogenetic context

The old "Export data" button has been disabled on these pages and replaced with a new button within the page. The Alignments image has a download icon on its toolbar, and the text-based alignments use the same "Download sequence" button as the pages upgraded in release 76.

* Please note that sites which still use the old BLAST interface depend on BioPerl 1.2.3 and therefore cannot offer Stockholm export.
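To illustrate the version dependency above, here is a minimal Perl sketch, assuming the installed BioPerl version is available as a dotted string such as "1.6.1". The hypothetical helper drops Stockholm from the release 77 format list when BioPerl is older than 1.6.1. The function names and the exact cut-off are assumptions, not part of the Ensembl web code; RTF is left out because it applies only to the text alignment views.

```perl
use strict;
use warnings;

# Formats listed for release 77 (RTF omitted: text alignment views only).
my @ALL_FORMATS = qw(CLUSTALW FASTA Mega MSF Nexus Pfam Phylip PSI Stockholm);

# Hypothetical helper: true if dotted version $have >= $want,
# comparing each numeric part in turn (missing parts count as 0).
sub version_at_least {
  my ($have, $want) = @_;
  my @h = split /\./, $have;
  my @w = split /\./, $want;
  for my $i (0 .. ($#h > $#w ? $#h : $#w)) {
    my $x = $h[$i] // 0;
    my $y = $w[$i] // 0;
    return $x > $y if $x != $y;
  }
  return 1;
}

# Hypothetical helper: offer Stockholm only for BioPerl 1.6.1 or later
# (assumed cut-off, based on the 1.2.3 vs 1.6.1 note above).
sub export_formats_for {
  my ($bioperl_version) = @_;
  return @ALL_FORMATS if version_at_least($bioperl_version, '1.6.1');
  return grep { $_ ne 'Stockholm' } @ALL_FORMATS;
}

print join(' ', export_formats_for('1.2.3')), "\n";
```

A site still on BioPerl 1.2.3 would see the list without Stockholm; comparing version parts numerically avoids treating "1.6.10" as older than "1.6.9".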

New icons on transcript table (all species)

Two new icons have been added to the transcript table, showing confidence levels from different projects.

See the following news items for more details on this data:

  • APPRIS tags
  • Transcript Support Levels

Gene expression page has moved (all species)

We have decided to move the Gene Expression table to our per-species documentation, as the data is not applicable to every gene. You can now find the list of tissues and associated datasets on the homepage of every species with an RNASeq database, e.g. Homo_sapiens/Info/Expression.

Display of Alternative alleles (all species)

We have added a new Gene panel that shows a table of manually curated alternative alleles. An example of this would be a gene on the human GRCh38 reference assembly and one on a GRC haplotype patch. This table also provides links to Region comparison that are preconfigured to show the regions of interest.

Links between alternative alleles have also been added to Gene Summary panels and to the menus that appear when mousing over a gene on a location view.

New species, assemblies and genebuilds

Vervet Monkey assembly and genebuild (Vervet-AGM)

The Vervet Monkey assembly ChlSab1.1 was released. We have produced new gene annotation on this assembly.

Vega Human annotation updated (Human)

Manual annotation of human from Havana has been updated and contains the data released in Vega 57.

Vega Rat annotation included (Rat)

Manual annotation of rat from Havana is included. This represents the data released in Vega 57.

APPRIS tags (Human)

APPRIS labels were imported for human. APPRIS is a system that deploys a range of computational methods to provide value to the annotations of the human genome. APPRIS also selects one of the CDS for each gene as the principal isoform. APPRIS defines the principal variant by combining protein structural and functional information and information from the conservation of related species.

Other updates

Vervet Monkey RNASeq database and BAM files (Vervet-AGM)

In addition to the gene annotation for ChlSab1.1, an RNASeq database will be released in which users can view BAM files and transcript models.

Update to Ensembl-Havana rat merge (Rat)

Updated Ensembl-Havana rat gene set; this is the first merge for rat. This gene set is a merge of complete Ensembl gene models and the latest Havana gene annotation.

Human: updated RefSeq gene import (Human)

The imported RefSeq gene set has been updated in the human otherfeatures database. Please note that RefSeq annotates gene models on cDNA sequence rather than on the reference genome, so if users translate the RefSeq transcripts from the reference genome sequence, the translations may contain stop codons.
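The stop codon caveat can be checked directly. The minimal Perl sketch below scans a coding sequence in frame and reports premature stop codons (TAA, TAG or TGA under the standard genetic code), which is the symptom to expect when a cDNA-based RefSeq model is re-translated from genomic sequence that differs from the cDNA. The helper name and the sequence are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the 0-based codon indices of in-frame stop
# codons (standard genetic code), ignoring the final codon, which is
# expected to be the normal terminator.
sub internal_stops {
  my ($cds) = @_;
  $cds = uc $cds;
  my @stops;
  my $n_codons = int(length($cds) / 3);
  for my $i (0 .. $n_codons - 2) {
    my $codon = substr($cds, 3 * $i, 3);
    push @stops, $i if $codon =~ /^(?:TAA|TAG|TGA)$/;
  }
  return @stops;
}

# Made-up sequence with a premature TAA at codon index 2.
my $from_genome = 'ATGGCCTAAGGCTGA';
print join(',', internal_stops($from_genome)), "\n";
```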

Update to Ensembl-Havana GENCODE gene set (release 21) (Human)

Updated Ensembl-Havana gene set (GENCODE release 21). This gene set is a merge of complete Ensembl gene models and the latest Havana gene annotation. All CCDS genes are included in this gene set.

Transcript Support Levels (Human, Mouse)

Transcript Support Levels (TSLs) were imported from UCSC. TSLs for human are based on the GENCODE 20 gene set. TSLs for mouse are based on the GENCODE M2 gene set.

Ensembl 77 mart databases (all species)

  • Ensembl Genes 77
    • Added new species Vervet-AGM (Chlorocebus sabaeus)
    • Added "Associated Gene Name" (internal name "external_gene_name") in the id list limit filter section
    • Renamed "Reference ID" to "Variation Name" and "Source description" to "Variation source description" in the Variation attribute section for germline and somatic. 
    • Renamed internal name:
      • For the "Variation name" attribute from "external_id" to "variation_name"
      • For the "Variation Source" attribute from "source_name" to "germ_line_variation_source".
      • For the "Variation name" somatic attribute from "somatic_reference_id" to "somatic_variation_name"
  • Ensembl Variation 77
    • Optimized the somatic and germinal structural variation templates
    • Removed "Phenotype significance" (internal name "phenotype_significance") from the filter and attribute sections
  • Vega 57
    • Added new species Rat (Rattus norvegicus)
    • Added "Associated Gene Name" (internal name "external_gene_name") in the id list limit filter section
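Saved BioMart queries that use the old internal names will need updating. A minimal Perl sketch, assuming a query keeps its attributes as a plain list of internal names: the hypothetical helper below maps the three renamed Variation attributes to their release 77 names and leaves everything else untouched. Because a bare name such as source_name could also occur outside the Variation section, a real script would want to apply the map only to attributes from that section.

```perl
use strict;
use warnings;

# Internal attribute names renamed in the Ensembl Genes 77 mart.
my %RENAMED_IN_77 = (
  external_id          => 'variation_name',
  source_name          => 'germ_line_variation_source',
  somatic_reference_id => 'somatic_variation_name',
);

# Hypothetical helper: rewrite an attribute list to the release 77 names,
# passing unrelated attributes through unchanged.
sub upgrade_attributes {
  my @attrs = @_;
  return map { exists $RENAMED_IN_77{$_} ? $RENAMED_IN_77{$_} : $_ } @attrs;
}

print join(',', upgrade_attributes(qw(ensembl_gene_id external_id source_name))), "\n";
```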

Updated human otherfeatures db: New CCDS import (Human)

This release of the human gene set also includes 30,493 transcript models as part of an updated version (August 2014) of CCDS.

Updated mouse otherfeatures db: New CCDS import (Mouse)

This release of the mouse gene set also includes 23,861 transcript models as part of an updated version (August 2014) of CCDS.

Merge species: updated RefSeq gene import (Zebrafish, Human, Mouse, Rat, Pig)

RefSeq GFF3 annotation for human, mouse, rat, zebrafish and pig has been added to the respective otherfeatures databases. Please note that RefSeq annotates gene models on cDNA sequence rather than on the reference genome, so if users translate the RefSeq transcripts from the reference genome sequence, the translations may contain stop codons.

Human: updated cDNA alignments (all species)

A new cDNA database was created for e77: the latest set of human cDNAs (as of August 2014) from the European Nucleotide Archive and NCBI RefSeq (release nn) was aligned to the current genome using Exonerate.

Mouse: updated cDNA alignments (all species)

A new cDNA database was created for e77: the latest set of mouse cDNAs (as of August 2014) from the European Nucleotide Archive and NCBI RefSeq (release nn) was aligned to the current genome using Exonerate.

Phenotype data updates (Human, Mouse)

  • Human phenotype data will be updated from different sources including ClinVar and Decipher.
  • Mouse phenotype data from IMPC will be updated.

Structural variations (Human)

Added new studies and updated other studies from DGVa.

Update Sequence Ontology terms (all species)

We have updated the following terms from the Sequence Ontology:

  • nc_transcript_variant will be updated to non_coding_transcript_variant
  • non_coding_exon_variant will be updated to non_coding_transcript_exon_variant
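If you keep consequence terms from earlier releases, for example in stored VEP output, a simple lookup table is enough to bring them in line with the new names. This minimal Perl sketch assumes the terms are held in a comma-separated string; the helper name is hypothetical.

```perl
use strict;
use warnings;

# Sequence Ontology term renames listed for release 77.
my %SO_RENAMED = (
  nc_transcript_variant   => 'non_coding_transcript_variant',
  non_coding_exon_variant => 'non_coding_transcript_exon_variant',
);

# Hypothetical helper: update each term in a comma-separated consequence
# string, keeping terms that were not renamed.
sub update_consequences {
  my ($csq) = @_;
  return join ',', map { $SO_RENAMED{$_} // $_ } split /,/, $csq;
}

print update_consequences('intron_variant,nc_transcript_variant'), "\n";
```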

LRG Import (Human)

The latest version of the Locus Reference Genomic (LRG) dataset has been imported.

Improved GeneAdaptor::fetch_nearest_Gene_by_Feature() (all species)

We are improving our code for retrieving the nearest gene to any Ensembl feature.
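The idea behind a nearest-gene lookup can be shown with plain coordinates. The Perl sketch below is not the GeneAdaptor implementation: it is a hypothetical linear scan that, for a feature and a set of genes on the same sequence region, returns the gene with the smallest gap between intervals, treating overlap as distance 0 and ignoring strand. The coordinates are invented.

```perl
use strict;
use warnings;

# Hypothetical helper: gap between closed intervals [s1,e1] and [s2,e2]
# on the same sequence region; 0 when they overlap.
sub interval_distance {
  my ($s1, $e1, $s2, $e2) = @_;
  return $s2 - $e1 if $s2 > $e1;   # second interval lies after the first
  return $s1 - $e2 if $s1 > $e2;   # second interval lies before the first
  return 0;                        # intervals overlap
}

# Hypothetical helper: return the gene (hashref with id/start/end) closest
# to the feature; on a tie, the first gene seen wins.
sub nearest_gene {
  my ($feature, @genes) = @_;
  my ($best, $best_dist);
  for my $g (@genes) {
    my $d = interval_distance($feature->{start}, $feature->{end}, $g->{start}, $g->{end});
    ($best, $best_dist) = ($g, $d) if !defined $best_dist || $d < $best_dist;
  }
  return $best;
}

my @genes = (
  { id => 'geneA', start => 100,  end => 500 },
  { id => 'geneB', start => 1200, end => 1800 },
);
print nearest_gene({ start => 900, end => 950 }, @genes)->{id}, "\n";
```

A linear scan is fine for a handful of genes; a database-backed lookup would restrict the candidates to a window around the feature first.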

patch_76_77_a.sql - schema_version update in production db (all species)

Update schema_version in production database to 77.

Stable ID lookup (all species)

A stable ID lookup is now provided for the REST services. It includes lookup of RefSeq and CCDS entries.
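A minimal Perl sketch of calling the stable ID lookup, assuming the public REST server at https://rest.ensembl.org and its /lookup/id endpoint (check the REST documentation for the server you use). It relies on the core HTTP::Tiny module; fetching over HTTPS additionally needs IO::Socket::SSL. The helper names are hypothetical, and only the URL construction runs without network access.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Assumed public REST server; change this for a mirror or local install.
my $SERVER = 'https://rest.ensembl.org';

# Hypothetical helper: build the lookup URL for a stable ID, asking for JSON.
sub lookup_url {
  my ($stable_id) = @_;
  die "empty stable ID\n" unless defined $stable_id && length $stable_id;
  return "$SERVER/lookup/id/$stable_id?content-type=application/json";
}

# Hypothetical helper: fetch the JSON record for a stable ID; dies on failure.
sub lookup_json {
  my ($stable_id) = @_;
  my $res = HTTP::Tiny->new->get(lookup_url($stable_id));
  die "lookup failed: $res->{status} $res->{reason}\n" unless $res->{success};
  return $res->{content};
}

# With network access: print lookup_json('ENSG00000139618');
print lookup_url('ENSG00000139618'), "\n";
```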

patch_76_77_a.sql - schema_version update in ontology db (all species)

Update schema_version in meta table to 77.

External database references update (all species)

Xrefs have been updated for:

human, rat, anolis, cow, dog, Ciona intestinalis, horse, chicken, gorilla, macaque, opossum, platypus, medaka, sheep, chimp, lamprey, pig, zebra finch, shrew

Introduce new variation class (all species)

Add genetic_marker as a new variation class.

Vervet Monkey otherfeatures database (Vervet-AGM)

The longest translations of human Ensembl genes have been aligned to ChlSab1.1 to generate Vervet Monkey gene models, which are available through the website and the otherfeatures database.

Amazon Molly genebuild (Amazon molly)

The Amazon molly gene set has been improved relative to the one in the previous release (e76).

Add variation_attrib table (all species)

Add variation_attrib table.

For now this will be used by Ensembl Genomes to link homeologous variants on polyploid genomes, although the main Ensembl site may find uses for it in future.

FASTA & GTF dumps (all species)

FASTA & GTF dumps for all the species

External reference projection (all species)

Projection of Gene Ontology (GO) identifiers and gene names to all species.

EMBL and Genbank Dumps (all species)

EMBL and Genbank dumps for all species.

8-way primate EPO multiple alignments (all species)

 callithrix jacchus
 chlorocebus sabaeus
 gorilla gorilla
 homo sapiens
 macaca mulatta
 pan troglodytes
 papio anubis
 pongo abelii

17-way mammal EPO multiple alignments (all species)

 bos taurus
 callithrix jacchus
 canis familiaris
 chlorocebus sabaeus
 equus caballus
 felis catus
 gorilla gorilla
 homo sapiens
 macaca mulatta
 mus musculus
 oryctolagus cuniculus
 ovis aries
 pan troglodytes
 papio anubis
 pongo abelii
 rattus norvegicus
 sus scrofa

39-way mammal low-coverage EPO multiple alignments (all species)

 ailuropoda melanoleuca
 bos taurus
 callithrix jacchus
 canis familiaris
 cavia porcellus
 chlorocebus sabaeus
 choloepus hoffmanni
 dasypus novemcinctus
 dipodomys ordii
 echinops telfairi
 equus caballus
 erinaceus europaeus
 felis catus
 gorilla gorilla
 homo sapiens
 ictidomys tridecemlineatus
 loxodonta africana
 macaca mulatta
 microcebus murinus
 mustela putorius furo
 mus musculus
 myotis lucifugus
 nomascus leucogenys
 ochotona princeps
 oryctolagus cuniculus
 otolemur garnettii
 ovis aries
 pan troglodytes
 papio anubis
 pongo abelii
 procavia capensis
 pteropus vampyrus
 rattus norvegicus
 sorex araneus
 sus scrofa
 tarsius syrichta
 tupaia belangeri
 tursiops truncatus
 vicugna pacos

Pairwise alignments (Human, Vervet-AGM)

LastZ: human and vervet monkey (H.sap-C.sab (on H.sap))

Citation data (all species)

Citation data will be updated from Europe PMC and UCSC. Cited variants will now be flagged when they fail standard QC filters but will still be displayed in the usual tracks.

A new column 'display' will be added to the variation table to facilitate this.

dbSNP SubSNP ids no longer held as synonyms (multiple species)

dbSNP SubSNP ids will no longer be held as synonyms. They will be retained in allele records and retrieval of variants by SubSNP id through the web site will still be supported.

23-way amniota-pecan multiple alignments (all species)

 macaca_mulatta
 ornithorhynchus_anatinus
 monodelphis_domestica
 pongo_abelii
 equus_caballus
 taeniopygia_guttata
 oryctolagus_cuniculus
 anolis_carolinensis
 meleagris_gallopavo
 callithrix_jacchus
 bos_taurus
 gorilla_gorilla
 pan_troglodytes
 sus_scrofa
 mus_musculus
 canis_familiaris
 felis_catus
 rattus_norvegicus
 gallus_gallus
 ovis_aries
 homo_sapiens
 papio_anubis
 chlorocebus_sabaeus

Synteny (Human, Vervet-AGM)

SYNTENY: homo_sapiens(GRCh38) - chlorocebus_sabeus(ChlSab1.1)

Primate ancestral alleles (all species)

 macaca_mulatta
 pongo_abelii
 callithrix_jacchus
 gorilla_gorilla
 pan_troglodytes
 homo_sapiens
 papio_anubis
 chlorocebus_sabaeus

Age of Base (Human)

BigBed file for the human Age of Base track.

Compara dumps (all species)

EMF / Fasta / OrthoXML / PhyloXML dumps for ProteinTrees
EMF / Fasta / OrthoXML / PhyloXML dumps for ncRNAtrees
PhyloXML dumps for CAFE ProteinTrees
PhyloXML dumps for CAFE ncRNAtrees
EMF and MAF dumps for epo_39_eutherian_mammals
MAF dumps for epo_8_primates
EMF and MAF dumps for pecan_23_amniota
MAF dumps for epo_17_eutherian_mammals
BED files for constrained elements
MAF dumps for H.sap-C.sab LASTZ
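For users working with the MAF dumps, a quick sanity check is to count alignment blocks and list the species present. The minimal Perl sketch below assumes sequence names on "s" lines have the form species.region; that naming is an assumption about the dumps, so adjust the split if yours differ. The helper name and the tiny MAF fragment are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: count "a" (alignment block) lines and collect the
# species prefix of each "s" (sequence) line, assuming "species.region" names.
sub summarise_maf {
  my ($text) = @_;
  my ($blocks, %species) = (0);
  for my $line (split /\n/, $text) {
    if ($line =~ /^a\b/) {
      $blocks++;
    } elsif ($line =~ /^s\s+(\S+)/) {
      my $src = $1;
      my ($sp) = split /\./, $src;
      $species{$sp}++;
    }
  }
  return ($blocks, sort keys %species);
}

# Made-up two-block MAF fragment.
my $maf = join "\n",
  '##maf version=1',
  'a score=0.0',
  's homo_sapiens.1 100 5 + 1000000 ACGTA',
  's pan_troglodytes.1 98 5 + 1000000 ACGTT',
  '',
  'a score=0.0',
  's homo_sapiens.2 200 4 + 1000000 GGCC',
  's gorilla_gorilla.2 190 4 + 1000000 GGCA',
  '';
my ($n, @sp) = summarise_maf($maf);
print "$n blocks: @sp\n";
```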

ncRNAtrees and homologies (all species)

Classification based on Rfam models (v11.0)
Multiple sequence alignments with Infernal
Phylogenetic reconstruction using RAxML
Phylogenetic reconstruction using FastTree2 and RAxML-Light for very big families
Additional multiple sequence alignments with Prank (w/ genomic flanks)
Additional phylogenetic reconstruction using PhyML and NJ
Phylogenetic tree merging using TreeBeST
Per family gene dynamics using CAFE
Homology inference
Secondary structure plots

Protein Families (all species)

Updated MCL families including all Ensembl transcript isoforms (including human non-reference haplotypes) and newest Uniprot Metazoa.

 -- Getting distances by NCBI BlastP (v.2.2.28+)
 -- Clustering by MCL (v.12-135)
 -- Multiple Sequence Alignments with MAFFT (v.7.113)
 -- Family stable ID mapping

ProteinTrees and homologies (all species)

GeneTrees (protein-coding) with new/updated genebuilds and assemblies

 -- all-vs-all blastp (ncbi-blast-2.2.27+)
 -- Clustering using hcluster_sg
 -- Multiple sequence alignments using MCoffee (Version_9.03.r1318) or Mafft (mafft-7.017)
 -- Phylogenetic reconstruction using TreeBeST
 -- Homology inference
 -- Pairwise gene-based dN/dS scores for high coverage species pairs only (both on orthologues and paralogues) (codeml/PAML v4.3)
 -- GeneTree stable ID mapping
 -- Per family gene dynamics using CAFE (v2.2)

patch_76_77_a.sql - schema_version update (all species)

Schema update to 77

Microarray Mapping (all species)

Microarray mappings and transcript xrefs have been updated for those species that have had a genome assembly or gene build update.

Retirement of archive 64 (all species)

This release cycle we will be retiring archive 64 (September 2011) in accordance with our three-year rolling retirement policy. The data will remain available on our public database server; only the web interface will be removed.

Add Sequence Ontology type to regulatory feature consequences (all species)

The SO type for each regulatory feature overlapped by a variant will be reported both on the Variation web page and the VEP results.

patch_76_77_b.sql|CTCF feature_type update (all species)

The FeatureType "CTCF" (class "regulatory feature") has been renamed to "CTCF Binding Site".

patch_76_77_c.sql|Correct mirna so_name and accession in feature_type (all species)

Corrects the miRNA so_name and accession values in the feature_type table; in some cases the so_name and accession had been swapped.

patch_76_77_d.sql|Fix erroneous feature_type_id in mirna_target_feature (all species)

A few records had an incorrect feature_type_id assigned.
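Sites that maintain local copies of these databases apply such patches in sequence. As a sketch, assuming patch files follow the patch_&lt;from&gt;_&lt;to&gt;_&lt;letter&gt;.sql naming used in these notes and are applied in letter order, the hypothetical Perl helpers below pick out the patches for a given release and sort them.

```perl
use strict;
use warnings;

# Hypothetical helper: split "patch_76_77_b.sql" into (76, 77, 'b');
# returns an empty list when the name does not follow that pattern.
sub parse_patch {
  my ($name) = @_;
  return $name =~ /^patch_(\d+)_(\d+)_([a-z])\.sql$/ ? ($1, $2, $3) : ();
}

# Hypothetical helper: keep patches that target $release, sorted by letter
# (assumed application order).
sub patches_for_release {
  my ($release, @files) = @_;
  my @wanted = grep { my @p = parse_patch($_); @p && $p[1] == $release } @files;
  return sort { (parse_patch($a))[2] cmp (parse_patch($b))[2] } @wanted;
}

my @files = qw(patch_76_77_d.sql patch_75_76_a.sql patch_76_77_b.sql patch_76_77_a.sql);
print join(' ', patches_for_release(77, @files)), "\n";
```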

BioPerl upgrade (all species)

Important: This change only affects developers of Ensembl-powered websites, not general users of our code.

In release 77 we have upgraded to BioPerl 1.6.1 in order to support all the latest features of Bio::AlignIO. This version has long been supported by the Ensembl API, but the old BLAST interface depended on features in version 1.2.3, so we were unable to upgrade the web servers until the new interface was released in Ensembl 76. There are therefore two strategies available to developers:

1. If you are still using the old BLAST interface, do not upgrade to BioPerl 1.6.1. However, if you upgrade your site to Ensembl 77, you will need to configure the export interface to use the formats available in the older version of Bio::AlignIO.

Create a file in your plugin with namespace EnsEMBL::Web::Component::DataExport::Alignments and add the following method:

sub alignment_formats {
  ### Configure this list to match what's available
  ### in the installed version of BioPerl
  my $self = shift;
  return qw(CLUSTALW FASTA Mega MSF Nexus Pfam Phylip PSI Selex);
}

2. If you don't use BLAST or have upgraded to the new interface, you should upgrade BioPerl and reconfigure your BioPerl location if it has changed. For example, if you haven't installed it in /path/to/ensembl/bioperl-live, look for the following line in your SiteDefs.pm file and set it to your BioPerl path:

  $SiteDefs::BIOPERL_DIR                    = '/path/to/bioperl-live';

patch_76_77_a.sql - schema_version update (all species)

Update schema_version in meta table to 77.

Ensembl VM Build (all species)

The Ensembl Virtual Machine appliance will be updated to version 77.

Clones for sheep (Sheep)

Clone track for the sheep assembly

New RNASeq data matrix configuration (all species)

The configuration panel for RNASeq data has been changed to a matrix, making it easy to turn tracks on or off together.
