
News for Ensembl Release 91 (December 2017)

News categories

New web displays and tools

Validation of uploaded files (all species)

Validation of uploaded files now returns error messages that detail where validation failed, helping users to work out where their data format is incorrect.

Regulatory build track (Human, Mouse)

The scrolling Region in Detail view now has the Regulatory Build track available (and turned on by default) in those species that have this data.

New species, assemblies and genebuilds

New Primate Species (multiple species)

In this release we are adding 12 new primate species to Ensembl and updating 6 existing species with new assemblies (gorilla, gibbon, mouse lemur, chimp, tarsier, and baboon).

The new genomes are:

aotus nancymaae (Nancy Ma's night monkey)           

cebus capucinus (White-headed capuchin)      

cercocebus atys (Sooty mangabey)      

colobus angolensis palliatus (Angola colobus)      

macaca fascicularis (Crab-eating macaque)      

macaca nemestrina (Southern pig-tailed macaque)      

mandrillus leucophaeus (Drill)

pan paniscus (Bonobo)        

propithecus coquereli (Coquerel's sifaka)

rhinopithecus bieti (Black snub-nosed monkey)      

rhinopithecus roxellana (Golden snub-nosed monkey)

saimiri boliviensis boliviensis (Black-capped squirrel monkey) 

Update of cat assembly and genebuild to Felis_catus_8.0 (Cat)

The Felis_catus_8.0 assembly has been loaded and a new gene set has been created using a combination of RNA-seq data from 28 tissues and UniProt proteins (mainly mammalian proteins with protein existence levels 1 and 2).

Mouse: update to Ensembl-Havana GENCODE gene set (Mouse)

Updated Ensembl-Havana mouse gene set. This gene set is a merge of complete Ensembl gene models and the latest Havana gene annotation. All CCDS genes are included in this gene set.

New variation data

COSMIC data update (Human)

Imported cancer data from COSMIC version 82.

This import excludes COSMIC alleles, populations and mutation types.

HGMD-Public dataset (Human)

HGMD data will be updated to version 2017.2 (June 2017)

New regulation data

New and updated probe mapping data for primates (multiple species)

In this release we are adding 17 new funcgen databases for primate species and updating 2 existing ones (chimpanzee and macaque).

The new and updated funcgen databases cover the following species:

aotus nancymaae (Ma's night monkey)

carlito syrichta (Tarsier)       

cebus capucinus (White-headed capuchin)      

cercocebus atys (Sooty mangabey)      

colobus angolensis palliatus (Angola colobus)

gorilla gorilla (Gorilla)

macaca fascicularis (Crab-eating macaque)      

macaca mulatta (Macaque)

macaca nemestrina (Southern pig-tailed macaque)      

mandrillus leucophaeus (Drill)

microcebus murinus (Mouse lemur)

nomascus leucogenys (Gibbon)

pan paniscus (Bonobo)

pan troglodytes (Chimpanzee)

papio anubis (Olive baboon)

propithecus coquereli (Coquerel's sifaka)

rhinopithecus bieti (Black snub-nosed monkey)      

rhinopithecus roxellana (Golden snub-nosed monkey)

saimiri boliviensis boliviensis (Squirrel monkey)

Microarray Probe Mapping Update (Caenorhabditis elegans, Guinea Pig, Fruitfly, Human, Mouse)

Updated probe mappings for:



guinea pig

drosophila melanogaster

caenorhabditis elegans

New alignments

Mouse: updated cDNA alignments (all species)

A new cDNA database will be created for e91: the latest set of cDNAs for mouse (as of October 2017) from the European Nucleotide Archive and NCBI RefSeq will be aligned to the current genome using Exonerate.

Human: updated cDNA alignments (all species)

A new cDNA database will be created for e91: the latest set of cDNAs for human (as of October 2017) from the European Nucleotide Archive and NCBI RefSeq will be aligned to the current genome using Exonerate.

API and schema changes

patch_90_91_(b,c,d).sql - DB Schema update: add align_type column (all species)

Adds an align_type column to the dna_align_feature and protein_align_feature tables to support multiple alignment types, and removes the external_data column from dna_align_feature.

Other updates


ncRNAtrees and homologies (all species)

  • Classification based on Rfam models (v12.1)
  • Multiple sequence alignments with Infernal
  • Phylogenetic reconstruction using RAxML
  • Phylogenetic reconstruction using FastTree2 and ExaML for very big families
  • Additional multiple sequence alignments with Prank (w/ genomic flanks)
  • Additional phylogenetic reconstruction using PhyML and NJ
  • Phylogenetic tree merging using TreeBeST
  • Per family gene dynamics using CAFE
  • Homology inference
  • Secondary structure plots

patch_90_91_a.sql - Schema version update (all species)

90 -> 91

Protein Families (all species)

Updated HMM families including all Ensembl transcript isoforms (including human non-reference haplotypes) and the newest UniProt Metazoa.

 -- Clustering by PantherScore (based on Ensembl HMM library)
 -- Multiple Sequence Alignments with MAFFT (v.7.221)

ProteinTrees and homologies (all species)

GeneTrees (protein-coding) with new/updated genebuilds and assemblies

 -- all-vs-all blastp (ncbi-blast-2.2.30+)
 -- Clustering using hcluster_sg
 -- Multiple sequence alignments using MCoffee (Version_9.03.r1318) or Mafft (mafft-7.221)
 -- Phylogenetic reconstruction using TreeBeST
 -- Homology inference
 -- Pairwise gene-based dN/dS scores for high coverage species pairs only (both on orthologues and paralogues) (codeml/PAML v4.3)
 -- GeneTree stable ID mapping
 -- Per family gene dynamics using CAFE (v2.2)
 -- Computation of pairwise gene-order conservation score
 -- Comparison of orthologies with whole-genome alignments
 -- High-confidence calls
 -- Use of cd-hit to remove redundancy in blast db

Recompute multiple alignments (all species)

EPOs containing updated species should be recomputed:

  • mammals
  • primates

EPO low coverage:

  • mammal

Pecan alignments:

  • amniotes (only to update new assemblies)

LastZ alignments (all species)

All new and updated primates

Human V Cat

Dog V Cat


External database references update (multiple species)

Xrefs updates for: homo_sapiens (human), mus_musculus (mouse), equus_caballus (horse), bos_taurus (cow), canis_familiaris (dog), felis_catus (cat), carlito_syrichta (tarsier), gorilla_gorilla (gorilla), microcebus_murinus (mouse lemur), nomascus_leucogenys (gibbon), papio_anubis (baboon), pan_troglodytes (chimpanzee), cebus_capucinus (capuchin), cercocebus_atys (sooty mangabey), colobus_angolensis_palliatus (angola colobus), macaca_fascicularis (crab-eating macaque), macaca_nemestrina (pig-tailed macaque), mandrillus_leucophaeus (drill), pan_paniscus (bonobo), propithecus_coquereli (coquerel's sifaka), rhinopithecus_bieti (black snub-nosed monkey), rhinopithecus_roxellana (golden snub-nosed monkey), saimiri_boliviensis_boliviensis (bolivian squirrel monkey), aotus_nancymaae (ma's night monkey).


D. melanogaster annotation update (Fruitfly)

Gene set and other annotations updated to data from FlyBase release FB2017_04 (dmel_r6.17).


Schema patches to remove Regulation's coord_system & seq_region tables (all species)

patch_90_91_a.sql - Update schema_version in meta table to 91

patch_90_91_b.sql - Remove sequence regions from previous releases

patch_90_91_c.sql - Translate sequence region ids of regulatory features

patch_90_91_d.sql - Replace regulatory features with updated ones

patch_90_91_e.sql - Translate sequence region ids of segmentation features

patch_90_91_f.sql - Replace segmentation features with updated ones

patch_90_91_g.sql - Translate sequence region ids of probe features

patch_90_91_h.sql - Replace probe features with updated ones

patch_90_91_i.sql - Translate sequence region ids of annotated features

patch_90_91_j.sql - Replace annotated features with updated ones

patch_90_91_k.sql - Translate sequence region ids of external features

patch_90_91_l.sql - Replace external features with updated ones

patch_90_91_m.sql - Translate sequence region ids of miRNA target features

patch_90_91_n.sql - Replace miRNA target features with updated ones

patch_90_91_o.sql - Translate sequence region ids of motif features

patch_90_91_p.sql - Replace motif features with updated ones

patch_90_91_q.sql - Drop seq_region table

patch_90_91_r.sql - Translate coord_system_ids in meta_coord table

patch_90_91_s.sql - Replace meta coord table with the updated table

Further schema patches (all species)

patch_90_91_u.sql - Set default gender to unknown for epigenomes

patch_90_91_v.sql - Create read_file table and populate it

patch_90_91_w.sql - Create read_file_experimental_configuration table and populate it

patch_90_91_x.sql - Rename result_set to alignment in various tables and columns

patch_90_91_y.sql - Rename annotated_feature to peak

patch_90_91_z.sql - Drop input_subset table

patch_90_91_za.sql - Move peak_callings from feature_set to peak_calling

patch_90_91_zb.sql - Rename another table

patch_90_91_zc.sql - Remove peak_callings from the feature_set table

patch_90_91_zd.sql - Drop data_set table

patch_90_91_ze.sql - Drop supporting_set table

patch_90_91_zf.sql - Drop status tables

patch_90_91_zg.sql - Rename table dbfile_registry to data_file and change the way alignments link to it

patch_90_91_zh.sql - Add new columns to read_file_experimental_configuration table

patch_90_91_zi.sql - Create probe_id index on probe_transcript table


Updated mouse otherfeatures db: New CCDS import (Mouse)

The latest CCDS mouse set will be imported.

Bug fix: Missing TMHs for Human (Human)

Known bug fixed:

An issue with TMHs arose when we ran InterProScan, so we excluded them. This has been fixed in e91.

Fix stable id history (Chicken)

Some stable ids that have been removed are not correctly labelled in the database, which causes the id history to fail.

Change the opossum assembly name to reflect the actual name (Opossum)

The actual name of the opossum assembly is monDom5, not BROADO5, which is the internal name. Moving to monDom5 allows users to search for the assembly in INSDC databases. BROADO5 is now an alias for opossum.

Otherfeatures dbs for 2 rodents (Northern American deer mouse, Chinese hamster CHOK1GS)

Two otherfeatures dbs for rodents.

Fixing stable ids in the external data database (Zebrafish)

Stable ids were set incorrectly between releases 89 and 90. This fix will revert all stable ids to their state in release 89.

The RefSeq GFF3 import is the most important beneficiary of the set


Ensembl 91 mart databases (all species)

  • Ensembl Genes 91
    • Addition of new primates and updated assemblies 
  • Mouse Genes 91
  • Ensembl Variation 91
  • Ensembl Regulation 91
    • Renamed datasets "hsapiens_annotated_feature" and "mmusculus_annotated_feature" to "hsapiens_peak" and "mmusculus_peak"
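
Scripts that still refer to the old Regulation dataset names may need a small translation step. The following is a minimal sketch, assuming only the two renames listed above; the function name `current_dataset_name` is hypothetical and not part of any Ensembl or BioMart API.

```python
# Old -> new Ensembl Regulation mart dataset names, as listed in the
# release 91 notes. Only these two renames are taken from the notes.
RENAMED_DATASETS = {
    "hsapiens_annotated_feature": "hsapiens_peak",
    "mmusculus_annotated_feature": "mmusculus_peak",
}


def current_dataset_name(name: str) -> str:
    """Return the release 91 name for a dataset, or the name unchanged.

    Hypothetical helper: names that are not in the rename table are
    passed through untouched rather than guessed at.
    """
    return RENAMED_DATASETS.get(name, name)
```

A script could call `current_dataset_name("hsapiens_annotated_feature")` before building a query, so that saved configurations keep working against the renamed datasets.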


Structural variants (multiple species)

  • Added new studies from DGVa
  • Updated some of the existing studies from DGVa

Phenotype data updates (all species)

  • Updated Human phenotype data from different sources including NHGRI-EBI GWAS, OMIM, ClinVar, UniProt, Cosmic Gene Census, DDG2P, MIM Morbid and Orphanet.
  • OMIA data for several species
  • AnimalQTL data for several species
  • RGD data for Rat
  • ZFIN data for Zebrafish
  • IMPC data for Mouse
  • MGI data for Mouse

PharmGKB links added (Human)

Links to the PharmGKB website will be provided from our human variant pages.

New dbSNP data for chicken (Chicken)

Loaded dbSNP build 150 for chicken

New dbSNP data for cow (Cow)

Loaded dbSNP build 150 for cow

New dbSNP data for horse (Horse)

Loaded dbSNP build 150 for horse

New dbSNP data for pig (Pig)

Loaded dbSNP build 150 for pig

New dbSNP data for sheep (Sheep)

Loaded dbSNP build 150 for sheep

New dbSNP data for zebrafish (Zebrafish)

Loaded dbSNP build 150 for zebrafish

New dbSNP data for cat (Cat)

Loaded dbSNP build 148 for cat

New dbSNP data for mouse (Mouse)

Loaded dbSNP build 150 for mouse

New dbSNP data for macaque (Macaque)

Loaded dbSNP build 150 for macaque

LD web tool (Human)

We developed a new web tool for linkage disequilibrium (LD) calculation. The tool can calculate LD for all pairs of variants in a given region, for all pairs of variants from a given list, or between a given variant and all variants within a given window size of it. LD can be computed for human variants represented in the 1000 Genomes Project.
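
For readers unfamiliar with the statistics involved, the sketch below computes the two most common pairwise LD measures, r² and D', from phased haplotypes using their standard textbook definitions. It is an illustration only: the function name `pairwise_ld` and the 0/1 haplotype encoding are assumptions, and nothing here is taken from the implementation of the Ensembl tool.

```python
from typing import Iterable, Optional, Tuple


def pairwise_ld(
    haplotypes: Iterable[Tuple[int, int]],
) -> Tuple[Optional[float], Optional[float]]:
    """Return (r_squared, d_prime) for two biallelic variants.

    Each haplotype is a pair (a, b), where a and b are 0 (reference)
    or 1 (alternate) at the first and second variant. This encoding is
    an assumption made for the example.
    """
    haps = list(haplotypes)
    n = len(haps)
    if n == 0:
        raise ValueError("at least one haplotype is required")

    # Allele and haplotype frequencies for the alternate alleles.
    p_a = sum(a for a, _ in haps) / n
    p_b = sum(b for _, b in haps) / n
    p_ab = sum(1 for a, b in haps if a == 1 and b == 1) / n

    # D: deviation of the observed haplotype frequency from independence.
    d = p_ab - p_a * p_b

    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        # At least one variant is monomorphic, so LD is undefined.
        return None, None

    r_squared = d * d / denom

    # D' scales |D| by its maximum possible value given the frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max

    return r_squared, d_prime
```

With four haplotypes in which the two alternate alleles always occur together, `pairwise_ld([(0, 0), (0, 0), (1, 1), (1, 1)])` returns `(1.0, 1.0)`; with one haplotype of each possible combination it returns `(0.0, 0.0)`.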


C. elegans annotation update (WS260) - Core (Caenorhabditis elegans)

Gene set and other annotations updated to data from WormBase release WS260.

Future Plans

Read about our future plans on our blog!