News for Ensembl Release 89 (May 2017)
- New species, assemblies and genebuilds
- New regulation data
- Other updates
New species, assemblies and genebuilds
Mouse: update to Ensembl-Havana GENCODE gene set (Mouse)
Updated Ensembl-Havana mouse gene set. This gene set is a merge of complete Ensembl gene models and the latest Havana gene annotation. All CCDS genes are included in this gene set.
Human: updated cDNA alignments (Human)
A new cDNA database will be created for Ensembl 89. The latest set of human cDNAs (as of April 2017) from the European Nucleotide Archive and NCBI RefSeq will be aligned to the current genome using Exonerate.
Mouse: updated cDNA alignments (Mouse)
A new cDNA database will be created for Ensembl 89. The latest set of mouse cDNAs (as of April 2017) from the European Nucleotide Archive and NCBI RefSeq will be aligned to the current genome using Exonerate.
Mouse lemur: lincRNA update (Mouse Lemur)
Three lincRNAs have been removed from the mouse lemur gene set.
New regulation data
Microarray Probe Mapping Update (all species)
Update microarray probe mappings for all arrays of all species
Map array probes onto 15 mouse strains (all species)
Map array probes onto the below mouse strains:
ncRNAtrees and homologies (all species)
- Classification based on Rfam models (v12.1)
- Multiple sequence alignments with Infernal
- Phylogenetic reconstruction using RAxML
- Phylogenetic reconstruction using FastTree2 and ExaML for very big families
- Additional multiple sequence alignments with Prank (w/ genomic flanks)
- Additional phylogenetic reconstruction using PhyML and NJ
- Phylogenetic tree merging using TreeBeST
- Per family gene dynamics using CAFE
- Homology inference
- Secondary structure plots
patch_88_89_a.sql - Schema version update (all species)
88 -> 89
Protein Families (all species)
Updated HMM families including all Ensembl transcript isoforms (including human non-reference haplotypes) and the newest UniProt Metazoa.
-- Clustering by PantherScore (based on Ensembl HMM library)
-- Multiple Sequence Alignments with MAFFT (v.7.221)
ProteinTrees and homologies (all species)
GeneTrees (protein-coding) with new/updated genebuilds and assemblies
-- all-vs-all blastp (ncbi-blast-2.2.30+)
-- Clustering using hcluster_sg
-- Multiple sequence alignments using MCoffee (Version_9.03.r1318) or Mafft (mafft-7.221)
-- Phylogenetic reconstruction using TreeBeST
-- Homology inference
-- Pairwise gene-based dN/dS scores for high coverage species pairs only (both on orthologues and paralogues) (codeml/PAML v4.3)
-- GeneTree stable ID mapping
-- Per family gene dynamics using CAFE (v2.2)
-- Computation of pairwise gene-order conservation score
-- Comparison of orthologies with whole-genome alignments
-- High-confidence calls
GO terms for transcripts (all species)
GO terms have been introduced for some miRNAs. As a result, GO terms are now linked to transcripts rather than translations.
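For readers who look up GO annotation programmatically, the practical effect is that queries should be keyed on the transcript stable ID rather than the translation stable ID. The sketch below only builds a request URL; it assumes the public Ensembl REST service at rest.ensembl.org, its `xrefs/id` endpoint and the `external_db` parameter, none of which are described in these release notes. The stable ID is a placeholder, not a real transcript.

```python
from urllib.parse import quote, urlencode

# Assumption: the public Ensembl REST server (not mentioned in these notes).
REST_SERVER = "https://rest.ensembl.org"


def go_xref_url(transcript_stable_id: str) -> str:
    """Build an xrefs/id URL that asks only for GO cross-references.

    The ID passed in should be a transcript (ENST...) stable ID, since GO
    terms are attached to transcripts from release 89 onwards.
    """
    query = urlencode({"external_db": "GO", "content-type": "application/json"})
    return f"{REST_SERVER}/xrefs/id/{quote(transcript_stable_id)}?{query}"


# Placeholder ID for illustration only.
print(go_xref_url("ENST00000000000"))
```

Scripts that previously passed a translation (ENSP) stable ID to a lookup like this would need to switch to the corresponding transcript ID.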
External database references update (multiple species)
Xref updates for: homo_sapiens (human), mus_musculus (mouse), danio_rerio (zebrafish), loxodonta africana (elephant), notamacropus eugenii (wallaby), papio anubis (olive baboon), ochotona princeps (pika), procavia capensis (hyrax), pteropus vampyrus (megabat), tupaia belangeri (tree shrew), anas platyrhynchos (duck), dasypus novemcinctus (armadillo), ictidomys tridecemlineatus (squirrel), gasterosteus aculeatus (stickleback)
Vega xrefs retirement (Zebrafish, Human, Mouse, Rat, Pig)
Links to Vega resources have been removed for the following species: human, mouse, rat, zebrafish, pig
Correction of VISTA Enhancers (Human)
The VISTA Enhancers for human have been incorrectly mapped and will be updated.
Database schema changes (all species)
patch_88_89_a - Schema change
patch_88_89_b - Create table probe_seq
patch_88_89_c - Create table probe_feature_transcript
patch_88_89_d - Create table probe_transcript
patch_88_89_e - Create table probe_set_transcript
patch_88_89_f - Remove probe features from object_xref and xref table
patch_88_89_g - Remove probe mappings from the xref tables
patch_88_89_h - Remove probe set mappings from the xref tables.
patch_88_89_i - Add link columns to array table
patch_88_89_j - Added array_chip_id column to probe_set table
patch_88_89_k - Added probe_seq_id column to probe table
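The patch names above follow a regular pattern: source release, target release, then a letter. As a minimal sketch of how a local database could be checked against this list, the helper below parses the names and reports which expected patches are absent. The names `Patch`, `parse_patch` and `missing_patches` are hypothetical, and how the list of already-applied patches is obtained is left to the caller.

```python
import re
from typing import Iterable, List, NamedTuple


class Patch(NamedTuple):
    from_release: int
    to_release: int
    letter: str


# Matches names such as "patch_88_89_a" or "patch_88_89_a.sql".
PATCH_RE = re.compile(r"^patch_(\d+)_(\d+)_([a-z]+)(?:\.sql)?$")


def parse_patch(name: str) -> Patch:
    """Split a patch name into its release numbers and letter."""
    match = PATCH_RE.match(name)
    if match is None:
        raise ValueError(f"not a patch name: {name!r}")
    return Patch(int(match.group(1)), int(match.group(2)), match.group(3))


def missing_patches(expected: Iterable[str], applied: Iterable[str]) -> List[str]:
    """Return the expected patch names that are not in `applied`, in order."""
    applied_set = {parse_patch(name) for name in applied}
    ordered = sorted(expected, key=parse_patch)
    return [name for name in ordered if parse_patch(name) not in applied_set]


# The eleven patches listed for the 88 -> 89 schema change.
EXPECTED_89 = [f"patch_88_89_{letter}" for letter in "abcdefghijk"]

# Illustrative input: a database where only the first two patches were applied.
print(missing_patches(EXPECTED_89, ["patch_88_89_a.sql", "patch_88_89_b"]))
```

Comparing parsed tuples rather than raw strings means the check does not care whether a name carries the `.sql` suffix.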
Updated VISTA enhancers to newest version (Mouse)
Updated VISTA enhancers to newest version
Deprecate methods (all species)
The following methods have been deprecated and will be removed in Ensembl release 93
Human protein features for mappings between Ensembl proteins and PDB structures with chains (Human)
Protein features that represent the mapping between human Ensembl proteins (ENSP) and PDB protein structures (including their corresponding PDB chains) have been added to the Ensembl human core database under the "sifts_import" logic name. This data has been imported from SIFTS, which is a resource for residue-level mapping between UniProt and PDB, and from GIFTS, which is a database containing alignments between UniProt and Ensembl proteins.
Remove duplicated Genscan predictions (Platyfish)
Genscan predictions are duplicated, which makes it harder to process the ab initio file from the FTP site. The duplicates are removed in this release.
Ensembl 89 mart databases (all species)
- Ensembl Genes 89
- Updated Microarray probes/probesets for all the species
- Dataset "meugenii_gene_ensembl" was renamed to "neugenii_gene_ensembl"
- Dataset "tsyrichta_gene_ensembl" was renamed to "csyrichta_gene_ensembl"
- GO and GOSlim terms were moved from Translation to Transcript
- Mouse Genes 89
- Microarray probes/probesets added for all the mouse strains
- GO and GOSlim terms were moved from Translation to Transcript
- Ensembl Variation 89
- New filters for regulatory and motif consequence types
- Ensembl Regulation 89
- Updated VISTA Enhancers for human and mouse
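Scripts that hard-code the old mart dataset names will stop finding them after the renames listed above. One way to cope, shown here as a small sketch, is to translate names before building a query. The mapping contains only the two renames stated in this list; the function name `current_dataset_name` is hypothetical.

```python
# Dataset renames announced for the Ensembl Genes 89 mart.
RENAMED_DATASETS = {
    "meugenii_gene_ensembl": "neugenii_gene_ensembl",    # wallaby
    "tsyrichta_gene_ensembl": "csyrichta_gene_ensembl",  # tarsier
}


def current_dataset_name(name: str) -> str:
    """Return the release 89 name for a dataset, leaving other names unchanged."""
    return RENAMED_DATASETS.get(name, name)


for old_name in ("meugenii_gene_ensembl", "hsapiens_gene_ensembl"):
    print(old_name, "->", current_dataset_name(old_name))
```

Names that were not renamed pass through untouched, so the helper can be applied to every dataset name a script uses.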
Change of scientific name for tarsier (Tarsier)
The scientific name of the tarsier has changed from Tarsius syrichta to Carlito syrichta.
Change of scientific name for wallaby (Wallaby)
The scientific name of the wallaby has changed from Macropus eugenii to Notamacropus eugenii.
COSMIC data update (Human)
Imported cancer data from COSMIC version 80.
This import excludes COSMIC alleles, populations and mutation types.
Structural variants (multiple species)
- Added new studies from DGVa
- Updated some of the existing studies from DGVa
- Updated the 1000 Genomes study, which now includes structural variants on the X and Y chromosomes
Phenotype data updates (all species)
- Updated human phenotype data from different sources including NHGRI-EBI GWAS, OMIM, ClinVar, UniProt, COSMIC Gene Census, DDG2P, MIM Morbid and Orphanet.
- OMIA data for several species
- AnimalQTL data for several species
- RGD data for Rat
- ZFIN data for Zebrafish
- IMPC data for Mouse
- MGI data for Mouse
strain_gtype_poly table to be dropped (all species)
The strain_gtype_poly table will be dropped.
PhenCode records merged with dbSNP records (Human)
PhenCode records will be merged with dbSNP records. PhenCode names will be available as variation synonyms.