News for Human Ensembl Release 89 (May 2017)

News categories

New species, assemblies and genebuilds

Human: updated cDNA alignments

A new cDNA database will be created for Ensembl 89: the latest set of human cDNAs (as of April 2017) from the European Nucleotide Archive and NCBI RefSeq will be aligned to the current genome using Exonerate.

New regulation data

Microarray Probe Mapping Update

Updated microarray probe mappings for all arrays of all species.

Map array probes onto 15 mouse strains

Array probes have been mapped onto the following mouse strains:

mus_musculus_129s1svimj
mus_musculus_aj
mus_musculus_akrj
mus_musculus_balbcj
mus_musculus_c3hhej
mus_musculus_c57bl6nj
mus_musculus_casteij
mus_musculus_cbaj
mus_musculus_dba2j
mus_musculus_fvbnj
mus_musculus_lpj
mus_musculus_nodshiltj
mus_musculus_nzohlltj
mus_musculus_pwkphj
mus_musculus_wsbeij

Other updates

Compara

ncRNAtrees and homologies

  • Classification based on Rfam models (v12.1)
  • Multiple sequence alignments with Infernal
  • Phylogenetic reconstruction using RAxML
  • Phylogenetic reconstruction using FastTree2 and ExaML for very big families
  • Additional multiple sequence alignments with Prank (with genomic flanks)
  • Additional phylogenetic reconstruction using PhyML and NJ
  • Phylogenetic tree merging using TreeBeST
  • Per-family gene dynamics using CAFE
  • Homology inference
  • Secondary structure plots

patch_88_89_a.sql - Schema version update (88 -> 89)

Protein Families

Updated HMM families covering all Ensembl transcript isoforms (including human non-reference haplotypes) and the latest UniProt Metazoa release.

 -- Clustering by PantherScore (based on Ensembl HMM library)
 -- Multiple Sequence Alignments with MAFFT (v.7.221)

ProteinTrees and homologies

GeneTrees (protein-coding) with new/updated genebuilds and assemblies

 -- all-vs-all blastp (ncbi-blast-2.2.30+)
 -- Clustering using hcluster_sg
 -- Multiple sequence alignments using MCoffee (Version_9.03.r1318) or Mafft (mafft-7.221)
 -- Phylogenetic reconstruction using TreeBeST
 -- Homology inference
 -- Pairwise gene-based dN/dS scores for high coverage species pairs only (both on orthologues and paralogues) (codeml/PAML v4.3)
 -- GeneTree stable ID mapping
 -- Per-family gene dynamics using CAFE (v2.2)
 -- Computation of pairwise gene-order conservation scores
 -- Comparison of orthologies with whole-genome alignments
 -- High-confidence calls
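The gene-order conservation step can be pictured with a small sketch. It assumes a simplified scheme in which the two genes on either side of a gene are checked for orthologues among the neighbours of its partner gene, each match adding 25 points. The function names, single-chromosome gene lists and identifiers are hypothetical, strand and chromosome boundaries are ignored, and this is not the Compara implementation.

```python
from typing import List, Set, Tuple

def neighbours(order: List[str], gene: str, flank: int = 2) -> List[str]:
    """Up to `flank` genes on each side of `gene` in a chromosome-ordered gene list."""
    i = order.index(gene)
    return order[max(0, i - flank):i] + order[i + 1:i + 1 + flank]

def goc_score(gene_a: str, gene_b: str, order_a: List[str], order_b: List[str],
              orthologues: Set[Tuple[str, str]]) -> int:
    """Simplified GOC: 25 points for each neighbour of gene_a that is orthologous
    to some neighbour of gene_b, giving 0, 25, 50, 75 or 100."""
    near_b = neighbours(order_b, gene_b)
    hits = sum(1 for n in neighbours(order_a, gene_a)
               if any((n, m) in orthologues for m in near_b))
    return 25 * hits

# Hypothetical toy genomes: species A genes a1..a5 and species B genes b1..b5.
order_a = ["a1", "a2", "a3", "a4", "a5"]
order_b = ["b1", "b2", "b3", "b4", "b5"]
orthologues = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")}
print(goc_score("a3", "b3", order_a, order_b, orthologues))  # 75 (a5 has no orthologue)
```

Only three of the four neighbours of `a3` have an orthologue next to `b3`, so the pair scores 75.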

Core

External database references update

Xref updates for: homo_sapiens (human), mus_musculus (mouse), danio_rerio (zebrafish), loxodonta_africana (elephant), notamacropus_eugenii (wallaby), papio_anubis (olive baboon), ochotona_princeps (pika), procavia_capensis (hyrax), pteropus_vampyrus (megabat), tupaia_belangeri (tree shrew), anas_platyrhynchos (duck), dasypus_novemcinctus (armadillo), ictidomys_tridecemlineatus (squirrel), gasterosteus_aculeatus (stickleback).

Vega xrefs retirement

Links to Vega resources have been removed for the following species: human, mouse, rat, zebrafish and pig.

GO terms for transcripts

GO terms have been introduced for some miRNAs. Because miRNAs are non-coding and have no translation, GO terms are now linked to transcripts rather than translations.
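The reason for the move can be shown with a minimal data-model sketch: if GO terms are keyed by translation, a transcript with no translation (such as a miRNA) has nowhere to hold them, whereas keying by transcript covers coding and non-coding transcripts alike. The class, function names, stable IDs and GO accessions below are hypothetical placeholders, not the Ensembl API or schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Transcript:
    stable_id: str
    biotype: str
    translation_id: Optional[str] = None  # None for non-coding transcripts such as miRNAs
    go_terms: List[str] = field(default_factory=list)

def go_by_translation(transcripts: List[Transcript]) -> Dict[str, List[str]]:
    """Old layout: GO terms keyed by translation, so untranslated transcripts are lost."""
    return {t.translation_id: t.go_terms for t in transcripts
            if t.translation_id is not None and t.go_terms}

def go_by_transcript(transcripts: List[Transcript]) -> Dict[str, List[str]]:
    """New layout: GO terms keyed by transcript, so miRNAs can carry them too."""
    return {t.stable_id: t.go_terms for t in transcripts if t.go_terms}

# Hypothetical identifiers and placeholder GO accessions, for illustration only.
coding = Transcript("TX_CODING", "protein_coding", "TL_CODING", ["GO:placeholder1"])
mirna = Transcript("TX_MIRNA", "miRNA", None, ["GO:placeholder2"])

print(go_by_translation([coding, mirna]))  # {'TL_CODING': ['GO:placeholder1']}
print(go_by_transcript([coding, mirna]))   # both transcripts, including TX_MIRNA
```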

Regulation

Correction of VISTA Enhancers

The human VISTA Enhancers were incorrectly mapped and will be updated.

Database schema changes

patch_88_89_a - Schema change

patch_88_89_b - Create table probe_seq

patch_88_89_c - Create table probe_feature_transcript

patch_88_89_d - Create table probe_transcript

patch_88_89_e - Create table probe_set_transcript

patch_88_89_f - Remove probe features from the object_xref and xref tables

patch_88_89_g - Remove probe mappings from the xref tables

patch_88_89_h - Remove probe set mappings from the xref tables

patch_88_89_i - Add link columns to array table

patch_88_89_j - Add array_chip_id column to probe_set table

patch_88_89_k - Add probe_seq_id column to probe table

Deprecated methods

The following methods have been deprecated and will be removed in Ensembl release 93:
Bio::EnsEMBL::Funcgen::Epigenome::tissue()
Bio::EnsEMBL::Funcgen::Epigenome::ontology_accession()

Genebuild

Human protein features for mappings between Ensembl proteins and PDB structures with chains

Protein features representing the mapping between human Ensembl proteins (ENSP) and PDB protein structures, including the corresponding PDB chains, have been added to the Ensembl human core database under the "sifts_import" logic name. The data were imported from SIFTS, a resource for residue-level mapping between UniProt and PDB, and from GIFTS, a database of alignments between UniProt and Ensembl proteins.
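Conceptually, an Ensembl-to-PDB mapping can be obtained by chaining two residue-level maps: Ensembl protein to UniProt (the kind of alignment GIFTS provides) and UniProt to a PDB chain (the kind SIFTS provides). The sketch below assumes both maps are plain dictionaries of 1-based residue positions and treats PDB residue numbers as integers (real PDB numbering can include insertion codes). The names are hypothetical and this is not the pipeline behind the "sifts_import" features; it simply collapses the composed map into contiguous segments, similar in spirit to range-based protein features.

```python
from typing import Dict, List, Tuple

PdbResidue = Tuple[str, int]  # (chain ID, residue number), simplified

def compose(ens_to_uniprot: Dict[int, int],
            uniprot_to_pdb: Dict[int, PdbResidue]) -> Dict[int, PdbResidue]:
    """Map Ensembl protein residues to PDB chain residues via UniProt.
    Residues that fall in a gap of either alignment are dropped."""
    return {e: uniprot_to_pdb[u] for e, u in ens_to_uniprot.items() if u in uniprot_to_pdb}

def to_segments(mapping: Dict[int, PdbResidue]) -> List[Tuple[int, int, str, int, int]]:
    """Collapse a residue map into (ens_start, ens_end, chain, pdb_start, pdb_end) runs
    that are contiguous in both sequences and stay on one chain."""
    segments: List[Tuple[int, int, str, int, int]] = []
    for e in sorted(mapping):
        chain, p = mapping[e]
        if segments:
            s_start, s_end, s_chain, p_start, p_end = segments[-1]
            if chain == s_chain and e == s_end + 1 and p == p_end + 1:
                segments[-1] = (s_start, e, chain, p_start, p)  # extend the current run
                continue
        segments.append((e, e, chain, p, p))  # start a new run
    return segments

# Hypothetical toy alignments.
ens_to_uniprot = {1: 5, 2: 6, 3: 7, 4: 8, 5: 9}
uniprot_to_pdb = {5: ("A", 101), 6: ("A", 102), 7: ("A", 103), 9: ("B", 1)}
residues = compose(ens_to_uniprot, uniprot_to_pdb)
print(to_segments(residues))  # [(1, 3, 'A', 101, 103), (5, 5, 'B', 1, 1)]
```

Residue 4 drops out because its UniProt position has no PDB counterpart, which splits the result into two segments.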

Production

Ensembl 89 mart databases

  • Ensembl Genes 89
    • Updated Microarray probes/probesets for all the species
    • Dataset "meugenii_gene_ensembl" was renamed to "neugenii_gene_ensembl"
    • Dataset "tsyrichta_gene_ensembl" was renamed to "csyrichta_gene_ensembl"
    • GO and GOSlim terms were moved from Translation to Transcript
  • Mouse Genes 89
    • Microarray probes/probesets added for all the mouse strains
    • GO and GOSlim terms were moved from Translation to Transcript
  • Ensembl Variation 89
    • New filters for regulatory and motif consequence types
  • Ensembl Regulation 89
    • Updated VISTA Enhancers for human and mouse
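Scripts that query the Ensembl Genes mart by dataset name need the two renamed datasets updated. A minimal sketch of a lookup table built from the renames listed above; the helper name is hypothetical.

```python
# Dataset renames in the Ensembl Genes 89 mart, as listed in these notes.
RENAMED_DATASETS = {
    "meugenii_gene_ensembl": "neugenii_gene_ensembl",
    "tsyrichta_gene_ensembl": "csyrichta_gene_ensembl",
}

def current_dataset_name(name: str) -> str:
    """Return the release-89 name for a mart dataset; names that were not renamed pass through."""
    return RENAMED_DATASETS.get(name, name)

for old in ["meugenii_gene_ensembl", "tsyrichta_gene_ensembl", "hsapiens_gene_ensembl"]:
    print(old, "->", current_dataset_name(old))
```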

Variation

COSMIC data update

Imported cancer data from COSMIC version 80.

This import excludes the COSMIC alleles, populations and mutation types.

Structural variants

  • Added new studies from DGVa
  • Updated some of the existing studies from DGVa
  • Updated the 1000 Genomes study, which now includes structural variants on the X and Y chromosomes

PhenCode records merged with dbSNP records

PhenCode records will be merged with dbSNP records. PhenCode names will be available as variation synonyms.
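A merge of this kind can be sketched as follows. It assumes records are matched on chromosome, position and allele string, and that PhenCode records without a dbSNP match are kept unchanged; both rules, the class and function names, and the identifiers are illustrative assumptions, since these notes do not describe the matching criteria.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Variant:
    name: str
    chrom: str
    pos: int
    alleles: str                      # e.g. "A/G"
    synonyms: List[str] = field(default_factory=list)

# Assumed matching key: (chromosome, position, allele string).
Key = Tuple[str, int, str]

def merge_phencode(dbsnp: List[Variant], phencode: List[Variant]) -> List[Variant]:
    """Fold each PhenCode record into the dbSNP record with the same key, keeping the
    PhenCode name as a synonym; unmatched PhenCode records are kept (an assumption)."""
    index: Dict[Key, Variant] = {(v.chrom, v.pos, v.alleles): v for v in dbsnp}
    unmatched: List[Variant] = []
    for p in phencode:
        target = index.get((p.chrom, p.pos, p.alleles))
        if target is None:
            unmatched.append(p)
        elif p.name not in target.synonyms:
            target.synonyms.append(p.name)
    return dbsnp + unmatched

# Hypothetical records for illustration.
snp = Variant("rs_example_1", "1", 1000, "A/G")
pc_match = Variant("PC_example_1", "1", 1000, "A/G")
pc_alone = Variant("PC_example_2", "1", 2000, "T/A")
merged = merge_phencode([snp], [pc_match, pc_alone])
print([(v.name, v.synonyms) for v in merged])
# [('rs_example_1', ['PC_example_1']), ('PC_example_2', [])]
```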

Phenotype data updates

  • Updated human phenotype data from several sources, including the NHGRI-EBI GWAS Catalog, OMIM, ClinVar, UniProt, COSMIC Cancer Gene Census, DDG2P, MIM Morbid and Orphanet.
  • OMIA data for several species
  • AnimalQTL data for several species
  • RGD data for Rat
  • ZFIN data for Zebrafish
  • IMPC data for Mouse
  • MGI data for Mouse

strain_gtype_poly table to be dropped

The strain_gtype_poly table will be dropped.
