News for Guinea Pig Ensembl Release 81 (July 2015)

News categories

New web displays and tools

Trackhub settings support

In this release we have added support for trackhub visibility settings; in other words, tracks that are turned on by default in the hub's trackDb.txt file should automatically be shown in Ensembl.

The only exception is for hubs that we configure internally, such as the Genome Reference Consortium GRIT hub. For these, all tracks will be hidden by default, but you can find them by searching in the Control Panel.
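As an illustration of what "turned on by default" means, the sketch below reads a trackDb.txt fragment and lists the tracks whose stanza sets a `visibility` other than `hide`. It is a minimal sketch, not the Ensembl implementation: the function name and the example hub content are hypothetical, and it assumes that a stanza with no `visibility` line is hidden and that composite or super-track inheritance can be ignored.

```python
import re


def default_visible_tracks(trackdb_text):
    """Return names of tracks whose stanza sets a visibility other than 'hide'."""
    visible = []
    # Stanzas in trackDb.txt are separated by one or more blank lines.
    for stanza in re.split(r"\n\s*\n", trackdb_text.strip()):
        settings = {}
        for line in stanza.splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            # Each setting is "<key> <value>" on one line.
            key, _, value = line.partition(" ")
            settings[key] = value.strip()
        # Assumption: a missing visibility setting means the track is hidden.
        if "track" in settings and settings.get("visibility", "hide") != "hide":
            visible.append(settings["track"])
    return visible


# Hypothetical hub content, for illustration only.
EXAMPLE = """\
track coverage
type bigWig
visibility full

track variants
type bigBed
visibility hide

track genes
type bigBed
"""

print(default_visible_tracks(EXAMPLE))  # ['coverage']
```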

Transcript sequence markup

Transcript sequences can now be marked up to show exons as alternating upper and lower case characters, rather than grey/blue text. Simply check the "Show exons as alternating upper/lower case" box in the "Configure this page" panel on Transcript cDNA or Transcript Protein pages.

This markup option will also carry over to the sequence export if RTF format is chosen.
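The effect of the alternating-case markup can be sketched as follows. This is an illustration only, not the website's code: the function name is hypothetical, and starting with upper case for the first exon is an assumption.

```python
def alternate_exon_case(exon_seqs):
    """Join exon sequences, switching between upper and lower case per exon."""
    parts = []
    for index, seq in enumerate(exon_seqs):
        # Assumption: exons 1, 3, 5, ... are upper case; 2, 4, 6, ... lower case.
        parts.append(seq.upper() if index % 2 == 0 else seq.lower())
    return "".join(parts)


print(alternate_exon_case(["atgGCC", "ttaaCG", "GGTtga"]))  # ATGGCCttaacgGGTTGA
```

Because the exon boundaries are encoded in the letters themselves, they survive export to formats that drop text colour.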

Other updates

Compara

Schema version update

80 -> 81

Pairwise alignments: human and mouse patches

  • Human is patched to GRCh38.p3, so

    • we will upload human_ref-to-human_patches alignments done by Genebuilders 
    • we will run human_patches-to-high_coverage_species lastz alignments

  • Mouse is patched to GRCm38.p4, so

    • we will upload mouse_ref-to-mouse_patches alignments done by Genebuilders 
    • we will run mouse_patches-to-high_coverage_species lastz alignments

ProteinTrees and homologies

GeneTrees (protein-coding) with new/updated genebuilds and assemblies

 -- all-vs-all blastp (ncbi-blast-2.2.30+)
 -- Clustering using hcluster_sg
 -- Multiple sequence alignments using MCoffee (Version_9.03.r1318) or Mafft (mafft-7.221)
 -- Phylogenetic reconstruction using TreeBeST
 -- Homology inference
 -- Pairwise gene-based dN/dS scores for high coverage species pairs only (both on orthologues and paralogues) (codeml/PAML v4.3)
 -- GeneTree stable ID mapping
 -- Per family gene dynamics using CAFE (v2.2)

Protein Families

Updated MCL families including all Ensembl transcript isoforms (including human non-reference haplotypes) and newest Uniprot Metazoa.

 -- Getting distances by NCBI BlastP (v.2.2.30+)
 -- Clustering by MCL (v.14-137)
 -- Multiple Sequence Alignments with MAFFT (v.7.221)
 -- Family stable ID mapping

ncRNAtrees and homologies

 -- Classification based on Rfam models (v11.0)
 -- Multiple sequence alignments with Infernal
 -- Phylogenetic reconstruction using RAxML
 -- Phylogenetic reconstruction using FastTree2 and RAxML-Light for very big families
 -- Additional multiple sequence alignments with Prank (w/ genomic flanks)
 -- Additional phylogenetic reconstruction using PhyML and NJ
 -- Phylogenetic tree merging using TreeBeST
 -- Per family gene dynamics using CAFE
 -- Homology inference
 -- Secondary structure plots

Compara dumps

  • EMF / Fasta / OrthoXML / PhyloXML dumps for ProteinTrees + PhyloXML dumps for CAFE ProteinTrees
  • EMF / Fasta / OrthoXML / PhyloXML dumps for ncRNAtrees + PhyloXML dumps for CAFE ncRNAtrees

Core

Ensembl VM Build

The Ensembl Virtual Machine appliance will be updated to version 81.

LRG Import

Importing the latest version of the Locus Reference Genomic (LRG) dataset

patch_80_81_a.sql - schema_version update

Update schema_version in meta table to 81.

patch_80_81_a.sql - schema_version update in ontology db

Update schema_version in meta table to 81.

patch_80_81a.sql - schema_version update in production db

Update schema_version in production database to 81.

Stable ID lookup

Stable ID lookup provided for REST services

Includes lookup for RefSeq and CCDS entries

UTR features in the REST API

UTR features can be retrieved using the REST API

Band information in the REST API

Band information can be retrieved via the overlap endpoint in the REST API
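A request for band information might be built as in the sketch below. The server address, the helper name and the example region are assumptions made for illustration; human is used because band data depends on the species having a karyotype. The sketch only constructs the URL and does not contact the server.

```python
from urllib.parse import urlencode

# Assumption: the public REST server address; adjust for a local installation.
REST_SERVER = "https://rest.ensembl.org"


def overlap_url(species, region, features):
    """Build an overlap/region URL that asks for the given feature types."""
    # The endpoint takes one 'feature' parameter per requested feature type.
    query = urlencode([("feature", name) for name in features])
    return f"{REST_SERVER}/overlap/region/{species}/{region}?{query}"


# Arbitrary example region.
print(overlap_url("homo_sapiens", "7:140424943-140624564", ["band"]))
# https://rest.ensembl.org/overlap/region/homo_sapiens/7:140424943-140624564?feature=band
```

JSON output can then be requested by sending the header `Content-Type: application/json` with the GET request.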

New UTR, CDS and ExonTranscript features

The Ensembl API supports the retrieval of UTR, CDS and ExonTranscript features

UTR features represent the non-coding exons of a transcript; CDS features represent the coding exons of a transcript

ExonTranscript features are Exons which retain the link to their parent transcript as well as their rank
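The relationship between exons, UTR and CDS features can be sketched as below. This is a conceptual illustration, not the Ensembl API: the `Segment` type, the function name and the coordinates are hypothetical. It assumes a forward-strand transcript with inclusive 1-based coordinates, and it splits an exon that is only partly coding into a UTR piece and a CDS piece. Each piece keeps the rank of its parent exon, much as an ExonTranscript feature does.

```python
from typing import List, NamedTuple, Tuple


class Segment(NamedTuple):
    kind: str   # "five_prime_UTR", "CDS" or "three_prime_UTR"
    start: int
    end: int
    rank: int   # 1-based rank of the parent exon in the transcript


def split_exons(exons: List[Tuple[int, int]],
                coding_start: int, coding_end: int) -> List[Segment]:
    """Split forward-strand exons (sorted by start) into UTR and CDS pieces."""
    segments = []
    for rank, (start, end) in enumerate(exons, start=1):
        # Part of the exon upstream of the coding region.
        if start < coding_start:
            segments.append(
                Segment("five_prime_UTR", start, min(end, coding_start - 1), rank))
        # Part of the exon inside the coding region, if any.
        cds_start, cds_end = max(start, coding_start), min(end, coding_end)
        if cds_start <= cds_end:
            segments.append(Segment("CDS", cds_start, cds_end, rank))
        # Part of the exon downstream of the coding region.
        if end > coding_end:
            segments.append(
                Segment("three_prime_UTR", max(start, coding_end + 1), end, rank))
    return segments


for segment in split_exons([(100, 200), (300, 400), (500, 600)], 150, 550):
    print(segment)
```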

GFF3 dumps

Ensembl gene annotation will be provided in GFF3 files, along with the already existing GTF files
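A GFF3 record has nine tab-separated columns, the last holding `tag=value` attributes separated by semicolons. The sketch below parses one such line. The function name and the example record, including its stable ID, are hypothetical and not taken from an actual dump.

```python
from urllib.parse import unquote


def parse_gff3_line(line):
    """Parse one GFF3 feature line into a dictionary."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) != 9:
        raise ValueError(f"expected 9 columns, found {len(columns)}")
    seqid, source, ftype, start, end, score, strand, phase, attr_text = columns
    attributes = {}
    for pair in attr_text.split(";"):
        if pair:
            key, _, value = pair.partition("=")
            # GFF3 percent-encodes reserved characters in attribute values.
            attributes[key] = unquote(value)
    return {
        "seqid": seqid, "source": source, "type": ftype,
        "start": int(start), "end": int(end),
        "score": score, "strand": strand, "phase": phase,
        "attributes": attributes,
    }


# Hypothetical record, for illustration only.
EXAMPLE_LINE = ("1\tensembl\tgene\t1000\t2000\t.\t+\t.\t"
                "ID=gene:ENSCPOG00000000001;biotype=protein_coding")
record = parse_gff3_line(EXAMPLE_LINE)
print(record["type"], record["start"], record["end"], record["attributes"]["ID"])
```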

Genebuild

Upgrade remaining species to rnaseq matrix

For some species we have RNASeq data but do not yet offer display options for it in an RNASeq matrix. Adding these requires changes to the analysis_description, analysis_web_data and web_data tables in the ensembl_production database.

Production

EMBL and Genbank Dumps

EMBL and Genbank dumps for all species.

Ensembl 81 mart databases

  • Ensembl Genes 81
    • Human assembly updated from GRCh38.p2 to GRCh38.p3
    • Mouse assembly updated from GRCm38.p3 to GRCm38.p4
  • Ensembl Variation 81
    • Human assembly updated from GRCh38.p2 to GRCh38.p3
    • Mouse assembly updated from GRCm38.p3 to GRCm38.p4
    • Added new structural variation species Sheep (Ovis aries)
  • Ensembl Regulation 81
    • New mouse regulation build data
  • Vega 61
    • Human assembly updated from GRCh38.p2 to GRCh38.p3
    • Mouse assembly updated from GRCm38.p3 to GRCm38.p4

External reference projection

Gene ontology (GO) identifiers and gene name projection to all species.

FASTA & GTF dumps

FASTA & GTF dumps for all the species

Web

Replace "i" icon with "?" icon

The "i" icon that opens the help/documentation page when clicked has been replaced with a new "?" icon.

The "i" icon in the tracks configuration has been left unchanged, as it gives further information rather than help.

Public plugins sqlite and sge_blast removed

These two plugins, one providing SQLite support for the user database and the other providing SGE BLAST, were outdated and have been removed from the public-plugins repository.

Retirement of archives 67 and 59

This release cycle we will be retiring archive 67 (May 2012) in accordance with our three-year rolling retirement policy. Due to the arrival of GRCz10 in Ensembl 80, we will also be retiring archive 59 (Aug 2010), which currently shows Zebrafish Zv8 annotation. The data will remain available on our public database server; only the web interfaces will be removed.

User accounts/session database configuration changed

The database used for user accounts and session records was previously configured in conf/SiteDefs.pm as below:

$SiteDefs::ENSEMBL_USERDB_TYPE = 'mysql';
$SiteDefs::ENSEMBL_USERDB_NAME = 'ensembl_accounts';
$SiteDefs::ENSEMBL_USERDB_USER = 'mysqluser';
$SiteDefs::ENSEMBL_USERDB_HOST = 'localhost';
$SiteDefs::ENSEMBL_USERDB_PORT = 3306;
$SiteDefs::ENSEMBL_USERDB_PASS = '';

These settings have been removed; the database is now configured by adding entries to conf/ini-file/MULTI.ini as below:

[databases]
DATABASE_ACCOUNTS = ensembl_accounts
DATABASE_SESSION = ensembl_accounts

[DATABASE_ACCOUNTS]
HOST = localhost
PORT = 3306
USER = mysqluser
PASS =

[DATABASE_SESSION]
HOST = localhost
PORT = 3306
USER = mysqluser
PASS =
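For sites migrating an existing installation, the mapping from the old SiteDefs values to the new sections could be generated along these lines. This is a minimal sketch using Python's standard configparser. The helper name is hypothetical, and it assumes that accounts and session records live in the same database, as in the example above.

```python
import configparser
import io


def build_multi_ini(name, host, port, user, password=""):
    """Render the [databases] and per-database sections as INI text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names in upper case
    # Assumption: one database serves both accounts and sessions.
    parser["databases"] = {"DATABASE_ACCOUNTS": name, "DATABASE_SESSION": name}
    for section in ("DATABASE_ACCOUNTS", "DATABASE_SESSION"):
        parser[section] = {
            "HOST": host, "PORT": str(port), "USER": user, "PASS": password,
        }
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


# Values taken from the old ENSEMBL_USERDB_* settings shown above.
print(build_multi_ini("ensembl_accounts", "localhost", 3306, "mysqluser"))
```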

Changes to the code can be seen here: