Ensembl Comparative Genomics News

Release 90

New external data for the pig genome (Pig)

New database containing all the cDNA and PacBio long-read alignments

ncRNAtrees and homologies (all species)

  • Classification based on Rfam models (v12.1)
  • Multiple sequence alignments with Infernal
  • Phylogenetic reconstruction using RAxML
  • Phylogenetic reconstruction using FastTree2 and ExaML for very big families
  • Additional multiple sequence alignments with Prank (w/ genomic flanks)
  • Additional phylogenetic reconstruction using PhyML and NJ
  • Phylogenetic tree merging using TreeBeST
  • Per family gene dynamics using CAFE
  • Homology inference
  • Secondary structure plots

patch_89_90_a.sql - Schema version update (all species)

89 -> 90

Protein Families (all species)

Updated HMM families including all Ensembl transcript isoforms (including human non-reference haplotypes) and newest Uniprot Metazoa.

 -- Clustering by PantherScore (based on Ensembl HMM library)
 -- Multiple Sequence Alignments with MAFFT (v.7.221)

ProteinTrees and homologies (all species)

GeneTrees (protein-coding) with new/updated genebuilds and assemblies

 -- all-vs-all blastp (ncbi-blast-2.2.30+)
 -- Clustering using hcluster_sg
 -- Multiple sequence alignments using MCoffee (Version_9.03.r1318) or Mafft (mafft-7.221)
 -- Phylogenetic reconstruction using TreeBeST
 -- Homology inference
 -- Pairwise gene-based dN/dS scores for high coverage species pairs only (both on orthologues and paralogues) (codeml/PAML v4.3)
 -- GeneTree stable ID mapping
 -- Per family gene dynamics using CAFE (v2.2)

 -- computation of pairwise gene-order conservation score

 -- comparison of orthologies with whole-genome alignments

 -- high-confidence calls

 -- use of cd-hit to remove redundancy in blast db

patch_89_90_e.sql - new seq_member_projection and seq_member_projection_stable_id tables (all species)

  • To hold information about projected members
  • Required by mouse strains and cd-hit implementation
  • Allows a representative sequence to be used in both the blast and HMM classification steps, with the resulting data projected onto the other members of the cluster for expediency
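
A minimal sketch of the projection idea, assuming a hypothetical mapping from each representative sequence to the members it stands for. The function and variable names are illustrative only and are not taken from the Compara schema or API:

```python
# Hypothetical illustration: a result computed once for a representative
# sequence is copied onto the other members of its cluster.

def project_results(representative_hits, cluster_members):
    """Return per-member results, reusing each representative's result.

    representative_hits: dict mapping representative id -> classification
    cluster_members: dict mapping representative id -> list of member ids
    """
    projected = {}
    for rep, result in representative_hits.items():
        projected[rep] = result
        for member in cluster_members.get(rep, []):
            projected[member] = result
    return projected


hits = {"seqA": "family_1"}            # only seqA went through blast / HMM
clusters = {"seqA": ["seqB", "seqC"]}  # seqB and seqC are represented by seqA
projected = project_results(hits, clusters)
```

Only the representative is classified; the other members inherit its result, which is where the time saving would come from.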

S.scrofa alignments (all species)

We will compute LastZ alignments for the new pig assembly:

  • pig v cow
  • pig v sheep
  • human v pig
  • mouse v pig

Recompute multiple alignments (all species)

EPOs containing updated species should be recomputed:

  • mammals

Pecan alignments:

  • amniotes (only to update new assemblies)

Mouse alignments (all species)

We will compute LastZ alignments for the new mouse species, Mus caroli and Mus pahari

  • caroli v mouse
  • pahari v mouse
  • caroli v human
  • pahari v human

For consistency we will also align Mus spretus against the reference mouse

patch_89_90_b.sql - DB Schema update: genomic_align_tree.parent_id (all species)

Allow NULL in genomic_align_tree.parent_id

patch_89_90_c.sql - Mark constrained_element.p_value as NOT NULL (all species)

This will also fix a bug in the code whereby pvalue=0 was stored as NULL. From now on, p-values are always defined and 0 will be used
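
The effect of this change can be sketched with an in-memory SQLite database. This is an illustration only: the table below is a heavily simplified, hypothetical stand-in for constrained_element, and converting existing NULLs to 0 is an assumption rather than a description of what the patch file does:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, simplified stand-in for the old table definition.
conn.execute("CREATE TABLE constrained_element_old (id INTEGER, p_value REAL)")
conn.executemany(
    "INSERT INTO constrained_element_old VALUES (?, ?)",
    [(1, 0.01), (2, None)],  # row 2 mimics a p-value of 0 stored as NULL
)

# New definition: p_value can no longer be NULL.
conn.execute(
    "CREATE TABLE constrained_element (id INTEGER, p_value REAL NOT NULL)"
)
# Assumption: existing NULLs are interpreted as 0 when the data are copied.
conn.execute(
    "INSERT INTO constrained_element "
    "SELECT id, COALESCE(p_value, 0) FROM constrained_element_old"
)

p_values = [
    row[0]
    for row in conn.execute("SELECT p_value FROM constrained_element ORDER BY id")
]
```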

patch_89_90_d.sql - Allow NULL in genomic_align_tree.left_node_id/right_node_id (all species)

NULL was previously not allowed and led to 0 being used

patch_89_90_f.sql - Add missing biotype-groups in the gene_member table (all species)

The list of biotype groups was missing a few entries, such as "pseudogene" and other groups for genes that are meant to be ignored but need to be loaded for consistency

LastZ alignments: rodent collection (all species)

All the new rodents will be aligned against human and mouse with LastZ

Chinese Hamster Ovary Cell Line alignments (all species)

We will import the alignment with mouse computed by Eagle Genomics and will compute the alignment against human

Release 89

ncRNAtrees and homologies (all species)

  • Classification based on Rfam models (v12.1)
  • Multiple sequence alignments with Infernal
  • Phylogenetic reconstruction using RAxML
  • Phylogenetic reconstruction using FastTree2 and ExaML for very big families
  • Additional multiple sequence alignments with Prank (w/ genomic flanks)
  • Additional phylogenetic reconstruction using PhyML and NJ
  • Phylogenetic tree merging using TreeBeST
  • Per family gene dynamics using CAFE
  • Homology inference
  • Secondary structure plots

patch_88_89_a.sql - Schema version update (all species)

88 -> 89

Protein Families (all species)

Updated HMM families including all Ensembl transcript isoforms (including human non-reference haplotypes) and newest Uniprot Metazoa.

 -- Clustering by PantherScore (based on Ensembl HMM library)
 -- Multiple Sequence Alignments with MAFFT (v.7.221)

ProteinTrees and homologies (all species)

GeneTrees (protein-coding) with new/updated genebuilds and assemblies

 -- all-vs-all blastp (ncbi-blast-2.2.30+)
 -- Clustering using hcluster_sg
 -- Multiple sequence alignments using MCoffee (Version_9.03.r1318) or Mafft (mafft-7.221)
 -- Phylogenetic reconstruction using TreeBeST
 -- Homology inference
 -- Pairwise gene-based dN/dS scores for high coverage species pairs only (both on orthologues and paralogues) (codeml/PAML v4.3)
 -- GeneTree stable ID mapping
 -- Per family gene dynamics using CAFE (v2.2)

 -- computation of pairwise gene-order conservation score

 -- comparison of orthologies with whole-genome alignments

 -- high-confidence calls

Release 88

Schema: new dnafrag.codon_table_id column (all species)

Indicates which codon table should be used to translate sequences of this dnafrag. The information is a copy of what is stored in the Core database, but it is needed to 1) make the dN/dS computation more efficient and 2) handle alternative codon tables in the absence of a core database
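
To illustrate why the codon table matters, here is a small sketch that translates the same sequence with two NCBI genetic codes (1, standard; 2, vertebrate mitochondrial). The function name and the dictionary layout are assumptions made for this example and are not part of the Compara API:

```python
BASES = "TCAG"
# All 64 codons in the conventional NCBI order (TTT, TTC, TTA, TTG, TCT, ...).
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]

# Amino acids for each codon, in the same order, keyed by NCBI table id.
AMINO_ACIDS = {
    1: "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
    2: "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG",
}


def translate(cds, codon_table_id=1):
    """Translate a coding sequence using the given codon table id."""
    table = dict(zip(CODONS, AMINO_ACIDS[codon_table_id]))
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return "".join(table[cds[i:i + 3]] for i in range(0, usable, 3))


standard = translate("ATGTGAATA", 1)       # TGA is a stop, ATA is Ile
mitochondrial = translate("ATGTGAATA", 2)  # TGA is Trp, ATA is Met
```

The same nine bases give different peptides under the two tables, which is why the table id has to travel with the dnafrag when no core database is available.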

Schema: new exon_boundaries table (all species)

Used to keep track of all the exon coordinates

Schema: new gene_member.biotype_group column (all species)

Indicates whether the gene is protein-coding, a short ncRNA, etc. This allows all the genes to be loaded in one operation and lets the homology pipelines filter their dataset using the compara database only, instead of querying the core database

Schema: new genome_db.strain_name column (all species)

Used to indicate the name of this strain (complements taxon_id which only provides the species name)

Schema: new seq_member.has_translation_edits and seq_member.has_transcript_edits columns (all species)

Used to flag the seq_members that have hardcoded transcript / protein sequences. When this happens, the data (exon coordinates + transcript sequence + translation sequence) are not in sync and some analyses have to be discarded

patch_87_88_a.sql - Schema version update (all species)

87 -> 88

H.sap alignments (all species)

We will top up all LastZ alignments for human vs all target species that have a karyotype.

Family REST endpoints (all species)

Addition of family REST endpoints

Pruned EPO alignments (all species)

Method to retrieve EPO alignments for a given species subset.
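
One way to picture pruning an alignment to a species subset is sketched below. The data layout (a dict of aligned strings) and the removal of columns that become all-gap are assumptions made for this illustration. They are not a description of the actual API method:

```python
def prune_alignment(alignment, keep_species):
    """Restrict an alignment to some species and drop all-gap columns.

    alignment: dict mapping species name -> aligned sequence (equal lengths)
    keep_species: set of species names to retain
    """
    kept = {sp: seq for sp, seq in alignment.items() if sp in keep_species}
    if not kept:
        return {}
    length = len(next(iter(kept.values())))
    # Keep only the columns where at least one remaining sequence has a base.
    columns = [
        i for i in range(length)
        if any(seq[i] != "-" for seq in kept.values())
    ]
    return {sp: "".join(seq[i] for i in columns) for sp, seq in kept.items()}


full = {"human": "AC-GT", "chimp": "AC-GT", "mouse": "ACTG-"}
pruned = prune_alignment(full, {"human", "chimp"})
```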

Cafe Tree REST endpoints (all species)

Addition of Cafe Tree REST endpoints.

Protein Families (all species)

Updated HMM families including all Ensembl transcript isoforms (including human non-reference haplotypes) and newest Uniprot Metazoa.

 -- Clustering by PantherScore (based on Ensembl HMM library)
 -- Multiple Sequence Alignments with MAFFT (v.7.221)

ProteinTrees and homologies (all species)

GeneTrees (protein-coding) with new/updated genebuilds and assemblies

 -- all-vs-all blastp (ncbi-blast-2.2.30+)
 -- Clustering using hcluster_sg
 -- Multiple sequence alignments using MCoffee (Version_9.03.r1318) or Mafft (mafft-7.221)
 -- Phylogenetic reconstruction using TreeBeST
 -- Homology inference
 -- Pairwise gene-based dN/dS scores for high coverage species pairs only (both on orthologues and paralogues) (codeml/PAML v4.3)
 -- GeneTree stable ID mapping
 -- Per family gene dynamics using CAFE (v2.2)

 -- computation of pairwise gene-order conservation score

 -- comparison of orthologies with whole-genome alignments

 -- high-confidence calls

ncRNAtrees and homologies (all species)

  • Classification based on Rfam models (v12.1)
  • Multiple sequence alignments with Infernal
  • Phylogenetic reconstruction using RAxML
  • Phylogenetic reconstruction using FastTree2 and ExaML for very big families
  • Additional multiple sequence alignments with Prank (w/ genomic flanks)
  • Additional phylogenetic reconstruction using PhyML and NJ
  • Phylogenetic tree merging using TreeBeST
  • Per family gene dynamics using CAFE
  • Homology inference
  • Secondary structure plots

Release 87

patch_86_87_a.sql - Schema version update (all species)

86 -> 87

Protein Families (all species)

Updated HMM families including all Ensembl transcript isoforms (including human non-reference haplotypes) and newest Uniprot Metazoa.

 -- Clustering by PantherScore (based on Ensembl HMM library)
 -- Multiple Sequence Alignments with MAFFT (v.7.221)

ProteinTrees and homologies (all species)

GeneTrees (protein-coding) with new/updated genebuilds and assemblies

 -- all-vs-all blastp (ncbi-blast-2.2.30+)
 -- Clustering using hcluster_sg
 -- Multiple sequence alignments using MCoffee (Version_9.03.r1318) or Mafft (mafft-7.221)
 -- Phylogenetic reconstruction using TreeBeST
 -- Homology inference
 -- Pairwise gene-based dN/dS scores for high coverage species pairs only (both on orthologues and paralogues) (codeml/PAML v4.3)
 -- GeneTree stable ID mapping
 -- Per family gene dynamics using CAFE (v2.2)

 -- computation of pairwise gene-order conservation score

 -- comparison of orthologies with whole-genome alignments

ncRNAtrees and homologies (all species)

  • Classification based on Rfam models (v12.1)
  • Multiple sequence alignments with Infernal
  • Phylogenetic reconstruction using RAxML
  • Phylogenetic reconstruction using FastTree2 and RAxML-Light for very big families
  • Additional multiple sequence alignments with Prank (w/ genomic flanks)
  • Additional phylogenetic reconstruction using PhyML and NJ
  • Phylogenetic tree merging using TreeBeST
  • Per family gene dynamics using CAFE
  • Homology inference
  • Secondary structure plots

patch_86_87_b.sql - Schema update (all species)

New table with gene quality information from the geneset_QC pipeline.

Pairwise alignments: mouse patches (all species)

Mouse is patched to GRCm38.p5, so

  • we will upload mouse_ref-to-mouse_patches alignments done by Genebuilders
  • we will run mouse_patches-to-high_coverage_species lastz alignments

Schema: Move per-species stats to species_tree_node_tag (all species)

Per-species multiple aligner stats moved from method_link_species_set_tag to species_tree_node_tag

API: removal of deprecated method (all species)

This method was scheduled for deletion in past releases of Ensembl. It is now going to be removed.

  • SequenceAdaptor::fetch_by_dbIDs in favour of fetch_all_by_dbID_list

API: scheduling methods for deletion (all species)

These methods are now scheduled for deletion.

  • fetch_all_by_genome_pair
  • fetch_all_by_Member_paired_species

Release 86

ProteinTrees and homologies (all species)

GeneTrees (protein-coding) with new/updated genebuilds and assemblies

 -- all-vs-all blastp (ncbi-blast-2.2.30+)
 -- Clustering using hcluster_sg
 -- Multiple sequence alignments using MCoffee (Version_9.03.r1318) or Mafft (mafft-7.221)
 -- Phylogenetic reconstruction using TreeBeST
 -- Homology inference
 -- Pairwise gene-based dN/dS scores for high coverage species pairs only (both on orthologues and paralogues) (codeml/PAML v4.3)
 -- GeneTree stable ID mapping
 -- Per family gene dynamics using CAFE (v2.2)

 -- computation of pairwise gene-order conservation score

 -- comparison of orthologies with whole-genome alignments

Protein Families (all species)

Updated MCL families including all Ensembl transcript isoforms (including human non-reference haplotypes) and newest Uniprot Metazoa.

 -- Getting distances by NCBI BlastP (v.2.2.30+)
 -- Clustering by MCL (v.14-137)
 -- Multiple Sequence Alignments with MAFFT (v.7.221)
 -- Family stable ID mapping

Schema version update (all species)

85 -> 86

Compute wga_coverage scores for homologies (all species)

Add whole genome alignment coverage for each homology as a quality measure

G.gal pairwise alignments (Chicken)

Recompute all LASTZ alignments for chicken:

  • human
  • mouse
  • opossum
  • anole lizard
  • zebra finch
  • flycatcher
  • duck
  • turkey
  • chinese softshell turtle
  • xenopus
  • zebrafish
  • c. savignyi

Mouse lemur v human LASTZ (Mouse Lemur)

Recompute pairwise alignment

Macaque v human LASTZ (Macaque)

Recompute pairwise alignment

Remove the "species_tree" column from the species_tree_root table (all species)

The "species_tree" column was used to keep a stringified version of the tree (in newick) but the API is in fact able to generate a customised newick string using the species_tree_node table
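
As a rough sketch of how a newick string can be rebuilt from per-node rows, consider the function below. The row layout (node id, parent id, name, distance to parent) is a simplified assumption and does not reproduce the real species_tree_node columns or the API code:

```python
def to_newick(rows, root_id):
    """Build a newick string from (node_id, parent_id, name, distance) rows."""
    info = {node_id: (name, dist) for node_id, _parent, name, dist in rows}
    children = {}
    for node_id, parent_id, _name, _dist in rows:
        children.setdefault(parent_id, []).append(node_id)

    def render(node_id, is_root=False):
        name, dist = info[node_id]
        kids = children.get(node_id, [])
        text = name
        if kids:
            # Internal node: render the children first, then the node label.
            text = "(" + ",".join(render(k) for k in kids) + ")" + name
        # The root has no parent branch, so it carries no distance.
        return text if is_root else f"{text}:{dist}"

    return render(root_id, is_root=True) + ";"


rows = [
    (1, None, "Hominini", 0),
    (2, 1, "homo_sapiens", 0.1),
    (3, 1, "pan_troglodytes", 0.2),
]
newick = to_newick(rows, 1)
```

Because the string is generated on demand, the label and distance formats can be varied without storing multiple copies of the tree.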

Update syntenies (all species)

Syntenies need to be recomputed for the new assemblies:

  • chicken
  • mouse lemur
  • macaque

Recompute multiple alignments (all species)

EPOs containing updated species should be recomputed:

  • sauropods
  • primates
  • mammals

Pecan alignments:

  • vertebrates (only to update new assemblies)

Recompute low coverage multiple alignments (all species)

EPO LOW COVERAGE alignments need to be updated:

  • sauropods
  • mammals

ncRNAtrees and homologies (all species)

  • Classification based on Rfam models (v12.0)
  • Multiple sequence alignments with Infernal
  • Phylogenetic reconstruction using RAxML
  • Phylogenetic reconstruction using FastTree2 and RAxML-Light for very big families
  • Additional multiple sequence alignments with Prank (w/ genomic flanks)
  • Additional phylogenetic reconstruction using PhyML and NJ
  • Phylogenetic tree merging using TreeBeST
  • Per family gene dynamics using CAFE
  • Homology inference
  • Secondary structure plots

Schema: new species_tree_root_id column in the gene_tree_root table (all species)

This column will link a gene-tree to the species-tree it's been reconciled with. This way, it will be obvious which gene-trees are reconciled with which species-trees, especially given the fact that intermediate trees of the same family can have different reconciliation parameters

Schema: new high_confidence column in the homology table (all species)

New column to define a subset of high-confidence orthologues based on various QC steps

Schema: homology stats (number of orthologues, etc) moved to a separate table (all species)

The new table is gene_member_hom_stats

Release 85

Schema version update (all species)

84 -> 85

Schema change (gene_tree_root_tag / gene_tree_root_attr) (all species)

84->85

Adding extra attributes to the gene_tree_root_attr table (model_name and division)

Protein Families (all species)

Updated MCL families including all Ensembl transcript isoforms (including human non-reference haplotypes) and newest Uniprot Metazoa.

 -- Getting distances by NCBI BlastP (v.2.2.30+)
 -- Clustering by MCL (v.14-137)
 -- Multiple Sequence Alignments with MAFFT (v.7.221)
 -- Family stable ID mapping

ProteinTrees and homologies (all species)

GeneTrees (protein-coding) with new/updated genebuilds and assemblies

 -- all-vs-all blastp (ncbi-blast-2.2.30+)
 -- Clustering using hcluster_sg
 -- Multiple sequence alignments using MCoffee (Version_9.03.r1318) or Mafft (mafft-7.221)
 -- Phylogenetic reconstruction using TreeBeST
 -- Homology inference
 -- Pairwise gene-based dN/dS scores for high coverage species pairs only (both on orthologues and paralogues) (codeml/PAML v4.3)
 -- GeneTree stable ID mapping
 -- Per family gene dynamics using CAFE (v2.2)

 -- computation of pairwise gene-order conservation score

 -- comparison of orthologies with whole-genome alignments

schema update (NOT NULL) (all species)

Mark some columns as NOT NULL in the following tables: (ncbi_taxa_name, species_set_tag, method_link_species_set_tag, dnafrag, genomic_align_block, genomic_align, constrained_element and peptide_align_feature)

Add size-distribution data for WGAs (all species)

Add size-distribution data for whole genome alignments

Blast-clustering and short proteins (all species)

all-vs-all blastp with optimized parameters for <100aa proteins

Schema change (species_tree_node_tag / species_tree_node_attr) (all species)

84->85

Promoting some of the tags in the species_tree_node_tag table to attributes

 

Schema change (method_link_species_set_tag / method_link_species_set_attr) (all species)

84->85

Promoting some tags in the method_link_species_set_tag table to attributes

Release 84

Gene-tree endpoint: new option to prune by target species / taxon (all species)

The endpoint currently returns the entire tree, which may contain all the species. If the user is only interested in some species, there is now an option to prune the tree on the server side
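
Conceptually, pruning keeps only the leaves that belong to the requested species and collapses internal nodes that are left with a single child. The sketch below shows this on a nested-dict tree. The data structure and function name are hypothetical and do not reflect the server-side implementation:

```python
def prune_tree(node, keep_species):
    """Return a copy of the tree restricted to leaves in keep_species.

    Leaves are dicts with a 'species' key; internal nodes have 'children'.
    Returns None when nothing under the node is kept.
    """
    if "children" not in node:
        return node if node["species"] in keep_species else None
    kept = []
    for child in node["children"]:
        pruned = prune_tree(child, keep_species)
        if pruned is not None:
            kept.append(pruned)
    if not kept:
        return None
    if len(kept) == 1:
        return kept[0]  # collapse an internal node left with a single child
    return {"children": kept}


tree = {
    "children": [
        {"children": [{"species": "human"}, {"species": "chimp"}]},
        {"species": "mouse"},
    ]
}
pruned = prune_tree(tree, {"human", "mouse"})
```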

ncRNAtrees and homologies (all species)

Classification based on Rfam models (v12.0)
Multiple sequence alignments with Infernal
Phylogenetic reconstruction using RAxML
Phylogenetic reconstruction using FastTree2 and RAxML-Light for very big families
Additional multiple sequence alignments with Prank (w/ genomic flanks)
Additional phylogenetic reconstruction using PhyML and NJ
Phylogenetic tree merging using TreeBeST
Per family gene dynamics using CAFE
Homology inference
Secondary structure plots

Protein Families (all species)

Updated MCL families including all Ensembl transcript isoforms (including human non-reference haplotypes) and newest Uniprot Metazoa.

 -- families identified using HMMs

 -- distribute the HMMs

 -- Multiple Sequence Alignments with MAFFT (v.7.221)
 -- Family stable ID mapping

ProteinTrees and homologies (all species)

GeneTrees (protein-coding) with new/updated genebuilds and assemblies

 -- all-vs-all blastp (ncbi-blast-2.2.30+) with better blastp parameters for short proteins

 -- Clustering using hcluster_sg
 -- Multiple sequence alignments using MCoffee (Version_9.03.r1318) or Mafft (mafft-7.221)
 -- Phylogenetic reconstruction using TreeBeST
 -- Homology inference
 -- Pairwise gene-based dN/dS scores for high coverage species pairs only (both on orthologues and paralogues) (codeml/PAML v4.3)
 -- GeneTree stable ID mapping
 -- Per family gene dynamics using CAFE (v2.2)

Schema version update (all species)

83 -> 84

Schema & API changes to accommodate ortholog quality metrics (all species)

Add additional columns to the homology table:

 -- goc_score: quality score derived from gene order conservation analysis

 -- wga_coverage: quality score derived from whole genome alignment coverage

Edit Homology and HomologyAdaptor to account for the changes
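
A client could use the two scores to filter homologies, as in the hedged sketch below. The dict layout, the 0-100 scale, the thresholds, and the "either score is enough" rule are all assumptions chosen for illustration. They are not the criteria used by Ensembl:

```python
def passes_quality(homology, min_goc=50, min_wga=50):
    """Return True if either quality score reaches its (hypothetical) threshold.

    homology: dict that may contain 'goc_score' and 'wga_coverage';
    a missing or None score is treated as "not computed".
    """
    goc = homology.get("goc_score")
    wga = homology.get("wga_coverage")
    goc_ok = goc is not None and goc >= min_goc
    wga_ok = wga is not None and wga >= min_wga
    return goc_ok or wga_ok


homologies = [
    {"id": "h1", "goc_score": 75, "wga_coverage": None},
    {"id": "h2", "goc_score": 25, "wga_coverage": 10.0},
    {"id": "h3", "goc_score": None, "wga_coverage": 98.5},
]
selected = [h["id"] for h in homologies if passes_quality(h)]
```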

API: removal of deprecated methods (all species)

These methods were scheduled for deletion in past releases of Ensembl / in e84. They are now going to be removed.

Obsolete since the redesign of member sequences

  • AlignedMember::alignment_string()
  • AlignedMember::alignment_string_bounded()
  • AlignedMember::cdna_alignment_string()
  • MemberSet::print_sequences_to_fasta()
  • SeqMember::get_exon_bounded_sequence()
  • SeqMember::get_other_sequence()
  • SeqMember::sequence_cds()
  • SeqMember::sequence_exon_bounded()
  • SeqMember::sequence_exon_cased()

Obsolete since the redesign of the species-tree reconciliation

  • GeneTreeNode::get_value_for_tag('taxon_id')
  • GeneTreeNode::get_value_for_tag('taxon_name')
  • Homology::node_id()
  • Homology::ancestor_tree_node_id()
  • Homology::tree_node_id()
  • Homology::subtype()
  • Homology::taxonomy_alias()

Obsolete since the redesign of member objects

  • Member::chr_name()
  • Member::chr_start()
  • Member::chr_end()
  • Member::chr_strand()
  • GeneMember::member_id()
  • GeneMember::get_all_peptide_Members()
  • GeneMember::get_canonical_Member()
  • GeneMember::get_canonical_peptide_Member()
  • GeneMember::get_canonical_transcript_Member()
  • SeqMember::member_id()
  • MemberAdaptor::fetch_by_source_stable_id()
  • MemberAdaptor::fetch_all_by_source_stable_ids()
  • MemberAdaptor::fetch_all_by_source_genome_db_id()
  • SeqMemberAdaptor::fetch_all_by_gene_member_id()
  • SeqMemberAdaptor::fetch_all_canonical_by_source_genome_db_id()
  • SeqMemberAdaptor::fetch_canonical_member_for_gene_member_id()
  • FamilyAdaptor::fetch_all_by_Member()
  • FamilyAdaptor::fetch_by_Member_source_stable_id()

Others

  • GenomeDB::short_name()
  • GenomeDB::assembly_default()

Schema change: gene-tree attributes (all species)

Some gene-tree tags will be promoted to "attributes", and be stored in a new table (gene_tree_root_attr) instead of gene_tree_root_tag

There is no change at the API level

Schema change: description field in the family table (all species)

The 255-character "description" field of the "family" table is being extended to a TEXT type (which can accommodate 64K of text)

Release 83

M.mus Vs C.intestinalis synteny (all species)

The Lastz alignments were recomputed in release 82, hence we are going to recompute the synteny.

The syntenies for Ciona_intestinalis Vs human, mouse and zebrafish are going to be deleted from release 83 because they have a coverage of <1%

Protein Families (all species)

Updated MCL families including all Ensembl transcript isoforms (including human non-reference haplotypes) and newest Uniprot Metazoa.

 -- Getting distances by NCBI BlastP (v.2.2.30+)
 -- Clustering by MCL (v.14-137)
 -- Multiple Sequence Alignments with MAFFT (v.7.221)
 -- Family stable ID mapping

Schema version update (all species)

82 -> 83

Compara dumps (all species)

  • EMF / Fasta / OrthoXML / PhyloXML dumps for ProteinTrees + PhyloXML dumps for CAFE ProteinTrees
  • EMF / Fasta / OrthoXML / PhyloXML dumps for ncRNAtrees + PhyloXML dumps for CAFE ncRNAtrees

ncRNAtrees and homologies (all species)

Classification based on Rfam models (v11.0)
Multiple sequence alignments with Infernal
Phylogenetic reconstruction using RAxML
Phylogenetic reconstruction using FastTree2 and RAxML-Light for very big families
Additional multiple sequence alignments with Prank (w/ genomic flanks)
Additional phylogenetic reconstruction using PhyML and NJ
Phylogenetic tree merging using TreeBeST
Per family gene dynamics using CAFE
Homology inference
Secondary structure plots

ProteinTrees and homologies (all species)

GeneTrees (protein-coding) with new/updated genebuilds and assemblies

 -- all-vs-all blastp (ncbi-blast-2.2.30+)
 -- Clustering using hcluster_sg
 -- Multiple sequence alignments using MCoffee (Version_9.03.r1318) or Mafft (mafft-7.221)
 -- Phylogenetic reconstruction using TreeBeST
 -- Homology inference
 -- Pairwise gene-based dN/dS scores for high coverage species pairs only (both on orthologues and paralogues) (codeml/PAML v4.3)
 -- GeneTree stable ID mapping
 -- Per family gene dynamics using CAFE (v2.2)

Rename some misnamed MLSSs (all species)

The LastZ MLSSs should have the name of the reference species in their name, e.g. "H.sap-P.anu lastz-net (on H.sap)" instead of "H.sap-P.anu lastz-net"
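
The naming convention can be expressed as a small helper. The function is a hypothetical illustration built from the example above, not code from the Compara API:

```python
def lastz_mlss_name(species_a, species_b, reference):
    """Build a LastZ MLSS name that states the reference species."""
    if reference not in (species_a, species_b):
        raise ValueError("reference must be one of the two aligned species")
    return f"{species_a}-{species_b} lastz-net (on {reference})"


name = lastz_mlss_name("H.sap", "P.anu", "H.sap")
```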

Make the consensus description of Families more readable (all species)

Many families have strings like ECO: |RULEBASE: or ECO: |EMBL: 1 in their description. Those probably come from new tags that UniProt has added to the descriptions of the proteins themselves. We need to remove those strings

API: removal of deprecated methods (all species)

These methods were deprecated and scheduled for deletion:

  • NCBITaxon::ensembl_alias()
  • NCBITaxon::short_name()

Release 82

Compara dumps (all species)

  • EMF / Fasta / OrthoXML / PhyloXML dumps for ProteinTrees + PhyloXML dumps for CAFE ProteinTrees
  • EMF / Fasta / OrthoXML / PhyloXML dumps for ncRNAtrees + PhyloXML dumps for CAFE ncRNAtrees

ncRNAtrees and homologies (all species)

Classification based on Rfam models (v11.0)
Multiple sequence alignments with Infernal
Phylogenetic reconstruction using RAxML
Phylogenetic reconstruction using FastTree2 and RAxML-Light for very big families
Additional multiple sequence alignments with Prank (w/ genomic flanks)
Additional phylogenetic reconstruction using PhyML and NJ
Phylogenetic tree merging using TreeBeST
Per family gene dynamics using CAFE
Homology inference
Secondary structure plots

Schema: first/last_release columns in genome_db (all species)

The two columns make it possible to track more precisely when each genome was added, and which ones are current
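
A possible reading of the two columns is sketched below, assuming that an empty last_release means the entry is still in use and an empty first_release means it has not been released yet. These semantics are an assumption made for the example:

```python
def is_current(first_release, last_release, release):
    """Return True if an entry is considered current for the given release.

    Assumptions: first_release is None for entries never released, and
    last_release is None for entries that have not been retired.
    """
    if first_release is None or first_release > release:
        return False
    return last_release is None or last_release >= release


# (first_release, last_release) pairs for three hypothetical genome entries
entries = {"old_assembly": (70, 81), "new_assembly": (82, None), "unreleased": (None, None)}
current = [name for name, (first, last) in entries.items() if is_current(first, last, 82)]
```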

ProteinTrees and homologies (all species)

GeneTrees (protein-coding) with new/updated genebuilds and assemblies

 -- all-vs-all blastp (ncbi-blast-2.2.30+)
 -- Clustering using hcluster_sg
 -- Multiple sequence alignments using MCoffee (Version_9.03.r1318) or Mafft (mafft-7.221)
 -- Phylogenetic reconstruction using TreeBeST
 -- Homology inference
 -- Pairwise gene-based dN/dS scores for high coverage species pairs only (both on orthologues and paralogues) (codeml/PAML v4.3)
 -- GeneTree stable ID mapping
 -- Per family gene dynamics using CAFE (v2.2)

Protein Families (all species)

Updated MCL families including all Ensembl transcript isoforms (including human non-reference haplotypes) and newest Uniprot Metazoa.

 -- Getting distances by NCBI BlastP (v.2.2.30+)
 -- Clustering by MCL (v.14-137)
 -- Multiple Sequence Alignments with MAFFT (v.7.221)
 -- Family stable ID mapping

Schema version update (all species)

81 -> 82

Schema: first/last_release columns in method_link_species_set (all species)

The two columns make it possible to track more precisely when each dataset was produced, and which ones are current

Schema: New species_set_header table, with first/last_release columns (all species)

New header table that allows us to check the database integrity with foreign keys. The table would also contain information about when each set was used

Replace TBlat with LastZ (all species)

Recompute some TBlat pairwise comparisons with LastZ (which gives a higher coverage):

  • {M.mus, G.gal, T.nig} vs X.tro
  • X.tro vs L.cha
  • G.acu vs {L.cha, P.mar}
  • {M.mus, P.mar} vs C.int
  • G.gal vs C.sav

Release 81

Schema version update (all species)

80 -> 81

Pairwise alignments: human and mouse patches (all species)

  • Human is patched to GRCh38.p3, so

    • we will upload human_ref-to-human_patches alignments done by Genebuilders 
    • we will run human_patches-to-high_coverage_species lastz alignments

  • Mouse is patched to GRCm38.p4, so

    • we will upload mouse_ref-to-mouse_patches alignments done by Genebuilders 
    • we will run mouse_patches-to-high_coverage_species lastz alignments

ProteinTrees and homologies (all species)

GeneTrees (protein-coding) with new/updated genebuilds and assemblies

 -- all-vs-all blastp (ncbi-blast-2.2.30+)
 -- Clustering using hcluster_sg
 -- Multiple sequence alignments using MCoffee (Version_9.03.r1318) or Mafft (mafft-7.221)
 -- Phylogenetic reconstruction using TreeBeST
 -- Homology inference
 -- Pairwise gene-based dN/dS scores for high coverage species pairs only (both on orthologues and paralogues) (codeml/PAML v4.3)
 -- GeneTree stable ID mapping
 -- Per family gene dynamics using CAFE (v2.2)

Protein Families (all species)

Updated MCL families including all Ensembl transcript isoforms (including human non-reference haplotypes) and newest Uniprot Metazoa.

 -- Getting distances by NCBI BlastP (v.2.2.30+)
 -- Clustering by MCL (v.14-137)
 -- Multiple Sequence Alignments with MAFFT (v.7.221)
 -- Family stable ID mapping

ncRNAtrees and homologies (all species)

Classification based on Rfam models (v11.0)
Multiple sequence alignments with Infernal
Phylogenetic reconstruction using RAxML
Phylogenetic reconstruction using FastTree2 and RAxML-Light for very big families
Additional multiple sequence alignments with Prank (w/ genomic flanks)
Additional phylogenetic reconstruction using PhyML and NJ
Phylogenetic tree merging using TreeBeST
Per family gene dynamics using CAFE
Homology inference
Secondary structure plots

Compara dumps (all species)

  • EMF / Fasta / OrthoXML / PhyloXML dumps for ProteinTrees + PhyloXML dumps for CAFE ProteinTrees
  • EMF / Fasta / OrthoXML / PhyloXML dumps for ncRNAtrees + PhyloXML dumps for CAFE ncRNAtrees

Release 80

API: new method to get the multiple alignment of several homologues (all species)

The new method is GeneTree::get_alignment_of_homologues($ref_member)

Schema version update (all species)

79 -> 80

API: new methods to fetch data from a Gene / Transcript / DnaFrag (all species)

To improve the usability of our API, we'll add methods to fetch the data directly from the Core objects, without having to create Members and DnaFrags. Members and DnaFrags will still be used to represent the data on our side, though

D.rer GRCz10 pairwise alignments and syntenies (all species)

  • lastz D.rer vs T.rub (on D.rer)
  • lastz D.rer vs T.nig (on D.rer; used to be on T.nig)
  • lastz D.rer vs G.acu (on D.rer; used to be on G.acu)
  • lastz D.rer vs O.lat (on D.rer)
  • lastz D.rer vs O.nil (on D.rer)
  • lastz D.rer vs X.mac (on D.rer)
  • lastz D.rer vs L.ocu (on D.rer)
  • lastz D.rer vs G.mor (on D.rer)
  • lastz D.rer vs A.mex (on D.rer)
  • lastz D.rer vs P.for (on D.rer)
  • lastz D.rer vs G.gal (on G.gal)
  • lastz D.rer vs H.sap (on H.sap)
  • lastz D.rer vs M.mus (on M.mus)
  • lastz D.rer vs X.tro (on D.rer)
  • lastz D.rer vs L.cha (on D.rer)
  • lastz D.rer vs P.mar (on D.rer)
  • lastz D.rer vs C.sav (on D.rer)
  • lastz D.rer vs C.int (on D.rer)

Synteny maps will be generated when both species have their karyotype stored in the database

R.nor Rnor_v6.0 pairwise alignments and syntenies (all species)

  • lastz R.nor vs M.mus (on M.mus) + synteny
  • lastz R.nor vs H.sap (on H.sap) + synteny

D.rer GRCz10 multiple alignments (all species)

  • 5-way fish EPO alignments
  • 11-way fish EPO-2X alignments

R.nor Rnor_v6.0 multiple alignments (all species)

  • 17-way eutherian EPO alignments
  • 39-way eutherian EPO-2X alignments
  • 23-way amniota MercatorPecan alignments

We will also regenerate the "Age Of Base" human track from the new 17-way EPO MSA

Release 79

Schema version update (all species)

78 -> 79

Schema: support for polyploid genomes (all species)

There will be a new column "genome_component" in the genome_db table, with the appropriate API support

New SPECIES_TREE method (all species)

This will allow us to store species trees independently of datasets (needed for http://www.ensembl.org/info/about/speciestree.html)

API: remove GenomicAlignBlockAdaptor::fetch_all_by_MethodLinkSpeciesSet_GroupID() (all species)

The method is unused in the project, and there is no SQL index to speed it up.

Medaka vs Tetraodon lastz (all species)

Medaka vs Tetraodon lastz pairwise alignments (O.lat-T.nig lastz-net (on O.lat))

Medaka vs Tetraodon synteny (all species)

Medaka vs Tetraodon synteny

Web: statistics of the synteny analysis between two species (all species)

A new page similar to the statistics of the pairwise alignments, but over the synteny blocks. Since there are no specific alignments for the synteny regions, the statistics only consist of covered / uncovered lengths.
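
Covered and uncovered lengths can be derived from block coordinates by merging overlapping intervals, as in this minimal sketch. It assumes 1-based inclusive coordinates on a single sequence. The function name and the input format are hypothetical:

```python
def covered_length(blocks):
    """Total number of positions covered by (start, end) blocks.

    Coordinates are assumed to be 1-based and inclusive; overlapping
    blocks are merged so that no position is counted twice.
    """
    total = 0
    current_start = current_end = None
    for start, end in sorted(blocks):
        if current_end is None or start > current_end:
            # Previous merged block (if any) is finished: add its length.
            if current_end is not None:
                total += current_end - current_start + 1
            current_start, current_end = start, end
        else:
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start + 1
    return total


sequence_length = 300
blocks = [(1, 100), (51, 150), (201, 250)]
covered = covered_length(blocks)
uncovered = sequence_length - covered
```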

Release 78

GRC alignments (Human)

GRC alignments between the primary assembly and the alternate loci have been added.

ProteinTrees and homologies (all species)

GeneTrees (protein-coding) with new/updated genebuilds and assemblies

 -- all-vs-all blastp (ncbi-blast-2.2.27+)
 -- Clustering using hcluster_sg
 -- Multiple sequence alignments using MCoffee (Version_9.03.r1318) or Mafft (mafft-7.017)
 -- Phylogenetic reconstruction using TreeBeST
 -- Homology inference
 -- Pairwise gene-based dN/dS scores for high coverage species pairs only (both on orthologues and paralogues) (codeml/PAML v4.3)
 -- GeneTree stable ID mapping
 -- Per family gene dynamics using CAFE (v2.2)

Protein Families (all species)

Updated MCL families including all Ensembl transcript isoforms (including human non-reference haplotypes) and newest Uniprot Metazoa.

 -- Getting distances by NCBI BlastP (v.2.2.28+)
 -- Clustering by MCL (v.12-135)
 -- Multiple Sequence Alignments with MAFFT (v.7.113)
 -- Family stable ID mapping

ncRNAtrees and homologies (all species)

Classification based on Rfam models (v11.0)
Multiple sequence alignments with Infernal
Phylogenetic reconstruction using RAxML
Phylogenetic reconstruction using FastTree2 and RAxML-Light for very big families
Additional multiple sequence alignments with Prank (w/ genomic flanks)
Additional phylogenetic reconstruction using PhyML and NJ
Phylogenetic tree merging using TreeBeST
Per family gene dynamics using CAFE
Homology inference
Secondary structure plots

Compara dumps (all species)

EMF / Fasta / OrthoXML / PhyloXML dumps for ProteinTrees
EMF / Fasta / OrthoXML / PhyloXML dumps for ncRNAtrees
PhyloXML dumps for CAFE ProteinTrees
PhyloXML dumps for CAFE ncRNAtrees

New tables (all species)

We are adding two new tables that will be used in the future to store an HMM classification of the proteins

New synteny analyses (all species)

New synteny data for:

  • Chicken vs Zebrafinch
  • Human vs Orangutan
  • Chicken vs Opossum

Release 77

8-way primate EPO multiple alignments (all species)

 callithrix jacchus
 chlorocebus sabeus
 gorilla gorilla
 homo sapiens
 macaca mulatta
 pan troglodytes
 papio anubis
 pongo abelii

17-way mammal EPO multiple alignments (all species)

 bos taurus
 callithrix jacchus
 canis familiaris
 chlorocebus sabeus
 equus caballus
 felis catus
 gorilla gorilla
 homo sapiens
 macaca mulatta
 mus musculus
 oryctolagus cuniculus
 ovis aries
 pan troglodytes
 papio anubis
 pongo abelii
 rattus norvegicus
 sus scrofa

39-way mammal low-coverage EPO multiple alignments (all species)

 ailuropoda melanoleuca
 bos taurus
 callithrix jacchus
 canis familiaris
 cavia porcellus
 chlorocebus sabeus
 choloepus hoffmanni
 dasypus novemcinctus
 dipodomys ordii
 echinops telfairi
 equus caballus
 erinaceus europaeus
 felis catus
 gorilla gorilla
 homo sapiens
 ictidomys tridecemlineatus
 loxodonta africana
 macaca mulatta
 microcebus murinus
 mustela putorius_furo
 mus musculus
 myotis lucifugus
 nomascus leucogenys
 ochotona princeps
 oryctolagus cuniculus
 otolemur garnettii
 ovis aries
 pan troglodytes
 papio anubis
 pongo abelii
 procavia capensis
 pteropus vampyrus
 rattus norvegicus
 sorex araneus
 sus scrofa
 tarsius syrichta
 tupaia belangeri
 tursiops truncatus
 vicugna pacos

Pairwise alignments (Human, Vervet-AGM)

LastZ: human and vervet monkey (H.sap-C.sab (on H.sap))

23-way amniota-pecan multiple alignments (all species)

 macaca_mulatta
 ornithorhynchus_anatinus
 monodelphis_domestica
 pongo_abelii
 equus_caballus
 taeniopygia_guttata
 oryctolagus_cuniculus
 anolis_carolinensis
 meleagris_gallopavo
 callithrix_jacchus
 bos_taurus
 gorilla_gorilla
 pan_troglodytes
 sus_scrofa
 mus_musculus
 canis_familiaris
 felis_catus
 rattus_norvegicus
 gallus_gallus
 ovis_aries
 homo_sapiens
 papio_anubis
 chlorocebus_sabeus

Synteny (Human, Vervet-AGM)

SYNTENY: homo_sapiens(GRCh38) - chlorocebus_sabeus(ChlSab1.1)

Primate ancestral alleles (all species)

 macaca_mulatta
 pongo_abelii
 callithrix_jacchus
 gorilla_gorilla
 pan_troglodytes
 homo_sapiens
 papio_anubis
 chlorocebus_sabeus

Age of Base (Human)

BigBed file for the human Age of Base track.

Compara dumps (all species)

EMF / Fasta / OrthoXML / PhyloXML dumps for ProteinTrees
EMF / Fasta / OrthoXML / PhyloXML dumps for ncRNAtrees
PhyloXML dumps for CAFE ProteinTrees
PhyloXML dumps for CAFE ncRNAtrees
EMF and MAF dumps for epo_39_eutherian_mammals
MAF dumps for epo_8_primates
EMF and MAF dumps for pecan_23_amniota
MAF dumps for epo_17_eutherian_mammals
BED files for constrained elements
MAF dumps for H.sap-C.sab LASTZ

ncRNAtrees and homologies (all species)

Classification based on Rfam models (v11.0)
Multiple sequence alignments with Infernal
Phylogenetic reconstruction using RAxML
Phylogenetic reconstruction using FastTree2 and RAxML-Light for very big families
Additional multiple sequence alignments with Prank (w/ genomic flanks)
Additional phylogenetic reconstruction using PhyML and NJ
Phylogenetic tree merging using TreeBeST
Per family gene dynamics using CAFE
Homology inference
Secondary structure plots

Protein Families (all species)

Updated MCL families including all Ensembl transcript isoforms (including human non-reference haplotypes) and newest Uniprot Metazoa.

 -- Getting distances by NCBI BlastP (v.2.2.28+)
 -- Clustering by MCL (v.12-135)
 -- Multiple Sequence Alignments with MAFFT (v.7.113)
 -- Family stable ID mapping

ProteinTrees and homologies (all species)

GeneTrees (protein-coding) with new/updated genebuilds and assemblies

 -- all-vs-all blastp (ncbi-blast-2.2.27+)
 -- Clustering using hcluster_sg
 -- Multiple sequence alignments using MCoffee (Version_9.03.r1318) or Mafft (mafft-7.017)
 -- Phylogenetic reconstruction using TreeBeST
 -- Homology inference
 -- Pairwise gene-based dN/dS scores for high coverage species pairs only (both on orthologues and paralogues) (codeml/PAML v4.3)
 -- GeneTree stable ID mapping
 -- Per family gene dynamics using CAFE (v2.2)

Release 76

ProteinTrees and homologies (all species)

GeneTrees (protein-coding) with new/updated genebuilds and assemblies

 -- all-vs-all blastp (ncbi-blast-2.2.27+)
 -- Clustering using hcluster_sg
 -- Multiple sequence alignments using MCoffee (Version_9.03.r1318) or Mafft (mafft-7.017)
 -- Phylogenetic reconstruction using TreeBeST
 -- Homology inference
 -- Pairwise gene-based dN/dS scores for high coverage species pairs only (both on orthologues and paralogues) (codeml/PAML v4.3)
 -- GeneTree stable ID mapping
 -- Per family gene dynamics using CAFE (v2.2)

ncRNAtrees and homologies (all species)

Classification based on Rfam models (v11.0)

Multiple sequence alignments with Infernal

Phylogenetic reconstruction using RAxML

Phylogenetic reconstruction using FastTree2 and RAxML-Light for very big families

Additional multiple sequence alignments with Prank (w/ genomic flanks)

Additional phylogenetic reconstruction using PhyML and NJ

Phylogenetic tree merging using TreeBeST

Per family gene dynamics using CAFE

Homology inference

Secondary structure plots

Protein Families (all species)

New pipeline that makes the Families consistent with the gene-trees. It includes all the Ensembl transcript isoforms (including human non-reference haplotypes) and newest Uniprot Metazoa.

Clustering using the TreeFam 10 HMM library

Multiple Sequence Alignments with MAFFT (v.7.017)

Family stable ID mapping

API/Schema changes (all species)

- Several objects now inherit from Storable (methods: dbID(), adaptor(), new_fast(), new())

- Methods scheduled for deletion in e76 have been removed

- Split member into seq_member and gene_member + members depend on dnafrags

lastz alignments (all species)

lastz H.sap-C.hof (on H.sap)  ( choloepus_hoffmanni, homo_sapiens ) 

lastz H.sap-O.ari (on H.sap)  ( ovis_aries, homo_sapiens )

lastz H.sap-C.por (on H.sap)  ( cavia_porcellus, homo_sapiens )

lastz H.sap-D.nov (on H.sap)  ( dasypus_novemcinctus, homo_sapiens )

lastz H.sap-D.ord (on H.sap)  ( dipodomys_ordii, homo_sapiens ) 

lastz H.sap-E.eur (on H.sap)  ( erinaceus_europaeus, homo_sapiens )

lastz H.sap-E.tel (on H.sap)  ( echinops_telfairi, homo_sapiens )

lastz H.sap-L.afr (on H.sap)  ( homo_sapiens, loxodonta_africana )

lastz H.sap-M.eug (on H.sap)  ( homo_sapiens, macropus_eugenii )

lastz H.sap-M.mur (on H.sap)  ( microcebus_murinus, homo_sapiens )

lastz H.sap-O.pri (on H.sap)  ( ochotona_princeps, homo_sapiens )

lastz H.sap-P.cap (on H.sap)  ( procavia_capensis, homo_sapiens )

lastz H.sap-P.vam (on H.sap)  ( pteropus_vampyrus, homo_sapiens )

lastz H.sap-S.ara (on H.sap)  ( sorex_araneus, homo_sapiens )

lastz H.sap-T.bel (on H.sap)  ( tupaia_belangeri, homo_sapiens )

lastz H.sap-T.syr (on H.sap)  ( tarsius_syrichta, homo_sapiens )

lastz H.sap-T.tru (on H.sap)  ( tursiops_truncatus, homo_sapiens )

lastz H.sap-V.pac (on H.sap)  ( vicugna_pacos, homo_sapiens )

lastz H.sap-I.tri (on H.sap)  ( homo_sapiens, ictidomys_tridecemlineatus )

lastz H.sap-M.fur (on H.sap)  ( homo_sapiens, mustela_putorius_furo ) 

lastz H.sap-M.luc (on H.sap)  ( homo_sapiens, myotis_lucifugus ) 

lastz H.sap-A.mel (on H.sap)  ( homo_sapiens, ailuropoda_melanoleuca )

lastz H.sap-E.cab (on H.sap)  ( equus_caballus, homo_sapiens )

lastz H.sap-M.dom (on H.sap)  ( monodelphis_domestica, homo_sapiens )

lastz H.sap-O.ana (on H.sap)  ( ornithorhynchus_anatinus, homo_sapiens )

lastz H.sap-B.tau (on H.sap)  ( homo_sapiens, bos_taurus )

lastz H.sap-C.fam (on H.sap)  ( homo_sapiens, canis_familiaris )

lastz H.sap-C.jac (on H.sap)  ( homo_sapiens, callithrix_jacchus ) 

lastz H.sap-F.cat (on H.sap)  ( homo_sapiens, felis_catus ) 

lastz H.sap-G.gor (on H.sap)  ( homo_sapiens, gorilla_gorilla ) 

lastz H.sap-M.mul (on H.sap)  ( macaca_mulatta, homo_sapiens )

lastz H.sap-M.mus (on H.sap)  ( homo_sapiens, mus_musculus )

lastz H.sap-N.leu (on H.sap)  ( homo_sapiens, nomascus_leucogenys )

lastz H.sap-O.cun (on H.sap)  ( homo_sapiens, oryctolagus_cuniculus )

lastz H.sap-O.gar (on H.sap)  ( homo_sapiens, otolemur_garnettii )

lastz H.sap-P.abe (on H.sap)  ( homo_sapiens, pongo_abelii )

lastz H.sap-P.tro (on H.sap)  ( homo_sapiens, pan_troglodytes )

lastz H.sap-R.nor (on H.sap)  ( homo_sapiens, rattus_norvegicus )

lastz H.sap-S.har (on H.sap)  ( homo_sapiens, sarcophilus_harrisii )

lastz H.sap-S.scr (on H.sap)  ( homo_sapiens, sus_scrofa )

lastz H.sap-A.pla (on H.sap)  ( homo_sapiens, anas_platyrhynchos )

lastz H.sap-F.alb (on H.sap)  ( homo_sapiens, ficedula_albicollis )

lastz H.sap-G.gal (on H.sap)  ( homo_sapiens, gallus_gallus )

lastz H.sap-P.sin (on H.sap)  ( homo_sapiens, pelodiscus_sinensis )

lastz H.sap (on H.sap) ( homo_sapiens )

lastz C.sav-H.sap (on H.sap) ( ciona_savignyi,homo_sapiens )

lastz H.sap-A.car (on H.sap) ( homo_sapiens,anolis_carolinensis )

lastz H.sap-C.int (on H.sap) ( homo_sapiens,ciona_intestinalis )

lastz H.sap-D.rer (on H.sap) ( homo_sapiens,danio_rerio )

lastz H.sap-G.acu (on H.sap) ( gasterosteus_aculeatus,homo_sapiens )

lastz H.sap-G.mor (on H.sap) ( homo_sapiens,gadus_morhua )

lastz H.sap-L.cha (on H.sap) ( homo_sapiens,latimeria_chalumnae )

lastz H.sap-O.lat (on H.sap) ( oryzias_latipes,homo_sapiens )

lastz H.sap-O.nil (on H.sap) ( homo_sapiens,oreochromis_niloticus )

lastz H.sap-P.mar (on H.sap) ( homo_sapiens,petromyzon_marinus )

lastz H.sap-T.gut (on H.sap) ( taeniopygia_guttata,homo_sapiens )

lastz H.sap-T.nig (on H.sap) ( tetraodon_nigroviridis,homo_sapiens )

lastz H.sap-T.rub (on H.sap) ( takifugu_rubripes,homo_sapiens )

lastz H.sap-X.mac (on H.sap) ( homo_sapiens,xiphophorus_maculatus )

lastz H.sap-X.tro (on H.sap) ( homo_sapiens,xenopus_tropicalis ) 

lastz M.gal-H.sap (on H.sap) ( homo_sapiens,meleagris_gallopavo )

lastz M.mus-C.por (on M.mus) ( mus_musculus,cavia_porcellus )

lastz C.fam-F.cat (on C.fam) ( canis_familiaris,felis_catus )

lastz C.fam-M.fur (on C.fam) (canis_familiaris,mustela_putorius_furo )

lastz M.mus-S.ara (on M.mus) ( mus_musculus,sorex_araneus)

lastz B.tau-T.tru (on B.tau) (bos_taurus,tursiops_truncatus)

lastz B.tau-F.cat (on B.tau) (bos_taurus,felis_catus)

lastz B.tau-M.fur (on B.tau) (bos_taurus,mustela_putorius_furo)

lastz B.tau-P.vam (on B.tau) (bos_taurus,pteropus_vampyrus)

lastz H.sap-P.anu (on H.sap) (homo_sapiens, papio_anubis)

lastz H.sap-C.pyg (on H.sap) (homo_sapiens, chlorocebus pygerythrus)

lastz G.gal-M.dom (on G.gal) (gallus_gallus,monodelphis_domestica)

lastz O.lat-M.mus (on O.lat) (oryzias_latipes, mus_musculus)

Syntenies (all species)

H.sap-M.dom (on H.sap)

H.sap-O.ana (on H.sap)

H.sap-B.tau (on H.sap) 

H.sap-C.fam (on H.sap) 

H.sap-E.cab (on H.sap)

H.sap-M.gal (on H.sap)

H.sap-C.jac (on H.sap)

H.sap-F.cat (on H.sap)

H.sap-G.gor (on H.sap)

H.sap-M.mul (on H.sap)

H.sap-M.mus (on H.sap)

H.sap-O.cun (on H.sap)

H.sap-P.tro (on H.sap)

H.sap-R.nor (on H.sap)

H.sap-S.scr (on H.sap) 

H.sap-G.gal (on H.sap)

H.sap-A.car (on H.sap)

H.sap-D.rer (on H.sap)

H.sap-G.acu (on H.sap) 

H.sap-O.lat (on H.sap)

H.sap-T.gut (on H.sap) 

H.sap-T.nig (on H.sap)

C.fam-F.cat (on C.fam)

B.tau-F.cat (on B.tau)

H.sap-O.ari (on H.sap)

Release 75

ProteinTrees and homologies (all species)

GeneTrees (protein-coding) with new/updated genebuilds and assemblies

  • all-vs-all blastp (ncbi-blast-2.2.28+)
  • Clustering using hcluster_sg
  • Multiple sequence alignments using MCoffee (Version_9.03.r1318) or Mafft (mafft-7.113)
  • Phylogenetic reconstruction using TreeBeST
  • Homology inference
  • Pairwise gene-based dN/dS scores for high coverage species pairs only (both on orthologues and paralogues) (codeml/PAML v4.3)
  • GeneTree stable ID mapping
  • Per family gene dynamics using CAFE (v2.2)

ncRNAtrees and homologies (all species)

  • Classification based on Rfam models (v11.0)
  • Multiple sequence alignments with Infernal
  • Phylogenetic reconstruction using RAxML
  • Phylogenetic reconstruction using FastTree2 and RAxML-Light for very big families
  • Additional multiple sequence alignments with Prank (w/ genomic flanks)
  • Additional phylogenetic reconstruction using PhyML and NJ
  • Phylogenetic tree merging using TreeBeST
  • Per family gene dynamics using CAFE
  • Homology inference
  • Secondary structure plots

Protein Families (all species)

Updated MCL families including all Ensembl transcript isoforms (including human non-reference haplotypes) and newest Uniprot Metazoa.

  • Getting distances by NCBI BlastP (v.2.2.28+)
  • Clustering by MCL (v.12-135)
  • Multiple Sequence Alignments with MAFFT (v.7.113)
  • Family stable ID mapping

Compara dumps (all species)

  • Data dumps for ProteinTrees
  • Data dumps for ncRNAtrees
  • OrthoXML dumps for ProteinTrees
  • OrthoXML dumps for ncRNAtrees
  • PhyloXML dumps for ProteinTrees
  • PhyloXML dumps for ncRNAtrees

API/schema changes (all species)

  • Extend genome_db table (and the corresponding API) with two extra fields (has_karyotype and is_high_coverage)
  • Annotation of web display information in species_tree_node's instead of species_set_tags

New track - Age of Base (Human)

In release 75 we have added a new track for human, showing the timing of the most recent mutation as determined by inter-species whole genome alignments. You can find the track in the comparative genomics menu under "Conservation regions" (or search for "age of base").

Each base pair in which the human reference genome differs by substitution from one of its inferred ancestral genomes is coloured in either grey (event prior to the primate branch), blue (primate specific), red (human-specific, fixed variant), or yellow (human-specific segregating variant, i.e. SNP). Clicking on a mutation position reveals the sub-tree of species which have inherited the same mutation from their common ancestor. It also reveals a score that represents the age of the mutation in arbitrary units, and determines the intensity of the colouring. The more recent the mutation, the lower the score and the darker the colour.

Note that this is a beta version of the track - if you find it useful, please let us know!
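
The colour scheme described above can be summarised as a small lookup table. This is only a sketch: the category keys and the function name are hypothetical labels chosen for this example, and only the four colours and their meanings come from the track description.

```python
# Category keys are hypothetical labels; the colours come from the track description.
AGE_OF_BASE_COLOURS = {
    "pre_primate": "grey",          # event prior to the primate branch
    "primate_specific": "blue",     # primate specific
    "human_fixed": "red",           # human-specific, fixed variant
    "human_segregating": "yellow",  # human-specific segregating variant (SNP)
}


def colour_for(category):
    """Return the track colour for a substitution category."""
    try:
        return AGE_OF_BASE_COLOURS[category]
    except KeyError:
        raise ValueError(f"unknown Age of Base category: {category!r}") from None
```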

Release 74

15 way-mammal-epo alignments (all species)

with new Sheep assembly and Cat

21 way-amniota-pecan alignments (all species)

with new Sheep assembly

10way teleost fish EPO_LOW_COVERAGE (all species)

with new spotted gar and cave fish assemblies

4 way saurian reptiles EPO (Anole lizard, Chicken, Zebra Finch, Turkey)

In release 74 we have extended the original three-way neognath bird EPO alignment to include anole lizard, and have renamed it 4-way saurian reptile EPO.

7 way saurian reptiles EPO (multiple species)

In release 74 we have created a new low coverage EPO alignment that includes the original neognath birds (chicken, turkey and zebra finch) plus two new birds, flycatcher and duck, and two reptile species, the anole lizard and the Chinese softshell turtle.

ProteinTrees and homologies (all species)

GeneTrees (protein-coding) with new/updated genebuilds and assemblies

  • all-vs-all blastp (ncbi-blast-2.2.27+)
  • Clustering using hcluster_sg
  • Multiple sequence alignments using MCoffee (Version_9.03.r1318) or Mafft (mafft-7.017)
  • Phylogenetic reconstruction using TreeBeST
  • Homology inference
  • Pairwise gene-based dN/dS scores for high coverage species pairs only (both on orthologues and paralogues) (codeml/PAML v4.3)
  • GeneTree stable ID mapping
  • Per family gene dynamics using CAFE (v2.2)

ncRNAtrees and homologies (all species)

  • Classification based on Rfam models (v11.0)
  • Multiple sequence alignments with Infernal
  • Phylogenetic reconstruction using RAxML
  • Phylogenetic reconstruction using FastTree2 and RAxML-Light for very big families
  • Additional multiple sequence alignments with Prank (w/ genomic flanks)
  • Additional phylogenetic reconstruction using PhyML and NJ
  • Phylogenetic tree merging using TreeBeST
  • Per family gene dynamics using CAFE
  • Homology inference
  • Secondary structure plots

Protein Families (all species)

Updated MCL families including all Ensembl transcript isoforms (including human non-reference haplotypes) and newest Uniprot Metazoa.

  • Getting distances by NCBI BlastP (v.2.2.27+)
  • Clustering by MCL (v.12-135)
  • Multiple Sequence Alignments with MAFFT (v.7.017)
  • Family stable ID mapping

Compara dumps (all species)

  • Data dumps for ProteinTrees
  • Data dumps for ncRNAtrees
  • OrthoXML dumps for ProteinTrees
  • OrthoXML dumps for ncRNAtrees
  • PhyloXML dumps for ProteinTrees
  • PhyloXML dumps for ncRNAtrees
  • EMF dumps for 4 way EPO sauropsids
  • EMF dumps for 7 way sauropsid EPO_LOW_COVERAGE multiple alignments
  • EMF dumps for 15 way mammal EPO multiple alignments
  • EMF dumps for 37 way EPO_LOW_COVERAGE multiple alignments
  • EMF dumps for 21 way amniota-pecan multiple alignments
  • EMF dumps for 10 way teleost fish EPO_LOW_COVERAGE multiple alignments
  • BED files for 37 way EPO_LOW_COVERAGE alignments
  • BED files for 10 way teleost fish EPO_LOW_COVERAGE
  • BED files for 7 way sauropsids EPO_LOW_COVERAGE

API/Schema change: new Locus object (all species)

base class for DnaFragRegion, GenomicAlign and Member

API/Schema change: New SpeciesTree API (all species)

  • New API to deal with species trees (+schema change in the species_tree_* tables)

API/Schema change: New API methods (+ schema change) to link gene tree nodes and homologues to the ancestral taxa (all species)

via the new fields in the species_tree_node table

API/Schema change: Inclusion of all the alternative alleles in the gene projections (all species)

between the reference sequence and the alternative sequences

API/Schema change: Changes in the CAFEGeneFamily API (all species)

to work with the new SpeciesTree API

Release 73

Pairwise alignments (all species)

Lastz alignments:

  • Flycatcher vs Human
  • Flycatcher vs Chicken
  • Duck vs Human
  • Duck vs Chicken

Lastz patch alignments:

  • human_ref vs human_patches
  • human haplotype alignments for high coverage
  • remove DELETED or UPDATED pairwise alignment patches from the release database

ProteinTrees and homologies (all species)

GeneTrees (protein-coding) with new/updated genebuilds and assemblies

  • all-vs-all blastp (ncbi-blast-2.2.27+)
  • Clustering using hcluster_sg
  • Multiple sequence alignments using MCoffee (Version_9.03.r1318) or Mafft (mafft-7.017)
  • Phylogenetic reconstruction using TreeBeST
  • Homology inference
  • Pairwise gene-based dN/dS scores for high coverage species pairs only (both on orthologues and paralogues) (codeml/PAML v4.3)
  • GeneTree stable ID mapping
  • Per family gene dynamics using CAFE (v2.2)

ncRNAtrees and homologies (all species)

  • Classification based on Rfam models (v11.0)
  • Multiple sequence alignments with Infernal
  • Phylogenetic reconstruction using RAxML
  • Phylogenetic reconstruction using FastTree2 and RAxML-Light for very big families
  • Additional multiple sequence alignments with Prank (w/ genomic flanks)
  • Additional phylogenetic reconstruction using PhyML and NJ
  • Phylogenetic tree merging using TreeBeST
  • Per family gene dynamics using CAFE (v2.2)
  • Homology inference
  • Secondary structure plots

Protein Families (all species)

Updated MCL families including all Ensembl transcript isoforms (including human non-reference haplotypes) and newest Uniprot Metazoa.

  • Getting distances by NCBI BlastP (v.2.2.27+)
  • Clustering by MCL (v.12-135)
  • Multiple Sequence Alignments with MAFFT (v.7.017)
  • Family stable ID mapping

Compara dumps (all species)

  • EMF dumps for ProteinTrees
  • EMF dumps for ncRNAtrees
  • OrthoXML dumps for ProteinTrees
  • OrthoXML dumps for ncRNAtrees
  • PhyloXML dumps for ProteinTrees
  • PhyloXML dumps for ncRNAtrees

API/schema changes (all species)

  • Protein-tree pipeline: switch to NCBI-blast

  • API: drop support for deprecated methods in: MethodLinkSpeciesSet, GeneTreeNode, AlignedMemberAdaptor, SequenceAdaptor, GenomeDB

  • API: remove deprecated objects / adaptors: ProteinTreeAdaptor, NCTreeAdaptor, Subset(Adaptor)

Release 72

patch 71_72_a.sql - schema version update (all species)

Update schema_version in meta table to 72.

lastz_patch pairwise alignments (all species)

  • human_ref vs human_patches
  • human haplotype alignments for high coverage lastz-net alignments

LASTZ pairwise alignments (all species)

Pairwise alignments for the mitochondrion only:

  • Human vs Alpaca
  • Human vs Dolphin
  • Human vs Hedgehog
  • Human vs Hyrax
  • Human vs Lesser hedgehog tenrec
  • Medaka vs Stickleback
  • Medaka vs Platyfish
  • Human vs Pika
  • Human vs Tarsier
  • Human vs Tree Shrew
  • C.intestinalis vs C.savignyi

Tblat-net pairwise alignments (all species)

Pairwise alignments for the mitochondrion only:

  • Human vs C.savignyi
  • Zebrafish vs C.savignyi
  • Chicken vs C.savignyi
  • Human vs Lamprey
  • Zebrafish vs Lamprey
  • Stickleback vs Lamprey
  • Lamprey vs C.intestinalis
  • Human vs Medaka
  • Zebrafish vs Medaka
  • Human vs Platyfish
  • Zebrafish vs Platyfish
  • Xenopus vs Tetraodon
  • Human vs Xenopus
  • Xenopus vs Zebrafish
  • Xenopus vs Coelacanth
  • Mouse vs Xenopus
  • Rat vs Xenopus
  • Chicken vs Xenopus

ncRNAtrees and homologies (all species)

  • Classification based on Rfam models (v11.0)
  • Multiple sequence alignments with Infernal
  • Phylogenetic reconstruction using RAxML
  • Phylogenetic reconstruction using FastTree2 and RAxML-Light for very big families
  • Additional multiple sequence alignments with Prank (w/ genomic flanks)
  • Additional phylogenetic reconstruction using PhyML and NJ
  • Phylogenetic tree merging using TreeBeST
  • Per family gene dynamics using CAFE
  • Homology inference
  • Secondary structure plots

ProteinTrees and homologies (all species)

GeneTrees (protein-coding) with new/updated genebuilds and assemblies

  • Clustering using hcluster_sg
  • Multiple sequence alignments using MCoffee or Mafft
  • Phylogenetic reconstruction using TreeBeST
  • Homology inference
  • Pairwise gene-based dN/dS scores for high coverage species pairs only (both on orthologues and paralogues)
  • GeneTree stable ID mapping
  • Per family gene dynamics using CAFE

Protein Families (all species)

Updated MCL families including all Ensembl transcript isoforms (including human non-reference haplotypes) and newest Uniprot Metazoa.

  • Getting distances by NCBI BlastP (v.2.2.27+)
  • Clustering by MCL (v.12-135)
  • Multiple Sequence Alignments with MAFFT (v.7.017)
  • Family stable ID mapping

Gene tree dumps (all species)

  • Data dumps for ProteinTrees
  • Data dumps for ncRNAtrees
  • OrthoXML dumps for ProteinTrees
  • OrthoXML dumps for ncRNAtrees
  • PhyloXML dumps for ProteinTrees
  • PhyloXML dumps for ncRNAtrees

API: New member_production_counts table (all species)

New member_production_counts table for fast web member look-ups

Release 71

LASTZ pairwise alignments (all species)

New alignments with chicken

  • human vs chicken lastz
  • mouse vs chicken lastz
  • chicken vs turkey lastz
  • chicken vs zebrafinch lastz
  • chicken vs lizard lastz

New fish alignments with medaka

  • medaka vs cod lastz
  • medaka vs tilapia lastz
  • medaka vs platyfish lastz

Tblat-net pairwise alignments (all species)

New alignments with chicken

  • human vs chicken
  • chicken vs xenopus
  • chicken vs zebrafish
  • chicken vs ciona savignyi

Multiple alignments (all species)

Updated multiple alignments to incorporate the new chicken assembly

  • 20way amniote pecan alignments
  • 3way bird EPO alignments

Updated 6way primate EPO alignment

New 8way fish EPO-low coverage alignment

Syntenies (all species)

Updated syntenies with chicken

  • human vs chicken synteny
  • mouse vs chicken synteny
  • chicken vs turkey synteny
  • chicken vs lizard synteny

ProteinTrees and homologies (all species)

GeneTrees (protein-coding) with new/updated genebuilds and assemblies

  • Clustering using hcluster_sg
  • Multiple sequence alignments using MCoffee or Mafft
  • Phylogenetic reconstruction using TreeBeST
  • Homology inference
  • Pairwise gene-based dN/dS scores for high coverage species pairs only (both on orthologues and paralogues)
  • GeneTree stable ID mapping
  • Per family gene dynamics using CAFE

ncRNAtrees and homologies (all species)

  • Classification based on Rfam models (v11.0)
  • Multiple sequence alignments with Infernal
  • Phylogenetic reconstruction using RAxML
  • Phylogenetic reconstruction using FastTree2 and RAxML-Light for very big families
  • Additional multiple sequence alignments with Prank (w/ genomic flanks)
  • Additional phylogenetic reconstruction using PhyML and NJ
  • Phylogenetic tree merging using TreeBeST
  • Per family gene dynamics using CAFE
  • Homology inference
  • Secondary structure plots

Protein Families (all species)

Updated MCL families including all Ensembl transcript isoforms (including human non-reference haplotypes) and newest Uniprot Metazoa.

  • Finding pairwise distances using NCBI BLAST (v2.2.27+)
  • Clustering by MCL (v.12-135)
  • Multiple Sequence Alignments with MAFFT (v.7.017)
  • Family stable ID mapping

Gene tree dumps (all species)

  • Data dumps for ProteinTrees
  • Data dumps for ncRNAtrees
  • OrthoXML dumps for ProteinTrees
  • OrthoXML dumps for ncRNAtrees
  • PhyloXML dumps for ProteinTrees
  • PhyloXML dumps for ncRNAtrees

Multiple-alignment dumps (all species)

  • EMF dumps for 20 way amniotes PECAN multiple alignments
  • BED files for 20 way amniotes GERP constrained elements
  • EMF dumps for 3 way birds EPO multiple alignments
  • EMF dumps for 6way primates EPO multiple alignments
  • EMF dumps for 8 way fish EPO low-coverage alignments
  • BED files for 8 way fish GERP constrained elements

API: Member deprecated (all species)

New modules for genes and gene products

API: drop support for Member-Attribute (all species)

This was deprecated 3 releases ago

Release 70

Schema: drop lr_index_offset table (all species)

The "lr_index_offset" table is dropped. All left/right_index values are now in local spaces (i.e. they start at 1 for each tree). As a consequence, left/right_index values are no longer unique, but the pairs (root_id, left/right_index) are.
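
To make the consequence concrete, here is a minimal sketch of nested-set numbering done per tree. The function name and the dictionary-of-children layout are assumptions for illustration only and do not reflect the Compara API; the point is that two trees can both contain left_index 1 while the (root_id, left_index) pairs stay distinct.

```python
def assign_local_indexes(children, root_id):
    """Number one tree's nodes with nested-set left/right indexes starting at 1.

    children: dict mapping node_id -> list of child node_ids (assumed layout).
    Returns {node_id: (left_index, right_index)}.
    """
    indexes = {}
    left = {}
    counter = 1
    # Iterative depth-first walk; each stack entry is (node, already_entered).
    stack = [(root_id, False)]
    while stack:
        node, entered = stack.pop()
        if entered:
            # Leaving the node: its right index is the next counter value.
            indexes[node] = (left[node], counter)
            counter += 1
        else:
            # Entering the node: record its left index, then visit children.
            left[node] = counter
            counter += 1
            stack.append((node, True))
            for child in reversed(children.get(node, [])):
                stack.append((child, False))
    return indexes
```

Calling this on two separate trees restarts the numbering at 1 each time, so a lookup by left_index alone is ambiguous and has to be qualified with the root_id.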

Schema + API: support of alternative gene sequences (all species)

The two tables sequence_cds and sequence_exon_bounded are dropped and replaced with a more generic other_member_sequence table. This will allow more kinds of related sequences to be stored.

DBSQL/SequenceAdaptor is changed to reflect the change

API: more generic AlignedMemberSet (all species)

AlignedMemberSet can now deal with "other" sequences: use any sequence stored in the other_member_sequence table. This is an extension of the module, the default behaviour is still identical.

Schema: new canonical_member_id column, subset(_member) not populated any more (all species)

To ease the schema change of the member table, we have introduced a new column "canonical_member_id" in the member table. The tables subset and subset_member are not needed any more and are now empty. The modules Subset and DBSQL/SubsetAdaptor are now deprecated

Schema: new ref_root_id column in gene_tree_root (all species)

Alternative trees (trees built on the same list of genes, but belonging to different clustersets) are now linked via a "ref_root_id" column in gene_tree_root instead of tags.

The fetch_all_linked_trees() method still works the same way as in e69
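
The linking scheme can be pictured with a short sketch that groups alternative trees under their reference tree. The row layout, the clusterset names in the usage note and the convention that ref_root_id is empty (None) for the reference tree itself are assumptions for illustration; this is not how fetch_all_linked_trees() is implemented.

```python
from collections import defaultdict


def group_linked_trees(rows):
    """Group alternative trees by the reference tree they point to.

    rows: iterable of (root_id, clusterset_id, ref_root_id) tuples, where
    ref_root_id is None for a reference tree (assumed convention).
    Returns {ref_root_id: [(root_id, clusterset_id), ...]}.
    """
    linked = defaultdict(list)
    for root_id, clusterset_id, ref_root_id in rows:
        if ref_root_id is not None:
            linked[ref_root_id].append((root_id, clusterset_id))
    return dict(linked)
```

With rows such as (100, "default", None), (101, "phyml-aa", 100) and (102, "nj-aa", 100), the result maps tree 100 to its two alternative trees.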

ProteinTrees and homologies (all species)

GeneTrees (protein-coding) with new/updated genebuilds and assemblies

  • Clustering using hcluster_sg
  • Multiple sequence alignments using MCoffee or Mafft
  • Phylogenetic reconstruction using TreeBeST
  • Homology inference
  • Pairwise gene-based dN/dS scores for high coverage species pairs only (both on orthologues and paralogues)
  • GeneTree stable ID mapping
  • Per family gene dynamics using CAFE

ncRNAtrees and homologies (all species)

  • Classification based on Rfam models
  • Multiple sequence alignments with Infernal
  • Phylogenetic reconstruction using RAxML
  • Phylogenetic reconstruction using FastTree2 and RAxML-Light for very big families
  • Additional multiple sequence alignments with Prank (w/ genomic flanks)
  • Additional phylogenetic reconstruction using PhyML and NJ
  • Phylogenetic tree merging using TreeBeST
  • Per family gene dynamics using CAFE
  • Homology inference
  • Secondary structure plots

Protein Families (all species)

Updated MCL families including all Ensembl transcript isoforms (including human non-reference haplotypes) and newest Uniprot Metazoa.

  • Clustering by MCL
  • Multiple Sequence Alignments with MAFFT
  • Family stable ID mapping

Gene tree dumps (all species)

  • Data dumps for ProteinTrees
  • Data dumps for ncRNAtrees
  • OrthoXML dumps for ProteinTrees
  • OrthoXML dumps for ncRNAtrees
  • PhyloXML dumps for ProteinTrees
  • PhyloXML dumps for ncRNAtrees

Multiple alignments (all species)

Updated multiple alignments to incorporate new rat and cat assemblies

  • 13way-mammal-epo alignments (with new rat assembly)
  • 36way-mammal low-coverage-epo alignments (with new rat and cat assemblies)
  • 20way-amniota-pecan alignments (with new rat and cat assemblies)

LASTZ pairwise alignments (Cat, Human, Mouse, Rat)

New alignments with rat and cat

  • human vs rat lastz 
  • mouse vs rat lastz
  • human vs cat lastz

Tblat-net pairwise alignments (C.intestinalis, Zebrafish, Rat, Xenopus)

  • rat vs danio tblat-net
  • rat vs xenopus tblat-net
  • rat vs ciona int tblat-net

Multiple-alignment dumps (all species)

  • EMF dumps for 13 way EPO multiple alignments
  • EMF dumps for 36 way low-coverage alignments
  • BED files for 36 way low-coverage alignments
  • BED files for 20 way GERP constrained elements
  • EMF dumps for 20 way PECAN multiple alignments

Syntenies (Cat, Human, Mouse, Rat)

Updated syntenies with rat and cat

  • human cat synteny
  • human rat synteny
  • mouse rat synteny

Schema: member_id moved from gene_tree_member to gene_tree_node (all species)

This will ease the storage of multiple alignments

Schema: drop column tree_support in gene_tree_node_attr (all species)

The tag "tree_support" is now internally stored as a tag in gene_tree_node_tag. From the user point of view, get_tagvalue() will now return an array of method names, instead of the single string that was the concatenation of the names

Schema + API: support of several multiple alignments per tree (all species)

Two new tables are created: gene_align and gene_align_member. gene_tree_member is dropped. There is a new column "gene_align_id" in gene_tree_root to link a tree to an alignment
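
A minimal sketch of how the new tables could relate, using SQLite from Python. Only the table names and the "gene_align_id" column come from the note above; every other column (aln_method, member_id, cigar_line, root_id) and all sample values are assumptions, and the real tables carry more fields.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified, assumed columns for illustration only.
    CREATE TABLE gene_align (
        gene_align_id INTEGER PRIMARY KEY,
        aln_method    TEXT
    );
    CREATE TABLE gene_align_member (
        gene_align_id INTEGER NOT NULL REFERENCES gene_align(gene_align_id),
        member_id     INTEGER NOT NULL,
        cigar_line    TEXT,
        PRIMARY KEY (gene_align_id, member_id)
    );
    CREATE TABLE gene_tree_root (
        root_id       INTEGER PRIMARY KEY,
        gene_align_id INTEGER REFERENCES gene_align(gene_align_id)
    );
""")

# Two alignments of the same two members.
conn.execute("INSERT INTO gene_align VALUES (1, 'mcoffee')")
conn.execute("INSERT INTO gene_align VALUES (2, 'mafft')")
conn.executemany(
    "INSERT INTO gene_align_member VALUES (?, ?, ?)",
    [(1, 101, "10M"), (1, 102, "4M2D4M"), (2, 101, "10M"), (2, 102, "5MD4M")],
)
# The tree links to one alignment; the other one remains stored alongside it.
conn.execute("INSERT INTO gene_tree_root VALUES (500, 1)")

rows = conn.execute(
    """
    SELECT m.member_id, m.cigar_line
    FROM gene_tree_root r
    JOIN gene_align_member m ON m.gene_align_id = r.gene_align_id
    WHERE r.root_id = ?
    ORDER BY m.member_id
    """,
    (500,),
).fetchall()
```

Because the alignment rows are keyed by gene_align_id rather than by tree, several alignments of the same members can coexist.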

Web: option to visualize the gene-trees with alternative phylogenetic models (all species)

As the two gene-tree pipelines now store the intermediate gene-trees before they are merged into the final trees, the data will now be available on the web.

This includes PhyML and NJ models for both pipelines, and secondary-structure trees for ncRNAs.

Web: Options to display the super-trees (all species)

Very large gene families are split into sub-families to make the tree reconstruction feasible. The links between such sub-families are now displayed and called "super-trees"

Schema: drop protein_tree_member_score table (all species)

The table protein_tree_member_score will be dropped. Alignment scores are now stored in gene_align

Schema: drop protein_tree_hmmprofile table (all species)

The protein_tree_hmmprofile table is dropped. Protein-tree HMM profiles are now stored in hmm_profile

Release 69

ProteinTrees and homologies (all species)

GeneTrees (protein-coding) with new/updated genebuilds and assemblies

  • Clustering using hcluster_sg
  • Multiple sequence alignments using MCoffee or Mafft
  • Phylogenetic reconstruction using TreeBeST
  • Homology inference
  • Pairwise gene-based dN/dS scores for high coverage species pairs only (both on orthologues and paralogues)
  • GeneTree stable ID mapping
  • Per family gene dynamics using CAFE

ncRNAtrees and homologies (all species)

  • Classification based on Rfam models
  • Multiple sequence alignments with Infernal
  • Phylogenetic reconstruction using RAxML
  • Phylogenetic reconstruction using FastTree2 and RAxML-Light for very big families
  • Additional multiple sequence alignments with Prank (w/ genomic flanks)
  • Additional phylogenetic reconstruction using PhyML and NJ
  • Phylogenetic tree merging using TreeBeST
  • Per family gene dynamics using CAFE
  • Homology inference

Protein Families (all species)

Updated MCL families including all Ensembl transcript isoforms (including human non-reference haplotypes) and newest Uniprot Metazoa.

  • Clustering by MCL
  • Multiple Sequence Alignments with MAFFT
  • Family stable ID mapping

Gene tree dumps (all species)

  • Data dumps for ProteinTrees
  • Data dumps for ncRNAtrees
  • OrthoXML dumps for ProteinTrees
  • OrthoXML dumps for ncRNAtrees
  • PhyloXML dumps for ProteinTrees
  • PhyloXML dumps for ncRNAtrees

Multiple alignments (all species)

Updated multiple alignments to incorporate rabbit and ferret

  • 13way-mammal-epo alignments (with addition of rabbit)
  • 36way-mammal low-coverage-epo alignments (with addition of ferret)

Multiple-alignment dumps (all species)

  • EMF dumps for 13 way EPO multiple alignments
  • EMF dumps for 36 way low-coverage alignments
  • BED files for 36 way low-coverage alignments

LASTZ pairwise alignments (Human, Mouse, Ferret)

New alignments with ferret

  • human vs ferret lastz 
  • mouse vs ferret lastz

TBlat pairwise alignments (Zebrafish, Human, Medaka, Platyfish)

New alignments with platyfish

  • platyfish vs danio tblat
  • platyfish vs medaka tblat
  • platyfish vs human tblat

Web: new static page for EPO pipeline info (all species)

A new static page explaining the steps involved in running the EPO pipeline.

It will be linked to the compara/analyses.html page.

API: removed Graph::CGObject (all species)

The base classes in the Graph namespace have been refactored.

Graph::CGObject is no longer needed and has been removed. If its functionality is still required, use Bio::EnsEMBL::Compara::Taggable and Bio::EnsEMBL::Storable instead.

API/schema changes (all species)

* Schema changes in the CAFE tables
* Addition of "fetch_lca_tree", "fetch_all_with_lca" and "fetch_all_lca_trees" methods to the CAFEGeneFamily adaptor
* Addition of "lca_id", "lca_reroot", "lca_taxon_id" and "get_leaf_with_genome_db_id" methods in the CAFEGeneFamily API
* Super-trees for ncRNA families

Release 68

ProteinTrees and homologies (all species)

GeneTrees (protein-coding) with new/updated genebuilds and assemblies

  • Clustering using hcluster_sg
  • Multiple sequence alignments using MCoffee or Mafft
  • Phylogenetic reconstruction using TreeBeST
  • Homology inference
  • Pairwise gene-based dN/dS scores for high coverage species pairs only (both on orthologues and paralogues)
  • GeneTree stable ID mapping
  • Per family gene dynamics using CAFE

ncRNAtrees and homologies (all species)

  • Classification based on Rfam models
  • Multiple sequence alignments with Infernal
  • Phylogenetic reconstruction using RAxML
  • Phylogenetic reconstruction using FastTree2 and RAxML-Light for very big families
  • Additional multiple sequence alignments with Prank (w/ genomic flanks)
  • Additional phylogenetic reconstruction using PhyML and NJ
  • Phylogenetic tree merging using TreeBeST
  • Per family gene dynamics using CAFE
  • Homology inference

Protein Families (all species)

Updated MCL families including all Ensembl transcript isoforms (including human non-reference haplotypes) and newest Uniprot Metazoa.

  • Clustering by MCL
  • Multiple Sequence Alignments with MAFFT
  • Family stable ID mapping

Gene tree dumps (all species)

  • Data dumps for ProteinTrees
  • Data dumps for ncRNAtrees
  • OrthoXML dumps for ProteinTrees
  • OrthoXML dumps for ncRNAtrees
  • PhyloXML dumps for ProteinTrees
  • PhyloXML dumps for ncRNAtrees

Multiple-alignment dumps (all species)

  • EMF dumps for 19 way PECAN multiple alignments
  • BED files for 19 way GERP constrained elements
  • EMF dumps for 12 way EPO multiple alignments
  • EMF dumps for 6way EPO multiple alignments
  • EMF dumps for 35 way low-coverage alignments
  • BED files for 35 way low-coverage alignments

Syntenies (all species)

Updated syntenies with dog and mouse

  • human dog synteny
  • dog mouse synteny
  • dog horse synteny
  • human mouse synteny
  • mouse rat synteny
  • mouse pig synteny
  • mouse cow synteny
  • mouse chicken synteny
  • mouse platypus synteny

Multiple alignments (all species)

Updated multiple alignments to incorporate new dog and mouse

  • 12way-mammal-epo alignments
  • 19way-amniota-pecan alignments
  • 35way-mammal low-coverage-epo alignments

LASTZ pairwise alignments (all species)

Updated alignments with mouse and dog, new alignments with turtle

  • human vs mouse lastz 
  • mouse vs rat lastz
  • mouse vs dog lastz
  • mouse vs cow lastz
  • mouse vs pig lastz
  • mouse vs platypus
  • mouse vs chicken
  • human vs dog
  • dog vs horse
  • dog vs panda
  • human vs Chinese softshell turtle
  • lizard vs Chinese softshell turtle

 

TBlat pairwise alignments (all species)

 

Updated alignments with mouse

  • mouse vs danio tblat
  • mouse vs xenopus
  • mouse vs ciona int

lastz_patch pairwise alignments (all species)

 

  • human_ref vs human_patches
  • human haplotype alignments for high coverage lastz-net alignments

Schema: HMM profiles (all species)

Renaming nc_profile table to hmm_profile and expanding model_id and model_name

API: CAFE (all species)

Renaming of methods: tree_is_significant and node_is_significant become is_tree_significant and is_node_significant respectively

API: Subset object (all species)

Removed the member list from Subset and the related methods. Subset is now a list of member_id

API: GeneTree (all species)

Make some GeneTreeNode/NestedSet methods directly available from GeneTree (without going to the root node first)

Schema: primary keys (all species)

Some 'UNIQUE NOT NULL' keys have been promoted to PRIMARY KEYs:

  • method_link_species_set_tag
  • subset_member
  • family_member
  • gene_tree_member
  • homology_member

Schema: foreign key in gene_tree_node (all species)

gene_tree_node has a new foreign key: parent_id must be a valid node_id
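
The effect of such a self-referencing key can be sketched with SQLite from Python. The release database is not SQLite, and this sketch assumes a simplified table in which root nodes have a NULL parent_id; only the table name and the node_id and parent_id columns come from the note above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when enabled
conn.execute("""
    CREATE TABLE gene_tree_node (
        node_id   INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES gene_tree_node(node_id)  -- NULL for a root (assumption)
    )
""")

conn.execute("INSERT INTO gene_tree_node VALUES (1, NULL)")  # root
conn.execute("INSERT INTO gene_tree_node VALUES (2, 1)")     # child of node 1

try:
    # Node 99 does not exist, so this row must be refused.
    conn.execute("INSERT INTO gene_tree_node VALUES (3, 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

node_count = conn.execute("SELECT COUNT(*) FROM gene_tree_node").fetchone()[0]
```

With the constraint in place, a node pointing at a missing parent is rejected at insert time instead of surfacing later as a broken tree.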

API/Schema: protein sequences (all species)

The two tables sequence_cds and sequence_exon_bounded now use the member_id as primary key. All the methods to access the sequences are now in SequenceAdaptor (and removed from GeneTreeNodeAdaptor and MemberAdaptor)

API: Member-Attribute pairs (all species)

Families and Homologies should consist of AlignedMembers, not of Member-Attribute pairs

API: New module: AlignedMemberSet (all species)

A new base class "AlignedMemberSet" is shared by Family, Homology and GeneTree, with methods like getAllMembers, getSimpleAlign, etc.

Reorganize the gene tree adaptors (all species)

GeneTreeNodeAdaptor will focus on GeneTreeNode objects. All the methods that are intended to return trees will be deprecated

ProteinTreeAdaptor and NCTreeAdaptor will be deprecated to promote GeneTreeAdaptor

API: subroot (all species)

The methods to access subroot are now deprecated as this concept is not used any more in our data

API: BaseAdaptor (all species)

The Compara BaseAdaptor is now using the same mechanism for left joins as the Core BaseAdaptor

Genes of protein tree pipeline (all species)

Include the missing ig_* and tr_* genes (they are currently not loaded for all the species)

API: DNA Pairwise and Multiple Alignment pipelines (all species)

Remove all the modules in Bio::EnsEMBL::Compara::Production::GenomicAlignBlock. These have been replaced with new modules in Bio::EnsEMBL::Compara::RunnableDB. Remove the pipeline scripts from ensembl-compara/scripts/pipeline. New init_pipeline modules can be found in Bio::EnsEMBL::Compara::PipeConfig. Documentation on how to run the new pipelines is in ensembl-compara/docs/

Schema: clusterset_id (all species)

The "clusterset_id" field in gene_tree_root is now a VARCHAR(20). This field tells the method that has been used to generate the tree. The default method is called 'default'

Release 67

LASTZ pairwise alignments (all species)

  • human vs ground squirrel lastz
  • human vs pig lastz
  • human vs dog lastz
  • dog vs horse lastz
  • dog vs panda lastz
  • pig vs cow lastz

TBlat pairwise alignments (all species)

  • Nile Tilapia vs human tblat 
  • Nile Tilapia vs mouse tblat
  • Nile Tilapia vs zebrafish tblat

lastz_patch pairwise alignments (all species)

  • human_ref vs human_patches
  • human haplotype alignments for high coverage lastz-net alignments
  • remove low coverage pairwise alignment patches from the release database.

Multiple alignments (all species)

  • 12way-mammal-epo alignments to incorporate new dog and pig
  • 19way-amniota-pecan alignments to incorporate new dog and pig
  • 35way-mammal low-coverage-epo alignments to incorporate new dog and pig

Syntenies (all species)

  • human dog synteny
  • human pig synteny
  • dog mouse synteny
  • dog horse synteny
  • pig cow synteny

Protein trees and homologies (all species)

GeneTrees (protein-coding) with new/updated genebuilds and assemblies

  • Clustering using hcluster_sg
  • Multiple sequence alignments using MCoffee or Mafft
  • Phylogenetic reconstruction using TreeBeST
  • Homology inference
  • Pairwise gene-based dN/dS scores for high coverage species pairs only (both on orthologues and paralogues)
  • GeneTree stable ID mapping
  • Per family gene dynamics using CAFE

ncRNA trees and homologies (all species)

  • Classification based on Rfam models
  • Multiple sequence alignments with Infernal
  • Phylogenetic reconstruction using RAxML
  • Phylogenetic reconstruction using FastTree2 and RAxML-Light for very big families
  • Additional multiple sequence alignments with Prank (w/ genomic flanks)
  • Additional phylogenetic reconstruction using PhyML and NJ
  • Phylogenetic tree merging using TreeBeST
  • Per family gene dynamics using CAFE
  • Homology inference

Protein Families (all species)

Updated MCL families including all Ensembl transcript isoforms (including human non-reference haplotypes) and newest Uniprot Metazoa.

  • Clustering by MCL
  • Multiple Sequence Alignments with MAFFT
  • Family stable ID mapping

API: New module: SpeciesSet (all species)

A SpeciesSet object replaces the raw array of GenomeDBs, and the scripts and API now use it

Compara dumps (all species)

  • Data dumps for ProteinTrees
  • Data dumps for ncRNAtrees
  • OrthoXML dumps for ProteinTrees
  • OrthoXML dumps for ncRNAtrees
  • PhyloXML dumps for ProteinTrees
  • PhyloXML dumps for ncRNAtrees
  • EMF dumps for 19 way PECAN multiple alignments
  • BED files for 19 way GERP constrained elements
  • EMF dumps for 12 way EPO multiple alignments
  • EMF dumps for 6way EPO multiple alignments
  • EMF dumps for 35 way low-coverage alignments
  • BED files for 35 way low-coverage alignments

Schema: constrained_element (all species)

Change the type of the "p_value" field in the "constrained_element" table.

Schema: constrained_element (all species)

Drop the "taxonomic_level" field from the "constrained_element" table.

Pipeline documentation (all species)

Web documentation for the ncRNA pipeline

Schema: split gene_tree_root:tree_type to tree_type and member_type (all species)

The new member_type field will be 'protein' or 'ncrna'

The new version of tree_type will be 'clusterset', 'supertree' or 'tree'
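
As a rough illustration of why two independent fields are convenient, the Python sketch below models a gene_tree_root row with the two fields and the values listed above. The `GeneTreeRoot` class, the `root_id` field and the sample rows are hypothetical.

```python
from dataclasses import dataclass

# Allowed values as listed in the note above.
MEMBER_TYPES = {"protein", "ncrna"}
TREE_TYPES = {"clusterset", "supertree", "tree"}


@dataclass(frozen=True)
class GeneTreeRoot:
    """Hypothetical in-memory view of a gene_tree_root row."""

    root_id: int
    member_type: str
    tree_type: str

    def __post_init__(self):
        if self.member_type not in MEMBER_TYPES:
            raise ValueError(f"unknown member_type: {self.member_type!r}")
        if self.tree_type not in TREE_TYPES:
            raise ValueError(f"unknown tree_type: {self.tree_type!r}")


roots = [
    GeneTreeRoot(1, "protein", "clusterset"),
    GeneTreeRoot(2, "protein", "supertree"),
    GeneTreeRoot(3, "protein", "tree"),
    GeneTreeRoot(4, "ncrna", "tree"),
]

# Each question is now a filter on one or both fields.
ncrna_trees = [r.root_id for r in roots if r.member_type == "ncrna" and r.tree_type == "tree"]
all_supertrees = [r.root_id for r in roots if r.tree_type == "supertree"]
```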

API: ConstrainedElement::get_SimpleAlign (all species)

The get_SimpleAlign method now gets the mlss_id of the alignments (from which the constrained elements were generated) from the method_link_species_set_tag table. There is no need to pass a method_link_species_set object explicitly as a parameter to get_SimpleAlign.

API: New MethodLinkSpeciesSetTags (all species)

The tag 'msa_mlss_id' is now associated to all the constrained element and conservation score MLSSs, to link to the actual multiple alignment MLSS that has generated them.

New Compara DBSQL::BaseAdaptor (all species)

Creation of a new Compara-specific BaseAdaptor that extends the Core one, by offering SQL joins and "final clauses" to the generic_fetch

NestedSetAdaptor, MemberAdaptor and BaseRelationAdaptor are actually now using it

Canonical members for all gene members (all species)

We now store the canonical peptide / transcript member of every gene member in the same way.

The preferred methods to access them are now Member::get_canonical_Member and MemberAdaptor::fetch_canonical_member_for_gene_member_id (other ones are deprecated)

Release 66

Pairwise alignments (all species)

 

  • human vs chimpanzee lastz
  • human vs gorilla lastz
  • human vs orang lastz
  • human vs gibbon
  • human vs macaque
  • human vs marmoset lastz
  • human vs human lastz import from UCSC
  • ciona intestinalis vs ciona savignyi lastz

Add new lastz_patch alignments

  • human_ref vs human_patches

Add new tblat-alignments

  • ciona intestinalis vs human tblat
  • ciona intestinalis vs mouse tblat
  • ciona intestinalis vs zebrafish tblat
  • ciona intestinalis vs lamprey tblat
  • human vs Coelacanth tblat
  • danio vs Coelacanth tblat
  • stickleback vs Coelacanth tblat
  • xenopus vs Coelacanth tblat

 

Syntenies (all species)

 

  • human chimpanzee synteny
  • human gorilla synteny
  • human orang synteny
  • human macaque synteny
  • human marmoset synteny

 

ProteinTrees and homologies (all species)

 

  • Clustering using hcluster_sg
  • Multiple sequence alignments using MCoffee or Mafft
  • Phylogenetic reconstruction using TreeBeST
  • Homology inference
  • Pairwise gene-based dN/dS scores for high coverage species pairs only (both on orthologues and paralogues)
  • GeneTree stable ID mapping
  • Per family gene dynamics using CAFE

 

ncRNAtrees and homologies (all species)

 

  • Classification based on Rfam models
  • Multiple sequence alignments with Infernal
  • Phylogenetic reconstruction using RAxML
  • Phylogenetic reconstruction using FastTree2 and RAxML-Light for very big families
  • Additional multiple sequence alignments with Prank (w/ genomic flanks)
  • Additional phylogenetic reconstruction using PhyML and NJ
  • Phylogenetic tree merging using TreeBeST
  • Per family gene dynamics using CAFE
  • Homology inference

 

Protein Families (all species)

 

  • Clustering by MCL
  • Multiple Sequence Alignments with MAFFT
  • Family stable ID mapping

Compara dumps (all species)

 

  • Data dumps for ProteinTrees
  • Data dumps for ncRNAtrees
  • OrthoXML dumps for ProteinTrees
  • OrthoXML dumps for ncRNAtrees
  • PhyloXML dumps for ProteinTrees
  • PhyloXML dumps for ncRNAtrees
  • Ancestral sequences for primates

 

API / Schema Change (all species)

Schema: adding a new "header" table for a tree root, moving general properties of the tree into that table (including protein_tree_stable_id)

API / Schema Change (all species)

Schema: keeping the super-tree structures in protein_tree_node (thus removing all the super_protein_tree_* tables)

API / Schema Change (all species)

Schema: merging the nc_tree_* and protein_tree_* tables

API / Schema Change (all species)

Schema: removal of the genomic_align_group table

API / Schema Change (all species)

Schema: addition of node_id to the genomic_align table

API / Schema Change (all species)

Schema: move level_id from the genomic_align table to the genomic_align_block table

API / Schema Change (all species)

Schema: addition of visible to the genomic_align table

API / Schema Change (all species)

Schema: allow the taxon_id in the genome_db table to be NULL

API / Schema Change (all species)

Schema: New tables for CAFE Analysis (CAFE_tree_node, CAFE_tree_attr, CAFE_tree, CAFE_analysis)

API / Schema Change (all species)

API: Create method_link_species_set_tag and store there the reference (in pairwise alignments) and maybe pipeline trees.

API / Schema Change (all species)

API: New adaptor for CAFE trees and new CAFE_tree_node module

API / Schema Change (all species)

API: New DNA pipelines: PairAligner_conf.pm, MercatorPecan_conf.pm, EpoLowCoverage_conf.pm replace the loadPairAlignerSystem.pl, loadChainNetSystem.pl, loadMultipleAlignerSystem.pl and loadLowCoverageAlignerSystem.pl scripts. These scripts are no longer supported.

Web (all species)

Information about lost taxa

API / Schema Change (all species)

Schema: New method_link_species_set_tag table and API support for it

API / Schema Change (all species)

Several deprecated methods in the API have been removed and code calling them has been modified accordingly.

Release 65

Pairwise alignments (all species)

 

  • human vs chimpanzee LastZ alignments
  • human vs bushbaby LastZ alignments
  • cod vs Danio rerio LastZ alignments
  • cod vs stickleback LastZ alignments
  • human vs cod Translated Blat alignments

Multiple alignments (all species)

  • 6way-primate epo alignments to incorporate new chimpanzee
  • 12way-mammal-epo alignments to incorporate new chimpanzee
  • 19way-amniota-pecan alignments to incorporate new chimpanzee
  • 35way-mammal low-coverage-epo alignments (new chimpanzee and bushbaby)
  • 6way-fish-epo alignments (addition of cod) (cancelled)

Syntenies (all species)

human chimpanzee synteny

ProteinTrees and homologies (all species)

GeneTrees (protein-coding) with new/updated genebuilds and assemblies

  • Clustering using hcluster_sg
  • Multiple sequence alignments using MCoffee or Mafft
  • Phylogenetic reconstruction using TreeBeST
  • Homology inference
  • Pairwise gene-based dN/dS scores for high coverage species pairs only (both on orthologues and paralogues)
  • GeneTree stable ID mapping

ncRNAtrees and homologies (all species)

  • Classification based on Rfam model
  • Multiple sequence alignments with Infernal
  • Phylogenetic reconstruction using RAxML
  • Phylogenetic reconstruction using FastTree2 and RAxML-Light for very big families
  • Additional multiple sequence alignments with Prank (w/ genomic flanks)
  • Additional phylogenetic reconstruction using PhyML and NJ
  • Phylogenetic tree merging using TreeBeST
  • Homology inference

Protein Families (all species)

Updated MCL families including all Ensembl transcript isoforms (including human non-reference haplotypes) and newest Uniprot Metazoa.

  • Clustering by MCL
  • Multiple Sequence Alignments with MAFFT
  • Family stable ID mapping

Compara dumps (all species)

  • EMF dumps for 19 way PECAN multiple alignments
  • BED files for 19 way GERP constrained elements
  • EMF dumps for 12 way EPO multiple alignments
  • EMF dumps for 6way EPO multiple alignments
  • EMF dumps for 35 way low-coverage alignments
  • BED files for 35 way low-coverage alignments
  • EMF dumps for 6way EPO fish multiple alignments
  • BED dumps for 6way EPO fish multiple alignments
  • Data dumps for ProteinTrees
  • Data dumps for ncRNAtrees
  • OrthoXML dumps for ProteinTrees
  • OrthoXML dumps for ncRNAtrees
  • PhyloXML dumps for ProteinTrees (possibly)
  • PhyloXML dumps for ncRNAtrees (possibly)
  • Ancestral sequences for primates

API/schema changes (all species)

changes in the schema: adding tables for tree's properties and node's properties to make tag storage and extraction more efficient

API/schema changes (all species)

left_index and right_index now start from 1 for every tree. This should avoid the deadlocks on lr_index_offset
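
A minimal Python sketch of per-tree nested-set numbering, assuming the usual depth-first scheme in which a node's left index is assigned on entry and its right index on exit. The function, the dictionary-based tree representation and the node names are illustrative, not the Compara implementation.

```python
def assign_indexes(children, root, start=1):
    """Return {node: (left_index, right_index)} for one tree, numbering from `start`."""
    indexes = {}
    counter = start

    def visit(node):
        nonlocal counter
        left = counter            # left index is taken when the node is entered
        counter += 1
        for child in children.get(node, []):
            visit(child)
        indexes[node] = (left, counter)  # right index is taken when the node is left
        counter += 1

    visit(root)
    return indexes


# Two independent trees, given as parent -> list of children.
tree_a = {"a_root": ["a1", "a2"]}
tree_b = {"b_root": ["b1", "b2"], "b2": ["b3", "b4"]}

idx_a = assign_indexes(tree_a, "a_root")
idx_b = assign_indexes(tree_b, "b_root")
```

Both trees are numbered from 1 without consulting any shared counter, which is presumably why a global offset no longer has to be locked while trees are stored.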

API/schema changes (all species)

bugfix: Storing several lost_taxon_id for each node

API/schema changes (all species)

rationalise the use of tags in protein_tree_tag:

API/schema changes (all species)

DnaFragRegion and SyntenyRegion should not be inheriting from NestedSet

Web (all species)

Statistics of protein trees and homologies on the web (either static files, or dynamic page + table)

Web (all species)

OrthoXML export function (for any gene tree)

Web (all species)

Tree support information (which ones of the 5 initial trees are supporting the current node) (+corresponding field in the schema)

Web (all species)

Information about lost taxa

Release 64

Pairwise alignments (all species)

  • human vs cow lastz alignments
  • human vs tasmanian devil lastz alignments
  • human haplotype alignments for high coverage blastz-net alignments
  • pig vs cow lastz alignments
  • opossum vs tasmanian devil lastz alignments
  • human vs lamprey tblat alignments
  • lamprey vs Ciona intestinalis tblat alignments
  • lamprey vs Danio rerio tblat alignments
  • lamprey vs Gasterosteus aculeatus tblat alignments

Multiple alignments (all species)

  • 12way-mammal EPO alignments to incorporate new cow
  • 19way-amniota Pecan alignments to incorporate new cow
  • 35way-mammal low-coverage-EPO alignments (new cow)

Syntenies (all species)

  • human-cow synteny
  • human-gorilla synteny
  • pig-cow synteny

ProteinTrees and homologies (all species)

GeneTrees (protein-coding) with new/updated genebuilds and assemblies

  • Clustering using hcluster_sg
  • Multiple sequence alignments using MCoffee without the exon-disaligner module, or Mafft
  • Phylogenetic reconstruction using TreeBeST
  • Homology inference including the recent 'possible_ortholog', 'putative gene split' and 'contiguous gene split' exceptions
  • Pairwise gene-based dN/dS scores for high coverage species pairs only (both on orthologues and paralogues)
  • GeneTree stable ID mapping

 

ncRNAtrees and homologies (all species)

  • Classification based on Rfam model
  • Multiple sequence alignments with Infernal
  • Phylogenetic reconstruction using RAxML
  • Phylogenetic reconstruction using FastTree2 and RAxML-Light for very big families
  • Additional multiple sequence alignments with Prank (w/ genomic flanks)
  • Additional phylogenetic reconstruction using PhyML and NJ
  • Phylogenetic tree merging using TreeBeST
  • Homology inference

Protein Families (all species)

Updated MCL families including all Ensembl transcript isoforms and newest Uniprot Metazoa.

  • Clustering by MCL
  • Multiple Sequence Alignments with MAFFT
  • Family stable ID mapping

API/schema changes (all species)

  • changes in the class hierarchy (NestedSet-Member-AlignedMember) to achieve better flexibility and fight code redundancy
  • changes in the API: introduction of GeneTreeNode and GeneTreeMember classes to better represent our trees

 

Release 63

Pairwise alignments (all species)


  • human vs marmoset lastz
  • human vs microbat lastz

Multiple alignments (all species)

  • 6way-primate-epo alignments to incorporate new marmoset
  • 12way-mammal-epo alignments to incorporate new marmoset
  • 19way-amniota-pecan alignments to incorporate new marmoset
  • 35way-mammal low-coverage-epo alignments (new marmoset and microbat)
  • 5way-fish (new mappings with HMM derived anchors)

 

Syntenies (all species)

  • human marmoset synteny

 

ProteinTrees and homologies (all species)

GeneTrees (protein-coding) with new/updated genebuilds and assemblies

  • Clustering using hcluster_sg
  • Multiple sequence alignments using MCoffee, without the exon-disaligner module (AKA decaf)
  • Phylogenetic reconstruction using TreeBeST
  • Homology inference including the recent 'possible_ortholog', 'putative gene split' and 'contiguous gene split' exceptions
  • Pairwise gene-based dN/dS scores for high coverage species pairs only
  • GeneTree stable ID mapping

 

ncRNAtrees and homologies (all species)

  • Classification based on Rfam model
  • Multiple sequence alignments with Infernal
  • Phylogenetic reconstruction using RAxML
  • Additional multiple sequence alignments with Prank (w/ genomic flanks)
  • Additional phylogenetic reconstruction using PhyML and NJ
  • Phylogenetic tree merging using TreeBeST
  • Homology inference

 

Families (all species)

  • Clustering by MCL
  • Multiple Sequence Alignments with MAFFT
  • Family stable ID mapping

 

Data dumps (all species)

  • EMF dumps for 19 way PECAN multiple alignments
  • BED files for 19 way GERP constrained elements
  • EMF dumps for 12 way EPO multiple alignments
  • EMF dumps for 35 way low-coverage alignments
  • BED files for 35 way low-coverage alignments
  • EMF dumps for 6 way EPO primate multiple alignments
  • BED files for 5 way fish EPO alignments
  • EMF dumps for 5 way fish EPO alignments
  • Data dumps for ProteinTrees
  • Data dumps for ncRNAtrees

 

Release 62

Families (all species)

Updated MCL families including all Ensembl transcript isoforms and newest Uniprot Metazoa.

  • Clustering by MCL
  • Multiple Sequence Alignments with MAFFT
  • Family stable ID mapping

Gene Homologies (all species)

GeneTrees (protein-coding) with new/updated genebuilds and assemblies

  • Clustering using hcluster_sg
  • Multiple sequence alignments using MCoffee
  • Phylogenetic reconstruction using TreeBeST
  • Homology inference including the recent 'possible_ortholog', 'putative gene split' and 'contiguous gene split' exceptions
  • Pairwise gene-based dN/dS scores for high coverage species pairs only
  • GeneTree stable ID mapping

GeneTrees (ncRNA) with new/updated genebuilds and assemblies (all species)

  • Classification based on Rfam model
  • Multiple sequence alignments with Infernal
  • Phylogenetic reconstruction using RAxML
  • Additional multiple sequence alignments with Prank (w/ genomic flanks)
  • Additional phylogenetic reconstruction using PhyML and NJ
  • Phylogenetic tree merging using TreeBeST
  • Homology inference

Pairwise Alignments (all species)

  • Non-reference alignments for human vs high coverage blastz-net
  • human vs gibbon lastz
  • human vs marmoset lastz
  • human vs rabbit lastz
  • xenopus vs mouse tblat-net
  • xenopus vs chicken tblat-net
  • xenopus vs tetraodon tblat-net
  • xenopus vs human tblat-net
  • xenopus vs danio tblat-net

Multiple alignments (all species)

  • update 6way-primate-epo alignments to incorporate new marmoset seq_region names
  • update 12way-mammal-epo alignments to incorporate new marmoset seq_region names
  • update 19way-amniota-pecan alignments to incorporate new marmoset seq_region names
  • 35way-mammal low-coverage-epo alignments (addition of gibbon and new marmoset seq_region names)

Schema changes (all species)

  • meta.meta_value has been extended to TEXT (previously it was VARCHAR) and the corresponding indexes have been fixed.
  • analysis.module has been extended to VARCHAR(255) (previously it was VARCHAR(80)).
  • mapping_session.prefix column has been added to allow Ensembl Genomes to track their different types of stable_ids.

Release 61

Families (all species)

Updated MCL families including all Ensembl transcript isoforms and newest Uniprot Metazoa.
    * Clustering by MCL
    * Multiple Sequence Alignments with MAFFT
    * Family stable ID mapping

Gene Homologies (all species)

GeneTrees with new/updated genebuilds and assemblies
    * Updated build of ncRNA trees
    * Clustering using hcluster_sg   
    * Multiple Sequence Alignments using consistency-based MCoffee meta-aligner
    * Homology inference including the recent 'possible_ortholog' type and 'putative gene split' and 'contiguous gene split' exceptions
    * Pairwise gene-based dN/dS calculations for high coverage species pairs only
    * GeneTree stable ID mapping

Pairwise Alignments (all species)

Human - Lizard tBlat-net

Pairwise Alignments (all species)

Human - Turkey tBlat-net

Pairwise Alignments (all species)

Turkey - Chicken Lastz

Pairwise Alignments (all species)

Lizard - Chicken Lastz

Pairwise Alignments (all species)

Dog - Horse Lastz

Pairwise Alignments (all species)

Removing chicken - zebrafinch tBlat

Multiple alignments (all species)

Chicken - Turkey - Zebrafinch EPO multiple alignment

19-way pecan multiple alignment (all species)

Added turkey, rabbit and lizard genomes

Release 60

Families (all species)

Updated MCL families including all Ensembl transcript isoforms and newest Uniprot Metazoa

  • Clustering by MCL
  • Multiple Sequence Alignments with MAFFT
  • Family stable ID mapping

Gene Homologies (all species)

GeneTrees with new/updated genebuilds and assemblies

  • Updated build of ncRNA trees
  • Clustering using hcluster_sg
  • Multiple Sequence Alignments using consistency-based MCoffee meta-aligner (mafftgins+muscle+kalign+probcons) and exon-skipping aware "skipper" algorithm
  • Homology inference including the recent 'possible_ortholog' type and 'putative gene split' and 'contiguous gene split' exceptions
  • Pairwise gene-based dN/dS calculations for high coverage species pairs only
  • GeneTree stable ID mapping

Pairwise Alignments (all species)

Lastz-net alignments

  • H.sap-A.mel
  • H.sap-O.cun
  • C.fam-A.mel

Blat-alignments

  • H.sap-D.rer
  • M.mus-D.rer
  • R.nor-D.rer
  • G.gal-D.rer
  • T.rub-D.rer
  • D.rer-X.tro
  • C.int-D.rer
  • C.sav-D.rer
  • G.acu-D.rer
  • O.lat-D.rer
  • D.rer-T.nig

Non-reference alignments for human vs high coverage blastz-net alignments

  • H.sap-P.tro
  • H.sap-G.gor
  • H.sap-P.pyg
  • H.sap-M.mul
  • H.sap-M.mus
  • H.sap-R.nor
  • H.sap-C.fam
  • H.sap-B.tau
  • H.sap-S.scr
  • H.sap-E.cab
  • H.sap-O.ana
  • H.sap-M.dom
  • H.sap-G.gal

Multiple alignments (all species)

34 way epo low coverage

12 way epo eutherian mammals

5 way epo fish

Synteny (all species)

H.sap-C.jac

H.sap-O.cun

Release 59

Pairwise alignments (all species)

Add new haplotype alignments for human vs high coverage blastz-net alignments

    * H.sap-P.tro
    * H.sap-G.gor
    * H.sap-P.pyg
    * H.sap-M.mul
    * H.sap-M.mus
    * H.sap-R.nor
    * H.sap-C.fam
    * H.sap-B.tau
    * H.sap-S.scr
    * H.sap-E.cab
    * H.sap-O.ana
    * H.sap-M.dom
    * H.sap-G.gal

New blastz-net due to changes in marmoset assembly

    * H.sap-C.jac

Multiple alignments (all species)

    * 33 way epo low coverage
    * 6 way epo primates
    * 11 way epo eutherian mammals
    * Add MT alignments for fish, 11 and 33 way EPO alignments
    * Map changes to marmoset assembly in 16-way Mercator/Pecan placental mammals

Compara dumps (all species)

    * EMF dumps for 6 way EPO multiple alignments
    * EMF dumps for 11 way EPO multiple alignments
    * EMF dumps for 16 way Mercator/Pecan multiple alignments
    * EMF dumps for 33 way EPO multiple alignments
    * BED files for 33 way GERP constrained elements
    * EMF dumps for GeneTrees

 

Families (all species)

Updated MCL families including all Ensembl transcript isoforms and newest Uniprot Metazoa.

    * Clustering by MCL
    * Multiple Sequence Alignments with MAFFT
    * Family stable ID mapping

 

Gene Homologies (all species)

GeneTrees with new/updated genebuilds and assemblies

    * Updated build of ncRNA trees
    * Clustering using hcluster_sg
    * Multiple Sequence Alignments using consistency-based MCoffee meta-aligner (mafftgins+muscle+kalign+probcons) and exon-skipping aware "skipper" algorithm
    * Homology inference including the recent 'possible_ortholog' type and 'putative gene split' and 'contiguous gene split' exceptions
    * Pairwise gene-based dN/dS calculations for high coverage species pairs only
    * GeneTree stable ID mapping

 

Schema changes (all species)

* Addition of extra column in dnafrag table to hold information on whether the fragment is reference or non-reference.

 

Future Plans

Read about our future plans on our blog!