Ensembl Regulation News
Microarray Probe Mapping Update (multiple species)
Update probe mappings for all species with a funcgen db, including mouse strains
New probe mapping data for five new rodents (Guinea Pig, Golden Hamster, Upper Galilee mountains blind mole rat, Chinese hamster CHOK1GS, Chinese hamster CriGri)
Probe mappings for five new rodents are available in release 90.
Schema patches (all species)
patch_89_90_a.sql - Update schema_version in meta table to 90
patch_89_90_b.sql - Track the sequence type (genomic or a transcript) and id (seq region name or stable id) of the sequence on which a probe feature was found
patch_89_90_c.sql - Add stable id index to the probe_transcript table to speed up generation of the transcript pages on the website
patch_89_90_d.sql - Add stable id index to the probe_set_transcript table
Microarray Probe Mapping Update (all species)
Update microarray probe mappings for all arrays of all species
Correction of VISTA Enhancers (Human)
The VISTA Enhancers for human have been incorrectly mapped and will be updated.
Map array probes onto 15 mouse strains (all species)
Map array probes onto the below mouse strains:
Database schema changes (all species)
patch_88_89_a - Schema change
patch_88_89_b - Create table probe_seq
patch_88_89_c - Create table probe_feature_transcript
patch_88_89_d - Create table probe_transcript
patch_88_89_e - Create table probe_set_transcript
patch_88_89_f - Remove probe features from object_xref and xref table
patch_88_89_g - Remove probe mappings from the xref tables
patch_88_89_h - Remove probe set mappings from the xref tables.
patch_88_89_i - Add link columns to array table
patch_88_89_j - Added array_chip_id column to probe_set table
patch_88_89_k - Added probe_seq_id column to probe table
Updated VISTA enhancers to newest version (Mouse)
Updated VISTA enhancers to newest version
Deprecate methods (all species)
The following methods have been deprecated and will be removed in Ensembl release 93
GTEx Update (Human)
Update to GTEx v6 used by the eQTL REST endpoint
Database schema changes (all species)
patch_87_88_b.sql - Allow seq_region name to be longer
patch_87_88_c.sql - sample_regulatory_feature_id field for regulatory build
VISTA Enhancer Updates (Human, Mouse)
Update to the latest VISTA data
Schema patches (all species)
patch_86_87_a.sql - Update schema_version in meta table to 87
patch_86_87_b.sql - Change data type of certain columns to facilitate foreign key constraints
patch_86_87_c.sql - Remove obsolete columns from external_feature_file
patch_86_87_d.sql - Add 'unknown' as a valid gender in the epigenome table
patch_86_87_e.sql - Increase data_set.name length
Reprocess data from ENCODE project for mouse (Mouse)
The peak calling component of the Ensembl Regulation Sequencing Analysis pipeline has been improved. It now follows a well-defined approach for calling narrow and broad peaks; as a result, all the existing mouse data in Ensembl's Regulation database needs to be reprocessed.
Micro-array mapping (Mouse)
Updates of the links from the probes to the transcripts.
Please note that this update is only done for the reference genome, not the 16 new strains.
Patches (all species)
-- schema version update
-- Drop tables epigenome_lineage and lineage - Not used anymore
-- Add column (production name) to feature_type table
-- Add new columns (read_length, is_paired_end, paired_with, file_size) to input_subset table to accommodate paired-end data
-- Add QC tables
Reprocess human regulation data (Human)
In the last release (e85), 19 histone modifications were missing from our database (Known Bug). We will add these and redo the peak calling for all data currently in our database using our Ensembl Regulation Sequencing Analysis pipeline.
30 new epigenomes from the Roadmap Epigenomics Project (Human)
The NIH Roadmap Epigenomics Mapping Consortium was launched with the goal of producing a public resource of human epigenomic data to catalyze basic biology and disease-oriented research. The Consortium leverages experimental pipelines built around next-generation sequencing technologies to map DNA methylation, histone modifications, chromatin accessibility and small RNA transcripts in stem cells and primary ex vivo tissues selected to represent the normal counterparts of tissues and organ systems frequently involved in human disease
More information about the Roadmap Epigenomics Project can be found here.
Here is the complete list of new epigenomes that are going to be processed for release 85:
Fetal Intestine Small
B cells (PB) Roadmap
iPS DF 19.11
Fetal Muscle Trunk
T cells (PB) Roadmap
Fetal Muscle Leg
Monocytes-CD14+ (PB) Roadmap
Fetal Adrenal Gland
Fetal Intestine Large
Natural Killer cells (PB)
iPS DF 6.9
Reprocess data from ENCODE and BLUEPRINT projects for human (Human)
The peak calling component of the Ensembl Regulation Sequencing Analysis pipeline has been improved. It now follows a well-defined approach for calling narrow and broad peaks; as a result, all of the existing ENCODE and BLUEPRINT data in Ensembl's Regulation database needs to be reprocessed.
Database schema patches (all species)
The Ensembl Regulation Database Schema has received a number of improvements for the Ensembl Release 85. Here is the complete list:
patch_84_85_a.sql - schema version
Update schema_version in meta table to 85
patch_84_85_b.sql - rename cell_type table to epigenome
The cell_type table may contain entries that are not specific to a single cell type
patch_84_85_c.sql - new columns to the epigenome table
Modify existing or add new columns to the epigenome table
patch_84_85_d.sql - add columns to experiment table
Add new columns to the experiment table in order to facilitate ERSA control grouping
patch_84_85_e.sql - add/modify columns in input_subset table
Add/modify columns in input_subset table
patch_84_85_f.sql - drop replicate column from result_set table
There is no need for a replicate column in the result_set table.
patch_84_85_g.sql - update dbentry related tables.
Updates to catch up with developments from the core schema and allow xrefs to be stored for epigenomes.
patch_84_85_h.sql - Store file types.
Store file types along with the files.
patch_84_85_i.sql - Normalise regulatory feature table: Create a non redundant version of the regulatory features
Also remove column "projected".
patch_84_85_j.sql - Normalise regulatory feature table
Create a linking table between regulatory features and feature sets.
patch_84_85_k.sql - Normalise regulatory feature table
Link up the new non redundant regulatory features. The new regulatory_feature_ids are set. The connection is made using the stable ids.
patch_84_85_l.sql - Normalise regulatory feature table
Link up the regulatory attributes with the linking table.
patch_84_85_m.sql - Normalise regulatory feature table
Clean up temporary columns and tables.
patch_84_85_n.sql - Make activity an enum.
patch_84_85_o.sql - Delete all MultiCell regulatory features feature set entries.
patch_84_85_p.sql - Delete MultiCell feature_set and epigenome
patch_84_85_q.sql - Rename table regulatory_attribute to regulatory_evidence
patch_84_85_r.sql - Drop unused empty tables
These tables are empty and of no use. No need to keep them
patch_84_85_s.sql - modify 'table_name' column in result_set_input table
Remove enum values from 'table_name' column that are of no use
patch_84_85_t.sql - Drop table regbuild_string
patch_84_85_u.sql - Remove regulatory build entries from feature_set table, relink everything else.
patch_84_85_v.sql - Move meta entries regarding regulatory build to the regulatory_build table
patch_84_85_w.sql - Extend the name length in the input_subset table
patch_84_85_x.sql - Remove unused columns in the experiment table
Remove primary_design_type, description, mage_xml_id, display_url
patch_84_85_y.sql - Table for storing epigenomes used in the regulatory build
patch_84_85_z.sql - Move segmentation entries from result_set table into the new segmentation_file table.
patch_84_85_za.sql - Move entries provided by external sources from the result_set table into the new external_feature_file table.
patch_84_85_zb.sql - Bugfix, the primary key was wrongly named.
API changes, deprecation list (all species)
This list contains all methods, modules and scripts deprecated in the Ensembl Funcgen API. A method is deprecated when it is not functional any more (schema/data change) or has been replaced by a better one. Backwards compatibility is provided whenever possible. When a method is deprecated, a deprecation warning is thrown whenever the method is used. The warning also contains instructions on replacing the deprecated method and when it will be removed. A year after deprecation (4 Ensembl releases), the method is removed from the API.
Deprecated in Ensembl Release 85, to be removed in Ensembl Release 89:
- Bio::EnsEMBL::Funcgen::ResultSet::replicate()
- Bio::EnsEMBL::Funcgen::DBSQL::CellTypeAdaptor
- Bio::EnsEMBL::Funcgen::CellType
- Bio::EnsEMBL::Funcgen::DataSet::cell_type()
- Bio::EnsEMBL::Funcgen::Experiment::cell_type()
- Bio::EnsEMBL::Funcgen::Importer::cell_type()
- Bio::EnsEMBL::Funcgen::SetFeature::cell_type()
- Bio::EnsEMBL::Funcgen::Set::cell_type()
- Bio::EnsEMBL::Funcgen::Epigenome::efo_id()
- Bio::EnsEMBL::Funcgen::Experiment::primary_design_type()
- Bio::EnsEMBL::Funcgen::Experiment::description()
- Bio::EnsEMBL::Funcgen::Experiment::mage_xml()
- Bio::EnsEMBL::Funcgen::Experiment::mage_xml_id()
- Bio::EnsEMBL::Funcgen::RegulatoryFeature::cell_type_count()
- Bio::EnsEMBL::Funcgen::DBSQL::ExperimentAdaptor::fetch_all_by_CellType()
- Bio::EnsEMBL::Funcgen::DBSQL::MotifFeatureAdaptor::fetch_all_by_Slice_CellType()
- Bio::EnsEMBL::Funcgen::DBSQL::SetAdaptor::fetch_all_by_CellType()
- Bio::EnsEMBL::Funcgen::RegulatoryFeature::get_focus_attributes
- Bio::EnsEMBL::Funcgen::RegulatoryFeature::get_nonfocus_attributes
- Bio::EnsEMBL::Funcgen::RegulatoryFeature::is_unique_to_FeatureSets
- Bio::EnsEMBL::Funcgen::RegulatoryFeature::get_other_RegulatoryFeatures
- Bio::EnsEMBL::Funcgen::RegulatoryFeatureAdaptor::fetch_all_by_stable_ID
Removed in Ensembl Release 85:
- Bio::EnsEMBL::Funcgen::Experiment::date()
- Bio::EnsEMBL::Funcgen::ResultSet::get_replicate_set_by_chip_channel_id()
- Bio::EnsEMBL::Funcgen::DBSQL::BaseAdaptor::list_dbIDs()
- Bio::EnsEMBL::Funcgen::DBSQL::BaseAdaptor::_constrain_status()
- Bio::EnsEMBL::Funcgen::DBSQL::BaseAdaptor::fetch_all_by_status()
- Bio::EnsEMBL::Funcgen::DBSQL::DBEntryAdaptor::list_regulatory_feature_ids_by_extid()
- Bio::EnsEMBL::Funcgen::DBSQL::DBEntryAdaptor::list_probeset_ids_by_extid()
- Bio::EnsEMBL::Funcgen::DBSQL::DBEntryAdaptor::list_feature_type_ids_by_extid()
- Bio::EnsEMBL::Funcgen::DBSQL::DBEntryAdaptor::list_external_feature_ids_by_extid()
- Bio::EnsEMBL::Funcgen::DBSQL::DBEntryAdaptor::list_annotated_feature_ids_by_extid()
- Bio::EnsEMBL::Funcgen::DBSQL::DBEntryAdaptor::list_probe_ids_by_extid()
- Bio::EnsEMBL::Funcgen::DBSQL::FeatureSetAdaptor::fetch_all_by_type()
- Bio::EnsEMBL::Funcgen::DBSQL::InputSubsetAdaptor::fetch_by_name_and_experiment()
- Bio::EnsEMBL::Funcgen::DBSQL::ProbeFeatureAdaptor::fetch_all_by_probeset()
- Bio::EnsEMBL::Funcgen::DBSQL::ResultFeatureAdaptor::fetch_all()
- Bio::EnsEMBL::Funcgen::DBSQL::ResultFeatureAdaptor::fetch_by_dbID()
- Bio::EnsEMBL::Funcgen::DBSQL::ResultFeatureAdaptor::fetch_all_by_dbID_list()
- Bio::EnsEMBL::Funcgen::DBSQL::ResultFeatureAdaptor::fetch_all_by_logic_name()
- Bio::EnsEMBL::Funcgen::DBSQL::ResultFeatureAdaptor::_list_seq_region_ids()
- Bio::EnsEMBL::Funcgen::DBSQL::ResultSetAdaptor::store_dbfile_data_dir()
- Bio::EnsEMBL::Funcgen::DBSQL::ResultSetAdaptor::_fetch_dbfile_data_dir()
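The removal schedule stated in the deprecation policy above (removal four Ensembl releases after deprecation) can be sketched as simple release arithmetic. This is only an illustration in Python; the function names are hypothetical and are not part of the Funcgen API.

```python
# Illustrative only: these helpers are hypothetical, not Ensembl API calls.
RELEASES_UNTIL_REMOVAL = 4  # "A year after deprecation (4 Ensembl releases)"


def removal_release(deprecated_in: int) -> int:
    """Release in which a method deprecated in `deprecated_in` is removed."""
    return deprecated_in + RELEASES_UNTIL_REMOVAL


def is_removed(deprecated_in: int, current_release: int) -> bool:
    """True once the current release has reached the removal release."""
    return current_release >= removal_release(deprecated_in)
```

For example, a method deprecated in release 85 is expected to disappear in release 89, which matches the list above.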
GTEx eQTL data (Human)
GTEx eQTL data for 14 tissues has been added to the Gene Regulation view for human.
WGE CRISPR-Cas9 target site predictions (Human, Mouse)
Added the Wellcome Trust Sanger Institute Genome Editing (WGE) predicted CRISPR sites in the mouse and human exomes. Data published here: WGE: A CRISPR database for genome engineering. Alex Hodgkins; Anna Farne; Sajith Perera; Tiago Grego; David J. Parry-Smith; William C. Skarnes; Vivek Iyer (Bioinformatics 2015) doi:10.1093/bioinformatics/btv308
20 haematopoietic primary cell epigenomes from the BLUEPRINT project (Human)
Blueprint is a large scale research project for deciphering the epigenome of blood cells.
Please find more about Blueprint here.
CD14+ CD16- monocyte from Venous Blood
CD14+ CD16- monocyte from Cord Blood
CD4+ ab T cell from Venous Blood
CD8+ ab T cell from Cord Blood
CM CD4+ ab T cell from Venous Blood
eosinophil from Venous Blood
EPC from Venous Blood
erythroblast from Cord Blood
HUVEC prol from Cord Blood
M0 macrophage from Cord Blood
M0 macrophage from Venous Blood
M1 macrophage from Cord Blood
M1 macrophage from Venous Blood
M2 macrophage from Cord Blood
M2 macrophage from Venous Blood
MSC from Venous Blood
naive B cell from Venous Blood
neutro myelocyte from Bone Marrow
neutrophil from Cord Blood
neutrophil from Venous Blood
Provide segmentation features via Bigbed files (all species)
Store segmentation features in bigbed files instead of our mysql db in order to plan our future scaling needs.
This means that the segmentation features are not going to be available on Biomart anymore.
The bigbed files will be made available through the ftp site.
patch_83_84_f.sql (all species)
Deprecated has_evidence; the column is now used to report the activity of the RegulatoryFeature.
patch_83_84_a.sql (all species)
Update schema_version in meta table to 84
patch_83_84_b.sql (all species)
Drop unique key for cell_type.efo_id
It's possible for two or more entries to have the same efo_id (i.e. different biological replicates of the same cell type)
patch_83_84_c.sql (all species)
Add not null constraint to cell_type.display_label
The display_label column is used for displaying the cell type name on the genome browser, therefore it has to be not null
patch_83_84_d.sql (all species)
Add segmentation enum to result_set.feature_class
This is necessary for registering segmentation result sets.
patch_83_84_e.sql (all species)
Increase the length of regbuild_string.name,
as the new cell_type names are longer.
API change: ResultSet.pm & ResultSetAdaptor.pm (all species)
Remove checks so that result_sets which are not linked to input_subsets (e.g. methylation, segmentation) can be displayed.
API change: Delete deprecated modules and methods (all species)
Tarbase v7.0 (Human, Mouse)
Updated Tarbase to current v7.0
patch_81_82_a.sql - schema_version update (all species)
Update schema_version in meta table to 82
Mouse regulatory Build update (Mouse)
The Regulatory Build on Mouse was re-computed, converting the "old style" build to the "new style" build, as was done on human in E!76. All Regulatory Builds in Ensembl are now updated to the new style.
We took the opportunity to increase the number of cell types to 8.
patch_80_81_c Drop experiment.date (all species)
The experiment.date field has been dropped from the funcgen schema and the related Bio::EnsEMBL::Funcgen::Experiment::date method has been deprecated.
Collection Files replaced by BigWigs (all species)
The Funcgen density maps, which were previously stored in an in-house flat file format (Collection Files), were ported to the more common standard, BigWig.
The Funcgen API now provides only a path to the files, which can be read using the Ensembl::IO repository.
patch_80_81_a.sql - schema_version update (all species)
Update schema_version in meta table to 81.
patch_80_81_b.sql - add gender 'mixed' to table cell_type (all species)
Add gender 'mixed' to the cell_type table.
patch_79_80_c - stable_id changed to varchar (all species)
The regulatory_feature.stable_id field was changed from an int to a varchar. API support was implemented to handle this.
Micro Array Mapping (all species)
Micro array mapping was carried out for species with updated gene builds:
Added the missing transcript annotations for
Added new Human Affymetrix array:
Human Segmentation adjacent feature merge (all species)
Adjacent segmentation features with the same segmentation classification were merged into a single feature.
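The merge described above can be sketched in Python as follows. This is a minimal illustration, not the code used in the release: the `Segment` type, its field names and the example state labels are assumptions, and coordinates are taken to be 1-based and inclusive, so two features are adjacent when one starts at the previous end plus one.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    """Hypothetical segmentation feature (1-based, inclusive coordinates)."""
    seq_region: str
    start: int
    end: int
    state: str  # segmentation classification


def merge_adjacent(segments: List[Segment]) -> List[Segment]:
    """Merge directly adjacent features that share a classification."""
    merged: List[Segment] = []
    for seg in sorted(segments, key=lambda s: (s.seq_region, s.start)):
        prev = merged[-1] if merged else None
        if (
            prev is not None
            and prev.seq_region == seg.seq_region
            and prev.state == seg.state
            and seg.start == prev.end + 1  # directly adjacent
        ):
            # Extend the previous feature instead of adding a new one.
            merged[-1] = prev._replace(end=seg.end)
        else:
            merged.append(seg)
    return merged
```

Features on different sequence regions, or with different classifications, are never merged even when their coordinates touch.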
patch_79_80_b dbfile_registry unique key (all species)
A unique key patch was applied to the dbfile_registry table.
BindingMatrix (all species)
Adding matrix method to BindingMatrix to store the matrix array. This will be used to generate the frequency string.
patch_79_80_a.sql - schema_version update (all species)
Update schema_version in meta table to 80.
Corrected motif feature mappings (all species)
We discovered that some motif features were not mapped properly onto the GRCh38 assembly; in particular, some motifs did not map onto the genome at all. We re-ran the pipeline to provide complete coverage of known motifs.
patch_78_79_a.sql - schema_version update (all species)
Update schema_version in meta table to 79.
patch_78_79_b.sql - binding_matrix add unique key (all species)
Changing key on name/analysis to unique.
Corrected FANTOM 5 mappings (Human)
A bug was reported in the mapping of FANTOM5 features onto GRCh38, which we corrected.
Micro-array mapping (all species)
All species which had a genome assembly or transcript update had the appropriate alignments and xref redone.
This includes a correction to the Human array xrefs, where some AFFY_ST array xrefs were missing and other array formats had release 76 xrefs.
patch_77_78_a.sql - schema_version update (all species)
Update schema_version in meta table to 78.
patch_77_78_b.sql - unmapped_reason_id (all species)
Change unmapped_reason_id from smallint to int
patch_76_77_a.sql - schema_version update (all species)
Schema update to 77
Micro Array Mapping (all species)
Micro array mapping and transcript xrefs have been updated for those species which have had a genome assembly or gene build update.
patch_76_77_b.sql - CTCF feature_type update (all species)
Changing the name of the FeatureType "CTCF", class "regulatory feature" to "CTCF Binding Site"
patch_76_77_c.sql - Correct mirna so_name and accession in feature_type (all species)
Correct mirna so_name and accession in the feature_type table. The so_name and accession values had been swapped in some cases.
patch_76_77_d.sql - Fix erroneous feature_type_id in mirna_target_feature (all species)
A few records had an incorrect feature_type_id assigned.
New Regulatory Build (Human)
We reviewed the Build algorithm, as described in an earlier blog post. The Ensembl Regulatory Build on human is now constructed from the segmentations, to ensure consistency. In addition, the new Regulatory Build annotates the RegulatoryFeatures at the MultiCell level, each cell-type differing only by an activity variable.
New segmentations (Human)
We extended the number of segmentations to 18 cell types, with the inclusion of cell lines A549, DND41, HMEC, HSMMT, HSMM, IMR90, Monocytes-CD14+, NHA, NHDFAD, NHEK, NHLF and Osteoblasts. The segmentations were run with Segway, at a 200bp resolution.
FANTOM 5 enhancers and promoters (Human)
We added tables with the locations of the enhancers and promoters called by the FANTOM 5 consortium.
JASPAR motifs (all species)
We updated the motifs to match the new release of JASPAR TF binding motifs.
result_set.name unique (patch_74_75_b) (Human, Mouse)
The name field of the result set table now has a unique key, and the names have been updated by appending the relevant analysis logic name, in line with the other set tables.
input_subset.analysis_id (patch_74_75_c) (all species)
An analysis_id has been added to the input_subset table, which will mirror the input_set.analysis_id.
Consequently, InputSubset has been changed to inherit from Set, and the feature_type validation in the Set subclass constructors has been moved to the Set constructor.
This work is a prerequisite to retiring the InputSet class/table.
InputSet retired (patch_74_75_d) (all species)
The InputSet class has been retired and ResultSet will be used directly instead. The result_set_input table has been patched to replace input_set entries with input_subset entries. The ResultSet classes have been updated to make the association of dbfile_registry_entry record optional.
The InputSet class and table will remain in the schema until all dependent code has been migrated to the new usage model.
Array size (all species)
The Array size attribute and associated methods/constructor parameters have been deprecated or removed.
Microarray mapping (all species)
Microarray mapping has been updated for those species with new genome assemblies, new gene builds or new arrays.
Experiment FeatureType and CellType (patch_74_75_f) (all species)
The experiment table has had additional feature_type_id and cell_type_id fields added, and the associated API classes have been updated. This is in line with the current usage within the analysis pipelines and prevents the experiment class being used as a study with which many feature/cell types can be associated.
New TarBase microRNA target sites (Human, Mouse)
Our conservative MiRanda miRNA targets set (which is no longer maintained), will be replaced by predictions from Diana TarBase:
TarBase v6.0 has replaced (drop-in) MiRanda targets as ExternalFeatures. Separate adapter classes will follow in r76.
TarBase analysis added (Human, Mouse)
Added TarBase_v6.0 to analysis table
Updated microarrays (Human, Mouse, Rat)
- Illumina HumanHT-12 had its version number '_V3' appended, and the new HumanHT-12_V4 was added.
- The Illumina HumanRef-8_V3 was added.
- Agilent SurePrint G3 GE 8x60k had its version number '_V2' appended
- The Affy HuGene-1_0-st-v1 was updated to HuGene-2_0-st-v1
- The Illumina MouseRef-8_V2 was added.
- The Illumina RatRef-12 array had its version '_V1' appended.
input_set_input_subset_split (patch_73_74b) (all species)
The input_set and input_subset tables were patched to allow input_subsets to exist independent of input_set records. The input_set format and vendor fields were dropped in favour of an analysis field, and cell_type, feature_type and experiment fields were added to the input_subset table.
Drosophila microarray mappings (Fruitfly)
ProbeTranscriptAlignments were redone to correct a very minor feature duplication issue, which had no effect on transcript xrefs.
Probe design support (patch_74_74_e) (all species)
The probe_design support has been dropped from the API and the schema. This does not affect the current expression array designs, but was related to a historical table used in the design of tiling arrays.
Status name length increased (patch_73_74_d) (all species)
The length of the name field of the status_name table was increased.
result_set input_subset support (patch_73_74_c) (all species)
The ResultSet API and schema were patched to allow input_subsets as supporting sets, and a replicate field was added to reflect the input_subset replicate value.
API update (all species)
Rabbit microarray probeset (Rabbit)
We imported the new Agilent rabbit array (GEO GPL16709).
patch 71_72_a.sql - schema version update (all species)
Update schema_version in meta table to 72.
patch 71_72_b.sql - associated_xref (all species)
Request from Core to add 2 tables to have schema support for associated xrefs.
supporting_set table PK change (all species)
Add 'type' to supporting set PK:
ALTER TABLE supporting_set DROP PRIMARY KEY, ADD PRIMARY KEY(`data_set_id`,`supporting_set_id`,`type`);
VISTA Enhancer update (Human, Mouse)
The Human and Mouse VISTA Enhancers have been updated to the latest release (28/1/2013).
Microarray mapping (all species)
All microarray mappings and transcript annotations have been updated for species with new assemblies and gene builds. The new Affymetrix Human PrimeView gene expression array was also imported.
patch_70_71_c - removed design tables (all species)
Two unused tables were removed: design_type and experimental_design.
patch_70_71_a - Schema version update (all species)
This patch updates the schema version from 70 to 71.
patch_70_71_b - analysis key clean up (all species)
This patch removes a duplicate logic_name key from the analysis table.
Regulation GFF updates (Human, Mouse)
Updates have been made to the various GFF dumps from the funcgen DBs. These have addressed some minor format and content issues which now bring these in line with the GFF3 specification. Namely removal of spaces between attribute fields and addition of sequence ontology terms in the 'type' field. Additionally, scores have been added to the MotifFeature dumps (more info here http://www.ensembl.org/info/docs/funcgen/regulatory_build.html#tfbs).
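As a rough illustration of the two format points mentioned above (no spaces between attribute fields, and a sequence ontology term in the 'type' column), a GFF3 line can be assembled as in the Python sketch below. It is not the dump code itself: the function names, the source value and the attribute keys are hypothetical, and the escaping covers only the characters GFF3 reserves in the attributes column.

```python
from typing import Mapping, Optional

# Characters that must be percent-encoded in GFF3 attribute values.
_ESCAPES = {"%": "%25", ";": "%3B", "=": "%3D", "&": "%26", ",": "%2C", "\t": "%09"}


def escape_value(value: object) -> str:
    """Percent-encode reserved characters in an attribute value."""
    return "".join(_ESCAPES.get(ch, ch) for ch in str(value))


def format_gff3_line(
    seqid: str,
    source: str,
    so_term: str,  # sequence ontology term for the 'type' column
    start: int,
    end: int,
    score: Optional[float],
    strand: str,
    attributes: Mapping[str, object],
) -> str:
    """Build one tab-separated GFF3 record with nine columns."""
    # Attributes are joined with ';' only, with no surrounding spaces.
    attr_field = ";".join(f"{key}={escape_value(val)}" for key, val in attributes.items())
    score_field = "." if score is None else str(score)
    phase = "."  # phase only applies to CDS features
    return "\t".join(
        [seqid, source, so_term, str(start), str(end), score_field, strand, phase, attr_field]
    )
```

A motif feature with a score could then be written with `format_gff3_line("1", "example_source", "TF_binding_site", 100, 110, 0.87, "+", {"Name": "CTCF"})`, where all argument values are illustrative.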
Regulatory Build: Mouse (Mouse)
A new mouse regulatory build is available with:
- Much broader coverage for Mouse Embryonic Fibroblast (MEF) build: H3K9ac, H3K4me1&2 + various Transcription Factors
- Additional data on other cell types for: CTCF, PolII, H3K4me1 and H3K36me3
patch_69_70_b - regulatory_feature.bound_start/end_length (all species)
The bound_seq_region_start/end fields of the regulatory_feature table will be changed to length fields. RegulatoryFeatures do not meet the standard definition of an Ensembl Feature in that they have two sets of loci (normal seq_region loci, and bound seq_region loci). As a result, the core projection code does not currently deal with adjusting the bound loci, resulting in some anomalous results (e.g. X PAR to Y PAR mapping). The true start and end values will now be calculated dynamically by the API, hence addressing the above issue.
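The idea of storing lengths and deriving the bound loci on demand can be sketched in Python. The class and field names below are hypothetical, and the formulas (bound start = start minus the stored start length, bound end = end plus the stored end length) are an assumption about how the lengths are interpreted, not a description of the actual API code.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RegFeature:
    """Hypothetical regulatory feature storing bound extensions as lengths."""
    start: int
    end: int
    bound_start_length: int  # assumed: bases the bound region extends upstream
    bound_end_length: int    # assumed: bases the bound region extends downstream

    @property
    def bound_start(self) -> int:
        # Derived on demand, so it always follows the core start.
        return self.start - self.bound_start_length

    @property
    def bound_end(self) -> int:
        return self.end + self.bound_end_length

    def shifted(self, offset: int) -> "RegFeature":
        """Mimic a projection that only adjusts the core loci."""
        return replace(self, start=self.start + offset, end=self.end + offset)
```

Because only the core loci are stored as absolute positions, a projection that moves `start` and `end` moves the bound loci with them, which is the behaviour the patch aims for.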
patch_69_70_a - schema version (all species)
The schema_version meta table entry has been updated to 70.
Regulatory Build Analysis (Human, Mouse)
A change has been made to the regulatory build analysis (build_regulatory_features.pl) to address an issue where core evidence from adjacent RegulatoryFeatures was included. This manifested in the browser as MotifFeatures appearing outside the core region.
MotifFeature (PWM) revision (Human, Mouse)
PWM alignment is now restricted to species specific binding matrices where possible. If no PWM is available for a given species, then the closest species will be chosen.
This has been patched in the existing human and mouse AnnotatedFeature associations, and also in the human RegulatoryFeatures. (Mouse regulatory build has been fully updated).
New ENCODE WGBS data (Human)
A new whole genome bisulphite sequencing data set has been incorporated for the GM12878 lymphoblastoid cell line.
Microarray mapping update (Mouse, Chimpanzee, Rat)
The following species will be updated due to assembly and/or genebuild changes:
Regulatory Build: Human (Human)
The human regulatory build has been patched to remove unfiltered blacklisted features from the Y chromosome, and to remove some anomalous MotifFeatures (see the 'Regulatory Build Analysis' item).
Black list filtering (Human)
Peak calls based on ChIP-Seq and DNase1 data are filtered using a list of black list regions curated by the ENCODE project. In releases 64-69, a bug was introduced by the addition of filtering support for the Y pseudo-autosomal regions in human. This resulted in all black list regions on the human Y chromosome (including the PARs) being omitted from the filtering. The effect of this is twofold: PAR regions appear to have duplicate data at a given location, as data from the corresponding X PAR is projected across; and a small number of low quality regulatory features (~150-200) and associated supporting evidence have not been filtered out. This will be rectified in the following script:
AnnotatedFeature GFF dumps (all species)
These have been altered slightly to include the following new attributes:
- Peak summit
- Feature class
- Cell type
CoordSystemAdaptor::fetch_by_name (all species)
Some bugs were fixed in the Bio::EnsEMBL::Funcgen::DBSQL::CoordSystemAdaptor fetch_by_name method:
- Versions now sourced from the core DB
- Version matching now accounts for case
- Absent CoordSystems now caught
FeatureType regulatory evidence methods (all species)
The Funcgen FeatureType and FeatureTypeAdaptor have been updated to improve support for usage by the web code. New methods have also been added:
sam2bed.pl -one_based (all species)
The sam2bed.pl script has been changed to default to the bed standard half open or 0 start based coordinate system. Specifying -one_based will force the usage of closed coordinates i.e. Ensembl standard.
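The difference between the two coordinate conventions can be shown with a small Python sketch. It is not taken from sam2bed.pl: the function name and parameters are hypothetical, and it assumes the input is a 1-based leftmost position (as in SAM) plus the number of reference bases covered by the alignment.

```python
from typing import Tuple


def to_bed_interval(pos: int, ref_length: int, one_based: bool = False) -> Tuple[int, int]:
    """Convert a 1-based leftmost position and aligned length to (start, end).

    Default: BED-style half-open, 0-based start (end is exclusive).
    one_based=True: closed, 1-based coordinates (Ensembl-style).
    """
    if ref_length < 1:
        raise ValueError("ref_length must be at least 1")
    if one_based:
        return pos, pos + ref_length - 1
    return pos - 1, pos - 1 + ref_length
```

In both conventions the end value is the same number; only the start differs by one, so mixing the two silently shifts every feature by a single base.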
Funcgen DBAdaptor changes (all species)
Some species setting code was removed as this is now handled by ConfigRegistry.
A namespace bug was also addressed whereby the funcgen CoordSystemAdaptor had to be generated with an FG prefix (e.g. get_FGCoordSystemAdaptor) to avoid clashing with the core CoordSystemAdaptor. This was rectified in the Registry, and so the correct auto load method has been introduced (i.e. get_CoordSystemAdaptor). This does not affect retrieval from the Registry.
DNA Methylation Bisulphite Sequencing (Human, Mouse)
Numerous DNA methylation bisulphite sequencing data sets have been integrated into the funcgen database. This is a mixture of new data and data which was being handled via an external DAS server (details below).
These data will be represented as ResultSet, with the individual features being handled by the new DNAMethylationFeature and DNAMethylationFeatureAdaptor classes. The underlying data storage is handled via bigBed files, which are available within the set 'data files', now part of the regular Ensembl release (see ftp://ftp.ensembl.org/pub/data_files/).
Configuration of the new integrated DNA Methylation data will be handled in the same way as the old external data, i.e. via the DNA Methylation configuration panel menu.
Summary of data sets:
- Whole-genome bisulphite maps for human h1ESC and IMR90 cell types from Lister et al., Nature 2009.
- 44 human ENCODE RRBS set (previously available via DAS).
- Two mouse whole-genome bisulphite data sets are provided for ES and NP cell lines from Stadler et al., Nature 2011.
All the above were filtered for a minimum of 10 reads (coverage).
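A coverage filter of this kind can be sketched in Python as below. The record layout (`total_reads`, `methylated_reads`) and the function name are assumptions made for illustration; only the threshold of 10 reads comes from the text above.

```python
from typing import Dict, List

MIN_READS = 10  # minimum coverage stated above


def filter_by_coverage(sites: List[Dict[str, int]], min_reads: int = MIN_READS) -> List[Dict[str, int]]:
    """Keep only methylation calls supported by at least `min_reads` reads."""
    return [site for site in sites if site["total_reads"] >= min_reads]


def percent_methylated(site: Dict[str, int]) -> float:
    """Percentage of reads at the site that support methylation."""
    return 100.0 * site["methylated_reads"] / site["total_reads"]
```

Filtering before computing the methylation percentage avoids reporting values that rest on only a handful of reads.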
Mouse VISTA enhancers (Mouse)
The Mouse VISTA enhancer set (http://enhancer.lbl.gov/) will be reloaded and mapped to the new GRCm38 assembly. These will be available as ExternalFeatures via the 'VISTA enhancer set' FeatureSet, and also available in the browser via the 'Other regulatory regions' section of the configuration panel.
list_dbID methods (all species)
There is now a dynamic list_dbID method in the funcgen BaseAdaptor which uses the correct table name for a given adaptor. Hence this method has been removed from all inheriting adaptors.
RegulatoryBuild Analysis (Human, Mouse)
Updates have been made to the data in the analysis and analysis_description tables which represent the 'Regulatory Build'.
Funcgen DB patches (all species)
A number of patches have been applied to the funcgen DBs, which are reflected in the efg.sql file.
patch_68_69_a.sql - Schema version update
patch_68_69_b.sql - DNA Methylation support
patch_68_69_c.sql - Outstanding xref tidy up
patch_68_69_d.sql - xref.id index_fix
Mouse GRCm38 Update (Mouse)
All existing mouse data sets have been migrated to the new GRCm38 assembly:
All *-Seq data (DNase1, Transcription Factors, Histone mods etc.) have been re-aligned and re-analysed to provide updated peak calls. The Mouse Regulatory Build has been rerun and freshly mapped MotifFeatures (PWMs) have been integrated using the new peak calls.
The cisRED and VISTA features have been projected to the new assembly using the assembly projections provided by the GRC. Unfortunately the miRanda miRNA Target set for mouse has been retired due to a very poor projection. We are looking to replace this with a new miRNA target set in the near future.
MicroArray mapping (all species)
Array mapping and transcript annotation will be re-run for all species which have array data and have a new assembly or a new gene build.
Human miRanda miRNA targets (Human)
This set has been re-imported using a revised pipeline which aims to clean/filter annotations made on previous gene builds.
Patches (all species)
The following SQL patches have been applied:
patch_67_68_a.sql - Updated schema version.
patch_67_68_b.sql - Moved the archive_id and data_url fields from experiment to input_subset.
patch_67_68_c.sql - Added replicate field to input_set and input_subset, added is_control to input_subset.
patch_67_68_d.sql - Change feature_set.experiment_id to input_set_id
Experiment/InputSet/InputSubset/FeatureSet methods (all species)
Changes have been made to Experiment, InputSet and InputSubSet to better represent the true data model of experiments with replicates and controls.
Experiment::archive_id and data_url have been migrated to the InputSubset class as archive_id and display_url.
Both InputSet and InputSubSet have acquired replicate methods, and InputSubSet has also acquired an is_control method.
The FeatureSet class has also had changes to reflect the above. FeatureSet::get_Experiment and source_label have been deprecated to make way for get_InputSet and source_labels
Patches (all species)
a: Schema version patch
b: attribute_feature.attribute_feature_idx - Facilitates retrieval of a RegulatoryFeature given one of its attribute features.
c: result_feature.partition_removal - Removes unused window size based partitions
d: regulatory_feature.binary_string - Extends the length of the binary string field
Signal collection (.col) file storage (Human, Mouse)
To reduce redundancy and simplify the organisation of the signal collection (.col) files, an assembly-centric directory structure has been adopted. This is now in line with how other data files are stored.
Note: This declaration was originally, 'Collector::ResultFeature and .col files'
Changes to the Collector::ResultFeature have been postponed.
New Human Regulatory Build (Human)
- 150 New ChIP-Seq and Dnase-Seq datasets from ENCODE
- Including 3 new cell lines: HMEC, HSMM, NH-A
- Including 4 new Transcription Factors with available Jaspar PWMs:
- FoxA1: MA0148.1
- FoxA2: MA0047.1; MA0047.2; PB0015.1;
- HNF4A: MA0114.1; PB0030.1; PB0134.1
- MEF2C: MA0005.1
- K562b sets are now part of the K562 regulatory build
- H3K27me3 sets are now built using CCAT, like H3K36me3
- Some sets were deprecated:
- K562 Nfya and Nfyb were deprecated in ENCODE
- HepG2 PolII, Srebp1, Srebp2 were removed from the HepG2 regulatory build as these experiments are annotated with treatments.
MicroArray Mapping (C.intestinalis, Human, Mouse, Rat)
MicroArray mapping has been updated for all species which have had gene build or genome assembly updates. In addition, we have added some 'SurePrint' arrays from Agilent.
Patches (all species)
The following patches have been applied to the funcgen schemas:
patch_65_66_a - Updated meta schema_version to 66
patch_65_66_b - Added tissue field to cell_type. Added lineage and cell_type_lineage tables
patch_65_66_c - Modify array_chip.design_id to varchar(100)
patch_65_66_d - Modify unmapped_object unique index to be more performant
patch_65_66_e - Added regbuild_string table to handle longer strings
VISTA Enhancers updated (Human, Mouse)
The VISTA Enhancer set (http://enhancer.lbl.gov/) has been updated, more than doubling the available sites for human (1447) and adding an entirely new small set for mouse (212).
Regulatory Genome Segmentation (Human)
Data and API support has been added for genome segmentation data, based on a combination of chromhmm and segway analyses from the ENCODE project.
Segmentation feature tracks are now available in the Regulation section of the configuration panel under 'Regulatory features'.
Experiment View (Human, Mouse)
An Experiment view has been developed to improve access and visualisation of experimental meta data. This will include archive IDs and source projects used as input for the Human and Mouse Regulatory builds.
New Mouse Regulatory Build (Mouse)
- H3K4me3, Oct4, Rbbp5, Wdr5 ChIP-Seq for ES cell-line from Ang et al. (2011)
- H3K4me4 ChIP-Seq for MEL cell-line, from ENCODE
Updated Microarray Probe Mappings (all species)
Microarray probe/probeset mappings have been updated for:
- Danio rerio
Reorganized and Updated documentation (all species)
Documentation regarding Regulation data, sources and methodology was reorganized and updated.
Database schema patches (all species)
- patch_64_65_a: schema version update
- patch_64_65_b: Add analysis_id to feature_type to support SegmentationFeature states
- patch_64_65_c: add hermaphrodite as a gender
- patch_64_65_d: add SegmentationFeature table for the new Segmentation tracks
- patch_64_65_e: force regulatory_attribute type to be either 'motif' or 'annotated'
- patch_64_65_f: Add segmentation as an input_set type
- patch_64_65_g: Table options clean up
New Regulatory Data (Human, Mouse)
- New Mouse MEL cell-line regulatory build, including Dnase-Seq, and ChIP-Seq for CTCF, p300, cMyb, USF2, Rad21, NELFe and Max. All data is from ENCODE, following their data policies.
- New Human CD4 ChIP-Seq data for CBP, p300, MOF, PCAF, Tip60, HDAC1, HDAC2, HDAC3 and HDAC6 (Wang et al, 2009). A new regulatory build was made to incorporate this data.
DNA methylation DAS tracks (Human)
We have updated the set of DNA methylation DAS tracks using data for over 40 cell lines available from the ENCODE project.
MicroArray Mapping (Cow, Human, Mouse, Rat)
Micro array mapping has been performed for those species with new assemblies or updated gene builds.
Corrected CTCF, Nanog, p300 and Smad1 Mouse ES ChIP-Seq datasets (Mouse)
We identified an issue with our analysis of the CTCF, Nanog, p300 and Smad1 data sets for the ES cell line from Chen et al., 2008. The raw reads were clipped to 26bp, re-aligned, and new peak calls were made. This correction significantly increased the number of mapped reads, making the new peaks more trustworthy.
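As a rough illustration of the clipping step, the sketch below truncates a read and its quality string to 26 bases. Which end of the read was clipped is not stated in the note; keeping the first 26 bases is an assumption, and the function name is hypothetical.

```python
CLIP_LENGTH = 26  # read length after clipping, from the note above


def clip_read(sequence, quality, length=CLIP_LENGTH):
    """Truncate a read and its per-base qualities to at most `length` bases.

    Assumption: the 3' end is trimmed, i.e. the first `length` bases are kept.
    """
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality must have the same length")
    return sequence[:length], quality[:length]


# Toy 36bp read, invented for the example.
read_seq = "ACGTACGTACGTACGTACGTACGTACGTACGTACGT"
read_qual = "I" * len(read_seq)
clipped_seq, clipped_qual = clip_read(read_seq, read_qual)
print(len(clipped_seq), clipped_seq)
```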
New H3K36me3 Peak Calls using CCAT (Human, Mouse)
We have changed the peak calling method for H3K36me3 datasets. The new method uses CCAT (Xu et al, 2010), configured for histone marks and with a sliding window of 1000bp. This method has enabled an increase in the number and length of H3K36me3 calls.
MotifFeatures: PWM Scores Rounded (all species)
MotifFeature PWM scores were rounded to 3 decimal places.
patch_63_64_a - Schema version (all species)
The schema_version entry in the meta table has been patched to version 64.
patch_63_64_b - Cell type experimental factor ontology ID (all species)
The cell_type table has had an efo_id field added to represent links to the Experimental Factor Ontology.
patch_63_64_c - Experimental meta data (all species)
A patch has been applied to add fields to capture experimental meta data e.g. archive & pubmed IDs
patch_62_63_a - Schema Version (all species)
This patch updates the schema version
patch_62_63_b - binding_matrix.analysis_id (all species)
This patch updates the analysis_id field of the binding_matrix table to a smallint
MicroArray Mapping (all species)
Mapping of expression arrays to Ensembl Transcripts has been updated for Human, Mouse, Rat and Zebrafish. We also included an updated version of the Phalanx OneArray for Rat.
RegulatoryFeatureAdaptor::fetch_all (all species)
The base fetch_all method has been overridden for the RegulatoryFeatureAdaptor. It now defaults to returning only the MultiCell RegulatoryFeatures, as the other generic methods do.
ResultFeatureAdaptor method over-rides (all species)
Where appropriate, some of the base feature adaptor methods have been overridden. This prevents some API errors due to the nature of the ResultFeature storage.
Added Motif Features for Missing Jaspar Matrices (Human)
Motif Features were added for the following Jaspar Matrices:
- E2F1: MA0024.1 (2315 sites)
- NFKB: MA0105.1 (2040)
- BHLHE40: PB0111.1 (46); PB0007.1 (276)
- Nrsf: MA0138.1 (4321)
Changed Motif Features score to [0-1] relative affinity scale (Human, Mouse)
Instead of showing the absolute score from the MOODs software, we now display a [0-1] linear relative value between the maximum (1) and minimum (0) score. This is to make it coherent with the API BindingMatrix::relative_affinity function and to make it easier for the user to interpret the score.
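The linear rescaling described above amounts to (score - min) / (max - min). The sketch below is a generic illustration, not the implementation of BindingMatrix::relative_affinity; how the minimum and maximum scores are obtained for a matrix is not stated here, so they are passed in as arguments.

```python
def relative_score(score, min_score, max_score):
    """Linearly rescale an absolute score to [0, 1].

    min_score maps to 0 and max_score maps to 1. Where these bounds come
    from (for example the range of scores a matrix can produce) is an
    assumption left to the caller.
    """
    if max_score <= min_score:
        raise ValueError("max_score must be greater than min_score")
    if not min_score <= score <= max_score:
        raise ValueError("score lies outside [min_score, max_score]")
    return (score - min_score) / (max_score - min_score)


# Toy numbers, invented for the example.
print(relative_score(8.0, -4.0, 12.0))
```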
patch_62_63_c : Added binding_matrix.threshold (all species)
A new threshold float field was added to the Binding Matrix to store the minimum score for Motif Features from each matrix (patch_62_63_c).
Added species-specific thresholds to Binding Matrices (Human, Mouse)
Each Binding Matrix now stores the lowest score of the Motif Features belonging to that matrix and that species. This makes it easier for API users to tell whether the potential binding affinity of a given sequence exceeds the currently used threshold (i.e. whether it would be classified as a binding site).
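A minimal sketch of how such a threshold might be used follows. The lookup table, its values, and the function name are all hypothetical; the inclusive comparison is an assumption based on the threshold being the lowest score among existing Motif Features, and the score is assumed to be on the same scale as the stored threshold.

```python
# Hypothetical thresholds keyed by (matrix name, species); values are invented.
THRESHOLDS = {
    ("MA0024.1", "homo_sapiens"): 0.80,
    ("MA0024.1", "mus_musculus"): 0.85,
}


def is_potential_binding_site(score, matrix_name, species, thresholds=THRESHOLDS):
    """Return True if the score reaches the species-specific threshold.

    Assumption: a score equal to the threshold counts as a binding site,
    because the threshold is the lowest score of an existing Motif Feature.
    """
    return score >= thresholds[(matrix_name, species)]


print(is_potential_binding_site(0.82, "MA0024.1", "homo_sapiens"))
print(is_potential_binding_site(0.82, "MA0024.1", "mus_musculus"))
```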
Cleaned Regulatory Regions in chromosomal boundaries (Human, Mouse)
In some rare cases, regulatory regions can pass the boundaries of sequence regions (chromosomes or scaffolds). These cases will be removed as they are likely to be artifactual. For release 63, 18 mouse regulatory features were removed.
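The boundary check can be sketched as follows, assuming 1-based inclusive coordinates as used by Ensembl. The tuple layout, sequence region name and length are invented for the example and are not taken from the actual clean-up procedure.

```python
def within_seq_region(start, end, seq_region_length):
    """True if a 1-based, inclusive feature lies fully inside its sequence region."""
    return 1 <= start <= end <= seq_region_length


def remove_out_of_bounds(features, seq_region_lengths):
    """Split (name, start, end) features into those kept and those removed."""
    kept, removed = [], []
    for name, start, end in features:
        if within_seq_region(start, end, seq_region_lengths[name]):
            kept.append((name, start, end))
        else:
            removed.append((name, start, end))
    return kept, removed


# Toy data, invented for the example.
seq_region_lengths = {"toy_chr": 1000}
features = [
    ("toy_chr", 1, 200),      # fully inside
    ("toy_chr", 950, 1050),   # runs past the end of the region
    ("toy_chr", -20, 80),     # starts before the region
]
kept, removed = remove_out_of_bounds(features, seq_region_lengths)
print(len(kept), "kept;", len(removed), "removed")
```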
Update of Regulation Metadata (all species)
CTCF is now classified generically as a "Transcription Factor" instead of "Insulator"
ResultFeatureAdaptor ExperimentalChip support removed (all species)
Support for ResultFeatures originating from ExperimentalChips (i.e. array experiments) has been removed.
patch_61_62_a Update meta schema version (all species)
The meta schema_version has been updated to 62.
patch_61_62_b interdb_stable_id (all species)
An interdb_stable_id field has been added to the motif_feature and external_feature tables. NOTE: This is not an 'Ensembl stable ID', and will only be used internally to enable inter-DB linking between the variation and funcgen schemas. These are not guaranteed to be stable between Ensembl releases.
patch_61_62_c feature_type Sequence Ontology fields (all species)
so_name and so_accession will be added to the feature_type table to enable display of Sequence Ontology information and linking to the ensembl_ontology DB.
ResultFeature DBFile Collections (Human, Mouse)
Where possible, data from the result_feature table has been moved outside of the database to indexed binary '.col' files. The ResultFeatureAdaptor now uses the new core DBFile::CollectionAdaptor and DBFile::FileAdaptor to access these data directly.
Array Mapping (Human, Mouse, Rat, Xenopus)
Genomic and transcript alignments and transcript xref annotation has been re-run for all species with new genome assemblies or genebuilds.
New Array Imports (Rat)
New Illumina Infinium arrays:
- HumanMethylation27K
- HumanMethylation450K
New Rat array:
- Phalanx OneArray
patch_61_62_d: Experimental Group Description (all species)
This change serves to support a better annotation of data sources.
Update of Human functional genomics data (Human)
New datasets from ENCODE and the Epigenomics Roadmap, covering existing cell lines. The Regulatory Build was rerun for cell lines with new data.
Binding Matrix: simpler representation of matrix frequencies (all species)
This change is intended to make the representation simpler, moving towards something that can be applied to different formats.
patch_61_62_e Addition of dbfile_registry table (all species)
A dbfile_registry table has been added to store the filepaths of result feature collection (.col) files
PolIII Transcription Associated Regulatory Features (all species)
The Regulatory Build now also annotates Regulatory Features associated to PolIII Transcription.
Mouse RegBuild update (Mouse)
The Mouse RegulatoryBuild has been rerun to include some MotifFeatures which were previously being excluded. Input data sets remained the same, hence structures are unaffected.
patch_61_62_f regulatory_feature.fset_seq_region_idx (all species)
A new regulatory_feature unique index has replaced two older indexes.
Core schema patches (all species)
Core schema patches have been propagated to the funcgen schema where appropriate i.e.
patch_61_62_g - synonym_field_extension
patch_61_62_h - external_db.db_name_idx
patch_61_62_i - drop_external_db.display_label_linkable
patch_61_62_j - extend meta.meta_key (all species)
Extended the length of the meta_key field to handle longer values
Array Mapping (all species)
Array mapping was updated on all species which have had an update to their genome assemblies or gene builds. The probe/set to transcript xrefs were recalculated across all species.
Mouse Regulatory Build (all species)
The mouse RegulatoryBuild was re-run to re-introduce some data which had been omitted in the previous build.
Array Mapping (all species)
The array mapping pipeline will be run for those species which have new assemblies, gene build or new array designs. This includes an update to the latest version of the Phalanx OneArray for human.
BindingMatrix (all species)
A new BindingMatrix class will represent position weight matrices (PWMs) loaded from Jaspar or inferred directly from ChIP-Seq data. This will ultimately make it possible to identify the consequence of a sequence change at a given location, with respect to the PWM score. patch_59_60_c.sql contains the relevant changes to update the schema to support this data.
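To illustrate what "the consequence of a sequence change with respect to the PWM score" can mean, here is a generic textbook-style sketch using a log-odds matrix built from base counts. It is not the BindingMatrix API: the function names, pseudocount, uniform background and toy counts are all assumptions made for the example.

```python
import math

BASES = "ACGT"


def log_odds_matrix(counts, pseudocount=0.25, background=0.25):
    """Convert per-position base counts into log2 odds against a uniform background."""
    matrix = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        matrix.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return matrix


def pwm_score(matrix, sequence):
    """Sum the per-position log-odds for a sequence of the same length as the matrix."""
    if len(sequence) != len(matrix):
        raise ValueError("sequence length must match matrix length")
    return sum(column[base] for column, base in zip(matrix, sequence.upper()))


def score_change(matrix, ref_sequence, position, alt_base):
    """Score difference (alt - ref) for a single-base change at a 0-based position."""
    alt_sequence = ref_sequence[:position] + alt_base + ref_sequence[position + 1:]
    return pwm_score(matrix, alt_sequence) - pwm_score(matrix, ref_sequence)


# Toy 3-position count matrix, invented for the example.
counts = [
    {"A": 8, "C": 0, "G": 1, "T": 1},
    {"A": 0, "C": 9, "G": 1, "T": 0},
    {"A": 1, "C": 1, "G": 7, "T": 1},
]
matrix = log_odds_matrix(counts)
print(round(score_change(matrix, "ACG", 1, "T"), 3))  # negative: the change weakens the match
```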
MotifFeature (all species)
A new MotifFeature class has been added to represent the genomic mapping of a position weight matrix (BindingMatrix). patch_59_60_c.sql contains the relevant schema updates.
Schema patch: Schema version (all species)
patch_59_60_a.sql updates the meta table, changing the schema_version meta_value to 60.
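The effect of a schema version patch can be sketched with an in-memory SQLite database standing in for the MySQL funcgen database. The actual SQL in patch_59_60_a.sql is not reproduced here; the table definition is simplified and the helper names are hypothetical.

```python
import sqlite3

# In-memory stand-in for a funcgen database; the real schema lives in MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE meta ("
    " meta_id INTEGER PRIMARY KEY,"
    " species_id INTEGER,"
    " meta_key TEXT NOT NULL,"
    " meta_value TEXT NOT NULL)"
)
conn.execute(
    "INSERT INTO meta (species_id, meta_key, meta_value) "
    "VALUES (NULL, 'schema_version', '59')"
)


def set_schema_version(connection, version):
    """Update the schema_version meta_value, failing if the row is missing."""
    cursor = connection.execute(
        "UPDATE meta SET meta_value = ? WHERE meta_key = 'schema_version'",
        (str(version),),
    )
    if cursor.rowcount != 1:
        raise RuntimeError("expected exactly one schema_version row")
    connection.commit()


def get_schema_version(connection):
    """Read the schema_version back as an integer."""
    row = connection.execute(
        "SELECT meta_value FROM meta WHERE meta_key = 'schema_version'"
    ).fetchone()
    return int(row[0])


set_schema_version(conn, 60)
print(get_schema_version(conn))
```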
Schema patch: associated_feature_type (all species)
patch_59_60_b.sql updates the associated_feature_type table to support feature_type to feature_type associations. The relevant adaptors have also been updated to reflect the new table fields and values.
RegulatoryBuild update (all species)
The human RegulatoryBuild has been updated and re-annotated based on the new ChIP-Seq data sets.
Position Weight Matrix (PWM) mapping and visualisation (all species)
PWM mappings which used to be associated with the RegulatoryFeatures, are now associated with the AnnotatedFeatures representing the specific Transcription Factor Binding Site predictions. This utilises the new MotifFeature and BindingMatrix classes. These new data are available as new tracks in the Regulation panel as well as Region in Detail.
New ChIP-Seq datasets from ENCODE (all species)
93 new ENCODE ChIP-Seq datasets for existing cell lines will be added.
Schema patch: probe_feature.cigar_line (all species)
patch_59_60_d.sql changes the cigar_line field of the probe_feature table from a free text field to a varchar.
Schema patch: regulatory_attribute.attribute_table_name (all species)
patch_59_60_e.sql modifies the regulatory_attribute table to allow 'motif' as an attribute_feature_table.
ResultFeature Collection Normalisation (all species)
All result_feature collections (Human & Mouse) have been normalised using a simple read-count-based method (RPKM), enabling more meaningful visual comparisons in the multi-wiggle displays.
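RPKM is the standard "reads per kilobase per million mapped reads" measure. The sketch below shows the calculation for a single window; how windows were defined and counted for the collections is not described here, so the window size and read counts are invented for the example.

```python
def rpkm(read_count, region_length_bp, total_mapped_reads):
    """Reads per kilobase of region per million mapped reads."""
    if region_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("region length and total mapped reads must be positive")
    return read_count * 1e9 / (region_length_bp * total_mapped_reads)


# Toy numbers: 500 reads in a 2000bp window, 10 million mapped reads in total.
value = rpkm(500, 2000, 10_000_000)
print(value)
```

Because the value is scaled by the total number of mapped reads, tracks from experiments with different sequencing depths become directly comparable, which is the point of the normalisation.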
Schema patch: annotated_feature.summit (all species)
The annotated_feature table now has a dedicated peak 'summit' field (patch_58_59_c.sql). API support has been added both to the eFG and the ensembl-analysis APIs.
Schema patch: probe.description (all species)
The probe table has been patched (patch_58_59_b) to add a description column. Appropriate API support has also been added to the eFG and ensembl-analysis APIs
Schema patch: regulatory_feature.binary_string_project (all species)
The regulatory_feature table has been redefined (patch_58_59_d) to add new binary_string and projected columns to store information generated from the Regulatory Build. API support has also been added to the RegulatoryFeature class and associated adaptor.
Mouse Projection Regulatory Build (all species)
The Mouse Regulatory Build has been regenerated using a new 'projection' method. Core (MultiCell) RegulatoryFeatures are only projected to these sparsely annotated cell lines if more annotation is available i.e. RegulatoryFeatures are not built if there is no supporting evidence on the given cell line.
This reintroduces some data from different cell lines which do not have associated 'core' features (e.g. DNAse1) along with some new TFBS data.
New chip-seq datasets (all species)
Drosophila melanogaster update (all species)
- Affy mapping update for D. melanogaster 5.25
- Regulatory elements update from REDfly v2.2
Human Regulatory Build - 6 New cell lines & data update (all species)
The Human Regulatory Build has been regenerated to incorporate the 5 new ENCODE cell lines: GM12878, H1ES, HeLa, HepG2, and NHEK.
The new data has also been used to update the core MultiCell and existing cell line specific build i.e. K562, IMR90, GM06990, CD4.
Array Mapping (all species)
Array mapping and Transcript xrefs have been regenerated for species which have updated genome assemblies or genebuilds.
Mouse ChIP-Seq data (all species)
Saccharomyces cerevisiae funcgen database (all species)
Database containing Affy array mapping data for Saccharomyces cerevisiae. Currently part of Ensembl Genomes Release 5. The schema is currently at version 58 and will be patched to 59.
Meta schema_type (all species)
A meta entry has been added to the eFG DBs to capture the schema type 'funcgen' (patch_58_59_e).
Schema patch: result_feature partitions (all species)
The result_feature table partitions have been modified slightly to reflect the true default zoom levels in the 'Region in Detail' view (patch_58_59_f). Collection data has been regenerated accordingly.
miRanda miRNA Target update (all species)
The human miRanda miRNA Targets set has been updated and a new mouse set has been added. These have been supplied by the Enright lab at the EBI. NOTE: These were generated using a revised conservative methodology and hence are a subset of those available via http://www.ebi.ac.uk/enright-srv/microcosm/htdocs/targets/