Ensembl Regulation provides resources for studying gene expression and its regulation in human and mouse, with a focus on transcriptional and post-transcriptional mechanisms. Our database includes the Ensembl Regulatory Build, an up-to-date and comprehensive summary of regulatory features across the genome, as well as popular curated external resources.
Ensembl maintains a catalogue of genomic regions that could be involved in gene transcriptional regulation in human and mouse, called the Regulatory Build. These regions, called regulatory features, are inferred from publicly available experimental data sets, including:
- Open chromatin assays (DNase-seq)
- Histone modification assays (ChIP-seq)
- Transcription factor binding assays (ChIP-seq)
The different types of regulatory features annotated include:
- Promoter flanking regions
- CTCF binding sites
- Transcription factor binding sites
- Open chromatin regions
For each cell type the regulatory features are assigned labels to describe their activity levels. These include:
- ACTIVE, when the region displays an active epigenetic signature
- POISED, when the region displays a poised epigenetic signature
- REPRESSED, when the region is epigenetically repressed
- INACTIVE, when the region bears none of the epigenetic modifications included in the Regulatory Build, or
- NA, when there is no available data in the cell type for this feature.
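As an illustration of how these labels might be handled downstream, the following is a minimal sketch that tallies the activity labels of one regulatory feature across cell types. The class name, function name, and the example calls are hypothetical and are not part of any Ensembl API; only the five label names come from the list above.

```python
from enum import Enum


class Activity(Enum):
    """The five activity labels listed above (class name is illustrative)."""
    ACTIVE = "ACTIVE"
    POISED = "POISED"
    REPRESSED = "REPRESSED"
    INACTIVE = "INACTIVE"
    NA = "NA"


def summarise_activity(calls):
    """Count how many cell types carry each label for one regulatory feature.

    `calls` maps a cell type name to an Activity. Cell types without data
    should be passed as Activity.NA rather than omitted.
    """
    counts = {label: 0 for label in Activity}
    for label in calls.values():
        counts[label] += 1
    return counts


# Made-up activity calls for a single regulatory feature (assumption).
example_calls = {
    "HUVEC": Activity.ACTIVE,
    "K562": Activity.REPRESSED,
    "GM12878": Activity.ACTIVE,
    "HepG2": Activity.NA,
}
example_counts = summarise_activity(example_calls)
```

Keeping NA distinct from INACTIVE matters here: INACTIVE is a statement about the region, whereas NA only says that the cell type has no data for it.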
Genome segmentation datasets are generated as part of our Regulatory Build pipeline for each cell type using algorithms such as ChromHMM (Ernst et al., 2011) or Segway (Hoffman et al., 2011). These algorithms detect recurring signal patterns, called states, from a collection of genome-wide assays, such as DNase-seq and ChIP-seq, across the different cell types. They then assign a state to each base pair in each epigenome. Following this stage, each of the 25 states is assigned a functional label, including CTCF, Distal, Heterochromatin, Open Chromatin, Transcription Factor Binding Site, Gene, Predicted Weak enhancer/Cis-reg element, Proximal, TSS, Poised and Repressed, based on a decision tree described here. For more information please also see our segmentation analyses documentation.
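Because a state is assigned to every base pair, segmentation output is usually easier to work with once consecutive positions sharing a state are merged into intervals. The sketch below shows only that merging step on made-up data. It is not ChromHMM or Segway, it does not reproduce the decision tree, and the function name and coordinates are assumptions for illustration.

```python
from itertools import groupby


def states_to_segments(states, start=1):
    """Collapse per-base state calls into (start, end, state) segments.

    `states` is a sequence with one label per base pair. Coordinates in the
    result are 1-based and inclusive, beginning at `start`.
    """
    segments = []
    position = start
    for state, run in groupby(states):
        length = sum(1 for _ in run)
        segments.append((position, position + length - 1, state))
        position += length
    return segments


# Made-up per-base labels for a six base pair window starting at position 100.
per_base = ["CTCF", "CTCF", "CTCF", "Proximal", "Proximal", "Repressed"]
segments = states_to_segments(per_base, start=100)
```

With this input, the three CTCF positions become one segment covering 100 to 102, followed by a Proximal segment and a single-base Repressed segment.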
Ensembl Regulation resources also provide hyper- and hypo-methylated CpGs, identified using Reduced Representation Bisulfite Sequencing (RRBS) assays on 45 cell lines and Whole Genome Bisulfite Sequencing (WGBS) assays on two cell lines.
Other Regulatory Data
Ensembl Regulation databases also store data directly imported from external sources:
- Predicted transcription start sites and enhancers from the FANTOM5 project
- microRNA target predictions for human and mouse from DIANA-TarBase
- Experimentally validated human enhancers from the VISTA Browser
- NGG CRISPR sites identified by the Wellcome Trust Sanger Institute Genome Editing group
- Expression quantitative trait loci (eQTL) data from 44 cell types from the Genotype-Tissue Expression (GTEx) project.
Ensembl stores microarray probe mappings for several species and technologies, including:
- Affymetrix: IVT and ST gene expression arrays
- Codelink: gene expression array
- Agilent: whole genome, CGH and SurePrint arrays
- Illumina: whole genome and Infinium methylation arrays
- Phalanx: OneArray
Displaying Regulatory Features
Regulation data can be accessed in the browser from various angles:
You can configure the Region in detail panel to display tracks linked to regulation. Watch our Region in detail video to find out how to add tracks. The Regulation tracks are grouped into subdivisions:
- Regulatory Features: these can be visualised using the 'Regulatory Build' track.
- Activity Levels: select the cell type/line of interest and display the activity levels for each regulatory feature defined in the Regulatory Build.
- Segmentation Features: select the cell type/line of interest and display the genomic state assignment in the region.
- Open Chromatin and Transcription Factor Binding Sites (TFBS): display signal or peaks from assays measuring open chromatin (DNase-seq) or transcription factor binding (ChIP-seq) in various cell types/lines.
- Histones and Polymerases: display signal or peaks from ChIP-seq assays measuring histone modifications or the binding of RNA Polymerases II and III.
- DNA Methylation: RRBS and WGBS methylation tracks.
- Other regulatory regions: imported tracks from external databases.
Species-specific microarray probe mappings can be visualised by turning on tracks from the separate 'Oligo probes' section at the bottom of the configuration panel.
Click on the Regulation link in the left hand side menu to view the regulatory features and GTEx SNP-gene associations in the vicinity of your gene. Note that a gene is not necessarily controlled by nearby regulatory elements; it may instead be regulated by distal elements.
Clicking on a regulatory feature will open a Regulation tab with information about the evidence supporting that regulatory feature as well as cell-specific activity estimates. Different views can be selected:
- Summary: this view displays the selected regulatory feature from the Regulatory Build. No cell-type specific activity levels are displayed by default. To turn those on click on 'Select cells'.
- Details by cell type: this view displays the activity of the regulatory feature in any selected cell type (HUVEC by default) along with a default set of supporting evidence. You can display more cell types and/or evidence tracks by clicking on Configure this page or the Select cells/Select evidence button above the image.
- Feature Context: this view displays the regulatory features in a wider context around the chosen regulatory feature and their activity levels across all available cell types.
- Evidence: this view displays, in a table, the complete list of core supporting evidence for the chosen regulatory feature, such as the presence of histone modifications, transcription factor binding and open chromatin.
In addition to the browser, the regulation data in Ensembl can also be accessed through:
- BioMart (Ensembl Regulation database).
- Perl APIs, including the Regulation API. See the Regulation API tutorial for more information.
- MySQL Ensembl funcgen database.
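For the MySQL route, a minimal sketch of assembling connection settings is shown below. It assumes the public Ensembl MySQL server (host `ensembldb.ensembl.org`, user `anonymous`, port 3306, no password) and the `<species>_funcgen_<release>_<assembly>` database naming pattern. The release and assembly numbers in the example are placeholders, and the helper names are hypothetical, so check the current Ensembl documentation before relying on them.

```python
def funcgen_db_name(species, release, assembly):
    """Build a funcgen database name.

    Assumes the pattern <species>_funcgen_<release>_<assembly>.
    """
    return f"{species}_funcgen_{release}_{assembly}"


def public_connection_settings(species, release, assembly):
    """Return settings for the public Ensembl MySQL server as a plain dict.

    Host, user, and port are assumptions about the public server; no
    password is set because the anonymous account does not need one.
    """
    return {
        "host": "ensembldb.ensembl.org",
        "user": "anonymous",
        "port": 3306,
        "database": funcgen_db_name(species, release, assembly),
    }


# Placeholder release and assembly numbers, for illustration only.
settings = public_connection_settings("homo_sapiens", 84, 38)
```

The resulting dictionary can be passed to whichever MySQL client library you prefer. No driver is imported here, so the sketch stays free of third-party dependencies.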
Daniel R Zerbino, Steven P Wilder, Nathan Johnson, Thomas Juettemann and Paul R Flicek
The Ensembl Regulatory Build
Genome Biology 16:56
Daniel R Zerbino, Nathan Johnson, Thomas Juettemann, Dan Sheppard, Steven P Wilder, Ilias Lavidas, Michael Nuhn, Emily Perry, Quentin Raffaillac-Desfosses, Daniel Sobral, Damian Keefe, Stefan Gräf, Ikhlak Ahmed, Rhoda Kinsella, Bethan Pritchard, Simon Brent, Ridwan Amode, Anne Parker, Steven Trevanion, Ewan Birney, Ian Dunham and Paul Flicek
Ensembl Regulation Resources
Database pii: bav119
Daniel R. Zerbino, Nathan Johnson, Thomas Juettemann, Steven P. Wilder and Paul Flicek
WiggleTools: parallel processing of large collections of genome-wide datasets for visualization and statistical analysis
Andrew Yates, Wasiu Akanni, M. Ridwan Amode, Daniel Barrell, Konstantinos Billis, Denise Carvalho-Silva, Carla Cummins, Peter Clapham, Stephen Fitzgerald, Laurent Gil, Carlos García-Girón, Leo Gordon, Thibaut Hourlier, Sarah E. Hunt, Sophie H. Janacek, Nathan Johnson, Thomas Juettemann, Stephen Keenan, Ilias Lavidas, Fergal J. Martin, Thomas Maurel, William McLaren, Daniel N. Murphy, Rishi Nag, Michael Nuhn, Anne Parker, Mateus Patricio, Miguel Pignatelli, Matthew Rahtz, Harpreet Singh Riat, Daniel Sheppard, Kieron Taylor, Anja Thormann, Alessandro Vullo, Steven P. Wilder, Amonida Zadissa, Ewan Birney, Jennifer Harrow, Matthieu Muffato, Emily Perry, Magali Ruffier, Giulietta Spudich, Stephen J. Trevanion, Fiona Cunningham, Bronwen L. Aken, Daniel R. Zerbino and Paul Flicek
Nucleic Acids Research 4;44(D1):D710-6