Regulatory Segmentation

The ENCODE combined segmentation classifies the genome into regions such as predicted promoters, enhancers and repressed regions (see table below). Segments are genomic regions with a similar signal profile across 14 assays. Each segmentation provides a single-track summary of the functional architecture of the human genome in one of six cell types.

The assays were generated by the ENCODE project for GM12878, K562, H1-hESC, HepG2, HeLa-S3 and HUVEC, and were chosen to maximise the information they provide about the state of the genome. These assays (including control input sequencing) were coordinated across all six cell lines and fall into three classes of data:

Input data class        Description
Open chromatin          DNase1 hypersensitivity and FAIRE
Transcription factors   Pol II and CTCF
Histone modifications   H3K4me1, H3K4me2, H3K4me3, H3K9ac, H3K27ac, H3K27me3, H3K36me3, H4K20me1

Two unsupervised segmentation programs were used:

  • ChromHMM (Ernst et al., 2011)

    ChromHMM calls each assay as high or low signal in 200 base pair bins across the whole human genome, then fits a 25-state hidden Markov model to the resulting binary data.

  • Segway (Hoffman et al., 2011)

    Segway uses a dynamic Bayesian network on real-valued signal data at base-pair resolution. It was trained on the ENCODE pilot regions (1% of the genome) and then applied to the whole genome.
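To make the ChromHMM description more concrete, here is a minimal Python sketch of its two steps: calling each assay high or low per 200 bp bin, then decoding the most likely state path with a hidden Markov model. It is an illustration under stated assumptions, not ChromHMM itself: read counts are assumed to be already aggregated per bin, the Poisson background and the 1e-4 threshold are illustrative choices, and a toy 2-state model with independent Bernoulli emissions stands in for the trained 25-state model. All function names, counts and probabilities are hypothetical.

```python
import math
from typing import List, Sequence


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term            # add P(X = i)
        term *= lam / (i + 1)  # advance to P(X = i + 1)
    return max(0.0, 1.0 - cdf)


def binarise(counts: Sequence[int], background: float, p_threshold: float = 1e-4) -> List[int]:
    """Call a bin high (1) when its read count is unlikely under a Poisson background (hypothetical threshold)."""
    return [1 if poisson_sf(c, background) < p_threshold else 0 for c in counts]


def viterbi(obs: List[List[int]], start: List[float],
            trans: List[List[float]], emit: List[List[float]]) -> List[int]:
    """Most likely state path. obs[t][a] is the binary call for assay a in bin t;
    emit[s][a] is P(assay a is high | state s), with assays treated as independent."""
    n_states = len(start)

    def log_emit(s: int, row: List[int]) -> float:
        return sum(math.log(emit[s][a]) if x else math.log(1 - emit[s][a])
                   for a, x in enumerate(row))

    score = [math.log(start[s]) + log_emit(s, obs[0]) for s in range(n_states)]
    back: List[List[int]] = []
    for row in obs[1:]:
        new_score, pointers = [], []
        for s in range(n_states):
            # best predecessor state for landing in state s at this bin
            prev = max(range(n_states), key=lambda p: score[p] + math.log(trans[p][s]))
            new_score.append(score[prev] + math.log(trans[prev][s]) + log_emit(s, row))
            pointers.append(prev)
        score = new_score
        back.append(pointers)
    # trace back from the best final state
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Toy usage: two assays over six 200 bp bins, background of 1 read per bin (all hypothetical)
k4me3 = binarise([0, 1, 12, 15, 2, 0], background=1.0)
k27me3 = binarise([9, 11, 0, 1, 10, 12], background=1.0)
obs = [list(pair) for pair in zip(k4me3, k27me3)]
emit = [[0.9, 0.1],  # state 0 ("active"): H3K4me3 usually high, H3K27me3 usually low
        [0.1, 0.9]]  # state 1 ("repressed"): the reverse
trans = [[0.9, 0.1], [0.1, 0.9]]
path = viterbi(obs, start=[0.5, 0.5], trans=trans, emit=emit)
print(path)  # [1, 1, 0, 0, 1, 1]
```

The binary calls keep the model simple: each bin becomes a vector of present/absent marks, and the transition probabilities penalise switching state, so isolated noisy bins tend to be absorbed into the surrounding segment.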

The segmentations produced by the two methods were then combined automatically, based on where they agreed, to maximise resolution and biological interpretability. The resulting segments were labelled according to their signal distribution and genomic location, giving the following classes:

Class   Description
CTCF    CTCF-enriched element
WE      Predicted weak enhancer or open chromatin cis-regulatory element
T       Predicted transcribed region
E       Predicted enhancer
PF      Predicted promoter flanking region
R       Predicted repressed or low-activity region
TSS     Predicted promoter region including transcription start site
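The combination step can be pictured with a small Python sketch: once both segmentations are projected onto shared bins, each bin gets a combined class only where a lookup table says the two methods' labels agree, and consecutive bins with the same class are merged into segments. This is a hypothetical simplification, not the published ENCODE procedure: the per-method label names, the AGREEMENT table, the shared 200 bp binning (Segway itself works at base-pair resolution) and the choice to leave disagreeing bins unlabelled are all assumptions made for illustration.

```python
from itertools import groupby
from typing import Dict, List, Optional, Tuple

BIN_SIZE = 200  # bp; assumes both segmentations were first projected onto the same bins

# Hypothetical agreement table: (ChromHMM-style label, Segway-style label) -> combined class.
AGREEMENT: Dict[Tuple[str, str], str] = {
    ("promoter", "TSS"): "TSS",
    ("promoter_flank", "TSS"): "PF",
    ("strong_enhancer", "Enhancer"): "E",
    ("weak_enhancer", "Enhancer"): "WE",
    ("insulator", "CTCF"): "CTCF",
    ("transcribed", "Gene body"): "T",
    ("repressed", "Repressed"): "R",
}


def combine(chromhmm: List[str], segway: List[str], chrom: str,
            start: int = 0) -> List[Tuple[str, int, int, str]]:
    """Label each bin where the two methods agree, then merge runs of the same
    combined class into (chrom, start, end, class) segments.
    Bins whose label pair is not in AGREEMENT are left unlabelled."""
    if len(chromhmm) != len(segway):
        raise ValueError("both segmentations must cover the same bins")
    per_bin: List[Optional[str]] = [AGREEMENT.get(pair) for pair in zip(chromhmm, segway)]
    segments = []
    pos = start
    for label, run in groupby(per_bin):
        length = sum(1 for _ in run) * BIN_SIZE
        if label is not None:
            segments.append((chrom, pos, pos + length, label))
        pos += length
    return segments


segments = combine(
    ["repressed", "promoter", "promoter", "promoter_flank", "weak_enhancer", "transcribed"],
    ["Repressed", "TSS", "TSS", "TSS", "Repressed", "Gene body"],
    chrom="chr1", start=1000,
)
for seg in segments:
    print(seg)
```

In this toy run the fifth bin, where the two methods disagree, is left out, so the output has a gap between the PF and T segments.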

The following graphic shows the clustering of informative features used to generate the different segment classes. The x-axis shows the segment class and the y-axis shows groups of experiments for a given feature type, e.g. DNase1 or H3K4me2. The colour indicates whether an experimental mark is strongly associated with a segment class.

[Figure: GM12878 Signal Distribution]

For example, we observe the following associations:
  • Transcriptional Activation: H3K4me1, H3K4me2, H3K4me3, H3K9ac, H3K27ac
  • Transcriptional Elongation: H3K36me3
  • Transcriptional Repression: H3K27me3
In the repressed state 'R', H4K20me1 is absent and H3K27me3 is weaker than in CTCF-binding regions. This is because 'R' groups together a diversity of underlying states, which blurs the apparent signal.
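A class-by-assay association matrix like the one in the figure can be approximated as follows. This Python sketch is hypothetical: it averages each assay's signal over the segments of each class and z-scores those means across classes, so every assay sits on a comparable scale. The normalisation choice, function names and example values are assumptions, and the clustering used to order the rows of the figure is not reproduced.

```python
import statistics
from collections import defaultdict
from typing import Dict, List, Tuple


def association_matrix(segments: List[Tuple[str, Dict[str, float]]]) -> Dict[str, Dict[str, float]]:
    """Mean signal per segment class for each assay, z-scored across classes
    (one row of the matrix per assay, one column per class)."""
    values: Dict[str, Dict[str, List[float]]] = defaultdict(lambda: defaultdict(list))
    for cls, signals in segments:
        for assay, value in signals.items():
            values[assay][cls].append(value)
    matrix: Dict[str, Dict[str, float]] = {}
    for assay, by_class in values.items():
        means = {cls: statistics.fmean(v) for cls, v in by_class.items()}
        class_means = list(means.values())
        mu = statistics.fmean(class_means)
        sd = statistics.pstdev(class_means) or 1.0  # avoid dividing by zero for a flat assay
        matrix[assay] = {cls: (m - mu) / sd for cls, m in means.items()}
    return matrix


def strongest_class(matrix: Dict[str, Dict[str, float]], assay: str) -> str:
    """Class with the highest normalised signal for the given assay."""
    return max(matrix[assay], key=matrix[assay].get)


# Hypothetical per-segment mean signals
segments = [
    ("TSS", {"H3K4me3": 9.0, "H3K36me3": 1.0, "H3K27me3": 0.5}),
    ("TSS", {"H3K4me3": 11.0, "H3K36me3": 1.0, "H3K27me3": 0.5}),
    ("T", {"H3K4me3": 1.0, "H3K36me3": 8.0, "H3K27me3": 0.5}),
    ("R", {"H3K4me3": 0.5, "H3K36me3": 0.5, "H3K27me3": 3.0}),
]
matrix = association_matrix(segments)
for assay in ("H3K4me3", "H3K36me3", "H3K27me3"):
    print(assay, strongest_class(matrix, assay))
```

Because each cell is a mean over all segments in a class, a heterogeneous class such as 'R' dilutes any single mark, which is one way to see the blurring effect described above.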

Regulatory Segmentation in the Browser

There is one segmentation track for each of the six ENCODE cell lines. These tracks are on by default in most regulation views, but not in the Location tab. In the Location tab, you need to configure the page to turn on the segmentation tracks, as shown below.

[Figure: Selecting segmentation tracks]

The colours used for the segmentation classes follow the agreed ENCODE standard (a legend is displayed at the bottom of any window displaying regulatory features).
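Outside the browser, a segmentation track can also be summarised programmatically. This Python sketch assumes the track has been exported as a tab-separated BED9 file whose name column holds the class label (TSS, PF, E, ...) and whose itemRgb column carries the display colour. The file layout, the function name and the example records (coordinates and colours) are hypothetical, so check them against the actual download before relying on this.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def summarise_bed(lines: Iterable[str]) -> Tuple[Counter, Dict[str, str]]:
    """Total bases per segmentation class, plus the first itemRgb colour seen for each class.
    Assumes BED9 columns: chrom, start, end, name (class), score, strand,
    thickStart, thickEnd, itemRgb."""
    coverage: Counter = Counter()
    colours: Dict[str, str] = {}
    for line in lines:
        # skip blank lines, comments and UCSC-style header lines
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        start, end, cls = int(fields[1]), int(fields[2]), fields[3]
        coverage[cls] += end - start  # BED intervals are 0-based, half-open
        if len(fields) >= 9:
            colours.setdefault(cls, fields[8])
    return coverage, colours


# Invented example records; coordinates and colours are illustrative only
example = [
    "track name=segmentation itemRgb=On",
    "chr1\t1000\t1600\tTSS\t0\t.\t1000\t1600\t200,0,0",
    "chr1\t1600\t5000\tT\t0\t.\t1600\t5000\t0,150,0",
    "chr1\t5000\t9000\tR\t0\t.\t5000\t9000\t128,128,128",
    "chr1\t9000\t9400\tTSS\t0\t.\t9000\t9400\t200,0,0",
]
coverage, colours = summarise_bed(example)
print(coverage.most_common())
print(colours)
```

Reading the colour from the file rather than hard-coding it keeps any downstream plot consistent with the legend used for the track.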