The ENCODE combined segmentation classifies the genome into regions such as predicted promoters, enhancers, and repressed (see table below). These are genomic regions of similar signal over 14 assays to obtain. Each defines a single-track summary of the functional architecture of the human genome in one of six cell types.
The assays were generated in the ENCODE project for GM12878, K562, H1-hESC, HepG2, HeLa-S3, and HUVEC, and were chosen to maximise information content about the state of the genome. These assays (including control input sequencing) were coordinated across all cell lines and constituted from three classes of data:
|Input Data Class||Description|
|Open chromatin||DNase1 hypersensitivity and FAIRE|
|Transcription factors||PolII and CTCF|
|Histone modifications||H3k4me1, H3k4me2, H3k4me3, H3k9ac, H3k27ac, H3k27me3, H3k36me3, H4k20me1|
Two unsupervised segmentation programs were used:
- ChromHMM (Ernst et al., 2011)
ChromHMM labels each assay as high or low in 200 base pair bins over the whole human genome and runs a 25-state Hidden Markov Model.
- Segway (Hoffman et al., 2011)
A Dynamic Bayesian Network approach using base-pair resolution real valued signal data, trained over the ENCODE pilot regions (1% of the genome), and fitted over the whole genome.
The segmentations produced by these two methods were then combined based on their agreements in an automated fashion, in order to maximise resolution and biological interpretability. The segments were then labelled according to their signal distribution and genomic location, giving the following classifications:
|CTCF||CTCF enriched element|
|WE||Predicted weak enhancer or open chromatin cis regulatory element|
|T||Predicted transcribed region|
|PF||Predicted promoter flanking region|
|R||Predicted repressed or low activity region|
|TSS||Predicted promoter region including transcription start site|
The following graphic shows clustering of informative features used to generate the different classes of segment. The x-axis refers to the segment class and the y-axis shows different groups of experiments for a given feature type e.g DNase1, H3K4me2 etc. The colour indicates whether an experimental mark is strongly associated to a segment class.
For example, we observe the following associations:
Regulatory Segmentation in the Browser
There is one segmentation track available for each of the six ENCODE cell lines. These tracks are on by default in most regulation views, but not in the Location tab. Once in the Location tab, you need to configure the page to turn on the Segmentation tracks as show below.
The colours used for each of the segmentation classes follows the agreed ENCODE standard (a legend is displayed at the bottom of any window displaying regulatory features).