ENCODE data in Ensembl
How to view ENCODE data
Multiple data tracks relating to gene regulation, based on ENCODE data, can be turned on in the 'Region in Detail' page on our GRCh37 archive for human and mouse. A full tutorial on how to use these tracks is available on the EMBL-EBI website.
Original ENCODE data
The full ENCODE datasets that were used in the Ensembl Regulatory Build can also be viewed in the Ensembl GRCh37 archive by attaching a track hub to Region in Detail. The link below will do this automatically:
This creates a menu in the Control Panel on Region in Detail, from which you can add individual tracks or groups of tracks using matrix selectors. Cell type and experimental factor are the two principal axes; other dimensions can be selected by clicking on a box to open an additional submenu (see below).
Removing the track hub
To turn off tracks, return to the configuration matrix you used earlier. Alternatively, hover your mouse over the track name on the left-hand side of the Region in Detail image and click on the X in the popup menu that appears.
To remove the track hub completely, go to "Manage Your Data" on the GRCh37 archive and click on the trash can icon on the "ENCODE data" row.
The ENCODE project
The Encyclopedia of DNA Elements (ENCODE) Consortium is an international collaboration of research groups funded by the National Human Genome Research Institute (NHGRI). The goal of ENCODE is to build a comprehensive parts list of functional elements in the human genome, including elements that act at the protein and RNA levels, and regulatory elements that control the cells and circumstances in which a gene is active.
The ENCODE Project has produced genome-wide data for over 100 different cell types for investigating different aspects of genomic regulation, including:
- chromatin structure (5C)
- open chromatin (DNase-seq and FAIRE-seq)
- histone modifications and DNA-binding of over 100 transcription factors (ChIP-seq)
- RNA transcription (RNA-seq and CAGE)
ENCODE and Ensembl
The Ensembl team has played an active role in the ENCODE project:
- Ensembl is part of GENCODE, a sub-project of the ENCODE scale-up project. GENCODE is headed by Tim Hubbard and is responsible for creating the gene sets used by the ENCODE Project Consortium. Ensembl's role is to provide automatic annotation on the human reference genome assembly and to merge this annotation with manual annotation from the HAVANA team. The gene set provided by Ensembl for human is the GENCODE gene set. Tim Hubbard, Ensembl and HAVANA are all based at the Wellcome Trust Sanger Institute.
- Members of the Ensembl Regulation team were involved in constructing and running the ENCODE uniform peak-calling pipeline, which created standardised and reproducible peak calls for the ENCODE Transcription Factor ChIP-seq data. They were also involved in creating the ENCODE Combined genome segmentation, which uses open chromatin, histone modification, RNA polymerase and CTCF ChIP-seq signal to assign functional annotations to regions of the human genome for the six main cell types.
- Members of the Ensembl Compara team were involved in the identification of conserved regions and additional analyses for the ENCODE project.
- Members of the Ensembl team were heavily involved in the ENCODE Data Analysis Centre, headed by Ewan Birney at the European Bioinformatics Institute, which organised all the integrative analysis conducted in the Consortium and hosted the analysed data.
Ensembl uses data created by the ENCODE project as input to its Regulatory Build pipeline. This input includes many ENCODE DNase-seq, FAIRE-seq and ChIP-seq short read files, as well as data from other projects and publications. The reads are aligned and peak-called to identify and annotate regulatory features. The resulting peak calls and signal tracks can be viewed using the Regulation section of "Configure This Page".
Ensembl Regulation also incorporates the ENCODE Combined genome segmentations, which can be viewed on the Ensembl website or accessed in BioMart or programmatically through the API.
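As an illustration of programmatic access, the sketch below queries regulatory features overlapping a genomic region. The text above does not say which API is meant, so this assumes the Ensembl REST API on the GRCh37 server and its `overlap/region` endpoint with `feature=regulatory`. Whether the Combined genome segmentations themselves are returned by this endpoint is not established here. The helper names `overlap_url` and `fetch_features` are hypothetical and used only for this example.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the GRCh37 REST server, chosen to match the GRCh37 archive above.
SERVER = "https://grch37.rest.ensembl.org"


def overlap_url(region, feature="regulatory", species="human", server=SERVER):
    """Build the URL for a REST 'overlap/region' query (no network access)."""
    path = "/overlap/region/{}/{}".format(
        urllib.parse.quote(species),
        urllib.parse.quote(region, safe=":-"),  # keep "chr:start-end" readable
    )
    query = urllib.parse.urlencode({"feature": feature})
    return server + path + "?" + query


def fetch_features(region, feature="regulatory"):
    """Return features overlapping a region as a list of dicts.

    Requires network access; raises urllib.error.URLError on failure.
    """
    request = urllib.request.Request(
        overlap_url(region, feature),
        headers={"Content-Type": "application/json"},  # ask for JSON output
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Calling `fetch_features("7:140424943-140624564")` (an arbitrary example region) would return one dictionary per overlapping feature. The URL construction is kept separate from the network call so that it can be inspected or tested offline.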