EnsemblEnsembl Home

Ensembl Regulation (funcgen) API Tutorial

Introduction

The Ensembl Regulation team deals with functional genomics data. The API and databases for Ensembl Regulation are called Funcgen.

This tutorial is an introduction to the Funcgen API. Knowledge of the Ensembl Core API and of the concepts and conventions in the Ensembl Core API tutorial is assumed. Please note that the Ensembl Core API must also be installed to use the Funcgen API.

Connecting to the funcgen Database using the Registry

Connecting to any Ensembl database is made simple by using the Bio::EnsEMBL::Registry module:

use strict;
use warnings;
use Bio::EnsEMBL::Registry;

Bio::EnsEMBL::Registry->load_registry_from_db(
    -host => 'ensembldb.ensembl.org',
    -user => 'anonymous'
);

The use of the registry ensures you will load the correct versions of the Ensembl databases for the software release it can find on a database instance. Using the registry object, you can then create any of number of database adaptors. Each of these adaptors is responsible for generating an object of one type. The above code will be omitted from the following examples for brevity.

Regulatory Features

Regulatory Features are features like involved with regulatory aspects like:

  • Predicted promoters,
  • Predicted promoter flanking regions,
  • Predicted enhancer regions,
  • CTCF Binding Sites,
  • Transcription factor binding sites or
  • Open chromatin regions.

The are generated by the Ensembl Regulatory Build.

To fetch Regulatory Features from the funcgen database, you need to use the corresponding adaptor. To obtain all the regulatory features present in a given region of the genome, use the adaptor method fetch_all_by_Slice:


my $slice_adaptor = Bio::EnsEMBL::Registry->get_adaptor('Human', 'Core', 'Slice');
my $slice = $slice_adaptor->fetch_by_region('chromosome', 1, 54_960_000, 54_980_000);

my $regulatory_feature_adaptor = Bio::EnsEMBL::Registry->get_adaptor('Human', 'Funcgen', 'RegulatoryFeature');
my @regulatory_features = @{$regulatory_feature_adaptor->fetch_all_by_Slice($slice)};

foreach my $current_regulatory_feature (@regulatory_features) {
  print $current_regulatory_feature->stable_id.": ";
  print_feature($current_regulatory_feature);
  print "\tFeature Type: ".$current_regulatory_feature->feature_type->name."\n";
}

Regulatory Activities

For every regulatory feature the Ensembl Regulatory Build predicts the regulatory activity of the regulatory feature in each of the epigenomes of the regulatory build. For every epigenome there are 5 possible activities:

  1. Inactive,
  2. Repressed,
  3. Poised,
  4. Active or
  5. Na

The regulatory activities have their own object and can be queried like this:

  my $regulatory_feature_demo_stable_id = 'ENSR00000165384';

  my $regulatory_feature = $regulatory_feature_adaptor->fetch_by_stable_id($regulatory_feature_demo_stable_id);
  print "The regulatory feature with stable id: "  . $regulatory_feature->stable_id . " has the following activities: \n";

  my $regulatory_activity_adaptor = Bio::EnsEMBL::Registry->get_adaptor('homo_sapiens', 'funcgen', 'RegulatoryActivity');
  my $regulatory_activity_list    = $regulatory_activity_adaptor->fetch_all_by_RegulatoryFeature($regulatory_feature);

  foreach my $current_regulatory_activity (@$regulatory_activity_list) {
    print "\tIn the epigenome "  
      . $current_regulatory_activity->get_Epigenome->display_label 
      . ' it is: ' 
      . $current_regulatory_activity->activity 
      . "\n";

  }

Regulatory Evidence

The regulatory build can provide evidence to support its prediction. The regulatory evidence are motifs or annotated features. Annotated features are often called 'enriched regions' or 'peaks'. This example is like the previous one, but for the regulatory activities that have regulatory evidence it prints this evidence:

  my $regulatory_feature_demo_stable_id = 'ENSR00000165384';
  my $regulatory_feature = $regulatory_feature_adaptor->fetch_by_stable_id($regulatory_feature_demo_stable_id);

  print "The regulatory feature with stable id: "  . $regulatory_feature->stable_id . " has the following activities: \n";

  my $regulatory_activity_adaptor = Bio::EnsEMBL::Registry->get_adaptor('homo_sapiens', 'funcgen', 'RegulatoryActivity');
  my $regulatory_activity_list = $regulatory_activity_adaptor->fetch_all_by_RegulatoryFeature($regulatory_feature);

  my $regulatory_evidence_link_adaptor = Bio::EnsEMBL::Registry->get_adaptor('homo_sapiens', 'funcgen', 'RegulatoryEvidenceLink');

  foreach my $current_regulatory_activity (@$regulatory_activity_list) {

    print "\tIn the epigenome "  
      . $current_regulatory_activity->get_Epigenome->display_label 
      . ' it is: ' 
      . $current_regulatory_activity->activity 
      . "\n";

    my $regulatory_evidence_link_list = $regulatory_evidence_link_adaptor->fetch_all_by_RegulatoryActivity($current_regulatory_activity);

    if (@$regulatory_evidence_link_list) {
      print "  This is supported by the following evidence:\n";

      foreach my $current_regulatory_evidence_link (@$regulatory_evidence_link_list) {
        my $evidence = $current_regulatory_evidence_link->get_Evidence;
        print "  - " . $evidence->display_label . ': ' . $evidence->start . '..' . $evidence->end . "\n";
      }
    }
  }

Annotated Features: Enriched regions and peaks

Regulatory Features are built based on results from experiments like Dnase1 sensitivity assays (Dnase-Seq) to detect regions of open chromatin, or transcription factor binding assays, like Chromatin immunoprecipitation (ChIP) coupled with high throughput sequencing (ChIP-Seq). Results from these experiments are stored as Annotated Features.

ChIP-Seq studies are also used to detect histone modifications (eg. H3K36 trimethylation) and Polymerase binding sites. Annotated Features for these experiments are used in the regulatory build for generating epigenome-specific boundaries for Regulatory Features.

To specifically access Annotated Features that serve as evidence for a given Regulatory Feature, you just need to specify:

  my $annotated_features = $regulatory_activity->get_RegulatoryEvidence_by_type('annotated');

Annotated Features contain specific properties:

  • Feature Type usually representing antibody used in ChIP experiment.
  • Feature Set for this Annotated Feature (see further down in Tutorial).
  • Score. An analysis-dependent value (eg. peak-caller score)
  • The peak Summit. Precise 1bp position within annotated feature with the highest read density in a ChIP experiment. It is dependent on the analysis and sometimes it may not be present.

Here is an example how properties from annotated features can be printed:

foreach my $annotated_feature (@$annotated_features) {

  print_feature($annotated_feature);
  print $annotated_feature->feature_type->name . "\n";
  print $annotated_feature->feature_set->name  . "\n";
  print $annotated_feature->score              . "\n";
  
  # Summit is usually present, but may not exist
  if($annotated_feature->summit) {
    print $annotated_feature->summit . "\n";
  }
}

Motif Features: Transcription factor binding sites

Motif Features represent short genomic regions where a Transcription Factor is thought to be directly interacting with the DNA. These regions are called Transcription Factor binding sites. More information on how these sites are found in Ensembl is on the RegulatoryBuild page.

To obtain the motif features present in a given Regulatory Feature, you can do this:

  my $motif_features = $regulatory_activity->get_RegulatoryEvidence_by_type('motif');
or
  my $slice = $slice_adaptor->fetch_by_region('chromosome', 1, 54_960_000, 54_980_000);

  my $motif_feature_adaptor = Bio::EnsEMBL::Registry->get_adaptor('Human', 'Funcgen', 'MotifFeature');
  my $motif_features = $motif_feature_adaptor->fetch_all_by_Slice($slice);

Motif Feature objects have these properties:

  • Binding Matrix: Position Weight Matrix used to define the site. These are currently from Jaspar and their name is the Jaspar Identifier.
  • Score. analysis-dependent value indicating degree of similarity to the binding matrix
  • Associated Annotated Features: set of features that serve as experimental support for a binding site.

Here is an example how properties from motif features can be printed:

foreach my $current_motif_feature (@$motif_features) {
  print_feature($current_motif_feature);
  print $current_motif_feature->binding_matrix->name . "\n";
  print $current_motif_feature->seq                  . "\n";
  print $current_motif_feature->score                . "\n";
  
  my $associated_annotated_features = $current_motif_feature->associated_annotated_features;
  foreach my $current_associated_annotated_feature (@$associated_annotated_features) {
    print_feature($current_associated_annotated_feature); 
  }
}

You can access Motif Features present in an annotated feature like this:

  # Returns a reference to an array
  my $motif_features = $annotated_feature->get_associated_MotifFeatures;

Feature Sets: Groups of related features

Most features can be grouped according to the type of feature and analysis used to create them. Feature sets are used to group features from one analysis by epigenome and the type of feature.

Feature Sets group two classes of features: Annotated and External. External Features are features that have been created outside of Ensembl and we import them directly into the database. Motif Features can only be grouped by Binding Matrix, and are not currently grouped by Feature Sets.

To access Feature Sets, you use the feature type adaptor:

  my $feature_set_adaptor = Bio::EnsEMBL::Registry->get_adaptor('Human', 'Funcgen', 'FeatureSet');

You can for example get all the annotated feature sets like this:

  my $annotated_feature_sets = $feature_set_adaptor->fetch_all_by_feature_class('annotated');
Feature Sets include the following properties:
  • Feature Class of feature set indicating the type of features in the set: 'regulatory', 'annotated' or 'external'
  • Analysis used to obtain the features in this set
  • Feature Type of the features in this set. This depends on the Type of Feature set. For example, for 'annotated' feature sets, the feature type will be an antibody in a ChIP-Seq experiment.
  • Cell Type for the features in this set, if unique. May not be defined in some feature sets. External Features may not have a cell type defined.
  • Features: the features in this set.

You can print them like this:

foreach my $current_annotated_feature_set (@$annotated_feature_sets) {

    print $current_annotated_feature_set->name                 . "\n";
    print $current_annotated_feature_set->feature_class        . "\n";
    print $current_annotated_feature_set->analysis->logic_name . "\n";
    print $current_annotated_feature_set->feature_type->name   . "\n";
    print $current_annotated_feature_set->epigenome->name      . "\n";
    
    # Finally, you can also get features from this set
    #
    my $annotated_features = $current_annotated_feature_set->get_Features_by_Slice($slice);
    
    foreach my $current_feature (@$annotated_features) {
      print_feature($current_feature);
    }
}

External Features: Externally curated data

There are some Feature Sets that are either entirely or partially curated by external groups. These are stored as External Features and can be accessed as follows:

  # Fetch all external feature sets
  my $external_feature_sets = $feature_set_adaptor->fetch_all_by_feature_class('external');

  foreach my $current_external_feature_set (@$external_feature_sets) {
    print "External FeatureSet: " . $current_external_feature_set->name . "\n";
  }

If you know the name of a feature set, you can use the name to fetch the data. For example, we store data from the Vista Enhancer Browser.

  my $vista_feature_set = $feature_set_adaptor->fetch_by_name('VISTA enhancer set');

  # Now you can get all the features (in this case external features) 
  # You can also get features by Slice using get_Features_by_Slice: 
  #
  foreach my $current_vista_feature (@{$vista_feature_set->get_all_Features}){
      print_feature($current_vista_feature);
      
      # There is no epigenome for these features
      # Feature type indicates vista annotation (eg. active enhancer)
      #
      print $current_vista_feature->feature_type->name."\n";
  }

Feature Types

Feature Types provide a biological annotation for features. They are divided in classes forming biologically coherent groups (eg. Transcription Factors). This is different from the Feature Set class, which just states the origin of the data. Feature Types can be accessed using a specific adaptor:

  my $feature_type_adaptor = Bio::EnsEMBL::Registry->get_adaptor('Human', 'Funcgen', 'FeatureType');

External Feature Types

Feature Types for External Features have a meaning that is specific to the Feature Set. For example, for features of the Vista Feature Set, the feature type indicates if the feature was active or inactive in an experiment.

Annotated Feature Types

Feature Types for Annotated Feature Sets usually correspond to the antibody used in a ChIP experiment. We also group these Feature Types by classes of functionally related Feature Types. For example, you may want to look for all Feature Sets related to Transcription Factor binding. For this, you just have to use appropriate adaptors:

  my $transcription_factor_feature_types = $feature_type_adaptor->fetch_all_by_class('Transcription Factor');
  
  foreach my $current_transcription_factor_feature_type (@$transcription_factor_feature_types) {
  
      print $current_transcription_factor_feature_type->name."\n";
      
      my $feature_sets = $feature_set_adaptor->fetch_all_by_FeatureType($current_transcription_factor_feature_type);
      
      foreach my $feature_set (@$feature_sets) {
        print "\thas feature set " . $feature_set->name."\n"; 
      }
  }

Microarrays and associated Information

This code demonstrates the use of adaptors, specifically the ArrayAdaptor. In this example, we export all microarray platforms and associated information.

  # Grab the adaptor
  my $array_adaptor = Bio::EnsEMBL::Registry->get_adaptor('Human','Funcgen','Array');

  my $array = $array_adaptor->fetch_all;

  foreach my $current_array (@$array) {
  
    # Print some array info
    print "Array:  " . $current_array->name   ."\n";
    print "Type:   " . $current_array->type   ."\n";
    print "Vendor: " . $current_array->vendor ."\n";
    
    my $array_chips   = $current_array->get_ArrayChips;

    #Print some ArrayChip info
    foreach my $current_array_chip (@$array_chips) {
      print "ArrayChip: " 
        . $current_array_chip->name 
        . " DesignID: " 
        . $current_array_chip->design_id . "\n";
    }
    print "\n";
  }

Fetch all ProbeSets from a specific Array

In this example, probesets from the WholeGenome_4x44k_v1 array are obtained.

  my $probe_adaptor         = Bio::EnsEMBL::Registry->get_adaptor('Human', 'Funcgen', 'Probe');
  my $probe_feature_adaptor = Bio::EnsEMBL::Registry->get_adaptor('Human', 'Funcgen', 'ProbeFeature'); 

  # Fetch a probeset from the WholeGenome_4x44k_v1 array
  my $probe = $probe_adaptor->fetch_by_array_probe_probeset_name('WholeGenome_4x44k_v1', 'A_23_P18656');
  print "Got " . $probe->class . " probe " . $probe->get_probename . "\n";

  # Fetch the feature associated with this probe
  my @pfeatures = @{$probe_feature_adaptor->fetch_all_by_Probe($probe)};
  print "\nFound ".scalar(@pfeatures)." ProbeFeatures\n";

  #Print some info about the features
  foreach my $pfeature ( @pfeatures ){
    print "\nProbeFeature found at:\t".$pfeature->feature_Slice->name."\n";
  }

Microarray Annotations

In this example, the FOXP2 transcript is fetched by it's stable_id. Then all ProbeSets that have been mapped to this transcript are fetched and printed.

  #Grab the relevant adaptors
  my $transcript_adaptor = Bio::EnsEMBL::Registry->get_adaptor("human", "Core",    "Transcript");
  my $probe_set_adaptor  = Bio::EnsEMBL::Registry->get_adaptor("human", "Funcgen", "ProbeSet");

  #Fetch the transcript and associated ProbeSets
  my $transcript = $transcript_adaptor->fetch_by_stable_id('ENST00000393489');#Foxp2
  my $probesets  = $probe_set_adaptor->fetch_all_by_external_name($transcript->stable_id);

  foreach my $probeset (@$probesets) {

    my $arrays_string = join(', ', (map { $_->name } @{$probeset->get_all_Arrays}));

    # Now get linkage annotation
    #
    my $database_entry_info;
    foreach my $current_database_entry (@{$probeset->get_all_Transcript_DBEntries($transcript)}) {

      # This will return all ProbeSet DBEntries for this transcript
      # There should really be one max per transcript per probeset/probe
      #
      $database_entry_info = $current_database_entry->linkage_annotation;
    }
    print $probeset->name 
      . " on arrays " . $arrays_string 
      . " with Probe hits $database_entry_info\n";
  }

Documentation on the current array mapping strategy can be found here. More examples for accessing microarray data can be found in the funcgen API package: ensembl-funcgen/scripts/examples/microarray_annotation_example.pl

Further help

Complete example scripts for the funcgen API can be found in the funcgen API package: ensembl-funcgen/scripts/examples/ For additional information or help mail the ensembl-dev mailing list. You will need to subscribe to this mailing list to use it. More information on subscribing to any Ensembl mailing list is available from the Ensembl Contacts page.