Introduction to the Ensembl Web Code

The Ensembl codebase is highly complex, consisting of many hundreds of modules. The following notes should help you begin to find your way around!

  1. Web code directories
  2. URL routing
  3. Allowed scripts
  4. Plugins

Web code directories

The following directories contain web-related code:

cbuild
inline C code for handling data files
conf
site-wide configuration files
ctrl-scripts
Apache startup and stop scripts
htdocs
general HTML content (e.g. code documentation)
modules
the main mod_perl codebase used to generate the site
perl
Perl "CGI" scripts used for some legacy behaviour
utils
various scripts used to maintain an Ensembl website, e.g. updating content

The following directories are typically replicated inside plugins in order to override "core" functionality:

  • conf
  • htdocs
  • modules

See plugins (below) for instructions on how to configure plugins in Ensembl.

Any other directories in your checkout will contain the Perl API, and after server startup you will see some additional autogenerated directories used to cache images and other files.

modules/EnsEMBL/Web

Most of the web code written by the Ensembl web team lives in the EnsEMBL::Web namespace. You will not normally want to edit this code, but you can extend it by replicating the namespace in your own plugin and adding or overriding methods as required. See Extending the Ensembl web code for more details.

Throughout this documentation, EnsEMBL::Web is frequently abbreviated to E::W to save space and typing!
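
The idea of replicating a namespace to add or override methods can be illustrated with plain Perl. This is a minimal sketch only: the package EnsEMBL::Web::Example::Summary and its methods are hypothetical, and both "files" are shown in one script for brevity. It does not show how Ensembl actually locates and loads plugin modules.

```perl
use strict;
use warnings;

# --- Stand-in for a "core" module (hypothetical name and methods) ---
package EnsEMBL::Web::Example::Summary;

sub new     { my ($class) = @_; return bless {}, $class; }
sub title   { return 'Core title'; }
sub content { my ($self) = @_; return $self->title . ': core content'; }

# --- Stand-in for a plugin file that replicates the same namespace ---
package EnsEMBL::Web::Example::Summary;

{
  no warnings 'redefine';                # we are replacing title() on purpose
  sub title { return 'Plugin title'; }   # override an existing method
}
sub footer { return 'Added by plugin'; } # add a new method

package main;

my $summary = EnsEMBL::Web::Example::Summary->new;
print $summary->content, "\n";   # prints "Plugin title: core content"
print $summary->footer,  "\n";   # prints "Added by plugin"
```

Because both definitions live in the same package, the untouched core method content() picks up the overridden title() without being edited itself.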

URL routing

Ensembl now uses URL routing: URLs do not necessarily correspond to physical directories, but are parsed into their components and passed to a generic script that constructs the appropriate page.

The exception to this is the static content, i.e. simple HTML pages used to hold documentation about the site and project - like this page. As a general rule, if the URL is in lower case, it is static content; if the "directories" have initial capitals, the page is dynamically generated.

A typical dynamic URL is shown below:

http://www.ensembl.org/Homo_sapiens/Gene/Summary?g=BRCA2

The URL is split on '/' into the following parts:

Species path
Usually just a single "directory", e.g. 'Homo_sapiens', although on some Ensembl-powered sites such as Ensembl Bacteria, a multi-directory structure may be used to group closely related species or strains. Other possible values are 'Multi' (for pages that allow access to multiple species' data, e.g. BLAST) or undef (empty) if the page is not connected to any species (e.g. user account management).
Type
This is the type of data being displayed on the page, e.g. Location or Gene for genomic data, or Help or Account for general web pages.
Action
This denotes the particular view or sub-display of the type of data. In our example, the Action is "Summary", meaning this is the page summarising useful information about the gene.
Function
This is an optional fourth component of the URL. It is mainly used with interactive code such as user account management, e.g. /Account/Bookmark/Edit is the URL for the form where you edit the information stored in a user bookmark.

The parameters after the ? are handled as per normal CGI parameters; in this case, we have the name of the gene we want to display information about.

The URL is parsed in E::W::Apache::SpeciesHandler - this module should be left well alone unless you know exactly what you are doing!
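
To make the splitting described above concrete, here is a minimal sketch. It is not the code in E::W::Apache::SpeciesHandler: the function name parse_dynamic_url and the %KNOWN_TYPE list are assumptions made for illustration, and the sketch only handles a single-directory species path (or none at all).

```perl
use strict;
use warnings;

# Illustrative subset of Types, used to tell a Type from a species name.
my %KNOWN_TYPE = map { $_ => 1 } qw(Gene Transcript Location Help Account);

# Split a dynamic URL into species, type, action, function and parameters.
# Assumption: the species path is a single "directory" or is absent.
sub parse_dynamic_url {
  my ($url) = @_;
  my ($path, $query) = split /\?/, $url, 2;
  $path =~ s{^https?://[^/]+}{};                  # drop scheme and host, if any
  my @parts = grep { length } split m{/}, $path;  # ignore empty segments

  # If the first segment is not a known Type, treat it as the species.
  my $species = (@parts && !$KNOWN_TYPE{$parts[0]}) ? shift @parts : undef;
  my ($type, $action, $function) = @parts;

  # Parse key=value pairs (no URL-decoding in this sketch).
  my %params = map { my ($k, $v) = split /=/, $_, 2; ($k => $v) }
               split /[&;]/, $query // '';

  return {
    species  => $species,
    type     => $type,
    action   => $action,
    function => $function,
    params   => \%params,
  };
}

my $r = parse_dynamic_url('http://www.ensembl.org/Homo_sapiens/Gene/Summary?g=BRCA2');
printf "species=%s type=%s action=%s g=%s\n",
  $r->{species}, $r->{type}, $r->{action}, $r->{params}{g};
```

For the species-less URL /Account/Bookmark/Edit, the same function returns an undefined species, with 'Account' as the Type, 'Bookmark' as the Action and 'Edit' as the Function.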

Allowed scripts

In order to determine what type of response a URL requires (full pages, HTML fragments etc.), the Type part of the URL is assigned a script, as follows:

Page
Normal web pages
Modal
Popup "control panel" (data export, account management)
Config
A variation on the modal page, used to create the image configuration control panels
Component
Asynchronously generated page elements (the ones that replace the animated spinner)
ZMenu
Small popup menus used for contextual navigation

This script definition takes place in the $OBJECT_TO_CONTROLLER_MAP hash in conf/SiteDefs.pm (which can be extended in your plugin if you want to add data Types to Ensembl):

## ALLOWABLE DATA OBJECTS
$OBJECT_TO_CONTROLLER_MAP = {
  Gene                => 'Page',
  Transcript          => 'Page',
  Location            => 'Page',
  Variation           => 'Page',
  StructuralVariation => 'Page',
  Regulation          => 'Page',
  Marker              => 'Page',
  GeneTree            => 'Page',
  Family              => 'Page',
  LRG                 => 'Page',
  Phenotype           => 'Page',
  Experiment          => 'Page',
  Info                => 'Page',
  Search              => 'Page',
  UserConfig          => 'Modal',
  UserData            => 'Modal',
  Help                => 'Modal',
};

The value is then used in E::W::Apache::SpeciesHandler to decide which child of E::W::Controller will be used to process the request.
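
The lookup from Type to controller can be sketched as follows. This is an illustration under stated assumptions, not the SpeciesHandler code: the helper controller_class_for, the 'MyData' Type, and the convention that the child class is named E::W::Controller::<value> are all hypothetical.

```perl
use strict;
use warnings;

# Illustrative subset of the map shown above.
my $OBJECT_TO_CONTROLLER_MAP = {
  Gene     => 'Page',
  Location => 'Page',
  UserData => 'Modal',
  Help     => 'Modal',
};

# Hypothetical plugin-side extension: allow a new data Type.
$OBJECT_TO_CONTROLLER_MAP->{MyData} = 'Page';

# Return the controller class name for a Type, or undef if the Type
# is not in the map. The class naming scheme is an assumption.
sub controller_class_for {
  my ($type) = @_;
  my $script = $OBJECT_TO_CONTROLLER_MAP->{$type} or return;
  return "EnsEMBL::Web::Controller::$script";
}

for my $type (qw(Gene Help MyData Bogus)) {
  my $class = controller_class_for($type);
  printf "%-8s => %s\n", $type, $class // '(not allowed)';
}
```

Keeping the mapping in a single hash means that a Type missing from it is simply rejected, and adding a Type is a one-line change.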

Plugins

The Ensembl webcode is designed to be extensible, so that you can customize your own installation without your changes being overwritten when you update to a new release.

By creating your own plugin, you can completely change the available species, alter the colour scheme or page template, or add your own views and static content.

Public plugins

A selection of plugins is included as part of the standard Ensembl checkout, enabling you to include optional features in your site.

Most public plugins have a README file giving more detailed and up-to-date information on how to use them.

public-plugins/ensembl
Used to configure the current set of Ensembl species (as seen on www.ensembl.org) - without this or a similar plugin, no data will appear on your site.
public-plugins/mirror
Used to configure your local server settings
public-plugins/genoverse
Latest stable version of the Genoverse scrolling browser
public-plugins/solr
Solr search engine
public-plugins/tools
Web interface for BLAST, VEP, etc
public-plugins/tools_hive
eHive backend for tools server
public-plugins/orm
This plugin is used to separate optional features (user accounts, ability to update databases through a web interface) from the core functionality of the Ensembl webcode. It uses Rose::DB::Object and its associated modules for database access, and thus has a lot of additional Perl dependencies.
public-plugins/admin
Administrative interface for non-biological content, such as help and news. Depends on public-plugins/orm

Using plugins

Plugins are used to complement the normal system of inheritance in object-oriented Perl. Whereas a child object can inherit methods from multiple parents, a parent object normally cannot be overridden by multiple children. The plugin system "aggregates" the contents of several methods into one "master" method that can then be used by mod_perl when rendering the webpage.

The module conf/Plugins.pm controls which plugins are used by an instance of Ensembl and their order of precedence. In a standard Ensembl mirror, the module will define a plugin array as follows:

$SiteDefs::ENSEMBL_PLUGINS = [
  'EnsEMBL::Mirror'     => $SiteDefs::ENSEMBL_SERVERROOT.'/public-plugins/mirror',
  'EnsEMBL::Genoverse'  => $SiteDefs::ENSEMBL_SERVERROOT.'/public-plugins/genoverse',
# 'EnsEMBL::Solr'       => $SiteDefs::ENSEMBL_SERVERROOT.'/public-plugins/solr',
# 'EnsEMBL::Users'      => $SiteDefs::ENSEMBL_SERVERROOT.'/public-plugins/users',
  'EnsEMBL::Ensembl'    => $SiteDefs::ENSEMBL_SERVERROOT.'/public-plugins/ensembl',
  'EnsEMBL::Docs'       => $SiteDefs::ENSEMBL_SERVERROOT.'/public-plugins/docs',
];

The plugins are processed in reverse order, starting with the last one.
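
The reverse-order traversal can be sketched like this. The sketch assumes only what is stated above: the plugin array is a flat list of name/directory pairs, and the last pair is processed first. The helper plugins_in_processing_order and the server root path are hypothetical, and what "processing" a plugin involves is not modelled.

```perl
use strict;
use warnings;

# Stand-in for $SiteDefs::ENSEMBL_SERVERROOT (hypothetical path).
my $server_root = '/path/to/ensembl-webcode';

# A flat list of name => directory pairs, as in conf/Plugins.pm.
my $plugins = [
  'EnsEMBL::Mirror'  => "$server_root/public-plugins/mirror",
  'EnsEMBL::Ensembl' => "$server_root/public-plugins/ensembl",
  'EnsEMBL::Docs'    => "$server_root/public-plugins/docs",
];

# Return [name, dir] pairs in processing order: last-listed plugin first.
sub plugins_in_processing_order {
  my ($list) = @_;
  my @pairs;
  for (my $i = 0; $i < @$list; $i += 2) {
    push @pairs, [ $list->[$i], $list->[$i + 1] ];
  }
  return reverse @pairs;
}

print "$_->[0] ($_->[1])\n" for plugins_in_processing_order($plugins);
```

With the three plugins above, the loop visits EnsEMBL::Docs first, then EnsEMBL::Ensembl, then EnsEMBL::Mirror.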

Important note: Regardless of any changes you may make to your Ensembl website, you must check out all the Ensembl Perl API code, as the web code has extensive dependencies on all the API modules.