Documenting web code in Ensembl

Documenting documents

The Ensembl web code is made up of a large number of Perl modules spanning a wide range of tasks. The 'architecture' documentation introduces how the Ensembl web site uses some of these modules to manipulate and display genomic information. In most cases, however, it does not describe a particular method's calls or its inheritance.

Like many Perl projects, the Ensembl web code has traditionally relied on POD to document its application programming interface. Some web code modules contain POD documentation within their source code; others do not. POD's verbosity often discourages developers from writing API documentation at all, so in September 2006 a simpler documentation format was introduced.

Introducing e! doc

The Ensembl documentation format (e! doc) is a lightweight way of embedding documentation in source code files. The source code is parsed automatically to generate a set of HTML pages describing the modules, methods, inheritance and source code of the Ensembl web code.

Method documentation

e! doc comments can be placed anywhere inside a module's source code file, and are prefixed with three hashes (###). For example:

  sub set_type {
    ### Sets data type for object.
    my $self = shift;
    $Type_of{$self} = shift if @_;
  }

This would include the set_type method in the documentation, along with the description that follows the triple hash. Descriptions can span multiple lines, provided each line starts with a triple hash. Lines starting with "Returns" are moved to the bottom of the method's documentation. For example, if the set_type method above were turned into an accessor:

  sub type {
    ### Accessor for data type
    ### Returns data type
    my $self = shift;
    $Type_of{$self} = shift if @_;
    return $Type_of{$self};
  }
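The collection rules described above (one entry per sub, one documentation line per triple-hash comment, "Returns" lines moved to the bottom) can be sketched as a small Perl function. This is a hypothetical illustration, not the parser used by EnsEMBL::Web::Tools::Document::Module: the name parse_method_docs is invented, and the sketch assumes every method starts with a line of the form "sub NAME".

```perl
use strict;
use warnings;

# Hypothetical sketch, not the real EnsEMBL::Web::Tools::Document::Module
# parser: a "sub NAME" line opens a method, and each following line that
# starts with "###" becomes one line of that method's documentation.
sub parse_method_docs {
    my ($source) = @_;
    my (%body, %returns, $current);
    for my $line (split /\n/, $source) {
        if ($line =~ /^\s*sub\s+(\w+)/) {
            $current = $1;
            $body{$current}    = [];
            $returns{$current} = [];
            next;
        }
        next unless defined $current && $line =~ /^\s*###\s?(.*?)\s*$/;
        my $text = $1;
        # "Returns" lines are held back so they end up at the bottom.
        if ($text =~ /^Returns\b/) {
            push @{ $returns{$current} }, $text;
        }
        else {
            push @{ $body{$current} }, $text;
        }
    }
    # Method name => documentation lines, with "Returns" lines last.
    my %docs;
    $docs{$_} = [ @{ $body{$_} }, @{ $returns{$_} } ] for keys %body;
    return \%docs;
}

my $example = <<'END';
sub type {
    ### Returns data type
    ### Accessor for data type
    my $self = shift;
}
END

my $docs = parse_method_docs($example);
print "$_\n" for @{ $docs->{type} };
```

The example deliberately puts the "Returns" line first, so the printed documentation shows it being moved below the description.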

A number of shorthand keywords (see footnote 1) can also be used for quicker documentation:

  • a: for accessors
  • c: for constructors
  • d: for destructors
  • i: for initialisers
  • x: for deprecated methods

For example, the type accessor above can be reduced to:

  sub type {
    ### a
    my $self = shift;
    $Type_of{$self} = shift if @_;
    return $Type_of{$self};
  }

Keywords can also be combined with an additional description. For example, in a constructor:

  sub new {
    ### c
    ### Creates a new inside-out object representing
    ### Ensembl data.
    my ($class, $type) = @_;
    my $self = bless \do { my $anon_scalar }, $class;
    $Type_of{$self} = $type;
    return $self;
  }
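One way to expand the shorthand keywords is a lookup table applied to each documentation line, leaving description lines untouched so that a keyword and free text can be combined. The labels ("Accessor", "Constructor" and so on) and the function name expand_keywords are assumptions for illustration; the real keyword set is configurable.

```perl
use strict;
use warnings;

# Hypothetical label for each shorthand keyword listed above. The real
# keyword set is configurable (see footnote 1), so this table is illustrative.
my %KEYWORDS = (
    a => 'Accessor',
    c => 'Constructor',
    d => 'Destructor',
    i => 'Initialiser',
    x => 'Deprecated',
);

# Takes documentation lines (triple hashes already stripped). A line that
# is exactly one keyword is replaced by its label; every other line passes
# through unchanged, so a keyword can be followed by a description.
sub expand_keywords {
    my @out;
    for my $line (@_) {
        (my $trimmed = $line) =~ s/^\s+|\s+$//g;
        push @out, exists $KEYWORDS{$trimmed} ? $KEYWORDS{$trimmed} : $line;
    }
    return @out;
}

print "$_\n" for expand_keywords('c ', 'Creates a new inside-out object representing', 'Ensembl data.');
```

Trimming before the lookup means a stray trailing space after the keyword (as in "### c ") still matches.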

Additional HTML formatting can also be included in comments. Empty triple-hash comments are converted into blank lines.

  sub new {
    ### c
    ### Creates a new inside-out object representing Ensembl data.
    ###
    ### This constructor returns a new Ensembl data object.
    my ($class, $type) = @_;
    my $self = bless \do { my $anon_scalar }, $class;
    $Type_of{$self} = $type;
    return $self;
  }
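Rendering the collected comment lines could look like the sketch below, which treats each empty triple-hash comment as a paragraph break (an HTML rendering of a blank line) and passes any embedded HTML through unchanged. The <p> wrapping and the name lines_to_html are assumptions; the exact markup e! doc emits is not described here.

```perl
use strict;
use warnings;

# Joins documentation lines (triple hashes already stripped) into HTML.
# Consecutive lines are joined with spaces; an empty line starts a new
# paragraph. HTML written in the comments is passed through as-is.
sub lines_to_html {
    my @paragraphs = ([]);
    for my $line (@_) {
        if ($line =~ /^\s*$/) {
            push @paragraphs, [];          # empty "###" line: start a new paragraph
        }
        else {
            push @{ $paragraphs[-1] }, $line;
        }
    }
    my @html;
    for my $para (@paragraphs) {
        push @html, '<p>' . join(' ', @$para) . '</p>' if @$para;   # skip empty runs
    }
    return join "\n", @html;
}

print lines_to_html(
    'Creates a new inside-out object representing Ensembl data.',
    '',
    'This constructor returns a new Ensembl data object.',
), "\n";
```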

Module documentation

In addition to providing information relating to individual method calls, documentation can also be added to entire modules. The same triple hash comments can be placed between the package declaration and the first method definition to provide an overview of the functionality of a module.

  package EnsEMBL::Web::Root;

  ### The root class for web code modules
  ### in Ensembl. 

  use strict;
  use warnings;

  sub new {
    ...
  }
  
  1;
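A sketch of how module documentation might be picked up, assuming the parser collects the triple-hash lines that follow the package declaration and stops at the first sub. parse_module_doc is a hypothetical name, not part of the e! doc tools.

```perl
use strict;
use warnings;

# Hypothetical sketch: returns the package name and the text of the "###"
# lines found between the package declaration and the first "sub".
sub parse_module_doc {
    my ($source) = @_;
    my ($package, @doc);
    for my $line (split /\n/, $source) {
        if (!defined $package) {
            $package = $1 if $line =~ /^\s*package\s+([\w:]+)\s*;/;
            next;
        }
        last if $line =~ /^\s*sub\s+\w+/;
        push @doc, $1 if $line =~ /^\s*###\s?(.*?)\s*$/;
    }
    return ($package, join(' ', @doc));
}

my ($name, $summary) = parse_module_doc(<<'END');
package EnsEMBL::Web::Root;

### The root class for web code modules
### in Ensembl.

use strict;

sub new { }

1;
END
print "$name: $summary\n";
```

Stopping at the first sub keeps method-level comments out of the module summary.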

Inheritance and subclasses

The e! doc system also provides information on inheritance and subclasses. The generated HTML pages list all methods defined in a class, along with any methods it inherits. Subclasses of each module are also listed. This information is derived from each module's @ISA array, so no additional documentation markup is required.
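Because @ISA is enough to recover both inherited methods and subclasses, the idea can be sketched in a few functions. This sketch inspects packages that are already loaded, through their symbol tables, and walks @ISA depth first, which matches Perl's default method resolution order. Whether e! doc reads @ISA this way or from the source text is not stated; the classes My::Base and My::Child and all function names are hypothetical.

```perl
use strict;
use warnings;

# Two throwaway classes standing in for real web code modules.
package My::Base;
sub new    { my $class = shift; return bless {}, $class }
sub render { return 'base' }

package My::Child;
our @ISA = ('My::Base');
sub render { return 'child' }
sub extra  { return 1 }

package main;

# Names of the subroutines defined directly in a package.
sub own_methods {
    my ($class) = @_;
    no strict 'refs';
    return grep { defined &{"${class}::$_"} } keys %{"${class}::"};
}

# The package's parent classes, read straight from its @ISA array.
sub parents {
    my ($class) = @_;
    no strict 'refs';
    return @{"${class}::ISA"};
}

# Method name => defining class, for the class's own methods plus any it
# inherits (depth-first through @ISA, matching Perl's default method order).
sub all_methods {
    my ($class, $seen) = @_;
    $seen ||= {};
    my %methods = map { ($_ => $class) } own_methods($class);
    for my $parent (parents($class)) {
        next if $seen->{$parent}++;
        my %inherited = all_methods($parent, $seen);
        $methods{$_} //= $inherited{$_} for keys %inherited;
    }
    return %methods;
}

# Direct subclasses of $class among the given candidate classes.
sub subclasses {
    my ($class, @candidates) = @_;
    return grep { my $candidate = $_; grep { $_ eq $class } parents($candidate) } @candidates;
}

my %methods = all_methods('My::Child');
printf "%-6s from %s\n", $_, $methods{$_} for sort keys %methods;
```

Overridden methods keep the subclass as their defining class because inherited entries are only filled in with `//=` when the name is not already present.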

Referencing other modules and methods

When documenting a method, it is sometimes useful to refer to associated modules and methods. Module or method names placed in double curly braces become hyperlinks in the resulting HTML documentation pages.

  sub panel {
    ### Returns the {{EnsEMBL::Web::Document::Panel}} associated with 
    ### this object. If no panel is found, a new one is created with 
    ### {{new_panel}}.
    ...
  }

  sub new_panel {
    ### Creates and returns a new panel by calling
    ### {{EnsEMBL::Web::Document::Panel::new}}.
    ...
  }
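A possible rewrite of double-curly references into links is sketched below. Since {{A::B}} could name either a module or a method B in module A, the sketch consults a set of known module names; bare names such as {{new_panel}} are assumed to refer to the current module. The one-page-per-module file naming (e.g. EnsEMBL_Web_Document_Panel.html#new), the module name EnsEMBL::Web::Example and the function names are all assumptions, not the layout e! doc actually produces.

```perl
use strict;
use warnings;

# Replaces every {{Name}} in $text with an HTML link. $current is the module
# being documented; $known is a hash ref of known module names.
sub link_references {
    my ($text, $current, $known) = @_;
    $text =~ s/\{\{([\w:]+)\}\}/_link($1, $current, $known)/ge;
    return $text;
}

sub _link {
    my ($name, $current, $known) = @_;
    my ($module, $method);
    if ($known->{$name}) {
        $module = $name;                       # {{Module::Name}}
    }
    elsif ($name =~ /^(.+)::(\w+)$/) {
        ($module, $method) = ($1, $2);         # {{Module::Name::method}}
    }
    else {
        ($module, $method) = ($current, $name); # {{method}} in this module
    }
    # Hypothetical page layout: one HTML file per module, methods as anchors.
    (my $page = $module) =~ s/::/_/g;
    my $href = "$page.html" . (defined $method ? "#$method" : '');
    return qq{<a href="$href">$name</a>};
}

my %known = ('EnsEMBL::Web::Document::Panel' => 1);
print link_references(
    'Returns the {{EnsEMBL::Web::Document::Panel}} for this object, or calls {{new_panel}}.',
    'EnsEMBL::Web::Example',   # hypothetical current module
    \%known,
), "\n";
```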

Tables

In addition to standard text documentation, tabulated comments can also be included. Tables can appear anywhere within a comment, but should be preceded and followed by a line containing three underscores (___). For example:

  sub new {
    ### c
    ### Inside-out class for z-menus.
    ### ___
    ### Arg 1:       class name
    ### Arg 2:       parameter hash
    ### Description: Constructor method for {{EnsEMBL::Web::Interface::ZMenu}}
    ###              objects. Receives a parameter hash with the following keys:
    ### Title:       title of the zmenu
    ### Type:        type of zmenu
    ### Ident:       unique identifier of zmenu (really should be unique!)
    ### Add:         array of {{EnsEMBL::Web::Interface::ZMenuItem}} objects
    ###              to add to the ZMenu
    ### Remove:      array of strings of ZMenuItem objects to remove from
    ###              the ZMenu
    ### View:        the {{EnsEMBL::Web::Interface::ZMenuView}} object to
    ###              use for rendering. A new one is created automatically
    ###              if no object is specified.
    ### Returns:     a new {{EnsEMBL::Web::Interface::ZMenu}}.
    ### ___
    ###
    ...
  }

As with text documentation, references to other modules and methods can also be included in the tables.
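Table parsing could work along the lines of the sketch below, assuming the triple hashes have already been stripped: a ___ line opens or closes a table, a "Key: value" line starts a row, and an indented line continues the previous value. table_to_html and the <table> markup are hypothetical.

```perl
use strict;
use warnings;

# Converts ___-delimited runs of "Key: value" lines into HTML tables;
# lines outside a table are returned unchanged.
sub table_to_html {
    my (@out, @rows);
    my $in_table = 0;
    for my $line (@_) {
        if ($line =~ /^___\s*$/) {
            # A closing ___ flushes the collected rows as one HTML table.
            if ($in_table) {
                my $cells = join '', map { "<tr><th>$_->[0]</th><td>$_->[1]</td></tr>" } @rows;
                push @out, "<table>$cells</table>";
                @rows = ();
            }
            $in_table = !$in_table;
        }
        elsif (!$in_table) {
            push @out, $line;                       # ordinary documentation text
        }
        elsif ($line =~ /^\s+(\S.*)$/ && @rows) {
            $rows[-1][1] .= " $1";                  # indented: continues the previous value
        }
        elsif ($line =~ /^([^\s:][^:]*):\s*(.*)$/) {
            push @rows, [ $1, $2 ];                 # "Key: value" starts a new row
        }
    }
    return @out;
}

print "$_\n" for table_to_html(
    'Inside-out class for z-menus.',
    '___',
    'Arg 1:       class name',
    'View:        the ZMenuView object to',
    '             use for rendering.',
    '___',
);
```

A table that is opened but never closed is silently dropped in this sketch; a real parser would probably warn instead.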

Source code

The HTML documentation includes links to view the source code of each module. The source code is loaded dynamically, only when it is requested, to speed up page loads. Method code is also displayed with a zippy animation. Who says documentation can't be fun?

Indexing the API

A collection of EnsEMBL::Web::Tools::Document classes are responsible for finding and parsing the API modules, and producing the HTML documentation pages.

An EnsEMBL::Web::Tools::Document object is responsible for finding the module files, and creates a series of EnsEMBL::Web::Tools::Document::Module objects. Each of these objects does the heavy lifting of parsing the module source code, and in turn creates a series of EnsEMBL::Web::Tools::Document::Method objects. The Document object then processes these data objects and passes them to the EnsEMBL::Web::Tools::DocumentView object, which actually writes the HTML documentation files.
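The first stage, finding the module files, might look like the following sketch built on the core File::Find module. find_modules is a hypothetical helper, not part of EnsEMBL::Web::Tools::Document, and it assumes the usual layout in which EnsEMBL/Web/Root.pm holds package EnsEMBL::Web::Root. The parsing and HTML-writing stages are not shown.

```perl
use strict;
use warnings;
use File::Find qw(find);
use File::Spec;

# Hypothetical helper: maps each .pm file under $root to a package name,
# assuming one package per file (EnsEMBL/Web/Root.pm -> EnsEMBL::Web::Root).
sub find_modules {
    my ($root) = @_;
    my %modules;
    find({
        # Don't chdir, so $_ is the full path and abs2rel resolves correctly.
        no_chdir => 1,
        wanted   => sub {
            return unless /\.pm\z/ && -f $_;
            my $rel = File::Spec->abs2rel($File::Find::name, $root);
            $rel =~ s/\.pm\z//;
            my $package = join '::', File::Spec->splitdir($rel);
            $modules{$package} = $File::Find::name;
        },
    }, $root);
    return \%modules;
}

if (@ARGV) {
    my $modules = find_modules($ARGV[0]);
    print "$_\t$modules->{$_}\n" for sort keys %$modules;
}
```

When run with a directory argument, the script prints each package name next to its file path.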

Two scripts are included which automatically index the web code and the drawing code. They assume a default directory setup, and copy the generated HTML documentation pages to the htdocs/docs directory.

  • document.pl creates the necessary EnsEMBL::Web::Tools::Document objects, performs the indexing, and writes the HTML output according to a set of command line parameters.
  • update_docs.pl runs document.pl with a default set of command line parameters. By default, it indexes all web code modules and associated plugins, and places the resulting HTML documentation in a temporary directory, which should then be copied to a web-accessible location for viewing.

Viewing the documentation

The indexer generates a series of HTML pages, which are displayed in three frames:

  • A complete list of all modules
  • A complete list of all methods
  • A page for each module, listing all of its methods

Some additional pages of statistics are also produced (see footnote 2).

These pages can be viewed in any web browser, and the browser's built-in find function provides easy client-side searching of the API.

Footnotes:

  1. Keywords can be added or removed using the keywords accessor of the EnsEMBL::Web::Tools::Document object performing the API indexing (see Indexing the API).
  2. A documentation coverage percentage is calculated for each module as the proportion of its methods that contain a triple-hashed comment. Families of modules are also ranked by documentation coverage.
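The coverage figure from footnote 2 reduces to a simple proportion, sketched below. coverage is a hypothetical name; the sketch scores a module with no methods as 0 and does not attempt the family ranking, since how families are formed is not described here.

```perl
use strict;
use warnings;

# Documentation coverage of one module: the percentage of its methods that
# have at least one triple-hashed comment. The input maps each method name
# to its list of documentation lines (an empty list means undocumented).
sub coverage {
    my ($docs) = @_;
    my @methods = keys %$docs;
    return 0 unless @methods;
    my $documented = grep { @{ $docs->{$_} } } @methods;
    return 100 * $documented / @methods;
}

printf "%.1f%%\n", coverage({ type => ['Accessor'], new => ['Constructor'], panel => [] });
```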