Ensembl BlastView Configuration
- Registering Sequence Databases
- Registering Search Methods
- Registering Method vs. Database Links
- Configuring the ENSEMBL_BLAST Database
This document provides details for configuring the Ensembl BlastView web interface.
Ensembl Configuration Overview
The Ensembl distribution is usually stored within a filesystem directory (hereafter referred to as $ENSEMBL_ROOT), e.g.:
$ setenv ENSEMBL_ROOT /usr/local/ensembl
Ensembl web site configuration data are stored in text files within $ENSEMBL_ROOT, in the following directory:
Species-independent configuration is stored in (hereafter referred to as MULTI.ini):
Species-default configuration is stored in (hereafter referred to as DEFAULTS.ini):
Species-specific configuration is stored in (hereafter referred to as <SPECIES>.ini), for example:
The .ini files are separated into sections containing key-value pairs using the format:
[SECTION_HEADING]
key1 = value1
key2 = value2
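To make the section/key/value layout concrete, here is a minimal sketch that reads text in this format with Python's standard `configparser` module. It is only an illustration: Ensembl reads these files with its own Perl code, which may treat some values (such as bracketed lists) differently, and the `read_sections` helper and sample text are hypothetical.

```python
import configparser

# Hypothetical sample text in the [SECTION_HEADING] key = value layout.
SAMPLE_INI = """\
[SECTION_HEADING]
key1 = value1
key2 = value2
"""


def read_sections(text):
    """Return {section: {key: value}} for ini-style text (illustrative helper)."""
    parser = configparser.ConfigParser(
        interpolation=None,            # treat '%' literally (values such as %_)
        comment_prefixes=(";", "#"),   # both comment styles appear in the .ini files
    )
    parser.optionxform = str           # keep keys case-sensitive, e.g. LATESTGP
    parser.read_string(text)
    return {name: dict(parser[name]) for name in parser.sections()}


config = read_sections(SAMPLE_INI)
print(config["SECTION_HEADING"]["key1"])
```

Interpolation is switched off so that a value containing a percent sign is returned unchanged rather than raising an error.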
BlastView configuration consists of the following components:
- Register of sequence databases to search against
- Register of methods (executables) that run the search
- Register of search method vs. sequence database linkage
- Location of MySQL database for search report storage (ENSEMBL_BLAST)
Databases and methods are generally species-independent, and their configurations are therefore stored in the DEFAULTS.ini file. Method vs. sequence database linkage, however, is species-specific and is stored in the <SPECIES>.ini files.
For interface responsiveness, search reports are parsed once and the results are stored in a database. The location of this database is configured in the MULTI.ini file.
Registering Sequence Databases
Sequence database configuration requires the following data:
- A "type" string, used for grouping of databases across species and methods (e.g. CDNA_ALL, PEP_ALL).
- A "label" string, used for display in the interface (e.g. "Ensembl cDNAs", "Ensembl Peptides").
To register databases, type/label pairs must be entered in the BLAST_DATASOURCES configuration section of MULTI.ini. An additional DEFAULT key sets the type of the default database, e.g.:
[BLAST_DATASOURCES]
; Registers blast datasources. Key values are used as labels
; Keys should be registered against methods in species.ini files
DEFAULT = LATESTGP
CDNA_ALL = cDNAs
CDNA_ABINITIO = Ab-initio cDNAs (Genscan/SNAP)
PEP_ALL = Peptides
PEP_ABINITIO = Ab-initio Peptides (Genscan/SNAP)
RNA_NC = Ensembl Non-coding RNA genes
LATESTGP = Genomic sequence
LATESTGP_MASKED = Genomic sequence (masked)
Registering Search Methods
Within the context of BlastView, a search method is an algorithm/executable that takes a query sequence of a particular type (DNA/peptide) and a sequence database of a particular type (DNA/peptide), and computes some measure of sequence similarity between the two. This means that BLASTN, TBLASTX, BLASTP, BLAT etc. are entirely separate at the configuration level.
Before use with BlastView, code must be written to "wrap" each executable in a Perl module with a "runnable" interface. These wrapper modules hide the differences in calling and report handling between methods from the BlastView interface code. Coding of these wrappers for different methods/system configurations is covered in the BlastView tech document.
The key attributes of BlastView methods are therefore:
- A "type" string, used for grouping methods across species and also as a human-readable label.
- A "module" arrayref, used to identify the appropriate wrapper class(es).
To register methods, type/module pairs must be entered in the ENSEMBL_BLAST_METHODS configuration section, e.g.:
BLASTN = [blast dna dna ensembl_wublastn] ; alternatives: ncbiblastn, ensembl_wublastn, sge_wublastn
BLASTX = [blast dna peptide ensembl_wublastx] ; alternatives: ncbiblastx, ensembl_wublastx, sge_wublastx
BLASTP = [blast peptide peptide ensembl_wublastp] ; alternatives: ncbiblastp, ensembl_wublastp, sge_wublastp
TBLASTN = [blast peptide dna ensembl_wutblastn] ; alternatives: ncbitblastn, ensembl_wutblastn, sge_wutblastn
TBLASTX = [blast dna dna ensembl_wutblastx] ; alternatives: ncbitblastx, ensembl_wutblastx, sge_wutblastx
BLAT = [blat dna dna blat_gfclient]
For example, the ensembl_wublastn wrapper contains the logic to run a wu-blastn search using the Compaq BSUB job submission system as used for the Ensembl Blast cluster. Conversely, the blat_gfclient wrapper runs BLAT searches over TCP/IP using a client-server model. Further wrappers are being developed to, for example, run blast searches on the same machine as the web server.
Unfortunately, it is not currently possible to override the blast method wrappers on a per-species basis. This is a known limitation of the system, and may be addressed in the future.
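As an illustration of how a method entry such as `[blast dna peptide ensembl_wublastx]` might be broken down, the sketch below splits the bracketed list into four named fields. The field names are assumptions inferred from the entries in this section (program family, query type, database type, wrapper name); they are not taken from the BlastView code, and `Method` and `parse_method` are hypothetical helpers.

```python
from typing import NamedTuple


class Method(NamedTuple):
    program: str      # assumed meaning: program family, e.g. "blast" or "blat"
    query_type: str   # assumed meaning: type of the query sequence
    db_type: str      # assumed meaning: type of the sequence database
    wrapper: str      # name of the wrapper module


def parse_method(value):
    """Parse a '[a b c d]' list value, ignoring any trailing ';' comment."""
    value = value.split(";", 1)[0].strip()
    if not (value.startswith("[") and value.endswith("]")):
        raise ValueError(f"expected a bracketed list, got {value!r}")
    fields = value[1:-1].split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}")
    return Method(*fields)


blastx = parse_method("[blast dna peptide ensembl_wublastx] ; alternatives: ncbiblastx")
print(blastx.query_type, blastx.db_type, blastx.wrapper)
```

Under this reading, BLASTX takes a DNA query against a peptide database, and TBLASTN takes a peptide query against a DNA database, which matches the usual behaviour of those programs.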
In addition to the ENSEMBL_BLAST_METHODS section, there are several attributes in the general section of DEFAULTS.ini that also affect method configuration. These are:
# Path to binaries on local machine
ENSEMBL_BINARIES_PATH = /usr/local/bin
# Path to binaries on remote machine
ENSEMBL_BLAST_BIN_PATH = /usr/remote/bin
# Path to blast databases
ENSEMBL_BLAST_DATA_PATH = /usr/remote/data
# Path to blast filter directory
ENSEMBL_BLAST_FILTER = /usr/remote/blast/filter
# Path to blast matrix directory
ENSEMBL_BLAST_MATRIX = /usr/remote/blast/matrix
# Path to RepeatMasker executable
ENSEMBL_REPEATMASKER = /usr/remote/RepeatMasker
# Path to BLAST servers
ENSEMBL_BLAST_SERVERS = [ blast-01 blast-02 ]
Registering Method vs. Database Links
If you have created your blast indices with standardised file names, there is no need to configure these individually. However, if you are using BLAT, you do need to configure it in the corresponding <Genus_species>.ini file.
Registering non-standard indices
If you have not named your files with the standard format, each of the method types registered in MULTI.ini (e.g. BLASTN, TBLASTX etc.) must have a corresponding [<METHOD>_DATASOURCES] section in the <SPECIES>.ini file. Each [<METHOD>_DATASOURCES] section contains:
- A DATASOURCE_TYPE = dna or DATASOURCE_TYPE = peptide key=value pair to specify the query type (dna/peptide) that the search method expects as input (see example below).
- <DATABASE> = <LOCATOR> (KEY = value) pairs, which are links to one or more sequence databases.
<DATABASE> is one of the database types registered in DEFAULTS.ini (e.g. CDNA_ALL, LATESTGP, PEP_KNOWN).
<LOCATOR> refers to the filesystem or TCP/IP location of the database. For the blast databases the <LOCATOR> can either be the full name of the file (see example below), or the file name can be replaced with %_ (that is, <DATABASE> = %_), e.g.
LATESTGP = %_
In the latter case, the file name will be autogenerated on server startup to use the files with the name
[BLASTN_DATASOURCES]
DATASOURCE_TYPE = dna
LATESTGP = ProjectX.dna.fa
CDNA_ALL = Project.cdna_all.fa
CDNA_ABINITIO = Project.cdna_abinitio.fa
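Because each key in a [<METHOD>_DATASOURCES] section has to match a registered database type, a small consistency check can catch typos before the server is restarted. The following is a sketch only: `check_method_section` is a hypothetical helper, the sections are passed in as plain dictionaries, and BlastView itself is not known to perform this check.

```python
VALID_QUERY_TYPES = {"dna", "peptide"}


def check_method_section(registered_types, section):
    """Return a list of problems found in one [<METHOD>_DATASOURCES] section.

    registered_types: set of database types from [BLAST_DATASOURCES]
    section: dict of key -> value for the method section
    """
    problems = []
    query_type = section.get("DATASOURCE_TYPE")
    if query_type not in VALID_QUERY_TYPES:
        problems.append(f"DATASOURCE_TYPE must be dna or peptide, got {query_type!r}")
    for key in section:
        if key != "DATASOURCE_TYPE" and key not in registered_types:
            problems.append(f"{key} is not a registered database type")
    return problems


# Database types taken from the examples in this document.
registered = {"CDNA_ALL", "CDNA_ABINITIO", "PEP_ALL", "LATESTGP"}
blastn = {
    "DATASOURCE_TYPE": "dna",
    "LATESTGP": "ProjectX.dna.fa",
    "CDNA_ALL": "Project.cdna_all.fa",
    "CDNA_ABINITIO": "Project.cdna_abinitio.fa",
}
print(check_method_section(registered, blastn))
```

An empty list means every database key in the section is registered and the query type is one of the two accepted values.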
Configuring the ENSEMBL_BLAST Database
To improve interface responsiveness, search reports are parsed once and the results are stored in a database. The location of this database is configured in the MULTI.ini file. Configuration is similar to that of all other Ensembl databases, with the name of the database being ENSEMBL_BLAST. The main difference between ENSEMBL_BLAST and other Ensembl databases, however, is that ENSEMBL_BLAST is a read/write database, so the configured database user must have write permission. For example:
[databases]
DATABASE_BLAST = ensembl_blast
[DATABASE_BLAST]
HOST = localhost
PORT = 3306
USER = admin_user
PASS = secret
The database schema of the ENSEMBL_BLAST database is distributed within the Perl code, rather than being available by FTP download. First, an empty database should be created. Note that the mysql client requires the password to follow -p with no intervening space. E.g.
$ mysql -u admin_user -psecret -e "create database ensembl_blast"
Next, a script can be run that creates the database automatically. E.g.
The correct execution of this script can be checked as follows:
$ mysqldump -u admin_user -psecret ensembl_blast
The above should result in output similar to the following. Note that the blast_result, blast_hit and blast_hsp tables are timestamped. Re-running the blast_database.pl script will cause the blast_result, blast_hit and blast_hsp tables to be rotated, meaning that, whilst old results/hsps/hits will still be available, new searches will be stored in the new tables. It is simple, therefore, to maintain the DATABASE_BLAST database by dropping old tables. See the utils/blast_cleaner.pl script for an example of how this is done.
--
-- Table structure for table "blast_table_log"
--
CREATE TABLE blast_table_log (
  table_id int(10) unsigned NOT NULL auto_increment,
  table_name varchar(32) default NULL,
  table_type enum("TICKET","RESULT","HIT","HSP") default NULL,
  table_status enum("CURRENT","FILLED","DELETED") default NULL,
  use_date date default NULL,
  create_time datetime default NULL,
  delete_time datetime default NULL,
  num_objects int(10) default NULL,
  PRIMARY KEY (table_id),
  KEY table_name (table_name),
  KEY table_type (table_type),
  KEY use_date (use_date),
  KEY table_status (table_status)
) TYPE=MyISAM;
--
-- Table structure for table "blast_ticket"
--
CREATE TABLE blast_ticket (
  ticket_id int(10) unsigned NOT NULL auto_increment,
  create_time datetime NOT NULL default "0000-00-00 00:00:00",
  update_time datetime NOT NULL default "0000-00-00 00:00:00",
  ticket varchar(32) NOT NULL default "",
  object longblob,
  PRIMARY KEY (ticket_id),
  UNIQUE KEY ticket (ticket),
  KEY create_time (create_time),
  KEY update_time (update_time)
) TYPE=MyISAM;
--
-- Table structure for table "blast_result20030821"
--
CREATE TABLE blast_result20030821 (
  result_id int(10) unsigned NOT NULL auto_increment,
  ticket varchar(32) default NULL,
  object longblob,
  PRIMARY KEY (result_id),
  KEY ticket (ticket)
) TYPE=MyISAM;
CREATE TABLE blast_hit20030821 (
  hit_id int(10) unsigned NOT NULL auto_increment,
  ticket varchar(32) default NULL,
  object longblob,
  PRIMARY KEY (hit_id),
  KEY ticket (ticket)
) TYPE=MyISAM;
--
-- Table structure for table "blast_hsp20030821"
--
CREATE TABLE blast_hsp20030821 (
  hsp_id int(10) unsigned NOT NULL auto_increment,
  ticket varchar(32) default NULL,
  object longblob,
  chr_name varchar(32) default NULL,
  chr_start int(10) unsigned default NULL,
  chr_end int(10) unsigned default NULL,
  PRIMARY KEY (hsp_id),
  KEY ticket (ticket)
) TYPE=MyISAM MAX_ROWS=705032704 AVG_ROW_LENGTH=4000;
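Since the rotated tables carry a date suffix (as in blast_hsp20030821), candidates for dropping can be selected by name alone. The sketch below shows one possible selection rule, assuming the suffix is always an eight-digit YYYYMMDD date. It is not the logic of the cleaner script mentioned above, the `expired_tables` helper is hypothetical, and a real cleanup would presumably also consult blast_table_log so that tables marked CURRENT are never dropped.

```python
import re
from datetime import date, timedelta

# Assumed naming pattern for rotated tables: blast_<type><YYYYMMDD>
TABLE_RE = re.compile(r"^blast_(result|hit|hsp)(\d{8})$")


def expired_tables(table_names, today, keep_days):
    """Return rotated tables whose date suffix is older than keep_days."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in table_names:
        match = TABLE_RE.match(name)
        if match is None:
            continue  # blast_ticket and blast_table_log are not date-suffixed
        stamp = match.group(2)
        created = date(int(stamp[:4]), int(stamp[4:6]), int(stamp[6:8]))
        if created < cutoff:
            expired.append(name)
    return expired


tables = [
    "blast_table_log",
    "blast_ticket",
    "blast_result20030821",
    "blast_hit20030821",
    "blast_hsp20030821",
    "blast_hsp20030901",
]
old = expired_tables(tables, today=date(2003, 9, 5), keep_days=7)
print(old)
```

The function only returns names; issuing the corresponding DROP TABLE statements is left to the operator.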
By following the above steps, the BlastView interface should be available for use on the next restart of the Ensembl web server. However, if the configuration .ini files have been changed, the following file should be deleted before server restart (this is a filesystem cache of the config):
For further details about how BlastView works, please see the BlastView technical documentation:
- BlastView search API [PDF]