BLAST/BLAT setup instructions

As with our old code, the Tools system combines BLAST and BLAT sequence searches into one interface. Ensembl now uses NCBI BLAST by default, but the Tools system can be configured to use WU-BLAST instead.

Setting up BLAT

  1. Install the BLAT executable, available from the UCSC.
  2. Download the required FASTA files from the Ensembl FTP site, and use them to create 2bit files for rapid indexing, as described in the UCSC BLAT specifications. N.B. You can use the same Ensembl pipeline, described below, to produce both BLAT and BLAST indices.
  3. Configure the locations of these files in public-plugins/mirror/conf/SiteDefs.pm:

        $SiteDefs::ENSEMBL_BLAT_BIN_PATH    = '/usr/local/bin/gfClient';
        $SiteDefs::ENSEMBL_BLAT_TWOBIT_DIR  = '/usr/local/ensembl/tools_data/blat';
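
If you create the 2bit files by hand rather than with the pipeline, the conversion in step 2 could be scripted along the following lines. This is only a sketch: it assumes the FASTA files have already been decompressed to *.fa, that UCSC's faToTwoBit utility is on your PATH, and that each 2bit file is named after its FASTA file. The function name and the input directory are hypothetical, and the file naming expected by your Ensembl installation may differ.

```python
from pathlib import Path


def two_bit_commands(fasta_dir, twobit_dir):
    """Return one faToTwoBit argument list per *.fa file in fasta_dir.

    The commands are only built, not run, so they can be reviewed first.
    """
    out_dir = Path(twobit_dir)
    commands = []
    for fasta in sorted(Path(fasta_dir).glob("*.fa")):
        # Assumption: name each 2bit file after its source FASTA file.
        target = out_dir / (fasta.stem + ".2bit")
        # UCSC usage: faToTwoBit in.fa out.2bit
        commands.append(["faToTwoBit", str(fasta), str(target)])
    return commands


if __name__ == "__main__":
    # Hypothetical input directory; the output directory matches the
    # ENSEMBL_BLAT_TWOBIT_DIR example above.
    for cmd in two_bit_commands("/path/to/fasta",
                                "/usr/local/ensembl/tools_data/blat"):
        print(" ".join(cmd))
```

Printing the commands instead of executing them lets you check the file names before any large genome is converted.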
    

Setting up NCBI-BLAST

  1. Install a suitable version of the BLAST software, available from the NCBI FTP site.
  2. Generate the BLAST indices. We use an eHive pipeline, which is fully documented in the ensembl-production repository.
  3. Configure your BLAST paths in SiteDefs.pm:

    $SiteDefs::ENSEMBL_NCBIBLAST_BIN_PATH = '/path/to/ncbi-blast/bin'; # path to BLAST executables
    $SiteDefs::ENSEMBL_NCBIBLAST_MATRIX = '/path/to/ncbi-blast/data'; # path to BLAST matrix files
    $SiteDefs::ENSEMBL_NCBIBLAST_DATA_PATH_DNA = '/path/to/blast/dna'; # path for the BLAST DNA index files
    $SiteDefs::ENSEMBL_NCBIBLAST_DATA_PATH = '/path/to/genes'; # path for the BLAST index files (other than DNA)
    $SiteDefs::ENSEMBL_REPEATMASK_BIN_PATH = '/path/to/RepeatMasker'; # path to RepeatMasker executable
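
Before restarting the server, it may help to confirm that every configured location actually exists. The following is a minimal sketch, not part of the Ensembl code base: the function name is hypothetical, and the dictionary simply repeats the placeholder paths from the example above, so substitute your own values.

```python
import os


def missing_paths(paths):
    """Return the setting names whose configured path does not exist."""
    return [name for name, path in paths.items() if not os.path.exists(path)]


if __name__ == "__main__":
    # Placeholder values copied from the SiteDefs.pm example above;
    # replace them with the paths used on your own server.
    configured = {
        "ENSEMBL_NCBIBLAST_BIN_PATH": "/path/to/ncbi-blast/bin",
        "ENSEMBL_NCBIBLAST_MATRIX": "/path/to/ncbi-blast/data",
        "ENSEMBL_NCBIBLAST_DATA_PATH_DNA": "/path/to/blast/dna",
        "ENSEMBL_NCBIBLAST_DATA_PATH": "/path/to/genes",
        "ENSEMBL_REPEATMASK_BIN_PATH": "/path/to/RepeatMasker",
    }
    for name in missing_paths(configured):
        print("Missing path for " + name)
```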
    

Excluding data sources

By default, the BLAST interface shows all data sources used by Ensembl. If you do not have all of these data sources for your species, copy the ENSEMBL_BLAST_DATASOURCES_ALL code block from public-plugins/tools/conf/ini-files/MULTI.ini into your own plugin and edit it to your requirements. For example, the edited version below omits the CDNA_ABINITIO and NCRNA sources from both the ORDER list and the individual entries.

Original version in tools plugin:

[ENSEMBL_BLAST_DATASOURCES_ALL]
ORDER           = [LATESTGP LATESTGP_MASKED LATESTGP_SOFT CDNA_ALL CDNA_ABINITIO NCRNA PEP_ALL PEP_ABINITIO] ; order preserved
LATESTGP        = dna Genomic sequence
LATESTGP_MASKED = dna Genomic sequence (hard masked)
LATESTGP_SOFT   = dna Genomic sequence (soft masked)
CDNA_ALL        = dna cDNAs (transcripts/splice variants)
CDNA_ABINITIO   = dna Ab-initio cDNAs (Genscan/SNAP)
NCRNA           = dna Ensembl Non-coding RNA genes
PEP_ALL         = peptide Proteins (GENCODE/Ensembl)
PEP_ABINITIO    = peptide Ab-initio Peptides (Genscan/SNAP)

Edited version in your plugin:

[ENSEMBL_BLAST_DATASOURCES_ALL]
ORDER           = [LATESTGP LATESTGP_MASKED LATESTGP_SOFT CDNA_ALL PEP_ALL PEP_ABINITIO] ; order preserved
LATESTGP        = dna Genomic sequence
LATESTGP_MASKED = dna Genomic sequence (hard masked)
LATESTGP_SOFT   = dna Genomic sequence (soft masked)
CDNA_ALL        = dna cDNAs (transcripts/splice variants)
PEP_ALL         = peptide Proteins (GENCODE/Ensembl)
PEP_ABINITIO    = peptide Ab-initio Peptides (Genscan/SNAP)
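
The same edit can be sketched as a small script, which makes it harder to remove an entry while leaving its name in the ORDER list. This is an illustrative helper only: the function name is hypothetical, and it assumes the block is laid out exactly as shown above, with one KEY = value pair per line and a bracketed ORDER list.

```python
def drop_sources(block_text, unwanted):
    """Return block_text without the unwanted keys and their ORDER entries."""
    unwanted = set(unwanted)
    kept = []
    for line in block_text.splitlines():
        key = line.split("=", 1)[0].strip()
        if key in unwanted:
            continue  # drop the description line for this data source
        if key == "ORDER":
            # Rebuild the bracketed list, keeping the remaining names in order.
            start = line.index("[")
            end = line.index("]")
            names = [n for n in line[start + 1:end].split()
                     if n not in unwanted]
            line = line[:start + 1] + " ".join(names) + line[end:]
        kept.append(line)
    return "\n".join(kept)


if __name__ == "__main__":
    # Shortened copy of the block above, for illustration only.
    original = "\n".join([
        "[ENSEMBL_BLAST_DATASOURCES_ALL]",
        "ORDER           = [LATESTGP CDNA_ALL NCRNA PEP_ALL] ; order preserved",
        "LATESTGP        = dna Genomic sequence",
        "CDNA_ALL        = dna cDNAs (transcripts/splice variants)",
        "NCRNA           = dna Ensembl Non-coding RNA genes",
        "PEP_ALL         = peptide Proteins (GENCODE/Ensembl)",
    ])
    print(drop_sources(original, ["NCRNA"]))
```

Paste the printed block into the MULTI.ini file of your own plugin, not into the copy shipped with the tools plugin.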