BLAST

BLAST is an alignment program that determines sequence identity between a query sequence and a large set of target sequences. The Ensembl installations of BLAST and BLAT allow you to align a protein or nucleotide sequence to any genome in Ensembl.

What is BLAT? BLAT, the BLAST-like Alignment Tool, quickly finds alignments to DNA sequences. It is less flexible than BLAST: it reports a hit only for an exact or nearly exact match. Because it is fast, it is the default alignment program on the Ensembl BLAST page when the query and target sequences are both nucleotide.

BLAST, the Basic Local Alignment Search Tool, can find more distantly related sequences. Ensembl uses the Washington University School of Medicine WU-BLAST 2.0 implementation for its sequence similarity search options.

Using Ensembl BLAT/BLAST. Once you enter your query sequence (as FASTA, or as an ID in the sequence ID or accession box), and parameters, click RUN. BLAT will immediately give results. In the case of BLAST, you will see an intermediate page. Click Retrieve periodically until the View Results button appears.

Click View Results to show the final, formatted results.

Step by Step: Entering a Query Sequence

Paste in a sequence in FASTA format, making sure no line numbers are included. Alternatively, retrieve a sequence from a public sequence database such as UniProt, EMBL or NCBI RefSeq by typing in its accession number.
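
Sequences copied from database flat files often carry position counters or line numbers. The sketch below shows one way to clean such text before pasting it in. The function name clean_fasta is hypothetical, and the rules it applies (drop digits and whitespace, accept only letters, "*" and "-") are assumptions for illustration, not the checks the Ensembl form itself performs.

```python
import re


def clean_fasta(text):
    """Return (header, sequence) from pasted single-record FASTA text.

    Digits and whitespace inside sequence lines are removed, so stray
    line numbers or position counters do not end up in the query.
    """
    header = None
    residues = []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                raise ValueError("expected a single FASTA record")
            header = line[1:].strip()
            continue
        # Remove line numbers, position counters and spacing.
        line = re.sub(r"[\d\s]", "", line)
        if not re.fullmatch(r"[A-Za-z*-]*", line):
            raise ValueError(f"unexpected characters in sequence line: {line!r}")
        residues.append(line.upper())
    return header, "".join(residues)


header, sequence = clean_fasta(">seq1 test\n  1 acgt acgt\n 11 ttga\n")
print(header, sequence)
```

The header line is optional here: text with no ">" line is returned with a header of None.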

You may also enter a ticket or job identifier from a previous BLAST search. However, these are only saved for one week, or one month if you have logged in to the Ensembl website.

Click the appropriate button to specify whether the query sequence is protein or nucleotide.
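
If you are unsure which button applies, a rough composition check can help. The following is a heuristic sketch only: the function name guess_sequence_type and the 90% threshold are assumptions chosen for illustration, and short or unusual sequences can be misclassified.

```python
def guess_sequence_type(sequence, threshold=0.9):
    """Guess 'nucleotide' or 'protein' from letter composition.

    A sequence is called nucleotide when at least `threshold` of its
    letters are A, C, G, T, U or N. This is a heuristic, not a rule.
    """
    letters = [c for c in sequence.upper() if c.isalpha()]
    if not letters:
        raise ValueError("empty sequence")
    nucleotide_like = sum(c in "ACGTUN" for c in letters)
    if nucleotide_like / len(letters) >= threshold:
        return "nucleotide"
    return "protein"


print(guess_sequence_type("ACGTACGTNN"))
print(guess_sequence_type("MKTAYIAKQRQISFVKSHFSRQ"))
```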

Step by Step: Target Sequence or Database

For any genomic sequence in Ensembl, select latest_GP. Masked genomes have been run through the RepeatMasker program. More than one organism may be selected by holding down the Ctrl key. You may also select a cDNA library, either all or ab initio. The all option accesses Ensembl transcripts, which are based on protein and mRNA information. The ab initio option shows possible cDNAs based on the sequence alone; these are predictions.

Similarly, all peptides refers to the Ensembl peptides. The ab initio peptides are merely predictions.

Step by Step: BLAT and BLAST options

BLAT can be chosen for nucleotide queries against nucleotide databases. The following BLAST options appear:

BLASTN
nucleotide against nucleotide searches
BLASTP
protein versus protein searches
TBLASTN
protein query versus nucleotide sequences
BLASTX
a translated DNA query versus protein sequences
TBLASTX
a translated DNA query against a translated DNA database
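
The pairing of query type and target type with a search program can be written as a small lookup. This is a sketch of the list above, not Ensembl's own code: the names PROGRAMS and available_programs are hypothetical, and BLAT is placed first for nucleotide against nucleotide because it is described as the default for that case.

```python
# (query type, target type) -> programs that accept that combination.
# The first entry in each list is treated as the default choice.
PROGRAMS = {
    ("nucleotide", "nucleotide"): ["BLAT", "BLASTN", "TBLASTX"],
    ("protein", "protein"): ["BLASTP"],
    ("protein", "nucleotide"): ["TBLASTN"],
    ("nucleotide", "protein"): ["BLASTX"],
}


def available_programs(query_type, target_type):
    """Return the programs usable for this query/target combination."""
    try:
        return PROGRAMS[(query_type, target_type)]
    except KeyError:
        raise ValueError(
            f"unknown combination: {query_type!r} vs {target_type!r}"
        ) from None


print(available_programs("nucleotide", "nucleotide"))
print(available_programs("protein", "nucleotide"))
```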

The search sensitivity options are described in reference 2 at the bottom of this article:

  • Exact matches [Exact sensitivity]
  • Near-exact matches [Low sensitivity]
  • Near-exact matches oligo [Oligo sensitivity]
  • Allow some local mismatch [Medium sensitivity]
  • Distant homologies [High sensitivity]
  • No optimisation [Default sensitivity]

Once everything is set, click RUN to start the search, or customise parameters first.

Step by Step: Customisation of parameters

Use the configure button to alter the default parameters.

W       - Word size for seeding alignments
wink    - Step size for the sliding window used to seed alignments
T       - Neighborhood word threshold score - not blastn
hitdist - Max distance between words for two-hit seeding
          One-hit seeding by default
M       - Match score - blastn only
N       - Mismatch score - blastn only
matrix  - BLOSUM scoring matrix - not blastn
Q       - Cost of first gap character
R       - Cost of second and remaining gap characters
nogap   - Turn off gapped alignments
X       - Alignment extension cutoff
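
To see how M, N, Q and R interact, here is a worked sketch of affine gap scoring as the definitions above describe it: the first gap character costs Q and each further character costs R, so a gap of length k costs Q + (k - 1) * R. The function names are hypothetical, and the defaults are taken from the BLASTN "low" row of the table in this section. Real BLAST scores come from the alignment itself; this only illustrates the arithmetic.

```python
def gap_cost(length, Q, R):
    """Cost of one gap: Q for the first character, R for each further one."""
    if length < 1:
        raise ValueError("gap length must be at least 1")
    return Q + (length - 1) * R


def blastn_score(matches, mismatches, gap_lengths, M=1, N=-3, Q=3, R=3):
    """Raw score of a nucleotide alignment under match/mismatch/gap costs.

    N is already negative, so mismatches are added rather than subtracted.
    """
    gaps = sum(gap_cost(g, Q, R) for g in gap_lengths)
    return matches * M + mismatches * N - gaps


# 50 matches, 2 mismatches and one gap of 3 characters: 50 - 6 - 9
print(blastn_score(50, 2, [3]))
```

With the "medium" settings (N=-1, Q=2, R=1) the same alignment is penalised less, which is why the higher sensitivity levels tolerate more divergent hits.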

+---------+----+----+----+----+----+----+----+----+----+----+----+
|         | W  |wink| T  |hit-| M  | N  |mat-| Q  | R  |no- | X  |
|         |    |    |    |dist|    |    |rix |    |    |gap |    |
+---------+----+----+----+----+----+----+----+----+----+----+----+
| BLASTN                                                         |
|   exact | 15 | 15 |  . |  0 |  1 | -3 |  . | 10 | 10 |  1 |  5 |
|     low | 15 |  1 |  . |  0 |  1 | -3 |  . |  3 |  3 |  0 |  ? |
|   oligo | 11 |  1 |  . |  0 |  1 | -3 |  . |  3 |  3 |  0 |  ? |
|  medium | 11 |  1 |  . |  0 |  1 | -1 |  . |  2 |  1 |  0 |  ? |
|    high |  9 |  1 |  . |  0 |  1 | -1 |  . |  2 |  1 |  0 |  ? |
| default | 11 |  1 |  . |  0 |  5 | -4 |  . | 10 | 10 |  0 |  ? |
+---------+----+----+----+----+----+----+----+----+----+----+----+
| BLASTP                                                         |
| TBLASTN                                                        |
|   exact |  6 |  1 |999 |  0 |  . |  . | 80 |  9 |  2 |  0 |  ? |
|     low |  4 |  1 | 16 | 40 |  . |  . | 80 |  9 |  2 |  0 |  ? |
|   oligo |  4 |  1 | 16 |  0 |  . |  . | 80 |  9 |  2 |  0 |  ? |
|  medium |  3 |  1 | 15 | 40 |  . |  . | 62 |  9 |  2 |  0 |  ? |
|    high |  3 |  1 | 15 |  0 |  . |  . | 45 |  9 |  2 |  0 |  ? |
| default |  3 |  1 | 11 |  0 |  . |  . | 62 |  9 |  2 |  0 |  ? |
+---------+----+----+----+----+----+----+----+----+----+----+----+
| BLASTX                                                         |
| TBLASTX                                                        |
|   exact |  6 |  1 |999 |  0 |  . |  . | 80 |  9 |  2 |  1 | 10 |
|     low |  4 |  1 | 20 | 40 |  . |  . | 62 |  9 |  2 |  0 |  ? |
|  medium |  4 |  1 | 20 |  0 |  . |  . | 62 |  9 |  2 |  0 |  ? |
|    high |  3 |  1 | 15 | 40 |  . |  . | 62 |  9 |  2 |  0 |  ? |
| default |  3 |  1 | 12 |  0 |  . |  . | 62 |  9 |  2 |  0 |  ? |
+---------+----+----+----+----+----+----+----+----+----+----+----+
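
When customising parameters it can help to see exactly what a sensitivity level changes. The sketch below transcribes the BLASTN rows of the table into a dictionary and reports which values differ from the "default" row. The names BLASTN_PRESETS and changed_from_default are hypothetical; columns that do not apply to BLASTN (T and matrix) are omitted, and X entries shown as "?" in the table are stored as None rather than guessed.

```python
# BLASTN rows of the table above; None marks an X value shown as "?".
BLASTN_PRESETS = {
    "exact":   {"W": 15, "wink": 15, "hitdist": 0, "M": 1, "N": -3,
                "Q": 10, "R": 10, "nogap": 1, "X": 5},
    "low":     {"W": 15, "wink": 1, "hitdist": 0, "M": 1, "N": -3,
                "Q": 3, "R": 3, "nogap": 0, "X": None},
    "oligo":   {"W": 11, "wink": 1, "hitdist": 0, "M": 1, "N": -3,
                "Q": 3, "R": 3, "nogap": 0, "X": None},
    "medium":  {"W": 11, "wink": 1, "hitdist": 0, "M": 1, "N": -1,
                "Q": 2, "R": 1, "nogap": 0, "X": None},
    "high":    {"W": 9, "wink": 1, "hitdist": 0, "M": 1, "N": -1,
                "Q": 2, "R": 1, "nogap": 0, "X": None},
    "default": {"W": 11, "wink": 1, "hitdist": 0, "M": 5, "N": -4,
                "Q": 10, "R": 10, "nogap": 0, "X": None},
}


def changed_from_default(level):
    """Return the parameters whose values differ from the 'default' row."""
    base = BLASTN_PRESETS["default"]
    chosen = BLASTN_PRESETS[level]
    return {name: value for name, value in chosen.items() if value != base[name]}


print(changed_from_default("high"))
```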

Step by Step: Results

Initially, the RESULTS page shows a ticket ID for the current query. Results are stored on our server for one week, so that they can be accessed later with this ID or a bookmark to the results page. Click the Retrieve button to see the status of the current query.

When a search is complete, its status will change from Job Queued to Parsing Results. This can take from a few minutes to an hour. Clicking the Retrieve button will then cause the Raw Results link to appear, and eventually the View Results button. Click View Results for the formatted BLAST hits.
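
Clicking Retrieve periodically amounts to polling. A generic version of that loop is sketched below, assuming you supply your own is_ready callable that reports whether results are available. That callable, the function name wait_until_ready and the 30-second interval are all hypothetical; nothing here calls an Ensembl interface. The one-hour timeout mirrors the upper waiting time mentioned above.

```python
import time


def wait_until_ready(is_ready, interval=30.0, timeout=3600.0, sleep=time.sleep):
    """Call is_ready() every `interval` seconds until it returns True.

    Returns the total time spent sleeping. Raises TimeoutError once
    `timeout` seconds have been spent waiting without success.
    """
    waited = 0.0
    while not is_ready():
        if waited >= timeout:
            raise TimeoutError(f"results not ready after {waited:.0f} s")
        sleep(interval)
        waited += interval
    return waited


# Demonstration with a stand-in check that succeeds on the third call.
answers = iter([False, False, True])
print(wait_until_ready(lambda: next(answers), interval=1, sleep=lambda s: None))
```

Passing sleep as a parameter keeps the loop testable without real delays.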

Both BLAT and BLAST show results distributed on a karyotype, if one is available for the species, marking the locations of hits, or HSPs (high-scoring segment pairs). Hits are shown as arrows, and the best hit is boxed.

The diagram in the centre of the results page shows the query sequence as a chain of black and white boxes, and hits as red filled boxes. A Summary Table at the bottom of the page lists all hits in order of low to high score, but this can be customised. Each row describes one BLAST/BLAT match, and the links at the start of the row show:

G
the hit in the context of the genome or target sequence.
S
the hit in the context of the query sequence.
A
the sequence alignment.
C
the hit drawn in the Ensembl Region in Detail page.

REFERENCES

1) BLAT - The BLAST-Like Alignment Tool

W. James Kent

Genome Res. 2002 Apr;12(4):656-664.

2) BLAST

Joseph Bedell, Ian Korf and Mark Yandell

O'Reilly & Associates, 2003
