Variant Effect Predictor Input form
When you reach the VEP web interface, you will be presented with a form to enter your data and alter various options.
- First select the correct species for your data. Ensembl hosts many vertebrate genomes; genomes for plants, protists and fungi can be found at Ensembl Genomes.
- You can optionally choose a name for the data you upload - this can make it easier for you to identify jobs and files that you have uploaded to the VEP at a later point.
- Select the input format for your data. The VEP checks the format of the first line of your input and reports an error if it does not match the selected format, or if your data is incorrectly formatted.
- You have three options for uploading your data:
- File upload - click the "Choose file" button and locate the file on your system
- Paste file - simply copy and paste the contents of your file into the large text box
- File URL - point the VEP to a file hosted on a publicly accessible address. This can be either an http:// or ftp:// address.
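As a rough illustration of the File URL rule above, the sketch below checks a URL before you paste it into the form. The function name is hypothetical, and the check mirrors only the two schemes named in the text; whether the web interface accepts any other scheme is not stated here, so the sketch rejects them.

```python
from urllib.parse import urlparse

# Schemes the text above says the File URL option accepts.
ALLOWED_SCHEMES = {"http", "ftp"}


def is_supported_file_url(url: str) -> bool:
    """Return True if the URL uses http:// or ftp:// and names a host.

    Hypothetical helper for illustration; not part of the VEP.
    """
    parts = urlparse(url)
    return parts.scheme in ALLOWED_SCHEMES and bool(parts.netloc)
```

A local path such as variants.vcf has no scheme and fails the check; in that case the file upload or paste options apply instead.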
For some species you can select which transcript database to use. The default is to use Ensembl transcripts, which offer the richest annotation through the VEP.
The Gencode basic set is a subset of the Ensembl transcripts that contains all of the same genes but has some partial and lower quality transcripts removed.
You can also select to use RefSeq transcripts from the otherfeatures database; note though that these transcripts are simply aligned to the reference genome and the database is missing much of the annotation found when using the main Ensembl database (e.g. protein domains, CCDS identifiers). When using RefSeq transcripts you may choose to include aligned EST and CCDS transcripts also.
Identifiers and co-located variants
- Gene symbol - add the gene symbol for the gene to the output. This will typically be, for example, the HGNC identifier for genes in human. Equivalent to --symbol in the VEP script.
- CCDS - add the Consensus CDS transcript identifier where available. Equivalent to --ccds
- Protein - add the Ensembl protein identifier (ENSP). Equivalent to --protein
- Uniprot - add identifiers for translated protein products from three UniProt-related databases (SWISSPROT, TREMBL and UniParc). Equivalent to --uniprot
- HGVS - generate HGVS identifiers for your input variants relative to the transcript coding sequence (HGVSc) and the protein sequence (HGVSp). Equivalent to --hgvs
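If you later want to reproduce a web job with the VEP script, the identifier options above map one-to-one onto flags. A minimal sketch of that mapping follows; the dictionary and function names are hypothetical, and only the flags listed above are included.

```python
# Web form option -> command-line flag, as listed in the section above.
IDENTIFIER_FLAGS = {
    "Gene symbol": "--symbol",
    "CCDS": "--ccds",
    "Protein": "--protein",
    "Uniprot": "--uniprot",
    "HGVS": "--hgvs",
}


def flags_for(selected):
    """Return the script flags for the selected form options, in form order."""
    unknown = [name for name in selected if name not in IDENTIFIER_FLAGS]
    if unknown:
        raise KeyError(f"no flag known for: {unknown}")
    return [flag for name, flag in IDENTIFIER_FLAGS.items() if name in selected]
```

For example, flags_for(["HGVS", "Gene symbol"]) yields the --symbol and --hgvs flags, which can be appended to your usual VEP script invocation.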
Find co-located known variants - report known variants from the Ensembl Variation database that overlap with your input. A list of variant sources imported can be viewed here. Note that this feature is only available for species with an Ensembl Variation database. Equivalent to --check_existing.
The VEP also allows you to compare the alleles of your input variant to those of the existing variant by selecting "Yes and compare alleles" from the drop-down menu. With this selected, the VEP reports the existing variant ID only if none of the alleles in your input variant are novel.
For example, if your input variant has alleles A/G, and the existing variant has alleles A/T, then the existing variant will not be reported. If instead your input variant has alleles A/T, then the existing variant will be reported. This is equivalent to using --check_alleles in the VEP script.
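The allele comparison rule can be sketched as a subset test: the existing variant is reported only when every input allele also appears among the existing variant's alleles. This is an illustration of the rule as stated above, not the VEP's own implementation; the function name and the slash-separated allele string format are assumptions, and the sketch ignores details such as strand.

```python
def report_existing(input_alleles: str, existing_alleles: str) -> bool:
    """Return True if no input allele is novel relative to the existing variant.

    Both arguments are slash-separated allele strings such as "A/G".
    Hypothetical helper for illustration only.
    """
    input_set = set(input_alleles.split("/"))
    existing_set = set(existing_alleles.split("/"))
    # Every input allele must already be known for the existing variant.
    return input_set <= existing_set
```

With the alleles from the example, report_existing("A/G", "A/T") is False because G is novel, while report_existing("A/T", "A/T") is True.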
For known variants the VEP can also provide PubMed IDs of publications citing the variant (equivalent to --pubmed).
The VEP can also report minor allele frequency (MAF) data for existing variants from two major genotyping projects, the 1000 Genomes Project and the NHLBI-ESP; this only applies when you have selected human as your species.
- 1000 Genomes global - the combined phase 1 population (i.e. all individuals from all populations). Equivalent to --gmaf
- 1000 Genomes continental - the four continent-level populations - AFR (African), AMR (American), ASN (Asian) and EUR (European). Equivalent to --maf_1kg
- ESP - AA (African American) and EA (European American) populations. Equivalent to --maf_esp
- Transcript biotype - add the transcript biotype to the output. Equivalent to --biotype in the VEP script.
- Protein domains - report protein domains from Pfam, Prosite and InterPro that overlap input variants. Equivalent to --domains
- Exon and intron numbers - report the exon or intron number that a variant falls in as NUMBER / TOTAL, i.e. exon 2/5 means the variant falls in the 2nd of 5 exons in the transcript. Equivalent to --numbers
- Identify canonical transcripts - adds a flag to the output indicating if the reported transcript is the canonical transcript for the gene. Equivalent to --canonical
- SIFT predictions - SIFT predicts whether an amino acid substitution affects protein function based on sequence homology and the physical properties of amino acids. Only available in popular species. For both SIFT and PolyPhen the VEP can report either a score between 0 and 1, a prediction in words, or both. Equivalent to --sift
- PolyPhen predictions - PolyPhen is a tool which predicts possible impact of an amino acid substitution on the structure and function of a human protein using straightforward physical and comparative considerations. Equivalent to --polyphen
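When post-processing results, the NUMBER / TOTAL notation used by the exon and intron numbers option can be split into two integers. The sketch below assumes a single number on each side of the slash, as in the 2/5 example; the function name is hypothetical, and any other notation raises an error rather than being guessed at.

```python
def parse_number_total(value: str) -> tuple[int, int]:
    """Split a "NUMBER/TOTAL" string such as "2/5" into (2, 5).

    Hypothetical helper; assumes exactly one integer on each side.
    """
    number, total = (int(part) for part in value.split("/"))
    if not 1 <= number <= total:
        raise ValueError(f"number {number} is outside 1..{total}")
    return number, total
```

Here parse_number_total("2/5") gives (2, 5), meaning the 2nd of 5 exons (or introns) in the transcript.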
Get regulatory region consequences - in addition to predicting consequences with overlapping transcripts, the VEP can find overlaps with known regulatory regions as determined in the Ensembl Regulatory build.
Using this option, the VEP will also report if a variant falls in a transcription factor binding motif, and give a score that reflects whether the altered motif sequence is more or less similar to the consensus.
The "Get regulatory region consequences" option is equivalent to --regulatory.
- By frequency - filter variants by minor allele frequency (MAF). Two options are provided:
- Exclude common variants - filter out variants that are co-located with an existing variant that has a frequency greater than 0.01 (1%) in the 1000 Genomes global population. Equivalent to --filter_common in the VEP script.
- Advanced filtering - enabling this option allows you to specify a population and frequency to compare to, as well as whether matching variants should be included or excluded from the results.
- Return results for variants in coding regions only - exclude variants that don't fall in a coding region of a transcript. Equivalent to --coding_only
- Restrict results - for many variants the VEP will report multiple consequence types - typically this is because the variant overlaps more than one transcript. For each of these options the VEP uses consequence ranks that are subjectively determined by Ensembl. This table gives all of the consequence types predicted by Ensembl, ordered by rank. Note that enabling one of these options not only loses potentially relevant data, but in some cases may be scientifically misleading. Options:
- Show one selected consequence - pick one consequence type across all those predicted for the variant; the output will include transcript- or feature-specific information. Consequences are chosen by the canonical, biotype status and length of the transcript, along with the ranking of the consequence type according to this table. This is the best method to use if you are interested only in one consequence per variant. Equivalent to --pick
- Show one selected consequence per gene - pick one consequence type for each gene using the same criteria as above. Note that if a variant overlaps more than one gene, output for each gene will be reported. Equivalent to --per_gene
- Show only list of consequences per variant - give a comma-separated list of all observed consequence types for each variant. No transcript-specific or gene-specific output will be given. Equivalent to --summary
- Show most severe per variant - only the most severe of all observed consequence types is reported for each variant. No transcript-specific or gene-specific output will be given. Equivalent to --most_severe
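To make the difference between the last two options concrete, the sketch below collapses the per-transcript consequences of one variant in both ways. The rank numbers are placeholders invented for this example and cover only four consequence types; the real ranking is the one in Ensembl's table. Sorting the summary list by rank is also a choice made here, not something stated above.

```python
# Placeholder ranks for illustration only (lower = more severe).
# The authoritative ordering is Ensembl's consequence table.
ILLUSTRATIVE_RANK = {
    "stop_gained": 1,
    "missense_variant": 2,
    "synonymous_variant": 3,
    "intron_variant": 4,
}


def most_severe(consequences):
    """Return the single most severe consequence (cf. --most_severe)."""
    return min(consequences, key=ILLUSTRATIVE_RANK.__getitem__)


def summary(consequences):
    """Return a comma-separated list of the distinct consequences (cf. --summary)."""
    unique = sorted(set(consequences), key=ILLUSTRATIVE_RANK.__getitem__)
    return ",".join(unique)


# One variant overlapping three transcripts.
per_transcript = ["intron_variant", "missense_variant", "intron_variant"]
print(most_severe(per_transcript))
print(summary(per_transcript))
```

In both cases the link between each consequence and its transcript is lost, which is why neither option gives transcript-specific or gene-specific output.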
Once you have clicked "Run", your input will be checked and submitted to the VEP as a job. All jobs associated with your session or account are shown in the "Recent Tickets" table. You may submit multiple jobs simultaneously.
The "Jobs" column of the table shows the current status of the job.
- Queued - your job is waiting to be submitted to the system
- Running - your job is currently running
- Done - your job is finished - click the ticket name to be taken to the results page
- Failed - there is a problem with your job - click the ticket name to see more details
You may delete a job by clicking the trash can icon. If you are logged in to Ensembl, you can save the job by clicking the save icon.
You may also resubmit a job (for example, to re-run with the same data but change some parameters) by clicking the edit icon.