Variant Effect Predictor Download and install
Use git to download the ensembl-vep package:
git clone https://github.com/Ensembl/ensembl-vep.git
cd ensembl-vep
Then follow the installation instructions.
Users without the git utility installed may download a zip file from GitHub, though we would always recommend using git if possible.
curl -L -O https://github.com/Ensembl/ensembl-vep/archive/release/90.zip
unzip 90.zip
cd ensembl-vep-release-90/
To update from a previous version:
cd ensembl-vep
git pull
git checkout release/90
perl INSTALL.pl
To use an older version (this example shows how to set up release 87):
cd ensembl-vep
git checkout release/87
perl INSTALL.pl
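The update and downgrade steps above differ only in the release number, so they can be wrapped in a small shell function. The following is a minimal sketch rather than anything shipped with VEP: the function name vep_checkout_release and the DRY_RUN variable are hypothetical, and the clone is assumed to live in ./ensembl-vep unless a second argument is given.

```shell
# Hypothetical helper (not part of VEP): switch an existing clone to a given
# release and re-run the installer. Set DRY_RUN=1 to print the commands only.
vep_checkout_release() (
    release="$1"
    repo="${2:-ensembl-vep}"   # assumed location of the git clone

    # Accept only a plain release number such as 87 or 90.
    case "$release" in
        ''|*[!0-9]*)
            echo "usage: vep_checkout_release <release number> [repo dir]" >&2
            exit 2
            ;;
    esac

    # Print the command in dry-run mode, otherwise execute it.
    run() {
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "$*"
        else
            "$@"
        fi
    }

    run cd "$repo" || exit 1
    run git pull || exit 1
    run git checkout "release/$release" || exit 1
    run perl INSTALL.pl
)
```

Running it with DRY_RUN=1 first, for example `DRY_RUN=1 vep_checkout_release 87`, shows exactly which commands would be executed before anything is changed.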
Previous versions (ensembl-tools)
New in version 90 (August 2017)
- gnomAD exomes allele frequencies now available with --af_gnomad, replacing ExAC. gnomAD genomes and ExAC are available via custom annotation.
- VEP is now available as a Docker image.
- RefSeq transcripts in VEP cache files are now "corrected" from the reference genome sequence.
- VEP's algorithm for matching colocated known variants has been overhauled - details.
- Change VEP's default (5kb) up/downstream distance with --distance. This supersedes the functionality of the UpDownDistance VEP plugin.
- Feed input directly to VEP with --input_data.
- Suppress header output with --no_headers.
- Detailed installation instructions for Bio::DB::BigFile to access bigWig custom annotation files.
Previous version history
New in version 89 (May 2017)
- exclude known variants with unknown (null) alleles with --exclude_null_alleles.
- write compressed output with --compress_output.
- improved matching of alleles in custom VCF files.
- API perldoc documentation added.
New in version 88 (March 2017)
- ensembl-vep is now the officially supported version of VEP
- Documentation updated to reflect switch to ensembl-vep. See the Ensembl archive site for documentation of the obsolete ensembl-tools VEP.
- The VEP script is now named simply
- Directly use tabix-indexed GFF/GTF files as annotation sources
- Allele-specific reporting of frequencies (--af and more) and custom VCF annotations
- --check_existing now compares alleles by default, disable with --no_check_alleles
- Report the highest allele frequency observed in any population from 1000 genomes, ESP or ExAC using --max_af
- Get genomic HGVS nomenclature with --hgvsg
- Find the gene or transcript with the nearest transcription start site (TSS) to each input variant with --nearest
- filter_vep supports field/field comparisons e.g. AFR_AF > #EUR_AF
- Exclude predicted (XM and XR) transcripts when using RefSeq or merged cache with --exclude_predicted
- Filter transcripts used for annotation with --transcript_filter
- pileup input format no longer supported
Versions of VEP up to and including 87 were released as part of the ensembl-tools package. See download links above.
New in version 87 (December 2016)
- Shiny new code available for beta testing!
- Some minor speed optimisations
- Improve checks for valid chromosome names in input
- Haplosaurus beta released - generate whole-transcript haplotype sequences from phased genotype data
New in version 86 (October 2016)
- Chromosome synonyms supported when using VEP caches; may be loaded manually with --synonyms
New in version 85 (July 2016)
- --pick now uses translated length instead of genomic transcript length
- Support for epigenomes in regulatory features
New in version 84 (March 2016)
- Add tab-delimited output option
- Add transcript flags indicating if the transcript is 5'- or 3'-incomplete
- Improve annotation of long variants where invariant parts of the alternate allele overlap splice regions
New in version 83 (December 2015)
- Basic consequence calculations up to 2x faster than version 82
- HGVS calculations up to 10x faster
- FASTA sequence retrieval implements caching
- Add ExAC project frequencies with --maf_exac
- APPRIS isoform annotations now available with --appris and used by --pick and others to prioritise VEP annotations
New in version 82 (September 2015)
- Faster FASTA file access using Bio::DB::HTS/htslib and bgzipped FASTA files
- Flag genes with phenotype associations
- Some plugins now available for use via the web and REST interfaces
New in version 81 (July 2015)
- Plugin registry means plugins can be installed from the VEP installer
- GFF format now supported by VEP's cache converter
- Fixes and improvements for sequence retrieval from FASTA files
New in version 80 (May 2015)
- Flag added indicating if an overlapping known variant is associated with a phenotype, disease or trait
- HGVS notations are now 3'-shifted by default (use --shift_hgvs to force enable/disable)
- Source version information added to caches; see output file headers or use --show_cache_info
- Get the variant class using --variant_class
- CCDS status added to categories used by --pick flag (and others)
New in version 79 (March 2015)
- Focus on performance and stability: ~100% faster than version 78 and a new test suite
- New guide to getting VEP running faster
- 1000 Genomes Phase 3 data available in GRCh37 cache download (GRCh38 coming soon, see docs to access now)
- VCF output has changed slightly to match output from other tools
- Impact modifier added for each consequence type
New in version 78 (December 2014)
New in version 77 (October 2014)
New in version 76 (August 2014)
- VEP now supports caches from multiple assemblies (--assembly) on the same software version - e.g. human builds GRCh37 and GRCh38
- Protein identifiers from UniProt (SWISSPROT, TrEMBL and UniParc) now available using --uniprot
- VEP can generate JSON output using --json
- Two new analysis set options - --gencode_basic and the merged Ensembl/RefSeq cache (--merged)
- Non-RefSeq transcripts now excluded by default when using the RefSeq or merged cache; use --all_refseq to include them
- Let VEP pick one consequence per variant allele using --pick_allele
- Allele now included alongside frequency for 1000 Genomes (--maf_1kg) and ESP (--maf_esp) data
- Not strictly script-related, but the VEP REST API has come out of beta!
New in version 75 (February 2014)
- let VEP pick one consequence per variant for you using --pick; includes all transcript-specific data
- gene symbol available in RefSeq cache and when using --refseq
- Installation and use of RefSeq cache improved - remember to use --refseq with your RefSeq cache!
- Added --cache_version option, primarily to aid Ensembl Genomes users.
New in version 74 (December 2013)
- retrieve the humDiv PolyPhen prediction instead of humVar using --humdiv
- source for gene symbol available with --symbol
New in version 73 (August 2013)
- NHLBI-ESP frequencies available in cache (--maf_esp)
- Pubmed IDs for cited existing variants available in cache (--pubmed)
- Convert your cache to use tabix - much faster when retrieving co-located existing variants!
- The installer can now update the VEP to the latest version and install FASTA files
- --hgnc replaced by --symbol for non-human compatibility
- HGVS strings are now part URI-escaped to avoid "=" sign clashes
- use --allele_number to identify input alleles by their order in the VCF ALT field
- use --total_length to give the total length of cDNA, CDS and protein sequences
- add data from VCF INFO fields when using custom annotations
New in version 72 (June 2013)
- Speed and stability improvements when using forking
- Filter VEP results using filter_vep.pl
New in version 71 (April 2013)
- SIFT predictions now available for Chicken, Cow, Dog, Human, Mouse, Pig, Rat and Zebrafish
- View summary statistics for VEP runs in [output]_summary.html
- Generate HTML output using --html
- Support for simple tab-delimited format for input of structural variant data
- Cache now contains clinical significance statuses from dbSNP for human variants
NOTE: VEP version numbers have now (from release 71) changed to match Ensembl release numbers.
New in version 2.8 (December 2012)
- Easily filter out common human variants with --filter_common
- 1000 Genomes continental population frequencies now stored in cache files
New in version 2.7 (October 2012)
- build VEP cache files offline from GTF and FASTA files
- support for using FASTA files for sequence lookup in HGVS notations in offline/cache modes
New in version 2.6 (July 2012)
- support for structural variant consequences
- Sequence Ontology (SO) consequence terms now default
- script runtime 3-4x faster when using forking
- 1000 Genomes global MAF available in cache files
- improved memory usage
New in version 2.5 (May 2012)
- SIFT and PolyPhen predictions now available for RefSeq transcripts
- retrieve cell type-specific regulatory consequences
- consequences can be retrieved based on a single individual's genotype in a VCF input file
- find overlapping structural variants
- Condel support removed from main script and moved to a plugin
New in version 2.4 (February 2012)
- offline mode and new installer script make it easy to use the VEP without the usual dependencies
- output columns configurable using the --fields flag
- VCF output support expanded, can now carry all fields
- output affected exon and intron numbers with --numbers
- output overlapping protein domains using --domains
- enhanced support for LRGs
- plugins now work on variants called as intergenic
New in version 2.3 (December 2011)
- add custom annotations from tabix-indexed files (BED, GFF, GTF, VCF, bigWig)
- add new functionality to the VEP with user-written plugins
- filter input on consequence type
New in version 2.2 (September 2011)
- SIFT, PolyPhen and Condel predictions and regulatory features now accessible from the cache
- support for calling consequences against RefSeq transcripts
- variant identifiers (e.g. dbSNP rsIDs) and HGVS notations supported as input format
- variants can now be filtered by frequency in HapMap and 1000 genomes populations
- script can be used to convert files between formats (Ensembl/VCF/Pileup/HGVS to Ensembl/VCF/Pileup)
- large amount of code moved to API modules to ensure consistency between web and script VEP
- memory usage optimisations
- VEP script moved to ensembl-tools repo
- Added --canonical, --per_gene and --no_intergenic options
New in version 2.1 (June 2011)
- ability to use local file cache in place of or alongside connecting to an Ensembl database
- significant improvements to speed of script
- whole-genome mode now default (no disadvantage for smaller datasets)
- improved status output with progress bars
- regulatory region consequences now reinstated and improved
- modification to output file - Transcript column is now Feature, and is followed by a Feature_type column
New in version 2.0 (April 2011)
- support for SIFT, PolyPhen and Condel missense predictions in human
- per-allele and compound consequence types
- support for Sequence Ontology (SO) and NCBI consequence terms
- modified output format
- support for new output fields in Extra column
- header section contains information on database and software versions
- codon change shown in output
- CDS position shown in output
- option to output Ensembl protein identifiers
- option to output HGVS nomenclature for variants
- support for gzipped input files
- enhanced configuration options, including the ability to read configuration from a file
- verbose output now much more useful
- whole-genome mode now more stable
- finding existing co-located variations now ~5x faster
VEP requires Perl (>=5.10 recommended, tested on 5.8, 5.10, 5.14, 5.18, 5.22) with the DBI and DBD::mysql packages installed; see this guide for more information on how to install perl modules.
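A quick way to see whether the required modules are already present is to ask perl to load each one. This is a minimal sketch; the function name have_perl_module is hypothetical, and the check only confirms that a module loads, not that its version is suitable.

```shell
# Return success if perl can load the named module, failure otherwise
# (failure also covers the case where perl itself is not installed).
have_perl_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# Report on the two modules named in the requirements above.
for module in DBI DBD::mysql; do
    if have_perl_module "$module"; then
        echo "$module: found"
    else
        echo "$module: missing"
    fi
done
```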
VEP's INSTALL.pl script will install the required components of the Ensembl API for you, but VEP may also be used with any pre-existing API installations you have, provided their versions match the version of VEP you are using.
VEP has been developed for UNIX-like environments and works well on Linux (e.g. Ubuntu, Debian, Mint) and Mac OSX. It can also be used on Windows systems with a more involved installation process.
VEP's INSTALL.pl makes it easy to set up your environment for using the VEP. It will download and configure a minimal set of the Ensembl API for use by the VEP, and can also download cache files, FASTA files and plugins.
Run the following, and follow any prompts as they appear:
perl INSTALL.pl
Additional non-essential components and enhancements must be installed manually.
Software components installed
Users who already have the latest version of the API installed do not need to run the script, although they may find it useful for getting an up-to-date API install (with post-release patches applied), and for retrieving cache and FASTA files. The API set installed by the script is local to the VEP, and will not affect any other Ensembl API installations.
The script will also attempt to install a Perl::XS module, Bio::DB::HTS, for rapid access to bgzipped FASTA files. If this fails, you may add the --NO_HTSLIB flag when running the installer; VEP will fall back to using Bio::DB::Fasta for this functionality (more details).
Running the installer
The installer script is run on the command line as follows:
perl INSTALL.pl [options]
Users then follow on-screen prompts. Please heed any warnings, as when the script says it will delete/overwrite something, it really will!
Most users should not need to add any options, but configuration of the installer is possible with the following flags:
|--AUTO, -a|Run installer without user prompts. Use "a" (API + Bio::DB::HTS/htslib), "l" (Bio::DB::HTS/htslib only), "c" (cache), "f" (FASTA), "p" (plugins) to specify parts to install, e.g. -a ac for API and cache.|
|-s|Comma-separated list of species to install when using --AUTO. To install the RefSeq cache, add "_refseq" to the species name, e.g. "homo_sapiens_refseq", or "_merged" to install the merged Ensembl/RefSeq cache. Remember to use --refseq or --merged when running the VEP with the relevant cache!|
|-y|Assembly version to use when using --AUTO. Most species have only one assembly available on each software release; currently this is only required for human on release 76 onwards.|
|--PLUGINS|Comma-separated list of plugins to install when using --AUTO. To install all available plugins, use "--PLUGINS all". To list available plugins, use "perl INSTALL.pl -a p --PLUGINS list".|
|--VERSION|By default the script will install the latest version of the Ensembl API (currently 90). Users can force the script to install a different version at their own risk. This flag will also set the data version (cache, FASTA) to install unless set separately with --CACHE_VERSION.|
|--CACHE_VERSION|By default the script will download the latest version of VEP's caches and FASTA files (currently 90). Users can force the script to install a different version at their own risk. Use --VERSION to set the API version separately.|
By default the script will install the API modules in a subdirectory of the current directory named "Bio". Using this option users may configure where the Bio directory is created. If something other than the default is used, this directory must either be added to your PERL5LIB environment variable when running the VEP, or included using perl's -I flag:
perl -I [dir] vep
By default the script will install the cache files in the ".vep" subdirectory of the user's home area. Using this option users can configure where cache files are installed. The --dir flag must be passed when running the VEP if a non-default directory is given:
./vep --dir [dir]
|Run the installer with this flag to check for and download new versions of the VEP. Any existing files are backed up. You will need to rerun the installer after the update to retrieve updated API, cache and FASTA files.|
|Don't write any status output when using --AUTO.|
|Use this if the installer fails with out of memory errors.|
|--NO_HTSLIB|Don't attempt to install Bio::DB::HTS/htslib.|
|--NO_TEST|Don't run API tests; useful if you know a harmless failure will prevent continuation of the installer.|
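The flags above can be combined for an unattended install. The sketch below only assembles and prints the command line so that it can be inspected before being run. The function name vep_auto_install_cmd and the NO_HTSLIB variable are hypothetical, while the -a, -s, -y and --NO_HTSLIB flags are the ones used elsewhere on this page.

```shell
# Hypothetical helper: print an INSTALL.pl command line for an unattended install.
#   $1  parts to install: any combination of a, l, c, f, p
#   $2  comma-separated species list
#   $3  optional assembly (needed for human on release 76 onwards)
vep_auto_install_cmd() {
    parts="$1"
    species="$2"
    assembly="$3"

    # Reject empty input and letters the installer does not list for --AUTO.
    case "$parts" in
        ''|*[!alcfp]*)
            echo "parts must be a combination of: a l c f p" >&2
            return 2
            ;;
    esac
    if [ -z "$species" ]; then
        echo "a species name is required" >&2
        return 2
    fi

    cmd="perl INSTALL.pl -a $parts -s $species"
    if [ -n "$assembly" ]; then
        cmd="$cmd -y $assembly"
    fi
    # Set NO_HTSLIB=1 if building Bio::DB::HTS fails on your system.
    if [ "${NO_HTSLIB:-0}" = 1 ]; then
        cmd="$cmd --NO_HTSLIB"
    fi
    printf '%s\n' "$cmd"
}
```

For example, `vep_auto_install_cmd cf homo_sapiens GRCh38` prints the command for installing the human GRCh38 cache and FASTA file.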
INSTALL.pl will set up the minimum requirements for VEP, and for most users this will be adequate. Some features and enhancements, however, require the installation of additional components. Most are perl modules that are easily installed using cpanm; see this guide for more information on how to install perl modules.
Typically users of cpanm will wish to install modules locally in their home directories; this shows how to set up a path for perl modules and install one there:
mkdir -p $HOME/cpanm
export PERL5LIB=$PERL5LIB:$HOME/cpanm/lib/perl5
cpanm -l $HOME/cpanm Set::IntervalTree
To make the change to PERL5LIB permanent, it is recommended to add the export line to your shell startup file.
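Adding the export line by hand is simple, but a script that appends it should avoid writing it twice. The following is a minimal sketch; the function name add_perl5lib_export is hypothetical, and which startup file your shell reads is an assumption you need to check for your own system.

```shell
# Hypothetical helper: append the PERL5LIB export line to a startup file,
# but only if that exact line is not already present.
add_perl5lib_export() (
    rcfile="$1"
    # Single quotes keep $PERL5LIB and $HOME unexpanded, so they are
    # evaluated each time the startup file is read.
    line='export PERL5LIB=$PERL5LIB:$HOME/cpanm/lib/perl5'

    touch "$rcfile" || exit 1
    # -x matches whole lines, -F treats the pattern as a fixed string.
    if ! grep -qxF "$line" "$rcfile"; then
        printf '%s\n' "$line" >> "$rcfile"
    fi
)
```

A bash user might call it as `add_perl5lib_export "$HOME/.bashrc"`; calling it again leaves the file unchanged.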
- Additional features
- Speed enhancements - these modules can improve VEP's runtime
In order for VEP to be able to access bigWig format custom annotation files, the Bio::DB::BigFile perl module is required. Installation involves downloading and compiling the kent source tree. The current version of the kent source tree does not work correctly with Bio::DB::BigFile, so it is necessary to install an archive version known to work (v335).
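Because the build relies on several command-line tools, it can save time to confirm they are available before starting. This is a minimal sketch; the function name missing_tools is hypothetical, and the tool list in the usage note is simply the set of commands used in the steps on this page.

```shell
# Hypothetical helper: print the name of every listed tool that is not on PATH.
# Prints nothing when all tools are available.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}
```

For the kent build you could run `missing_tools wget tar make mysql_config cpanm` and install anything it reports before continuing.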
Download and unpack the kent source tree
wget https://github.com/ucscGenomeBrowser/kent/archive/v335_base.tar.gz
tar xzf v335_base.tar.gz
Set up some environment variables; these are required only temporarily for this installation process
export KENT_SRC=$PWD/kent-335_base/src
export MACHTYPE=$(uname -m)
export CFLAGS="-fPIC"
export MYSQLINC=`mysql_config --include | sed -e 's/^-I//g'`
export MYSQLLIBS=`mysql_config --libs`
Modify kent build parameters
cd $KENT_SRC/lib
echo 'CFLAGS="-fPIC"' > ../inc/localEnvironment.mk
Build kent source
make clean && make
cd ../jkOwnLib
make clean && make
If either of these steps fails, you may have some missing dependencies. Known common missing dependencies are libpng and libssl; these may be installed, for example, with apt-get on Ubuntu. If you do not have sudo access you may have to ask your sysadmin to install any missing dependencies.
sudo apt-get install libpng-dev libssl-dev
On Mac OSX you may use brew; the openssl libraries also need to be symbolically linked to a different path:
brew install libpng openssl
cd /usr/local/include
ln -s ../opt/openssl/include/openssl .
cd -
On some systems (e.g. Mac OSX), a compiled file is placed in a path that Bio::DB::BigFile cannot find. You can correct this with:
ln -s $KENT_SRC/lib/x86_64/* $KENT_SRC/lib/
We'll now use cpanm to install the perl module for Bio::DB::BigFile itself. See above for guidance on this. In this example we're going to install the module to a path within your home directory. In order to do this we must modify the paths that perl looks in to find modules by adding to the PERL5LIB environment variable. To make this change permanent you must add the export line to your shell startup file.
mkdir -p $HOME/cpanm
export PERL5LIB=$PERL5LIB:$HOME/cpanm/lib/perl5
cpanm -l $HOME/cpanm Bio::DB::BigFile
If you are prompted for the path to the kent source tree, that means something didn't go right in the compilation above. Double check that $KENT_SRC/lib/jkweb.a exists and is not found instead at e.g. $KENT_SRC/lib/x86_64/jkweb.a. You may copy or link the file (and the other files in that directory) to the former path.
ln -s $KENT_SRC/lib/x86_64/* $KENT_SRC/lib/
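The check for jkweb.a described above can be scripted so that you know whether the link step is needed. This is a minimal sketch; the function name kent_lib_status and its output strings are hypothetical, and it assumes the misplaced library sits one directory below lib, as in the x86_64 example.

```shell
# Hypothetical helper: report where jkweb.a ended up after the kent build.
#   $1  the kent src directory (the value of KENT_SRC)
kent_lib_status() (
    src="$1"
    if [ -e "$src/lib/jkweb.a" ]; then
        # The library is where Bio::DB::BigFile expects it.
        echo "ok"
        exit 0
    fi
    # Look one level down, e.g. lib/x86_64/jkweb.a.
    for candidate in "$src"/lib/*/jkweb.a; do
        if [ -e "$candidate" ]; then
            echo "needs-link: $candidate"
            exit 0
        fi
    done
    echo "missing"
    exit 1
)
```

Running `kent_lib_status "$KENT_SRC"` prints "ok" when nothing more is needed, a "needs-link" line when the link command above should be run, and "missing" when the build itself did not produce the library.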
You should now be able to successfully run the appropriate test in the VEP package:
perl -Imodules t/AnnotationSource_File_BigWig.t
Using VEP in Windows
VEP was developed as a command-line tool, and as a Perl script its natural environment is a Linux system. However, there are several ways you can use VEP on a Windows machine.
You may also consider using VEP's web or REST interfaces.
Using a virtual machine you can run a virtual Linux system in a window on your machine. There are two ways to do this:
DWIMperl has a Windows package that contains base requirements for setting up VEP.
- Download and install DWIMperl for Windows
- Download and unpack the zip of the ensembl-vep package
- Open a Command Prompt (search for Command Prompt in the Start Menu)
- Navigate to the directory where you unpacked the VEP package, e.g.
- Run INSTALL.pl with --NO_HTSLIB and --NO_TEST; you will see some warnings about the "which" command not being available (these will also appear when running VEP and can be ignored).
perl INSTALL.pl --NO_HTSLIB --NO_TEST
Docker allows you to run applications in virtualised "containers". A docker image for VEP is available from DockerHub:
docker pull ensemblorg/ensembl-vep
docker run -t -i ensemblorg/ensembl-vep ./vep
Currently no volumes are pre-configured for the container; this is required if you wish to download data (e.g. cache files) that persists across sessions.
The following is a brief example showing how to use a directory on your local (host) machine to store cache data for VEP.
mkdir $HOME/vep_data
chmod a+rwx $HOME/vep_data
docker run -t -i -v $HOME/vep_data:/home/vep/.vep ensemblorg/ensembl-vep perl INSTALL.pl
You will now be prompted by the installer if you wish to re-install the API. Type "n" followed by enter to skip to cache installation. You will be presented with a list of species; type the number for your species/assembly of interest and press enter. Your data will now download and unpack; this may take a while.
If you wish to retrieve HGVS annotations it is recommended to also download the FASTA file for your species. To do this, at the next prompt type "0" and press enter. You may skip the plugin installation also.
The above process may also be performed in one command; for example, to set up the cache and corresponding FASTA for human GRCh38:
docker run -t -i -v $HOME/vep_data:/home/vep/.vep ensemblorg/ensembl-vep perl INSTALL.pl -a cf -s homo_sapiens -y GRCh38
The installer has now downloaded this data to $HOME/vep_data. VEP will automatically detect caches downloaded in this folder as it is mapped to VEP's default directory within the Docker instance.
docker run -t -i -v $HOME/vep_data:/home/vep/.vep ensemblorg/ensembl-vep /bin/bash
./vep -i examples/homo_sapiens_GRCh38.vcf -cache
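If you run VEP through Docker regularly, the volume mapping can be kept in one place. The sketch below prints a docker command that passes ./vep and its arguments directly as the container command, as in the first docker run example on this page, instead of opening a shell first. The function name vep_docker_cmd is hypothetical, and the sketch assumes the data directory path contains no spaces.

```shell
# Hypothetical helper: print a docker command that runs vep with a host
# directory mounted at VEP's default data directory inside the container.
#   $1     host directory holding the cache data
#   $2...  arguments passed through to ./vep
vep_docker_cmd() {
    data_dir="$1"
    shift
    printf '%s\n' "docker run -t -i -v $data_dir:/home/vep/.vep ensemblorg/ensembl-vep ./vep $*"
}
```

For example, `vep_docker_cmd "$HOME/vep_data" -i examples/homo_sapiens_GRCh38.vcf -cache` prints the full command, which you can review and then paste into your terminal.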