EnsemblEnsembl Home

Variant Effect Predictor Other information


HGVS notations

The VEP script supports using HGVS notations as input. This feature is currently under development, and not all HGVS notation types are supported. Specifically, only notations relative to genomic (g) or coding (c) sequences are currently supported; protein (p) notations are supported in limited fashion due to the complexity involved in determining the multiple possible underlying genomic sequence changes that could produce a single protein change. The script will warn the user if it fails to parse a particular notation.

By default the VEP script uses Ensembl transcripts as its reference for determining consequences, and hence also for HGVS notations. However, it is possible to parse HGVS notations that use RefSeq transcripts as the reference sequence by using the --refseq flag when running the script. Such notations must include the version number of the transcript e.g.

NM_080794.3:c.1001C>T

where ".3" denotes that this is version 3 of the transcript NM_080794. See below for more details on how the VEP can use RefSeq transcripts.


RefSeq transcripts

Ensembl produces Core schema databases containing alignments of RefSeq transcript objects to the reference genome. This is the otherfeatures database, and is produced for human and mouse. The database also contains alignments of CCDS transcripts and Ensembl EST sequences. By passing the --refseq flag when running the VEP script, these alternative transcripts will be used as the reference for predicting variant consequences. Gene IDs given in the output when using this option are generally NCBI GeneIDs.

Users should note that RefSeq sequences may disagree with the reference sequence to which they are aligned, hence results generated when using this option should be interpreted with a degree of caution. A much more complex and stringent process is used to produce the main Ensembl Core database, and this should be used in preference to the RefSeq transcripts.

SIFT and PolyPhen predictions and scores are now calculated and referred to internally using the translated sequence, so predictions are available using the --refseq flag where the RefSeq translation matches the Ensembl translation (they will match in the vast majority of cases - most differences between Ensembl and RefSeq transcripts occur in non-coding regions).


File conversion

The VEP script can be used to convert files between the various formats that it parses. This may be useful for a user with, for example, a number of variants given in HGVS notation against RefSeq transcript identifiers. The conversion process allows these notations to be converted into genomic reference coordinates, and then used to predict consequences in the VEP against Ensembl transcripts.