FTP Download

Custom data sets

If you want to filter or customise your download, please try BioMart, a web-based querying tool.

You can download via a browser from our FTP site, use a script, or even use rsync from the command line.
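For scripted downloads, the following is a minimal sketch rather than an official client. It assumes a directory layout of pub/release-N/<data type>/<species>/ on ftp.ensembl.org; that layout, the release number 110 and the function names are illustrative assumptions, so check the README file in each directory before relying on them.

```python
import shutil
import urllib.request

FTP_HOST = "ftp.ensembl.org"  # host named on this page


def build_dump_url(release: int, data_type: str, species: str, subdir: str = "") -> str:
    """Build a directory URL under an ASSUMED layout: pub/release-N/type/species/."""
    parts = ["pub", f"release-{release}", data_type, species]
    if subdir:
        parts.append(subdir)
    return f"ftp://{FTP_HOST}/" + "/".join(parts) + "/"


def download(url: str, destination: str) -> None:
    """Stream a remote file to disk without loading it all into memory."""
    with urllib.request.urlopen(url) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out)


if __name__ == "__main__":
    # Release 110 and the species directory name are hypothetical examples.
    print(build_dump_url(110, "fasta", "homo_sapiens", "dna"))
```

Because some files run to many gigabytes, the download helper streams in chunks instead of reading the whole response into memory.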

API Code

If you do not have access to git, you can obtain our latest API code as a gzipped tarball:

Download complete API for this release

Note: the API version needs to be the same as the databases you are accessing, so please use git to obtain a previous version if querying older databases.

Database dumps

Entire databases can be downloaded from our FTP site in a variety of formats. Please be aware that some of these files can run to many gigabytes of data.

Looking for MySQL dumps to install databases locally? See our web installation instructions for full details.

Each directory on ftp.ensembl.org contains a README file, explaining the directory structure.

Multi-species data

Database
Comparative genomics | MySQL | EMF | MAF | BED | XML | Ancestral Alleles
BioMart | MySQL | - | - | - | - | -
Stable ids | MySQL | - | - | - | - | -

Single species data

Popular species are listed first. You can customise this list via our home page.

Species | DNA (FASTA) | cDNA (FASTA) | CDS (FASTA) | ncRNA (FASTA) | Protein sequence (FASTA) | Annotated sequence (EMBL) | Annotated sequence (GenBank) | Gene sets | Whole databases | Variation (GVF) | Variation (VCF) | Variation (VEP) | Regulation (GFF) | Data files | BAM/BigWig
Human (Homo sapiens) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | GVF | VCF | VEP | Regulation (GFF) | Regulation data files | BAM/BigWig
Mouse (Mus musculus) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | GVF | VCF | VEP | Regulation (GFF) | Regulation data files | BAM/BigWig
Zebrafish (Danio rerio) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | GVF | VCF | VEP | - | - | BAM/BigWig
Alpaca (Vicugna pacos) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | - | - | VEP | - | - | -
Amazon molly (Poecilia formosa) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | - | - | VEP | - | - | BAM/BigWig
Anole lizard (Anolis carolinensis) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | - | - | VEP | - | - | BAM/BigWig
Armadillo (Dasypus novemcinctus) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | - | - | VEP | - | - | BAM/BigWig
Bushbaby (Otolemur garnettii) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | - | - | VEP | - | - | -
C.intestinalis (Ciona intestinalis) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | - | - | VEP | - | - | -
C.savignyi (Ciona savignyi) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | - | - | VEP | - | - | -
Caenorhabditis elegans | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | - | - | VEP | - | - | -
Cat (Felis catus) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | GVF | VCF | VEP | - | - | BAM/BigWig
Cave fish (Astyanax mexicanus) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | - | - | VEP | - | - | BAM/BigWig
Chicken (Gallus gallus) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | GVF | VCF | VEP | - | - | BAM/BigWig
Chimpanzee (Pan troglodytes) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | GVF | VCF | VEP | - | - | BAM/BigWig
Chinese softshell turtle (Pelodiscus sinensis) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | - | - | VEP | - | - | BAM/BigWig
Cod (Gadus morhua) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | - | - | VEP | - | - | -
Coelacanth (Latimeria chalumnae) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | - | - | VEP | - | - | -
Cow (Bos taurus) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | GVF | VCF | VEP | - | - | -
Dog (Canis lupus familiaris) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | GVF | VCF | VEP | - | - | BAM/BigWig
Dolphin (Tursiops truncatus) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | - | - | VEP | - | - | -
Duck (Anas platyrhynchos) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | - | - | VEP | - | - | -
Elephant (Loxodonta africana) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | - | - | VEP | - | - | -
Ferret (Mustela putorius furo) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | - | - | VEP | - | - | BAM/BigWig
Flycatcher (Ficedula albicollis) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | - | - | VEP | - | - | BAM/BigWig
Fruitfly (Drosophila melanogaster) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | GVF | VCF | VEP | - | - | -
Fugu (Takifugu rubripes) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | - | - | VEP | - | - | -
Gibbon (Nomascus leucogenys) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | GVF | VCF | VEP | - | - | BAM/BigWig
Gorilla (Gorilla gorilla gorilla) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | - | - | VEP | - | - | -
Guinea Pig (Cavia porcellus) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | - | - | VEP | - | - | -
Hedgehog (Erinaceus europaeus) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | - | - | VEP | - | - | -
Horse (Equus caballus) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | GVF | VCF | VEP | - | - | -
Hyrax (Procavia capensis) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | - | - | VEP | - | - | -
Kangaroo rat (Dipodomys ordii) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | - | - | VEP | - | - | -
Lamprey (Petromyzon marinus) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | - | - | VEP | - | - | -
Lesser hedgehog tenrec (Echinops telfairi) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | - | - | VEP | - | - | -
Macaque (Macaca mulatta) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | GVF | VCF | VEP | - | - | -
Marmoset (Callithrix jacchus) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | - | - | VEP | - | - | -
Medaka (Oryzias latipes) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | - | - | VEP | - | - | -
Megabat (Pteropus vampyrus) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | - | - | VEP | - | - | -
Microbat (Myotis lucifugus) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | - | - | VEP | - | - | -
Mouse Lemur (Microcebus murinus) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | - | - | VEP | - | - | -
Olive baboon (Papio anubis) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | - | - | VEP | - | - | BAM/BigWig
Opossum (Monodelphis domestica) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | GVF | VCF | VEP | - | - | BAM/BigWig
Orangutan (Pongo abelii) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | GVF | VCF | VEP | - | - | BAM/BigWig
Panda (Ailuropoda melanoleuca) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | - | - | VEP | - | - | -
Pig (Sus scrofa) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | GVF | VCF | VEP | - | - | BAM/BigWig
Pika (Ochotona princeps) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | - | - | VEP | - | - | -
Platyfish (Xiphophorus maculatus) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | - | - | VEP | - | - | BAM/BigWig
Platypus (Ornithorhynchus anatinus) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | GVF | VCF | VEP | - | - | BAM/BigWig
Rabbit (Oryctolagus cuniculus) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | - | - | VEP | - | - | BAM/BigWig
Rat (Rattus norvegicus) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | GVF | VCF | VEP | - | - | BAM/BigWig
Saccharomyces cerevisiae | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | GVF | VCF | VEP | - | - | -
Sheep (Ovis aries) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | GVF | VCF | VEP | - | - | BAM/BigWig
Shrew (Sorex araneus) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | - | - | VEP | - | - | -
Sloth (Choloepus hoffmanni) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | - | - | VEP | - | - | -
Spotted gar (Lepisosteus oculatus) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | - | - | VEP | - | - | BAM/BigWig
Squirrel (Ictidomys tridecemlineatus) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | - | - | VEP | - | - | -
Stickleback (Gasterosteus aculeatus) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | - | - | VEP | - | - | -
Tarsier (Tarsius syrichta) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | - | - | VEP | - | - | -
Tasmanian devil (Sarcophilus harrisii) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | - | - | VEP | - | - | -
Tetraodon (Tetraodon nigroviridis) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | GVF | VCF | VEP | - | - | -
Tilapia (Oreochromis niloticus) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | - | - | VEP | - | - | BAM/BigWig
Tree Shrew (Tupaia belangeri) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | - | - | VEP | - | - | -
Turkey (Meleagris gallopavo) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | GVF | VCF | VEP | - | - | -
Vervet-AGM (Chlorocebus sabaeus) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | - | - | VEP | - | - | BAM/BigWig
Wallaby (Macropus eugenii) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | - | - | VEP | - | - | -
Xenopus (Xenopus tropicalis) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | - | - | VEP | - | - | -
Zebra Finch (Taeniopygia guttata) | FASTA | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF GFF3 | MySQL | GVF | VCF | VEP | - | - | -

To facilitate storage and download, all databases are compressed with GNU Zip (gzip, *.gz).
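Gzip-compressed dumps can usually be read without unpacking them first. The sketch below uses Python's standard gzip module to list the header lines of a compressed FASTA file; the function name is illustrative, and the file path is whatever you downloaded.

```python
import gzip
from typing import Iterator


def iter_fasta_headers(path: str) -> Iterator[str]:
    """Yield each FASTA header line (without the leading '>') from a .gz file."""
    # "rt" opens the compressed stream in text mode, decompressing on the fly.
    with gzip.open(path, "rt", encoding="ascii") as handle:
        for line in handle:
            if line.startswith(">"):
                yield line[1:].rstrip("\n")
```

Reading line by line keeps memory use flat, which matters for files that run to many gigabytes.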

About the data

The following types of data dumps are available on the FTP site.

FASTA
FASTA sequence databases of Ensembl gene, transcript and protein model predictions. Since the FASTA format does not permit sequence annotation, these database files are mainly intended for use with local sequence similarity search algorithms. Each directory has a README file with a detailed description of the header line format and the file naming conventions.
DNA
Masked and unmasked genome sequences associated with the assembly (contigs, chromosomes etc.).
The header line of a FASTA dump file containing DNA sequence consists of the following attributes: coord_system:version:name:start:end:strand. This coordinate-system string is used in the Ensembl API to retrieve slices with the SliceAdaptor.
CDS
Coding sequences for Ensembl or ab initio predicted genes.
cDNA
cDNA sequences for Ensembl or ab initio predicted genes.
Peptides
Protein sequences for Ensembl or ab initio predicted genes.
RNA
Non-coding RNA gene predictions.
Annotated sequence
Flat files allow more extensive sequence annotation by means of feature tables, and thus contain the genome sequence as annotated by the automated Ensembl genome annotation pipeline. Each nucleotide sequence record in a flat file represents a 1Mb slice of the genome sequence. Flat files are broken into chunks of 1000 sequence records for easier downloading.
EMBL
Ensembl database dumps in EMBL nucleotide sequence database format.
GenBank
Ensembl database dumps in GenBank nucleotide sequence database format.
MySQL
All Ensembl MySQL databases are available in text format, as are the SQL table definition files. These can be imported into any SQL database for a local installation of a mirror site. Generally, the FTP directory tree contains one directory per database. For more information about these databases and their Application Programming Interfaces (APIs), see the API section.
GTF
Gene sets for each species. These files include annotations of both coding and non-coding genes. This file format is described here.
GFF3
GFF3 provides access to all annotated transcripts which make up an Ensembl gene set. This file format is described here.
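The DNA FASTA header attributes described above (coord_system:version:name:start:end:strand) can be split into typed fields as in the sketch below. The class and function names are illustrative, not part of the Ensembl API, and the example string in the usage note is an assumed value; the README in each FASTA directory gives the full header line format.

```python
from typing import NamedTuple


class SliceLocation(NamedTuple):
    coord_system: str
    version: str
    name: str
    start: int
    end: int
    strand: int


def parse_location(location: str) -> SliceLocation:
    """Split a coord_system:version:name:start:end:strand string into fields."""
    fields = location.split(":")
    if len(fields) != 6:
        raise ValueError(f"expected 6 colon-separated fields, got {len(fields)}")
    coord_system, version, name, start, end, strand = fields
    return SliceLocation(coord_system, version, name, int(start), int(end), int(strand))
```

For example, parse_location("chromosome:GRCh38:1:1:248956422:1") would give a record whose name is "1" and whose strand is 1.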
EMF flatfile dumps (comparative data)

Alignments of resequencing data are available for several species as Ensembl Multi Format (EMF) flatfile dumps. The accompanying README file describes the file format.

The same format is also used to dump whole-genome multiple alignments, as well as the gene-based multiple alignments and phylogenetic trees used to infer Ensembl orthologues and paralogues. These files are available in the ensembl_compara database, which can be found in the mysql directory.

MAF (comparative data)

MAF files are provided for all pairwise alignments containing human (GRCh38), and all multiple alignments. The MAF file format is described here.

GVF (variation data)
GVF (Genome Variation Format) is a simple tab-delimited format derived from GFF3 for variation positions across the genome. There are GVF files for different types of variation data (e.g. somatic variants, structural variants etc.). For more information see the "README" files in the GVF directory.
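Because GVF is derived from GFF3, a data line can be read as nine tab-separated columns with key=value attributes in the last one. The sketch below assumes that GFF3 column layout; the record type, function name and sample attribute values are illustrative, so consult the README files for the attributes actually present in a given file.

```python
from typing import Dict, NamedTuple, Optional


class GvfRecord(NamedTuple):
    seqid: str
    source: str
    type: str
    start: int
    end: int
    score: str
    strand: str
    phase: str
    attributes: Dict[str, str]


def parse_gvf_line(line: str) -> Optional[GvfRecord]:
    """Parse one GVF data line; return None for blank lines and '#' lines."""
    if not line.strip() or line.startswith("#"):
        return None
    columns = line.rstrip("\n").split("\t")
    if len(columns) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(columns)}")
    attributes: Dict[str, str] = {}
    for pair in columns[8].split(";"):
        if pair:
            key, _, value = pair.partition("=")
            attributes[key] = value
    seqid, source, kind, start, end, score, strand, phase = columns[:8]
    return GvfRecord(seqid, source, kind, int(start), int(end), score, strand, phase, attributes)
```

Score, strand and phase are kept as strings because GFF3 uses "." for missing values.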
BED format files (comparative data)

Constrained elements calculated using GERP are available in BED format. For more information see the accompanying README file.

BED format is a simple line-based format. The first 3 mandatory columns are:

  • chromosome name (may start with 'chr' for compliance with UCSC)
  • start position. This is a 0-based position
  • end position.

More information on the BED file format...
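As a small illustration of the three mandatory columns, the sketch below parses a BED line and converts its interval to 1-based inclusive coordinates. It assumes the usual UCSC convention that the end position is exclusive, which this page does not state; the function names are illustrative.

```python
from typing import Tuple


def parse_bed_interval(line: str) -> Tuple[str, int, int]:
    """Return (chromosome, start, end) from the first three BED columns."""
    columns = line.split()
    if len(columns) < 3:
        raise ValueError("a BED line needs at least 3 columns")
    return columns[0], int(columns[1]), int(columns[2])


def to_one_based(chrom: str, start: int, end: int) -> Tuple[str, int, int]:
    """Convert a 0-based start to 1-based; the end is unchanged if it is exclusive (ASSUMED)."""
    return chrom, start + 1, end


def strip_chr_prefix(chrom: str) -> str:
    """Drop a leading 'chr' so UCSC-style names match names without the prefix."""
    return chrom[3:] if chrom.startswith("chr") else chrom
```

Under that assumption, a BED interval with start 999 and end 2000 covers bases 1000 to 2000 in 1-based coordinates.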

Tarball

The entire Ensembl API is bundled into a single TAR archive and gzipped. This is updated daily.