
Genomic alignments

NB: Some of the alignment statistics are only available via the Ensembl 75 archive (January 2014), but they remain valid for the current version. Links that redirect to the archive are stated explicitly.

BlastZ-net/LastZ-net Pairwise Alignment Analysis

BlastZ-net alignments (Schwartz S et al., Genome Res., 2003;13(1):103-7; Kent WJ et al., Proc Natl Acad Sci U S A., 2003;100(20):11484-9), or alignments from the newer LastZ-net pipeline, are provided for pairs of closely related species. They are produced by post-processing the raw BlastZ or LastZ results. First, the original alignment blocks are chained according to their location in both genomes. The netting process then selects, for the reference species, the best sub-chain in each region. The reference species in the BlastZ-net or LastZ-net alignments is in bold:

Homo sapiens

Mus musculus

Canis familiaris

Sus scrofa

Bos taurus

Monodelphis domestica

Anolis carolinensis

Gallus gallus

Danio rerio

Oryzias latipes

Gasterosteus aculeatus

Ciona intestinalis
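The chaining step described above can be illustrated with a minimal Python sketch. It assumes each raw BlastZ/LastZ hit has already been reduced to a gapless block with reference and query coordinates and a score; the `Block` class and `best_chain` function are hypothetical names, not part of any Ensembl or UCSC tool. Production chainers also charge gap costs and the netting step follows afterwards; both are left out here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Block:
    """Hypothetical gapless block: reference [ref_start, ref_end) aligned to
    query [qry_start, qry_end), with an alignment score."""
    ref_start: int
    ref_end: int
    qry_start: int
    qry_end: int
    score: float


def best_chain(blocks: List[Block]) -> List[Block]:
    """Return the highest-scoring chain of blocks collinear in both genomes.

    Block b may follow block a only if b starts after a ends in BOTH the
    reference and the query. O(n^2) dynamic programming; gap costs are
    ignored, which is a simplification.
    """
    if not blocks:
        return []
    order = sorted(blocks, key=lambda b: (b.ref_start, b.qry_start))
    best = [b.score for b in order]   # best chain score ending at block i
    prev = [-1] * len(order)          # back-pointer to the previous block in that chain
    for i, bi in enumerate(order):
        for j in range(i):
            bj = order[j]
            collinear = bj.ref_end <= bi.ref_start and bj.qry_end <= bi.qry_start
            if collinear and best[j] + bi.score > best[i]:
                best[i] = best[j] + bi.score
                prev[i] = j
    # Trace back from the block that ends the best chain.
    i = max(range(len(order)), key=best.__getitem__)
    chain = []
    while i != -1:
        chain.append(order[i])
        i = prev[i]
    return chain[::-1]


# Usage: block c overlaps b on the reference, so the best chain keeps a and b.
a = Block(0, 100, 0, 100, 50)
b = Block(150, 250, 160, 260, 60)
c = Block(120, 200, 5000, 5080, 40)
print(best_chain([a, b, c]))
```

Sorting by reference start guarantees that every valid predecessor of a block is scored before the block itself, which is what makes the single forward pass correct.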

Translated Blat Pairwise Alignment Analysis

Translated blat (Kent WJ, Genome Res., 2002;12(4):656-64) is used to look for homologous regions between more distantly related pairs of species, where we expect homologies to lie mainly in coding regions. Of the 2 sets of translated blat analyses, the newer one (Translated Blat Net) passes the raw results through a chaining and netting procedure similar to that used for the BlastZ-net analyses, producing the best sub-chain for the reference species.
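Translated comparisons of this kind work on protein translations of the DNA rather than on the DNA itself, which is why the hits concentrate in coding regions. As a small illustration, the sketch below produces the six-frame translation of a DNA string with the standard genetic code (NCBI table 1). The helper names are hypothetical and this is not BLAT's own code; it only shows the translation step.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(dna: str) -> str:
    """Translate complete codons; codons with non-ACGT characters become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def reverse_complement(dna: str) -> str:
    return dna.upper().translate(COMPLEMENT)[::-1]


def six_frames(dna: str) -> list:
    """Frames +1, +2, +3 on the forward strand, then -1, -2, -3 on the reverse."""
    rc = reverse_complement(dna)
    return [translate(dna[k:]) for k in range(3)] + [translate(rc[k:]) for k in range(3)]


print(six_frames("ATGGCCTAA"))
```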

Translated Blat Net

Species pairs with Translated Blat Net alignments, all available via the Ensembl 75 archive (e75):
Xenopus tropicalis: Mus musculus, Rattus norvegicus, Gallus gallus
Latimeria chalumnae: Xenopus tropicalis
Danio rerio: Mus musculus, Gallus gallus, Xenopus tropicalis
Oreochromis niloticus: Danio rerio
Oryzias latipes: Danio rerio
Xiphophorus maculatus: Danio rerio, Oryzias latipes
Takifugu rubripes: Danio rerio
Tetraodon nigroviridis: Xenopus tropicalis
Petromyzon marinus: Danio rerio, Gasterosteus aculeatus
Ciona intestinalis: Mus musculus, Rattus norvegicus, Danio rerio, Petromyzon marinus
Ciona savignyi: Gallus gallus, Danio rerio

PECAN Multiple Alignment Analysis

Pecan is used to provide global multiple genomic alignments. First, Mercator is used to build a synteny map between the genomes and then Pecan builds alignments in these syntenic regions.

Pecan is a global multiple sequence alignment program that makes the probabilistic consistency methodology practical for significant numbers of sequences of practically arbitrary length. It takes as input a set of sequences and a phylogenetic tree. Its parameters and heuristics are highly user-configurable. It is written entirely in Java and also requires the installation of Exonerate.

23 amniote vertebrates Pecan

(method_link_type="PECAN" : species_set_name="amniotes")

Species | Genome length (bp) | Genome coverage (bp) | Genome coverage (%) | Coding exon length (bp) | Coding exon coverage (bp) | Coding exon coverage (%)
Human (Homo sapiens) | 3,096,649,726 | 1,449,597,399 | 46.81 | 35,204,567 | 30,632,307 | 87.01
Gorilla (Gorilla gorilla) | 3,040,677,044 | 1,401,041,469 | 46.08 | 32,621,447 | 28,109,827 | 86.17
Chimpanzee (Pan troglodytes) | 3,309,577,922 | 1,330,016,609 | 40.19 | 30,309,842 | 26,442,368 | 87.24
Orangutan (Pongo abelii) | 3,446,771,396 | 1,408,678,542 | 40.87 | 30,528,077 | 25,851,506 | 84.68
Macaque (Macaca mulatta) | 3,097,179,960 | 1,442,162,005 | 46.56 | 32,460,852 | 27,941,536 | 86.08
Olive baboon (Papio anubis) | 2,948,397,226 | 1,308,518,878 | 44.38 | 31,353,512 | 27,254,534 | 86.93
Vervet-AGM (Chlorocebus sabaeus) | 2,789,656,328 | 1,231,362,921 | 44.14 | 31,292,716 | 27,240,534 | 87.05
Marmoset (Callithrix jacchus) | 2,914,958,544 | 1,255,518,998 | 43.07 | 33,589,707 | 27,779,679 | 82.70
Mouse (Mus musculus) | 2,730,871,774 | 1,161,840,686 | 42.54 | 35,816,971 | 29,590,865 | 82.62
Rat (Rattus norvegicus) | 2,909,698,938 | 1,100,581,650 | 37.82 | 34,192,817 | 27,888,192 | 81.56
Rabbit (Oryctolagus cuniculus) | 2,737,490,501 | 1,033,829,026 | 37.77 | 30,778,396 | 24,398,803 | 79.27
Horse (Equus caballus) | 2,474,929,062 | 934,981,093 | 37.78 | 31,526,203 | 25,857,957 | 82.02
Cat (Felis catus) | 2,455,541,136 | 970,535,560 | 39.52 | 30,851,959 | 26,140,054 | 84.73
Dog (Canis familiaris) | 2,410,976,875 | 1,024,199,402 | 42.48 | 33,014,761 | 27,504,536 | 83.31
Pig (Sus scrofa) | 2,808,525,991 | 659,225,201 | 23.47 | 29,865,122 | 22,020,853 | 73.73
Cow (Bos taurus) | 2,670,422,299 | 1,121,580,354 | 42.00 | 32,345,388 | 27,401,535 | 84.72
Sheep (Ovis aries) | 2,619,054,388 | 1,171,040,058 | 44.71 | 32,776,750 | 27,474,709 | 83.82
Opossum (Monodelphis domestica) | 3,605,631,728 | 1,281,529,960 | 35.54 | 33,971,162 | 24,134,857 | 71.05
Platypus (Ornithorhynchus anatinus) | 2,073,148,626 | 444,318,251 | 21.43 | 24,997,756 | 16,218,861 | 64.88
Anole lizard (Anolis carolinensis) | 1,799,143,587 | 656,232,001 | 36.47 | 29,512,185 | 21,410,276 | 72.55
Chicken (Gallus gallus) | 1,046,932,099 | 484,332,907 | 46.26 | 25,934,237 | 22,233,571 | 85.73
Turkey (Meleagris gallopavo) | 1,061,817,103 | 442,280,081 | 41.65 | 22,798,274 | 20,069,933 | 88.03
Zebra Finch (Taeniopygia guttata) | 1,233,186,341 | 480,739,158 | 38.98 | 23,643,636 | 19,023,847 | 80.46
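The percentage columns in these tables are consistent with a simple ratio of covered bases to total bases, times 100, rounded to two decimals. The sketch below reproduces the human row of the table above under that assumption; `coverage_pct` is a hypothetical helper, not an Ensembl function.

```python
def coverage_pct(covered_bp: int, total_bp: int) -> float:
    """Percentage of total_bp covered by the alignment, rounded to 2 decimals."""
    if total_bp <= 0:
        raise ValueError("total_bp must be positive")
    return round(100.0 * covered_bp / total_bp, 2)


# Human row of the 23 amniote vertebrates Pecan table.
genome_pct = coverage_pct(1_449_597_399, 3_096_649_726)   # Genome coverage (%)
exon_pct = coverage_pct(30_632_307, 35_204_567)           # Coding exon coverage (%)
print(genome_pct, exon_pct)
```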

EPO Multiple Alignment Analysis

The EPO (Enredo, Pecan, Ortheus) pipeline is a three-step pipeline for whole-genome multiple alignments. Enredo produces colinear segments from extant genomes, handling rearrangements, deletions and duplications. Pecan, as described above, is used to align these segments. Finally, Ortheus is used to create genome-wide ancestral sequence reconstructions. Further details on these methods can be found at:

The high coverage eutherian mammal alignments were generated using the recent EPO (Enredo Pecan Ortheus) pipeline.

Each alignment set can be accessed through the Compara API via Bio::EnsEMBL::Compara::DBSQL::MethodLinkSpeciesSetAdaptor, using the method_link_type together with either the list of species or the species_set_name.
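In the Perl Compara API this lookup is typically done with the adaptor's fetch_by_method_link_type_species_set_name method. Outside Perl, a comparable request can be sketched against the public Ensembl REST alignment/region endpoint, whose `method` and `species_set_group` parameters play the roles of method_link_type and species_set_name. The parameter names and region syntax used here are assumptions based on the public REST documentation and should be checked against the current release, since method names have changed over time; `alignment_region_url` and `fetch_alignment` are hypothetical helpers.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

REST_SERVER = "https://rest.ensembl.org"


def alignment_region_url(species: str, region: str,
                         method: str = "EPO",
                         species_set_group: str = "mammals") -> str:
    """Build an alignment/region URL for one alignment set.

    region is assumed to use the REST form 'chrom:start-end:strand'.
    """
    path = f"/alignment/region/{quote(species)}/{quote(region, safe=':-.')}"
    query = urlencode({"method": method, "species_set_group": species_set_group})
    return f"{REST_SERVER}{path}?{query}"


def fetch_alignment(url: str):
    """GET the URL and decode the JSON response (requires network access)."""
    request = Request(url, headers={"Content-Type": "application/json"})
    with urlopen(request) as response:
        return json.load(response)


url = alignment_region_url("homo_sapiens", "X:1000000-1000100:1")
print(url)
# fetch_alignment(url) performs the actual request; it is not run here.
```

Switching to the low-coverage fish set, for example, would only change the two keyword arguments (method="EPO_LOW_COVERAGE", species_set_group="fish").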

5 teleost fish EPO

(method_link_type="EPO" : species_set_name="fish")

Species | Genome length (bp) | Genome coverage (bp) | Genome coverage (%) | Coding exon length (bp) | Coding exon coverage (bp) | Coding exon coverage (%)
Zebrafish (Danio rerio) | 1,412,464,843 | 364,683,059 | 25.82 | 43,109,543 | 11,467,612 | 26.60
Medaka (Oryzias latipes) | 869,000,216 | 505,433,543 | 58.16 | 30,205,021 | 20,317,884 | 67.27
Fugu (Takifugu rubripes) | 393,312,790 | 261,082,887 | 66.38 | 33,921,072 | 24,718,727 | 72.87
Tetraodon (Tetraodon nigroviridis) | 358,618,246 | 225,077,812 | 62.76 | 30,080,167 | 20,827,987 | 69.24
Stickleback (Gasterosteus aculeatus) | 461,533,448 | 327,333,456 | 70.92 | 32,649,418 | 23,391,890 | 71.65

17 eutherian mammals EPO

(method_link_type="EPO" : species_set_name="mammals")

Species | Genome length (bp) | Genome coverage (bp) | Genome coverage (%) | Coding exon length (bp) | Coding exon coverage (bp) | Coding exon coverage (%)
Human (Homo sapiens) | 3,096,649,726 | 2,755,470,382 | 88.98 | 35,228,305 | 33,803,882 | 95.96
Gorilla (Gorilla gorilla) | 3,040,677,044 | 2,578,151,165 | 84.79 | 32,621,447 | 28,632,034 | 87.77
Chimpanzee (Pan troglodytes) | 3,309,577,922 | 2,698,884,619 | 81.55 | 30,309,842 | 28,046,358 | 92.53
Orangutan (Pongo abelii) | 3,446,771,396 | 2,614,430,043 | 75.85 | 30,528,077 | 26,687,613 | 87.42
Macaque (Macaca mulatta) | 3,097,179,960 | 2,545,355,084 | 82.18 | 32,460,852 | 28,453,940 | 87.66
Olive baboon (Papio anubis) | 2,948,397,226 | 2,583,387,860 | 87.62 | 31,353,512 | 28,789,623 | 91.82
Vervet-AGM (Chlorocebus sabaeus) | 2,789,656,328 | 2,574,660,971 | 92.29 | 31,292,716 | 29,008,448 | 92.70
Marmoset (Callithrix jacchus) | 2,914,958,544 | 2,059,056,923 | 70.64 | 33,589,707 | 24,380,047 | 72.58
Mouse (Mus musculus) | 2,730,871,774 | 2,138,467,879 | 78.31 | 35,816,971 | 30,299,677 | 84.60
Rat (Rattus norvegicus) | 2,909,698,938 | 2,096,479,041 | 72.05 | 34,224,780 | 27,696,126 | 80.92
Rabbit (Oryctolagus cuniculus) | 2,737,490,501 | 2,182,126,817 | 79.71 | 30,778,396 | 24,803,656 | 80.59
Horse (Equus caballus) | 2,474,929,062 | 2,167,651,447 | 87.58 | 31,526,203 | 27,995,625 | 88.80
Cat (Felis catus) | 2,455,541,136 | 1,429,106,962 | 58.20 | 30,851,959 | 18,501,811 | 59.97
Dog (Canis familiaris) | 2,410,976,875 | 2,115,051,634 | 87.73 | 33,014,761 | 29,288,594 | 88.71
Pig (Sus scrofa) | 2,808,525,991 | 1,825,001,472 | 64.98 | 29,865,122 | 20,539,820 | 68.78
Cow (Bos taurus) | 2,670,422,299 | 2,427,506,178 | 90.90 | 32,345,388 | 28,952,179 | 89.51
Sheep (Ovis aries) | 2,619,054,388 | 2,329,435,971 | 88.94 | 32,776,750 | 27,634,096 | 84.31

8 primates EPO

(method_link_type="EPO" : species_set_name="primates")

Species | Genome length (bp) | Genome coverage (bp) | Genome coverage (%) | Coding exon length (bp) | Coding exon coverage (bp) | Coding exon coverage (%)
Human (Homo sapiens) | 3,096,649,726 | 2,798,426,035 | 90.37 | 35,228,305 | 34,355,167 | 97.52
Gorilla (Gorilla gorilla) | 3,040,677,044 | 2,721,510,697 | 89.50 | 32,621,447 | 30,789,792 | 94.39
Chimpanzee (Pan troglodytes) | 3,309,577,922 | 2,774,079,219 | 83.82 | 30,309,842 | 29,216,915 | 96.39
Orangutan (Pongo abelii) | 3,446,771,396 | 2,764,606,350 | 80.21 | 30,528,077 | 28,693,923 | 93.99
Macaque (Macaca mulatta) | 3,097,179,960 | 2,677,973,911 | 86.46 | 32,460,852 | 30,641,215 | 94.39
Olive baboon (Papio anubis) | 2,948,397,226 | 2,681,303,533 | 90.94 | 31,353,512 | 30,183,690 | 96.27
Vervet-AGM (Chlorocebus sabaeus) | 2,789,656,328 | 2,657,674,404 | 95.27 | 31,292,716 | 30,243,859 | 96.65
Marmoset (Callithrix jacchus) | 2,914,958,544 | 2,463,878,879 | 84.53 | 33,589,707 | 29,351,578 | 87.38

4 sauropsids EPO

(method_link_type="EPO" : species_set_name="sauropsids")

Species | Genome length (bp) | Genome coverage (bp) | Genome coverage (%) | Coding exon length (bp) | Coding exon coverage (bp) | Coding exon coverage (%)
Anole lizard (Anolis carolinensis) | 1,799,143,587 | 933,866,240 | 51.91 | 29,512,185 | 16,114,797 | 54.60
Chicken (Gallus gallus) | 1,046,932,099 | 969,389,724 | 92.59 | 25,934,237 | 23,761,981 | 91.62
Turkey (Meleagris gallopavo) | 1,061,998,909 | 968,012,687 | 91.15 | 22,798,877 | 20,700,887 | 90.80
Zebra Finch (Taeniopygia guttata) | 1,233,186,341 | 988,572,641 | 80.16 | 23,643,636 | 19,638,105 | 83.06

The full set of eutherian mammal alignments was not generated with the EPO pipeline because of difficulties running Ortheus on the low-coverage genomes. Instead, the low-coverage genomes were projected onto the high-coverage EPO eutherian mammal alignments using (B)lastZ-net alignments.
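The core idea of projection, carrying a coordinate from one genome to another through a pairwise alignment, can be illustrated with a small sketch that maps a reference position through a single gapped alignment block. The function name and the representation (two equal-length gapped strings with 0-based start coordinates) are assumptions for illustration; the actual EPO_LOW_COVERAGE pipeline is more involved than this.

```python
from typing import Optional


def project_position(ref_aln: str, qry_aln: str, ref_start: int,
                     qry_start: int, ref_pos: int) -> Optional[int]:
    """Map 0-based ref_pos through a gapped pairwise alignment block.

    Returns the aligned query coordinate, or None if ref_pos is aligned to a
    gap in the query or lies outside the block.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned strings must have equal length")
    r, q = ref_start, qry_start
    for rc, qc in zip(ref_aln, qry_aln):
        if rc != "-" and r == ref_pos:
            return q if qc != "-" else None
        if rc != "-":
            r += 1   # advance reference coordinate on non-gap columns
        if qc != "-":
            q += 1   # advance query coordinate on non-gap columns
    return None


# Reference position 102 sits after a query-only insertion, so it maps to 503.
print(project_position("AC-GT", "ACTG-", 100, 500, 102))
```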

39 eutherian mammals EPO_LOW_COVERAGE

(method_link_type="EPO_LOW_COVERAGE" : species_set_name="mammals")

Species | Genome length (bp) | Genome coverage (bp) | Genome coverage (%) | Coding exon length (bp) | Coding exon coverage (bp) | Coding exon coverage (%)
Human (Homo sapiens) | 3,096,649,726 | 2,755,470,382 | 88.98 | 35,228,305 | 33,803,882 | 95.96
Gorilla (Gorilla gorilla) | 3,040,677,044 | 2,578,151,165 | 84.79 | 32,621,447 | 28,632,034 | 87.77
Chimpanzee (Pan troglodytes) | 3,309,577,922 | 2,698,884,619 | 81.55 | 30,309,842 | 28,046,358 | 92.53
Orangutan (Pongo abelii) | 3,446,771,396 | 2,613,430,043 | 75.82 | 30,528,077 | 26,687,331 | 87.42
Gibbon (Nomascus leucogenys) | 2,936,035,333 | 2,413,933,497 | 82.22 | 30,035,029 | 28,823,679 | 95.97
Macaque (Macaca mulatta) | 3,097,179,960 | 2,545,355,084 | 82.18 | 32,460,852 | 28,453,940 | 87.66
Olive baboon (Papio anubis) | 2,948,397,226 | 2,583,387,860 | 87.62 | 31,353,512 | 28,789,623 | 91.82
Vervet-AGM (Chlorocebus sabaeus) | 2,789,656,328 | 2,669,882,979 | 95.71 | 31,292,716 | 30,613,561 | 97.83
Marmoset (Callithrix jacchus) | 2,914,958,544 | 2,059,056,923 | 70.64 | 33,589,707 | 24,380,047 | 72.58
Tarsier (Tarsius syrichta) | 3,187,383,140 | 1,239,689,808 | 38.89 | 22,207,448 | 18,489,012 | 83.26
Mouse Lemur (Microcebus murinus) | 2,910,103,014 | 1,221,556,440 | 41.98 | 26,497,276 | 22,211,958 | 83.83
Bushbaby (Otolemur garnettii) | 2,519,404,493 | 1,434,095,496 | 56.92 | 31,659,618 | 29,319,264 | 92.61
Mouse (Mus musculus) | 2,730,871,774 | 2,138,467,879 | 78.31 | 35,816,971 | 30,299,677 | 84.60
Rat (Rattus norvegicus) | 2,909,698,938 | 2,096,479,041 | 72.05 | 34,224,780 | 27,696,126 | 80.92
Kangaroo rat (Dipodomys ordii) | 2,165,294,802 | 733,231,554 | 33.86 | 26,615,164 | 22,639,084 | 85.06
Squirrel (Ictidomys tridecemlineatus) | 2,478,393,770 | 1,352,308,657 | 54.56 | 28,885,868 | 26,788,173 | 92.74
Guinea Pig (Cavia porcellus) | 2,723,219,641 | 1,184,499,194 | 43.50 | 29,810,634 | 27,100,903 | 90.91
Pika (Ochotona princeps) | 4,781,903,803 | 751,981,739 | 15.73 | 26,511,260 | 19,617,261 | 74.00
Rabbit (Oryctolagus cuniculus) | 2,737,490,501 | 2,182,126,817 | 79.71 | 30,778,396 | 24,803,656 | 80.59
Tree Shrew (Tupaia belangeri) | 3,670,341,392 | 976,111,490 | 26.59 | 24,459,053 | 19,380,724 | 79.24
Hedgehog (Erinaceus europaeus) | 3,377,541,562 | 535,966,187 | 15.87 | 23,719,458 | 19,009,256 | 80.14
Shrew (Sorex araneus) | 2,944,193,050 | 556,716,868 | 18.91 | 20,841,358 | 16,545,514 | 79.39
Microbat (Myotis lucifugus) | 2,034,575,300 | 1,089,256,534 | 53.54 | 30,512,546 | 26,927,000 | 88.25
Megabat (Pteropus vampyrus) | 1,999,614,463 | 1,245,844,785 | 62.30 | 28,943,432 | 26,485,991 | 91.51
Horse (Equus caballus) | 2,474,929,062 | 2,167,651,447 | 87.58 | 31,526,203 | 27,995,625 | 88.80
Cat (Felis catus) | 2,455,541,136 | 1,429,106,962 | 58.20 | 30,851,959 | 18,501,811 | 59.97
Dog (Canis familiaris) | 2,410,976,875 | 2,115,051,634 | 87.73 | 33,014,761 | 29,288,594 | 88.71
Panda (Ailuropoda melanoleuca) | 2,299,509,015 | 1,469,493,933 | 63.90 | 31,130,938 | 29,470,617 | 94.67
Ferret (Mustela putorius furo) | 2,410,758,013 | 1,407,529,887 | 58.39 | 31,900,581 | 29,826,707 | 93.50
Dolphin (Tursiops truncatus) | 2,521,923,936 | 1,324,098,969 | 52.50 | 28,474,706 | 26,305,555 | 92.38
Pig (Sus scrofa) | 2,808,525,991 | 1,825,001,472 | 64.98 | 29,865,122 | 20,539,820 | 68.78
Cow (Bos taurus) | 2,670,422,299 | 2,427,506,178 | 90.90 | 32,345,388 | 28,952,179 | 89.51
Sheep (Ovis aries) | 2,619,054,388 | 2,329,435,971 | 88.94 | 32,776,750 | 27,634,096 | 84.31
Alpaca (Vicugna pacos) | 2,967,746,133 | 1,074,381,680 | 36.20 | 19,005,742 | 15,950,421 | 83.92
Sloth (Choloepus hoffmanni) | 2,467,493,193 | 939,269,595 | 38.07 | 19,729,254 | 15,466,785 | 78.40
Armadillo (Dasypus novemcinctus) | 3,631,522,711 | 1,279,438,512 | 35.23 | 34,329,446 | 28,681,772 | 83.55
Lesser hedgehog tenrec (Echinops telfairi) | 3,833,584,941 | 625,115,181 | 16.31 | 25,963,043 | 20,464,270 | 78.82
Elephant (Loxodonta africana) | 3,196,760,833 | 1,282,486,533 | 40.12 | 31,407,754 | 28,176,767 | 89.71
Hyrax (Procavia capensis) | 2,993,339,806 | 842,515,978 | 28.15 | 27,180,283 | 22,771,971 | 83.78

11 teleost fish EPO_LOW_COVERAGE

(method_link_type="EPO_LOW_COVERAGE" : species_set_name="fish")

Species | Genome length (bp) | Genome coverage (bp) | Genome coverage (%) | Coding exon length (bp) | Coding exon coverage (bp) | Coding exon coverage (%)
Zebrafish (Danio rerio) | 1,412,464,843 | 364,683,059 | 25.82 | 43,109,543 | 11,467,612 | 26.60
Cave fish (Astyanax mexicanus) | 1,191,242,572 | 34,036,957 | 2.86 | 36,715,309 | 17,425,984 | 47.46
Cod (Gadus morhua) | 832,114,588 | 48,909,836 | 5.88 | 29,438,356 | 16,657,541 | 56.58
Tilapia (Oreochromis niloticus) | 927,383,394 | 131,499,525 | 14.18 | 37,304,121 | 21,084,545 | 56.52
Medaka (Oryzias latipes) | 869,000,216 | 505,433,543 | 58.16 | 30,205,021 | 20,317,884 | 67.27
Platyfish (Xiphophorus maculatus) | 729,679,499 | 115,960,750 | 15.89 | 33,492,204 | 20,713,593 | 61.85
Amazon molly (Poecilia formosa) | 748,923,461 | 121,434,056 | 16.21 | 37,288,094 | 21,288,907 | 57.09
Fugu (Takifugu rubripes) | 393,312,790 | 261,082,887 | 66.38 | 33,921,072 | 24,718,727 | 72.87
Tetraodon (Tetraodon nigroviridis) | 358,618,246 | 225,077,812 | 62.76 | 30,080,167 | 20,827,987 | 69.24
Stickleback (Gasterosteus aculeatus) | 461,533,448 | 327,333,456 | 70.92 | 32,649,418 | 23,391,890 | 71.65
Spotted gar (Lepisosteus oculatus) | 945,878,036 | 29,020,286 | 3.07 | 31,448,193 | 16,318,454 | 51.89

7 sauropsids EPO_LOW_COVERAGE

(method_link_type="EPO_LOW_COVERAGE" : species_set_name="sauropsids")

Species | Genome length (bp) | Genome coverage (bp) | Genome coverage (%) | Coding exon length (bp) | Coding exon coverage (bp) | Coding exon coverage (%)
Anole lizard (Anolis carolinensis) | 1,799,143,587 | 933,866,240 | 51.91 | 29,512,185 | 16,114,797 | 54.60
Duck (Anas platyrhynchos) | 1,105,035,747 | 819,666,515 | 74.18 | 23,616,868 | 21,583,545 | 91.39
Chicken (Gallus gallus) | 1,046,932,099 | 969,389,724 | 92.59 | 25,934,237 | 23,761,981 | 91.62
Turkey (Meleagris gallopavo) | 1,061,998,909 | 968,012,687 | 91.15 | 22,798,877 | 20,700,887 | 90.80
Zebra Finch (Taeniopygia guttata) | 1,233,186,341 | 988,572,641 | 80.16 | 23,643,636 | 19,638,105 | 83.06
Flycatcher (Ficedula albicollis) | 1,116,409,277 | 725,322,577 | 64.97 | 25,179,529 | 22,077,721 | 87.68
Chinese softshell turtle (Pelodiscus sinensis) | 2,202,483,752 | 258,539,164 | 11.74 | 28,388,868 | 20,566,352 | 72.45

Ancestral sequences are inferred from the EPO multiple alignments using Ortheus, a probabilistic method for inferring ancestor (a.k.a. tree) alignments. Its main contribution is a phylogenetic model that incorporates gaps, so that insertion and deletion events can be inferred. An ancestral sequence is predicted for each internal node of the phylogenetic tree relating the sequences, and is named after the extant species descended from it. For example, a sequence named Hsap, Ptro, Mmul corresponds to the ancestor of the Homo sapiens, Pan troglodytes and Macaca mulatta genomes.
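The naming convention in the example can be reproduced with a short sketch: abbreviate each species as genus initial plus the first three letters of the epithet, then label every internal node of a tree with the abbreviations of the extant species below it. The nested-tuple tree and the helper names are hypothetical, and the ", " separator simply follows the example in the text; the names actually stored by Ensembl may be formatted differently.

```python
from typing import List, Tuple, Union

# A tree is either an extant species name or a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def abbreviate(species: str) -> str:
    """'Homo sapiens' -> 'Hsap': genus initial plus first three letters of the epithet."""
    genus, epithet = species.split()[:2]
    return genus[0].upper() + epithet[:3].lower()


def leaves(tree: Tree) -> List[str]:
    """Extant species below a node, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def ancestor_names(tree: Tree) -> List[str]:
    """Name each internal node after its extant descendants, in post-order."""
    if isinstance(tree, str):
        return []
    names = [name for child in tree for name in ancestor_names(child)]
    names.append(", ".join(abbreviate(s) for s in leaves(tree)))
    return names


tree = (("Homo sapiens", "Pan troglodytes"), "Macaca mulatta")
print(ancestor_names(tree))
```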

Age of Base

From these ancestral sequences, we infer the age of a base, i.e. the timing of the most recent mutation for each base of the genome. Each position of the human genome is compared to its immediate inferred ancestor, then its ancestor, etc. until a difference is found. The inferred substitution event therefore occurred on a specific branch of the tree, which is identified by all the extant species which eventually descended from that branch, as illustrated below.

Age of Base schema
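The age-of-base procedure can be sketched as a walk up the lineage of a single human position. The sketch assumes that lineage has already been extracted as (label, base) pairs, nearest ancestor first, with labels following the naming convention above; `age_of_base` is a hypothetical function, and alignment gaps (insertions and deletions) are not handled.

```python
from typing import List, Optional, Tuple

# (label naming the extant species below this node, base at the position)
Node = Tuple[str, str]


def age_of_base(lineage: List[Node]) -> Optional[str]:
    """lineage[0] is the extant (e.g. human) base; lineage[1:] are its inferred
    ancestors, nearest first. Returns the label of the branch carrying the most
    recent substitution, or None if the base is unchanged up to the root."""
    extant_base = lineage[0][1]
    for i in range(1, len(lineage)):
        if lineage[i][1] != extant_base:
            # The base changed on the branch from lineage[i] down to lineage[i-1],
            # so that branch is identified by lineage[i-1]'s extant descendants.
            return lineage[i - 1][0]
    return None


lineage = [("Hsap", "A"), ("Hsap, Ptro", "A"), ("Hsap, Ptro, Mmul", "G")]
print(age_of_base(lineage))
```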

Conservation Analysis

Additionally, we use GERP (Cooper GM et al., Genome Res., 2005; 15:901-913) to calculate conservation scores and call constrained elements on the PECAN and EPO_LOW_COVERAGE multiple alignments. Conservation scores are estimated column by column. Constrained elements are stretches of the multiple alignment in which the sequences are highly conserved according to these scores.
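GERP's column score is the number of "rejected substitutions": the substitutions expected under neutral evolution minus those observed. The sketch below assumes those per-column expected and observed values are already available (estimating them needs a neutral tree and a substitution model, not shown). It then calls constrained elements as simple runs of columns above a threshold; that run-calling rule is a stand-in chosen for illustration, not GERP's own element-calling procedure, and the function names are hypothetical.

```python
from typing import List, Tuple


def rejected_substitutions(expected: List[float], observed: List[float]) -> List[float]:
    """Per-column GERP-style score: expected neutral minus observed substitutions."""
    if len(expected) != len(observed):
        raise ValueError("expected and observed must have the same length")
    return [e - o for e, o in zip(expected, observed)]


def constrained_elements(scores: List[float], threshold: float,
                         min_len: int) -> List[Tuple[int, int]]:
    """Half-open [start, end) runs of columns with score > threshold,
    keeping only runs of at least min_len columns."""
    elements = []
    start = None
    for i, s in enumerate(list(scores) + [float("-inf")]):  # sentinel closes the final run
        if s > threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_len:
                elements.append((start, i))
            start = None
    return elements


expected = [2.0] * 8
observed = [0.1, 0.2, 0.0, 1.9, 2.0, 0.3, 0.1, 0.2]
scores = rejected_substitutions(expected, observed)
print(constrained_elements(scores, threshold=1.0, min_len=3))
```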

Synteny Analysis

We calculate syntenic regions using BlastZ-net alignments, looking for stretches where the alignment blocks are in synteny. The search runs in two phases. In the first phase, syntenic alignments that are closer than 200 kbp are grouped. In the second phase, groups that are in synteny are linked, provided that no more than 2 non-syntenic groups lie between them and they are less than 3 Mbp apart.
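The two phases can be sketched as follows. This is an interpretation under stated assumptions, not Ensembl's implementation: "in synteny" is taken to mean "on the same reference and query chromosomes", distances are measured between interval ends in both genomes, and orientation and collinearity are not checked. `Hit`, `group_hits` and `link_groups` are hypothetical names; the thresholds are the ones given in the text.

```python
from dataclasses import dataclass
from typing import List

MAX_HIT_GAP = 200_000        # phase 1: hits closer than 200 kbp are grouped
MAX_GROUP_GAP = 3_000_000    # phase 2: linked groups must be < 3 Mbp apart
MAX_INTERVENING = 2          # phase 2: at most 2 non-syntenic groups in between


@dataclass(frozen=True)
class Hit:
    """Hypothetical alignment block in reference and query coordinates."""
    ref_chr: str
    ref_start: int
    ref_end: int
    qry_chr: str
    qry_start: int
    qry_end: int


def gap(s1: int, e1: int, s2: int, e2: int) -> int:
    """Distance between two intervals (0 if they overlap)."""
    return max(0, max(s1, s2) - min(e1, e2))


def close_enough(a: List[Hit], b: List[Hit], limit: int) -> bool:
    """True if two sets of hits share chromosomes and lie < limit apart in both genomes."""
    if a[0].ref_chr != b[0].ref_chr or a[0].qry_chr != b[0].qry_chr:
        return False
    ref_gap = gap(min(h.ref_start for h in a), max(h.ref_end for h in a),
                  min(h.ref_start for h in b), max(h.ref_end for h in b))
    qry_gap = gap(min(h.qry_start for h in a), max(h.qry_end for h in a),
                  min(h.qry_start for h in b), max(h.qry_end for h in b))
    return ref_gap < limit and qry_gap < limit


def group_hits(hits: List[Hit]) -> List[List[Hit]]:
    """Phase 1: walk hits in reference order, grouping neighbours < 200 kbp apart."""
    groups: List[List[Hit]] = []
    for hit in sorted(hits, key=lambda h: (h.ref_chr, h.ref_start)):
        if groups and close_enough([groups[-1][-1]], [hit], MAX_HIT_GAP):
            groups[-1].append(hit)
        else:
            groups.append([hit])
    return groups


def link_groups(groups: List[List[Hit]]) -> List[List[Hit]]:
    """Phase 2: link syntenic groups separated by <= 2 other groups and < 3 Mbp."""
    used = [False] * len(groups)
    regions = []
    for i, group in enumerate(groups):
        if used[i]:
            continue
        region, last = list(group), i
        used[i] = True
        j = i + 1
        # Stop looking once more than MAX_INTERVENING groups separate j from the region.
        while j < len(groups) and j - last - 1 <= MAX_INTERVENING:
            if not used[j] and close_enough(region, groups[j], MAX_GROUP_GAP):
                region.extend(groups[j])
                used[j] = True
                last = j
            j += 1
        regions.append(region)
    return regions


# Usage: h3 (on query chromosome B) interrupts an otherwise syntenic A region.
h1 = Hit("1", 0, 1_000, "A", 0, 1_000)
h2 = Hit("1", 100_000, 101_000, "A", 100_000, 101_000)
h3 = Hit("1", 500_000, 501_000, "B", 0, 1_000)
h4 = Hit("1", 1_000_000, 1_001_000, "A", 1_000_000, 1_001_000)
print(link_groups(group_hits([h1, h2, h3, h4])))
```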

Species pairs with synteny data:
Homo sapiens: Gorilla gorilla, Pan troglodytes, Macaca mulatta, Papio anubis, Chlorocebus sabaeus, Callithrix jacchus, Mus musculus, Rattus norvegicus, Oryctolagus cuniculus, Equus caballus, Felis catus, Canis familiaris, Sus scrofa, Bos taurus, Ovis aries, Monodelphis domestica, Ornithorhynchus anatinus, Anolis carolinensis, Gallus gallus, Meleagris gallopavo, Taeniopygia guttata, Danio rerio, Oryzias latipes, Tetraodon nigroviridis, Gasterosteus aculeatus
Mus musculus: Rattus norvegicus, Canis familiaris, Sus scrofa, Bos taurus, Ornithorhynchus anatinus, Gallus gallus
Sus scrofa: Bos taurus, Ovis aries
Bos taurus: Ovis aries
Gallus gallus: Meleagris gallopavo