Mouse strains
Assembly and annotation
The Mouse Genomes Project is an ongoing initiative to sequence and catalogue molecular variation across common laboratory mouse strains. Currently, high-quality reference genomes are available for 16 inbred strains (129S1/SvImJ, A/J, ARK/J, BALB/cJ, C3H/HeJ, C57BL/6NJ, CAST/EiJ, CBA/J, DBA/2J, FVB/NJ, JF1/MsJ, LP/J, NOD/ShiLtJ, NZO/HlLtJ, PWK/PhJ, and WSB/EiJ) created using a combination of short- and long-range Illumina libraries, optical maps, and third-generation sequencing data.
The strain-specific genome annotations were generated by mapping GENCODE M30 genes and transcripts via the Ensembl Human automated annotation system, supplemented by methods from the Ensembl vertebrate annotation pipeline. Mapped GENCODE structures served as the primary evidence, with gaps in the annotations filled using aligned short-read transcriptomic data and full-length transcripts derived from PacBio IsoSeq long-read data.
Comparative analysis
Using our EPO pipeline, we generated a multiple genome alignment of 16 of the reference-quality genomes from the Mouse Genomes Project with Mus musculus, Rattus norvegicus and an additional three Mus species: Mus caroli, Mus pahari and Mus spicilegus. Furthermore, we have computed LastZ alignments of Mus spretus and the three additional Mus species against the Mus musculus reference genome.
We provide multiple sets of gene-trees and orthologues in Ensembl, two of which include genes from a mouse genome. The standard gene-trees and orthologues comprise genes from representatives of selected Ensembl species, whilst the Murinae-specific gene-trees and orthologues comprise genes from all mouse strains and include genes from Mus musculus, Rattus norvegicus, Mus spretus and the three aforementioned Mus species. To compare genes from a mouse strain with genes from a species outside the Murinae set, a stepwise approach via one of these six species is required.
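The stepwise comparison can be sketched as chaining two orthologue tables through a shared "bridge" species. This is a minimal illustration only, assuming each table has already been exported as (gene, gene) pairs: one from the Murinae-specific set (strain gene to bridge-species gene) and one from the standard set (bridge-species gene to gene in the other species). The function name and all gene identifiers below are hypothetical placeholders, not Ensembl IDs or an Ensembl API. Note that a chained relationship is an inference rather than a computed orthology, and one-to-many relationships can fan out at each step.

```python
from collections import defaultdict


def chain_orthologues(murinae_pairs, standard_pairs):
    """Hypothetical helper: chain strain->bridge pairs with bridge->other pairs.

    murinae_pairs:  iterable of (strain_gene, bridge_gene) from the Murinae set
    standard_pairs: iterable of (bridge_gene, other_gene) from the standard set
    Returns {strain_gene: set of other-species genes}; strain genes whose bridge
    gene has no standard-set orthologue map to an empty set.
    """
    # Index the standard-set orthologues by the bridge-species gene.
    by_bridge = defaultdict(set)
    for bridge_gene, other_gene in standard_pairs:
        by_bridge[bridge_gene].add(other_gene)

    # Follow each strain gene through its bridge gene(s) to the other species.
    chained = defaultdict(set)
    for strain_gene, bridge_gene in murinae_pairs:
        chained[strain_gene] |= by_bridge.get(bridge_gene, set())
    return dict(chained)


# Hypothetical example: a strain gene chained via Mus musculus to another species.
murinae = [("strain_gene_1", "mmus_gene_1"), ("strain_gene_2", "mmus_gene_2")]
standard = [("mmus_gene_1", "other_gene_1"), ("mmus_gene_1", "other_gene_1b")]
print(chain_orthologues(murinae, standard))
```

Here `strain_gene_2` maps to an empty set because its bridge gene has no orthologue in the (hypothetical) standard-set table, which flags genes that cannot be carried across in one step.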
In accordance with the Fort Lauderdale Agreement, please check the publication status of the genome/assembly before publishing any genome-wide analyses using these data.