Schema Documentation

assembly

The assembly table states which parts of seq_regions are exactly equal, which makes it possible to transform coordinates between seq_regions. Typically it records how chromosomes are made of contigs, how clones are made of contigs, and how chromosomes are made of supercontigs. It also allows chromosome sequence to be artificially chunked into smaller parts.

The data in this table defines the "static golden path", i.e. the best-effort draft full genome sequence as determined by UCSC or NCBI (depending on which assembly you are using). Each row represents a component, e.g. a contig (cmp_seq_region_id, FK from the seq_region table), at least part of which is present in the golden path. The part of the component that is in the path is delimited by the fields cmp_start and cmp_end (start < end), and the absolute position within the golden path chromosome, or other appropriate assembled structure (asm_seq_region_id), is given by asm_start and asm_end.

Column	Type	Default value	Description	Index
asm_seq_region_id	INT(10)	-	Assembly sequence region id. Foreign key references to the seq_region table.	key: asm_seq_region_idx unique key: all_idx
cmp_seq_region_id	INT(10)	-	Component sequence region id. Foreign key references to the seq_region table.	key: cmp_seq_region_idx unique key: all_idx
asm_start	INT(10)	-	Absolute start position within the assembled sequence region (e.g. the golden path chromosome).	key: asm_seq_region_idx unique key: all_idx
asm_end	INT(10)	-	Absolute end position within the assembled sequence region (e.g. the golden path chromosome).	unique key: all_idx
cmp_start	INT(10)	-	Start position within the component sequence region.	unique key: all_idx
cmp_end	INT(10)	-	End position within the component sequence region.	unique key: all_idx
ori	TINYINT	-	Orientation: 1 - sense; -1 - antisense.	unique key: all_idx
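As a minimal sketch of how a single assembly row supports coordinate transformation, the Python below maps a position on a component onto the assembled sequence region. It assumes 1-based, inclusive coordinates and a row whose component part and assembled part have the same length; AssemblyRow, cmp_to_asm and the sample row are hypothetical, not part of any Ensembl API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssemblyRow:
    """One row of the assembly table (hypothetical helper type)."""
    asm_seq_region_id: int
    cmp_seq_region_id: int
    asm_start: int
    asm_end: int
    cmp_start: int
    cmp_end: int
    ori: int  # 1 = sense, -1 = antisense


def cmp_to_asm(row: AssemblyRow, cmp_pos: int) -> int:
    """Map a 1-based position on the component onto the assembled seq_region."""
    if not row.cmp_start <= cmp_pos <= row.cmp_end:
        raise ValueError("position is outside the part of the component used in the path")
    offset = cmp_pos - row.cmp_start
    if row.ori == 1:
        # Same orientation: count forwards from asm_start.
        return row.asm_start + offset
    if row.ori == -1:
        # Antisense: the first component base used lands on asm_end.
        return row.asm_end - offset
    raise ValueError(f"unexpected ori value {row.ori}")


# Hypothetical row: bases 1..1000 of a contig placed at 5001..6000 of a chromosome.
forward = AssemblyRow(1, 2, 5001, 6000, 1, 1000, 1)
reverse = AssemblyRow(1, 2, 5001, 6000, 1, 1000, -1)
print(cmp_to_asm(forward, 10))  # 5010
print(cmp_to_asm(reverse, 10))  # 5991
```

The reverse direction is symmetric: for ori 1 the component position is cmp_start + (asm_pos - asm_start), and for ori -1 it is cmp_start + (asm_end - asm_pos).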

See also:

seq_region
supercontigs


assembly_exception

Column	Type	Default value	Description	Index
assembly_exception_id	INT(10)	-	Assembly exception id. Primary key, internal identifier.	primary key
seq_region_id	INT(10)	-	Sequence region id. Foreign key references to the seq_region table.	key: sr_idx
seq_region_start	INT(10)	-	Sequence start position.	key: sr_idx
seq_region_end	INT(10)	-	Sequence end position.
exc_type	ENUM: HAP PAR PATCH_FIX PATCH_NOVEL	NULL	Exception type, e.g. HAP - haplotype; PAR - pseudo-autosomal region; PATCH_FIX - fix patch; PATCH_NOVEL - novel patch.
exc_seq_region_id	INT(10)	-	Exception sequence region id. Foreign key references to the seq_region table.	key: ex_idx
exc_seq_region_start	INT(10)	-	Exception sequence start position.	key: ex_idx
exc_seq_region_end	INT(10)	-	Exception sequence end position.
ori	INT	-	Orientation: 1 - sense; -1 - antisense.
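For PAR rows, the region given by seq_region_id, seq_region_start and seq_region_end shares its sequence with the region given by the exc_ columns. The sketch below translates a position from the former to the latter. It assumes both spans have the same length, which holds for PAR rows but not in general for HAP or patch rows; the class, the function and the seq_region ids are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class AssemblyException:
    """One row of the assembly_exception table (hypothetical helper type)."""
    seq_region_id: int
    seq_region_start: int
    seq_region_end: int
    exc_type: str
    exc_seq_region_id: int
    exc_seq_region_start: int
    exc_seq_region_end: int
    ori: int


def resolve_par(rows: Iterable[AssemblyException], seq_region_id: int,
                pos: int) -> Optional[Tuple[int, int]]:
    """Return (exc_seq_region_id, position) for a position inside a PAR row, else None."""
    for row in rows:
        if row.exc_type != "PAR" or row.seq_region_id != seq_region_id:
            continue
        if row.seq_region_start <= pos <= row.seq_region_end:
            offset = pos - row.seq_region_start
            if row.ori == 1:
                return row.exc_seq_region_id, row.exc_seq_region_start + offset
            # ori == -1: count backwards from the end of the exception region.
            return row.exc_seq_region_id, row.exc_seq_region_end - offset
    return None


# Hypothetical ids: positions 1..500 of region 24 share sequence with 1..500 of region 23.
rows = [AssemblyException(24, 1, 500, "PAR", 23, 1, 500, 1)]
print(resolve_par(rows, 24, 42))   # (23, 42)
print(resolve_par(rows, 24, 501))  # None
```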

Column	Type	Default value	Description	Index
coord_system_id	INT(10)	-	Primary key, internal identifier.	primary key
species_id	INT(10)	1	Identifies the species for multi-species databases.	unique key: rank_idx unique key: name_idx key: species_idx
name	VARCHAR(40)	-	Coordinate system name, e.g. 'chromosome', 'contig', 'scaffold' etc.	unique key: name_idx
version	VARCHAR(255)	NULL	Assembly version.	unique key: name_idx
rank	INT	-	Coordinate system rank.	unique key: rank_idx
attrib	SET: default_version sequence_level	NULL	Coordinate system attributes (e.g. "default_version", "sequence_level").
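A common lookup on this table is "which coordinate systems are the default versions for a species, from highest to lowest level". The sketch below uses an in-memory SQLite table as a stand-in for the MySQL one. It assumes that attrib is read as comma-separated text (as MySQL reports SET values) and, as is conventional, that lower rank values denote higher-level systems; the function name and the sample rows are hypothetical.

```python
import sqlite3
from typing import List, Optional, Tuple


def default_coord_systems(conn: sqlite3.Connection,
                          species_id: int = 1) -> List[Tuple[str, Optional[str], int]]:
    """Return (name, version, rank) of default-version coordinate systems, by rank."""
    sql = """
        SELECT name, version, "rank"
        FROM coord_system
        WHERE species_id = ?
          -- match the whole SET member, not a substring of another member
          AND instr(',' || IFNULL(attrib, '') || ',', ',default_version,') > 0
        ORDER BY "rank"
    """
    return conn.execute(sql, (species_id,)).fetchall()


# Hypothetical in-memory stand-in for the MySQL table (types simplified for SQLite).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE coord_system (
        coord_system_id INTEGER PRIMARY KEY,
        species_id INTEGER NOT NULL DEFAULT 1,
        name TEXT NOT NULL,
        version TEXT,
        "rank" INTEGER NOT NULL,
        attrib TEXT
    )
""")
conn.executemany(
    'INSERT INTO coord_system (species_id, name, version, "rank", attrib) VALUES (?, ?, ?, ?, ?)',
    [
        (1, "chromosome", "GRCh38", 1, "default_version"),
        (1, "contig", None, 4, "default_version,sequence_level"),
        (1, "chromosome", "GRCh37", 5, None),
    ],
)
print(default_coord_systems(conn))
# [('chromosome', 'GRCh38', 1), ('contig', None, 4)]
```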

Column	Type	Default value	Description	Index
data_file_id	INT(10)	-	Auto-increment surrogate primary key	primary key
coord_system_id	INT(10)	-	Coordinate system this file is linked to. Used to decipher the assembly version it was mapped to	unique key: df_unq_idx
analysis_id	SMALLINT	-	Analysis this file is linked to	unique key: df_unq_idx key: df_analysis_idx
name	VARCHAR(100)	-	Name of the file	unique key: df_unq_idx key: df_name_idx
version_lock	TINYINT(1)	0	Indicates that this file is only compatible with the current Ensembl release version
absolute	TINYINT(1)	0	Flags that the URL given is fully resolved and should be used without question
url	TEXT	NULL	Optional path to the file (can be absolute or relative)
file_type	ENUM: BAM BAMCOV BIGBED BIGWIG VCF	NULL	Type of file, e.g. BAM, BAMCOV, BIGBED, BIGWIG or VCF	unique key: df_unq_idx

Column	Type	Default value	Description	Index
seq_region_id	INT(10)	-	Primary key, internal identifier. Foreign key references to the seq_region table.	primary key
sequence	LONGTEXT	-	DNA sequence.
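The sequence column holds the DNA of a seq_region as one string. Assuming the 1-based, inclusive start/end coordinates and the 1/-1 strand convention used by the feature tables in this schema, a slice can be extracted as in this sketch; subseq is a hypothetical helper, and IUPAC ambiguity codes other than N are not handled.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def subseq(sequence: str, start: int, end: int, strand: int = 1) -> str:
    """Return bases start..end (1-based, inclusive) of a dna.sequence value."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("coordinates are outside the stored sequence")
    piece = sequence[start - 1:end]  # Python slices are 0-based and end-exclusive
    if strand == -1:
        # Reverse strand: complement each base, then reverse the string.
        return piece.translate(_COMPLEMENT)[::-1]
    return piece


print(subseq("ACGTTGCA", 2, 4))      # CGT
print(subseq("ACGTTGCA", 2, 4, -1))  # ACG
```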

Column	Type	Default value	Description	Index
genome_statistics_id	INT(10)	-	Primary key, internal identifier.	primary key
statistic	VARCHAR(128)	-	Name of the statistic	unique key: stats_uniq
value	BIGINT(11)	'0'	Value of the statistic (a count or a length)
species_id	INT	1	Identifies the species for multi-species databases.	unique key: stats_uniq
attrib_type_id	INT(10)	NULL	To distinguish similar statistics for different cases	unique key: stats_uniq
timestamp	DATETIME	NULL	Date the statistic was generated

Column	Type	Default value	Description	Index
karyotype_id	INT(10)	-	Primary key, internal identifier.	primary key
seq_region_id	INT(10)	-	Foreign key references to the seq_region table.	key: region_band_idx
seq_region_start	INT(10)	-	Sequence start position.
seq_region_end	INT(10)	-	Sequence end position.
band	VARCHAR(40)	NULL	Band.	key: region_band_idx
stain	VARCHAR(40)	NULL	Stain.

Column	Type	Default value	Description	Index
meta_id	INT	-	Primary key, internal identifier.	primary key
species_id	INT	1	Identifies the species for multi-species databases.	unique key: species_key_value_idx key: species_value_idx
meta_key	VARCHAR(64)	-	Name of the meta entry, e.g. "schema_version".	unique key: species_key_value_idx
meta_value	VARCHAR(255)	-	Corresponding value of the key, e.g. "61".	unique key: species_key_value_idx key: species_value_idx
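Because the unique key covers species_id, meta_key and meta_value together, one key can hold several values for the same species, so a lookup should return a list. A minimal sketch with SQLite standing in for MySQL; the function name, the sample rows and the use of a NULL species_id for database-wide keys are illustrative assumptions.

```python
import sqlite3
from typing import List, Optional


def get_meta_values(conn: sqlite3.Connection, meta_key: str,
                    species_id: Optional[int] = 1) -> List[str]:
    """Return every meta_value stored for meta_key, in meta_id order."""
    rows = conn.execute(
        # "IS" (not "=") so that species_id=None matches rows whose species_id is NULL.
        "SELECT meta_value FROM meta WHERE species_id IS ? AND meta_key = ? ORDER BY meta_id",
        (species_id, meta_key),
    ).fetchall()
    return [value for (value,) in rows]


# Hypothetical in-memory stand-in for the MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE meta (meta_id INTEGER PRIMARY KEY, species_id INTEGER DEFAULT 1, "
             "meta_key TEXT NOT NULL, meta_value TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO meta (species_id, meta_key, meta_value) VALUES (?, ?, ?)",
    [(None, "schema_version", "61"),
     (1, "species.alias", "human"),
     (1, "species.alias", "homo sapiens")],
)
print(get_meta_values(conn, "schema_version", species_id=None))  # ['61']
print(get_meta_values(conn, "species.alias"))  # ['human', 'homo sapiens']
```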

Column	Type	Default value	Description	Index
table_name	VARCHAR(40)	-	Ensembl database table name.	unique key: cs_table_name_idx
coord_system_id	INT(10)	-	Foreign key references to the coord_system table.	unique key: cs_table_name_idx
max_length	INT	NULL	Length of the longest feature in this table for this coordinate system.
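One way a maximum feature length can be used is to bound overlap queries: if no feature is longer than max_length, any feature overlapping [start, end] must begin at or after start - max_length + 1, so both ends of the seq_region_start range are bounded. A sketch assuming 1-based, inclusive coordinates; overlap_query is a hypothetical helper, and the SQLite table and rows stand in for a real feature table such as simple_feature.

```python
import sqlite3
from typing import Tuple


def overlap_query(table: str, seq_region_id: int, start: int, end: int,
                  max_length: int) -> Tuple[str, tuple]:
    """Build SQL for features overlapping [start, end], bounded by max_length."""
    # The table name is interpolated, so pass only trusted, hard-coded names.
    sql = (f"SELECT seq_region_start, seq_region_end FROM {table} "
           "WHERE seq_region_id = ? "
           "AND seq_region_start BETWEEN ? AND ? "  # both bounds can use the start index
           "AND seq_region_end >= ? "
           "ORDER BY seq_region_start")
    return sql, (seq_region_id, start - max_length + 1, end, start)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE simple_feature (seq_region_id INTEGER, "
             "seq_region_start INTEGER, seq_region_end INTEGER)")
conn.executemany("INSERT INTO simple_feature VALUES (?, ?, ?)",
                 [(1, 100, 199), (1, 150, 160), (1, 500, 600), (2, 100, 199)])
max_length = 101  # longest feature above: 600 - 500 + 1
sql, params = overlap_query("simple_feature", 1, 190, 510, max_length)
print(conn.execute(sql, params).fetchall())  # [(100, 199), (500, 600)]
```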

Column	Type	Default value	Description	Index
seq_region_synonym_id	INT	-	Primary key, internal identifier.	primary key
seq_region_id	INT(10)	-	Foreign key references to the seq_region table.	unique key: syn_idx key: seq_region_idx
synonym	VARCHAR(250)	-	Alternative name for sequence region.	unique key: syn_idx
external_db_id	INT	NULL	Foreign key references to the external_db table.

Column	Type	Default value	Description	Index
associated_group_id	INT(10)	-	Associated group id. Primary key, internal identifier	primary key
description	VARCHAR(128)	NULL	Optional description for this group

Column	Type	Default value	Description	Index
biotype_id	INT	-	Primary key, internal identifier.	primary key
name	VARCHAR(64)	-	Ensembl biotype name.	unique key: name_type_idx
object_type	ENUM: gene transcript	'gene'	Ensembl object type: 'gene' or 'transcript'.	unique key: name_type_idx
db_type	SET: cdna core coreexpressionatlas coreexpressionest coreexpressiongnf funcgen otherfeatures rnaseq variation vega presite sangervega	'core'	Type, e.g. 'cdna', 'core', 'coreexpressionatlas', 'coreexpressionest', 'coreexpressiongnf', 'funcgen', 'otherfeatures', 'rnaseq', 'variation', 'vega', 'presite', 'sangervega'
attrib_type_id	INT	NULL	Foreign key references to the attrib_type table.
description	TEXT	NULL	Description.
biotype_group	ENUM: coding pseudogene snoncoding lnoncoding mnoncoding LRG undefined no_group	NULL	Group, e.g. 'coding', 'pseudogene', 'snoncoding', 'lnoncoding', 'mnoncoding', 'LRG', 'undefined', 'no_group'
so_acc	VARCHAR(64)	NULL	Sequence Ontology accession of the biotype.
so_term	VARCHAR(1023)	NULL	Sequence Ontology term of the biotype.

Column	Type	Default value	Description	Index
external_db_id	INT	-	Primary key, internal identifier.	primary key
db_name	VARCHAR(100)	-	Database name.	unique key: db_name_db_release_idx
db_release	VARCHAR(255)	NULL	Database release.	unique key: db_name_db_release_idx
status	ENUM: KNOWNXREF KNOWN XREF PRED ORTH PSEUDO	-	Status, e.g. 'KNOWNXREF','KNOWN','XREF','PRED','ORTH','PSEUDO'.
priority	INT	-	Determines which one of the xrefs will be used as the gene name.
db_display_name	VARCHAR(255)	NULL	Database display name.
type	ENUM: ARRAY ALT_TRANS ALT_GENE MISC LIT PRIMARY_DB_SYNONYM ENSEMBL	-	Type, e.g. 'ARRAY', 'ALT_TRANS', 'ALT_GENE', 'MISC', 'LIT', 'PRIMARY_DB_SYNONYM', 'ENSEMBL'.
secondary_db_name	VARCHAR(255)	NULL	Secondary database name.
secondary_db_table	VARCHAR(255)	NULL	Secondary database table.
description	TEXT	NULL	Description.

Column	Type	Default value	Description	Index
associated_xref_id	INT(10)	-	Associated xref id. Primary key, internal identifier	primary key
object_xref_id	INT(10)	'0'	Object xref id this associated xref is linked to. Foreign key linked to the object_xref table	key: associated_object_idx unique key: object_associated_source_type_idx
xref_id	INT(10)	'0'	Xref which is the associated term. Foreign key linked to the xref table	key: associated_idx unique key: object_associated_source_type_idx
source_xref_id	INT(10)	NULL	Xref which is source of this association. Foreign key linked to the xref table	key: associated_source_idx unique key: object_associated_source_type_idx
condition_type	VARCHAR(128)	NULL	The type of condition this link occurs in e.g. evidence, from, residue or assigned_by	unique key: object_associated_source_type_idx
associated_group_id	INT(10)	NULL	Foreign key references to the associated_group table.	key: associated_group_idx unique key: object_associated_source_type_idx
rank	INT(10)	'0'	The rank in which the association occurs within an associated_group
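Rows that share an object_xref_id and an associated_group_id form one combined association, with rank giving their order inside the group. A sketch of collecting them in Python; the record type covers only a subset of the columns, and all ids and condition types in the sample are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional, Tuple


class AssociatedXref(NamedTuple):
    """Subset of the associated_xref columns (hypothetical helper type)."""
    object_xref_id: int
    xref_id: int
    condition_type: Optional[str]
    associated_group_id: Optional[int]
    rank: int


def group_associations(
    rows: List[AssociatedXref],
) -> Dict[Tuple[int, Optional[int]], List[AssociatedXref]]:
    """Group rows by (object_xref_id, associated_group_id), each group sorted by rank."""
    groups: Dict[Tuple[int, Optional[int]], List[AssociatedXref]] = defaultdict(list)
    for row in rows:
        groups[(row.object_xref_id, row.associated_group_id)].append(row)
    for members in groups.values():
        members.sort(key=lambda r: r.rank)
    return dict(groups)


rows = [
    AssociatedXref(object_xref_id=5, xref_id=11, condition_type="evidence",
                   associated_group_id=7, rank=2),
    AssociatedXref(object_xref_id=5, xref_id=10, condition_type="assigned_by",
                   associated_group_id=7, rank=1),
]
for key, members in group_associations(rows).items():
    print(key, [m.xref_id for m in members])  # (5, 7) [10, 11]
```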

Column	Type	Default value	Description	Index
interpro_ac	VARCHAR(40)	-	InterPro protein accession number.	unique key: accession_idx
id	VARCHAR(40)	-	InterPro protein id.	unique key: accession_idx key: id_idx

Column	Type	Default value	Description	Index
object_xref_id	INT(10)	'0'	Composite key. Foreign key references to the object_xref table.	key: object_idx unique key: object_source_type_idx
source_xref_id	INT(10)	NULL	Composite key. Foreign key references to the xref table.	key: source_idx unique key: object_source_type_idx
linkage_type	VARCHAR(3)	NULL	Composite key. Evidence tags	unique key: object_source_type_idx

Column	Type	Default value	Description	Index
unmapped_object_id	INT(10)	-	Primary key, internal identifier.	primary key
type	ENUM: xref cDNA Marker	-	Object type: 'xref', 'cDNA', 'Marker'.
analysis_id	SMALLINT	-	Foreign key references to the analysis table.	key: anal_exdb_idx
external_db_id	INT	NULL	Foreign key references to the external_db table.	unique key: unique_unmapped_obj_idx key: anal_exdb_idx key: ext_db_identifier_idx
identifier	VARCHAR(255)	-	External database identifier.	unique key: unique_unmapped_obj_idx key: id_idx key: ext_db_identifier_idx
unmapped_reason_id	INT(10)	-	Foreign key references to the unmapped_reason table.	unique key: unique_unmapped_obj_idx
query_score	DOUBLE	NULL	Actual mapping query score.
target_score	DOUBLE	NULL	Target mapping query score.
ensembl_id	INT(10)	'0'	Foreign key references to the seq_region, transcript, gene or translation table, depending on ensembl_object_type.	unique key: unique_unmapped_obj_idx
ensembl_object_type	ENUM: RawContig Transcript Gene Translation	'RawContig'	Ensembl object type: 'RawContig', 'Transcript', 'Gene','Translation'.	unique key: unique_unmapped_obj_idx
parent	VARCHAR(255)	NULL	Foreign key references to the dependent_xref table, in case the unmapped object is dependent on a primary external reference which wasn't mapped to an ensembl one.	unique key: unique_unmapped_obj_idx

Column	Type	Default value	Description	Index
density_feature_id	INT(10)	-	Primary key, internal identifier.	primary key
density_type_id	INT(10)	-	Density type. Foreign key references to the density_type table.	key: seq_region_idx
seq_region_id	INT(10)	-	Sequence region. Foreign key references to the seq_region table.	key: seq_region_idx key: seq_region_id_idx
seq_region_start	INT(10)	-	Sequence start position.	key: seq_region_idx
seq_region_end	INT(10)	-	Sequence end position.
density_value	FLOAT	-	Density value.

Column	Type	Default value	Description	Index
ditag_id	INT(10)	-	Primary key, internal identifier.	primary key
name	VARCHAR(30)	-	Ditag name.
type	VARCHAR(30)	-	Ditag type.
tag_count	SMALLINT(6)	1	Tag count.
sequence	TINYTEXT	-	Sequence.

Column	Type	Default value	Description	Index
ditag_feature_id	INT(10)	-	Primary key, internal identifier.	primary key
ditag_id	INT(10)	'0'	Foreign key references to the ditag table.	key: ditag_idx
ditag_pair_id	INT(10)	'0'	Ditag pair id.	key: ditag_pair_idx
seq_region_id	INT(10)	'0'	Foreign key references to the seq_region table.	key: seq_region_idx
seq_region_start	INT(10)	'0'	Sequence start position.	key: seq_region_idx
seq_region_end	INT(10)	'0'	Sequence end position.	key: seq_region_idx
seq_region_strand	TINYINT(1)	'0'	Sequence region strand: 1 - forward; -1 - reverse.
analysis_id	SMALLINT	'0'	Foreign key references to the analysis table.
hit_start	INT(10)	'0'	Alignment hit start position.
hit_end	INT(10)	'0'	Alignment hit end position.
hit_strand	TINYINT(1)	'0'	Alignment hit strand: 1 - forward; -1 - reverse.
cigar_line	TINYTEXT	-	Used to encode gapped alignments.
ditag_side	ENUM: F L R	-	Ditag side: L - start; R - end; F - 5' tag only.
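The description only says that cigar_line encodes gapped alignments. A minimal parser, assuming the common convention of an optional run length followed by one of the operation letters M (match), I (insertion) or D (deletion), where an omitted length means 1; parse_cigar is a hypothetical helper, and how I and D map onto genome versus hit is not assumed here.

```python
import re
from typing import List, Tuple

_CIGAR_TOKEN = re.compile(r"(\d*)([MID])")


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Split a cigar_line such as '10M2I5M' into (length, operation) pairs."""
    ops: List[Tuple[int, str]] = []
    pos = 0
    while pos < len(cigar):
        m = _CIGAR_TOKEN.match(cigar, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos} in {cigar!r}")
        count = int(m.group(1)) if m.group(1) else 1  # a missing length means 1
        ops.append((count, m.group(2)))
        pos = m.end()
    return ops


print(parse_cigar("10M2I5M"))  # [(10, 'M'), (2, 'I'), (5, 'M')]
```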

Column	Type	Default value	Description	Index
intron_supporting_evidence_id	INT(10)	-	Surrogate primary key	primary key
analysis_id	SMALLINT	-	Foreign key references to the analysis table.	unique: key
seq_region_id	INT(10)	-	Foreign key references to the seq_region table.	unique: key key: seq_region_idx
seq_region_start	INT(10)	-	Sequence start position.	unique: key key: seq_region_idx
seq_region_end	INT(10)	-	Sequence end position.	unique: key
seq_region_strand	TINYINT(2)	-	Sequence region strand: 1 - forward; -1 - reverse.	unique: key
hit_name	VARCHAR(100)	-	External entity name/identifier.	unique: key
score	DECIMAL(10,3)	NULL	Score supporting the intron
score_type	ENUM: NONE DEPTH	'NONE'	The type of score: NONE or DEPTH.
is_splice_canonical	TINYINT(1)	0	Indicates if the splice junction can be considered canonical i.e. behaves according to accepted rules

Column	Type	Default value	Description	Index
map_id	INT(10)	-	Primary key, internal identifier.	primary key
map_name	VARCHAR(30)	-	Map name.

Column	Type	Default value	Description	Index
marker_id	INT(10)	-	Primary key, internal identifier.	primary key key: marker_idx
display_marker_synonym_id	INT(10)	NULL	Marker synonym used for display. Foreign key references to the marker_synonym table.	key: display_idx
left_primer	VARCHAR(100)	-	Left primer sequence.
right_primer	VARCHAR(100)	-	Right primer sequence.
min_primer_dist	INT(10)	-	Minimum primer distance.
max_primer_dist	INT(10)	-	Maximum primer distance.
priority	INT	NULL	Priority.	key: marker_idx
type	ENUM: est microsatellite	NULL	Type, e.g. 'est', 'microsatellite'.


Ensembl Core - Schema documentation

List of the tables:

Assembly Tables

External References

Features

Fundamental Tables

ID Mapping

Misc


Column	Type	Default value	Description	Index
marker_feature_id	INT(10)	-	Primary key, internal identifier.	primary key
marker_id	INT(10)	-	Foreign key references to the marker table.
seq_region_id	INT(10)	-	Foreign key references to the seq_region table.	key: seq_region_idx
seq_region_start	INT(10)	-	Sequence start position.	key: seq_region_idx
seq_region_end	INT(10)	-	Sequence end position.
analysis_id	SMALLINT	-	Foreign key references to the analysis table.	key: analysis_idx
map_weight	INT(10)	NULL	The number of times that this marker has been mapped to the genome, e.g. a marker with map weight 3 has been mapped to 3 locations in the genome.

Column	Type	Default value	Description	Index
marker_synonym_id	INT(10)	-	Primary key, internal identifier.	primary key key: marker_synonym_idx
marker_id	INT(10)	-	Foreign key references to the marker table.	key: marker_idx
source	VARCHAR(20)	NULL	Marker source.
name	VARCHAR(50)	NULL	Alternative name for marker.	key: marker_synonym_idx

Column	Type	Default value	Description	Index
misc_feature_id	INT(10)	'0'	Foreign key references to the misc_feature table.	key: misc_feature_idx unique key: misc_attribx
attrib_type_id	SMALLINT(5)	'0'	Foreign key references to the attrib_type table.	key: type_val_idx unique key: misc_attribx
value	TEXT	-	Attribute value.	key: type_val_idx key: val_only_idx unique key: misc_attribx

Column	Type	Default value	Description	Index
misc_set_id	SMALLINT(5)	-	Primary key, internal identifier.	primary key
code	VARCHAR(25)	''	Set code, e.g. bac_map	unique key: code_idx
name	VARCHAR(255)	''	Set name, e.g. BAC map
description	TEXT	-	Set description, e.g. Full list of mapped BAC clones
max_length	INT	-	Length of the longest feature in the set, e.g. 500000

Column	Type	Default value	Description	Index
prediction_exon_id	INT(10)	-	Primary key, internal identifier.	primary key
prediction_transcript_id	INT(10)	-	Foreign key references to the prediction_transcript table.	key: transcript_idx
exon_rank	SMALLINT	-	Exon rank
seq_region_id	INT(10)	-	Foreign key references to the seq_region table.	key: seq_region_idx
seq_region_start	INT(10)	-	Sequence start position.	key: seq_region_idx
seq_region_end	INT(10)	-	Sequence end position.
seq_region_strand	TINYINT	-	Sequence region strand: 1 - forward; -1 - reverse.
start_phase	TINYINT	-	Exon start phase.
score	DOUBLE	NULL	Prediction score.
p_value	DOUBLE	NULL	Prediction p-value.

Column	Type	Default value	Description	Index
repeat_consensus_id	INT(10)	-	Primary key, internal identifier.	primary key
repeat_name	VARCHAR(255)	-	Repeat name.	key: name
repeat_class	VARCHAR(100)	-	E.g. 'Satellite', 'tRNA', 'LTR'.	key: class
repeat_type	VARCHAR(40)	-	E.g. 'Satellite repeats', 'Tandem repeats', 'Low complexity regions'.	key: type
repeat_consensus	TEXT	NULL	Repeat consensus sequence.	key: consensus

Column	Type	Default value	Description	Index
repeat_feature_id	INT(10)	-	Primary key, internal identifier.	primary key
seq_region_id	INT(10)	-	Foreign key references to the seq_region table.	key: seq_region_idx
seq_region_start	INT(10)	-	Sequence start position.	key: seq_region_idx
seq_region_end	INT(10)	-	Sequence end position.
seq_region_strand	TINYINT(1)	'1'	Sequence region strand: 1 - forward; -1 - reverse.
repeat_start	INT(10)	-	Repeat sequence start.
repeat_end	INT(10)	-	Repeat sequence end.
repeat_consensus_id	INT(10)	-	Foreign key references to the repeat_consensus table.	key: repeat_idx
analysis_id	SMALLINT	-	Foreign key references to the analysis table.	key: analysis_idx
score	DOUBLE	NULL	Analysis score.

Column	Type	Default value	Description	Index
simple_feature_id	INT(10)	-	Primary key, internal identifier.	primary key
seq_region_id	INT(10)	-	Foreign key references to the seq_region table.	key: seq_region_idx
seq_region_start	INT(10)	-	Sequence start position.	key: seq_region_idx
seq_region_end	INT(10)	-	Sequence end position.
seq_region_strand	TINYINT(1)	-	Sequence region strand: 1 - forward; -1 - reverse.
display_label	VARCHAR(255)	-	Display label for the EnsEMBL web site.	key: hit_idx
analysis_id	SMALLINT	-	Foreign key references to the analysis table.	key: analysis_idx
score	DOUBLE	NULL	Analysis score.

Column	Type	Default value	Description	Index
intron_supporting_evidence_id	INT(10)	-	Foreign key references to the intron_supporting_evidence table	primary key
transcript_id	INT(10)	-	Foreign key references to the transcript table.	primary key key: transcript_idx
previous_exon_id	INT(10)	-	Foreign key to exon indicating the left-hand flanking exon of the intron (assuming the forward strand)
next_exon_id	INT(10)	-	Foreign key to exon indicating the right-hand flanking exon of the intron (assuming the forward strand)
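Given the two flanking exons, the intron they enclose can be derived from their coordinates and, for instance, compared with the seq_region_start and seq_region_end of the linked intron_supporting_evidence row. A sketch assuming 1-based, inclusive coordinates and that previous_exon is the lower-coordinate exon, as the forward-strand convention above implies; Exon and intron_span are hypothetical helpers.

```python
from typing import NamedTuple, Tuple


class Exon(NamedTuple):
    """Minimal exon coordinates (hypothetical helper type; see the exon table)."""
    exon_id: int
    seq_region_start: int
    seq_region_end: int


def intron_span(previous_exon: Exon, next_exon: Exon) -> Tuple[int, int]:
    """Return the 1-based, inclusive (start, end) of the intron between two exons."""
    # previous_exon is the left-hand (lower-coordinate) exon, as on the forward strand.
    start = previous_exon.seq_region_end + 1
    end = next_exon.seq_region_start - 1
    if start > end:
        raise ValueError("the exons overlap or abut, so there is no intron between them")
    return start, end


print(intron_span(Exon(1, 100, 200), Exon(2, 301, 400)))  # (201, 300)
```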

Column	Type	Default value	Description	Index
alt_allele_id	INT	NULL	Primary key, internal identifier.	primary key
gene_id	INT	-	Foreign key references to the gene table.	key: gene_id,alt_allele_group_id key: gene_idx
alt_allele_group_id	INT	-	A group ID to show which alleles are related	key: gene_id,alt_allele_group_id
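Since alt_allele_group_id groups related alleles, listing the alternative alleles of a gene amounts to grouping rows by that column. A sketch over hypothetical (gene_id, alt_allele_group_id) pairs read from this table; related_alleles is a hypothetical helper that assumes each gene belongs to at most one group.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def related_alleles(pairs: Iterable[Tuple[int, int]]) -> Dict[int, Set[int]]:
    """Map each gene_id to the other gene_ids in its alt_allele_group_id."""
    pairs = list(pairs)
    groups: Dict[int, Set[int]] = defaultdict(set)
    for gene_id, group_id in pairs:
        groups[group_id].add(gene_id)
    return {gene_id: groups[group_id] - {gene_id} for gene_id, group_id in pairs}


print(related_alleles([(1, 10), (2, 10), (3, 10), (4, 11)]))
# {1: {2, 3}, 2: {1, 3}, 3: {1, 2}, 4: set()}
```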