TDM Gene Management System 6.0
From ICISWiki
Back to Main Page > ICIS 6.0 Home Page > ICIS 6.0 Models & Schemata
MANAGEMENT OF GENETIC DATA
Acknowledgment
Several sections of this page simply cross reference their counterparts on the ICIS 5.4 GEMS narratives. The ICIS 6.0 GEMS schema, however, is modified and follows more closely the nomenclature and architecture of the Generation Challenge Programme Demeter domain model for genotyping data.
Introduction
See ICIS 5.4 GEMS Narrative Introduction.
Molecular Data
See ICIS 5.4 GEMS Narrative Molecular Data.
Molecular Variants
See ICIS 5.4 GEMS Narrative Molecular Variants.
Derived Data
See ICIS 5.4 GEMS Narrative Derived Data.
Genotype Data Model
The GEMS extended provides for scoring genotyping data (in addition to DMS), either as a data warehouse for data held in DMS or for large datasets held only in GEMS. This concept is the genoptype data point, which has the following attributes:
- Identifier - Unique identifier of the data point score.
- Observation Unit(s) - Optional observation unit (or observation unit if more than one 'Detected Variantwas record in the same observation unit) from which the data point was extracted, in known
- Study - The study in which the data point was measured.
- Dataset - The study effect representation in which where the data is to be held. See below.
- Sample - Optional identifier of the DNA sample, if not defined it was assumed that the sample was not recorded independently from the germplasm entry. If there is only one entry per germplasn entry (gid) in a specific study then the gid of originating germplasm can be used as the sample id, otherwise it must be null or a unique identifier for the sample and not the originating germplasm (gid).
- Individual - Optional individual (or plant) number using consecutive numbering scheme starting at 1 and ending at n, where n is the number of individuals genotyped for a specific germplasm entry (gid) in the study. Must be included if more than one individual is genotyped in specific study for a specific germplasm entry (gid).
- Germplasm - The germplasm entry measured in this data point.
- Genome Feature Detector - The genome feature detector (marker) used in this data point. This attribute is redundant since it is possible to determine the Genome Feature Detector from the Detected Variant, however it is included here to speed up queries by Genome Feature Detector.
- Detected Variant Number - An index of the Detected Variant, which is only used if one database is used for each Detected Variant.
- Detected Variant(s) - The Detected Variant or Detected Variants measured or found by the Genome Feature Detector for this data point
- Incidence - The indicates the frequency of that the Detected Variant in the sample relative to other alleles. In the case of dominant markers or co-dominant markers using a genotyping platform where it is not possible to determine the relative frequency of a Detected Variant for a given Genome Feature Detector, this incidence would be '1' indicating the presence of only this allele or '0' indicating the absence of the allele. Where the genotyping platform in use can determine the relative frequency of the allele present in the sample, the incidence will be the allele frequency with '1' indicating the presence of only this allele in homozygote form '0.5' indicating the presence of only this allele in hetrozygote form or '0' indicating the absence of the allele. In the case of bulked DNA samples taken from segregating or open pollinated variety it may be possible to have a incidence (relative frequency) ranging from 0 to 1, inclusive. In all cases a negative value indicates that the genotype was measured in the study, but no incidence could be determined. Different negative numbers could be used to denote different types of this 'missing' data, these would be recorded in the study metadata. However -1 should be reserved for 'missing' data due to a unknown problem. A Null value indicates that the genotype was not measured in this study and is used in some circumstances to 'pad' the data so that it is a complete matrix of data.
Gene Management System (GEMS) and Data Management System (DMS)
Whether the genotyping data is entered directly into GEMS or or is copied into GEMS from DMS, a new Representation in DMS must be added. This Representation provides details on which of the concepts above are recorded in the dataset. If the dataset was previously loaded into DMS there will be two Representations, one for the original workbook data and one for the newly added Representation. The type of the new added Representation need to be decided. The following factors need to be record. Some are optional and will only be present if the cooresponding data is available in the gems_genotyping_scores table
| Factor or Variant Name | Trait | Scale | Method | Mandatory | Factor/Variate |
| Observation Unit 1 | | | | No1 | |
| Observation Unit 2 | | | | No2 | |
| Sample ID3 | GERMPLASM IDENTIFICATION(251) | GID(91) | DNA SAMPLE(495) | No | Factor |
| Individual | GERMPLASM IDENTIFICATION(251) | Entry Number(103) | Not Specified (?) | No4 | Factor |
| Germplasm ID | GERMPLASM IDENTIFICATION(251) | GID(91) | Seed Source(92) | Yes | Factor |
| Genomic Feature Assay ID | MARKER_DETECTOR(503) | MD_ID(516) | GEMSID(497) | Yes | Factor |
| Detected Variant Label | MOLECULAR_VARIANT(501) | ALLELE LABEL (?) | Not Specified (?) | No | Variate |
| Detected Variant Number | MOLECULAR_VARIANT(501) | ALLELE NUMBER(504) | Count(521) | No5 | Factor |
| Detected Variant 1 | MOLECULAR_VARIANT(501) | MVID(503) | First Variant (?) | Yes | Variate/ Factor8 |
| Detected Variant 2 | MOLECULAR_VARIANT(501) | MVID(503) | Second Variant (?) | No6 | Variate |
| Incidence9 | MOLECULAR_VARIANT(501) | PRESENT/ABSENT(508) | Not Specified(508) | No7 | Variate |
| Frequency9 | MOLECULAR_VARIANT(501) | FREQUENCY (513) | Not Specified(519) | No | Variate |
| Quality | CONFIDENCE OF SCORE | RATIO (?) | Not Specified (?) | No | Variate |
1Required if data transferred from DMS.
2Required if data transferred from DMS and data points comprises more than one observation unit.
3Usually only recorded in a local database.
4Required if more than one individual with the same gid is recorded and no separate sample id is recorded.
5Required if more than row is used to record Detected Variants
6Required if co-dominant Detector is used and only one row is used to record Detected Variants
7Required if dominant Detector is used or if more than one row is used to record Detected Variants
8Concidered a Factor if Incidence is recorded
9Incidence and Frequence and mutually exclusive
The Gene Management System Table Structure
Generation Challenge Programme Domain Models
The ICIS 6.0 schema is largely inspired by the structure and class nomenclature of the Generation Challenge Programme Genotype and Genomic Mapping domain models. These models (as Unified Modeling Language diagrams, July 2007 Release 1.8) are replicated here below.
Figure 1. GCP Genotype Model
Figure 2. GCP Genomic Mapping Model
GEMS uses the following table structure to store details on genotype domain model objects (table names and attributes follow the Generation Challenge Programme convention).
Figure 3. ICIS GEMS 6.0 Schema
Notation used:
Table Relationships
The central table of GEMS 6.0 schema is the Entity table (gems_entity) which is the superclass which records all generic information about any data object in the GEMS. The class_id field of this table link to the gms_class table that each major class, sub class or type of actual GEMS entity: LinkageMap, GenomicFeature, etc.
Table Definitions
Generic Entity Tables
GEMS_ENTITY
The GEMS_ENTITY table contains one entry for every major entity object managed by GEMS and keeps track of all generic information about that Entity. In this way the GEMS_ENTITY table is in most cases a direct implementation of the GCP Entity class. In other cases the sub-tables of the GEMS_ENTITY table may represent, at least SimpleFeature objects, which have a specific type or class.
The important rule to note is that although the GEMS_ENTITY table autoincrements its primary keys, any and all subclassed objects (and corresponding table entries) share the identical primary key to the GEMS_ENTITY to which they correspond (e.g. a given entry in the GEMS_LINKAGE_MAP table will have the same primary key value as its corresponding entry in the GEMS_ENTITY table). All the tables below, except the linkage or data tables defined last 'extend' this table. Please note the class_id field is duplicated in all the sub-tables for entity. In some DBDS, this duplication can be performed through inheritance.
The GEMS_ENTITY table has relations with several other tables that provide for the identification, citations and annotation (see the GCP domain model for Entity) that describe the given GEMS_ENTITY entry.
Table 1. GEMS_ENTITY table
| Fieldname | Description | Type |
|---|---|---|
| gentity_id | Long, primary key, autoincrement | Unique ID for the Entity |
| class_id | Long, foreign key to the GMS_CLASS table | The class, sub-class or type of the GEMS Entity |
| description | varchar(255) | Description of the specific GEMS Entity |
GMS_CLASS (move to GMS)
The GMS_CLASS stores a list of various non-overlapping classes, sub classes and types for all ICIS Objects, largely described within the GCP Domain Model for Genotyping, including the following:
- GenomicFeature: a specific segment (subsequence) of DNA in the genome (specified Germplasm) or generally within a class of organism(i.e. species, e.g. Oryza sativa) of interest. A GenomicFeature may (or may not) have a known location ("locus") on specified maps, a known content ("genotype") in specified Germplasm and/or a specified biological/technical biological (e.g. "gene") or technical (e.g. "SSR marker") function of interest.
- LinkageMap: a genetic or physical map of genomic loci
- Locus: experimentally observed assignment ("MapFeature") of a specified GenomicFeature to a specific LinkageMap
- GenomicFeatureDetector: a specific biological reagent and/or information (e.g. PCR primers) that detect a specific subset of possible GenomicFeatures in a genome.
- GenomicFeatureDetectionProtocol: non-biologically specific protocol for using a GenomicFeatureDetector in an experimental assay.
- GenomicFeatureAssay: the potentially many-to-many join of GenomicFeatureDetector with available GenomicFeatureDetectionProtocols
- DetectedVariant: one specified instance of GenomicFeature content ("allele") detected by a GenomicFeatureDetector. If the variant can be analytically interpreted to belong to a specified GenomicFeature, then it is considered in GCP domain model terms to be a GenomicFeatureVariant
The ontology_id and the root_term_id fields are corresponding GCP Ontologies ID (e.g. C0_500) and Term ID (e.g. 0000017 for protein based marker) from the GCP Crop Ontology Project.The parent_id gives the options to define sub-clases of classes. This is used in two possible ways. First the sub-class may define a smaller sub-tree of the ontology tree defined by the parent class. Second, the sub class may be defined in a separate ontolgy from the parent class. This second option in effect makes linkages between different ontologies and should only be used a short term solution to a fully defined single ontology.
Table x. GEMS_CLASS table
| Fieldname | Description | Type |
|---|---|---|
| class_id | Long, primary key | Unique ID of the class |
| class_name | varchar(50), not null | Name of the class |
| ontology_id | Long, foreign key to the OMS | If known, the ontology ID of the ontology which defines are members of this class |
| root_term_id | Long, foreign key to the OMS | If known, the term ID which defines the Ontology subtree for all valid members |
| parent_id | Long, foreign key to the class_id | If the class if in fact a sub-class, the parent_id gives a link to its parent ID. In this case ontology_id and term_id are not defined |
GEMS_NAMES
GEMS_NAMES table serves as a catalogue of alternative names for example for genomic feature detectors and molecular variants. gnames_id is the unique identifier of a specific name and gentity_id is the unique ID of GEMS_ENTITY to which the name belongs. class_id is the 'class' or type of names to which the alternative name belong. gnlocn is location where Name was assigned (more information please)
Note: gnstat from GEMS 5.5 was deleted since only alternative names (gnstat=1) is stored in gems_names Ex.
GNAMES_ID = 2715 GENTITY_ID = 1200 GNVAL = RM105_143 ... this entry indicates that "RM105_143" is a name with identified by unique id = 2715 in the GEMS_NAMES table, that is a name of the entity identified by the foreign key unique id = 1200 pointing to the GEMS_Entity table.
Table 2. GEMS_NAMES table
| Fieldname | Description | Type |
|---|---|---|
| gnames_id | Long, primary key | Unique ID of the Name |
| gentity_id | Long, not null | Unique ID of GEMS_ENTITY to which the name belongs |
| class_id | Long, foreign key to the GMS_CLASS table, not null | Class of Name |
| term_id | Long, foreign key to OMS | If the name type belongs to a specific ontology, then the ontology term is given here. The term must be a sub-term (is a relationship) of the root term defined in the class. |
| gn_value | Varchar(255), not null | Name Value |
| gn_location_id | Long | Location where Name was assigned |
| gevidence_id | Long | Unique ID of GEMS_EVIDENCE documenting the support of this name assignment |
GEMS_ANNOTATION
The GEMS_ANNOTATION table contains the values of GEMS_ENTITY feature annotation (i.e. GCP Feature data attached to the Entity).
Table 3. GEMS_ANNOTATION table
| Fieldname | Description | Type |
|---|---|---|
| gannotation_id | Long, primary key | Unique ID for the annotation |
| class_id | Long, foreign key to the GMS_CLASS table, not null | Unique ID for the "type" of annotation as described in the gms_class table |
| term_id | Long, foreign key to OMS | If the annotation type belongs to a specific ontology, then the ontology term is given here. The term must be a sub-term (is a relationship) of the root term defined in the class. |
| gentity_id | Long foreign key to the GEMS_ENTITY primary key, not null | Unique ID for the annotated GEMS_ENTITY |
| gfmetadata_id | Long foreign key to the GEMS_FEATURE_METADATA primary key , not null | Unique Property ID of the annotation type |
| ga_value | Varchar(255), not null | Value, encoded as a string |
| ga_value_id | Long foreign key to a pertinent value table | May points to a pertinent entry in another pertinent table, based on the value class_id, if the value cannot be adequately represented in a simple string. |
| ga_index | Long | Sequence number for a Feature value series (where the given annotation Feature value is a "vector" of values; rank 0 applies for a single valued feature) |
GEMS_FEATURE_METADATA
GEMS_FEATURE_METADATA table defines the Feature type (a.k.a. ICIS "property"), unit (a.k.a. ICIS scale) and supportingEvidence (a.k.a. ICIS method) associated with a given type of GEMS_ENTITY annotation (i.e. GEMS_ANNOTATION entry gmetadata_id foreign key).
Table 4. GEMS_FEATURE_METADATA table
| Fieldname | Description | Type |
|---|---|---|
| gfmetadata_id | Long, primary key | Unique ID for the combination of annotation Feature type, unit and supportingEvidence |
| class_id | Long, foreign key to the GMS_CLASS table, not null | Unique ID for the "type" of annotated Feature as described in the gms_class table |
| term_id | Long, foreign key to OMS | If the metadata type belongs to a specific ontology, then the ontology term is given here. The term must be a sub-term (is a relationship) of the root term defined in the class. |
| unit_id | Long, foreign key to the GMS_CLASS table, not null | Unique ID for the "unit" of the value of the annotated Feature as described in the gms_class table |
| gevidence_id | Long foreign key into GEMS_EVIDENCE table, not null | Unique ID for the GEMS_EVIDENCE associated with the Feature |
GEMS_EVIDENCE
The GEMS_EVIDENCE table documents the Evidence supporting a given GEMS_ANNOTATION assertion, including the source of the evidence (e.g. publication, study, analysis program, etc.), observation_date, score and the identity of the curator who recorded the evidence.
Table 5. GEMS_EVIDENCE table
| Fieldname | Description | Type |
|---|---|---|
| gevidence_id | Long, primary key | Unique ID for the Evidence entry |
| class_id | Long, foreign key to the GMS_CLASS table, not null | Specifies the data type of the source_id (i.e. table identification for the source_id foreign key, e.g. BIBLIOREFS, STUDY, etc) |
| term_id | Long, foreign key to OMS | If the evidence type belongs to a specific ontology, then the ontology term is given here. The term must be a sub-term (is a relationship) of the root term defined in the class. |
| userid | Long foreign key into the ICIS users table, not null | Curator of the specific evidence |
| source_id | Long, not null | Unique id of the pertinent entry in the evidence source table (e.g. BIBLIOREFS, STUDY, etc) |
| ge_date | Long, not null | Date when the Feature Evidence was asserted |
| ge_score | varchar(64) | Some source specific measurement of statistical confidence of the source assertion of the annotation Feature value |
GEMS_LIST
GEMS_LIST table stores the collection details of markers, genes or alleles sets. Only the list details are stored in this table and actual elements of the list are stored in the GEMS_LIST_ELEMENTS table. A list of lists may be represented by indicating the ID of the list it belongs to in the glist_parent column.
Ex.
glist_id = 3204 glist_name = phi100175 glist_parent = 0 glist_id = 3381 glist_name = phi1 glist_parent = 3204 glist_id = 3382 glist_name = phi2 glist_parent = 3204 The gene phi100175 list contains locus phi1 and phi2
Table 6. GEMS_LIST table
| Fieldname | Description | Type |
|---|---|---|
| glist_id | Long, primary key | Unique List ID |
| gl_name | varchar(50), not null | Name of the list |
| dataset_id | INT DEFAULT NULL | The id of the dataset (effect) to which this list is associated if part of a study |
| class_id | Long, foreign key to the GMS_CLASS table, not null | Unique ID for the "type" of list as described in the gms_class table |
| term_id | Long, foreign key to OMS | If the list type belongs to a specific ontology, then the ontology term is given here. The term must be a sub-term (is a relationship) of the root term defined in the class. |
| gl_date | Long, not null | Date the list was created (format:YYYMMDD) |
| userid | Long foreign key into the ICIS users table, not null | Curator of the list |
| gl_description | varchar(255), not null | Title or description of the list |
| gl_parent | Long | ID for the parent list |
| gl_status | Long | Status of list or folder: 0=Folder, 1=Open List, 1x = hidden, 1xx = locked, 1xxx= final, 9= Deleted |
Specific (scientific) Entity Tables
All the tables below extend the GEMS_ENTITY concept and as such the primary key for each of these tables is the same as the coordisponding gentity_id in the GEMS_ENTITY table. Please note the class_id field is duplicated in all the sub-tables for entity. In some DBDS, this duplication can be performed through inheritance.
GEMS_GENOMIC_FEATURE
The GEMS_GENOMIC_FEATURE table is redundant. All Genomic feature entities can be found in the GEMS_MAP_FEATURE table, but with the class: GenomicFeature. The Genomic Feature can have a list of Genomic Feature Variants, these can be found in the GEMS_DETECTED_VARIANT table, when the value of the gfeature_id is not null
GEMS_LINKAGE_MAP
GEMS_LINKAGE_MAP table contains information on genetic, physical, or other kinds of LinkageMap objects (see GCP genomic map domain model description).
Table 7. GEMS_LINKAGE_MAP table
| Fieldname | Description | Type |
|---|---|---|
| glmap_id | Long, primary key | Unique ID for the LinkageMap |
| glm_name | varchar(50), not null | map name |
| class_id | Long, foreign key to the GMS_CLASS table, not null | Is a controlled vocabulary designating the class of the map |
| term_id | Long, foreign key to OMS | If the map type belongs to a specific ontology, then the ontology term is given here. The term must be a sub-term (is a relationship) of the root term defined in the class. |
| glm_parent_id | Long foreign key pointing reflexively into the GEMS_LINKAGE_MAP | Unique ID for the LinkageMap within which the given LinkageMap entry resides. For most linkage groups, this is the parent chromosome map. For chromosome maps, the given genome map (set). |
GEMS_MAP_FEATURE
GEMS_MAP_FEATURE table contains information on genetic, physical, or other kinds of GCP MapFeature (see GCP genomic map domain model description). The table also hold Genomic Features, which are indicated by the class 'GenomicFeature'.
Note: This table was previously called GEMS_LOCUS.
Table 8. GEMS_MAP_FEATURE table
| Fieldname | Description | Type |
|---|---|---|
| gmfeature_id | Long, primary key | Unique ID for the a map feature |
| gmf_name | varchar(50), not null | Name of the map feature |
| class_id | Long, foreign key to OMS | If the name belongs to a specific ontology, then the ontology term is given here. The term must be a sub-term (is a relationship) of the root term defined in the class. |
| term_id | Long, foreign key to OMS | If the map feature belongs to a specific ontology, then the ontology term is given here. The term must be a sub-term (is a relationship) of the root term defined in the class. |
| glmap_id | Long foreign key to the LinkageMap, not null | Unique ID for the a LinkageMap |
| gmf_position | Position in the Map | |
| gmf_start | Double | Start position in the Map; the units are defined by the map class |
| gmf_stop | Double | Stop position in the Map; the units are defined by the map class |
GEMS_GENOMIC_FEATURE_DETECTOR
GEMS_GENOMIC_FEATURE_DETECTOR table documents the biologically specific reagents/information required to detect a specified subset of GenomicFeature entities in a genome.
Table 9. GEMS_GENOMIC_FEATURE_DETECTOR table
| Fieldname | Description | Type |
|---|---|---|
| gfdetector_id | Long, primary key | Unique ID of the Genomic Feature Detector |
| gfd_name | varchar(50), not null | Name of the feature detector |
| class_id | | |
| term_id | Long, foreign key to OMS | If the detector belongs to a specific ontology, then the ontology term is given here. The term must be a sub-term (is a relationship) of the root term defined in the class. |
GEMS_GENOMIC_FEATURE_DETECTION_PROTOCOL
GEMS_GENOMIC_FEATURE_DETECTION_PROTOCOL documents all the generic (non-biological) reagents and methodology for using instances of GenomicFeatureDetector to identify instances of GenomicFeature experimentally. Such reagents and protocols can typically be reused with many distinct GenomicFeatureDetector instances, hence, are reusable across several GenomicFeatureAssays. Given that all of the documentation of this object may be represented in the GEMS_ENTITY table and its table relationships, the GEMS_GENOMIC_FEATURE_DETECTION_PROTOCOL may be represented solely as a subclass indexed set of entries in the GEMS_ENTITY table, hence this table is redundant (i.e. doesn't need to be implemented directly).
Table 10. GEMS_GENOMIC_FEATURE_DETECTION_PROTOCOL table
| Fieldname | Description | Type |
|---|---|---|
| gfdprotocol_id | Long, primary key | Unique identifier of the given protocol |
| gfdp_name | varchar(50), not null | Name of the protocol |
| class_id | | |
| term_id | Long, foreign key to OMS | If the protocol belongs to a specific ontology, then the ontology term is given here. The term must be a sub-term (is a relationship) of the root term defined in the class. |
GEMS_DETECTED_VARIANT
GEMS_DETECTED_VARIANT stores information on molecular variants associated with GenomicFeature instances that are detected by GenomicFeatureDetector systems. Each molecular variant is uniquely identified by the gvariant_id field. Each gvariant_id is associated with the GenomicFeatureDetector that resolves it, using the gfdetector_id foreign key. If analysis can assign the specific molecular variant to a specified GenomicFeature, then the gfeature_id field that links the variant to an entry in the GEMS_GENOMIC_FEATURE table (this makes the entry a full GCP GenomicFeatureVariant instance).
Table 11. GEMS_DETECTED_VARIANT table
| Fieldname | Description | Type |
|---|---|---|
| gdvariant_id | Long, primary key | Unique Molecular Variant ID |
| gdv_name | varchar(50), not null | Name of the molecular variant |
| class_id | | |
| term_id | Long, foreign key to OMS | If the detected variant belongs to a specific ontology, then the ontology term is given here. The term must be a sub-term (is a relationship) of the root term defined in the class. |
| gfdetector_id, not null | Long foreign key into the GEMS_GENOME_FEATURE_DETECTOR table | GenomicFeatureDetector that identified the variant |
| gmfeature_id | Long foreign key into the GEMS_MAP_FEATURE table | GenomicFeature to which the variant is specifically assigned (if any). If defined the detected variant is a Genome Feature Variant |
Linkage and Data Tables
GEMS_LIST_ELEMENTS
The actual list elements of the lists in the gems_list are stored in the GEMS_LIST_ELEMENTS table.
Ex.
glist_id = 3381 glelement_name = phi1_1 gelement_entry_id = 1 glist_id = 3381 glelement_name = phi1_2 gelement_entry_id = 2 locus phi1 with glist_id 3381 in the gems_list table, contains alleles phi1_1 and phi1_2.
Table 12. GEMS_LIST_ELEMENTS table
| Fieldname | Description | Type |
|---|---|---|
| glist_id | Long, primary key | Unique List ID |
| gentity_id | Long foreign key to the GEMS_ENTITY table, not null | Unique ID for the GEMS_ENTITY to which the list belongs |
| gle_name | varchar(50) | Name of this element of the list |
| gle_index | Long, not null | Sort order of this element |
| gle_status | Long | Status of the entry: 0 = active, 5 = textline value, 9 = deleted |
GEMS_PUBLICATION
The GEMS_PUBLICATION table is the join table between GEMS_ENTITY and the ICIS BIBLIOREFS table.
Table 13. GEMS_PUB table
| Fieldname | Description | Type |
|---|---|---|
| gentity_id | Long foreign key to the GEMS_ENTITY table, not null | Unique ID for the GEMS_ENTITY described within the given publication |
| ref_id | Long foreign key to the ICIS BIBLIOREFS table, not null | Unique ID of the publication |
GEMS_GENOMIC_FEATURE_ASSAY
GEMS_GENOMIC_FEATURE_ASSAY (Polymorphic Detector) contains the different combination of GenomicFeatureDetector with GenomicFeatureDetectionProtocol. That is, a GenomicFeatureDetector can have one or more applied conditions (or protocols) for experimental assay. A protocol/condition can also be used for different GenomicFeatureDetector instances. This creates a many-to-many relationship between the markers and protocols, modeled by this join table.
Table 14. GEMS_GENOMIC_FEATURE_ASSAY table
| Fieldname | Description | Type |
|---|---|---|
| gfassay_id | Long, primary key | Unique identifier |
| gfdetector_id | Long foreign key into the GEMS_GENOMIC_FEATURE_DETECTOR table, not null | Unique detector identifier |
| gfdprotocol_id | Long foreign key into the GEMS_GENOMIC_FEATURE_DETECTION_PROTOCOL table, not null | Unique detector identifier |
GEMS_GENOTYPE_SCORES
The GEMS_GENOTYPE_SCORES table contains one entry for a single genotyping score of a specific DetectedVariant:' ("allele") of a GenomicFeatureDetector:("marker"). This table can be used in several ways depending on the type of the dataset (see Examples).
See GEMS data migration from DMS database to migrate GEMS data into the GEMS_GENOTYPE_SCORES table from a DMS database.
Table 15.GEMS_GENOTYPE_SCORES table
| Fieldname | Description | Definition (MySQL) |
|---|---|---|
| dpid | Unique identifier for the data point. | INT PRIMARY KEY NOT NULL AUTO_INCREMENT |
| ounitid1 | The id of the observation unit from which the first detected variant id was extracted. If this field is null the detected variant was entered directly in to this table and not via the standard DMS structure. | INT DEFAULT NULL, FOREIGN KEY oindex(ounitid) |
| ounitid2 | The id of the observation unit from which the second detected variant id was extracted. If this field is null the detected variant was entered directly in to this table and not via the standard DMS structure or only one observation was recorded, e.g. when the locus is homozygote or not detected. | INT DEFAULT NULL, FOREIGN KEY oindex(ounitid) |
| studyid | The id of the study to which this data point belongs. | INT DEFAULT NULL, FOREIGN KEY study(studyid) |
| dataset_id | Unique identifier of the respresation that is the dataset for this data point. | INT NOT NULL DEFAULT 0, FOREIGN KEY resprestn(represno) |
| sample_id | Reference to sample from which is score was recorded in IMS. This is usually a gid in a local sample tracking database. If the sample was not tracked in IMS then this value should be null. | INT DEFAULT 0 |
| individual | The entry number of the sample with in a set of germplasm with the same gid. This should be a consecutive numbering scheme starting at 1 and ending at n, where n is the number of entries for this gid in the study. If only one sample or entry is genotyped per germplasm then this value should be null. | INT DEFAULT NULL |
| gid | Reference to germplasm in which score was recorded. | INT NOT NULL DEFAULT 0, FOREIGN KEY germplsm(gid) |
| gfdetector_id | Reference to genomic feature detector used. | INT NOT NULL DEFAULT 0, FOREIGN KEY gems_genomic_feature_detector(gfdetector_id) |
| gfassay_id | Reference to genomic feature assay used. Used only if genomic feature detector can be used with more than on genomic feature protocol. | INT NOT NULL DEFAULT 0, FOREIGN KEY gems_genomic_feature_assay(gfassay_id) |
| gdvariant_label | An optional label that can used in visualisation software to represent the allele or marker state | VARCHAR(50) DEFAULT NULL |
| gdvariantno | A 1-based index that can be used to order alleles in a specific marker x genotype combination when different alleles for a specific germplasm are held in a separate row. 0 indicates that allele number is not relevant and that all the alleles for a specific germplasm are recorded in a single row. | INT NOT NULL DEFAULT 0, FOREIGN KEY |
| gdvariant_id1 | Reference to first or only detected variant scored. | INT DEFAULT 0, FOREIGN KEY gems_detected_variant(gvariant_id) |
| gdvariant_id2 | Reference to second detected variant scored if all the alleles for a specific germplasm are recorded in a single row. | INT DEFAULT 0, FOREIGN KEY gems_detected_variant(gvariant_id) |
| incidence | The incidence of the first detected variant in the specified germplasm for the given study, between 0 and 1, inclusive. A negative value indicates that the genotype was measured using this genomic feature detector in the study, but no incidence could be determined. In the case of complete matrices a null value indicates that the genotype was not measured using this genomic feature detector in this study. In a sparse matrix the value should never be null. If a second detected variant is recorded, the value of this field is inrelavent and mostly likely be set to null | DOUBLE UNSIGNED DEFAULT NULL |
| quality | A optional quality score for this data point, between 0 and 1, inclusive | DOUBLE UNSIGNED DEFAULT NULL |
The following combination of fields must always be unique regardless of the type of data or which optional fields are filled: dataset_id, sample_id, individual, gid, gdvariantno. Duplicate combinations of the these field wil either be ignored or generate an error on insert.
GEMS_MAP_DISTRIBUTIONS
A Map Distribution is a type of Map Feature and as such is recorded in the GEMS_MAP_FEATURE table. Tthe GEMS_MAP_DISTRIBUTIONS table provides the actual data (scores) for the distribution at each point (position) in the distribution.
Table 16. GEMS_MAP_DISTRIBUTION table
| Fieldname | Description | Type |
|---|---|---|
| gmdistribution_id | Long, primary key | Unique data point id |
| gmfeature_id | Long foreign key to the GEMS_MAP_FEATURE table, not null | The id of the map feature (distribution) to which this score is associate |
| gmd_score | Double not null | The score or value of the distributiuon at the given position; the units are defined by the map feature class |
| gmd_position | Double, not null | The position within the map feature, relative to the start of the map feature; the units are defined by the map class |
MySQL Scripts for ICIS 6.0 schema
Scripts for ICIS GEMS 6.0 schema can be downloaded at cropforge. DMS 6.0, GMS 6.0 and OMS 6.0 script is also available at cropforge. Separate scripts were created for creating GEMS tables, creating GEMS indices, dropping of GEMS tables and creating Foreign ke ys.
Note: To create foreign keys in GEMS, requires:
(1) DMS and GMS tables must be on the same database because there are fields in GEMS that references fields from DMS and GMS. An example is the field class_id which references the class_if field in the GMS _CLASS table which is in the GMS database.
(2) Indices in fields (both in GEMS and table that will be reference) included in the relationship must be in placed.
See Data Migration from GEMS5.4 to GEMS 6.0 To migrate data from GEMS 5.4 to GEMS 6.0
Semantic Correspondence between ICIS GEMS 5.4 and ICIS GEMS 6.0
The table below attempts to map the tables and attributes of the ICIS GEMS 5.4 schema onto the above ICIS GEMS 6.0 schema.
Table Mappings
Table 17. Approximate Semantic Mapping of ICIS GEMS 5.4 Tables onto ICIS GEMS 6.0 Tables
| ICIS 5.4 Table | ICIS 6.0 Table |
|---|---|
| no counterpart in 5.4 | GEMS_ENTITY |
| GEMS_NAMES | GEMS_NAMES |
| no counterpart in 5.4 | GEMS_PUB (join table between GEMS_ENTITY and ICIS_BIBREFS) |
| GEMS_COMP | GEMS_ANNOTATION used at GEMS_ENTITY level |
| GEMS_PROP | GEMS_FEATURE_METADATA + Controlled vocabulary term foreign key for Property itself (Chado cvterm table) |
| GEMS_SCALE | Controlled vocabulary term foreign key (Chado cvterm table) |
| GEMS_METHOD | GEMS_EVIDENCE + Controlled vocabulary term foreign key for Method itself (Chado cvterm table) |
| GEMS_MARKER_DETECTOR | GEMS_GENOMIC_FEATURE_DETECTOR |
| GEMS_MV | GEMS_DETECTED_VARIANT |
| GEMS_PD | GEMS_GENOMIC_FEATURE_ASSAY |
| GEMS_PD_COMP | GEMS_GENOMIC_FEATURE_DETECTION_PROTOCOL |
| no counterpart in 5.4 | GEMS_LINKAGE_MAP |
| GEMS_LOCUS | GEMS_GENOMIC_FEATURE for DNA identity with map location information stored in GEMS_LOCUS |
Field Mappings
Table 18. Semantic Mapping of ICIS GEMS 5.4 onto ICIS GEMS 6.0
| ICIS 5.4 Table | ICIS 5.4 Field | ICIS 6.0 Table | ICIS 6.0 Field |
|---|---|---|---|
| GEMS_NAMES | gnid | GEMS_NAMES | gnames_id |
| GEMS_NAMES | gobjid | GEMS_NAMES | gentity_id |
| not present | n/a | GEMS_NAMES | source_id (foreign key to GEMS_EVIDENCE table) |
| GEMS_NAMES | gntype | GEMS_NAMES | gntype |
| GEMS_NAMES | gnlocn | GEMS_NAMES | gnlocn |
| GEMS_NAMES | gndate | GEMS_EVIDENCE | observation_date |
| GEMS_NAMES | gnuid | GEMS_EVIDENCE | userid (foreign key to ICIS users table) |
| not present | n/a | GEMS_EVIDENCE | source_type |
| GEMS_NAMES | gnref | GEMS_EVIDENCE | source_id (foreign key to pertinent table, as per source_type) |
| GEMS_NAMES | gnval where gnstat = 1 (alternative name) | GEMS_NAMES | gnval |
| GEMS_NAMES | gnval (where gnstat = 0 (default/preferred name) and gobjtype='gems_marker_detector') | GEMS_GENOMIC_FEATURE_DETECTOR | gfdetector_name |
| GEMS_NAMES | gnval (where gnstat = 0 (default/preferred name) and gobjtype='gems_mv') | GEMS_DETECTED_VARIANT | gvariant_name |
| GEMS_COMP | cid | GEMS_ANNOTATION | gannot_id |
| GEMS_COMP | condid | GEMS_ANNOTATION | type_id (foreign key to cvterm table) |
| GEMS_COMP | comid | GEMS_ENTITY | type |
| GEMS_COMP | pid | GEMS_ANNOTATION | gmetadata_id |
| GEMS_COMP | comval | GEMS_ANNOTATION | value |
| GEMS_COMP | comuid | GEMS_EVIDENCE | userid |
| GEMS_COMP | comref | GEMS_EVIDENCE | source_id (foreign key to ICIS BIBREFS) |
| GEMS_PROP | pid | GEMS_FEATURE_METADATA | gfmetadata_id |
| GEMS_PROP | propid | GEMS_FEATURE_METADATA | type_id (foreign key to cvterm table) |
| GEMS_PROP | propname | Chado cvterm table | name in cvterm table via type_id |
| GEMS_PROP | scaleid | GEMS_FEATURE_METADATA | unit_id (foreign key to cvterm table) |
| GEMS_PROP | methid | GEMS_FEATURE_METADATA | gevidence_id (foreign key to GEMS_EVIDENCE table) |
| GEMS_METHOD, GEMS_SCALE | all fields | Chado cvterm table | Controlled vocabulary terms with pertinent documentation |
| GEMS_LOCUS | locusid | GEMS_LOCUS | glocus_id (see also GEMS_GENOMIC_FEATURE below) |
| GEMS_LOCUS | chr | GEMS_LOCUS | glmap_id (foreign key to GEMS_LINKAGE_MAP table - chromosome treated as another map) |
| GEMS_LOCUS | pos | GEMS_LOCUS | glocus_position |
| not present | n/a | GEMS_GENOMIC_FEATURE | genomic_locus_id foreign key to GEMS_LOCUS |
| GEMS_MARKER_DETECTOR | mdid | GEMS_GENOMIC_FEATURE_DETECTOR | gfdetector_id |
| GEMS_MARKER_DETECTOR | mauid | GEMS_EVIDENCE | userid |
| GEMS_MARKER_DETECTOR | maref | GEMS_EVIDENCE | source_id |
| GEMS_MARKER_DETECTOR | all other fields | GEMS_ANNOTATION | detector metadata recorded as GEMS_ANNOTATION of the corresponding GEMS_ENTITY (with associated GEMS_FEATURE_METADATA and supporting GEMS_EVIDENCE) |
| GEMS_MV | mvid | GEMS_DETECTED_VARIANT | gvariant_id |
| GEMS_MV | locusid | GEMS_DETECTED_VARIANT | gfeature_id (foreign key to GEMS_GENOMIC_FEATURE table) |
| GEMS_MV | mdid | GEMS_DETECTED_VARIANT | gfdetector_id (foreign key to GEMS_GENOMIC_FEATURE_DETECTOR table) |
| GEMS_MV | mvtype | GEMS_ENTITY | type |
| GEMS_MV | all other fields | GEMS_ANNOTATION | GEMS_DETECTED_VARIANT metadata recorded as GEMS_ANNOTATION of the corresponding GEMS_ENTITY (with associated GEMS_FEATURE_METADATA and supporting GEMS_EVIDENCE) |
| GEMS_PD | pdid | GEMS_GENOMIC_FEATURE_ASSAY | gfassay_id |
| GEMS_PD | condid | GEMS_GENOMIC_FEATURE_ASSAY | gfdprotocol_id (foreign key to GEMS_GENOMIC_FEATURE_DETECTION_PROTOCOL table) |
| GEMS_PD | mdid | GEMS_GENOMIC_FEATURE_ASSAY | gfdetector_id (foreign key to GEMS_GENOMIC_FEATURE_DETECTOR table) |
| GEMS_PD | all other fields | GEMS_ANNOTATION | GEMS_GENOMIC_FEATURE_ASSAY metadata recorded as GEMS_ANNOTATION of the corresponding GEMS_ENTITY (with associated GEMS_FEATURE_METADATA and supporting GEMS_EVIDENCE) |
| GEMS_PD_COMP | pdcomp | GEMS_GENOMIC_FEATURE_DETECTION_PROTOCOL | gfdprotocol_id |
| GEMS_PD_COMP | pdid | not present | not needed, since GEMS_GENOMIC_FEATURE_ASSAY points to the protocol, not vice-versa |
| GEMS_PD_COMP | all other fields | GEMS_ANNOTATION | GEMS_GENOMIC_FEATURE_DETECTION_PROTOCOL metadata recorded as GEMS_ANNOTATION of the corresponding GEMS_ENTITY (with associated GEMS_FEATURE_METADATA and supporting GEMS_EVIDENCE) |
| not present | n/a | GEMS_PUB | gpub_id is a foreign key into the ICIS BIBREFS table |
| not present | n/a | GEMS_PUB | gentity_id is a foreign key into the GEMS_ENTITY |
Examples of Data Input
See ICIS 5.4 GEMS Narrative Examples of Data Input.
Data Output
See ICIS 5.4 GEMS Narrative Examples of Data Output.
Summary
See also ICIS 5.4 GEMS Narrative Summary.





