TDM Gene Management System 6.0

From ICISWiki

Jump to: navigation, search

Back to Main Page > ICIS 6.0 Home Page > ICIS 6.0 Models & Schemata

MANAGEMENT OF GENETIC DATA

Contents

Acknowledgment

Several sections of this page simply cross reference their counterparts on the ICIS 5.4 GEMS narratives. The ICIS 6.0 GEMS schema, however, is modified and follows more closely the nomenclature and architecture of the Generation Challenge Programme Demeter domain model for genotyping data.

Introduction

See ICIS 5.4 GEMS Narrative Introduction.

Molecular Data

See ICIS 5.4 GEMS Narrative Molecular Data.

Molecular Variants

See ICIS 5.4 GEMS Narrative Molecular Variants.

Derived Data

See ICIS 5.4 GEMS Narrative Derived Data.

Genotype Data Model

The GEMS extended provides for scoring genotyping data (in addition to DMS), either as a data warehouse for data held in DMS or for large datasets held only in GEMS. This concept is the genoptype data point, which has the following attributes:

  • Identifier - Unique identifier of the data point score.
  • Observation Unit(s) - Optional observation unit (or observation unit if more than one 'Detected Variantwas record in the same observation unit) from which the data point was extracted, in known
  • Study - The study in which the data point was measured.
  • Dataset - The study effect representation in which where the data is to be held. See below.
  • Sample - Optional identifier of the DNA sample, if not defined it was assumed that the sample was not recorded independently from the germplasm entry. If there is only one entry per germplasn entry (gid) in a specific study then the gid of originating germplasm can be used as the sample id, otherwise it must be null or a unique identifier for the sample and not the originating germplasm (gid).
  • Individual - Optional individual (or plant) number using consecutive numbering scheme starting at 1 and ending at n, where n is the number of individuals genotyped for a specific germplasm entry (gid) in the study. Must be included if more than one individual is genotyped in specific study for a specific germplasm entry (gid).
  • Germplasm - The germplasm entry measured in this data point.
  • Genome Feature Detector - The genome feature detector (marker) used in this data point. This attribute is redundant since it is possible to determine the Genome Feature Detector from the Detected Variant, however it is included here to speed up queries by Genome Feature Detector.
  • Detected Variant Number - An index of the Detected Variant, which is only used if one database is used for each Detected Variant.
  • Detected Variant(s) - The Detected Variant or Detected Variants measured or found by the Genome Feature Detector for this data point
  • Incidence - The indicates the frequency of that the Detected Variant in the sample relative to other alleles. In the case of dominant markers or co-dominant markers using a genotyping platform where it is not possible to determine the relative frequency of a Detected Variant for a given Genome Feature Detector, this incidence would be '1' indicating the presence of only this allele or '0' indicating the absence of the allele. Where the genotyping platform in use can determine the relative frequency of the allele present in the sample, the incidence will be the allele frequency with '1' indicating the presence of only this allele in homozygote form '0.5' indicating the presence of only this allele in hetrozygote form or '0' indicating the absence of the allele. In the case of bulked DNA samples taken from segregating or open pollinated variety it may be possible to have a incidence (relative frequency) ranging from 0 to 1, inclusive. In all cases a negative value indicates that the genotype was measured in the study, but no incidence could be determined. Different negative numbers could be used to denote different types of this 'missing' data, these would be recorded in the study metadata. However -1 should be reserved for 'missing' data due to a unknown problem. A Null value indicates that the genotype was not measured in this study and is used in some circumstances to 'pad' the data so that it is a complete matrix of data.

Gene Management System (GEMS) and Data Management System (DMS)

Whether the genotyping data is entered directly into GEMS or or is copied into GEMS from DMS, a new Representation in DMS must be added. This Representation provides details on which of the concepts above are recorded in the dataset. If the dataset was previously loaded into DMS there will be two Representations, one for the original workbook data and one for the newly added Representation. The type of the new added Representation need to be decided. The following factors need to be record. Some are optional and will only be present if the cooresponding data is available in the gems_genotyping_scores table

Factor or Variant Name
Trait
Scale
Method
Mandatory
Factor/Variate
Observation Unit 1



No1

Observation Unit 2



No2

Sample ID3
GERMPLASM IDENTIFICATION(251)
GID(91)
DNA SAMPLE(495)
No
Factor
Individual
GERMPLASM IDENTIFICATION(251)
Entry Number(103)
Not Specified (?)
No4
Factor
Germplasm ID
GERMPLASM IDENTIFICATION(251)
GID(91)
Seed Source(92)
Yes
Factor
Genomic Feature Assay ID
MARKER_DETECTOR(503)
MD_ID(516)
GEMSID(497)
Yes
Factor
Detected Variant Label
MOLECULAR_VARIANT(501)
ALLELE LABEL (?)
Not Specified (?)
No
Variate
Detected Variant Number
MOLECULAR_VARIANT(501)

ALLELE NUMBER(504)
Count(521)
No5
Factor
Detected Variant 1
MOLECULAR_VARIANT(501)
MVID(503)
First Variant (?)
Yes
Variate/ Factor8
Detected Variant 2
MOLECULAR_VARIANT(501)
MVID(503)
Second Variant (?)
No6
Variate
Incidence9 MOLECULAR_VARIANT(501)
PRESENT/ABSENT(508)
Not Specified(508)
No7 Variate
Frequency9
MOLECULAR_VARIANT(501)
FREQUENCY (513)
Not Specified(519)
No
Variate
Quality CONFIDENCE OF SCORE
RATIO (?)
Not Specified (?)
No Variate

1Required if data transferred from DMS.

2Required if data transferred from DMS and data points comprises more than one observation unit.

3Usually only recorded in a local database.

4Required if more than one individual with the same gid is recorded and no separate sample id is recorded.  

5Required if more than row is used to record Detected Variants

6Required if co-dominant Detector is used and only one row is used to record Detected Variants

7Required if dominant Detector is used or if more than one row is used to record Detected Variants

8Concidered a Factor if Incidence is recorded

9Incidence and Frequence and mutually exclusive

The Gene Management System Table Structure

Generation Challenge Programme Domain Models

The ICIS 6.0 schema is largely inspired by the structure and class nomenclature of the Generation Challenge Programme Genotype and Genomic Mapping domain models. These models (as Unified Modeling Language diagrams, July 2007 Release 1.8) are replicated here below.

Figure 1. GCP Genotype Model

Image:GCP Genotype Model.png

Figure 2. GCP Genomic Mapping Model

Image:GCP LinkageMap Model.png

GEMS uses the following table structure to store details on genotype domain model objects (table names and attributes follow the Generation Challenge Programme convention).

Figure 3. ICIS GEMS 6.0 Schema

Image:GEMS6.0.png

Notation used:

Image:ICIS GEMS 6.0 Schema v2 notation.jpg

Table Relationships

The central table of GEMS 6.0 schema is the Entity table (gems_entity) which is the superclass which records all generic information about any data object in the GEMS. The class_id field of this table link to the gms_class table that each major class, sub class or type of actual GEMS entity: LinkageMap, GenomicFeature, etc.

Table Definitions

Generic Entity Tables

GEMS_ENTITY

The GEMS_ENTITY table contains one entry for every major entity object managed by GEMS and keeps track of all generic information about that Entity. In this way the GEMS_ENTITY table is in most cases a direct implementation of the GCP Entity class. In other cases the sub-tables of the GEMS_ENTITY table may represent, at least SimpleFeature objects, which have a specific type or class.

The important rule to note is that although the GEMS_ENTITY table autoincrements its primary keys, any and all subclassed objects (and corresponding table entries) share the identical primary key to the GEMS_ENTITY to which they correspond (e.g. a given entry in the GEMS_LINKAGE_MAP table will have the same primary key value as its corresponding entry in the GEMS_ENTITY table). All the tables below, except the linkage or data tables defined last 'extend' this table. Please note the class_id field is duplicated in all the sub-tables for entity. In some DBDS, this duplication can be performed through inheritance.

The GEMS_ENTITY table has relations with several other tables that provide for the identification, citations and annotation (see the GCP domain model for Entity) that describe the given GEMS_ENTITY entry.

Table 1. GEMS_ENTITY table

Fieldname Description Type
gentity_id Long, primary key, autoincrement Unique ID for the Entity
class_id Long, foreign key to the GMS_CLASS table The class, sub-class or type of the GEMS Entity
description varchar(255) Description of the specific GEMS Entity
GMS_CLASS (move to GMS)

The GMS_CLASS stores a list of various non-overlapping classes, sub classes and types for all ICIS Objects, largely described within the GCP Domain Model for Genotyping, including the following:

  • GenomicFeature: a specific segment (subsequence) of DNA in the genome (specified Germplasm) or generally within a class of organism(i.e. species, e.g. Oryza sativa) of interest. A GenomicFeature may (or may not) have a known location ("locus") on specified maps, a known content ("genotype") in specified Germplasm and/or a specified biological/technical biological (e.g. "gene") or technical (e.g. "SSR marker") function of interest.
  • LinkageMap: a genetic or physical map of genomic loci
  • Locus: experimentally observed assignment ("MapFeature") of a specified GenomicFeature to a specific LinkageMap
  • GenomicFeatureDetector: a specific biological reagent and/or information (e.g. PCR primers) that detect a specific subset of possible GenomicFeatures in a genome.
  • GenomicFeatureDetectionProtocol: non-biologically specific protocol for using a GenomicFeatureDetector in an experimental assay.
  • GenomicFeatureAssay: the potentially many-to-many join of GenomicFeatureDetector with available GenomicFeatureDetectionProtocols
  • DetectedVariant: one specified instance of GenomicFeature content ("allele") detected by a GenomicFeatureDetector. If the variant can be analytically interpreted to belong to a specified GenomicFeature, then it is considered in GCP domain model terms to be a GenomicFeatureVariant

The ontology_id and the root_term_id fields are corresponding GCP Ontologies ID (e.g. C0_500) and Term ID (e.g. 0000017 for protein based marker) from the GCP Crop Ontology Project.The parent_id gives the options to define sub-clases of classes. This is used in two possible ways. First the sub-class may define a smaller sub-tree of the ontology tree defined by the parent class. Second, the sub class may be defined in a separate ontolgy from the parent class. This second option in effect makes linkages between different ontologies and should only be used a short term solution to a fully defined single ontology.

Table x. GEMS_CLASS table

Fieldname Description Type
class_id Long, primary key Unique ID of the class
class_name varchar(50), not null Name of the class
ontology_id Long, foreign key to the OMS If known, the ontology ID of the ontology which defines are members of this class
root_term_id Long, foreign key to the OMS If known, the term ID which defines the Ontology subtree for all valid members
parent_id Long, foreign key to the class_id
If the class if in fact a sub-class, the parent_id gives a link to its parent ID. In this case ontology_id and term_id are not defined



GEMS_NAMES

GEMS_NAMES table serves as a catalogue of alternative names for example for genomic feature detectors and molecular variants. gnames_id is the unique identifier of a specific name and gentity_id is the unique ID of GEMS_ENTITY to which the name belongs. class_id is the 'class' or type of names to which the alternative name belong. gnlocn is location where Name was assigned (more information please)

Note: gnstat from GEMS 5.5 was deleted since only alternative names (gnstat=1) is stored in gems_names Ex.

GNAMES_ID = 2715
GENTITY_ID = 1200
GNVAL = RM105_143
...
this entry indicates that "RM105_143" is a name with identified by 
unique id = 2715 in the GEMS_NAMES table, that is a name of the entity identified 
by the foreign key unique id = 1200 pointing to the GEMS_Entity table. 

Table 2. GEMS_NAMES table

Fieldname Description Type
gnames_id Long, primary key Unique ID of the Name
gentity_id Long, not null Unique ID of GEMS_ENTITY to which the name belongs
class_id
Long, foreign key to the GMS_CLASS table, not null Class of Name
term_id
Long, foreign key to OMS
If the name type belongs to a specific ontology, then the ontology term is given here. The term must be a sub-term (is a relationship) of the root term defined in the class.
gn_value Varchar(255), not null Name Value
gn_location_id Long Location where Name was assigned
gevidence_id Long Unique ID of GEMS_EVIDENCE documenting the support of this name assignment
GEMS_ANNOTATION

The GEMS_ANNOTATION table contains the values of GEMS_ENTITY feature annotation (i.e. GCP Feature data attached to the Entity).

Table 3. GEMS_ANNOTATION table

Fieldname Description Type
gannotation_id Long, primary key Unique ID for the annotation
class_id Long, foreign key to the GMS_CLASS table, not null Unique ID for the "type" of annotation as described in the gms_class table
term_id
Long, foreign key to OMS
If the annotation type belongs to a specific ontology, then the ontology term is given here. The term must be a sub-term (is a relationship) of the root term defined in the class.
gentity_id Long foreign key to the GEMS_ENTITY primary key, not null Unique ID for the annotated GEMS_ENTITY
gfmetadata_id Long foreign key to the GEMS_FEATURE_METADATA primary key , not null Unique Property ID of the annotation type
ga_value Varchar(255), not null Value, encoded as a string
ga_value_id Long foreign key to a pertinent value table May points to a pertinent entry in another pertinent table, based on the value class_id, if the value cannot be adequately represented in a simple string.
ga_index Long Sequence number for a Feature value series (where the given annotation Feature value is a "vector" of values; rank 0 applies for a single valued feature)


GEMS_FEATURE_METADATA

GEMS_FEATURE_METADATA table defines the Feature type (a.k.a. ICIS "property"), unit (a.k.a. ICIS scale) and supportingEvidence (a.k.a. ICIS method) associated with a given type of GEMS_ENTITY annotation (i.e. GEMS_ANNOTATION entry gmetadata_id foreign key).

Table 4. GEMS_FEATURE_METADATA table

Fieldname Description Type
gfmetadata_id Long, primary key Unique ID for the combination of annotation Feature type, unit and supportingEvidence
class_id Long, foreign key to the GMS_CLASS table, not null Unique ID for the "type" of annotated Feature as described in the gms_class table
term_id
Long, foreign key to OMS
If the metadata type belongs to a specific ontology, then the ontology term is given here. The term must be a sub-term (is a relationship) of the root term defined in the class.
unit_id Long, foreign key to the GMS_CLASS table, not null Unique ID for the "unit" of the value of the annotated Feature as described in the gms_class table
gevidence_id Long foreign key into GEMS_EVIDENCE table, not null Unique ID for the GEMS_EVIDENCE associated with the Feature


GEMS_EVIDENCE

The GEMS_EVIDENCE table documents the Evidence supporting a given GEMS_ANNOTATION assertion, including the source of the evidence (e.g. publication, study, analysis program, etc.), observation_date, score and the identity of the curator who recorded the evidence.

Table 5. GEMS_EVIDENCE table

Fieldname Description Type
gevidence_id Long, primary key Unique ID for the Evidence entry
class_id Long, foreign key to the GMS_CLASS table, not null Specifies the data type of the source_id (i.e. table identification for the source_id foreign key, e.g. BIBLIOREFS, STUDY, etc)
term_id
Long, foreign key to OMS
If the evidence type belongs to a specific ontology, then the ontology term is given here. The term must be a sub-term (is a relationship) of the root term defined in the class.
userid Long foreign key into the ICIS users table, not null Curator of the specific evidence
source_id Long, not null Unique id of the pertinent entry in the evidence source table (e.g. BIBLIOREFS, STUDY, etc)
ge_date Long, not null Date when the Feature Evidence was asserted
ge_score varchar(64) Some source specific measurement of statistical confidence of the source assertion of the annotation Feature value


GEMS_LIST

GEMS_LIST table stores the collection details of markers, genes or alleles sets. Only the list details are stored in this table and actual elements of the list are stored in the GEMS_LIST_ELEMENTS table. A list of lists may be represented by indicating the ID of the list it belongs to in the glist_parent column.

Ex.

glist_id = 3204 
glist_name = phi100175
glist_parent = 0 

glist_id = 3381
glist_name = phi1
glist_parent = 3204

glist_id = 3382
glist_name = phi2
glist_parent = 3204

The gene phi100175 list contains locus phi1 and phi2

Table 6. GEMS_LIST table

Fieldname Description Type
glist_id Long, primary key Unique List ID
gl_name varchar(50), not null Name of the list
dataset_id INT DEFAULT NULL The id of the dataset (effect) to which this list is associated if part of a study
class_id Long, foreign key to the GMS_CLASS table, not null Unique ID for the "type" of list as described in the gms_class table
term_id
Long, foreign key to OMS
If the list type belongs to a specific ontology, then the ontology term is given here. The term must be a sub-term (is a relationship) of the root term defined in the class.
gl_date Long, not null Date the list was created (format:YYYMMDD)
userid Long foreign key into the ICIS users table, not null Curator of the list
gl_description varchar(255), not null Title or description of the list
gl_parent Long ID for the parent list
gl_status Long Status of list or folder: 0=Folder, 1=Open List, 1x = hidden, 1xx = locked, 1xxx= final, 9= Deleted


Specific (scientific) Entity Tables

All the tables below extend the GEMS_ENTITY concept and as such the primary key for each of these tables is the same as the coordisponding gentity_id in the GEMS_ENTITY table. Please note the class_id field is duplicated in all the sub-tables for entity. In some DBDS, this duplication can be performed through inheritance.

GEMS_GENOMIC_FEATURE

The GEMS_GENOMIC_FEATURE table is redundant. All Genomic feature entities can be found in the GEMS_MAP_FEATURE table, but with the class: GenomicFeature. The Genomic Feature can have a list of Genomic Feature Variants, these can be found in the GEMS_DETECTED_VARIANT table, when the value of the gfeature_id is not null

GEMS_LINKAGE_MAP

GEMS_LINKAGE_MAP table contains information on genetic, physical, or other kinds of LinkageMap objects (see GCP genomic map domain model description).

Table 7. GEMS_LINKAGE_MAP table

Fieldname Description Type
glmap_id Long, primary key Unique ID for the LinkageMap
glm_name varchar(50), not null map name
class_id Long, foreign key to the GMS_CLASS table, not null Is a controlled vocabulary designating the class of the map
term_id
Long, foreign key to OMS
If the map type belongs to a specific ontology, then the ontology term is given here. The term must be a sub-term (is a relationship) of the root term defined in the class.
glm_parent_id Long foreign key pointing reflexively into the GEMS_LINKAGE_MAP Unique ID for the LinkageMap within which the given LinkageMap entry resides. For most linkage groups, this is the parent chromosome map. For chromosome maps, the given genome map (set).
GEMS_MAP_FEATURE

GEMS_MAP_FEATURE table contains information on genetic, physical, or other kinds of GCP MapFeature (see GCP genomic map domain model description). The table also hold Genomic Features, which are indicated by the class 'GenomicFeature'.

Note: This table was previously called GEMS_LOCUS.

Table 8. GEMS_MAP_FEATURE table

Fieldname Description Type
gmfeature_id Long, primary key Unique ID for the a map feature
gmf_name varchar(50), not null Name of the map feature
class_id
Long, foreign key to OMS
If the name belongs to a specific ontology, then the ontology term is given here. The term must be a sub-term (is a relationship) of the root term defined in the class.
term_id
Long, foreign key to OMS
If the map feature belongs to a specific ontology, then the ontology term is given here. The term must be a sub-term (is a relationship) of the root term defined in the class.
glmap_id Long foreign key to the LinkageMap, not null Unique ID for the a LinkageMap
gmf_position Position in the Map
gmf_start Double Start position in the Map; the units are defined by the map class
gmf_stop Double Stop position in the Map; the units are defined by the map class
GEMS_GENOMIC_FEATURE_DETECTOR

GEMS_GENOMIC_FEATURE_DETECTOR table documents the biologically specific reagents/information required to detect a specified subset of GenomicFeature entities in a genome. 

Table 9. GEMS_GENOMIC_FEATURE_DETECTOR table

Fieldname Description Type
gfdetector_id Long, primary key Unique ID of the Genomic Feature Detector
gfd_name varchar(50), not null Name of the feature detector
class_id

term_id
Long, foreign key to OMS
If the detector belongs to a specific ontology, then the ontology term is given here. The term must be a sub-term (is a relationship) of the root term defined in the class.


GEMS_GENOMIC_FEATURE_DETECTION_PROTOCOL

GEMS_GENOMIC_FEATURE_DETECTION_PROTOCOL documents all the generic (non-biological) reagents and methodology for using instances of GenomicFeatureDetector to identify instances of GenomicFeature experimentally. Such reagents and protocols can typically be reused with many distinct GenomicFeatureDetector instances, hence, are reusable across several GenomicFeatureAssays. Given that all of the documentation of this object may be represented in the GEMS_ENTITY table and its table relationships, the GEMS_GENOMIC_FEATURE_DETECTION_PROTOCOL may be represented solely as a subclass indexed set of entries in the GEMS_ENTITY table, hence this table is redundant (i.e. doesn't need to be implemented directly).

Table 10. GEMS_GENOMIC_FEATURE_DETECTION_PROTOCOL table

Fieldname Description Type
gfdprotocol_id Long, primary key Unique identifier of the given protocol
gfdp_name varchar(50), not null Name of the protocol
class_id

term_id
Long, foreign key to OMS
If the protocol belongs to a specific ontology, then the ontology term is given here. The term must be a sub-term (is a relationship) of the root term defined in the class.


GEMS_DETECTED_VARIANT

GEMS_DETECTED_VARIANT stores information on molecular variants associated with GenomicFeature instances that are detected by GenomicFeatureDetector systems. Each molecular variant is uniquely identified by the gvariant_id field. Each gvariant_id is associated with the GenomicFeatureDetector that resolves it, using the gfdetector_id foreign key. If analysis can assign the specific molecular variant to a specified GenomicFeature, then the gfeature_id field that links the variant to an entry in the GEMS_GENOMIC_FEATURE table (this makes the entry a full GCP GenomicFeatureVariant instance).

Table 11. GEMS_DETECTED_VARIANT table

Fieldname Description Type
gdvariant_id Long, primary key Unique Molecular Variant ID
gdv_name varchar(50), not null Name of the molecular variant
class_id

term_id
Long, foreign key to OMS
If the detected variant belongs to a specific ontology, then the ontology term is given here. The term must be a sub-term (is a relationship) of the root term defined in the class.
gfdetector_id, not null Long foreign key into the GEMS_GENOME_FEATURE_DETECTOR table GenomicFeatureDetector that identified the variant
gmfeature_id Long foreign key into the GEMS_MAP_FEATURE table GenomicFeature to which the variant is specifically assigned (if any). If defined the detected variant is a Genome Feature Variant


Linkage and Data Tables

GEMS_LIST_ELEMENTS

The actual list elements of the lists in the gems_list are stored in the GEMS_LIST_ELEMENTS table.

Ex.

glist_id = 3381
glelement_name = phi1_1
gelement_entry_id = 1

glist_id = 3381
glelement_name = phi1_2
gelement_entry_id = 2

locus phi1 with glist_id 3381 in the gems_list table, contains alleles phi1_1 and phi1_2.

Table 12. GEMS_LIST_ELEMENTS table

Fieldname Description Type
glist_id Long, primary key Unique List ID
gentity_id Long foreign key to the GEMS_ENTITY table, not null Unique ID for the GEMS_ENTITY to which the list belongs
gle_name varchar(50) Name of this element of the list
gle_index Long, not null Sort order of this element
gle_status Long Status of the entry: 0 = active, 5 = textline value, 9 = deleted


GEMS_PUBLICATION

The GEMS_PUBLICATION table is the join table between GEMS_ENTITY and the ICIS BIBLIOREFS table.

Table 13. GEMS_PUB table

Fieldname Description Type
gentity_id Long foreign key to the GEMS_ENTITY table, not null Unique ID for the GEMS_ENTITY described within the given publication
ref_id Long foreign key to the ICIS BIBLIOREFS table, not null Unique ID of the publication



GEMS_GENOMIC_FEATURE_ASSAY

GEMS_GENOMIC_FEATURE_ASSAY (Polymorphic Detector) contains the different combination of GenomicFeatureDetector with GenomicFeatureDetectionProtocol. That is, a GenomicFeatureDetector can have one or more applied conditions (or protocols) for experimental assay. A protocol/condition can also be used for different GenomicFeatureDetector instances. This creates a many-to-many relationship between the markers and protocols, modeled by this join table.

Table 14. GEMS_GENOMIC_FEATURE_ASSAY table

Fieldname Description Type
gfassay_id Long, primary key Unique identifier
gfdetector_id Long foreign key into the GEMS_GENOMIC_FEATURE_DETECTOR table, not null Unique detector identifier
gfdprotocol_id Long foreign key into the GEMS_GENOMIC_FEATURE_DETECTION_PROTOCOL table, not null Unique detector identifier


GEMS_GENOTYPE_SCORES

The GEMS_GENOTYPE_SCORES table contains one entry for a single genotyping score of a specific DetectedVariant:' ("allele") of a GenomicFeatureDetector:("marker"). This table can be used in several ways depending on the type of the dataset (see Examples).

See GEMS data migration from DMS database to migrate GEMS data into the GEMS_GENOTYPE_SCORES table from a DMS database.

Table 15.GEMS_GENOTYPE_SCORES table

Fieldname Description Definition (MySQL)
dpid Unique identifier for the data point. INT PRIMARY KEY NOT NULL AUTO_INCREMENT
ounitid1 The id of the observation unit from which the first detected variant id was extracted. If this field is null the detected variant was entered directly in to this table and not via the standard DMS structure. INT DEFAULT NULL, FOREIGN KEY oindex(ounitid)
ounitid2 The id of the observation unit from which the second detected variant id was extracted. If this field is null the detected variant was entered directly in to this table and not via the standard DMS structure or only one observation was recorded, e.g. when the locus is homozygote or not detected. INT DEFAULT NULL, FOREIGN KEY oindex(ounitid)
studyid
The id of the study to which this data point belongs. 
INT DEFAULT NULL, FOREIGN KEY study(studyid)
dataset_id Unique identifier of the respresation that is the dataset for this data point. INT NOT NULL DEFAULT 0, FOREIGN KEY resprestn(represno)
sample_id Reference to sample from which is score was recorded in IMS. This is usually a gid in a local sample tracking database. If the sample was not tracked in IMS then this value should be null.
INT DEFAULT 0
individual The entry number of the sample with in a set of germplasm with the same gid. This should be a consecutive numbering scheme starting at 1 and ending at n, where n is the number of entries for this gid in the study. If only one sample or entry is genotyped per germplasm then this value should be null.  INT DEFAULT NULL
gid Reference to germplasm in which score was recorded. INT NOT NULL DEFAULT 0, FOREIGN KEY germplsm(gid)
gfdetector_id Reference to genomic feature detector used. INT NOT NULL DEFAULT 0, FOREIGN KEY gems_genomic_feature_detector(gfdetector_id)
gfassay_id
Reference to genomic feature assay used. Used only if genomic feature detector can be used with more than on genomic feature protocol.
INT NOT NULL DEFAULT 0, FOREIGN KEY gems_genomic_feature_assay(gfassay_id)
gdvariant_label
An optional label that can used in visualisation software to represent the allele or marker state
VARCHAR(50) DEFAULT NULL
gdvariantno A 1-based index that can be used to order alleles in a specific marker x genotype combination when different alleles for a specific germplasm are held in a separate row. 0 indicates that allele number is not relevant and that all the alleles for a specific germplasm are recorded in a single row. INT NOT NULL DEFAULT 0, FOREIGN KEY
gdvariant_id1 Reference to first or only detected variant scored. INT  DEFAULT 0, FOREIGN KEY gems_detected_variant(gvariant_id)
gdvariant_id2 Reference to second detected variant scored if all the alleles for a specific germplasm are recorded in a single row.
INT  DEFAULT 0, FOREIGN KEY gems_detected_variant(gvariant_id)
incidence The incidence of the first detected variant in the specified germplasm for the given study, between 0 and 1, inclusive. A negative value indicates that the genotype was measured using this genomic feature detector in the study, but no incidence could be determined. In the case of complete matrices a null value indicates that the genotype was not measured using this genomic feature detector in this study. In a sparse matrix the value should never be null. If a second detected variant is recorded, the value of this field is inrelavent and mostly likely be set to null
DOUBLE UNSIGNED DEFAULT NULL
quality
A optional quality score for this data point, between 0 and 1, inclusive
DOUBLE UNSIGNED DEFAULT NULL

The following combination of fields must always be unique regardless of the type of data or which optional fields are filled: dataset_id, sample_id, individual, gid, gdvariantno. Duplicate combinations of the these field wil either be ignored or generate an error on insert.

GEMS_MAP_DISTRIBUTIONS

A Map Distribution is a type of Map Feature and as such is recorded in the GEMS_MAP_FEATURE table. Tthe GEMS_MAP_DISTRIBUTIONS table provides the actual data (scores) for the distribution at each point (position) in the distribution.

Table 16. GEMS_MAP_DISTRIBUTION table

Fieldname Description Type
gmdistribution_id Long, primary key Unique data point id
gmfeature_id Long foreign key to the GEMS_MAP_FEATURE table, not null The id of the map feature (distribution) to which this score is associate
gmd_score Double not null The score or value of the distributiuon at the given position; the units are defined by the map feature class
gmd_position Double, not null The position within the map feature, relative to the start of the map feature; the units are defined by the map class



MySQL Scripts for ICIS 6.0 schema

Scripts for  ICIS GEMS 6.0 schema can be downloaded at cropforge.  DMS 6.0, GMS 6.0  and OMS 6.0 script is also available at cropforge.  Separate scripts were created for creating GEMS tables, creating GEMS indices, dropping of GEMS tables and creating Foreign ke ys. 

Note:  To create foreign keys in GEMS, requires:

(1)  DMS and GMS tables must be on the same database because there are fields in GEMS that references fields from DMS and GMS.  An example is the field class_id which references the class_if field in the GMS _CLASS table which is in the GMS database.

(2)  Indices in fields (both in GEMS and table that will be reference) included in the relationship must be in placed.


See Data Migration from GEMS5.4 to GEMS 6.0 To migrate data from GEMS 5.4 to GEMS 6.0

Semantic Correspondence between ICIS GEMS 5.4 and ICIS GEMS 6.0

The table below attempts to map the tables and attributes of the ICIS GEMS 5.4 schema onto the above ICIS GEMS 6.0 schema.

Table Mappings

Table 17. Approximate Semantic Mapping of ICIS GEMS 5.4 Tables onto ICIS GEMS 6.0 Tables

ICIS 5.4 Table ICIS 6.0 Table
no counterpart in 5.4 GEMS_ENTITY
GEMS_NAMES GEMS_NAMES
no counterpart in 5.4 GEMS_PUB (join table between GEMS_ENTITY and ICIS_BIBREFS)
GEMS_COMP GEMS_ANNOTATION used at GEMS_ENTITY level
GEMS_PROP GEMS_FEATURE_METADATA + Controlled vocabulary term foreign key for Property itself (Chado cvterm table)
GEMS_SCALE Controlled vocabulary term foreign key (Chado cvterm table)
GEMS_METHOD GEMS_EVIDENCE + Controlled vocabulary term foreign key for Method itself (Chado cvterm table)
GEMS_MARKER_DETECTOR GEMS_GENOMIC_FEATURE_DETECTOR
GEMS_MV GEMS_DETECTED_VARIANT
GEMS_PD GEMS_GENOMIC_FEATURE_ASSAY
GEMS_PD_COMP GEMS_GENOMIC_FEATURE_DETECTION_PROTOCOL
no counterpart in 5.4 GEMS_LINKAGE_MAP
GEMS_LOCUS GEMS_GENOMIC_FEATURE for DNA identity with map location information stored in GEMS_LOCUS


Field Mappings

Table 18. Semantic Mapping of ICIS GEMS 5.4 onto ICIS GEMS 6.0

ICIS 5.4 Table ICIS 5.4 Field ICIS 6.0 Table ICIS 6.0 Field
GEMS_NAMES gnid GEMS_NAMES gnames_id
GEMS_NAMES gobjid GEMS_NAMES gentity_id
not present n/a GEMS_NAMES source_id (foreign key to GEMS_EVIDENCE table)
GEMS_NAMES gntype GEMS_NAMES gntype
GEMS_NAMES gnlocn GEMS_NAMES gnlocn
GEMS_NAMES gndate GEMS_EVIDENCE observation_date
GEMS_NAMES gnuid GEMS_EVIDENCE userid (foreign key to ICIS users table)
not present n/a GEMS_EVIDENCE source_type
GEMS_NAMES gnref GEMS_EVIDENCE source_id (foreign key to pertinent table, as per source_type)
GEMS_NAMES gnval where gnstat = 1 (alternative name) GEMS_NAMES gnval
GEMS_NAMES gnval (where gnstat = 0 (default/preferred name) and gobjtype='gems_marker_detector') GEMS_GENOMIC_FEATURE_DETECTOR gfdetector_name
GEMS_NAMES gnval (where gnstat = 0 (default/preferred name) and gobjtype='gems_mv') GEMS_DETECTED_VARIANT gvariant_name
GEMS_COMP cid GEMS_ANNOTATION gannot_id
GEMS_COMP condid GEMS_ANNOTATION type_id (foreign key to cvterm table)
GEMS_COMP comid GEMS_ENTITY type
GEMS_COMP pid GEMS_ANNOTATION gmetadata_id
GEMS_COMP comval GEMS_ANNOTATION value
GEMS_COMP comuid GEMS_EVIDENCE userid
GEMS_COMP comref GEMS_EVIDENCE source_id (foreign key to ICIS BIBREFS)
GEMS_PROP pid GEMS_FEATURE_METADATA gfmetadata_id
GEMS_PROP propid GEMS_FEATURE_METADATA type_id (foreign key to cvterm table)
GEMS_PROP propname Chado cvterm table name in cvterm table via type_id
GEMS_PROP scaleid GEMS_FEATURE_METADATA unit_id (foreign key to cvterm table)
GEMS_PROP methid GEMS_FEATURE_METADATA gevidence_id (foreign key to GEMS_EVIDENCE table)
GEMS_METHOD, GEMS_SCALE all fields Chado cvterm table Controlled vocabulary terms with pertinent documentation
GEMS_LOCUS locusid GEMS_LOCUS glocus_id (see also GEMS_GENOMIC_FEATURE below)
GEMS_LOCUS chr GEMS_LOCUS glmap_id (foreign key to GEMS_LINKAGE_MAP table - chromosome treated as another map)
GEMS_LOCUS pos GEMS_LOCUS glocus_position
not present n/a GEMS_GENOMIC_FEATURE genomic_locus_id foreign key to GEMS_LOCUS
GEMS_MARKER_DETECTOR mdid GEMS_GENOMIC_FEATURE_DETECTOR gfdetector_id
GEMS_MARKER_DETECTOR mauid GEMS_EVIDENCE userid
GEMS_MARKER_DETECTOR maref GEMS_EVIDENCE source_id
GEMS_MARKER_DETECTOR all other fields GEMS_ANNOTATION detector metadata recorded as GEMS_ANNOTATION of the corresponding GEMS_ENTITY (with associated GEMS_FEATURE_METADATA and supporting GEMS_EVIDENCE)
GEMS_MV mvid GEMS_DETECTED_VARIANT gvariant_id
GEMS_MV locusid GEMS_DETECTED_VARIANT gfeature_id (foreign key to GEMS_GENOMIC_FEATURE table)
GEMS_MV mdid GEMS_DETECTED_VARIANT gfdetector_id (foreign key to GEMS_GENOMIC_FEATURE_DETECTOR table)
GEMS_MV mvtype GEMS_ENTITY type
GEMS_MV all other fields GEMS_ANNOTATION GEMS_DETECTED_VARIANT metadata recorded as GEMS_ANNOTATION of the corresponding GEMS_ENTITY (with associated GEMS_FEATURE_METADATA and supporting GEMS_EVIDENCE)
GEMS_PD pdid GEMS_GENOMIC_FEATURE_ASSAY gfassay_id
GEMS_PD condid GEMS_GENOMIC_FEATURE_ASSAY gfdprotocol_id (foreign key to GEMS_GENOMIC_FEATURE_DETECTION_PROTOCOL table)
GEMS_PD mdid GEMS_GENOMIC_FEATURE_ASSAY gfdetector_id (foreign key to GEMS_GENOMIC_FEATURE_DETECTOR table)
GEMS_PD all other fields GEMS_ANNOTATION GEMS_GENOMIC_FEATURE_ASSAY metadata recorded as GEMS_ANNOTATION of the corresponding GEMS_ENTITY (with associated GEMS_FEATURE_METADATA and supporting GEMS_EVIDENCE)
GEMS_PD_COMP pdcomp GEMS_GENOMIC_FEATURE_DETECTION_PROTOCOL gfdprotocol_id
GEMS_PD_COMP pdid not present not needed, since GEMS_GENOMIC_FEATURE_ASSAY points to the protocol, not vice-versa
GEMS_PD_COMP all other fields GEMS_ANNOTATION GEMS_GENOMIC_FEATURE_DETECTION_PROTOCOL metadata recorded as GEMS_ANNOTATION of the corresponding GEMS_ENTITY (with associated GEMS_FEATURE_METADATA and supporting GEMS_EVIDENCE)
not present n/a GEMS_PUB gpub_id is a foreign key into the ICIS BIBREFS table
not present n/a GEMS_PUB gentity_id is a foreign key into the GEMS_ENTITY

Examples of Data Input

See ICIS 5.4 GEMS Narrative Examples of Data Input.

Data Output

See ICIS 5.4 GEMS Narrative Examples of Data Output.

Summary

See also ICIS 5.4 GEMS Narrative Summary.