TDM Ontologies and Controlled Vocabularies 6.0

From ICISWiki

Synopsis

This page discusses the overall design strategy for a "next generation" ontology management facility within ICIS.

The core assumptions of this review are:

  1. that ICIS schemata embody existing controlled vocabulary semantics (e.g. DMS Property x Scale x Method) that should be preserved in version 6.0
  2. that ICIS software applications and legacy data assume a certain set of use cases for semantics (e.g. fixed SCALE types, specific controlled vocabulary DLL interfaces)
  3. that suitable public domain methodology for ontology management (including available open source schemata) would add significant functionality to ICIS if embedded in it

Priorities for Implementation in ICIS 5.* and the GCP MBP

This section admittedly puts "the cart before the horse".

ICIS 5.* Semantics - the Big Picture

ICIS consists of several modules (GMS, DMS, GEMS), each of which contains existing controlled vocabularies. These vocabularies are generally stored in specialized tables and include:

GMS

GMS (Genealogy Management System) pedigrees are annotated with the following extensible semantics:

  • Germplasm Methods - controlled vocabulary stored in the METHOD table
  • User-defined Fields (UDFLDS) - used to add ad hoc attributes to the germplasm (ICIS GERMPLSM) table. Refers to the SCALEDIS table.
    • For example, passport data is stored with GMS records using the UDFLDS table and ATRIBUTS table
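A minimal sketch of how passport data might be stored and read back through this attribute pattern. It assumes simplified UDFLDS and ATRIBUTS tables (only the columns FLDNO, FTABLE, FCODE, FNAME and AID, GID, ATYPE, AVAL; the real ICIS tables carry more fields), and the germplasm ID and attribute values are hypothetical. UDFLDS defines the attribute types, while ATRIBUTS holds one row per (germplasm, attribute) value, so a new passport field needs no schema change.

```python
import sqlite3

# Simplified, assumed versions of the GMS UDFLDS and ATRIBUTS tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE UDFLDS (
    FLDNO  INTEGER PRIMARY KEY,  -- attribute type id
    FTABLE TEXT,                 -- table the field extends
    FCODE  TEXT,                 -- short code
    FNAME  TEXT                  -- human-readable name
);
CREATE TABLE ATRIBUTS (
    AID   INTEGER PRIMARY KEY,
    GID   INTEGER,                          -- germplasm id
    ATYPE INTEGER REFERENCES UDFLDS(FLDNO), -- which attribute this row holds
    AVAL  TEXT                              -- value stored as text
);
""")

# Hypothetical attribute types and values for one germplasm record.
conn.executemany("INSERT INTO UDFLDS VALUES (?, ?, ?, ?)", [
    (1, "ATRIBUTS", "ORIGIN", "Country of origin"),
    (2, "ATRIBUTS", "COLLDATE", "Collection date"),
])
conn.executemany("INSERT INTO ATRIBUTS (GID, ATYPE, AVAL) VALUES (?, ?, ?)", [
    (1001, 1, "PHL"),
    (1001, 2, "19980315"),
])


def passport(conn, gid):
    """Return the passport attributes of one germplasm as {code: value}."""
    rows = conn.execute(
        """SELECT u.FCODE, a.AVAL
           FROM ATRIBUTS a JOIN UDFLDS u ON a.ATYPE = u.FLDNO
           WHERE a.GID = ?
           ORDER BY u.FLDNO""",
        (gid,),
    )
    return dict(rows.fetchall())


print(passport(conn, 1001))  # {'ORIGIN': 'PHL', 'COLLDATE': '19980315'}
```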

DMS

DMS (Data Management System) Study Factors and Variates have the following controlled vocabularies, as part of the ICIS Trait (a.k.a. Property) Management System:

  • Property (formerly called "trait"): specified by a controlled vocabulary stored in the ICIS TRAIT table, with TraitId and TraitGrp (parent concept).
  • Scale: its type is a controlled vocabulary stored in the ICIS SCALE table. If the scale type is "discrete", then the scale values are also a controlled vocabulary, stored in the ICIS SCALEDIS table. The other scale types, numeric and character, are not CVs.
  • Method: a controlled vocabulary stored in the ICIS TMETHOD table.
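The Property x Scale x Method triple can be sketched as a small data model. The class names, field names and example terms below are hypothetical; the sketch only illustrates how the scale type (from SCALE) decides whether a recorded value must come from the discrete value list (SCALEDIS), must parse as a number, or is free text.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Scale:
    scale_id: int
    name: str
    scale_type: str  # assumed values: "discrete", "numeric" or "character"
    # Allowed values, standing in for SCALEDIS rows; used only by discrete scales.
    values: Tuple[str, ...] = ()


@dataclass(frozen=True)
class Variable:
    """One Property x Scale x Method combination (hypothetical model)."""
    property_name: str  # from TRAIT
    scale: Scale        # from SCALE (+ SCALEDIS)
    method_name: str    # from TMETHOD

    def validate(self, value: str) -> bool:
        """Check an observed value against the variable's scale type."""
        if self.scale.scale_type == "discrete":
            return value in self.scale.values
        if self.scale.scale_type == "numeric":
            try:
                float(value)
            except ValueError:
                return False
            return True
        return True  # character scales accept any free text


height = Variable("Plant height", Scale(1, "cm", "numeric"), "Ruler at maturity")
blast = Variable("Blast score", Scale(2, "1-9 score", "discrete",
                                      ("1", "3", "5", "7", "9")), "Visual rating")
print(height.validate("95.5"), blast.validate("4"))  # True False
```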

GEMS

The Gene Management System (GEMS) is a relatively new module whose semantics are (currently) less flexible; however, the following facets of semantics may be recognized:

  • Marker attributes: currently hard-coded as attributes in the GEMS Marker table, but should clearly be encoded in a more flexible fashion.
  • Protocol attributes: has a data model for annotation reminiscent of the DMS Property x Scale x Method, but no data has been loaded into it yet.
  • Polymorphism Detector: actually a join between marker and protocol, generally inheriting attributes from each.
  • Molecular variant attributes: currently hard-coded as attributes in the GEMS Molecular Variant table, but should clearly be encoded in a more flexible fashion.
  • Gene Locus: currently only sparsely defined in the GEMS, but will likely have attributes to be encoded.

All of the above GEMS semantics could benefit greatly from the formal application of an ontology adapted from, for example, the IPGRI Molecular Descriptors.

LDMS

The current design of the Location Data Management System (LDMS) has semantics encoded mainly in the LOCATION_DESCRIPTOR table that points to the UDFLDS table and some dedicated tables (e.g. the COUNTRY table).

ICIS 6.0 Ontology Management System (OMS)

Team discussions on design options for enhancing ontology facilities in ICIS focus mainly on integrating the new OMS module, adapted from the European Bioinformatics Institute Ontology Lookup Service (EBI-OLS), into ICIS using the following schema.

[Figure: OMS database schema diagram (OMS-db-diag.png)]

Unify ICIS schemata object attributes

Controlled vocabularies in ICIS are generally handled in distinct tables. Adopting the Chado schema would bring all such controlled vocabularies (CVs) into a centrally managed dictionary. Furthermore, CVs are not used consistently across ICIS, or at least their use exhibits some duplication. For example, the GEMS marker properties and the DMS property, scale and method CVs are very similar in usage, but are stored in distinct module-specific tables.

It is proposed to collapse all ICIS controlled vocabularies into a single ontology subsystem. This will require slight changes to how software manages CVs across the system.
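A minimal sketch of the proposed collapse, using SQLite in place of an ICIS database: rows from the legacy TRAIT, SCALE and TMETHOD tables are copied into one hypothetical CVTERM table that records each term's source table and legacy key, so existing references can still be resolved during conversion. All table and column names (including TRNAME, SCNAME, TMNAME) are assumptions for illustration, not the ICIS 6.0 schema, and the sample terms are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TRAIT   (TRAITID INTEGER PRIMARY KEY, TRNAME TEXT);
CREATE TABLE SCALE   (SCALEID INTEGER PRIMARY KEY, SCNAME TEXT);
CREATE TABLE TMETHOD (TMETHID INTEGER PRIMARY KEY, TMNAME TEXT);

-- Hypothetical unified CV table: one row per term from any legacy table.
CREATE TABLE CVTERM (
    CVTERM_ID    INTEGER PRIMARY KEY AUTOINCREMENT,
    SOURCE_TABLE TEXT NOT NULL,     -- legacy table the term came from
    LEGACY_ID    INTEGER NOT NULL,  -- its key in that table
    NAME         TEXT NOT NULL,
    UNIQUE (SOURCE_TABLE, LEGACY_ID)
);
""")

# (table, key column, name column) for each legacy CV table to fold in.
LEGACY_CV_TABLES = [
    ("TRAIT", "TRAITID", "TRNAME"),
    ("SCALE", "SCALEID", "SCNAME"),
    ("TMETHOD", "TMETHID", "TMNAME"),
]


def merge_legacy_cvs(conn):
    """Copy every legacy CV term into CVTERM, skipping terms already merged."""
    for table, key_col, name_col in LEGACY_CV_TABLES:
        # Identifiers come from the fixed list above, never from user input.
        conn.execute(
            f"""INSERT OR IGNORE INTO CVTERM (SOURCE_TABLE, LEGACY_ID, NAME)
                SELECT ?, {key_col}, {name_col} FROM {table}""",
            (table,),
        )
    conn.commit()


conn.execute("INSERT INTO TRAIT VALUES (1, 'Plant height')")
conn.execute("INSERT INTO SCALE VALUES (1, 'cm')")
conn.execute("INSERT INTO TMETHOD VALUES (1, 'Ruler at maturity')")
merge_legacy_cvs(conn)
merge_legacy_cvs(conn)  # rerunning adds nothing
```

Keying on (SOURCE_TABLE, LEGACY_ID) makes the merge idempotent, so it could be rerun while the legacy tables are still in use.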

Practical Consequences of the Proposed Changes

The Good...

The power of converting ICIS to use a full ontology subsystem is that we can leverage tools and ontology sets from the international community, such as Plant Ontology Consortium efforts in plant anatomy/development/trait ontology. Crop traits (phenotypes) can be dissected and more sophisticated querying may be possible with such external linkages, and the use of 3rd party ontology tools will be possible. Also, ICIS management of ontology can subsequently follow "best practices" for ontology development.

...The Bad...

Introducing a complete ontology subsystem is unlikely to be fully backward compatible with previous versions of ICIS. Whole databases will need to be converted to the new format. This may not be without difficulties, and translation errors could degrade the data.

...and the Ugly

Significant reprogramming of the ICIS Dynamic Link Library (DLL) and formal curation of controlled vocabulary mappings from classic ICIS tables to ontology terms will be necessary. Database conversion scripts will also be required. Together these may represent a significant programming effort.

Issues about Integrating OMS in ICIS

Ontology Loader and the Primary keys created in OMS

It was noted that the current ontology loader script in the EBI-OLS always assigns new internal primary key IDs every time the ontology is updated.

Some of the suggested solutions to handle this issue are:

  1. Use the publicly known term identifier, not the internal numeric ID, to link the EBI-OLS with ICIS CV usage:
    1. The stable public CV (alphanumeric) identifier is likely to be mapped stably (once) onto the ICIS controlled vocabulary tables (e.g. the METHOD, TRAIT, SCALE, TMETHOD and GEMS CV tables?)
    2. Data can then be labelled with the public CV identifier indirectly, via stable internal ICIS identifiers:
      1. Initially (release 5.6++), the currently assigned controlled vocabulary (CV) tables could be retained, with a single attribute field added to hold the public CV identifier (e.g. in the METHOD, TRAIT, SCALE, TMETHOD and GEMS CV tables?)
      2. Later versions (6.0++) could merge the old ICIS CV tables into one internal ICIS CV table holding the stable internal CV identifier ("key") mappings, the public CV identifier and the (human-readable) CV term name
    3. Searching on the full ontology could be done in the "external" ontology database, giving public identifiers matching the concepts the user wishes to use. Such public identifiers can then be looked up in the ICIS "link" table, to give the ICIS internal CV identifiers. Three basic use cases for this search are:
      1. Data curator look up of suitable CV terms for concepts to use in labeling data (e.g. in the ICIS DMS Workbook)
      2. End-user data retrieval using suitable CV terms, whose ICIS internal CV identifiers are then looked up and used to access ICIS data labeled with the given concepts
      3. Hypothesis-free data mining and global analysis, which collects statistics on ICIS data. In this case, the analysis is done on the ICIS data: the ICIS CV identifiers of interest are identified and used to look up the public CV identifiers in the link table, which are in turn used to look up the related ontology relationships in the external OMS database.
  2. Maintain two databases:
    1. One EBI-OLS schema database will initially be created with the currently loaded ontologies. This is the database to be used by ICIS, and any initial IDs created in it will be maintained. A second, "working" EBI-OLS database will be maintained for use by the ontology loader. Any updates the ontology loader makes to the working database will be synchronized to the ICIS OMS through an independent script that does not overwrite ICIS IDs, but instead matches existing publicly known identifiers and assigns new IDs only to new publicly known term identifiers.
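Both solutions rest on the same idea: ICIS keeps its own stable IDs and links them to public accessions, never to EBI-OLS internal keys. The in-memory sketch below is illustrative only (the class and method names are hypothetical; a real implementation would be a database table plus a script). It shows the two link lookups used by the search use cases and a sync step that, as in solution 2, keeps existing IDs and numbers only previously unseen accessions. The accessions are placeholders in Trait Ontology style.

```python
from typing import Dict, Iterable, Optional


class CvLinkTable:
    """Hypothetical link table: stable ICIS CV id <-> public ontology accession."""

    def __init__(self, first_id: int = 1):
        self._by_accession: Dict[str, int] = {}
        self._by_icis_id: Dict[int, str] = {}
        self._next_id = first_id

    def icis_id(self, accession: str) -> Optional[int]:
        """Public term found by an ontology search -> ICIS internal id."""
        return self._by_accession.get(accession)

    def accession(self, icis_id: int) -> Optional[str]:
        """ICIS id found in the data -> public term for an OMS lookup."""
        return self._by_icis_id.get(icis_id)

    def sync(self, loaded_accessions: Iterable[str]) -> Dict[str, int]:
        """Merge a freshly reloaded ontology without renumbering.

        Known accessions keep their ICIS id; only unseen accessions get new
        ids. Returns the mappings added by this call.
        """
        added: Dict[str, int] = {}
        for acc in loaded_accessions:
            if acc not in self._by_accession:
                new_id = self._next_id
                self._next_id += 1
                self._by_accession[acc] = new_id
                self._by_icis_id[new_id] = acc
                added[acc] = new_id
        return added


link = CvLinkTable()
link.sync(["TO:0000001", "TO:0000002"])
# A reload returns terms in a different order plus one new term;
# the existing ICIS ids are unchanged.
link.sync(["TO:0000002", "TO:0000003", "TO:0000001"])
print(link.icis_id("TO:0000001"), link.accession(3))  # 1 TO:0000003
```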

Cross product of two or more ontologies (example: association of Trait, Scale and TMethod)

In ICIS, a given scale is a unit of measurement for a particular trait, and a method is a way of measuring a trait. If trait, scale and method are held in three different ontologies, one issue is how to represent the association of terms from these three ontologies. This resembles a cross product of two ontologies. Moreover, in the FACTOR or VARIATE table, a variable is a trait measured on a certain scale by a certain method, so it involves the cross product of three ontologies.

Some possible ways to represent this cross product in ICIS are:

1. Represent the cross-product through a table in OMS

2. Retain the TRAIT, SCALE and TMETHOD tables, but use the IDs assigned by the OMS.

3. Create a new table in ICIS for the cross product of Trait, Scale and TMethod. This can initially be populated with the unique combinations of trait, scale and method existing in the FACTOR and VARIATE tables.
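Option 3 might be seeded as sketched below, with SQLite standing in for the ICIS database. FACTOR and VARIATE are cut down to the columns needed; the column names (TRAITID, SCALEID, TMETHID), the TRAIT_SCALE_METHOD table and the sample IDs are assumptions for illustration. A UNION of the two source tables yields each distinct combination once, and a UNIQUE constraint keeps later runs from creating duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified FACTOR/VARIATE tables (assumed column names).
CREATE TABLE FACTOR  (FACTORID INTEGER, TRAITID INTEGER, SCALEID INTEGER, TMETHID INTEGER);
CREATE TABLE VARIATE (VARIATID INTEGER, TRAITID INTEGER, SCALEID INTEGER, TMETHID INTEGER);

-- Hypothetical cross-product table: one row per distinct Trait x Scale x Method.
CREATE TABLE TRAIT_SCALE_METHOD (
    TSM_ID  INTEGER PRIMARY KEY AUTOINCREMENT,
    TRAITID INTEGER NOT NULL,
    SCALEID INTEGER NOT NULL,
    TMETHID INTEGER NOT NULL,
    UNIQUE (TRAITID, SCALEID, TMETHID)
);
""")


def populate_cross_product(conn):
    """Seed the table with every combination already used by a factor or variate."""
    # UNION (not UNION ALL) removes duplicates across both source tables;
    # INSERT OR IGNORE keeps reruns from adding rows that already exist.
    conn.execute("""
        INSERT OR IGNORE INTO TRAIT_SCALE_METHOD (TRAITID, SCALEID, TMETHID)
        SELECT TRAITID, SCALEID, TMETHID FROM FACTOR
        UNION
        SELECT TRAITID, SCALEID, TMETHID FROM VARIATE
    """)
    conn.commit()


conn.execute("INSERT INTO FACTOR VALUES (1, 10, 20, 30)")
conn.executemany("INSERT INTO VARIATE VALUES (?, ?, ?, ?)",
                 [(1, 11, 21, 31), (2, 11, 21, 31), (3, 10, 20, 30)])
populate_cross_product(conn)
rows = conn.execute("SELECT TRAITID, SCALEID, TMETHID FROM TRAIT_SCALE_METHOD "
                    "ORDER BY TRAITID").fetchall()
print(rows)  # [(10, 20, 30), (11, 21, 31)]
```

FACTOR and VARIATE rows could later reference TSM_ID instead of carrying the three IDs separately, though that step is beyond this sketch.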

Practical Application of Ontology in ICIS Tools

Breeder's Tools

1. Assist breeders in accurately specifying the traits they want to observe
