IRIS Curation

From ICISWiki

Guidelines for assigning new GID

IRIS has the capacity to document the complete history of all germplasm management. In practice, we don't know the complete history, nor do we want to create a system that is more cumbersome than we need. Therefore these guidelines are based on creating new GIDs only where strictly necessary to record the information we need. This principle applies to:

  • The creation of GIDs to represent new germplasm samples
  • The creation of GIDs to represent existing or historical or notional germplasm samples not previously documented in IRIS

Creating new GIDs for new germplasm samples

A new GID must be assigned to the following:

  1. Every seed harvest that involves a generative or derivative method.
  2. Every seed increase or multiplication of a GRC accession.
  3. Every imported seedlot, including imports of the harvest from lines exported from IRRI to another agency, unless we are sure that the seed is a re-import of exactly the same seed that we exported. If we are not sure whether the seed is the same or a new harvest, assign a new GID as for a new harvest.
  4. Every line that is transferred from one OU to another, if the receiving OU will manage its line independently (for example, IRRI lines deposited in GRC, GRC lines transferred to PBGB or other groups)
  5. Every seed harvest that is taken by an IRRI program in a different country.
  6. Every harvest from lines that are subject to stress treatment (like disease or insect stress).
  7. Every line nominated to INGER (including IRRI as well as other contributions to INGER)
  8. A variety released in a country other than the Philippines, even if it is an IRRI line.

A new GID will not be assigned in the following cases:

  1. Harvest of an IRRI breeding line that is stable (unless our breeders want to change that rule so that each seed increase gives the harvest a new GID).
  2. Harvest in IRRI of INGER lines.
  3. Release in the Philippines of an IRRI line. The release name is added as a new name associated with the existing GID of the IRRI line, without assigning a new GID.
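
The two lists above can be read as a small decision table. The following is a minimal sketch in Python, not part of IRIS: the event-kind strings and the function name are hypothetical labels invented here to stand for the cases listed above.

```python
# Hypothetical event kinds, named for this sketch only; they are not IRIS codes.
NEW_GID_EVENTS = {
    "generative_or_derivative_harvest",
    "grc_seed_increase",
    "ou_transfer_independent",
    "harvest_in_other_country",
    "stress_treatment_harvest",
    "inger_nomination",
    "release_outside_philippines",
}
NO_NEW_GID_EVENTS = {
    "stable_irri_line_harvest",
    "inger_line_harvest_at_irri",
    "release_in_philippines",
}


def needs_new_gid(event_kind: str, certain_reimport_of_same_seed: bool = False) -> bool:
    """Return True when the guidelines above call for a new GID."""
    if event_kind == "import":
        # Only a re-import known to be exactly the seed we exported keeps its GID;
        # any doubt is treated like a new harvest.
        return not certain_reimport_of_same_seed
    if event_kind in NEW_GID_EVENTS:
        return True
    if event_kind in NO_NEW_GID_EVENTS:
        return False
    # Anything not covered by the lists needs a curator's decision.
    raise ValueError(f"unclassified event kind: {event_kind!r}")
```

Raising an error for unlisted cases, rather than guessing, reflects the principle that new GIDs are created only where strictly necessary.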

Creating new GIDs to represent existing or historical germplasm samples

In order to represent the history of new incoming germplasm samples and their relationships with other GIDs already in IRIS, it is sometimes essential to create additional GIDs representing existing, historical or inferred germplasm samples that have not previously been documented in IRIS. For example, if we receive an incoming sample that is related in some way to another GID already in IRIS, then in order to represent that relationship it is essential to have a GID to represent the sample managed by the sender as well as a GID to represent the sample received by IRRI. Conversely, if the incoming sample has no relationship with any other GID in IRIS, it is not essential to create a GID to represent the sample managed by the sender (although it would not be wrong to create one).

The procedures below describe how to create such new GIDs only when it is essential.

Curating Incoming Seeds

Germplasm creation method, date and location

All incoming seed samples must have

  • Method = import
  • GDate = date the sample was received by the importer
  • GLocN = location (OU or institute) of the receiving organization

Group and source

  1. Assign GPID2 (the source) of an incoming sample to equal the GID of the sample managed by the sender.
  2. Assign GPID1 (the group) of the incoming sample to equal the GPID1 of the GID representing the sender's sample.
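
Taken together, the method, date, location, group and source rules above fully determine the core record of an incoming sample once the sender's GID is known. A minimal sketch, assuming a hypothetical `Germplasm` record whose fields mirror the terms used on this page; the string "import" stands in for whatever method code your installation uses, and all numeric values in the usage lines are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Germplasm:
    """Hypothetical in-memory record; field names follow the terms on this page."""
    gid: int
    method: str
    gdate: int   # date as YYYYMMDD
    glocn: int   # location ID of the holding OU or institute
    gpid1: int   # group
    gpid2: int   # source


def make_incoming(new_gid: int, sender: Germplasm, received_on: int,
                  receiver_locid: int) -> Germplasm:
    """Build the record for a sample received from `sender`."""
    return Germplasm(
        gid=new_gid,
        method="import",
        gdate=received_on,       # date received by the importer
        glocn=receiver_locid,    # location of the receiving organization
        gpid1=sender.gpid1,      # group is inherited from the sender's sample
        gpid2=sender.gid,        # source is the sender's sample itself
    )


# Illustrative values only.
sender = Germplasm(gid=1001, method="import", gdate=19950312,
                   glocn=50, gpid1=900, gpid2=950)
incoming = make_incoming(2001, sender, received_on=20070614, receiver_locid=77)
```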

Creating a GID for the sample managed by the sender

Check whether IRIS already contains a GID representing the sample managed by the sender. If not, you must create it in addition to the new GID for the incoming sample, and assign values as follows:

  1. The GLOCN of the sender's sample must be the sender's institute
  2. If the data provided by the sender show an IRGC No or IRTP No already associated with the incoming seed sample, assign the following data to the new GID representing the sample managed by the sender:
    1. Method = import
    2. GDate = date the original IRGC No or IRTP No was originally sent from IRRI to the current sender (if known and if the sender received the material directly from IRRI)
    3. Source = the GID of the original IRGC No or IRTP No (if the sender received the material directly from IRRI, or if the sender's direct source is unknown)
    4. Group = the GPID1 of the original IRGC No or IRTP No
    5. IRGC No or IRTP No is entered as a name of type DAccN (if the sender received the material directly from IRRI) or FAccN (if the sender received the material indirectly), with NLocN=GRC or IRRI and NDate inherited from the original IRGC No or IRTP No.
  3. If the incoming seed is identified by the sender's own preferred ID (genebank accession number or breeder's selection ID), and that same ID is already present in IRIS representing samples held by IRRI or other agencies (i.e. the same ID but with nstat≠2 and with ntype=DAccN or FAccN or DrvNm), assign the GID for the sender's sample as the source of the other instances in IRIS as well as the source of the incoming sample.
  4. If the incoming seed originates in a sample originally collected from the field and comes with data on the collecting site, including the original collector's sample ID (name type ColNo) that is already present in IRIS as a ColNo of other GIDs, assign:
    1. Group of the sender's sample = the GID representing the original collected sample (which must be created if it doesn't already exist – see below)
    2. If the immediate source of the sender's sample is known and documented in IRIS, set that GID as the source of the sender's sample
    3. If the immediate source of the sender's sample is unknown, set the GID of the original collected sample as the source of the sender's sample
  5. If the incoming seed is a released variety, and IRIS contains another GID representing the original release of that variety (i.e. with name type RELNM):
    1. If the actual source of the sender's sample is known, set that source as its GPID2.
    2. If its actual source is unknown, set its source to be the GID associated with the original release name.
  6. If the incoming seed is a derivative of an IRRI line without an IRGC No or IRTP No, set the source of the sender's sample = the GID of the parent IRRI line.
  7. If the incoming sample is a landrace that is verified to have a source that is already documented in IRIS or that is verified to share the same source as another GID in IRIS, set the group and source of the sender's sample to its verified group and source. However, note that this item may be redundant when you consider the question "How do you verify whether the source is the same?". The (only?) three possible answers are: (i) By the presence in the sender's data of an IRGC No or IRTP No. This situation is dealt with in item 2 above. (ii) By the presence in the sender's data of a name that is already in IRIS with name type ColNo, i.e. a collector's sample ID. This situation is dealt with in item 4 above. (iii) By the presence of other names of type AccNo (e.g. a PI number for accessions in the USDA collection) or DAccN (e.g. a PI number of a sample received by IRRI or another genebank from the USDA collection) or FAccN (e.g. a PI number of a sample received indirectly from the USDA collection). The procedure for managing this situation is equivalent to that for handling incoming samples identified in the sender's data by IRGC No or IRTP No, i.e. you must check whether IRIS already contains a GID to represent the sample managed by the sender, create one if it does not already exist, and set it as the source of the incoming sample. This situation is dealt with in item 3 above.
  8. For other breeding lines where the name, nlocn, glocn, and parents do not match those of any GID in IRIS, create a new GID with:
    1. The required name, nlocn and glocn
    2. If the parents are given,
      1. Method = unknown derivative
      2. Group and source = the cross of the parents (which may need to be a newly created GID, see below)

Creating a GID for the group

Check whether IRIS already contains a GID representing the group. If not, you must create it in addition to the new GID for the incoming sample, and assign values as follows:

  1. If the incoming seed originates in a sample originally collected from the field and comes with data on collecting site including an original collector's sample ID (name type ColNo), set
    1. GDate=Collecting date
    2. GLocN=Collecting location.
      1. If the only thing you know about the location is geopolitical administrative data (e.g. the country, province or district in which the sample was collected, or the nearest town or village), you should look for an existing location in IRIS for the smallest scale information you have, and use the LOCID of that location. For example, if you only know the country, use the country’s LOCID. If you know country, province, district and town, use the town’s LOCID. If you cannot find the place name in IRIS – take care! Place names in IRIS are carefully curated and checked against digital Gazetteers. If you cannot immediately find the place name, check first for variant spellings. IRIS only records one preferred spelling for each place, but many places have many names.
      2. If you have more specific information on the collecting locality (e.g. a text description such as “5km N of Vientiane”, or latitude/longitude coordinates, or altitude), you will probably need to create a new location record, with LTYPE=409 and LNAME=the text description.
    3. Method=Collected
    4. Normally, Group=source=missing. However, in a few instances the collected sample may be of a known released modern variety. In this case, set Group=source=GID of the released variety.
    5. Collector's sample ID = preferred ID (nstat=8), name type=ColNo
  2. If you have information confirming that the incoming seed is a released variety (with unknown parents) and you know the date and/or country of original release, set
    1. GDate=Release date
    2. GLocN=Country of release
    3. name type=RELNM
  3. If the incoming seed is a breeding line derived from a cross that is not documented in IRIS, create a new GID to represent the cross.
  4. If you know nothing about the origin of an incoming sample except the country of origin, create a new GID to represent the origin with GLOCN=country of origin.

Name, name state, name type, name date and name location

  • The name state (NSTAT) of a name takes one of a set of pre-defined values: 1 for the preferred name, 2 for the preferred abbreviation, 8 for the preferred ID, 0 for other English language names.
    • One GID must have exactly one preferred name. The same name can be used (usually but not necessarily also as the preferred name) for other GIDs
    • One name (never more than one) can be the preferred unique ID – for example, a genebank accession ID. It must be the unique identifier assigned by the person who manages a germplasm sample to distinguish that particular germplasm sample from all other samples. If present, the preferred ID should be unique as a preferred ID in IRIS, not occurring as a preferred ID for any other GID.
  • The name type of a name takes one of a set of predefined values – RELNM for released cultivars, CVNAM for other cultivars etc.
    • Only the following name types can be a "preferred name": CRSNM, RELNM, DRVNM, CVNAM, LNAME, ELITE
    • Only the following name types can be a "preferred ID": ACCNO, ITEST, NTEST, GACC
  • The name date (NDATE) of a name is the date on which the name was first assigned to the germplasm it describes. For example, Doongara is a variety released under that name in Australia in 1989. A name Doongara for an incoming sample should have NDATE=19890000 (a more precise date would be given if it was known) even if the sample was received in 2007.
  • The name location (NLOCN) of a name is the location where it was first named. e.g. the name location of Doongara is Australia (a more precise location would be given if it was known), even if a sample of Doongara is received from another country.
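
The Doongara example shows a name date stored as an eight-digit YYYYMMDD number with zeros in the unknown month and day positions. A minimal sketch of that encoding follows; the function name is hypothetical, and zero-filling the day when only year and month are known is an assumption extrapolated from the year-only example.

```python
from typing import Optional


def encode_ndate(year: int, month: Optional[int] = None,
                 day: Optional[int] = None) -> int:
    """Pack a possibly partial date into YYYYMMDD, zero-filling unknown parts."""
    if day is not None and month is None:
        raise ValueError("a day cannot be given without a month")
    return year * 10000 + (month or 0) * 100 + (day or 0)


# Doongara: released in 1989, exact date unknown.
doongara_ndate = encode_ndate(1989)
```

Here `doongara_ndate` is 19890000, regardless of when a sample carrying that name is received.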

Assigning a specific unique ID

If you assign your own unique IDs to samples under your management (e.g. IRTP numbers or IRGC numbers), you should:

  1. Create a new name to hold your preferred ID for the incoming sample.
  2. Check that it is not already in use for any other GID
  3. Assign a name state preferred ID
  4. Assign the name type selected for your preferred ID
  5. Assign the current date as the name date. (This may be the same as the GDate of the incoming sample, but in most cases it will be some time later: the GDate of the incoming sample is the date it arrives in IRRI, and it typically takes days, weeks or months before it is ready to receive your own preferred unique ID.)
  6. Assign IRRI or your OU as the name location
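
The steps above can be sketched as a single helper. This is an illustration under stated assumptions, not IRIS code: the function name and the dictionary layout are hypothetical, the set of IDs already in use is assumed to have been fetched from IRIS beforehand, and nstat=8 is taken as the preferred-ID state used for collector's sample IDs earlier on this page.

```python
from datetime import date
from typing import Dict, Iterable, Optional

PREFERRED_ID = 8  # name state for a preferred ID


def assign_unique_id(new_id: str, ntype: str, nlocn: int,
                     ids_in_use: Iterable[str],
                     today: Optional[date] = None) -> Dict[str, object]:
    """Build the name record for a newly assigned preferred ID."""
    if new_id in set(ids_in_use):
        # Step 2: the ID must not already be in use for any other GID.
        raise ValueError(f"{new_id!r} is already in use")
    today = today or date.today()
    return {
        "nval": new_id,
        "nstat": PREFERRED_ID,                    # step 3
        "ntype": ntype,                           # step 4
        "ndate": int(today.strftime("%Y%m%d")),   # step 5: current date
        "nlocn": nlocn,                           # step 6: IRRI or your OU
    }
```

The `today` parameter exists only so the date can be fixed in tests; in normal use it defaults to the current date.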

Assigning other names

Most other names are like family names associated with a lineage and are therefore inherited from the source. Therefore there are two sources of data to determine the appropriate names and associated name data for an incoming sample – data provided by the sender, and data already in IRIS.

  1. Compare the name data provided by the sender with data already in IRIS on the name data for the sender's sample and/or its source.
    1. Where available, use the data in IRIS to fill in data not provided by the sender, e.g. to decide whether a cultivar name is the name under which it was released (RELNM) or not (CVNAM), or the date of release of a cultivar (NDATE of RELNM) or the date a sample was accessed into a genebank (NDATE of ACCNO).
    2. If the sender provides data that are missing from the source in IRIS (e.g. the release date of a RELNM), use the sender's data to fill in missing data on the source(s)
    3. If the name given by a sender has a different spelling from the same name in IRIS, consider entering both versions and assigning one as an alternative name.
    4. If there are other discrepancies, investigate or report – discrepancies in name data between GIDs in the same maintenance neighbourhood are not permissible and must be corrected when found!
  2. Based on the above comparison of the sender's data with data in IRIS, use the following rules to assign appropriate names and name data to the incoming sample:
    1. A new GID should inherit all names from its source except names of type TACC, GACC, and possibly FACCN.
    2. Names with name state = preferred ID in the source change to name state 0 in the new GID
    3. Names of type ACCNO, ITEST and NTEST in the source change their name type to DACCN in the new GID
    4. Names of type DACCN change their name type to FACCN
    5. All other data associated with an inherited name (name state, name type, name date, name location) are also inherited from the source.
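
The inheritance rules in step 2 can be sketched as one pass over the names of the source. This is an illustration only: the `Name` record and function name are hypothetical, FACCN names are kept because the rule says only "possibly" for them, and nstat=8 is taken as the preferred-ID state used for collector's sample IDs earlier on this page.

```python
from dataclasses import dataclass, replace
from typing import List

PREFERRED_ID = 8
OTHER_NAME = 0
NOT_INHERITED = {"TACC", "GACC"}            # FACCN is left to the curator
DIRECT_ID_TYPES = {"ACCNO", "ITEST", "NTEST"}


@dataclass(frozen=True)
class Name:
    """Hypothetical name record with the fields discussed on this page."""
    nval: str
    ntype: str
    nstat: int
    ndate: int
    nlocn: int


def inherit_names(source_names: List[Name]) -> List[Name]:
    """Return the names a new GID inherits from its source."""
    inherited = []
    for name in source_names:
        ntype = name.ntype.upper()   # compare name types case-insensitively
        if ntype in NOT_INHERITED:
            continue
        # A preferred ID of the source is only an ordinary name in the new GID.
        nstat = OTHER_NAME if name.nstat == PREFERRED_ID else name.nstat
        # if/elif keeps an ACCNO from being converted twice in one pass.
        if ntype in DIRECT_ID_TYPES:
            ntype = "DACCN"
        elif ntype == "DACCN":
            ntype = "FACCN"
        # Name value, date and location are carried over unchanged.
        inherited.append(replace(name, ntype=ntype, nstat=nstat))
    return inherited
```

Each name moves at most one step along ACCNO → DACCN → FACCN per transfer, which is what lets the name type record whether a sample came directly or indirectly from the original holder.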

Curating seedlots transferred between IRRI OUs

(Examples: transfer from GRC to INGER or PBGB, from INGER to GRC, from PBGB to GRC or INGER, or to or from any other OU that manages its own germplasm records in IRIS)

The principles are the same as curation of incoming samples, but should be easier because there should always be a record in IRIS recording the sample managed by the source OU.

A new GID must always be created to represent the sample received and managed by the receiving OU, and the new GID must have

  • Method = import
  • GDate = date the sample was received
  • GLocN = the LocID of the receiving OU
  • Source = the GID of the source
  • Group = the group of the source
  • Names created or inherited in the same way as for incoming samples

Choosing a GID for outgoing seeds

  1. Verify the source of the IRRI seeds.
  2. If the source is GRC, select the GID with an IRGC number.
  3. If the source is unknown, select the GID with an IRTP number, since that is meant for distribution and collaborative testing.
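
The selection rule above can be sketched as follows. This is a minimal illustration, assuming each candidate GID is described by a hypothetical dictionary with optional `irgc_no` and `irtp_no` entries; sources other than GRC or unknown are not covered by the rule, so the sketch returns nothing for them.

```python
from typing import Dict, List, Optional


def choose_outgoing_gid(candidates: List[Dict], source: Optional[str]) -> Optional[int]:
    """Pick the GID to send out; `source` is "GRC", or None when unknown."""
    if source == "GRC":
        wanted = "irgc_no"
    elif source is None:
        wanted = "irtp_no"   # IRTP numbers are meant for distribution and testing
    else:
        return None          # not covered by the guidelines above
    for candidate in candidates:
        if candidate.get(wanted):
            return candidate["gid"]
    return None


# Illustrative values only.
candidates = [
    {"gid": 1, "irtp_no": "IRTP 100"},
    {"gid": 2, "irgc_no": "IRGC 200"},
]
```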