GRIMS Germplasm collecting locations
From ICISWiki
Introduction
The location where an accession was collected may be described by text data and/or latitude-longitude-altitude coordinates. Coordinate data are especially valuable because they enable mapping and associated GIS analyses.
Original vs interpreted data
We must distinguish between
- Information originally provided with the germplasm when it was received
- Our interpretation of that information
Substantial effort may be required to understand and interpret the information provided (for example, to deal with language differences, historical changes and spelling errors) and then to use text data to estimate or verify coordinate data. We assume that the average user needs the interpreted information. However, the original information must always be retained and made available to anyone wishing to verify or change the interpretation, and the reason for each interpretation must always be annotated.
Text data
A text description of a collecting location may signify one of three types of association between the information and what it represents:
| Type of information | What is represented? |
|---|---|
| The name of a political or administrative region, such as a country or a subnational administrative region, which are organized in a hierarchy that depends on the country (e.g. province, district, etc), described in ICIS as SNL1, SNL2, SNL3 | The collecting location is somewhere within the named region |
| The name of a populated place, such as village, town, city | The collecting location is somewhere near the named place |
| A locality description – a free-text description of an unnamed place in reference to nearby named places, e.g. “On the shore of the lake 2km E of Los Baños” | The collecting location is at the place described |
What goes into the locality description?
Any feature that isn’t a named populated place (village, town, city) or administrative division (district, province), including:
| Category | Examples |
|---|---|
| Other types of feature | lakes, rivers |
| Vague regional descriptions that may include more than one town/village/district | Ingham/Allingham, Cairns-Townsville area |
| Institutes | Burdekin Rural Education Centre |
| Directions / description | 2 km E of Los Baños |
Notes:
- To the extent possible, the locality description should be standardized. See table below
- Unverifiable features or descriptions that we can’t understand (e.g. “HEBESA (PANA)”) will also go into Locality
- Not all punctuation indicates a locality description: “O'THOM” might be a village or town name. These should also be left where they are until validated.
- There is no need to record the IRGCIS field that originally held the locality description. For example, don’t put “TOWN=79.7 KM N OF JULIA CREEK;DISTRICT=ON NORMANTON ROAD;PROV=NEAR GEORGETOWN”. This locality description should be “79.7 km N of Julia Creek, on Normanton Road, near Georgetown”
- Locality description is not the same as original data. Original data should be recorded exactly as provided on original documents, not standardized, in separate fields in the database.
- Town, village or district? In many cases, one place name appears in IRGCIS sometimes as village, sometimes as town, sometimes as district. I have made no attempt to rationalize town/village differences
- Often one name can be used for both a town and a district, e.g. Adelaide in Australia is an ADM2 district and also the capital city of that district (www.statoids.com). In these cases:
  - Where the name appears as a district, leave it as district
  - Where the name appears as town or village, leave it as town or village but suppose it is also in the district of the same name
  - Where the name appears as part of a description (e.g. “100km S of Adelaide”), I suppose:
    - The name refers to the town, not the district.
    - The place described is not necessarily in the district, so leave the district unchanged
Rules for case and punctuation
| Category | Rule | Example |
|---|---|---|
| Place names | Proper case | Los Baños, West Bengal |
| Compass directions (not part of place name) | Upper case abbreviation without punctuation or embedded spaces, space before and after | N, S, E, W, NE, SE, NNE, ENE |
| Distances | Standard abbreviation, lower case, space before and after, no punctuation | m, km, mi |
| Linking words | Sentence case, in full not abbreviated | from, to, after, before, along, and |
| Types of feature that aren’t names | Sentence case, in full not abbreviated | highway, river, lake |
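As a rough illustration of the rules above, the following Python sketch strips IRGCIS field tags from a stored string and applies the case conventions token by token. The function names, the compass and unit sets beyond the examples in the table, and the extra linking words (of, on, near, the) are assumptions made for this example and are not part of ICIS or IRGCIS. The sketch does not try to recognise feature words such as "river" that are not part of a name, because that decision needs human judgement.

```python
# Assumed vocabularies: the table above gives only a few examples of each.
COMPASS = {"N", "S", "E", "W", "NE", "NW", "SE", "SW",
           "NNE", "ENE", "ESE", "SSE", "SSW", "WSW", "WNW", "NNW"}
UNITS = {"m", "km", "mi"}
LINKING = {"from", "to", "after", "before", "along", "and",
           "of", "on", "near", "the"}  # last four are assumptions


def standardise_fragment(text):
    """Apply the case rules to one whitespace-separated description."""
    words = []
    for token in text.split():
        if token.upper() in COMPASS:
            words.append(token.upper())      # compass directions: upper case
        elif token.lower() in UNITS or token.lower() in LINKING:
            words.append(token.lower())      # units and linking words: lower case
        else:
            words.append(token.title())      # everything else: proper case
    return " ".join(words)


def locality_from_irgcis(raw):
    """Drop 'FIELD=' tags and join the standardised fragments with commas."""
    fragments = []
    for part in raw.split(";"):
        value = part.rpartition("=")[2].strip()  # text after the last '='
        if value:
            fragments.append(standardise_fragment(value))
    return ", ".join(fragments)


print(locality_from_irgcis(
    "TOWN=79.7 KM N OF JULIA CREEK;DISTRICT=ON NORMANTON ROAD;PROV=NEAR GEORGETOWN"
))
```

A result produced this way still needs checking by eye: a single-letter token such as "E" is always treated as a compass direction, and names containing apostrophes or unusual capitalisation may not come out as intended.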
Coordinate data
Coordinate system
To assign coordinates to a location, it is necessary to:
- Define a mathematical approximation (a “Datum”) to the shape of the earth in the vicinity of the location. Many datums exist. WGS84 (http://en.wikipedia.org/wiki/World_Geodetic_System) is a global approximation, the standard for world maps such as Google Earth, and the default for GIS systems and GPS devices. Country-specific maps may use different datums that are better approximations for the country.
- Define an arbitrary origin to represent the prime meridian, 0° longitude. The Greenwich Meridian (http://wwp.greenwichmeridian.com/) is now the global standard, but beware of datasets that use the Paris Meridian (http://en.wikipedia.org/wiki/Paris_Meridian).
Care should be taken to document the coordinate system used. Combining coordinates from different systems without adjustment to a common system is an error.
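To make the prime meridian adjustment concrete, here is a minimal Python sketch that shifts a longitude measured from the Paris Meridian onto the Greenwich Meridian. It assumes the commonly quoted offset of 2°20'14.025" E for the Paris Meridian and assumes the input is already in decimal degrees; the function name is hypothetical. It does not perform any datum transformation, which is a separate and more involved step.

```python
# Assumed offset: Paris Meridian lies 2 deg 20' 14.025" east of Greenwich.
PARIS_EAST_OF_GREENWICH_DEG = 2 + 20 / 60 + 14.025 / 3600


def paris_to_greenwich(lon_paris_deg):
    """Convert a Paris-based longitude (decimal degrees, east positive)
    to a Greenwich-based longitude in the range [-180, 180)."""
    lon = lon_paris_deg + PARIS_EAST_OF_GREENWICH_DEG
    return (lon + 180.0) % 360.0 - 180.0  # wrap around the antimeridian


print(round(paris_to_greenwich(0.0), 6))
```

Latitude is unaffected by the choice of prime meridian, so only the longitude needs this shift.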
Coordinate interpretation and uncertainty
Coordinate data coming from different sources may need different interpretations. They have an associated (implied or explicit) degree of uncertainty, which ranges from a few metres for GPS readings to thousands of km if the only information available is the country:
| Source of coordinate data | What is represented? | Uncertainty |
|---|---|---|
| Look up coordinates of political or administrative region in a gazetteer | The centroid of the region | Because the collecting location is within the region, if no finer-scale information is available the measure of uncertainty of the region's coordinates correctly reflects uncertainty of the collecting location |
| Look up coordinates of a populated place, such as village, town, city in a gazetteer | The centroid of the named place. | Because the collecting location is just somewhere near to the place, a measure of uncertainty of the location of the place does not reflect uncertainty of the collecting location |
| BioGeomancer used to interpret a free text description | BioGeomancer’s best estimate of the collecting site location | A measure of uncertainty of the estimate |
| Coordinate data provided on collecting forms | The collector’s best estimate of the collecting site location | Uncertainty may be |
| Coordinate data provided on paper other than collecting forms, or electronically, without indication of source, uncertainty, datum, prime meridian, or what location is described | Someone’s best estimate of a location | Uncertainty can usually be no more than partially inferred from the format in which the data were provided. |
Inferring uncertainty from the original coordinate data format
When there is no other explicit information on the uncertainty of coordinates provided, some incomplete inference may be made just based on the format and precision in which the data were provided.
In the following, “D”=degrees, “M”=minutes, “S”=seconds, “H”=hemisphere (N or S for latitude, E or W for longitude), “.”=decimal point; where “H” is missing from the format, negative values are used for S and W, positive for N and E. Formats may be:
| Format | Interpretation |
|---|---|
| DDH | Minimum precision – data given only to the nearest whole degree probably from a low-resolution map. Typically indicates an uncertainty of 100km or worse |
| DDMMH | Slightly higher precision - data given only to the nearest whole minute probably from a local map. Typically indicates an uncertainty of ~5km or less |
| DDMMSSH | High precision, likely from a high quality map. Typically indicates an uncertainty of 100-500m |
| DDMM.MMMH DDMMSS.SH | These formats are typical for a GPS, indicative of uncertainty in the region of 10m |
| DD.DDDDD | This format is typical for Gazetteers, GIS systems and other databases containing spatial information. If provided with no associated information on source or uncertainty – beware! There’s no way to infer the uncertainty of these coordinates |
| DD.DDDDDDD | Beware! Displaying coordinates as decimal degrees to more than 5 decimal places is almost always meaningless. GPS devices are not that precise, and even if they were, such precision is useless for genetic resources purposes. Use of such high-precision display formats, especially if there’s no associated information on source or uncertainty, indicates unskilled use of GIS/GPS systems |
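The table above can be turned into a simple lookup. The Python sketch below classifies a compact coordinate string by its format and returns an indicative uncertainty in metres, or None where the table says nothing can be inferred. The function name, the regular expressions, and the single representative value chosen for each row are assumptions for illustration; in particular the sketch assumes degrees are written with at least two digits (zero-padded), so that digit counts distinguish degrees, minutes and seconds.

```python
import re

# (pattern, format label, indicative uncertainty in metres)
# Values are single representative figures taken from the table above.
_FORMATS = [
    (re.compile(r"-?\d{2,3}[NSEW]?"), "DDH", 100_000.0),          # 100 km or worse
    (re.compile(r"-?\d{4,5}[NSEW]?"), "DDMMH", 5_000.0),          # ~5 km or less
    (re.compile(r"-?\d{6,7}[NSEW]?"), "DDMMSSH", 500.0),          # 100-500 m
    (re.compile(r"-?\d{4,5}\.\d+[NSEW]?"), "DDMM.MMMH", 10.0),    # GPS, ~10 m
    (re.compile(r"-?\d{6,7}\.\d+[NSEW]?"), "DDMMSS.SH", 10.0),    # GPS, ~10 m
]
_DECIMAL_DEGREES = re.compile(r"-?\d{1,3}\.\d+")


def infer_from_format(raw):
    """Return (format label, uncertainty in metres or None)."""
    text = raw.strip().upper()
    for pattern, label, uncertainty_m in _FORMATS:
        if pattern.fullmatch(text):
            return label, uncertainty_m
    if _DECIMAL_DEGREES.fullmatch(text):
        # Decimal degrees: uncertainty cannot be inferred from the format.
        return "decimal degrees", None
    return "unrecognised", None


for sample in ["75W", "7530W", "753012W", "7530.125W", "-75.12345"]:
    print(sample, infer_from_format(sample))
```

Any value inferred this way is only a starting point; explicit information on source or uncertainty, where available, should take precedence.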
Display format for coordinate data
Users differ in their preferred display format - some like decimal degrees, some like degrees-minutes-seconds, others like degrees-decimal minutes.
In any case, normal practice should be to display coordinates only to a precision commensurate with the uncertainty, and not to a higher precision than that used by the original provider. For example, a longitude value provided as 75°W should be displayed as 75°W or -75, not as 75°0'W or 75°0'0"W or -75.0 or -75.0000. The following rule of thumb could be applied:
| Uncertainty | Display in degrees-minutes-seconds system | Display in decimal degrees system |
|---|---|---|
| >100km | DDH | DD |
| 5km - 100km | DDMMH | DD.DD |
| 100m - 5km | DDMMSSH | DD.DDDD |
| <100m | DDMMSS.SH | DD.DDDDD |
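The rule of thumb above might be applied in code roughly as follows. This Python sketch takes a coordinate in decimal degrees and an uncertainty in metres and formats it in either system. The function names and the treatment of the boundary values (exactly 5 km or 100 m) are assumptions, and the sketch does not enforce the second condition in the text, that display precision should not exceed that of the original provider.

```python
def precision_level(uncertainty_m):
    """Map an uncertainty in metres to a row of the table (0 = coarsest)."""
    if uncertainty_m > 100_000:
        return 0          # >100km
    if uncertainty_m >= 5_000:
        return 1          # 5km - 100km
    if uncertainty_m >= 100:
        return 2          # 100m - 5km
    return 3              # <100m


def format_decimal(value_deg, uncertainty_m):
    """Decimal degrees with 0, 2, 4 or 5 decimal places."""
    places = (0, 2, 4, 5)[precision_level(uncertainty_m)]
    return f"{value_deg:.{places}f}"


def format_dms(value_deg, is_latitude, uncertainty_m):
    """Degrees-minutes-seconds with a hemisphere letter."""
    if is_latitude:
        hemi = "N" if value_deg >= 0 else "S"
    else:
        hemi = "E" if value_deg >= 0 else "W"
    v = abs(value_deg)
    level = precision_level(uncertainty_m)
    # Round once at the target resolution to avoid results such as 59.96 -> 60".
    if level == 0:
        return f"{round(v)}°{hemi}"
    if level == 1:
        d, m = divmod(round(v * 60), 60)
        return f"{d}°{m:02d}'{hemi}"
    if level == 2:
        d, rem = divmod(round(v * 3600), 3600)
        m, s = divmod(rem, 60)
        return f"{d}°{m:02d}'{s:02d}\"{hemi}"
    d, rem = divmod(round(v * 36000), 36000)
    m, tenths = divmod(rem, 600)
    return f"{d}°{m:02d}'{tenths / 10:04.1f}\"{hemi}"


print(format_dms(-75.0, False, 150_000), format_decimal(-75.0, 150_000))
```

With an uncertainty above 100 km, the longitude from the example in the text is shown as 75°W or -75, with no spurious minutes, seconds or decimal places.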
Storing collecting location data in ICIS
Interpreted text data
Interpreted text data on collecting location are stored in the table LOCATION, resolved into components with different LTYPE as:
| Component | LTYPE |
|---|---|
| Validated country | 405 |
| Validated top level subnational administrative region | 406 |
| Validated second level subnational administrative region | 407 |
| Validated third level subnational administrative region | 408 |
| Validated nearest named populated place | 413 |
| Free text locality description | 409 |
Thus, up to 6 different location records may be associated with a single collecting site.
If there is no free text locality description, but there are other data (e.g. coordinate data or other site descriptions) associated with the collecting site itself, there will be a LOCATION record with LTYPE=409 but no text value.
Collecting location data are associated with an accession through its GPID1, which points to a GID that represents the collected sample. This GID has GMETHN=69 (collected) to show that it represents a collected sample, and its GLOCN points to the smallest-scale location record associated with it. Fields in that location record point to the associated larger-scale locations (CNTRYID, SNL1ID, SNL2ID, SNL3ID, NNPID).
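The linkage described above can be pictured with a small in-memory model. The Python sketch below is not the ICIS schema: the class, the `name` attribute, the record IDs and the sample data are hypothetical, and only the LTYPE codes and the pointer field names (CNTRYID, SNL1ID, SNL2ID, SNL3ID, NNPID) come from the text. It shows how following GLOCN to the most specific record, and then that record's pointers, recovers the full interpreted description.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# LTYPE codes as listed in the table above.
LTYPE_COUNTRY, LTYPE_SNL1, LTYPE_SNL2, LTYPE_SNL3 = 405, 406, 407, 408
LTYPE_LOCALITY, LTYPE_NEAREST_PLACE = 409, 413


@dataclass
class Location:
    locid: int
    ltype: int
    name: str = ""                 # hypothetical attribute holding the text value
    cntryid: Optional[int] = None  # pointers to larger-scale records
    snl1id: Optional[int] = None
    snl2id: Optional[int] = None
    snl3id: Optional[int] = None
    nnpid: Optional[int] = None


def describe_site(glocn: int, locations: Dict[int, Location]) -> str:
    """Follow the pointers from the record referenced by GLOCN."""
    site = locations[glocn]
    parts = []
    for ref in (site.cntryid, site.snl1id, site.snl2id, site.snl3id, site.nnpid):
        if ref is not None:
            parts.append(locations[ref].name)
    if site.name:                  # an LTYPE=409 record may have no text value
        parts.append(site.name)
    return ", ".join(parts)


# Illustrative data only; the administrative levels are not verified.
locations = {
    1: Location(1, LTYPE_COUNTRY, "Philippines"),
    2: Location(2, LTYPE_SNL1, "Laguna", cntryid=1),
    3: Location(3, LTYPE_NEAREST_PLACE, "Los Baños", cntryid=1, snl1id=2),
    5: Location(5, LTYPE_LOCALITY, "On the shore of the lake 2 km E of Los Baños",
                cntryid=1, snl1id=2, nnpid=3),
}
print(describe_site(5, locations))
```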
Original text data
Original data, as provided by the user, are stored without spelling corrections or standardisation of any kind, concatenated into a single string in the format “Field1=value; field2=value” (for example “Province=Laguna;Municipality=Los Baños;description=3km E of Los Baños”).
This string is stored in a record in table LOCDES with DTYPE=303 and with LOCID equal to the GLOCN of the collected sample.
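A minimal Python sketch of the concatenation format follows, using the example string from the text. The function names are hypothetical, and the sketch assumes that field names and values never contain ";" and that field names never contain "=", since no escaping rule is described here.

```python
def encode_original(fields):
    """Join field/value pairs exactly as provided, without any correction."""
    return ";".join(f"{name}={value}" for name, value in fields.items())


def decode_original(text):
    """Split the stored string back into field/value pairs, in order."""
    fields = {}
    for part in text.split(";"):
        if part:
            name, _, value = part.partition("=")  # split on the first '=' only
            fields[name] = value
    return fields


original = {
    "Province": "Laguna",
    "Municipality": "Los Baños",
    "description": "3km E of Los Baños",
}
stored = encode_original(original)
print(stored)
```

Because nothing is normalised on the way in, decoding the stored string returns the provider's wording unchanged, which is what makes later verification of the interpretation possible.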
Coordinate data
Coordinate data (latitude, longitude, altitude) and related information are stored in the GEOREF table as follows:
- Type of source of coordinate data (GPS, digital gazetteer, map, collecting form, biogeomancer, collecting report, provider’s database with unspecified source)
- Reference to specific map, report, gazetteer etc used to determine coordinate data
- Display format of coordinate data as originally provided
- Uncertainty of coordinate data, as a radius in metres
- Date of coordinate data provided or estimated
- Person who provided the coordinate data

