Data Validation Tool 5.5.2
From ICISWiki
M.C. Habito, A.R. Llorca, R. Sackville Hamilton
ICIS Technical Documentation 5.5 > Data Validation Tool 5.5 > Data Validation Tool 5.5.2
Download
Latest version is here.
Introduction
The ICIS Data Validation Tool is an application that searches ICIS for data errors that might render it meaningless. This is useful in making sure that published data are always of excellent quality.
Just choose the tests you want to execute and click on the "Run" button. If you wish to run several or all tests, it is recommended to leave this tool running overnight.
Genealogy Management System
Checks the GMS database (applicable to all ICIS implementations) --> see screenshot above.
Incorrect parent references [2 checks]
| Summary | Error Message | Notes | SQL statement |
|---|---|---|---|
| Query GIDs in the GMS.GERMPLSM table that have unknown group source, and with non-generative group source. | DataError-0001: Unknown group source | CHK01E |
SELECT g.gid, g.gnpgs, g.methn, g.gpid1, g.gpid2, u.uname as icis_user FROM germplsm AS g INNER JOIN users as u ON g.germuid = u.userid Where g.gnpgs = -1 And g.gpid1 = 0 and g.grplce = 0 And g.gpid2 <> 0 |
| DataError-0002: Germplasm with non-generative group source | |
SELECT g1.gid, m1.mname, g1.gpid2, g1.gpid1, m2.mname as method_of_gpid1, g2.gnpgs as gnpgs_of_gpid1, g2.gpid1 as gpid1_of_gpid1, u.uname as icis_user FROM ((germplsm as g1 INNER JOIN names as n1 ON g1.gid = n1.gid) inner join methods as m1 on g1.methn = m1.mid) INNER JOIN ( (germplsm as g2 INNER JOIN names as n2 ON g2.gid = n2.gid) inner join methods as m2 on g2.methn = m2.mid) on g2.gid = g1.gpid1, users as u Where n1.nstat = 1 And g1.gnpgs = -1 And g1.gpid1 <> 0 And g1.grplce = 0 And g2.gnpgs = -1 And g2.gpid1 <> 0 and g1.germuid = u.userid |
Circular references [3 checks]
| Summary | Error Message | Notes | SQL statement |
|---|---|---|---|
| If germplasm A has germplasm B as one of its parent and if germplasm B has germplasm A as one of its parents, then we have a circular reference situation. This option also checks for two and three-level circularity. | DataError-0003: 1st level circular reference | A references B, and B references A [CHK01C,CHK01D] |
GPID1: SELECT g.gid,
g.gpid1,
u.uname as icis_user
FROM germplsm as g, users as u WHERE g.gid = g.gpid1
and g.germuid = u.userid
and g.grplce = 0
GPID2: SELECT g.gid,
g.gpid2,
u.uname as icis_user
FROM germplsm as g, users as u
WHERE g.gid = g.gpid2
and g.germuid = u.userid
and g.grplce = 0
|
| DataError-0004: 2nd level circular reference | A references B, and B references C, and C references A| |
GPID1: SELECT g1.gid,
g1.gpid1,
g2.gpid1 as p1_gpid1,
u.uname as icis_user
FROM germplsm as g1 INNER JOIN germplsm AS g2
ON (g1.gpid1 = g2.gid) AND (g1.gid = g2.gpid1), users as u
Where g1.gnpgs = -1
and g1.germuid = u.userid
And g1.grplce = 0
GPID2: SELECT g1.gid,
g1.gpid2,
g2.gpid2 as p2_gpid2,
u.uname as icis_user
FROM germplsm as g1 INNER JOIN germplsm AS g2
ON (g1.gpid2 = g2.gid) AND (g1.gid = g2.gpid2), users as u
Where g1.gnpgs = -1
and g1.germuid = u.userid
And g1.grplce = 0
| |
| DataError-0005: 3rd level circular reference | A references B, and B references C, and C references D, and D references A| |
GPID1: SELECT g1.gid,
g1.gpid1,
g2.gpid1 as p1_gpid1,
g3.gpid1 as p1_p1_gpid1,
u.uname as icis_user
FROM (germplsm as g1 INNER JOIN germplsm AS g2 ON g1.gpid1 = g2.gid)
INNER JOIN germplsm AS g3 ON (g3.gpid1 = g1.gid)
AND (g2.gpid1 = g3.gid), users as u
Where g1.gnpgs = -1
and g1.germuid = u.userid
And g1.grplce = 0
GPID2: SELECT g1.gid,
g1.gpid2,
g2.gpid2 as p2_gpid2,
g3.gpid2 as p2_p2_gpid2,
u.uname as icis_user
FROM (germplsm as g1 INNER JOIN germplsm AS g2 ON g1.gpid2 = g2.gid)
INNER JOIN germplsm AS g3 ON (g3.gpid2 = g1.gid) AND (g2.gpid2 = g3.gid),
users as u
Where g1.gnpgs = -1
and g1.germuid = u.userid
And g1.grplce = 0
|
Deleted parent references [3 checks]
| Summary | Error Message | Notes | SQL statement |
|---|---|---|---|
| A GID should not reference a deleted GPID1, GPID2 or MGID. | DataError-0007A: Germplasm with deleted parent references (GPID1) |
SELECT g.gid as gid,
g.gpid1 as deleted_gpid1,
u.uname as icis_user
FROM germplsm AS g INNER JOIN germplsm as p1 ON g.gpid1 = p1.gid,
users as u
Where g.grplce = 0
and g.germuid = u.userid
and p1.grplce = p1.gid
| |
| DataError-0007B: Germplasm with deleted parent references (GPID2) |
SELECT g.gid as gid,
g.gpid2 as deleted_gpid2,
u.uname as icis_user
FROM germplsm AS g INNER JOIN germplsm as p2 ON g.gpid2 = p2.gid,
users as u
Where g.grplce = 0
and g.germuid = u.userid
and p2.grplce = p2.gid
| ||
| DataError-0007C: Germplasm with deleted parent references (MGID) |
SELECT g.gid as gid,
g.mgid as deleted_mgid,
u.uname as icis_user
FROM germplsm AS g INNER JOIN germplsm as m ON g.mgid = m.gid,
users as u
Where g.grplce = 0
and g.germuid and u.userid
and m.grplce = m.gid
|
Replaced parent references [3 checks]
| Summary | Error Message | Notes | SQL statement |
|---|---|---|---|
| Let us say germplasm A is replaced with germplasm B as shown in the GERMPLSM.GRPLCE database column. All germplasm that references germplasm A should be corrected to germplasm B. | DataError-0015A: Germplasm with replaced parent references (GPID1) |
select g.gid as gid,
g.gpid1 as original_gpid1,
p1.grplce as change_gpid1_to,
u.uname as icis_user
from germplsm as g INNER JOIN germplsm as p1 on g.gpid1 = p1.gid,
users as u
where g.grplce = 0
and p1.grplce <> 0
and p1.grplce <> p1.gid
and g.germuid = u.userid
| |
| DataError-0015B: Germplasm with replaced parent references (GPID2) |
select g.gid as gid,
g.gpid2 as original_gpid2,
p2.grplce as change_gpid2_to,
u.uname as icis_user
FROM germplsm as g INNER JOIN germplsm as p2 on g.gpid2 = p2.gid,
users as u
WHERE g.grplce = 0
and p2.grplce <> 0
and p2.grplce <> p2.gid
and g.germuid = u.userid
| ||
| DataError-0015C: Germplasm with replaced parent references (MGID) |
select g.gid as gid,
g.mgid as original_mgid,
m.grplce as change_mgid_to,
u.uname as icis_user
from germplsm as g inner join germplsm as m on g.mgid = m.gid,
users as u
where g.grplce = 0
and m.grplce <> 0
and m.grplce <> m.gid
and g.germuid = u.userid
|
Incorrect F2 Lines
Cropforge Feature Request #2377 (syates)
| Summary | Error Message | Notes | SQL statement |
|---|---|---|---|
| Germplasm that have GPID1=GPID2 where GPID1<>0 and GPID2<>0 when METHN=31. | DataError-0042: Incorrect F2 Lines |
SELECT g.gid,
g.methn,
g.gpid1,
g.gpid2,
u.uname as icis_user
FROM germplsm as g, users as u
WHERE g.methn = 31
AND g.gpid1 = g.gpid2
and g.gpid1 <> 0
and g.gpid2 <> 0
and g.grplce = 0
and g.germuid = u.userid
|
No names [1 check]
| Summary | Error Message | Notes | SQL statement |
|---|---|---|---|
| NAMES table: Every germplasm must have at least one name record. | DataError-0008: Germplasm with no NAMES records |
select a.gid as germplsm_gid, a.methn as germplsm_method, a.glocn as germplsm_location, u.uname as icis_user from germplsm as a, users as u where a.grplce = 0 and a.germuid = u.userid and not exists (select * from names n where n.gid = a.gid) |
Duplicate Names [2 checks]
CropForge Feature Request # 710 (rhamilton,mhabito)
| Summary | Error Message | Notes | SQL statement |
|---|---|---|---|
| NAMES table: GID-NVAL-NTYPE (Germplasm ID - Name value - Name type) should be unique in any implementation of ICIS. | DataError-0032: Germplasm with duplicate names |
Added by different ICIS users: Select a.gid,
a.nval,
a.ntype,
a.nid,
u.uname as icis_user
from names a, users as u
where exists
( select * from names b
where a.gid = b.gid
and a.ntype=b.ntype
and a.nval = b.nval
and a.nuid <> b.nuid
and a.nstat <> 9
and b.nstat <> 9)
and a.nuid = u.userid
order by g.gid, a.nval, u.uname
Added by the same ICIS user: Select a.gid,
a.nval,
a.ntype,
a.nid,
u.uname as icis_user
from names a, users as u
where exists
( select * from names b
where a.gid = b.gid
and a.ntype=b.ntype
and a.nval = b.nval
and a.nuid = b.nuid
and a.nid <> b.nid
and a.nstat <> 9
and b.nstat <> 9)
and a.nuid = u.userid
order by a.gid, a.nval, u.uname
| |
| Possible duplicates. Same GID-NTYPE combination, but NVALs may only have slight differences in spelling | DataError-0010: Germplasm with possible duplicate names (NTYPE-NSTAT combination occurring more than once) |
Select a.gid,
a.ntype,
a.nstat,
a.nval,
a.nid,
u.uname as icis_user
from names a, users as u
where exists
( select * from names b
where a.gid = b.gid
and a.ntype = b.ntype
and a.nstat = b.nstat
and a.nval <> b.nval
and a.nstat <> 9
and b.nstat <> 9)
and a.nuid = u.userid
order by a.gid,a.nval
|
Count preferred IDs [2 checks]
CropForge Feature Request # 160 (rhamilton) and/or Version 5.3 Discussion Article on ICISWiki
| Summary | Error Message | Notes | SQL statement |
|---|---|---|---|
|
No more than one name of a GID can have a "preferred ID" status (NAMES.NSTAT =8) | DataError-0011: Germplasm with no preferred ID (NSTAT=8) and no existing preferred English name. |
select a.gid as gid,
u.uname as icis_user
from germplsm a, users u
where not exists (select * from names b
where a.gid = b.gid
and b.nstat = 8)
and not exists (select * from names c
where a.gid = c.gid
and c.nstat = 1)
and a.germuid = u.userid
| |
| DataError-0020: Germplasm with multiple preferred IDs (NSTAT=8) |
select distinct a.gid as gid,
u.uname as icis_user
from names a, users u
where exists(select * from names b
where a.gid = b.gid
and a.nstat=b.nstat
and a.nstat = 8
and a.nid<>b.nid
and a.nstat<>9
and b.nstat<>9)
and a.nuid = u.userid
order by a.gid
|
Count preferred names [2 checks]
CropForge Feature Request # 160 (rhamilton) and/or Version 5.3 Discussion Article on ICISWiki
| Summary | Error Message | Notes | SQL statement |
|---|---|---|---|
|
No more than one name of a GID can have a "preferred name" status (NAMES.NSTAT =1) | DataError-0017: Germplasm with multiple preferred English names (NSTAT=1) |
select distinct a.gid as gid,
u.uname as icis_user
from names a, users u
where exists(select * from names b
where a.gid = b.gid
and a.nstat=b.nstat
and a.nstat = 1
and a.nid<>b.nid
and a.nstat<>9
and b.nstat<>9)
and a.nuid = u.userid
order by a.gid
| |
| DataError-0018: Germplasm with no preferred English name (NSTAT=1) and no existing preferred ID. |
select a.gid as gid,
u.uname as icis_user
from germplsm a, users u
where not exists (select * from names b
where a.gid = b.gid
and b.nstat = 1)
and not exists (select * from names c
where a.gid = c.gid
and c.nstat = 8)
and a.germuid = u.userid
|
Incorrect germplasm method [2 checks]
| Summary | Error Message | Notes | SQL statement |
|---|---|---|---|
| The method of genesis for the germplasm (METHN) should have a corresponding value for number of progenitors (GNPGS) | DataError-0006A: Incorrect germplasm method (Method = DER but GNPGS <> -1) | For a derivative process (Method = DER), GNPGS must be -1 [CHK01A] |
SELECT g.gid,
g.methn,
m.mtype,
g.gnpgs,
g.grplce,
u.uname as icis_user
FROM germplsm as g INNER JOIN methods as m ON g.methn = m.mid,
users as u
WHERE m.mtype='DER'
AND g.gnpgs<>-1
and g.germuid = u.userid
and g.grplce = 0
|
| DataError-0006B: Incorrect germplasm method (Method = GEN but GNPGS < 0) | For a generative process (Method = GEN), GNPGS must be greater than 0 [CHK01B] |
SELECT g.gid,
g.methn,
m.mtype,
g.gnpgs,
g.grplce,
u.uname
FROM germplsm as g INNER JOIN methods as m ON g.METHN = m.MID,
users as u
WHERE m.mtype='GEN'
and g.germuid = u.userid
AND g.gnpgs<0 and g.grplce = 0
|
Method - name type combinations [3 checks]
CropForge Feature Request # 160 (rhamilton) and/or Version 5.3 Discussion Article on ICISWiki
Only certain combinations of name type and germplasm creation method are acceptable. Namely:
| Summary | Error Message | Notes | SQL statement |
|---|---|---|---|
| Method type GEN: valid name types are: CRSNM, UNCRS, UNRES | DataError-0024: Invalid method-name type combination for method GEN (generative) |
SELECT g.gid,
m.mtype,
n.ntype,
u.fcode,
n.nval,
g.germuid,
n.nuid
FROM (germplsm AS g INNER JOIN methods AS m ON g.methn = m.mid),
(names as n INNER JOIN udflds as u on n.ntype = u.fldno)
WHERE m.mtype = 'GEN'
and g.grplce = 0
and g.gid = n.gid
and n.nstat <> 9
AND u.ftable = 'NAMES'
AND u.fcode NOT IN ('CRSNM','UNCRS','UNRES')
| |
| Method type DER: valid name types are: RELNM, DRVNM, CVNAM, CVABR, NTEST, LNAME, ADVNM, ACVNM, AABBR, OLDMUT1, OLDMUT2, ELITE, UNRES | DataError-0025: Invalid method-name type combination for method DER (derivative) |
SELECT g.gid,
m.mtype,
n.ntype,
u.fcode,
n.nval,
g.germuid,
n.nuid
FROM (germplsm AS g INNER JOIN methods AS m ON g.methn = m.mid),
(names as n INNER JOIN udflds as u on n.ntype = u.fldno)
WHERE m.mtype = 'DER'
and g.grplce = 0
and g.gid = n.gid
and n.nstat <> 9
AND u.ftable = 'NAMES'
AND u.FCODE NOT IN ('RELNM','DRVNM','CVNAM','CVABR','NTEST','LNAME',
'ADVNM','ACVNM','AABBR','OLDMUT1','OLDMUT2','ELITE','UNRES')
| |
| Method type MAN: valid name types are: ACCNO, RELNM, CVNAM, CVABR, COLNO, FACCN, ITEST, NTEST, LNAME, TACC, ADVNM, ACVNM, ELITE, GACC, DACCN, LCNAM, CIATGB | DataError-0026: Invalid method-name type combination for method MAN (maintenance) |
SELECT g.gid,
m.mtype,
n.ntype,
u.fcode,
n.nval,
g.germuid,
n.nuid
FROM (germplsm AS g INNER JOIN methods AS m ON g.methn = m.mid),
(names as n INNER JOIN udflds as u on n.ntype = u.fldno)
WHERE m.mtype = 'MAN'
and g.grplce = 0
and g.gid = n.gid
and n.nstat <> 9
AND u.ftable = 'NAMES'
AND u.FCODE NOT IN ('ACCNO','RELNM','CVNAM','CVABR','COLNO','FACCN',
'ITEST','NTEST','LNAME','TACC','ADVNM','ACVNM','ELITE','GACC','DACCN','LCNAM','CIATGB')
|
Germplasm location
| Summary | Error Message | Notes | SQL statement |
|---|---|---|---|
| For all germplasm with germplasm creation NOT method ID 62 (Import), germplasm location (GLOCN) of a GID should be the same as the GLOCN of its GPID2
CropForge Feature Request # 160 (rhamilton) and/or Version 5.3 Discussion Article on ICISWiki | DataError-0023B: Germplasm with GLOCN different from GLOCN of GPID2 (METHOD <> IMPORT) |
SELECT g.gid,
g.methn,
g.glocn,
p2.gid AS gpid2,
p2.glocn AS gpid2_glocn,
u.uname as icis_user
FROM germplsm AS g INNER JOIN germplsm AS p2 ON g.gpid2 = p2.gid,
users as u
WHERE g.grplce = 0
and g.methn <> 62
and g.germuid = u.userid
and g.glocn <> p2.glocn
|
DATES
- With option to specify precision
- 1. YYYY only
- 2. YYYYMM only
- 3. YYYYMMDD
Germplasm creation date (GDATE) [3 checks]
| Summary | Error Message | Notes | SQL statement |
|---|---|---|---|
| Check for existence/format/value of germplasm creation date (GDATE) | DataError-0034A: Germplasm with METHOD = IMPORT but with no germplasm creation date | Imported germplasm require a germplasm creation date |
SELECT gid,
gdate,
methn,
grplce,
u.uname as icis_user
FROM germplsm, users as u
where methn = 62
and gdate = 0
and germuid = u.userid
and grplce = 0
|
| DataError-0034B: Germplasm with incorrect creation date (GDATE) format | If GDATE is not 0, there should be exactly 8 digits and the format should be YYYYMMDD |
SELECT gid,
gdate,
methn,
grplce,
u.uname as icis_user
FROM germplsm,
users as u
where gdate <> 0
and grplce = 0
and germuid = u.userid
and len(gdate) <> 8
| |
| DataError-0034C: Germplasm with creation date (GDATE) after current date |
SELECT gid,
gdate,
methn,
grplce,
u.uname as icis_user
FROM germplsm, users as u
where gdate <> 0
and grplce = 0
and len(gdate) = 8
and gdate > [YYYYMMDD]
and germuid = u.userid
|
GDATE of progenitors [2 checks]
CropForge Feature Request # 160 (rhamilton) and/or Check #1: Version 5.3 Discussion Article on ICISWiki
| Summary | Error Message | Notes | SQL statement |
|---|---|---|---|
| The GDate of a GID must not predate GDate of any of its progenitors. (Beware of missing links in the chain of dates! If the test checks only the GDATE of the progenitors GPID1, GPID2 and MGID, then it will not detect the following error: non-zero GPID1, GPID2 or MGID has GDATE=0 but their GPID1, GPID2 or MGID are younger than the target GID. Therefore, if a GPID1, GPID2 or MGID has GDate=0, iterate to check their GPID1, GPID2 and MGID) | DataError-0012: Germplasm with germplasm creation date (GDATE) earlier than GDATE of parent (GPID1, GPID2 or both) |
select c1.gid,
c1.gdate,
p1.gid as gpid1,
p1.gdate as gpid1_gdate,
p2.gid as gpid2,
p2.gdate as gpid2_gdate,
u.uname as icis_user
from (germplsm as c1 inner join germplsm as p1 on c1.gpid1 = p1.gid),
(germplsm as c2 inner join germplsm as p2 on c2.gpid2 = p2.gid),
users as u
where c1.grplce = 0
and c1.gid = c2.gid
and c1.grplce = 0
and c1.gdate <> 0
and c2.gdate <> 0
and c1.germuid = u.userid
and ( (p1.gdate > c1.gdate
and mid(cstr(p1.gdate),1,) <> mid(cstr(c1.gdate),1,))
or (p2.gdate > c2.gdate and mid(cstr(p2.gdate),1,) <> mid(cstr(c2.gdate),1,)))
| |
| DataError-0013: Germplasm with germplasm creation date (GDATE) earlier than GDATE of MGID |
SELECT c.gid,
c.gdate,
mt.mtype,
mg.gid as mgid,
mg.gdate as mgid_gdate,
u.uname as icis_user
FROM germplsm AS c INNER JOIN germplsm AS mg ON c.mgid = mg.gid,
methods as mt,
users as u
where c.grplce = 0
and c.methn = mt.mid
and mt.mtype = 'MAN'
and c.gdate <> 0
and mg.gdate > c.gdate
and c.germuid = u.userid
and mid(cstr(mg.gdate),1,) <> mid(cstr(c.gdate),1,)
|
Name date [2 checks]
| Summary | Error Message | Notes | SQL statement |
|---|---|---|---|
| Format of name date (NDATE) should be YYYYMMDD | DataError-0035A: Germplasm with incorrect name date (NDATE) format |
SELECT n.nid,
n.gid,
n.ndate,
u.uname as icis_user
FROM names as n INNER JOIN germplsm as g ON n.gid = g.gid,
users as u
where n.ndate <> 0
and n.nstat <> 9
and g.grplce = 0
and len(n.ndate) <> 8
and n.nuid = u.userid
| |
| NDATE cannot be later than the current date | DataError-0035B: Germplasm with NAME date (NDATE) after current date |
SELECT n.nid,
n.gid,
n.ndate,
u.uname as icis_user
FROM names as n INNER JOIN germplsm as g ON n.gid = g.gid, users as u
where n.ndate <> 0
and n.nstat <> 9
and g.grplce = 0
and len(gdate) = 8
and n.ndate > [YYYYMMDD]
and n.nuid = u.userid
|
NDATE vs GDATE
| Summary | Error Message | Notes | SQL statement |
|---|---|---|---|
| CropForge Feature Request #631 (jnorgaard)
Name Date (NAMES.NDATE) could also be checked with the Germplasm Date (GERMPLSM.GDATE) to make sure it is not older. SQL modified to check only non-inherited names. | DataError-0033: Germplasm with non-inherited name but name date (NDATE) earlier than germplasm creation date (GDATE) |
select n.nid,
n.gid,
n.nval,
n.nstat,
n.ndate,
g.gdate,
n.nlocn,
u.uname as icis_user
from names as n inner join germplsm as g on n.gid = g.gid,
users as u
where n.ndate <> 0
and g.gdate <> 0
and g.grplce = 0
and n.nuid = u.userid
and n.ndate < g.gdate
and not exists (select * from names b
where b.gid=g.gpid2
and n.nval = b.nval)
and mid(cstr(n.ndate),1,) <> mid(cstr(g.gdate),1,)
|
Data Management System
Checks the DMS database (applicable to all ICIS implementations).
Check GIDs in DMS that have been replaced in the GMS
CropForge Feature Request # 935 (aportugal)
OINDEX table (Factor ID or Level No is zero)
(Requested by: tmetz, aportugal)
Trait-Scale-Method relationship [4 checks]
CropForge Feature Request # 610 (aportugal)
Check the consistency of the trait, scale and method in the FACTOR and VARIATE tables. The Scale and Method specified in those tables should be related to the the Trait.
This can be determined when the traitid in the factor/variate is the same as the traitid in the SCALE and TMETHOD tables for the entered scaleid/tmethid.
DataError-0031A: SCALE.TRAITID is not equal to FACTOR.TRAITID (given the same SCALEID) DataError-0031B: TMETHOD.TRAITID is not equal to FACTOR.TRAITID (given the same TMETHID) DataError-0031C: SCALE.TRAITID is not equal to VARIATE.TRAITID (given the same SCALEID) DataError-0031D: TMETHOD.TRAITID is not equal to VARIATE.TRAITID (given the same TMETHID)
Duplicate records in LEVEL_N, LEVEL_C tables
(Requested by: tmetz, aportugal)
Duplicate records in DATA_N, DATA_C tables
(Requested by: tmetz, aportugal)
Level inconsistency in labels of a factor
(Requested by: tmetz, aportugal)
Implementation-specific
Checks for Rice (may or may not be applicable to other crops).
Name type occurrence [3 checks]
CropForge Feature Request # 160 (rmahilton) and/or Version 5.3 Discussion Article on ICISWiki
| Summary | Error Message | Notes | SQL statement |
|---|---|---|---|
| Some name types must not occur more than once for a single GID. These are ACCNO, CRSNM, UNCRS, COLNO, ITEST, GACC, CIATGB, RELNM, DRVNM, CVNAM | DataError-0027: Germplasm with certain name types occurring more than once |
SELECT g.gid,
n.ntype,
COUNT(*) as number_of_names_with_same_ntype
FROM (germplsm as g INNER JOIN names as n ON g.gid = n.gid)
WHERE g.grplce = 0
and n.nstat <> 9
AND n.ntype IN (1,2,3,9,11,21,1019)
GROUP BY g.gid, n.ntype
HAVING COUNT(n.nid) > 1
| |
| Cross-names (CRSNM) and Line-names (LNAME) cannot occur together | DataError-0028: Germplasm with both CRSNM and LNAME name types |
select n1.gid,
n1.nid as nid1,
n1.ntype as ntype1,
n1.nval as nval1,
n1.nuid as nuid1,
n2.nid as nid2,
n2.ntype as ntype2,
n2.nval as nval2,
n2.nuid as nuid2
from (names as n1 INNER JOIN names as n2 on n1.gid = n2.gid),
germplsm as g
where n1.ntype = 2
and n1.nstat <> 9
and n2.ntype = 13
and n2.nstat <> 9
and n1.gid = g.gid
and g.grplce = 0
| |
| Release names (RELNM) and Collector's Numbers (COLNO) cannot occur together | DataError-0029: Germplasm with both RELNM and COLNO name types |
select n1.gid,
n1.nid as nid1,
n1.ntype as ntype1,
n1.nval as nval1,
n1.nuid as nuid,
n2.nid as nid2,
n2.ntype as ntype2,
n2.nval as nval2,
n2.nuid as nuid2
from (names as n1 INNER JOIN names as n2 on n1.gid = n2.gid),
germplsm as g
where n1.ntype = 9
and n1.nstat <> 9
and n2.ntype = 4
and n2.nstat <> 9
and n1.gid = g.gid
and g.grplce = 0
|
Name sharing [4 checks]
CropForge Feature Request # 160 (rhamilton) and/or Version 5.3 Discussion Article on ICISWiki
| Summary | Error Message | Notes | SQL statement |
|---|---|---|---|
| Germplasms cannot share the same name (NVAL) if the NTYPE is ACCNO | DataError-0030A: Multiple germplasms sharing the same name (ACCNO) |
SELECT n1.nval, n1.ntype, g1.grplce, Count(n1.GID) AS Number_of_germplasm_sharing_this_name FROM names AS n1 INNER JOIN germplsm AS g1 ON n1.GID = g1.GID WHERE n1.ntype=1 AND g1.grplce=0 GROUP BY n1.nval, n1.ntype, g1.grplce HAVING count(n1.gid) > 1 | |
| Germplasms cannot share the same name (NVAL) if the NTYPE is ITEST | DataError-0030B: Multiple germplasms sharing the same name (ITEST) |
SELECT n1.nval, n1.ntype, g1.grplce, Count(n1.GID) AS Number_of_germplasm_sharing_this_name FROM names AS n1 INNER JOIN germplsm AS g1 ON n1.GID = g1.GID WHERE n1.ntype=11 AND g1.grplce=0 GROUP BY n1.nval, n1.ntype, g1.grplce HAVING count(n1.gid) > 1 | |
| Germplasms cannot share the same name (NVAL) if the NTYPE is GACC | DataError-0030C: Multiple germplasms sharing the same name (GACC) |
SELECT n1.nval, n1.ntype, g1.grplce, Count(n1.GID) AS Number_of_germplasm_sharing_this_name FROM names AS n1 INNER JOIN germplsm AS g1 ON n1.GID = g1.GID WHERE n1.ntype=21 AND g1.grplce=0 GROUP BY n1.nval, n1.ntype, g1.grplce HAVING count(n1.gid) > 1 | |
| A Release Name (RELNM) cannot occur more then once for a GID-GLOCN-COUNTRY combination. Two GIDs can share the same RELNM only if their GLOCNs are in different countries | DataError-0030D: Germplasm sharing a RELNM (release name) with another germplasm in the SAME country |
SELECT n1.gid,
n1.nid,
n1.nval,
g1.germuid,
n2.gid AS gid_2,
n2.nid AS gid2_nid,
g2.germuid as gid2_germuid
FROM (names as n1 INNER JOIN germplsm as g1 ON n1.gid = g1.gid),
(names as n2 INNER JOIN germplsm as g2 ON n2.gid = g2.gid),
location as l
WHERE n1.ntype = 4
and n1.nstat <> 9
and n2.ntype = 4
and n2.nstat <> 9
and l.ltype = 405
and n1.nval = n2.nval
and g1.glocn = g2.glocn
and n1.gid <> n2.gid
and g2.grplce <> g1.gid
and g1.glocn = l.locid
|
Name type GACC (Genebank accession) / IRGC accession [5 checks]
CropForge Feature Request # 160 (rhamilton) and/or Version 5.3 Discussion Article on ICISWiki
All names of type GACC must have...
| Summary | Error Message | Notes | SQL statement |
|---|---|---|---|
| ...a METHN with MTYPE = MAN | DataError-0036A: Genebank accessions (name type GACC) with MTYPE <> Management (MAN) | |
SELECT g.gid as germplsm_gid,
g.methn as germplsm_methn,
m.mtype as method_mtype,
n.nval as names_nval,
u.uname as icis_user
FROM names as n,
(germplsm AS g INNER JOIN methods AS m ON g.methn = m.mid),
users as u
WHERE n.ntype = 21
and n.nstat <> 9
and g.gid = n.gid
and g.grplce = 0
and m.mtype <> 'MAN'
and g.germuid = u.userid
|
| ...MGID <> 0 | DataError-0036B: Genebank accessions (name type GACC) with MGID = 0 | MGID must not be zero for genebank accessions |
SELECT g.gid as germplsm_gid, g.methn as germplsm_methn, g.mgid, m.mtype as method_mtype, n.nval as names_nval, u.uname as icis_user FROM names as n, (germplsm AS g INNER JOIN methods AS m ON g.methn = m.mid), users as u WHERE n.ntype = 21 and n.nstat <> 9 and g.gid = n.gid and g.grplce = 0 and g.mgid = 0 and g.germuid = u.userid |
| ...GPID1=MGID if its MGID has GPID1=0 | DataError-0036C: Genebank accessions (name type GACC) with GPID1 <> MGID. | If a germplasm has an MGID with GPID1=0, then a GID's GPID1 and MGID must be equal |
SELECT g.gid,
n.nval,
g.gpid1,
g.mgid,
m.gpid1 as gpid1_of_mgid,
u.uname as icis_user
FROM names as n,
(germplsm AS g INNER JOIN germplsm as m ON g.mgid = m.gid),
users as u
Where n.ntype = 21
and n.nstat <> 9
and n.gid = g.gid
and g.gpid1 <> g.mgid
and g.grplce = 0
and m.gpid1 = 0
and g.germuid = u.userid
|
| ...GPID1=[GPID1 of MGID] if [GPID1 of MGID] is not 0 | DataError-0036D: Genebank accessions (name type GACC) with GPID1 <> MGID'S GPID1 |
If a germplasm has an MGID with GPID1 <> 0, then a germplasm's GPID1 and the MGID's GPID1 must be equal |
SELECT g.gid,
n.nval,
g.gpid1,
g.mgid,
m.gpid1 as gpid1_of_mgid,
u.uname as icis_user
FROM names as n,
(germplsm AS g INNER JOIN germplsm as m ON g.mgid = m.gid),
users as u
Where n.ntype = 21 and n.nstat <> 9
and n.gid = g.gid
and m.gpid1 <> 0
and g.grplce = 0
and g.gpid1 <> m.gpid1
and g.germuid = u.userid
|
| IRGC accession must not have a cross name | DataError-0036E: International Rice Genebank Collection (IRGC) accession with Cross Name (NTYPE = 2) | |
SELECT g.gid, n.nval, n.ntype, u.uname as icis_user FROM names as n INNER JOIN germplsm AS g ON n.gid = g.gid, users as u Where n.ntype = 2 and n.nstat <> 9 and g.grplce = 0 and LEFT(n.nval,4) = 'IRGC' and g.germuid = u.userid |
Name inheritance from GPID2 (NSTAT, NDATE, NLOCN) [3 checks]
CropForge Feature Request # 160 (rhamilton) and/or Check #2: Version 5.3 Discussion Article on ICISWiki
| Summary | Error Message | Notes | SQL statement |
|---|---|---|---|
| If a GID inherits a name from its source (GPID2) then that name record must also inherit the name date (NDATE), name location (NLOCN), and preferred name status (NSTAT=1) | DataError-0014: MAN germplasm with inherited name from GPID2 but NSTAT not inherited |
select n.nid,
n.gid,
n.nval,
n.nstat,
g.gpid2 as gpid2_gid,
n2.nstat as gpid2_nstat,
u.uname as icis_user
from names AS n INNER JOIN germplsm g on n.gid = g.gid,
names as n2,
methods as m,
users as u
where g.methn = m.mid
and m.mtype = 'MAN'
and n2.nval = n.nval
and g.grplce = 0
and n2.gid = g.gpid2
and n2.nstat <> n.nstat
and n2.nstat = 1
and n.nstat <> 9
and g.germuid = u.userid
| |
| DataError-0015: Germplasm with inherited name from GPID2 but NDATE not inherited |
select n.nid,
n.gid,
n.nval,
n.ndate,
g.gpid2 as gpid2_gid,
n2.ndate as gpid2_ndate,
u.uname as icis_user
from names AS n INNER JOIN germplsm g on n.gid = g.gid,
names as n2,
users as u
where n2.nval = n.nval
and g.grplce = 0
and n2.gid = g.gpid2
and n2.ndate <> n.ndate
and n2.ndate <> 0
and n.nstat <> 9
AND g.germuid = u.userid
| ||
| DataError-0016: Germplasm with inherited name from GPID2 but NLOCN not inherited |
select n.nid,
n.gid,
n.nval,
n.nlocn,
g.gpid2 as gpid2_gid,
n2.nlocn as gpid2_nlocn,
u.uname as icis_user
from names AS n INNER JOIN germplsm g on n.gid = g.gid,
names as n2,
methods as m,
users as u
where g.methn = m.mid
and m.mtype = 'MAN'
and n2.nval = n.nval
and g.grplce = 0
and n2.gid = g.gpid2
and n2.nlocn <> n.nlocn
and n.nstat <> 9
and g.germuid = u.userid
|
IRTP name format
(Requested by: aportugal)
If NSTAT=1, check if everything that follows "IRTP" is numeric.
IRGC name format
(Requested by: aportugal)
If NSTAT=1, check if everything that follows "IRGC" is numeric.
Preferred name eligibility (name types)
CropForge Feature Request # 160 (rhamilton) and/or Version 5.3 Discussion Article on ICISWiki
| Summary | Error Message | Notes | SQL statement |
|---|---|---|---|
| Names eligible to be the preferred name are: CRSNM, RELNM, DRVNM, CVNAM, ELITE | DataError-0019: Germplasm with invalid name type as preferred name | Only checks germplasm with more than one NAMES record. |
SELECT n.gid,
n.nstat,
n.ntype,
n.nval,
u.uname as icis_user
FROM names as n INNER JOIN germplsm as g ON n.gid = g.gid,
(SELECT gid
from names
group by gid
having count(nid) > 1) as x,
users as u
WHERE n.nstat = 1 and n.ntype NOT IN (2,4,5,6,20)
and g.grplce = 0
and n.gid = x.gid
and n.nuid = u.userid
|
Preferred ID eligibility (name types) [2 checks]
CropForge Feature Request # 160 (rhamilton) and/or Version 5.3 Discussion Article on ICISWiki
| Summary | Error Message | Notes | SQL statement |
|---|---|---|---|
| The following names types are eligible to be preferred ID: DRVNM, COLNO, ACCNO (if present), GACC (if present), ITEST (if present), CIATGB (if present) | DataError-0021: Germplasm with invalid name type as preferred ID | Only checks germplasm with more than one NAMES record |
SELECT n.gid,
n.nstat,
n.ntype,
n.nval,
u.uname as icis_user
FROM (names as n INNER JOIN germplsm as g on n.nid = g.gid),
(SELECT gid
from names
group by gid
having count(nid) > 1) as x,
users as u
WHERE n.nstat = 8
and g.grplce = 0
and n.ntype NOT IN (1,21,11,1019,5,9)
and n.gid = x.gid
and n.nuid = u.userid
|
| The preferred ID must be unique for the given name type – that is, two accessions must not share the same name of the same type if one or both is a preferred ID. | DataError-0022: Germplasm with preferred ID not unique for the given name type |
select n1.gid,
n1.nid,
n1.nstat,
n1.nval,
n2.gid as gid_2,
n2.nid as gid2_nid,
n2.nstat as gid2_nstat,
n2.nval as gid2_nval
from names as n1,
names as n2
where n1.gid <> n2.gid
and (n1.nstat = 8 or n2.nstat = 8)
and n1.nstat <> 9
and n2.nstat <> 9
and n1.nval = n2.nval
and n1.ntype = n2.ntype
|
Database foreign key references by tables
The ICIS database has no foreign key constraints enabled so we need to check if the foreign key definition is violated. Check for foreign key references used by tables:
Miscellaneous Features
- "About" form containing more information, plus the GNU General Public License. CRIL and IRRI logos also included.
- Application icon
DataError Index Listing
- DataError-0001: Unknown group source
- DataError-0002: Germplasm with non-generative group source
- DataError-0003: 1st level circular reference
- DataError-0004: 2nd level circular reference
- DataError-0005: 3rd level circular reference
- DataError-0006A: Incorrect germplasm method (Method = DER but GNPGS <> -1)
- DataError-0006B: Incorrect germplasm method (Method = GEN but GNPGS < 0)
- DataError-0007A: Germplasm with deleted parent references (GPID1)
- DataError-0007B: Germplasm with deleted parent references (GPID2)
- DataError-0007C: Germplasm with deleted parent references (MGID)
- DataError-0008: Germplasm with no NAMES records
- DataError-0009: Germplasm with duplicate names (GID-NVAL-NTYPE not unique)
- DataError-0010: Germplasm with possible duplicate names (NTYPE-NSTAT combination occurring more than once)
- DataError-0011: Germplasm with no preferred ID (NSTAT=8) and no existing preferred English name
- DataError-0012: Germplasm with germplasm creation date (GDATE) earlier than GDATE of parent (GPID1, GPID2 or both)
- DataError-0013: Germplasm with germplasm creation date (GDATE) earlier than GDATE of MGID
- DataError-0014: MAN germplasm with inherited name from GPID2 but NSTAT not inherited
- DataError-0015: Germplasm with inherited name from GPID2 but NDATE not inherited
- DataError-0016: Germplasm with inherited name from GPID2 but NLOCN not inherited
- DataError-0017: Germplasm with multiple preferred English names (NSTAT=1)
- DataError-0018: Germplasm with no preferred English name (NSTAT=1) and no existing preferred ID
- DataError-0019: Germplasm with invalid name type as preferred name
- DataError-0020: Germplasm with multiple preferred IDs (NSTAT=8)
- DataError-0021: Germplasm with invalid name type as preferred ID
- DataError-0022: Germplasm with preferred ID not unique for the given name type
- DataError-0023: Germplasm with GLOCN different from GLOCN of GPID2 (METHOD <> IMPORT)
- DataError-0024: Invalid method-name type combination for method GEN (generative)
- DataError-0025: Invalid method-name type combination for method DER (derivative)
- DataError-0026: Invalid method-name type combination for method MAN (maintenance)
- DataError-0027: Germplasm with certain name types occurring more than once
- DataError-0028: Germplasm with both CRSNM and LNAME name types
- DataError-0029: Germplasm with both RELNM and COLNO name types
- DataError-0030A: Multiple germplasms sharing the same name (ACCNO)
- DataError-0030B: Multiple germplasms sharing the same name (ITEST)
- DataError-0030C: Multiple germplasms sharing the same name (GACC)
- DataError-0030D: Germplasm sharing a RELNM (release name) with another germplasm in the SAME country
- DataError-0031A: SCALE.TRAITID is not equal to FACTOR.TRAITID (given the same SCALEID)
- DataError-0031B: TMETHOD.TRAITID is not equal to FACTOR.TRAITID (given the same TMETHID)
- DataError-0031C: SCALE.TRAITID is not equal to VARIATE.TRAITID (given the same SCALEID)
- DataError-0031D: TMETHOD.TRAITID is not equal to VARIATE.TRAITID (given the same TMETHID)
- DataError-0032A: Level inconsistency in labels of factors in LEVEL_N table
- DataError-0032B: Level inconsistency in labels of factors in LEVEL_C table
- DataError-0034A: Germplasm with METHOD = IMPORT but with no germplasm creation date
- DataError-0034B: Germplasm with incorrect creation date (GDATE) format
- DataError-0034C: Germplasm with creation date (GDATE) after current date
- DataError-0035A: Germplasm with incorrect name date (NDATE) format
- DataError-0035B: Germplasm with NAME date (NDATE) after current date
- DataError-0036A: Genebank accessions (name type GACC) with MTYPE <> Maintenance (MAN)
- DataError-0036B: Genebank accessions (name type GACC) with MGID = 0
- DataError-0036C: Genebank accessions (name type GACC) with GPID1 <> MGID.
- DataError-0036D: Genebank accessions (name type GACC) with GPID1 <> MGID'S GPID1
- DataError-0036E: International Rice Genebank Collection (IRGC) accession with Cross Name (NTYPE = 2)
- DataError-0037: GID in DMS was replaced in GMS
- DataError-0038: Factorid ID or Level No is zero in OINDEX table
- DataError-0039-IRTP/IRGC: IRTP/IRGC accession with invalid characters in names
- DataError-0040A: Duplicate OUNITID-VARIATID IN DATA_C (same value)
- DataError-0040B: Duplicate OUNITID-VARIATID IN DATA_C (different value)
- DataError-0041A: Duplicate LABELID-LEVELNO IN LEVEL_C
- DataError-0041B: Duplicate LABELID-LEVELNO IN LEVEL_N
- DataError-0042: Incorrect F2 Lines (GPID1=GPID2)
What's New in Version 5.5.1
- Improved user interface (display of results):
- 1. Listview that contains the file name, number of records returned, source database. If listview is double-clicked, the specified file will be opened.
- 2. Separate memobox for displaying status messages, SQLs.
- Added option to check local database also.
- New checks for germplasm creation date (GDATE)
- (If germplasm is imported, GDATE should not be blank)
- (Incorrect GDATE format)
- (GDATE later than current date)
- New checks for GMS.NAMES table
- (Name date (NDATE) earlier than germplasm creation date (GDATE)) CropForge Feature Request #631
- (Incorrect NDATE format)
- (NDATE later than current date)
- No NAMES record
- Duplicate names
- New checks for Genebank accession (NTYPE=GACC)
- (MTYPE <> Management (MAN))
- (MGID = 0)
- (GPID1 <> MGID: If a germplasm has an MGID with GPID1=0, then a GID's GPID1 and MGID must be equal )
- (GPID1 <> MGID'S GPID1: If a germplasm has an MGID with GPID1 not 0, then GPID1 and the MGID's GPID1 must be equal )
- New check for DMS: Trait-Scale-Method relationship
- SCALE.TRAITID is not equal to FACTOR.TRAITID (given the same SCALEID)
- TMETHOD.TRAITID is not equal to FACTOR.TRAITID (given the same TMETHID)
- SCALE.TRAITID is not equal to VARIATE.TRAITID (given the same SCALEID)
- TMETHOD.TRAITID is not equal to VARIATE.TRAITID (given the same TMETHID)
- Option to specify/change INI file to use
- Changed error code messaging from "Error-xxxx" to "DataError-xxxx" (prefixed with "Data" to distinguish from error codes of Installation Diagnostic Tool)
- Separation of test status messages and result datasets
- Option to print result dataset(s) in a TEXT file or EXCEL file.
- Option to choose format of filename containing result datasets
- 1. Datetime stamp first or Dataerror code first?
- 2. Include description of DataError in filename?
- If "Print result dataset to MS Excel format" is enabled but no. of records > 65530 (maximum row capacity of Excel is 65536), result dataset will instead be output in a text file.
Data Validation Tool Changes.txt
// // Legend: [+] Addition/New feature // [-] Removal // [*] Modification/Bug Fix // ###################################################################### # # DATA VALIDATION TOOL V5.5.2.2 (May 07 2010) # ###################################################################### [+] Added new check for incorrect F2 lines (GPID1=GPID2 when METHN=31) ###################################################################### # # DATA VALIDATION TOOL V5.5.2.1 (Mar 26 2010) # ###################################################################### [*] Local checks enabled. ###################################################################### # # DATA VALIDATION TOOL V5.5.2.0 (Feb 2010) # ###################################################################### [*] Fixed Bug: Cropforge [#1519] Data Validation Tool: DataError-0037 - very slow query ###################################################################### # # DATA VALIDATION TOOL V5.5.1.3 # ###################################################################### [*] Corrected error in SQL (sort by username) for checking of possible duplicate names ###################################################################### # # DATA VALIDATION TOOL V5.5.1.2 # ###################################################################### [*] Corrected typo error in DataError-0007C SQL (Cropforge Bug Report #1438) [+] Added option to just print SQL statements or perform actual queries on the database [*] Refined SQLs for checking duplicate names ###################################################################### # # DATA VALIDATION TOOL V5.5.1.1 # ###################################################################### [+] added check for GLOCN of imported germplasms ###################################################################### # # DATA VALIDATION TOOL V5.5.0.11 / V5.5.1.0 OFFICIAL RELEASE # ###################################################################### [+] Added option to specify level of precision in checking of dates (CropForge feature request # 1066) [+] Added grouping of output files in folders according to database (Cropforge feature request # 1058) [+] Addded check for replaced GIDS in DMS (Cropforge feature request #935) [*] Modified SQL for NDATE vs GDATE check (Cropforge feature request #631) ###################################################################### # # DATA VALIDATION TOOL V5.5.0.10 # ###################################################################### [*] Bug fix: Erroneous SQL in check 0023B and 0037E (Cropforge bug # #983) ###################################################################### # # DATA VALIDATION TOOL V5.5.0.9 # ###################################################################### [+] CropForge Feature Request #844: Naming conventions for Data Validation Tool Excel files [+] CropForge feature request #610: Checking the consistency of the Trait, Scale and Method [+] CropForge feature request #710: Check for duplicate names [+] CropFroge feature request #779: Include USERID in reporting of errors [+] Added viewing pane for text/Excel files generated from test [*] Disabled "query with batch update" (temporary only) ###################################################################### # # DATA VALIDATION TOOL V5.5.0.8 # ###################################################################### [*] Modified SQL: checking of foreign key references -- added filter (exclude 0's) [*] Modified SQL: GDATE & NDATE after current date -- added filter (GDATE <> 0; length should be 8) [*] Modified SQL: checking for number of preferred IDs and preferred names [*] Bug fix: preferred ID-name type matching ###################################################################### # # DATA VALIDATION TOOL V5.5.0.7 # ###################################################################### [*] Bug fix: SQL for DataError-0006B (Invalid methods) [*] Modified SQL for checking of progenitor GDATEs, NDATE vs GDATE. ###################################################################### # # DATA VALIDATION TOOL V5.5.0.6 # ###################################################################### [*] Result datasets printed separately from test status messsages. [*] New radiobuttons: Option to print result dataset(s) in TEXT file or EXCEL file [+] Added filter to SQLs: with GERMPLSM.GRPLCE = 0 [*] Bug fix: SQL for DataError-0030 (GID-RELNM-COUNTRY check) ###################################################################### # # DATA VALIDATION TOOL V5.5.0.5 # ###################################################################### [+] New buttons: Select all, deselect all [+] Default printing of result dataset is now on a separate text (.TXT) file. In cases where "Output to Excel" is selected, but no. of records is greater than 65,530: dataset is printed on a text file instead.

