ICIS Data Warehouse Creation and Maintenance FAQs
From ICISWiki
What are the Data Warehouse tables needed by ICIS and when are they used?
For faster generation of Pedigree trees in web application:
- wh_der_children
for faster retrieval of children of a given germplasm and of all other entities in the derivative neighborhood
- wh_man_children
for faster retrieval of children of a given germplasm and of all other entities in the maintenance neighborhood
- wh_germplasm_details
for faster retrieval of germplasm info (names,locations,method,etc) of a given germplasm
See Pedigree Viewer for Developers
See Pedigree Tree Warehouse Tables Structure
For faster retrieval of dataset entries of a given study in web application
- wh_columns
- wh_data
- wh_datasets
See Data Warehouse Tables Structure For Single Study Dataset Retrieval
For faster cross-study queries (StudyMart web application)
- wh_data
- wh_countdatapoints (new as of Sept 2009)
- factors (table adapted from original CropFinder by J.C. Alarcon)
- fieldsetup (table adapted from original CropFinder by J.C. Alarcon)
- variates (new as of Sept 2009; same concept as factors table. Columns defined by top 175 variates with most number of datapoints in wh_countdatapoints table)
See StudyMart Data Warehouse Tables Structure
Creation: When are Data Warehouses tables generated?
AFTER conversion of any ICIS database from MS Access to MySQL
See MS Access to MySQL conversion of ICIS databases
Updating: When and how should the Data Warehouses be updated?
- (from manacleto Aug 19, 2009) Right now, there is no clear procedure. A script is needed to add only the new entries to the data warehouses. Right now, you will have to generate them all again everytime an update(probably new conversion task) is made on the ICIS mysql databases.
Implementation Details
Implementation Details: Data Warehouse tables creation
The script can be found in Cropforge project named “icisjavatools”. The script may be viewed on the browser, the link is:
http://cropforge.org/plugins/scmsvn/viewcvs.php/projects/ICISDatabaseTools/src/sql/mysql/warehouse_tables/?root=icisjavatools
Example of commands made to create the warehouse tables:
mysql> use iris_myisam;
mysql> source create_icis_dms_warehouse.sql;
mysql> call activate_icis_dms_warehousing('iris_myisam');
Implementation Details: How long does it take to generate data warehouse tables for ICIS databases?
(Castor) On Castor server with the following specs: Server OS: Linux Debian. Hardware specs: 8 processors; Intel(R) Xeon(R) CPU; E5405 @ 2.00GHz 24 GB Memory
(CRIL4) On CRIL4 server with the following specs: Server OS: Linux Debian. Hardware specs: Pentium D 2.8Ghz, 2GB Memory, 500GB HD
IRIS 7 hours 41 mins 54.28 secs [for wh_columns, wh_data, wh_datasets, wh_countdatapoints] (on CRIL4) --As of Oct 20, 2009
IWIS 14 hours 25 min 34.73 sec + approx 4 hrs for factors and fieldsetup (on Castor)--As of Aug 19, 2009
IMIS
Implementation Details: Testing of script for data warehouse tables creation
Implementation Details: How is an ICIS database prepared/updated?
A. Convert from MS Access to MySQL database (on local machine/PC)
http://cropwiki.irri.org/icis/index.php/MS_Access_to_MySQL_conversion_of_ICIS_databases
B. Create a mysqldump file of your local Mysql database
mysql> mysqldump -usomeusername -p --default-character-set=utf8 iris > iris_yyyymmdd.sql
This local database should NOT have any indices at this stage.
C. Transfer the mysqldump file to the "database preparation server"
For Castor server, the "preparation server" is CRIL4.
D. MySQL Database Setup (see below)
Implementation Details: MySQL Database Setup (on database preparation server):
- Notes:
- 1. Use MyISAM as the storage engine (specified in database table definitions: GMS_TABLES.sql, DMS_TABLES.sql, GEMS_TABLES.sql).
- 2. Use utf8 as the default character set for the Mysql database (also specified in database table definitions: GMS_TABLES.sql, DMS_TABLES.sql, GEMS_TABLES.sql).
- 3. Increase the value of max_connections Mysql system variable (in my.cnf) to greater than 100 (100 is the default).
The examples shown below are for using the MySQL command prompt via SSH.
1. Create empty databases
a. Login remotely (via SSH) to Mysql on the preparation server. Example:
someuser@cril4:~$ mysql -usomeusername -p
b. Create empty databases. Example:
mysql> CREATE DATABASE iris_myisam_yyyymmdd CHARACTER SET utf8 COLLATE utf8_general_ci; mysql> CREATE DATABASE imis_myisam_yyyymmdd CHARACTER SET utf8 COLLATE utf8_general_ci; mysql> CREATE DATABASE iwis3_myisam_yyyymmdd CHARACTER SET utf8 COLLATE utf8_general_ci;
where: iris_myisam_yyyymmdd is the database for RICE data, imis_myisam_yyyymmdd for MAIZE data, and iwis3_myisam_yyyymmdd for WHEAT data
2. Define table structure and populate ICIS Core tables with data
"Core" = included in the standard ICIS schema (used by standalone applications).
Rice data
mysql> use iris_myisam_yyyymmdd; mysql> source iris_myisam_yyyymmdd.sql;
Wheat data
mysql> use iwis3_myisam_yyyymmdd; mysql> source iwis3_myisam_yyyymmdd.sql;
Maize data
mysql> use imis_myisam_yyyymmdd; mysql> source imis_myisam_yyyymmdd.sql;
3. Define indices on ICIS Core tables
Index definition files for:
- Genealogy Management System (GMS) tables: GMS_INDICES.sql
- Data Management System (DMS) tables: DMS_INDICES.sql
- Gene Management System (GEMS) tables: GEMS_INDICES.sql
Usage: mysql> use <name of database>; mysql> source GMS_INDICES.sql; mysql> source DMS_INDICES.sql; mysql> source GEMS_INDICES.sql;
4. Define UNIQUE indices on ICIS Core tables
Index definition files for:
- Genealogy Management System (GMS) tables: Download GMS_UNIQUEKEYS_CENTRAL.sql
- Data Management System (DMS) tables: Download DMS_UNIQUEKEYS.sql
- Gene Management System (GEMS) tables: Download GEMS_UNIQUEKEYS.sql
Usage: mysql> use <name of database>; mysql> source GMS_UNIQUEKEYS.sql; mysql> source DMS_UNIQUEKEYS.sql; mysql> source GEMS_UNIQUEKEYS.sql;
5. Set the Scale Type (SCTYPE) in the SCALE table
Script for setting the Scale Type (continuous [C] /discrete [D])
Download set_scale_sctype.sql (Nov 2011 update)
Usage:
mysql> use <name of database>; mysql> source set_scale_sctype.sql; mysql> CALL set_scale_sctype();
6. Run scripts for generating the preferred ids
Sets the preferred ids for germplasms without preferred ids (Adds new NAMES records with NSTAT = 8)
Download generate_germplasm_prefids.sql (Nov 2011 update)
Usage:
mysql> use <name of database>;
mysql> source generate_germplasm_prefids.sql;
mysql> CALL generate_germplasm_prefids('crop installation',userid,locid,currdate);
...where: crop installation = Four-letter abbreviation (enclosed in single quotes) to describe the crop.
Example: 'IRIS' for rice, 'IWIS' for wheat, 'IMIS' for maize, 'ICAS' for cassava, 'IVIS' for vigna
userid = Existing USERID (numeric) in USERS table; preferably your own USERID.
locid = Location ID (numeric) of the location where this script is being run.
currdate = Current date (numeric: YYYYMMDD).
7. Create and populate warehouse tables (based on ICIS Core tables) derived from the GMS
a. wh_create_children.sql
Tables: wh_der_children, wh_man_children
Download wh_create_children.sql (Nov 2011 update)
Usage:
mysql> use <name of database>; mysql> source wh_create_children.sql; mysql> call activate_children_warehousing();
b. wh_create_germplasm_details.sql
Table: wh_germplasm_details
Download wh_create_germplasm_details.sql (Nov 2011 update)
Usage:
mysql> use <name of database>; mysql> source wh_create_germplasm_details.sql; mysql> call activate_wh_germplasm_details();
c. create_passport.sql
Table: passport
[Download create_passport.sql]
Usage:
mysql> use <name of database>;
mysql> source create_passport.sql;
mysql> call activate_create_passport('name of database');
d. create aggtable_germplasmCount_studyCount.sql
(create aggregate tables for germplasm sets)
Usage: mysql> use <name of database>; mysql> source aggtable_germplasmCount_studyCount.sql;
8. Create and populate warehouse tables (based on ICIS Core tables) derived from the DMS
a. create_icis_dms_warehouse.sql
Tables: wh_datasets, wh_columns, wh_data, wh_countdatapoints
Download create_icis_dms_warehouse.sql (Nov 2011 update)
Usage:
mysql> use <name of database>;
mysql> source create_icis_dms_warehouse.sql;
mysql> CALL activate_icis_dms_warehousing('name of database', scaleid);
Where: scaleid = scale ID of the STUDY/ESTUDIO/etc factor (found in FACTOR table).
b. create_studymart_warehouse.sql
Tables: fieldsetup, factors, variates, aggregate_continuousxycount, aggregate_traitvaluepoints, aggregate_continuoustraitvaluepoints
Download create_studymart_warehouse.sql (Nov 2011 update)
Usage:
mysql> use <name of database>;
mysql> source create_studymart_warehouse.sql;
mysql> CALL activate_studymart_warehousing('name of database');
c. create_icis_aggregate_tables.sql
Tables: aggregate_continuousminmaxpoints, aggregate_datasetpoints, aggregate_studytraitvaluepoints, aggregate_xy_partitioned_datapoints
Download create_icis_aggregate_tables.sql
Usage: mysql> use <name of database>; mysql> source create_icis_aggregate_tables.sql; mysql> CALL generate_icis_aggregate_tables();
d. create_iris_wh_indices.sql
(For rice data only)
Download create_iris_wh_indices.sql
Usage:
mysql> use <name of database>; mysql> source create_iris_wh_indices.sql; mysql> CALL create_iris_wh_indices();
9. Validate warehouse data against data in ICIS core tables
- Count the number of records in every ICIS core table. Use icis_rec_count.sql
Download (This count (on the prep server) will be compared to the count (on the main server Castor) to see if database restoration was successful, at least at the record count level).
- Compare the number of datapoints in the ICIS core tables with the number of datapoints in the warehouse tables. Use icis_datapoint_count.sql Download
Implementation Details: Create mysqldump file of database on preparation server then transfer to main server
Examples:
Rice data
someuser@cril4:~$ mysqldump -uroot -p --disable-keys iris_myisam_yyyymmdd | gzip -9 > iris_myisam_yyyymmdd.sql.gz
Wheat data
someuser@cril4:~$ mysqldump -uroot -p --disable-keys iwis3_myisam_yyyymmdd | gzip -9 > iwis3_myisam_yyyymmdd.sql.gz
Maize data
someuser@cril4:~$ mysqldump -uroot -p --disable-keys imis_myisam_yyyymmdd | gzip -9 > imis_myisam_yyyymmdd.sql.gz
When transferring mysqldump files to main server: Example:
- Rice: Store compressed mysqldump files in /databases/ICIS_mysqldump/IRIS/iris_myisam_yyyymmdd/
- Wheat: Store compressed mysqldump files in /databases/ICIS_mysqldump/IWIS3/iwis3_myisam_yyyymmdd/
- Maize: Store compressed mysqldump files in /databases/ICIS_mysqldump/IMIS/imis_myisam_yyyymmdd/
Implementation Details: MySQL Database Setup (on main server (e.g. Castor)):
1. Configure MySQL
IMPORTANT:
- A. Use utf8 as the default character set for the Mysql database and utf8_general_ci as the collation.
Example:
CREATE DATABASE iris character set utf8 collate utf8_general_ci;
- B. Tuning MySQL through the my.ini file (Please contact a CRIL SysAd for assistance)
- i. Increase the value of max_connections Mysql system variable (in my.cnf) to greater than 100 (100 is the default).
Example:
max_connections=200
- ii. Set case insensitivity:
- iii. Set collation,character set (i.e. prevents "illegal mix of collations" error when using ICIS standalone applications)
[mysqld] set-variable=lower_case_table_names=1 init_connect='SET collation_connection = utf8_general_ci' collation_server=utf8_general_ci character_set_server=utf8
*** Make sure to restart Mysql after modifying my.ini contents. ****
The examples shown below are for using the MySQL command prompt via SSH.
2. Create empty database
1. Login remotely (via SSH) to Mysql on the main server (eg. Castor). Example:
someuser@castor:~$ mysql -usomeusername -p
2. Create empty databases. Example:
mysql> CREATE DATABASE iris_myisam_yyyymmdd CHARACTER SET utf8 COLLATE utf8_general_ci; mysql> CREATE DATABASE imis_myisam_yyyymmdd CHARACTER SET utf8 COLLATE utf8_general_ci; mysql> CREATE DATABASE iwis3_myisam_yyyymmdd CHARACTER SET utf8 COLLATE utf8_general_ci;
where: iris_myisam_yyyymmdd is the database for RICE data, imis_myisam_yyyymmdd for MAIZE data, and iwis3_myisam_yyyymmdd for WHEAT data
3. Restore the database
Use the mysqldump file created on the database preparation server.
Examples:
Rice data
someuser@castor:~$ time gunzip < iris_myisam_yyyymmdd.sql.gz | mysql -uroot -p iris_myisam_yyyymmdd;
Wheat data
someuser@castor:~$ time gunzip < iwis3_myisam_yyyymmdd.sql.gz | mysql -uroot -p iwis3_myisam_yyyymmdd;
Maize data
someuser@castor:~$ time gunzip < imis_myisam_yyyymmdd.sql.gz | mysql -uroot -p imis_myisam_yyyymmdd;
4. Setup user privileges
(Done by admin/root user only)
- 1. Create a MySQL user called datasourceuser
mysql> CREATE USER 'datasourceuser'@'%' identified by 'somepassword';
2. Grant privileges to user datasourceuser:
mysql> GRANT SELECT on iris_myisam_yyyymmdd.* TO 'datasourceuser'@'%' IDENTIFIED BY 'somepassword'; mysql> GRANT SELECT on iwis3_myisam_yyyymmdd.* TO 'datasourceuser'@'%' IDENTIFIED BY 'somepassword'; mysql> GRANT SELECT on imis_myisam_yyyymmdd.* TO 'datasourceuser'@'%' IDENTIFIED BY 'somepassword';

