ICIS Data Warehouse Creation and Maintenance FAQs

From ICISWiki

Jump to: navigation, search


Contents

What are the Data Warehouse tables needed by ICIS and when are they used?

For faster generation of Pedigree trees in web application:

  • wh_der_children

for faster retrieval of children of a given germplasm and of all other entities in the derivative neighborhood

  • wh_man_children

for faster retrieval of children of a given germplasm and of all other entities in the maintenance neighborhood

  • wh_germplasm_details

for faster retrieval of germplasm info (names,locations,method,etc) of a given germplasm

See Pedigree Viewer for Developers

See Pedigree Tree Warehouse Tables Structure


For faster retrieval of dataset entries of a given study in web application

  • wh_columns
  • wh_data
  • wh_datasets

See Data Warehouse Tables Structure For Single Study Dataset Retrieval

See Structure of wh_data


For faster cross-study queries (StudyMart web application)

  • wh_data
  • wh_countdatapoints (new as of Sept 2009)
  • factors (table adapted from original CropFinder by J.C. Alarcon)
  • fieldsetup (table adapted from original CropFinder by J.C. Alarcon)
  • variates (new as of Sept 2009; same concept as factors table. Columns defined by top 175 variates with most number of datapoints in wh_countdatapoints table)

See StudyMart Data Warehouse Tables Structure

Creation: When are Data Warehouses tables generated?

AFTER conversion of any ICIS database from MS Access to MySQL
See MS Access to MySQL conversion of ICIS databases

Updating: When and how should the Data Warehouses be updated?

  • (from manacleto Aug 19, 2009) Right now, there is no clear procedure. A script is needed to add only the new entries to the data warehouses. Right now, you will have to generate them all again everytime an update(probably new conversion task) is made on the ICIS mysql databases.


Implementation Details

Implementation Details: Data Warehouse tables creation

The script can be found in Cropforge project named “icisjavatools”.  The script may be viewed on the browser, the link is:
http://cropforge.org/plugins/scmsvn/viewcvs.php/projects/ICISDatabaseTools/src/sql/mysql/warehouse_tables/?root=icisjavatools

Example of commands made to create the warehouse tables:

mysql> use iris_myisam;
mysql> source create_icis_dms_warehouse.sql;
mysql> call activate_icis_dms_warehousing('iris_myisam');

Implementation Details: How long does it take to generate data warehouse tables for ICIS databases?

(Castor) On Castor server with the following specs: Server OS: Linux Debian. Hardware specs: 8 processors; Intel(R) Xeon(R) CPU; E5405 @ 2.00GHz 24 GB Memory

(CRIL4) On CRIL4 server with the following specs: Server OS: Linux Debian. Hardware specs: Pentium D 2.8Ghz, 2GB Memory, 500GB HD


IRIS  7 hours 41 mins 54.28 secs [for wh_columns, wh_data, wh_datasets, wh_countdatapoints] (on CRIL4) --As of Oct 20, 2009

IWIS 14 hours 25 min 34.73 sec + approx 4 hrs for factors and fieldsetup (on Castor)--As of Aug 19, 2009

IMIS

Implementation Details: Testing of script for data warehouse tables creation

Implementation Details: How is an ICIS database prepared/updated?

A. Convert from MS Access to MySQL database (on local machine/PC)

http://cropwiki.irri.org/icis/index.php/MS_Access_to_MySQL_conversion_of_ICIS_databases

B. Create a mysqldump file of your local Mysql database

mysql> mysqldump -usomeusername -p --default-character-set=utf8 iris > iris_yyyymmdd.sql

This local database should NOT have any indices at this stage.

C. Transfer the mysqldump file to the "database preparation server"

For Castor server, the "preparation server" is CRIL4. 

D. MySQL Database Setup (see below)

Implementation Details: MySQL Database Setup (on database preparation server):

Notes:
1. Use MyISAM as the storage engine (specified in database table definitions: GMS_TABLES.sql, DMS_TABLES.sql, GEMS_TABLES.sql).
2. Use utf8 as the default character set for the Mysql database (also specified in database table definitions: GMS_TABLES.sql, DMS_TABLES.sql, GEMS_TABLES.sql).
3. Increase the value of max_connections Mysql system variable (in my.cnf) to greater than 100 (100 is the default).


The examples shown below are for using the MySQL command prompt via SSH.

1. Create empty databases
a. Login remotely (via SSH) to Mysql on the preparation server. Example:
   someuser@cril4:~$ mysql -usomeusername -p
b. Create empty databases. Example:
   mysql> CREATE DATABASE iris_myisam_yyyymmdd CHARACTER SET utf8 COLLATE utf8_general_ci;    
   mysql> CREATE DATABASE imis_myisam_yyyymmdd CHARACTER SET utf8 COLLATE utf8_general_ci;      
   mysql> CREATE DATABASE iwis3_myisam_yyyymmdd CHARACTER SET utf8 COLLATE utf8_general_ci;
where: iris_myisam_yyyymmdd is the database for RICE data, imis_myisam_yyyymmdd for MAIZE data, and iwis3_myisam_yyyymmdd for WHEAT data


2. Define table structure and populate ICIS Core tables with data

"Core" = included in the standard ICIS schema (used by standalone applications).

Rice data
   mysql> use iris_myisam_yyyymmdd;
   mysql> source iris_myisam_yyyymmdd.sql; 
Wheat data
   mysql> use iwis3_myisam_yyyymmdd;    
   mysql> source iwis3_myisam_yyyymmdd.sql; 
Maize data
   mysql> use imis_myisam_yyyymmdd;  
   mysql> source imis_myisam_yyyymmdd.sql;  

3. Define indices on ICIS Core tables

Index definition files for:

  • Genealogy Management System (GMS) tables: GMS_INDICES.sql
  • Data Management System (DMS) tables: DMS_INDICES.sql
  • Gene Management System (GEMS) tables: GEMS_INDICES.sql

Download 

Usage:
mysql> use <name of database>;
mysql> source GMS_INDICES.sql;
mysql> source DMS_INDICES.sql;
mysql> source GEMS_INDICES.sql;


4. Define UNIQUE indices on ICIS Core tables

Index definition files for:


Usage:
mysql> use <name of database>;
mysql> source GMS_UNIQUEKEYS.sql;
mysql> source DMS_UNIQUEKEYS.sql;
mysql> source GEMS_UNIQUEKEYS.sql;


5. Set the Scale Type (SCTYPE) in the SCALE table

Script for setting the Scale Type (continuous [C] /discrete [D])

Download set_scale_sctype.sql (Nov 2011 update)‎

Usage:

mysql> use <name of database>;
mysql> source set_scale_sctype.sql;
mysql> CALL set_scale_sctype();



6. Run scripts for generating the preferred ids

Sets the preferred ids for germplasms without preferred ids (Adds new NAMES records with NSTAT = 8)

Download generate_germplasm_prefids.sql (Nov 2011 update)

Usage:

mysql> use <name of database>;
mysql> source generate_germplasm_prefids.sql;
mysql> CALL generate_germplasm_prefids('crop installation',userid,locid,currdate);

...where: crop installation = Four-letter abbreviation (enclosed in single quotes) to describe the crop.

                                                   Example: 'IRIS' for rice, 'IWIS' for wheat, 'IMIS' for maize, 'ICAS' for cassava, 'IVIS' for vigna

            userid = Existing USERID (numeric) in USERS table; preferably your own USERID.

             locid = Location ID (numeric) of the location where this script is being run.

             currdate = Current date (numeric: YYYYMMDD).





7. Create and populate warehouse tables (based on ICIS Core tables) derived from the GMS
a. wh_create_children.sql

Tables: wh_der_children, wh_man_children

Download wh_create_children.sql (Nov 2011 update)

Usage:

mysql> use <name of database>;
mysql> source wh_create_children.sql;
mysql> call activate_children_warehousing();  


b. wh_create_germplasm_details.sql 

Table: wh_germplasm_details

Download wh_create_germplasm_details.sql (Nov 2011 update)

Usage:

mysql> use <name of database>; 
mysql> source wh_create_germplasm_details.sql; 
mysql> call activate_wh_germplasm_details();
c. create_passport.sql

Table: passport

[Download create_passport.sql]

Usage:

mysql> use <name of database>;
mysql> source create_passport.sql;
mysql> call activate_create_passport('name of database');


d. create aggtable_germplasmCount_studyCount.sql

(create aggregate tables for germplasm sets)

Usage:
mysql> use <name of database>;
mysql> source aggtable_germplasmCount_studyCount.sql;
8. Create and populate warehouse tables (based on ICIS Core tables) derived from the DMS
a. create_icis_dms_warehouse.sql

Tables: wh_datasets, wh_columns, wh_data, wh_countdatapoints

Download create_icis_dms_warehouse.sql (Nov 2011 update)

Usage:

mysql> use <name of database>;
mysql> source create_icis_dms_warehouse.sql;
mysql> CALL activate_icis_dms_warehousing('name of database', scaleid);

Where: scaleid = scale ID of the STUDY/ESTUDIO/etc factor (found in FACTOR table).

b. create_studymart_warehouse.sql

Tables: fieldsetup, factors, variates, aggregate_continuousxycount, aggregate_traitvaluepoints, aggregate_continuoustraitvaluepoints

Download create_studymart_warehouse.sql (Nov 2011 update)

Usage:

mysql> use <name of database>;
mysql> source create_studymart_warehouse.sql;
mysql> CALL activate_studymart_warehousing('name of database');
c. create_icis_aggregate_tables.sql

Tables: aggregate_continuousminmaxpoints, aggregate_datasetpoints, aggregate_studytraitvaluepoints, aggregate_xy_partitioned_datapoints

Download create_icis_aggregate_tables.sql

Usage:
mysql> use <name of database>;
mysql> source create_icis_aggregate_tables.sql;
mysql> CALL generate_icis_aggregate_tables();


d. create_iris_wh_indices.sql

(For rice data only)

Download create_iris_wh_indices.sql

Usage:

mysql> use <name of database>;
mysql> source create_iris_wh_indices.sql;
mysql> CALL create_iris_wh_indices();
9. Validate warehouse data against data in ICIS core tables
  • Count the number of records in every ICIS core table. Use icis_rec_count.sql

Download (This count (on the prep server) will be compared to the count (on the main server Castor) to see if database restoration was successful, at least at the record count level).

  • Compare the number of datapoints in the ICIS core tables with the number of datapoints in the warehouse tables. Use icis_datapoint_count.sql Download

Implementation Details: Create mysqldump file of database on preparation server then transfer to main server

Examples:


Rice data

    someuser@cril4:~$ mysqldump -uroot -p --disable-keys iris_myisam_yyyymmdd | gzip -9 > iris_myisam_yyyymmdd.sql.gz

Wheat data

   someuser@cril4:~$ mysqldump -uroot -p --disable-keys iwis3_myisam_yyyymmdd | gzip -9 > iwis3_myisam_yyyymmdd.sql.gz

Maize data

   someuser@cril4:~$ mysqldump -uroot -p --disable-keys imis_myisam_yyyymmdd | gzip -9 > imis_myisam_yyyymmdd.sql.gz


When transferring mysqldump files to main server: Example:

  • Rice: Store compressed mysqldump files in /databases/ICIS_mysqldump/IRIS/iris_myisam_yyyymmdd/
  • Wheat: Store compressed mysqldump files in /databases/ICIS_mysqldump/IWIS3/iwis3_myisam_yyyymmdd/
  • Maize: Store compressed mysqldump files in /databases/ICIS_mysqldump/IMIS/imis_myisam_yyyymmdd/



Implementation Details: MySQL Database Setup (on main server (e.g. Castor)):

1. Configure MySQL

IMPORTANT:

A. Use utf8 as the default character set for the Mysql database and utf8_general_ci as the collation.

Example:

CREATE DATABASE iris character set utf8 collate utf8_general_ci;
B. Tuning MySQL through the my.ini file (Please contact a CRIL SysAd for assistance)
i. Increase the value of max_connections Mysql system variable (in my.cnf) to greater than 100 (100 is the default).

Example:

max_connections=200 
ii. Set case insensitivity:
iii. Set collation,character set (i.e. prevents "illegal mix of collations" error when using ICIS standalone applications)
[mysqld]
set-variable=lower_case_table_names=1
init_connect='SET collation_connection = utf8_general_ci' 
collation_server=utf8_general_ci
character_set_server=utf8


*** Make sure to restart Mysql after modifying my.ini contents. ****


The examples shown below are for using the MySQL command prompt via SSH.

2. Create empty database

1. Login remotely (via SSH) to Mysql on the main server (eg. Castor). Example:

   someuser@castor:~$ mysql -usomeusername -p

2. Create empty databases. Example:

   mysql> CREATE DATABASE iris_myisam_yyyymmdd CHARACTER SET utf8 COLLATE utf8_general_ci;  
  mysql> CREATE DATABASE imis_myisam_yyyymmdd CHARACTER SET utf8 COLLATE utf8_general_ci;
  mysql> CREATE DATABASE iwis3_myisam_yyyymmdd CHARACTER SET utf8 COLLATE utf8_general_ci;

where: iris_myisam_yyyymmdd is the database for RICE data, imis_myisam_yyyymmdd for MAIZE data, and iwis3_myisam_yyyymmdd for WHEAT data


3. Restore the database

Use the mysqldump file created on the database preparation server.

Examples:

Rice data
  someuser@castor:~$ time gunzip < iris_myisam_yyyymmdd.sql.gz | mysql -uroot -p iris_myisam_yyyymmdd;
Wheat data
  someuser@castor:~$ time gunzip < iwis3_myisam_yyyymmdd.sql.gz | mysql -uroot -p iwis3_myisam_yyyymmdd;
Maize data
  someuser@castor:~$ time gunzip < imis_myisam_yyyymmdd.sql.gz | mysql -uroot -p imis_myisam_yyyymmdd;


4. Setup user privileges

(Done by admin/root user only)

1. Create a MySQL user called datasourceuser
mysql> CREATE USER 'datasourceuser'@'%' identified by 'somepassword';

     2. Grant privileges to user datasourceuser:

mysql> GRANT SELECT on iris_myisam_yyyymmdd.* TO 'datasourceuser'@'%' IDENTIFIED BY 'somepassword';
mysql> GRANT SELECT on iwis3_myisam_yyyymmdd.* TO 'datasourceuser'@'%' IDENTIFIED BY 'somepassword';
mysql> GRANT SELECT on imis_myisam_yyyymmdd.* TO 'datasourceuser'@'%' IDENTIFIED BY 'somepassword';
Personal tools