TDM Data Mining Tool 5.5
S.M. Yates, Agriculture and Agri-Food Canada
Introduction
The Data Mining Tool is an application that will query data from the GMS, DMS and GEMS databases. The focus of this tool is primarily on searching for phenotypic and genotypic data from the ICIS databases while allowing the user to streamline the information they want into a Microsoft Excel spreadsheet or text file. This application is currently in use by researchers and technicians at Agriculture and Agri-Food Canada Research Centres, but can be used on most databases using the ICIS 5.5 schema.
The functions of the Data Mining Tool are:
- Line Vs. Line Comparison - allows user to specify any number of germplasm and search for all related data.
- Study Retriever - queries the data from specified studies and includes all germplasm within those studies.
- Germplasm List Retriever - allows the user to select a germplasm list (created in SetGen) and search for all data related to those germplasm or find which studies those germplasm are in.
- Genetic Marker Search - queries the DMS for genotypic data related to a specified marker and can also retrieve a marker report from the GEMS.
Before starting the Data Mining Tool, some factor information for the GID factor must be present under the [WORKBOOK] section of the .ini file:
Table 1.1 Workbook Initialization Keys
| Key | Valid Values | Remarks |
| GIDTrait | Long Integer | TRAITID of the Factor called GID |
| GIDScale | Long Integer | SCALEID of the Factor called GID |
| GIDMethod | Long Integer | TMETHID of the Factor called GID |
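As a rough illustration of Table 1.1, the sketch below parses a hypothetical [WORKBOOK] section and reports any GID key that is missing or not an integer. The ID values are placeholders, and Python's configparser is used here only for illustration; it is not how the Data Mining Tool itself reads the .ini file.

```python
import configparser

# Hypothetical .ini content; the ID values are placeholders, not real ICIS IDs.
EXAMPLE_INI = """
[WORKBOOK]
GIDTrait = 251
GIDScale = 91
GIDMethod = 17
"""

# The three keys listed in Table 1.1.
REQUIRED_KEYS = ("GIDTrait", "GIDScale", "GIDMethod")


def missing_gid_keys(ini_text):
    """Return the [WORKBOOK] keys that are absent or do not parse as integers."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    problems = []
    for key in REQUIRED_KEYS:
        try:
            parser.getint("WORKBOOK", key)
        except (configparser.NoSectionError,
                configparser.NoOptionError,
                ValueError):
            problems.append(key)
    return problems


print(missing_gid_keys(EXAMPLE_INI))
```

An empty list means all three keys are present with integer values.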
To include the checks from the tests in the output, or to run the Genetic Marker Search, add a [DATAMINING] section to the .ini file containing the following keys:
Table 1.3 Data Mining Tool Initialization Keys
| Key | Valid Values | Remarks |
| CHECKTRAITID | Long Integer | For Line Vs. Line Comparison: TRAITID of a Factor that indicates whether the germplasm is a check or not |
| CHECKSCALEID | Long Integer | For Line Vs. Line Comparison: SCALEID of a Factor that indicates whether the germplasm is a check or not |
| CHECKINDICATOR | Long Integer | For Line Vs. Line Comparison: the value that indicates whether a germplasm is a check or not (i.e. Y or N) |
| CHECKMETHODID | Long Integer | For Line Vs. Line Comparison: TMETHID of a Factor that indicates whether the germplasm is a check or not |
| PRIMERTRAITID | Long Integer | For Genetic Marker Search: TRAITID of the Factor that holds the PRIMERID of the Primer (for searching the GEMS database) |
| PRIMERSCALEID | Long Integer | For Genetic Marker Search: SCALEID of the Factor that holds the PRIMERID of the Primer (for searching the GEMS database) |
| PRIMERMETHODID | Long Integer | For Genetic Marker Search: TMETHID of the Factor that holds the PRIMERID of the Primer (for searching the GEMS database) |
| PublicGEMS | DSN entry | For Genetic Marker Search: enter the DSN value for a Central or public GEMS (e.g. IWIS3 GEMS) |
If the above keys are not found in the .ini file, the Data Mining Tool will disable options related to them.
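The sketch below shows one way to predict which options would be disabled for a given .ini file. It groups the Table 1.3 keys by the option named in their Remarks column and assumes an option needs every key in its group; the tool's actual rule may differ, and the values shown are placeholders.

```python
import configparser

# Keys grouped by the option named in their Remarks column (Table 1.3).
# Assumption: an option needs every key in its group; the real rule may differ.
OPTION_KEYS = {
    "Include checks (Line Vs. Line Comparison)": [
        "CHECKTRAITID", "CHECKSCALEID", "CHECKMETHODID", "CHECKINDICATOR",
    ],
    "Genetic Marker Search": [
        "PRIMERTRAITID", "PRIMERSCALEID", "PRIMERMETHODID", "PublicGEMS",
    ],
}

# Hypothetical .ini content with placeholder values; only the check keys are set.
EXAMPLE_INI = """
[DATAMINING]
CHECKTRAITID = 300
CHECKSCALEID = 120
CHECKMETHODID = 45
CHECKINDICATOR = 1
"""


def enabled_options(ini_text):
    """Map each option to True if all of its [DATAMINING] keys are present."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return {
        option: all(parser.has_option("DATAMINING", key) for key in keys)
        for option, keys in OPTION_KEYS.items()
    }


for option, enabled in enabled_options(EXAMPLE_INI).items():
    print(f"{option}: {'enabled' if enabled else 'disabled'}")
```

With this example file, the check option is reported as enabled and the Genetic Marker Search as disabled, because none of the PRIMER keys or PublicGEMS are set.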
To have the Data Mining Tool menu show the correct crop name, also ensure that a proper value is added to the following key under the [SETGEN] section of the .ini:
Table 1.4 SetGen Initialization Keys
| Key | Valid Values | Remarks |
| CROP | CROP NAME | The full name or abbreviation of the crop |
System Requirements
Minimum Requirements:
Microsoft Windows XP, SP2
Microsoft Excel XP (2000)
Microsoft .NET Framework 2.0 (the installer will download it if it is missing)
Pentium 4 processor, 256 MB RAM
Installation
The Data Mining Tool currently comes in a separate installer from the other ICIS tools. To install it, download the file called ICIS - Data Mining Tool.msi from the CropForge website at https://cropforge.org/frs/?group_id=23&release_id=552 and run it.
Figure 1.2.1 Installer Welcome Screen
The user clicks NEXT to continue the installation.
Figure 1.2.2 Installer Directory Screen
The user can either install the Data Mining Tool in the default location (C:\ICIS5\EXES) or specify where the EXES folder is in their ICIS installation. They can also check how much disk space the application will take up and choose whether to install it for everyone using the computer or just for themselves. Once these choices have been made, the user clicks NEXT to continue.
Figure 1.2.3 Installer Confirm Screen
The user now confirms that they want to go ahead with the installation, or go back and change their choices.
Figure 1.2.4 Installer Progress Screen
The progress bar indicates how far along the installation is.
Figure 1.2.5 Installer Completed Screen
At this point, the user can close the installer. The Data Mining Tool will now be included in the ICIS suite of tools, on the Launcher.
Starting The Data Mining Tool
The Data Mining Tool can be started from the ICIS Launcher.
Figure 1.3.1 The ICIS Launcher
Double-clicking the Data Mining Tool in the Launcher starts the application and opens the menu.
Figure 1.3.2 The Data Mining Tool Menu
Line Vs. Line Comparison
This query will search for all data related to the entered germplasm. If enabled, there is also an option to include the checks from any study where the germplasm are found, in order to compare the data with the checks in the environment.
Figure 1.4.1 Line Vs. Line Comparison Screen
All the user needs to do is type the name of the germplasm and click on ADD to add it to the query. To delete a germplasm from the query, click on it in the list to highlight it, and click on REMOVE. To include the checks from found studies, check the "Include all checks from found studies" option. The user can continue the query and go to the Factors screen by clicking NEXT, or return to the main menu by clicking BACK.
Study Retriever
The Study Retriever will search for all germplasm within the chosen studies and will not include any data outside of those studies. Choosing more studies may increase the time the query takes to complete, depending on the specifications of the user's machine.
Figure 1.5.1 Study Retriever Screen
The user puts a check beside the studies they wish to include in the query and then clicks on NEXT. They can abort the query by clicking on BACK and return to the main menu.
Germplasm List Retriever
The Germplasm List Retriever is a query that will take a germplasm list created in SetGen and either search for all data related to those germplasm or produce an output showing which studies each germplasm is in.
Figure 1.6.1 Germplasm List Retriever Screen
The first thing the user should do is choose whether they want the query to search for data or produce a list of studies the germplasm appear in. The rest of the screen is very similar to the Study Retriever: the user simply puts a check next to the germplasm lists of interest. As with the Study Retriever, choosing multiple lists may result in longer query times, depending on the number of germplasm in each list. To start the query, the user clicks NEXT; to leave the query and return to the main menu, the user clicks BACK.
Genetic Marker Search
The Genetic Marker Search query allows the user to enter the name of a genetic marker, and search for all genotypic data associated with it. There is also an option to produce a marker report from the GEMS database, complete with reference information.
Figure 1.7.1 Genetic Marker Search Screen
The screen is very similar to the Line Vs. Line Comparison screen, except that the user now enters the name of the marker and clicks on ADD to include it in the query. To remove a marker from the list, highlight it by clicking on it, and then click on REMOVE. To include a marker report from the GEMS database, place a check mark on the "Include GEMS marker report with data" option. To continue the query, click NEXT; to abort the query and return to the main menu, click BACK.
Factors
The next screen users will see is the Factors screen. Here is where the user can choose which factors they wish to include in their query.
Figure 1.8.1 Factors Screen
The first option that the user needs to choose is which dataset to search. All available datasets are listed in the drop-down box, and the factors that appear depend on the type of dataset the user chooses. The user then places a check next to any Factors that they wish to include. Some factors may be suggested by the application, depending on how the user's FACTORS table is set up. To remove these from the query, uncheck them. Click on NEXT to continue, or QUIT to abort the query and return to the main menu.
Data Filters
One major challenge for users in querying data from large datasets is to extract only the data they are interested in. The data filter screens in the Data Mining Tool allow the user to set up filters to help ensure that they only include relevant data in their output.
Figure 1.9.1 Factor Filter Screen
The factors chosen by the user on the previous screen are listed, and up to 6 of them can be selected here as filters for the dataset. For example, if the user wished to limit the output to a certain location, they would put a check mark next to the LOCATION factor on this screen. The user then clicks NEXT to choose the filter values, or clicks QUIT to leave the query and return to the main menu.
Figure 1.9.2 Filter Values Screen
This screen allows the user to filter by the actual Factor values chosen in the previous screen. If the LOCATION factor was chosen to filter by, then all the locations associated with the query will be shown here. The user simply checks which values they want to filter by and clicks NEXT to continue with the query. Otherwise, the user can abort the query and return to the main menu by clicking on QUIT.
It should be noted that choosing too many filters can produce unexpected results for those unfamiliar with the dataset. For instance, if the user wants all data at Location A but also chooses to limit the output to Studies A, B and C, then no output will be produced if Location A was not part of Studies A, B or C. It is recommended that users only choose one or two filters until they are comfortable choosing more.
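The Location A example can be illustrated with a small sketch. It assumes that a row must satisfy every chosen filter to appear in the output, which is how the example above behaves. The rows, factor names and values are invented for illustration and do not come from any ICIS database.

```python
# Hypothetical rows standing in for query results; all values are illustrative.
ROWS = [
    {"STUDY": "Study A", "LOCATION": "Location B", "YIELD": 4.1},
    {"STUDY": "Study B", "LOCATION": "Location C", "YIELD": 3.8},
    {"STUDY": "Study D", "LOCATION": "Location A", "YIELD": 4.5},
]


def apply_filters(rows, filters):
    """Keep rows whose value for every filtered factor is among the chosen values."""
    return [
        row for row in rows
        if all(row[factor] in allowed for factor, allowed in filters.items())
    ]


# One filter: all data at Location A.
one_filter = apply_filters(ROWS, {"LOCATION": {"Location A"}})

# Two filters: Location A, but only within Studies A, B and C.
two_filters = apply_filters(ROWS, {
    "LOCATION": {"Location A"},
    "STUDY": {"Study A", "Study B", "Study C"},
})

print(len(one_filter), len(two_filters))
```

Filtering on location alone returns one row, while adding the study filter returns none, because Location A only occurs in a study that was not selected.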
Traits
The final screen lets the user choose which traits they wish to include in the query. This list of traits depends on the dataset chosen in the Factors screen.
Figure 1.10.1 Traits Screen
Here the user simply puts a check next to the Traits they wish to include in the query and clicks NEXT to continue to the output. To quit and return to the main menu, the user can click on QUIT.
Excel Output
Figure 2.1.1 Line Vs. Line Comparison Output
Figure 2.1.2 Study Retriever

