ICIS Search Documentation
Overview
ICIS Search uses Lucene and HibernateSearch to provide free text searching for ICIS. It offers Indexer, Searcher and Query classes to applications that want to give their users free text searching capabilities.
The project uses HibernateSearch and JPA to search an ICIS database. HibernateSearch uses Lucene to index and search POJOs, and it builds the Lucene documents and indices from the POJOs' annotations or XML mapping files.
To use the searching features provided by the project, run the indexing process first by running the Indexer classes. The directory where the indices will be created is specified in the persistence.xml file found in the src/etc/config/META-INF directory. This XML file also defines the database connection properties. Change these values as needed; it is recommended to leave the other settings as they are. The indexing process creates many objects, depending on the number of records in the database, so it is recommended to set the Java VM arguments "-Xms512m -Xmx1024m" to increase the heap size. Indexing takes some time, so please be patient: indexing 3 million records can take up to an hour and a half.
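As a minimal sketch of the heap recommendation above, the following check could be run before a long indexing job to warn when the JVM was started with too small a heap. The HeapCheck class and its method are hypothetical and not part of ICIS Search. The 10% margin is an assumption, added because some JVMs report slightly less than the -Xmx value.

```java
// Hypothetical helper, not part of ICIS Search: checks the JVM heap limit before indexing.
class HeapCheck {
    // Matches the -Xmx1024m setting recommended for indexing.
    static final long RECOMMENDED_MAX_HEAP_MB = 1024;

    /**
     * Returns true if maxHeapBytes is at least 90% of requiredMb megabytes.
     * The 10% margin is an assumption: Runtime.maxMemory() can report
     * slightly less than the value passed to -Xmx.
     */
    static boolean isHeapSufficient(long maxHeapBytes, long requiredMb) {
        long requiredBytes = requiredMb * 1024L * 1024L;
        return maxHeapBytes >= requiredBytes * 9 / 10;
    }

    public static void main(String[] args) {
        long maxHeapBytes = Runtime.getRuntime().maxMemory();
        if (isHeapSufficient(maxHeapBytes, RECOMMENDED_MAX_HEAP_MB)) {
            System.out.println("Heap limit looks sufficient for indexing.");
        } else {
            System.out.println("Heap limit is only " + (maxHeapBytes / (1024 * 1024))
                    + " MB; consider restarting with -Xms512m -Xmx1024m.");
        }
    }
}
```

The same check could be placed at the start of an indexing run so that a misconfigured JVM is noticed before the job has been running for an hour.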
After indexing, the Searcher class can be used to create Query objects and get results. The JpaUtil class manages the database connection and the sessions. Searching is thread safe because each thread has its own Session.
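The "one Session for each thread" idea can be sketched with a ThreadLocal. This is only an illustration of the pattern, assuming nothing about how JpaUtil is actually written: SessionHolder and DemoSession are hypothetical names, and DemoSession is a stand-in for a real Hibernate or JPA session.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical per-thread session holder; the real JpaUtil may be implemented differently.
class SessionHolder {
    // Each thread lazily gets its own DemoSession the first time it asks for one.
    private static final ThreadLocal<DemoSession> CURRENT =
            ThreadLocal.withInitial(DemoSession::new);

    /** Returns the session bound to the calling thread. */
    static DemoSession currentSession() {
        return CURRENT.get();
    }

    /** Drops the calling thread's session; the next call creates a fresh one. */
    static void closeSession() {
        CURRENT.remove();
    }

    /** Fetches the session seen by a freshly started thread (used by the demo). */
    static DemoSession sessionFromNewThread() {
        DemoSession[] seen = new DemoSession[1];
        Thread worker = new Thread(() -> seen[0] = currentSession());
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen[0];
    }

    public static void main(String[] args) {
        DemoSession mine = currentSession();
        System.out.println("Same thread reuses its session: " + (mine == currentSession()));
        System.out.println("Another thread gets its own: " + (mine != sessionFromNewThread()));
    }
}

// Stand-in for a real session object; for illustration only.
class DemoSession {
    private static final AtomicInteger NEXT_ID = new AtomicInteger();
    final int id = NEXT_ID.incrementAndGet();
}
```

Because no session object is ever shared between threads, concurrent searches do not need to synchronize on it.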
Prerequisites
Developers should be familiar with Java, Ant, Maven, and the Hibernate implementation of JPA.
Required Reading
• HibernateSearch Reference
• Lucene in Action
Dependencies
• HibernateSearch and its dependencies
Note: See the project’s pom file for more details.
Directories
src/main/java – Java class files
src/etc/config – config files for JPA, Ehcache and log4j
src/test/java – JUnit test cases
docs – documentation
xmls – Maven files
Packages
org.generationcp.osiris.datasource.icis.search – contains the Indexer and Searcher classes
org.generationcp.osiris.datasource.icis.search.pojos – contains the JPA Pojos
org.generationcp.osiris.datasource.icis.search.query – contains the Query classes
org.generationcp.osiris.datasource.icis.search.utils – contains utility classes with various functionalities
org.generationcp.osiris.datasource.icis.search.pojos.test – for testing pojos
org.generationcp.osiris.datasource.icis.search – for testing Searchers
org.generationcp.osiris.datasource.icis.search.utils.test – for testing utility classes
Java Classes
Javadocs are available for all the Java classes in the project. The documentation clearly defines the roles of the classes and how they do their jobs.
The users of the project will, at the least, deal with these three classes:
• Searcher – for setting up searches
• Query – for getting results from searches
• ExplanationParser – for understanding Explanation objects from Lucene
The classes are self-explanatory. If you have done the required reading, you should have no problem understanding the code.
Configuration
Before using and running tests in the project, the persistence.xml file should have the proper configuration. The settings for connecting to the database should be correct, as well as the location of the folder for Lucene indices.
As you learn more about HibernateSearch you can add more properties to tweak the settings to your liking.
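Before running the indexers or the tests, it can help to confirm that persistence.xml really defines the Lucene index folder. The following is a minimal sketch using only the standard Java XML parser. PersistenceConfigCheck is a hypothetical helper, not part of ICIS Search, and the default path assumes the program is run from the project root.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of ICIS Search.
class PersistenceConfigCheck {
    static final String INDEX_BASE = "hibernate.search.default.indexBase";

    /** Returns the value attribute of the named <property> element, or null if absent. */
    static String findProperty(InputStream xml, String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().parse(xml);
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element prop = (Element) props.item(i);
                if (name.equals(prop.getAttribute("name"))) {
                    return prop.getAttribute("value");
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalStateException("Could not read the persistence configuration", e);
        }
    }

    /** Convenience overload for XML held in a string. */
    static String findPropertyInString(String xml, String name) {
        return findProperty(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), name);
    }

    public static void main(String[] args) throws Exception {
        // Assumed default location, relative to the project root.
        String path = args.length > 0 ? args[0] : "src/etc/config/META-INF/persistence.xml";
        try (InputStream in = Files.newInputStream(Paths.get(path))) {
            String indexBase = findProperty(in, INDEX_BASE);
            System.out.println(indexBase == null
                    ? "Missing property: " + INDEX_BASE
                    : "Lucene indices will be written to: " + indexBase);
        }
    }
}
```

The same method can look up any other property name in the file, such as the database connection settings.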
Change Log
The docs folder contains the ChangeLog file, which keeps track of changes to the project. Always update this file when you make changes.
How to generate the Lucene index files:
1. Configure the project by changing the settings in persistence.xml in the src/etc/config/META-INF/ folder. Set the proper values for the database connection properties and the Lucene index folder property:
<property name="hibernate.search.default.indexBase"
value="/usr/local/tomcat-dev/icissearch/lucene/indexes"/>
2. Compile the project.
3. Run the GermplasmIndexer class. When running the class, set the maximum Java heap size to 1024m. The log on the console will tell you if the indexing was successful.
4. Run the StudyIndexer class. As with the GermplasmIndexer, set the maximum heap size to 1024m. The log on the console will tell you if the indexing was successful.
5. Finished!
Note: You can now also deploy the jar file of the project to a Maven repository. It can then be used by the ICISSearchWebApp project.
Boosting of fields:
The two main searchable entities for this project are Germplasm and Study. It is suggested that you examine the JPA POJOs to understand the mappings and the fields used for indexing. Look for the @Field and @IndexedEmbedded annotations to know which fields are used in the indexing process. If you have read the HibernateSearch reference manual, you will have no trouble identifying the fields of the entities which are indexed. Looking at the POJOs is a good exercise for understanding the project, so the fields are not enumerated here in the documentation for developers.
In Lucene, certain fields of entities can be “boosted”. Boosting makes a field more or less important compared to other fields. This means that entities which have matches in fields with a higher boost will probably rank higher in the list of results. In the project, look for fields which have the @Boost annotation. Again, discovering the boosted fields in the project is left as an exercise for the developer reading this.
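The effect of boosting on ranking can be illustrated with a deliberately simplified toy model. This is not Lucene's scoring formula and it does not use HibernateSearch. It only shows why a match in a boosted field can outrank a match elsewhere. The BoostDemo class, the field names "name" and "description", and the boost value 2.0 are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: real Lucene scoring also weighs term frequency, field length and more.
class BoostDemo {
    /**
     * Adds up the boost of every field whose text contains the term
     * (case-insensitive). Fields without an explicit boost count as 1.0.
     */
    static double score(Map<String, String> fields, Map<String, Double> boosts, String term) {
        String needle = term.toLowerCase();
        double total = 0.0;
        for (Map.Entry<String, String> field : fields.entrySet()) {
            if (field.getValue().toLowerCase().contains(needle)) {
                total += boosts.getOrDefault(field.getKey(), 1.0);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical boost: matches in "name" count twice as much.
        Map<String, Double> boosts = new HashMap<>();
        boosts.put("name", 2.0);

        // Two made-up records that both match the search term in different fields.
        Map<String, String> matchInName = new HashMap<>();
        matchInName.put("name", "IR64");
        matchInName.put("description", "a rice line");

        Map<String, String> matchInDescription = new HashMap<>();
        matchInDescription.put("name", "Another line");
        matchInDescription.put("description", "derived from IR64");

        System.out.println("Match in boosted field:   " + score(matchInName, boosts, "ir64"));
        System.out.println("Match in unboosted field: " + score(matchInDescription, boosts, "ir64"));
    }
}
```

In the toy model the record matching in the boosted field scores higher. In the real project, the ExplanationParser class mentioned above is the place to look for how Lucene arrived at an actual score.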

