This technical tip describes how to bulk load the index using the Bulk Index Tool with FAST InStream. It also describes techniques for troubleshooting bulk indexing performance.
<BASE_PORT>: Base port number used by FAST InStream
<clustername>: Cluster name created by the FAST InStream installation
Before bulk indexing with FAST InStream, update the index profile. The updated profile brings significant improvements in index size and bulk loading performance.
Note: This update is only necessary for Windchill 8.0 M040. The new index profile is included with Windchill 8.0 M050.
FAST InStream uses the same tool used by RetrievalWare to bulk load the index. A few changes were made to the bulk index tool in Windchill 8.0 M040. You no longer need to create a list of objects to be indexed prior to starting the bulk indexing. The list of objects to be indexed is now created automatically.
You should use the Bulk Index Tool to load FAST InStream collections:
To run the Bulk Index Tool, run the following command in a Windchill shell, and log in as a user from the Administrator user group:
windchill wt.index.BulkIndexTool (Ref Fig. 1)
Below is a discussion of each of the 9 Bulk Index Tool menu options:
If the tool has not been run, select option 1 to start the bulk indexing process.
Select option 2 to stop the bulk index loading process.
Select option 3 to set up a regular schedule that repeats the bulk indexing process. You may want to schedule it for a period of low user activity.
You must enter the following information:
Select option 4 to reset the objects that failed during indexing, so they can be processed again. Select option 7 to check for failed objects.
Select option 5 to reset objects in the processing state so they can be processed again. Use this if an object gets stuck in the processing state. Select option 7 to check for processing objects.
Select option 6 to reset the objects without indexing policies. Select option 7 to check for objects with no indexing policies.
Select option 7 to view indexing status. The following status example indicates that 200 out of 500 objects have been indexed and that no objects have failed.
Current status of Bulk Indexing:
Total Objects: 500
Objects processed: 200
Objects processing: 0
Objects w/o indexing policies: 0
Objects remaining: 300
Objects failed: 0
When all objects have been processed, the bulk indexing process is complete.
Note: Progress reporting depends on the wt.index.bulkIndexSize property (default 200). The status does not change until that number of objects has been processed.
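The status output above can also be read programmatically. The following is a minimal Python sketch, not part of the Bulk Index Tool: it assumes the option 7 output has been captured as text, and the function names are hypothetical. Treating "zero remaining and zero processing" as complete is an assumption based on the description above.

```python
import re

# Sample text copied from the option 7 status example above.
SAMPLE_STATUS = """Current status of Bulk Indexing:
Total Objects: 500
Objects processed: 200
Objects processing: 0
Objects w/o indexing policies: 0
Objects remaining: 300
Objects failed: 0"""


def parse_bulk_index_status(text):
    """Collect every 'Label: <integer>' line into a dict (hypothetical helper)."""
    counts = {}
    for line in text.splitlines():
        match = re.match(r"^\s*(.+?):\s*(\d+)\s*$", line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


def summarize(counts):
    """Return (percent processed, complete flag).

    Assumption: the run is complete when nothing is remaining or processing.
    """
    total = counts["Total Objects"]
    percent = 100.0 * counts["Objects processed"] / total if total else 0.0
    complete = (
        total > 0
        and counts["Objects remaining"] == 0
        and counts["Objects processing"] == 0
    )
    return percent, complete


counts = parse_bulk_index_status(SAMPLE_STATUS)
percent, complete = summarize(counts)
print(f"{percent:.1f}% processed, complete={complete}")
```

Because the status only changes after each full set of wt.index.bulkIndexSize objects, the percentage moves in steps rather than continuously.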
Select option 8 to delete the bulk indexing list of objects. This will reset all entries back to the "objects remaining" status.
Note: This option will not update FAST InStream. If processed objects were deleted, you will also need to delete and recreate the collection within the FAST InStream Administration UI.
Select option 9 to close the Bulk Index Tool.
The Document Processor module manages the processing of each object as it passes through the Pipeline. The out-of-the-box installation has one document processor. Significant improvements in indexing time were obtained by adding more document processors.
A new document processor can be added using the FAST InStream Administration UI. Go to the System Management tab and find the Control Panel section. Click the button next to Add Doc. Processor; an additional Document Processor module is then added to the Installed module list section at the bottom of the page. Document processors use a great deal of memory and CPU during bulk indexing, so the number to use depends heavily on the architecture of the individual solution.
Additional Document Processors can shorten bulk indexing, and they can be removed afterward to free up resources.
The following technique provides an optional way to improve the performance of bulk indexing. This procedure divides bulk indexing into two steps:
Suspending indexing allows both steps to run a little faster, which may be necessary when resources are limited.
1. Suspend indexing by running the following command from the FAST InStream install directory:
bin/rtsadmin <hostname> <BASE_PORT+3099> <clustername> 0 0 suspendindexing
2. Start the Bulk Index Tool from a Windchill shell: windchill wt.index.BulkIndexTool
3. Select option 1 to start the bulk indexing process.
4. Monitor the status of the bulk indexing process with option 7.
5. When the bulk indexing status shows zero remaining objects, resume indexing. The command to use depends on the type of load.
If this is an initial load, resume with:
bin/rtsadmin <hostname> <BASE_PORT+3099> <clustername> 0 0 resetindex
If this is an incremental load, resume with:
bin/rtsadmin <hostname> <BASE_PORT+3099> <clustername> 0 0 resumeindexing
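The rtsadmin commands in the steps above differ only in the final action and all use a port derived from the base port. The following Python sketch shows one way to assemble them consistently. The host name, base port value, and cluster name are hypothetical placeholders, and the helper functions are illustrative rather than part of FAST InStream. Only the argument layout and the three action names come from the steps above.

```python
# Offset taken from the <BASE_PORT+3099> placeholder used in the steps above.
RTSADMIN_PORT_OFFSET = 3099

# Actions that appear in this procedure; other rtsadmin actions are not covered.
KNOWN_ACTIONS = {"suspendindexing", "resumeindexing", "resetindex"}


def rtsadmin_command(hostname, base_port, cluster_name, action):
    """Build the rtsadmin argument list in the order shown in the steps."""
    if action not in KNOWN_ACTIONS:
        raise ValueError(f"unexpected action: {action}")
    port = base_port + RTSADMIN_PORT_OFFSET
    return ["bin/rtsadmin", hostname, str(port), cluster_name, "0", "0", action]


def resume_action(initial_load):
    """Pick the resume action: resetindex for an initial load, else resumeindexing."""
    return "resetindex" if initial_load else "resumeindexing"


# Hypothetical example values; substitute the values for your installation.
HOST = "searchhost.example.com"
BASE_PORT = 13000
CLUSTER = "examplecluster"

suspend = rtsadmin_command(HOST, BASE_PORT, CLUSTER, "suspendindexing")
resume = rtsadmin_command(HOST, BASE_PORT, CLUSTER, resume_action(initial_load=True))
print(" ".join(suspend))
print(" ".join(resume))
```

The sketch only prints the commands. Running them still has to happen from the FAST InStream install directory, as described in step 1.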
Problem: Bulk Indexing fails with OutOfMemoryErrors.
Action: Increase the heap size of the queue-processing MethodServer, or decrease the wt.index.bulkIndexSize property in wt.properties. This property (default 200) defines how many objects are sent through the indexing mechanism at a time during a bulk indexing operation. The larger the number, the larger the in-memory footprint of each indexing operation and the fewer updates are written to the Bulk Index log for tracking. Also, if there is an error, the entire set is marked as failed. Decreasing the value allows larger documents to be processed.
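The trade-off described above can be illustrated with a little arithmetic. The following Python sketch is a simplified model, not Windchill code: it assumes objects are grouped into consecutive sets of wt.index.bulkIndexSize objects in list order, which is an assumption made for illustration. The function names are hypothetical.

```python
import math


def batch_count(total_objects, bulk_index_size=200):
    """Number of sets, and so status updates, needed for a full run."""
    if bulk_index_size <= 0:
        raise ValueError("bulk_index_size must be positive")
    return math.ceil(total_objects / bulk_index_size)


def objects_marked_failed(total_objects, bulk_index_size, bad_positions):
    """Count objects marked failed when any error fails the whole set.

    bad_positions holds zero-based positions of objects that raise an error.
    Assumption: sets are consecutive slices of the object list.
    """
    failed_sets = {position // bulk_index_size for position in bad_positions}
    failed = 0
    for set_index in failed_sets:
        start = set_index * bulk_index_size
        end = min(start + bulk_index_size, total_objects)
        failed += end - start
    return failed


# 500 objects with a single problem object at position 450.
for size in (200, 50):
    sets = batch_count(500, size)
    failed = objects_marked_failed(500, size, [450])
    print(f"bulkIndexSize={size}: {sets} sets, {failed} objects marked failed")
```

In this model a smaller value means more frequent status updates and fewer objects caught up in a single failure, at the cost of more sets to process.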
Problem: Checking the status of the bulk index tool takes a long time.
Action: As a workaround, check the bulk index status with the following SQL query:
SELECT status, COUNT(*) AS statuscount FROM IndexStatus GROUP BY status;
Note: This query will only work with Windchill installations configured with FAST InStream. The status column was added to the IndexStatus table in the Windchill 8.0 M040 maintenance release.
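To show the shape of the result this query returns, the following Python sketch runs it against an in-memory SQLite table. This is only a stand-in: the real IndexStatus table lives in the Windchill database, and the table layout and status values used here (PROCESSED, REMAINING) are hypothetical placeholders chosen to mirror the earlier status example.

```python
import sqlite3

# The query from the workaround above, unchanged apart from the trailing semicolon.
STATUS_QUERY = "SELECT status, COUNT(*) AS statuscount FROM IndexStatus GROUP BY status"


def status_counts(connection):
    """Run the status query and return {status: count}."""
    return dict(connection.execute(STATUS_QUERY).fetchall())


# Hypothetical stand-in table; the real schema has more columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE IndexStatus (id INTEGER PRIMARY KEY, status TEXT)")
rows = [("PROCESSED",)] * 200 + [("REMAINING",)] * 300
conn.executemany("INSERT INTO IndexStatus (status) VALUES (?)", rows)

result = status_counts(conn)
for status, count in sorted(result.items()):
    print(status, count)
conn.close()
```

The query returns one row per distinct status value, so the result stays small even when the table holds a large number of objects.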