Bulk Indexing with FAST InStream in Windchill 8.0

This technical tip describes how to bulk load the index using the Bulk Index Tool with FAST InStream. It also describes techniques for troubleshooting bulk indexing performance.

Values Used Throughout This Suggested Technique

Value            Default       Description
<BASE_PORT>      13000         Base port number used by FAST InStream
<clustername>    webcluster    Cluster name created by the FAST InStream installation

Update Index Profile

Before bulk indexing with FAST InStream, update the index profile. The updated profile brings significant improvements in index size and bulk loading performance.

Note: This update is only necessary for Windchill 8.0 M040. The new index profile is included with Windchill 8.0 M050.

Bulk Indexing Overview

FAST InStream uses the same tool used by RetrievalWare to bulk load the index. A few changes were made to the bulk index tool in Windchill 8.0 M040. You no longer need to create a list of objects to be indexed prior to starting the bulk indexing. The list of objects to be indexed is now created automatically.

Use the Bulk Index Tool to load FAST InStream collections in the following cases:

  • To build indexes of existing data, as part of upgrading from RetrievalWare to FAST InStream.
  • To reinitialize a FAST InStream collection from a failed FAST InStream system.
  • To reinitialize a FAST InStream collection after changes have been made to the indexing profile.

Bulk Index Tool

To run the Bulk Index Tool, run the following command in a Windchill shell, and log in as a user from the Administrator user group:

windchill wt.index.BulkIndexTool (Ref Fig. 1)

Below is a discussion of each of the 9 Bulk Index Tool menu options:

1. Start the bulk indexing process:

If the tool has not been run, select option 1 to start the bulk indexing process.

2. Stop the bulk indexing process:

Select option 2 to stop the bulk index loading process.

3. Schedule the bulk indexing process:

Select option 3 to set up a regular schedule that repeats the bulk indexing process. Consider scheduling it for periods of low user activity.

You must enter the following information:

  • Start time in the format MM/DD/YYYY HH:MM AM/PM.
  • Stop time in the same format.
  • Total number of runs (how many times you want the scheduled task repeated).
  • Frequency (in days) that you want the bulk indexing task to run. (For example, enter 1 for daily; enter 7 for weekly.)
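As an illustration of how these inputs combine, the following Python sketch parses a start time in the format above and lists the resulting run start times. The function name `scheduled_runs` and the sample values are hypothetical, and the assumption that runs are spaced exactly one frequency interval apart, counted from the first start time, is ours rather than documented tool behavior.

```python
from datetime import datetime, timedelta

# Format the tool asks for: MM/DD/YYYY HH:MM AM/PM
TIME_FORMAT = "%m/%d/%Y %I:%M %p"


def scheduled_runs(start, total_runs, frequency_days):
    """Return the start time of each run as a list of datetime objects.

    Assumption (not documented behavior): run n starts
    n * frequency_days after the first run.
    """
    first = datetime.strptime(start, TIME_FORMAT)
    return [first + timedelta(days=i * frequency_days) for i in range(total_runs)]


if __name__ == "__main__":
    # Hypothetical example: three weekly runs starting late on a Saturday.
    for run in scheduled_runs("03/04/2006 11:00 PM", total_runs=3, frequency_days=7):
        print(run.strftime(TIME_FORMAT))
```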

4. Reset failed entries:

Select option 4 to reset the objects that failed during indexing, so they can be processed again. Select option 7 to check for failed objects.

5. Reset entries that are processing:

Select option 5 to reset objects in the processing state so they can be processed again. Use this option if an object gets stuck in the processing state. Select option 7 to check for processing objects.

6. Reset entries that had no indexing policies:

Select option 6 to reset the objects without indexing policies. Select option 7 to check for objects with no indexing policies.

7. Check the bulk indexing progress:

Select option 7 to view indexing status. The following status example indicates that 200 out of 500 objects have been indexed and that no objects have failed.

Current status of Bulk Indexing:

Total Objects: 500

Objects processed: 200

Objects processing: 0

Objects w/o indexing policies: 0

Objects remaining: 300

Objects failed: 0

When all objects have been processed, the bulk indexing process is complete.

Note: Progress is reported in batches whose size is set by the wt.index.bulkIndexSize property (default 200). The status does not change until that number of objects has been processed.
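For scripted monitoring, the status text can be parsed into counts. The sketch below assumes the output keeps exactly the "label: number" layout shown in the example above. `parse_status` and `percent_complete` are hypothetical helper names, not part of Windchill.

```python
def parse_status(text):
    """Parse 'Label: number' lines into a dict, skipping any other lines."""
    counts = {}
    for line in text.splitlines():
        label, sep, value = line.rpartition(":")
        value = value.strip()
        if sep and value.isdigit():
            counts[label.strip()] = int(value)
    return counts


def percent_complete(counts):
    """Share of objects already processed, as a percentage."""
    total = counts["Total Objects"]
    return 100.0 * counts["Objects processed"] / total if total else 0.0


# Status text copied from the example in this tip.
SAMPLE = """Current status of Bulk Indexing:
Total Objects: 500
Objects processed: 200
Objects processing: 0
Objects w/o indexing policies: 0
Objects remaining: 300
Objects failed: 0
"""

if __name__ == "__main__":
    counts = parse_status(SAMPLE)
    print(counts["Objects remaining"], percent_complete(counts))
```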

8. Delete the bulk indexing list of objects:

Select option 8 to delete the bulk indexing list of objects. This will reset all entries back to the "objects remaining" status.

Note: This option will not update FAST InStream. If processed objects were deleted, you will also need to delete and recreate the collection within the FAST InStream Administration UI.

9. Exit:

Select option 9 to close the Bulk Index Tool.

Scale InStream to Utilize Servers with Additional Resources

The Document Processor module manages the processing of each object as it passes through the pipeline. The out-of-the-box installation has one document processor. Adding more document processors has yielded significant improvements in indexing time.

A new document processor can be added using the FAST InStream Administration UI. Go to the System Management tab and find the Control Panel section. Click the button next to Add Doc. Processor. An additional Document Processor module is then added under the Installed module list section at the bottom of the page. Document processors use a great deal of memory and CPU during bulk indexing, so the number to use depends heavily on the architecture of the individual solution.

Additional document processors can shorten bulk indexing time, and they can be removed after bulk indexing is completed to free up resources.

Two-Step Process

The following technique provides an optional way to improve the performance of bulk indexing. This procedure divides bulk indexing into two steps:

  • Bulk loading the documents into FAST
  • Indexing (making searchable)

Suspending indexing lets both steps run somewhat faster, which may be necessary when resources are limited.

1. Suspend indexing by running the following command from the FAST InStream install directory:

bin/rtsadmin <hostname> <BASE_PORT+3099> <clustername> 0 0 suspendindexing

Default values:

<hostname> localhost

<BASE_PORT+3099> 16099

<clustername> webcluster

2. Start bulk indexing by running from a Windchill shell: windchill wt.index.BulkIndexTool

3. Select option 1 to start the bulk indexing process.

4. Monitor the status of the bulk indexing process with option 7.

5. When the status of the bulk indexing shows zero remaining objects, start the indexer. There are two ways to resume indexing.

If this is an initial load, resume with:

bin/rtsadmin <hostname> <BASE_PORT+3099> <clustername> 0 0 resetindex

If this is an incremental load, resume with:

bin/rtsadmin <hostname> <BASE_PORT+3099> <clustername> 0 0 resumeindexing
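The port arithmetic and the command variants in the steps above can be sketched as a small helper. It only assembles the argument lists shown in this tip and does not run anything. It assumes the defaults from the values table (base port 13000, cluster webcluster), and the names `rtsadmin_command` and `resume_action` are hypothetical.

```python
RTSADMIN_PORT_OFFSET = 3099  # the rtsadmin port is <BASE_PORT> + 3099

ACTIONS = ("suspendindexing", "resumeindexing", "resetindex")


def rtsadmin_command(action, hostname="localhost", base_port=13000,
                     clustername="webcluster"):
    """Build the rtsadmin argument list for one of the actions in this tip.

    The relative path bin/rtsadmin assumes the command is run from the
    FAST InStream install directory.
    """
    if action not in ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    port = base_port + RTSADMIN_PORT_OFFSET
    return ["bin/rtsadmin", hostname, str(port), clustername, "0", "0", action]


def resume_action(initial_load):
    """Pick resetindex for an initial load, resumeindexing for an incremental one."""
    return "resetindex" if initial_load else "resumeindexing"


if __name__ == "__main__":
    print(" ".join(rtsadmin_command("suspendindexing")))
    print(" ".join(rtsadmin_command(resume_action(initial_load=True))))
```

With the default base port of 13000, the helper yields port 16099, matching the default listed in step 1. A non-default base port changes only the third argument.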

Bulk Index Troubleshooting

Problem: Bulk Indexing fails with OutOfMemoryErrors.

Action: Increase the heap size of the queue processing MethodServer or decrease the wt.index.bulkIndexSize property in wt.properties. This property (default 200) defines how many objects are sent through the indexing mechanism at a time during a bulk indexing operation. The larger the number, the bigger the in-memory footprint of each indexing operation and the fewer updates are written to the Bulk Index log for tracking. Also, if there is an error, the entire set is marked as a failure. Decreasing the size allows larger documents to be processed.
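To make the trade-off concrete, the sketch below counts how many batches (and therefore status updates) a run produces for a given wt.index.bulkIndexSize. It assumes objects are simply sent in consecutive batches of that size, which is a simplification of the real mechanism. `batch_sizes` is a hypothetical helper.

```python
def batch_sizes(total_objects, bulk_index_size=200):
    """Split total_objects into consecutive batches of at most bulk_index_size."""
    if bulk_index_size <= 0:
        raise ValueError("bulk_index_size must be positive")
    full, rest = divmod(total_objects, bulk_index_size)
    return [bulk_index_size] * full + ([rest] if rest else [])


if __name__ == "__main__":
    # Compare the default batch size with a smaller, hypothetical setting.
    for size in (200, 50):
        batches = batch_sizes(500, size)
        print(size, len(batches), max(batches))
```

Under this simplification, 500 objects give three batches at a size of 200 and ten batches at a size of 50. The smaller setting holds fewer objects in memory at once, updates the status more often, and marks fewer objects as failed when one batch hits an error.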

Problem: Checking the status of the bulk index tool takes a long time.

Action: As a workaround, check the bulk index status with the following SQL query:

SELECT status, COUNT(*) AS statuscount FROM IndexStatus GROUP BY status;

Note: This query will only work with Windchill installations configured with FAST InStream. The status column was added to the IndexStatus table in the Windchill 8.0 M040 maintenance release.
