Bulk Indexing with FAST InStream in Windchill 8.0

This technical tip describes how to bulk load the index using the Bulk Index Tool with FAST InStream. It also describes techniques for improving and troubleshooting bulk indexing performance.

Values Used Throughout This Suggested Technique

  • <BASE_PORT> (default: 13000): Base port number used by FAST InStream.
  • <clustername> (default: webcluster): Cluster name created by the FAST InStream installation.

Update Index Profile

Before bulk indexing with FAST InStream, it is recommended that you update the index profile. The updated profile significantly improves index size and bulk loading performance.

Note: This update is only necessary for Windchill 8.0 M040. The new index profile is included with Windchill 8.0 M050.

Bulk Indexing Overview

FAST InStream uses the same Bulk Index Tool that RetrievalWare uses to bulk load the index. The tool changed in a few ways in Windchill 8.0 M040: you no longer need to create a list of objects to be indexed before starting bulk indexing, because the list is now created automatically.

Use the Bulk Index Tool to load FAST InStream collections in these situations:

  • To build indexes of existing data, as part of upgrading from RetrievalWare to FAST InStream.
  • To reinitialize a FAST InStream collection after a FAST InStream system failure.
  • To reinitialize a FAST InStream collection after changes have been made to the index profile.

Bulk Index Tool

To run the Bulk Index Tool, run the following command in a Windchill shell, then log in as a user in the Administrator user group (see Fig. 1):

windchill wt.index.BulkIndexTool

The nine Bulk Index Tool menu options are described below.

1. Start the bulk indexing process:

If the tool has not been run, select option 1 to start the bulk indexing process.

2. Stop the bulk indexing process:

Select option 2 to stop the bulk index loading process.

3. Schedule the bulk indexing process:

Select option 3 to set up a regular schedule that repeats the bulk indexing process. You may want to schedule it during periods of low user activity.

You must enter the following information:

  • Start time in the format MM/DD/YYYY HH:MM AM/PM.
  • Stop time in the same format.
  • Total number of runs (how many times you want the scheduled task repeated).
  • Frequency (in days) that you want the bulk indexing task to run. (For example, enter 1 for daily; enter 7 for weekly.)
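To make the relationship between the start time, total number of runs, and frequency concrete, here is a minimal Python sketch that lists when each run would start. It assumes each run begins a whole number of frequency intervals after the first start time and does not model the stop time; the function name run_start_times is hypothetical and not part of the Bulk Index Tool.

```python
from datetime import datetime, timedelta

# Input format the Bulk Index Tool asks for: MM/DD/YYYY HH:MM AM/PM
SCHEDULE_FORMAT = "%m/%d/%Y %I:%M %p"


def run_start_times(start, total_runs, frequency_days):
    """Return the start time of each scheduled run (hypothetical helper).

    Assumption: run N starts (N - 1) * frequency_days after the first start.
    The stop time is not modeled here.
    """
    if total_runs < 1 or frequency_days < 1:
        raise ValueError("total_runs and frequency_days must be positive")
    first = datetime.strptime(start, SCHEDULE_FORMAT)
    return [first + timedelta(days=frequency_days * n) for n in range(total_runs)]


if __name__ == "__main__":
    # Weekly schedule (frequency 7) with three runs, starting at 10:00 PM.
    for when in run_start_times("01/05/2008 10:00 PM", 3, 7):
        print(when.strftime(SCHEDULE_FORMAT))
```

With a frequency of 1 the same helper yields consecutive days, matching the daily example above.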

4. Reset failed entries:

Select option 4 to reset objects that failed during indexing so that they can be processed again. Use option 7 to check for failed objects.

5. Reset entries that are processing:

Select option 5 to reset objects in the processing state so that they are processed again. Use this option if an object gets stuck in the processing state. Use option 7 to check for processing objects.

6. Reset entries that had no indexing policies:

Select option 6 to reset the objects without indexing policies. Select option 7 to check for objects with no indexing policies.

7. Check the bulk indexing progress:

Select option 7 to view the indexing status. The following status example indicates that 200 of 500 objects have been indexed and that no objects have failed.

Current status of Bulk Indexing:
Total Objects: 500
Objects processed: 200
Objects processing: 0
Objects w/o indexing policies: 0
Objects remaining: 300
Objects failed: 0

When all objects have been processed, the bulk indexing process is complete.

Note: Progress is reported in batches whose size is set by the wt.index.bulkIndexSize property (default 200). The status does not change until that number of objects has been processed.
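If you capture the option 7 output (for example, from a console log), a small parser can turn it into counts for scripted monitoring. This minimal Python sketch assumes the "Label: number" layout of the sample above; parse_status, percent_processed, and is_complete are hypothetical helpers. Because the counts change only once per wt.index.bulkIndexSize batch, a percentage computed this way moves in steps.

```python
import re

# Matches "Label: 123" lines such as "Objects remaining: 300".
STATUS_LINE = re.compile(r"^\s*(?P<label>[^:]+):\s*(?P<count>\d+)\s*$")

# Sample copied from the status example in this tip.
SAMPLE = """Current status of Bulk Indexing:
Total Objects: 500
Objects processed: 200
Objects processing: 0
Objects w/o indexing policies: 0
Objects remaining: 300
Objects failed: 0"""


def parse_status(text):
    """Return {label: count} for every 'Label: number' line in the output."""
    counts = {}
    for line in text.splitlines():
        match = STATUS_LINE.match(line)
        if match:
            counts[match.group("label").strip()] = int(match.group("count"))
    return counts


def percent_processed(counts):
    """Share of all objects already processed, as a percentage."""
    total = counts.get("Total Objects", 0)
    return 100.0 * counts.get("Objects processed", 0) / total if total else 0.0


def is_complete(counts):
    """Complete when nothing remains and nothing is still processing.

    Raises KeyError if the output had no 'Objects remaining' line.
    """
    return counts["Objects remaining"] == 0 and counts.get("Objects processing", 0) == 0


if __name__ == "__main__":
    status = parse_status(SAMPLE)
    print(status)
    print(percent_processed(status), is_complete(status))
```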

8. Delete the bulk indexing list of objects:

Select option 8 to delete the bulk indexing list of objects. This will reset all entries back to the "objects remaining" status.

Note: This option does not update FAST InStream. If the deleted list included objects that had already been processed, you must also delete and recreate the collection in the FAST InStream Administration UI.

9. Exit:

Select option 9 to close the Bulk Index Tool.

Scale InStream to Use Servers with Additional Resources

The Document Processor module manages the processing of each object as it passes through the pipeline. The out-of-the-box installation has one document processor. Adding more document processors has been found to reduce indexing time significantly.

A new document processor can be added using the FAST InStream Administration UI. On the System Management tab, find the Control Panel section and click the button next to Add Doc. Processor. An additional Document Processor module then appears in the Installed module list section at the bottom of the page. Document processors use a great deal of memory and CPU during bulk indexing, so the number to use depends heavily on the architecture of the individual deployment.

Additional document processors can shorten bulk indexing, and they can be removed after bulk indexing completes to free up resources.

Two-Step Process

The following technique provides an optional way to improve the performance of bulk indexing. This procedure divides bulk indexing into two steps:

  • Bulk loading the documents into FAST
  • Indexing (making searchable)

Suspending indexing during the load lets both steps run somewhat faster, which may be necessary when resources are limited.

1. Suspend indexing by running the following command from the FAST InStream installation directory:

bin/rtsadmin <hostname> <BASE_PORT+3099> <clustername> 0 0 suspendindexing

Default values:

  • <hostname>: localhost
  • <BASE_PORT+3099>: 16099 (the default <BASE_PORT> of 13000 plus 3099)
  • <clustername>: webcluster

2. Start the Bulk Index Tool by running the following command from a Windchill shell:

windchill wt.index.BulkIndexTool

3. Select option 1 to start the bulk indexing process.

4. Monitor the status of the bulk indexing process with option 7.

5. When the bulk indexing status shows zero remaining objects, resume indexing in one of two ways.

If this is an initial load, resume with:

bin/rtsadmin <hostname> <BASE_PORT+3099> <clustername> 0 0 resetindex

If this is an incremental load, resume with:

bin/rtsadmin <hostname> <BASE_PORT+3099> <clustername> 0 0 resumeindexing
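The two-step procedure can be sketched as a small Python driver. This is a minimal illustration, not a supported tool: it assumes the rtsadmin port is <BASE_PORT+3099> as stated above, and it takes two hypothetical caller-supplied hooks, run (which executes an argument list) and remaining_objects (which returns the "Objects remaining" count). Starting the Bulk Index Tool and selecting option 1 are left to the operator.

```python
import time

RTSADMIN_PORT_OFFSET = 3099  # rtsadmin port is <BASE_PORT+3099> per this tip
ACTIONS = {"suspendindexing", "resetindex", "resumeindexing"}


def rtsadmin_command(action, hostname="localhost", base_port=13000, clustername="webcluster"):
    """Build the bin/rtsadmin argument list used in the steps above."""
    if action not in ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    return ["bin/rtsadmin", hostname, str(base_port + RTSADMIN_PORT_OFFSET),
            clustername, "0", "0", action]


def two_step_load(run, remaining_objects, initial_load,
                  poll_seconds=60.0, sleep=time.sleep, **target):
    """Suspend indexing, wait until no objects remain, then resume.

    run and remaining_objects are hypothetical hooks supplied by the caller.
    target may override hostname, base_port, or clustername.
    """
    # Step 1: suspend indexing so documents are only loaded, not indexed.
    run(rtsadmin_command("suspendindexing", **target))
    # Steps 2-4 happen in the Bulk Index Tool; here we only poll its progress.
    while remaining_objects() > 0:
        sleep(poll_seconds)
    # Step 5: resetindex for an initial load, resumeindexing for an incremental one.
    action = "resetindex" if initial_load else "resumeindexing"
    run(rtsadmin_command(action, **target))


if __name__ == "__main__":
    print(" ".join(rtsadmin_command("suspendindexing")))
```

Injecting run and sleep keeps the sequencing testable without a live FAST InStream cluster. In a real run, run could be something like lambda argv: subprocess.run(argv, cwd=fast_home, check=True), where fast_home is a placeholder for your FAST InStream installation directory.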

Bulk Index Troubleshooting

Problem: Bulk Indexing fails with OutOfMemoryErrors.

Action: Increase the heap size of the MethodServer that processes the queue, or decrease the wt.index.bulkIndexSize property in wt.properties. This property (default 200) defines how many objects are sent through the indexing mechanism at a time during a bulk indexing operation. A larger value increases the in-memory footprint of each indexing operation and results in fewer updates to the Bulk Index log for tracking. Also, if an error occurs, the entire batch is marked as failed. Decreasing the value allows larger documents to be processed.
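The batching trade-off can be illustrated with a simplified Python model. It assumes objects are sent in consecutive batches of wt.index.bulkIndexSize objects and that one failing object marks its entire batch as failed, as described above; the helper names and the integer object IDs are hypothetical, and this is not Windchill's implementation.

```python
def batches(items, size):
    """Split items into consecutive batches of at most `size` objects."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def objects_marked_failed(object_ids, failing_ids, batch_size):
    """Count objects marked failed when any failure fails its whole batch."""
    failing = set(failing_ids)
    marked = 0
    for batch in batches(list(object_ids), batch_size):
        if failing.intersection(batch):
            marked += len(batch)
    return marked


if __name__ == "__main__":
    ids = range(500)
    for size in (200, 50):
        print(size, len(batches(list(ids), size)), objects_marked_failed(ids, [250], size))
```

With 500 objects and a single failure at ID 250, a batch size of 200 produces 3 batches and marks 200 objects as failed, while a batch size of 50 produces 10 batches (more log updates) but marks only 50 objects as failed.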

Problem: Checking the status of the bulk index tool takes a long time.

Action: As a workaround, check the bulk index status with the following SQL query:

SELECT status, COUNT(*) AS statuscount FROM IndexStatus GROUP BY status;

Note: This query will only work with Windchill installations configured with FAST InStream. The status column was added to the IndexStatus table in the Windchill 8.0 M040 maintenance release.
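To script this check, the query can be run over any Python DB-API 2.0 connection. The sketch below assumes you already have a connection to the Windchill database (driver and connection details are not covered here) and demonstrates the query against an in-memory SQLite stand-in. The status values it inserts are placeholders, not the values Windchill actually stores. The trailing semicolon is dropped because some database drivers reject it.

```python
import sqlite3

STATUS_QUERY = "SELECT status, COUNT(*) AS statuscount FROM IndexStatus GROUP BY status"


def status_counts(connection):
    """Run the status query over a DB-API 2.0 connection; return {status: count}."""
    cursor = connection.cursor()
    try:
        cursor.execute(STATUS_QUERY)
        return {status: count for status, count in cursor.fetchall()}
    finally:
        cursor.close()


if __name__ == "__main__":
    # Stand-in database: a one-column IndexStatus table with placeholder values.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE IndexStatus (status TEXT)")
    conn.executemany("INSERT INTO IndexStatus (status) VALUES (?)",
                     [("placeholder_done",)] * 3 + [("placeholder_pending",)] * 2)
    print(status_counts(conn))
    conn.close()
```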
