This repository has been archived by the owner on May 28, 2024. It is now read-only.

Bulk reindexing

Darren Hardy edited this page Nov 28, 2016 · 6 revisions

Using ActiveMQ

The easiest way to bulk reindex is to let ActiveMQ batch all the druids that you'd like to reindex. There is a pids_to_reindex folder among SULMQ's folders (see /opt/app/karaf/current). Copy a file containing a list of fully qualified druids (for example, druid:aa111bb2222) into that folder. The message broker picks up the file, enqueues all of the messages for it, and then removes the file. From there, the broker does all the work of calling the dor_indexing_app service to perform the reindexing.
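As a rough illustration of preparing such a file, the sketch below validates a list of druids and drops it into the watched folder. Several things here are assumptions rather than documented behavior: the one-druid-per-line layout, the file name `batch.txt`, the loose druid pattern (inferred only from the example above), and the helper name `write_druid_batch`. The file is staged next to the folder and then moved in, so the broker should never see a partially written list.

```python
import os
import re
from pathlib import Path

# Loose pattern inferred from the example druid:aa111bb2222.
# This is an assumption, not the official druid grammar.
DRUID_RE = re.compile(r"^druid:[a-z]{2}\d{3}[a-z]{2}\d{4}$")


def write_druid_batch(druids, drop_dir, name="batch.txt"):
    """Write one druid per line into drop_dir and return the final path.

    drop_dir is the pids_to_reindex folder; the one-per-line layout
    is an assumption about what the broker expects.
    """
    bad = [d for d in druids if not DRUID_RE.match(d)]
    if bad:
        raise ValueError(f"not fully qualified druids: {bad}")

    drop_dir = Path(drop_dir)
    # Stage the file outside the watched folder, then move it into
    # place in one step so a half-written file is never picked up.
    staging = drop_dir.parent / (name + ".tmp")
    staging.write_text("\n".join(druids) + "\n")
    final_path = drop_dir / name
    os.replace(staging, final_path)
    return final_path
```

A call might look like `write_druid_batch(my_druids, "<path to pids_to_reindex>")`, with the real folder location taken from the SULMQ installation.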

Background reindexing

Our ActiveMQ broker is configured to reindex objects in the background whenever the reindexing pipeline is idle. It picks the N oldest objects in the index and reindexes them. It takes about 3 days to go through the entire index using this background method.
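The "N oldest" selection can be pictured with a small sketch. It is not the broker's actual implementation: it assumes that "oldest" means the earliest last-indexed timestamp, and the `indexed_at` mapping and the `oldest_n` function are hypothetical names used only for illustration.

```python
import heapq
from datetime import datetime


def oldest_n(indexed_at, n):
    """Return the n pids with the earliest index timestamps, oldest first.

    indexed_at maps pid -> datetime of its last indexing (hypothetical data).
    """
    return heapq.nsmallest(n, indexed_at, key=indexed_at.get)


# Hypothetical sample data.
indexed_at = {
    "druid:aa111bb2222": datetime(2016, 11, 1),
    "druid:cc333dd4444": datetime(2016, 10, 15),
    "druid:ee555ff6666": datetime(2016, 11, 20),
}
batch = oldest_n(indexed_at, 2)
```

Repeating this selection whenever the pipeline is idle eventually cycles through every object, since each reindexed object moves to the "newest" end.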

From scratch reindex

To do a completely clean reindex, you would need to extract all the pids from Fedora's database, order them by object type (so that APOs come first, for example), and then feed the pids to the message broker as described above. See the Argo::PidGatherer class for an example.
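The ordering step might be sketched as follows. This is not how Argo::PidGatherer works; it is a minimal illustration that assumes the pids have already been extracted as (pid, object type) pairs. The type labels and the priority table after APOs are hypothetical, since the only ordering stated above is that APOs go first.

```python
# Hypothetical priority table: lower numbers are reindexed first.
# Only "APOs first" comes from the text; the rest is an assumption.
TYPE_PRIORITY = {"adminPolicy": 0, "collection": 1, "item": 2}


def order_for_reindex(objects):
    """Return pids sorted by object-type priority, then by pid.

    objects is an iterable of (pid, object_type) pairs; unknown
    types sort after all known ones.
    """
    fallback = len(TYPE_PRIORITY)
    ranked = sorted(
        objects,
        key=lambda o: (TYPE_PRIORITY.get(o[1], fallback), o[0]),
    )
    return [pid for pid, _ in ranked]


# Hypothetical sample data.
pids = order_for_reindex([
    ("druid:ee555ff6666", "item"),
    ("druid:aa111bb2222", "adminPolicy"),
    ("druid:cc333dd4444", "collection"),
])
```

The resulting list can then be written to a file and handed to the message broker in the same way as any other bulk reindex.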
