Friday, January 18, 2008


Regenerating equally sized shards from a set of Lucene indexes

If you need to split your large index, or a set of indexes, into smaller equally sized "shards", this prototype tool might be for you. The Lucene distribution includes a tool for combining several indexes into one, but to my knowledge there is no tool to do the opposite.

The usual use case for splitting your index is index distribution: for example, you plan to distribute pieces of your index across several machines to increase query throughput. Of course this could be done by reindexing the data, but resizing the index shards _seems_ to be faster than that (I still need to run some benchmarks to confirm it).

This tool should be able to handle several different scenarios for you:

1. splitting one large index into many smaller ones

2. combining and resplitting several indexes into a new set of indexes

3. combining several indexes into one

This tool does not try to interpret the physical index format but lets Lucene do the heavy lifting by simply using IndexWriter.addIndexes().
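How shard boundaries are chosen is not spelled out here, so the following is only a minimal sketch of one way to plan "equally sized" shards, not the tool's actual code. It assumes that each source index contributes a known document count and that documents are assigned in contiguous runs. The names ShardPlanner, Slice, shardSizes and plan are hypothetical and used only for illustration; the sketch does not touch Lucene at all.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical planner: decides which document ranges go into which shard. */
public class ShardPlanner {

    /** One contiguous run of documents taken from a single source index. */
    public static final class Slice {
        public final int sourceIndex; // position of the source index in the input array
        public final int fromDoc;     // first document of the run (inclusive)
        public final int toDoc;       // end of the run (exclusive)

        Slice(int sourceIndex, int fromDoc, int toDoc) {
            this.sourceIndex = sourceIndex;
            this.fromDoc = fromDoc;
            this.toDoc = toDoc;
        }
    }

    /** Splits totalDocs into shardCount sizes that differ by at most one. */
    public static int[] shardSizes(int totalDocs, int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        int[] sizes = new int[shardCount];
        int base = totalDocs / shardCount;
        int remainder = totalDocs % shardCount;
        for (int i = 0; i < shardCount; i++) {
            // The first 'remainder' shards absorb one extra document each.
            sizes[i] = base + (i < remainder ? 1 : 0);
        }
        return sizes;
    }

    /** Walks the source indexes in order and cuts them into shard-sized runs. */
    public static List<List<Slice>> plan(int[] sourceDocCounts, int shardCount) {
        int total = 0;
        for (int count : sourceDocCounts) {
            total += count;
        }
        List<List<Slice>> shards = new ArrayList<>();
        int src = 0;    // source index currently being consumed
        int offset = 0; // next unassigned document within that source
        for (int size : shardSizes(total, shardCount)) {
            List<Slice> slices = new ArrayList<>();
            int needed = size;
            while (needed > 0) {
                int available = sourceDocCounts[src] - offset;
                if (available == 0) { // current source exhausted, move on
                    src++;
                    offset = 0;
                    continue;
                }
                int take = Math.min(needed, available);
                slices.add(new Slice(src, offset, offset + take));
                offset += take;
                needed -= take;
            }
            shards.add(slices);
        }
        return shards;
    }
}
```

For example, three source indexes holding 5, 3 and 2 documents planned into 3 shards give shard sizes 4, 3 and 3, and the second shard is made up of the last document of the first index plus the first two documents of the second. The plan covers all three scenarios listed above: one source with many shards, many sources with many shards, and many sources with a single shard.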

DISCLAIMER: I have only had time for very limited testing of this tool on smallish indexes, but I plan to test it with bigger indexes soon to get an idea of how it will work in real life.

Download sources.
