The road so far….

April 26, 2010

Lucene: Reuse IndexSearcher rather than creating a new one every time

Filed under: java, lucene — Rahul Sharma @ 10:26 pm

In Lucene we create an index with IndexWriter, and to search that index we need an IndexSearcher. If one part of the application continually updates the index while another part searches it, the IndexSearcher has to be created again whenever the index is updated. It turns out that an IndexSearcher keeps a snapshot of the index as it was when the searcher was opened and searches only that snapshot; any subsequent updates to the index are not reflected. In such a case, extra coordination has to be put in place between the components that update the index and those that search it.

We can avoid this by creating the IndexSearcher lazily, i.e. only when a search is required, which makes sure that every search uses the latest index. This adds overhead, though, because we open and close a searcher for every search. It would be better if the searcher could reload the latest index only when required.

There is a reopen API on IndexReader (a reader is associated with both the IndexWriter and the IndexSearcher) that can be used to reopen the index if it has changed. An IndexReader obtained from an IndexWriter can do this whenever the corresponding writer has committed, but a reader obtained from an IndexSearcher on the same index is not able to detect the change.

An IndexSearcher can be constructed from an IndexReader, so we could take the reader obtained from the IndexWriter and create a searcher over it. Even this does not help: the IndexReader (now there is just one, because the instance from the IndexWriter was used to create the searcher) sees the change and reloads the index, but the IndexSearcher still holds the snapshot. So how can we reload the index?

The answer I found is that we cannot, but we can determine whether the index on which the IndexSearcher was opened is still the latest one, and use that information to decide whether to open a new searcher. One way to do this is to build some kind of notification between the part of the application that updates the index and the part that searches it. A better solution can be built using the APIs provided by IndexReader.

IndexReader provides the isCurrent API, which can be used to check whether the index has been modified since the reader was opened. We can query the underlying index through the reader of the IndexSearcher, and if it turns out that changes have been made, we close the current IndexSearcher and create a new one.

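import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;

// Check whether the index has changed since this searcher was opened.
IndexReader reader = searcher.getIndexReader();
if (!reader.isCurrent()) {
    searcher.close();
    // open new searcher
}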
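The check above can be wrapped in a small holder so that callers always ask for "the current searcher" and a new one is opened only when the old one is stale. The following is a minimal sketch of that idea in plain Java, not Lucene code: SnapshotSearcher, SearcherHolder, FakeIndex and FakeSearcher are all hypothetical names invented for illustration. SnapshotSearcher stands in for an IndexSearcher whose reader answers isCurrent(), and FakeIndex models an index whose version changes on every commit.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

/** Hypothetical stand-in for a searcher opened on a snapshot of an index. */
interface SnapshotSearcher {
    boolean isCurrent(); // models the isCurrent check on the searcher's reader
    void close();        // models closing the searcher
}

/** Keeps one searcher and replaces it only when its snapshot is stale. */
final class SearcherHolder<S extends SnapshotSearcher> {
    private final Supplier<S> opener; // assumption: knows how to open a fresh searcher
    private S current;

    SearcherHolder(Supplier<S> opener) {
        this.opener = opener;
        this.current = opener.get();
    }

    /** Returns the held searcher, reopening it first if the index has changed. */
    synchronized S get() {
        if (!current.isCurrent()) {
            S stale = current;
            current = opener.get(); // open the replacement before closing the old one
            stale.close();
        }
        return current;
    }
}

/** Toy index: every commit bumps a version number. */
final class FakeIndex {
    private final AtomicLong version = new AtomicLong();
    void commit() { version.incrementAndGet(); }
    long version() { return version.get(); }
}

/** Toy searcher: remembers the index version it was opened on. */
final class FakeSearcher implements SnapshotSearcher {
    private final FakeIndex index;
    private final long openedAtVersion;
    boolean closed = false;

    FakeSearcher(FakeIndex index) {
        this.index = index;
        this.openedAtVersion = index.version();
    }

    @Override public boolean isCurrent() { return index.version() == openedAtVersion; }
    @Override public void close() { closed = true; }
}
```

With this shape, repeated calls to get() return the same searcher until a commit happens, and only the first call after a commit pays the cost of opening a new one. Note that the sketch closes the stale searcher immediately, which assumes no other thread is still searching with it; a real application would need some form of reference counting before closing.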

3 Comments »

  1. Rahul, how is this reuse of the searcher? As soon as you add some docs to the index and commit, the index version changes, which causes the searcher to be reopened rather than reused. Real reuse would be something that makes the newly added docs visible to the already opened searcher. Please correct me if I am wrong.

    In this case the index-updating application should not send a commit on every update or every batch update, because a hard commit forces the IndexSearcher to be closed and a new one opened, which is fairly expensive. Moreover, your suggested approach should check the index version periodically, not continuously.

    Can you suggest a fair strategy to achieve this?

    I am also working on the same scenario and struggling with concurrent search/update.

    Comment by hardikmu — December 28, 2012 @ 3:27 pm

    • Hardikmu, I was working on version 2.4 when I did this. I think you should look at NRT (near-real-time search), available in the latest version of Lucene. You could also consider using the soft commits feature.

      Comment by Rahul Sharma — December 29, 2012 @ 2:21 pm

