The road so far….

August 31, 2011

SOLR tips & tricks

Filed under: java, lucene — Rahul Sharma @ 12:15 pm

In a recent project we have been working with SOLR quite a lot. Here are some tricks that we found handy when getting started with SOLR.

Using EmbeddedSolrServer

Since you will ultimately be developing against a SOLR web app instance, you may ask why you should care about the EmbeddedSolrServer at all. But there are good reasons to use it: for example, it gives you a test-bed where you can verify your configurations in JUnit tests rather than deploying the web app again and again.

Here is a way to load resources from the classpath instead of setting the solr.solr.home environment variable:

  SolrServer createServer(String coreName) throws Exception {
    CoreContainer coreContainer = new CoreContainer();
    // baseLocation points at the directory that holds the core's conf/ files
    CoreDescriptor descriptor = new CoreDescriptor(coreContainer, coreName,
            new File(baseLocation, coreName).getAbsolutePath());
    SolrCore solrCore = coreContainer.create(descriptor);
    coreContainer.register(solrCore, false);
    return new EmbeddedSolrServer(coreContainer, coreName);
  }

This will load solrconfig.xml, schema.xml and solr.xml from your classpath.

You may also want to shut down the server gracefully once the JUnit tests are done, rather than just killing it. Here is how:

  void shutdown(EmbeddedSolrServer server) throws Exception {
    Field field = EmbeddedSolrServer.class.getDeclaredField("coreContainer");
    field.setAccessible(true);
    CoreContainer container = (CoreContainer) field.get(server);
    container.shutdown();
  }

You need to get hold of the CoreContainer in order to shut down the server. This reflection hack is needed because of the SOLR-1178 issue (https://issues.apache.org/jira/browse/SOLR-1178).

Character Encoding Issues

SOLR itself handles UTF-8 characters just fine. When you run JUnit tests with UTF-8 data using the EmbeddedSolrServer, the tests pass. But when you deploy the same on Tomcat, the results can be quite different.

Basically, it is the Tomcat container that causes the issues. The first thing to do is to send data using the POST method rather than the GET method:

solrServer.query(q, METHOD.POST);

If this does not solve the problem, the Tomcat documentation describes further ways to enable UTF-8 encoding in the container.
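For GET requests, the usual Tomcat-side fix is to set URIEncoding on the HTTP connector in conf/server.xml. A minimal sketch, assuming the default connector on port 8080:

```xml
<!-- conf/server.xml: decode request URIs (and hence GET parameters) as UTF-8 -->
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443"
           URIEncoding="UTF-8" />
```

Without URIEncoding, Tomcat decodes query strings as ISO-8859-1 by default, which is what mangles UTF-8 terms in GET requests.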

ExtendedDismaxParser

You will need a parser for your search queries. The Lucene query parser that is used by default is quite error prone: it throws a ParseException every now and then. This is because it exposes a lot of features that make queries powerful but, at the same time, easy to get wrong.

The ExtendedDismaxParser, on the other hand, exposes only a few of the operators but makes sure that you never get an exception: "no results found" is an acceptable outcome, an exception is not. Enable it in your solrconfig.xml:

<requestHandler name="/search" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="defType">edismax</str>
    <str name="mm">0</str>
  </lst>
</requestHandler>
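The handler above can then be queried from SolrJ by routing the request to /search. A minimal sketch, assuming a handler registered as above (the raw user input can go straight into the query, since edismax degrades to no results instead of throwing):

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrRequest.METHOD;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class EdismaxSearch {

    // Build a request routed to the /search handler configured above.
    // Even malformed input like "solr (tips" will not raise a ParseException.
    static SolrQuery buildQuery(String userInput) {
        SolrQuery query = new SolrQuery(userInput);
        query.set("qt", "/search"); // select the edismax request handler
        return query;
    }

    static QueryResponse search(SolrServer server, String userInput)
            throws Exception {
        // POST also sidesteps the Tomcat GET-encoding issue described earlier
        return server.query(buildQuery(userInput), METHOD.POST);
    }
}
```

Using qt to pick the handler keeps the query code independent of which parser the handler configures, so you can tune defType and mm purely in solrconfig.xml.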

UpdateRequest.setCommitWithin

When you commit data to a SOLR core, it re-opens the attached searchers. There is an upper limit on how many searchers can be warming at a time, and exceeding it leads to errors while adding data; the usual recommendation is to slow down the adding process.

An alternative, which we used, is a commit timeout: setCommitWithin tells SOLR to commit the data within the given time-out rather than immediately. I do not believe this fixes the exception completely, though; it can pop up again once the index holds a lot of data and indexing times grow accordingly.
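In SolrJ this looks like the sketch below; the 5000 ms window is an assumption to tune for your own load:

```java
import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.common.SolrInputDocument;

public class CommitWithinExample {

    // Queue a document and let SOLR commit it within 5 seconds, instead of
    // forcing an immediate commit (and an immediate searcher re-open).
    static UpdateRequest buildUpdate(SolrInputDocument doc) {
        UpdateRequest request = new UpdateRequest();
        request.add(doc);
        request.setCommitWithin(5000); // milliseconds; batches nearby commits
        return request;
    }
}
```

Send it with buildUpdate(doc).process(solrServer); documents added within the same window share one commit, so far fewer searchers get re-opened.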

Type-Ahead Suggestions

There are a couple of things available in SOLR that can be used for this.

There is a Suggester component for this in SOLR 1.4. But it keeps its associated data structures in memory and rebuilds them on every commit. This increases indexing times and eventually throws an OutOfMemoryError once the index is large enough. The way around this is to keep only a limited number of terms in the suggester, which you can control with the threshold attribute. But if you want every possible term in the index to be suggested, this is not your cup of tea.

Another possible solution is the TermsComponent. It works directly off the underlying index and can give back every term in the index as a suggestion. As there is no extra in-memory data structure, it does not impact indexing time. But here you will have to play around a bit with regular expressions and regex flags to get the results you want.
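A minimal sketch of the solrconfig.xml wiring, with an assumed handler name of /terms and an assumed title field:

```xml
<searchComponent name="terms" class="solr.TermsComponent"/>

<requestHandler name="/terms" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <bool name="terms">true</bool>
  </lst>
  <arr name="components">
    <str>terms</str>
  </arr>
</requestHandler>
```

A request such as /terms?terms.fl=title&terms.regex=sol.*&terms.regex.flag=case_insensitive then returns matching index terms; terms.regex and terms.regex.flag are where the regex tuning mentioned above happens.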

There are also a couple of other solutions that can be explored here, like NGram-based queries, faceting, etc.


