Relevancy Driven Development with Solr

The relevancy of search engine results is highly subjective, so testing the relevancy of queries is subjective too.
One technique that exists in the information retrieval field is the use of judgement lists; an alternative approach discussed here is to follow the Behaviour Driven Development methodology employing user story acceptance criteria – I’ve been calling this Relevancy Driven Development or RDD for short.

I'd like to thank Eric Pugh for a great discussion on search engine testing and for giving me a guest slot in his 'Better Search Engine Testing' talk* at Lucene EuroCon Barcelona 2011 last week to mention RDD. The first iteration of Solr-RDD combines my passion for automated testing with my passion for Groovy by leveraging EasyB (a Groovy BDD testing framework).

Background – Testing Solr

I’d been applying some of the best practices that I used on Java/Grails projects to Solr, but this initially focused on the performance aspects using the (production) ‘access’ request log from Solr, JMeter plus the Access Log Sampler and of course Jenkins. To cope with the evolutionary nature of the schema and the query (when not using (e)dismax) this was accompanied by some Groovy ‘migration’ scripts:

  • An index dumper script – to walk the Lucene index and export the documents to Solr update XML format
  • A data modifier script – to modify the XML dataset
  • An access log processing script – to update the queries that were replayed

plus delete_all.xml and optimize.xml for posting to the Solr update handler.
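For reference, those two update messages are one-liners in the standard Solr update XML syntax:

```xml
<!-- delete_all.xml: remove every document from the index -->
<delete><query>*:*</query></delete>
```

```xml
<!-- optimize.xml: merge the index segments after a reload -->
<optimize/>
```

They can be posted to Solr's /update handler, e.g. with curl or the example post.jar.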

Whilst this gave confidence that we could track the performance trends of any query changes or configuration tuning, it didn't address relevancy. For that we had another script, known as the SolrJ Query Tool, to execute pre-canned queries – although this didn't have an automated feedback loop, as the results were emailed to the client for them to assess (there wasn't a judgement list due to time constraints).

The importance of the controlled/constrained dataset

If you read between the lines above, we would recreate the data to a known state before each test run.
This is critical if you are to be able to make valid assertions about search results.


The aim is to use a story format to describe query relevancy e.g.

given our product data set
when I search for ‘exercise bike’
and I sort by price descending
then I should get two results with ids [PRD-123,PRD-234]
and PRD-123 has a higher score than PRD-234

Because it uses SolrJ, this should be viewed as an integration test run directly against Solr, rather than a functional test that drives the primary web application via an HTTP client.

Iteration 1

This was essentially the alpha-grade implementation; for this run-through I used Solr 1.4.1, as that was the version used on the project that made the idea concrete.


You'll need:

  • Solr installed and running the example core (e.g. cd /Applications/apache-solr-1.4.1/example/; java -jar start.jar)
  • A copy of EasyB, downloaded from the EasyB project site
  • Ivy, if you want to use the Groovy dependency management (@Grab)


To track down dependency coordinates I highly recommend a Maven repository search site, as the artifact pages list the coordinates directly in Groovy @Grab form.

@Grab(group='org.apache.solr', module='solr-solrj', version='1.4.1')
@Grab(group='org.slf4j', module='slf4j-nop', version='1.6.2')


import org.apache.solr.client.solrj.*
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer
import org.apache.solr.client.solrj.response.*
import org.apache.solr.common.*

The before fixture

SolrServer server

before "configure search client", {
  url = "http://localhost:8983/solr"
  server = new CommonsHttpSolrServer(url)
}

before "set up constrained data", {
  given "our sample product data set", {
    SolrInputDocument doc1 = new SolrInputDocument()
    doc1.addField("id", "PRD-123", 1.0f)
    doc1.addField("name", "Best exercise bike", 1.5f)
    doc1.addField("price", 100)

    SolrInputDocument doc2 = new SolrInputDocument()
    doc2.addField("id", "PRD-234", 1.0f)
    doc2.addField("name", "Old exercise bike", 1.0f)
    doc2.addField("price", 20)

    Collection docs = new ArrayList()
    docs.add(doc1)
    docs.add(doc2)

    // index the documents and commit so they are visible to searches
    server.add(docs)
    server.commit()
  }
}


The sample scenario

scenario "Exercise bikes", {
  SolrQuery query = new SolrQuery()
  def rdocs

  when "I search for 'exercise bike'", {
    query.setQuery("name:\"exercise bike\"")
    // request the score pseudo-field so we can assert on it later
    query.setIncludeScore(true)
  }
  and "I sort by price descending", {
    query.addSortField("price", SolrQuery.ORDER.desc)
  }
  then "I should get two results with ids [PRD-123,PRD-234]", {
    QueryResponse rsp = server.query(query)
    rdocs = rsp.getResults()
    rdocs.getNumFound().shouldBe 2L
    rdocs.collect { it.getFieldValue("id") }.shouldBe(["PRD-123", "PRD-234"])
  }
  and "PRD-123 has a higher score than PRD-234", {
    (rdocs[0].getFieldValue("score") > rdocs[1].getFieldValue("score")).shouldBe true
  }
}

Executing from the Command line

The code from the four sections above was saved as 'BaseSearch.story' – the suffix instructs EasyB that it is a Story (as opposed to a specification).

Typing the following within the EasyB installation directory gives the output as shown in Figure 1:
java -cp easyb-0.9.8.jar:lib/commons-cli-1.2.jar:lib/groovy-all-1.7.5.jar:$GRAILS_HOME/lib/ivy-2.2.0.jar org.easyb.BehaviorRunner ~/Projects/rbramley/solr-rdd/stories/BaseSearch.story -prettyprint

Figure 1: EasyB pretty-printed command line output

If we now change the SolrQuery.ORDER.desc to SolrQuery.ORDER.asc and re-run, we’ll see the failure output as shown in Figure 2.

Figure 2: EasyB failure output

Note that the -txtstory argument will make EasyB output the stories in a ‘business-readable’ form.


A caveat discovered along the way:

  • The EasyB classloader prevents the use of an embedded Solr with the default Solr configuration (e.g. org.apache.solr.common.SolrException: Error loading class 'solr.FastLRUCache')

The Solr-RDD Backlog

The first iteration uses vanilla EasyB, but code needs to be written to remove much of the boilerplate:

  1. A DSL to abstract the SolrJ client library (including a query builder)
  2. Data loading integration
  3. Dependency management simplification
  4. EasyB plugin / syntax extension (building on the above items)
  5. Jenkins / Hudson integration
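To make item 1 concrete, the query-builder part of such a DSL might read something like the following – a purely hypothetical sketch in plain Java rather than the eventual Groovy DSL; none of these class or method names exist in Solr-RDD yet:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical fluent query builder (illustrative only); it just
// assembles the request parameters that a SolrJ-backed
// implementation would eventually send to Solr.
public class QueryBuilderSketch {

    static class SearchQuery {
        private String q = "*:*";
        private final List<String> sorts = new ArrayList<>();

        // Build a phrase query against a single field
        SearchQuery field(String name, String phrase) {
            this.q = name + ":\"" + phrase + "\"";
            return this;
        }

        // Append a descending sort clause
        SearchQuery sortDesc(String field) {
            sorts.add(field + " desc");
            return this;
        }

        // Render the builder state as Solr request parameters
        String toParams() {
            StringBuilder sb = new StringBuilder("q=").append(q);
            if (!sorts.isEmpty()) {
                sb.append("&sort=").append(String.join(",", sorts));
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) {
        String params = new SearchQuery()
                .field("name", "exercise bike")
                .sortDesc("price")
                .toParams();
        System.out.println(params);
    }
}
```

The idea is that the story's when/and steps would each call one builder method, so the DSL mirrors the acceptance criteria line for line.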

Feel free to suggest additional features or participate via the fledgling project on GitHub.


* An older version of Eric's talk is available online.
