Monitoring Apache Solr

Apache Solr is an open source enterprise search service from the Lucene project. Solr is written in Java and runs as a standalone full-text search server within a servlet container such as Tomcat.
Like any service or component in your architecture, you’ll want to monitor it to ensure that it’s available and gather performance data to help with tuning.

In this post, we’ll look at how we can monitor Solr, what performance metrics we might want to gather and how we can easily achieve this with Opsview.

We’ll use Opsview as it is built on Nagios and thus has access to a wide range of plugins, yet provides a more approachable user interface for configuring service checks.

A checklist for service checks

Solr is built on Lucene, so it follows the same layout: an index contains documents, which are made up of fields. As part of the value it adds over Lucene as a search service, Solr provides a number of useful ways of obtaining health status and monitoring metrics:

  1. Health-check status using the /admin/ping handler
  2. The admin statistics page /admin/stats.jsp (XML styled with XSL)
  3. JMX MBeans
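
As a rough illustration of the first option, the sketch below interprets the body returned by the ping handler. The sample response is an assumption about what a typical reply looks like (a top-level str element named status), and the function name ping_status is hypothetical; check the output of your own Solr version before relying on it.

```python
import xml.etree.ElementTree as ET

# Standard Nagios plugin exit codes
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Assumed, illustrative ping response body; real output may differ by version.
SAMPLE_PING = """<response>
  <lst name="responseHeader"><int name="status">0</int><int name="QTime">2</int></lst>
  <str name="status">OK</str>
</response>"""


def ping_status(xml_text):
    """Return (exit_code, message) for a ping handler response body."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return UNKNOWN, "SOLR PING UNKNOWN - unparseable response: %s" % exc
    # Only look at the top-level <str name="status">, not the header's <int>.
    node = root.find("./str[@name='status']")
    status = (node.text or "").strip() if node is not None else ""
    if status == "OK":
        return OK, "SOLR PING OK"
    return CRITICAL, "SOLR PING CRITICAL - status was %r" % status


if __name__ == "__main__":
    code, message = ping_status(SAMPLE_PING)
    print(message)
    raise SystemExit(code)
```

Keeping the parsing separate from the HTTP fetch means the same function can be exercised against saved responses without a running Solr instance.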

The list of applicable checks could be defined by whether it is a health check or a data gathering check – but we’d end up with a lot of overlap. Instead I’ve divided the list into the checks that can be performed remotely (without an installed agent on the server) and those that are best performed locally to the Solr server.

Remote (agent-less) checks

What should we look for over the network?

Firstly we can have a host-level check which may perform a network level ping.
Next we can check TCP connectivity to the servlet container port and then make an HTTP GET request to the Solr ‘front page’ and check for a known string (e.g. Welcome to Solr).
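
The TCP and HTTP steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not what Opsview's own plugins do; the function names are hypothetical, and the host, port and URL you pass in depend on your servlet container.

```python
import socket
import urllib.request

# Standard Nagios plugin exit codes used here
OK, CRITICAL = 0, 2


def tcp_port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def evaluate_page(http_status, body, expected="Welcome to Solr"):
    """Map an HTTP status and page body to (exit_code, message)."""
    if http_status != 200:
        return CRITICAL, "HTTP CRITICAL - status %d" % http_status
    if expected not in body:
        return CRITICAL, "HTTP CRITICAL - string %r not found" % expected
    return OK, "HTTP OK - found %r" % expected


def check_front_page(url, expected="Welcome to Solr", timeout=10.0):
    """GET the page and evaluate it; network errors are reported as CRITICAL."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return evaluate_page(resp.status, body, expected)
    except OSError as exc:  # URLError and socket timeouts are OSError subclasses
        return CRITICAL, "HTTP CRITICAL - %s" % exc
```

A caller would typically run tcp_port_open first and only then call check_front_page, so that a closed port is reported separately from a page that loads but lacks the expected string.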

Now we’ve made it up to the application layer so can start to perform Solr specific checks. Items to monitor may include (delete as applicable):

  1. Ping status
  2. Number of docs
  3. Number of queries / queries per second
  4. Average response time
  5. Number of updates
  6. Cache hit ratios
  7. Replication status
  8. Synthetic queries
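
To make a couple of these items concrete, here is a minimal sketch of pulling the number of docs and a cache hit ratio out of the statistics XML and applying thresholds. The sample document is an assumption about the stats.jsp layout (sections such as CORE and CACHE containing entry elements with named stat children), and read_stat and low_is_bad are hypothetical helpers rather than part of any plugin.

```python
import xml.etree.ElementTree as ET

# Standard Nagios plugin exit codes
OK, WARNING, CRITICAL = 0, 1, 2

# Assumed, heavily trimmed stats layout for illustration only.
SAMPLE_STATS = """<solr>
  <solr-info>
    <CORE>
      <entry>
        <name> searcher </name>
        <stats><stat name="numDocs"> 1250 </stat></stats>
      </entry>
    </CORE>
    <CACHE>
      <entry>
        <name> queryResultCache </name>
        <stats><stat name="hitratio"> 0.85 </stat></stats>
      </entry>
    </CACHE>
  </solr-info>
</solr>"""


def read_stat(root, section, entry_name, stat_name):
    """Return the named stat as a float, or None if it is not present."""
    for entry in root.findall("./solr-info/%s/entry" % section):
        if (entry.findtext("name") or "").strip() != entry_name:
            continue
        for stat in entry.findall("./stats/stat"):
            if stat.get("name") == stat_name:
                return float((stat.text or "").strip())
    return None


def low_is_bad(value, warn, crit):
    """Threshold check for metrics where lower values are worse (e.g. hit ratio)."""
    if value < crit:
        return CRITICAL
    if value < warn:
        return WARNING
    return OK


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE_STATS)
    ratio = read_stat(root, "CACHE", "queryResultCache", "hitratio")
    code = low_is_bad(ratio, warn=0.5, crit=0.2)
    # Text after the pipe is Nagios performance data, used for graphing.
    print("SOLR CACHE status %d - hitratio %s | hitratio=%s" % (code, ratio, ratio))
```

The whitespace stripping is deliberate: the sample pads element text with spaces, so the values are compared and converted only after trimming.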

Agent-based checks

Installing an Opsview agent on the Solr server means we can run additional checks over NRPE (Nagios Remote Plugin Executor). These could be operating-system-level checks such as memory/disk utilisation or CPU load, or the following:

  1. Java servlet container process is running
  2. JMX checks e.g. heap memory or custom MBeans
  3. File age
  4. Log parsing for exceptions
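
As an example of the kind of local check an agent could run, the sketch below reports how long ago a file was last modified. Which file to watch (an index file, a log, a replication marker) and the thresholds are assumptions to adapt to your own setup, and file_age_status is a hypothetical name.

```python
import os
import time

# Standard Nagios plugin exit codes
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def file_age_status(path, warn_seconds, crit_seconds, now=None):
    """Return (exit_code, message) based on seconds since path was modified."""
    try:
        mtime = os.path.getmtime(path)
    except OSError as exc:
        return UNKNOWN, "FILE_AGE UNKNOWN - %s" % exc
    # 'now' can be injected for testing; it defaults to the current time.
    current = time.time() if now is None else now
    age = int(current - mtime)
    if age >= crit_seconds:
        return CRITICAL, "FILE_AGE CRITICAL - %s is %d seconds old" % (path, age)
    if age >= warn_seconds:
        return WARNING, "FILE_AGE WARNING - %s is %d seconds old" % (path, age)
    return OK, "FILE_AGE OK - %s is %d seconds old" % (path, age)
```

A stale file can be a useful hint that indexing or replication has stalled even while the servlet container still answers requests.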

For more detail on some of the non-Solr specific checks, see my previous post on monitoring Grails (though broadly applicable to any JVM application).

The Solr wiki describes how to configure JMX support: http://wiki.apache.org/solr/SolrJmx.

Opsview configuration

The rest of the article assumes that you’ve got Opsview (or the Opsview VMware appliance) installed and have completed the Quick Start.

Solr-specific Plugin

We’ll install the plugin from https://github.com/rbramley/Opsview-solr-checks into /usr/local/nagios/libexec/

The check_solr plugin was developed using Perl, so that it could be contributed back to Opsview (and it would have used Nagios::Plugin had I known about it sooner). It requires the CPAN XML::XPath module (sudo cpan -i XML::XPath).

The plugin includes usage instructions (check_solr -h), which can also be viewed in Opsview by selecting the ‘Show Plugin Help’ link beneath the Plugin drop-down (see Figure 1).
The -u option can be used to specify the URL path for multi-core set-ups.

Service check setup

Figure 1 gives an example of a service check configuration.

Figure 1 - Service check definition (showing plugin help)

Figure 2 shows the agentless service check group with plugins and their arguments.

Figure 2 - Service check group

Host configuration

Figure 3 shows a simplistic host setup with a ping check.

Figure 3 - Host configuration

Figure 4 is an extract from the Monitors tab, where we select the checks we want performed for the current host.

Figure 4 - Picking monitors for a host

Viewing output

The check results shown in Figure 5 are visible by navigating through the host group hierarchy.

Figure 5 - Host check results

If you click on the graph icon of Solr Cache Hit Ratios this will drill down onto the graph shown in Figure 6.
Clicking on the graph icon for Solr Avg Response Time – standard will take you to the graphs in Figure 7.

Figure 6 - Cache hit ratios

Figure 7 - Average request time

If you shut down Solr, the check results will start to turn critical and show in red, as per Figure 8.

Figure 8 - Post shutdown alert

Alternatives

Of course, I’m not the first person who’s wanted to monitor Solr from Opsview or a Nagios system, so there are a few plugins available, although the ones I found didn’t meet my personal needs (I’m happy to share my more detailed assessment notes if there is interest).

Also, chapter 8 of the recently published Apache Solr 3 Enterprise Search Server book includes a section on Monitoring Solr Performance.

Summary

Using check_solr in conjunction with Opsview allows you to ensure that your Solr server is available and provides you with metrics that can help you tune your Solr configuration. This can be complemented with additional agent-based operating system and JMX checks to give you the full picture.

Hopefully check_solr and the demonstrated agentless service checks will soon be incorporated into Opsview to join my check_hudson_job contribution.

7 responses to “Monitoring Apache Solr”

  1. If you know your way round the jmx mbeans then the various generic check_jmx checks mean you can get access to pretty much any of the mbean info directly. We use the dual use jmxquery.jar checker to do both nagios/opsview and munin monitoring, but it’s poorly documented, and a bit of a black art to get running.

    • Hi Stu,
      For the check_solr plugin I wanted to focus on agentless checks using stats.jsp as the lowest common denominator (though there’s nothing to stop it being accessed over NRPE).
      The JMX-RMI connector is hard to tunnel through firewalls, check_jmx/jmxquery.jar have the overhead of JVM instantiation and JNRPE has a different set of challenges. Then you need to use map.local to get performance data into RRD…

      Note that the MBean info is now meant to be available in XML form from /admin/mbeans (SOLR-1750). It would be trivial to tweak the XPath to work with this – however this wouldn’t work with customers that are still on 1.4 and on my Solr 3.5.0 installation the statistics weren’t actually available!

  2. Pingback: Monitoring Apache Solr « Another Word For It

  3. Hi,

    How do you get the Solr Admin Interface in XML output? I downloaded your plugin, but I can’t get the XML.

    Thanks

    • The Solr Admin Stats /solr/admin/stats.jsp is XML styled with XSL.

      For instance, if I view the page source in my browser, the first line is:

      <?xml-stylesheet type="text/xsl" href="stats.xsl"?>

      The plugin uses stats.jsp for all options apart from ping, which as the plugin help says defaults to the /solr/admin/ping handler.

  4. Newer versions of Solr disable stats.jsp by default in favour of API endpoints… I’ve written a lot of newer Nagios plugins for the Solr + SolrCloud APIs that can be used with any Nagios-compatible monitoring systems. Find them here:

    https://github.com/harisekhon/nagios-plugins
