Category Archives: How-to

Monitoring elasticsearch

elasticsearch is an open source distributed RESTful search engine built on top of Apache Lucene.
Like any service or component in your architecture, you’ll want to monitor it to ensure that it’s available and gather performance data to help with tuning.

In this brief post, we’ll look at how we can monitor elasticsearch using Opsview, which is built on Nagios and thus has access to a wide range of plugins, yet provides a more approachable user interface for configuring service checks.

Opsview configuration

The rest of the article assumes that you’ve got Opsview (or the Opsview VMware appliance) installed & have completed the Quick Start.

elasticsearch-specific Plugin

We’ll install the plugin from https://github.com/rbramley/Opsview-elasticsearch into /usr/local/nagios/libexec/

The check_elasticsearch plugin is developed using Perl, so that it can be contributed back to Opsview. It requires the CPAN JSON module (sudo cpan -i JSON).

The plugin includes usage instructions (check_elasticsearch -h), which can also be viewed in Opsview by selecting the ‘Show Plugin Help’ link beneath the Plugin drop-down.

Service check setup

Figure 1 gives an overview of service check configurations.

Figure 1 – Check definitions overview

The checks in action

The check results shown in Figure 2 are visible by navigating through the host group hierarchy.

Figure 2 – service check results

Note: They’re showing as warning because the checks were run against a standalone instance rather than a cluster.
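To illustrate the idea behind a Cluster Health based check, here is a minimal Python sketch (not the Perl plugin's actual code). It assumes the usual mapping of green/yellow/red onto the Nagios OK/WARNING/CRITICAL exit codes, which would explain why a standalone instance shows as a warning. The function name and the sample response values are made up for illustration.

```python
import json

# Nagios exit codes: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN
STATUS_TO_NAGIOS = {
    "green": (0, "OK"),
    "yellow": (1, "WARNING"),
    "red": (2, "CRITICAL"),
}


def check_cluster_health(body):
    """Turn a Cluster Health API JSON response into (exit_code, message)."""
    try:
        health = json.loads(body)
        status = health["status"]
    except (ValueError, KeyError, TypeError):
        return 3, "UNKNOWN - could not parse cluster health response"
    # Any status outside the known three is reported as UNKNOWN
    code, label = STATUS_TO_NAGIOS.get(status, (3, "UNKNOWN"))
    message = "%s - cluster status is %s, %s node(s), %s unassigned shard(s)" % (
        label,
        status,
        health.get("number_of_nodes", "?"),
        health.get("unassigned_shards", "?"),
    )
    return code, message


# Illustrative response from a single-node instance (values are made up)
sample = ('{"cluster_name": "elasticsearch", "status": "yellow", '
          '"number_of_nodes": 1, "unassigned_shards": 5}')
code, message = check_cluster_health(sample)
print(message)
```

A real check would fetch the JSON from the cluster health endpoint and exit with the returned code; the parsing is kept separate here so it can be tried without a running node.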

Summary

The current checks are based on the Cluster Health API; the intention is to add stats/status checks too, which will take threshold criteria and output performance data. The code for the check is on GitHub at https://github.com/rbramley/Opsview-elasticsearch so feel free to fork & send pull requests.

Monitoring OpenStack Swift with Opsview

This is a quick how-to for Opsview users who need to monitor an OpenStack (Essex) Swift installation. As a starting point we’ll perform a ‘front door’ check as this should work no matter what Swift implementation you are using.

Using Mahout Recommenders in Grails

Apache Mahout is a scalable machine learning framework that can be used to create intelligent applications. In this article we’ll see how Mahout can be used to create personalised recommendations within a Grails application.

This article originally appeared in the February 2012 edition of GroovyMag.


Monitoring MarkLogic

This is a quick how-to post for Opsview users who have a need to monitor MarkLogic.

Using Lucene in Grails

Apache Lucene is the leading open source search engine and is used in many businesses, projects and products. Lucene has sub-projects which provide additional functionality such as the Nutch web crawler and the Solr search service. This article gives an introduction to Lucene, a tutorial on three Grails Lucene plugins and a comparison between them.

This article originally appeared in the September 2011 edition of GroovyMag.


C.R.A.P. metrics for Grails

This is a quick how-to post on getting Change Risk Anti-Patterns statistics for your Grails code.

Firstly, thanks to Jeff Winkler for bringing the new GMetrics v0.5 CrapMetric to my attention and thanks to Chris Mair for his great work on GMetrics (and CodeNarc).

What is Change Risk Anti-Patterns?

It is a method to analyze and predict the amount of effort, pain, and time required to maintain an existing body of code. Given a Java method m, C.R.A.P. for m is calculated as follows:
C.R.A.P.(m) = comp(m)^2 * (1 - cov(m)/100)^3 + comp(m)
where comp(m) is the cyclomatic complexity of method m, and cov(m) is the percentage of m's code covered by automated tests.
For more background information see this blog post describing the C.R.A.P. metric.
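To get a feel for how the formula behaves, here is a small Python sketch that evaluates it for a few made-up methods (the function name crap is just for illustration; GMetrics does the real calculation for you):

```python
def crap(complexity, coverage_percent):
    """C.R.A.P. score for one method: comp^2 * (1 - cov/100)^3 + comp."""
    uncovered = 1 - coverage_percent / 100.0
    return complexity ** 2 * uncovered ** 3 + complexity


# Fully covered method: the score collapses to the complexity itself
print(crap(5, 100))  # 5.0
# Same method with no tests at all: 5^2 * 1 + 5
print(crap(5, 0))    # 30.0
# Half-covered method with complexity 6: 36 * 0.125 + 6
print(crap(6, 50))   # 10.5
```

Note how the cubed term means that coverage pays off quickly, but it can never push the score below the method's complexity.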

Setting up Grails

This is very fresh so you’ll need to do a little bit of tinkering until I can get the following changes incorporated into the GMetrics plugin!
Note you’ll also need to have the Code Coverage plugin installed (refer to “Grails & Hudson Part 3: Testing” for more details on that).

Install the GMetrics plugin

This is the straightforward bit:
grails install-plugin gmetrics

The hackery

With GMetrics plugin version 0.3.1 you’ll need to edit the installed plugin dependencies.groovy (e.g. ~/.grails/1.3.7/projects/foo/plugins/gmetrics-0.3.1/dependencies.groovy) to use gmetrics 0.5 instead of 0.3.
Line 27 should be changed to:

provided('org.gmetrics:GMetrics:0.5') {
}

And you’ll want to modify scripts/gmetrics.groovy to use metricSetFile on the Ant task, otherwise it will just use the default metric set, which doesn’t include C.R.A.P.
e.g. line 44 should become:

ant.gmetrics(metricSetFile:metricSetFilename)

Define a metric set

We now need to tell GMetrics which metrics to run – out of laziness I saved this in the root of the Grails project as test.gmetrics:

import org.gmetrics.metric.cyclomatic.CyclomaticComplexityMetric

final COBERTURA_FILE = 'file:target/test-reports/cobertura/coverage.xml'

metricset {
  // Cyclomatic complexity supplies comp(m)
  def cyclomaticComplexityMetric = metric(CyclomaticComplexityMetric)

  // Line coverage from the Cobertura XML report supplies cov(m)
  def coberturaMetric = CoberturaLineCoverage {
    coberturaFile = COBERTURA_FILE
    functions = ['total']
  }

  // CRAP combines the two metrics defined above
  CRAP {
    functions = ['total']
    coverageMetric = coberturaMetric
    complexityMetric = cyclomaticComplexityMetric
  }
}

Configure the plugin

If you want the HTML report, then your Config.groovy will need a single addition to specify the metric set:

gmetrics.metricSetfilename = 'file:test.gmetrics'

If you want an XML report you’ll also need:

gmetrics.outputFile = 'target/test-reports/GMetricsReport.xml'
gmetrics.reportType = 'org.gmetrics.report.XmlReportWriter'

Let’s go

Now it’s all configured, we just need to ensure that we have coverage data before we run the CrapMetric.

e.g.
grails test-app -unit -coverage -xml

We can now run:
grails gmetrics

The report will be generated and the plugin output will state where it has been written to. The CRAPpy threshold has been defined as 30 – so the report will help to highlight the areas of code that you need to concentrate on first.

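Remember that overly complex code can still be classed as CRAPpy even with full test coverage: at 100% coverage the score reduces to comp(m), so any method with a cyclomatic complexity above 30 exceeds the threshold.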

Monitoring Apache Solr

Apache Solr is an open source enterprise search service from the Lucene project. Solr is written in Java and runs as a standalone full-text search server within a servlet container such as Tomcat.
Like any service or component in your architecture, you’ll want to monitor it to ensure that it’s available and gather performance data to help with tuning.

In this post, we’ll look at how we can monitor Solr, what performance metrics we might want to gather and how we can easily achieve this with Opsview.

We’ll use Opsview as it is built on Nagios and thus has access to a wide range of plugins, yet provides a more approachable user interface for configuring service checks.

A check list for service checks

Solr is built on Lucene, so it follows the same layout: an index contains documents, which are composed of fields. As part of the value it adds over Lucene as a search service, Solr provides a number of useful ways of obtaining health status / monitoring metrics:

  1. Health-check status using the /admin/ping handler
  2. The admin statistics page /admin/stats.jsp (XML styled with XSL)
  3. JMX MBeans
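As an illustration of the first option, the following Python sketch decides whether a ping response is healthy. It assumes the handler returns XML containing a <str name="status">OK</str> element, which is what I'd expect from a Solr 3.x ping; the sample response is hand-written and check_solr itself may parse things differently.

```python
import xml.etree.ElementTree as ET


def ping_is_ok(xml_text):
    """Return True if a ping response carries <str name="status">OK</str>."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    # The responseHeader also has a "status" entry, but it is an <int>,
    # so only <str> elements are inspected here.
    for node in root.iter("str"):
        if node.get("name") == "status":
            return (node.text or "").strip() == "OK"
    return False


# Hand-written sample response (assumed shape, not captured from a server)
sample = (
    '<response>'
    '<lst name="responseHeader">'
    '<int name="status">0</int><int name="QTime">2</int>'
    '</lst>'
    '<str name="status">OK</str>'
    '</response>'
)
print(ping_is_ok(sample))
```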

The list of applicable checks could be defined by whether it is a health check or a data gathering check – but we’d end up with a lot of overlap. Instead I’ve divided the list into the checks that can be performed remotely (without an installed agent on the server) and those that are best performed locally to the Solr server.

Remote (agent-less) checks

What should we look for over the network?

Firstly we can have a host-level check which may perform a network level ping.
Next we can check TCP connectivity to the servlet container port and then make an HTTP GET request to the Solr ‘front page’ and check for a known string (e.g. Welcome to Solr).
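Opsview already ships with standard plugins for these checks, so the following Python sketch is only meant to show what they do under the hood: a TCP connect, then an HTTP GET that looks for a known string. The function names are hypothetical, and nothing is contacted until you call them.

```python
import socket
import urllib.request


def tcp_port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def page_contains(status, body, expected):
    """Return (nagios_code, message) for an HTTP status and response body."""
    if status != 200:
        return 2, "CRITICAL - HTTP status %d" % status
    if expected not in body:
        return 2, "CRITICAL - string '%s' not found" % expected
    return 0, "OK - found '%s'" % expected


def check_front_page(url, expected, timeout=5.0):
    """Fetch url and check the body for the expected string."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return page_contains(resp.status, body, expected)
    except OSError as exc:
        # URLError, HTTPError and socket timeouts are all OSError subclasses
        return 2, "CRITICAL - %s" % exc
```

For example, check_front_page("http://solr.example.com:8080/solr/", "Welcome to Solr") would return a Nagios-style exit code and message (the host and port here are placeholders for your own servlet container).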

Now we’ve made it up to the application layer, so we can start to perform Solr-specific checks. Items to monitor may include (delete as applicable):

  1. Ping status
  2. Number of docs
  3. Number of queries / queries per second
  4. Average response time
  5. Number of updates
  6. Cache hit ratios
  7. Replication status
  8. Synthetic queries
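Most of these boil down to reading a number, comparing it against thresholds and emitting performance data for graphing. As a sketch of that pattern, here is a cache hit ratio check in Python. The threshold values and the function name are assumptions for illustration and are not taken from check_solr; the text after the pipe follows the Nagios label=value performance data convention.

```python
def check_hit_ratio(name, hits, lookups, warn=0.5, crit=0.2):
    """Return (nagios_code, output_line) for one Solr cache.

    A ratio below warn gives WARNING, below crit gives CRITICAL.
    """
    if lookups <= 0:
        # No lookups yet, so there is nothing to judge
        return 0, "OK - %s has no lookups yet | %s_hitratio=0" % (name, name)
    ratio = hits / float(lookups)
    if ratio < crit:
        code, label = 2, "CRITICAL"
    elif ratio < warn:
        code, label = 1, "WARNING"
    else:
        code, label = 0, "OK"
    return code, "%s - %s hit ratio %.2f | %s_hitratio=%.2f" % (
        label, name, ratio, name, ratio)


print(check_hit_ratio("queryResultCache", 90, 100)[1])
```

Low hit ratios are the problem case here, so the comparison runs in the opposite direction to something like response time, where high values should alert.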

Agent-based checks

Installing an Opsview agent on the Solr server means we can run additional checks over NRPE (Nagios Remote Plugin Executor). These could be operating system level checks such as memory/disk utilisation or CPU load, or the following:

  1. Java servlet container process is running
  2. JMX checks e.g. heap memory or custom MBeans
  3. File age
  4. Log parsing for exceptions
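As a rough idea of what log parsing for exceptions involves, the Python sketch below counts Java exception and error class names in a chunk of log text. The regular expression and the sample log lines are made up for illustration; a real check would be tuned to your servlet container's log format and would only read lines added since the previous run.

```python
import re

# Matches Java-style class names ending in Exception or Error,
# e.g. org.apache.solr.common.SolrException
EXCEPTION_PATTERN = re.compile(r"\b[\w.]+(?:Exception|Error)\b")


def count_exceptions(log_text):
    """Return a dict of exception class name -> number of matching lines."""
    counts = {}
    for line in log_text.splitlines():
        match = EXCEPTION_PATTERN.search(line)
        if match:
            name = match.group(0)
            counts[name] = counts.get(name, 0) + 1
    return counts


# Made-up log extract for demonstration
sample_log = """INFO: [] webapp=/solr path=/select params={q=solr} status=0 QTime=3
SEVERE: org.apache.solr.common.SolrException: undefined field foo
SEVERE: java.lang.OutOfMemoryError: Java heap space
SEVERE: org.apache.solr.common.SolrException: undefined field bar"""

for name, count in sorted(count_exceptions(sample_log).items()):
    print("%s: %d" % (name, count))
```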

For more detail on some of the non-Solr specific checks, see my previous post on monitoring Grails (though broadly applicable to any JVM application).

The Solr wiki describes how to configure JMX support: http://wiki.apache.org/solr/SolrJmx.

Opsview configuration

The rest of the article assumes that you’ve got Opsview (or the Opsview VMware appliance) installed & have completed the Quick Start.

Solr-specific Plugin

We’ll install the plugin from https://github.com/rbramley/Opsview-solr-checks into /usr/local/nagios/libexec/

The check_solr plugin was developed using Perl, so that it could be contributed back to Opsview (and it would have used Nagios::Plugin had I known about it sooner). It requires the CPAN XML::XPath module (sudo cpan -i XML::XPath).

The plugin includes usage instructions (check_solr -h), which can also be viewed in Opsview by selecting the ‘Show Plugin Help’ link beneath the Plugin drop-down (see Figure 1).
The -u option can be used to specify the URL path for multi-core set-ups.

Service check setup

Figure 1 gives an example of a service check configuration.

Figure 1 - Service check definition (showing plugin help)

Figure 2 shows the agentless service check group with plugins and their arguments.

Figure 2 - Service check group

Host configuration

Figure 3 shows a simplistic host setup with a ping check.

Figure 3 - Host configuration

Figure 4 is an extract from the Monitors tab, where we select the checks we want performed for the current host.

Figure 4 - Picking monitors for a host

Viewing output

The check results shown in Figure 5 are visible by navigating through the host group hierarchy.

Figure 5 - Host check results

If you click on the graph icon of Solr Cache Hit Ratios this will drill down onto the graph shown in Figure 6.
Clicking on the graph icon for Solr Avg Response Time – standard will take you to the graphs in Figure 7.

Figure 6 - Cache hit ratios

Figure 7 - Average request time

If you shut down Solr, then the check results will start to turn critical and show in red as per Figure 8.

Figure 8 - Post shutdown alert

Alternatives

Of course I’m not the first person who’s wanted to monitor Solr from Opsview or a Nagios system, so there are a few plugins available – although the ones I found didn’t meet my personal needs (am happy to share my more detailed assessment notes if there is interest).

Also, chapter 8 of the recently published Apache Solr 3 Enterprise Search Server book includes a section on Monitoring Solr Performance.

Summary

Using check_solr in conjunction with Opsview allows you to ensure that your Solr server is available and provides you with metrics that can help you tune your Solr configuration. This can be complemented with additional agent-based operating system and JMX checks to give you a full picture.

Hopefully check_solr and the demonstrated agentless service checks will soon be incorporated into Opsview to join my check_hudson_job contribution.

Using Browser Push in Grails

Browser Push is the collective term for techniques that allow a server to send asynchronous data updates in near real time to a browser. This article provides an overview of browser push and then provides a sample of Grails usage by extending the example project from the ‘Using JMS in Grails’ article in the June 2011 edition to send event-driven updates to the browser.

This article originally appeared in the July 2011 edition of GroovyMag.


Relevancy Driven Development with Solr

The relevancy of search engine results is very subjective, so testing the relevancy of queries is also subjective.
One technique that exists in the information retrieval field is the use of judgement lists; an alternative approach discussed here is to follow the Behaviour Driven Development methodology employing user story acceptance criteria – I’ve been calling this Relevancy Driven Development or RDD for short.

I’d like to thank Eric Pugh for a great discussion on search engine testing and for giving me a guest slot in his ‘Better Search Engine Testing’ talk* at Lucene EuroCon Barcelona 2011 last week to mention RDD. The first iteration of Solr-RDD combines my passion for automated testing with my passion for Groovy by leveraging EasyB (a Groovy BDD testing framework).


Using JMS in Grails

The Java Message Service (JMS) API is one of the cornerstones of the Java Enterprise Edition that allows applications to reliably communicate using asynchronous messages sent via a message broker. This article provides an introduction to JMS, the JMS support in the Spring Framework and then provides practical examples of usage within Grails using the JMS plugin.

This article originally appeared in the June 2011 edition of GroovyMag.
