Overview

The Mirantis OpenStack LMA (Logging, Monitoring and Alerting) Toolchain is a collection of open-source tools that help you monitor and diagnose problems in your OpenStack environment. These tools are packaged and delivered as Fuel plugins that you can install from the Fuel graphical user interface, starting with Mirantis OpenStack version 6.1.

From a high-level view, the LMA Toolchain includes:

The LMA Collector (or just the Collector), which gathers all the operational data we consider relevant to increasing operational visibility over your OpenStack environment. The data is collected from a variety of sources, including log messages, collectd, and the OpenStack notifications bus.
Pluggable external systems, which we call satellite clusters, that can take action on the data received from the Collectors running on the OpenStack nodes.

The Collector is best described as a pluggable message processing and routing pipeline. Its core components are:

Collectd, which is bundled with a collection of monitoring plugins, many of them purpose-built for OpenStack.
Heka, which is the cornerstone component of the Collector.
A collection of Heka plugins written in Lua to decode, process and encode the data to be sent to external systems.

The primary function of the Collector is to transform the acquired raw operational data into an internal message representation that is based on the Heka message structure. These messages can then be further exploited to, for example, detect anomalies or create new metric messages.
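To make the idea of an internal message representation more concrete, here is a minimal Python sketch. It is not the Collector's code (the real decoders are Heka plugins written in Lua). The `Message` class, the `decode_log_line` helper, and the assumed `LEVEL text` input format are all hypothetical; only the general shape (type, logger, severity, payload, hostname, timestamp, free-form fields) loosely mirrors a Heka message.

```python
import socket
import time
from dataclasses import dataclass, field
from typing import Any, Dict
from uuid import uuid4

# Syslog-style severity levels (lower number = more severe).
SYSLOG_SEVERITY = {"DEBUG": 7, "INFO": 6, "WARNING": 4, "ERROR": 3, "CRITICAL": 2}


@dataclass
class Message:
    """Simplified, illustrative stand-in for a Heka-style message."""
    type: str
    logger: str
    payload: str
    severity: int = 6
    hostname: str = field(default_factory=socket.gethostname)
    timestamp: int = field(default_factory=time.time_ns)  # ns since the epoch
    uuid: str = field(default_factory=lambda: str(uuid4()))
    fields: Dict[str, Any] = field(default_factory=dict)


def decode_log_line(line: str, logger: str) -> Message:
    """Decode a raw 'LEVEL text' log line (assumed format) into a Message."""
    level, _, text = line.partition(" ")
    level = level.upper()
    return Message(
        type="log",
        logger=logger,
        payload=text,
        severity=SYSLOG_SEVERITY.get(level, 6),  # unknown levels default to INFO
        fields={"severity_label": level},
    )


if __name__ == "__main__":
    print(decode_log_line("ERROR connection refused", logger="nova-api"))
```

Once every source is normalised into one structure like this, later stages can filter or aggregate on common attributes without knowing where the data came from.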

The satellite clusters delivered as part of the LMA Toolchain starting with Mirantis OpenStack 6.1 include:

Elasticsearch, a powerful open-source search server and analytics engine based on Lucene that makes data such as log messages and notifications easy to explore and analyse.
InfluxDB, an open-source, distributed time-series database to store and search metrics.

By combining Elasticsearch with Kibana, the LMA Toolchain provides an effective way to search and correlate all service-affecting events that occurred in the system for root cause analysis.

Likewise, by combining InfluxDB with Grafana, the LMA Toolchain provides metrics analytics that let you visualise how OpenStack behaves over time. This includes metrics for the status of the OpenStack services and a variety of resource usage and performance indicators. The ability to visualise time-series over periods ranging from the last 5 minutes to the last 30 days helps you anticipate failure conditions and plan capacity ahead of time to cope with changing demand.

Furthermore, the LMA Toolchain has been designed with the dual objective of being both insightful and adaptive.

It is, for example, quite possible (without any code change) to integrate the Collector with an external monitoring application like Nagios. This can be done by enabling the Nagios output plugin of Heka for the subset of messages selected by the plugin's message matcher. Rather than modifying the configuration of the LMA Collector manually, we recommend that you apply any configuration change to the Puppet manifests that are shipped with the LMA Collector plugin for Fuel. Many other integration combinations are possible thanks to the flexibility of Heka.
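The routing idea behind this kind of integration can be sketched in a few lines of Python. This is only an illustration of matcher-based routing, not Heka's implementation or configuration syntax: in Heka the filter is a message matcher expression set in the output plugin's configuration, whereas here a plain Python predicate stands in for it. The `Router` class, the `nagios` output name, and the message keys are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Msg = Dict[str, object]
Matcher = Callable[[Msg], bool]
Sink = Callable[[Msg], None]


class Router:
    """Delivers each message to every output whose matcher accepts it."""

    def __init__(self) -> None:
        self._outputs: List[Tuple[str, Matcher, Sink]] = []

    def add_output(self, name: str, matcher: Matcher, sink: Sink) -> None:
        self._outputs.append((name, matcher, sink))

    def route(self, msg: Msg) -> List[str]:
        """Send msg to all matching outputs and return their names."""
        delivered = []
        for name, matcher, sink in self._outputs:
            if matcher(msg):
                sink(msg)
                delivered.append(name)
        return delivered


if __name__ == "__main__":
    sent_to_nagios: List[Msg] = []  # stands in for a real Nagios submission
    router = Router()
    # Hypothetical rule: forward only status messages of severity ERROR or worse.
    router.add_output(
        "nagios",
        lambda m: m.get("Type") == "status" and int(m.get("Severity", 7)) <= 3,
        sent_to_nagios.append,
    )
    print(router.route({"Type": "status", "Severity": 2, "Payload": "nova-api down"}))
    print(router.route({"Type": "log", "Severity": 2, "Payload": "ignored by nagios"}))
```

Because each output declares which messages it wants, adding a new integration means registering one more output with its own matcher and leaving the rest of the pipeline untouched.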

We recommend that you read the Heka documentation to become more familiar with that technology.

The rest of this document is organised into several chapters that take you through the internal message structure for each category of operational data handled by the LMA Toolchain.