List of metrics¶
The following is a list of metrics that are emitted by the StackLight Collector. The metrics are listed by category, then by metric name.
System¶
CPU¶
Metrics have a cpu_number field that contains the CPU number to which the metric applies.
- cpu_idle, the percentage of CPU time spent in the idle task.
- cpu_interrupt, the percentage of CPU time spent servicing interrupts.
- cpu_nice, the percentage of CPU time spent in user mode with low priority (nice).
- cpu_softirq, the percentage of CPU time spent servicing soft interrupts.
- cpu_steal, the percentage of CPU time spent in other operating systems.
- cpu_system, the percentage of CPU time spent in system mode.
- cpu_user, the percentage of CPU time spent in user mode.
- cpu_wait, the percentage of CPU time spent waiting for I/O operations to complete.
Disk¶
Metrics have a device field that contains the name of the disk device to which the metric applies. For example, ‘sda’, ‘sdb’, and others.
- disk_merged_read, the number of read operations per second that could be merged with already queued operations.
- disk_merged_write, the number of write operations per second that could be merged with already queued operations.
- disk_octets_read, the number of octets (bytes) read per second.
- disk_octets_write, the number of octets (bytes) written per second.
- disk_ops_read, the number of read operations per second.
- disk_ops_write, the number of write operations per second.
- disk_time_read, the average time for a read operation to complete in the last interval.
- disk_time_write, the average time for a write operation to complete in the last interval.
File system¶
Metrics have a fs field that contains the partition’s mount point to which the metric applies. For example, ‘/’, ‘/var/lib’, and others.
- fs_inodes_free, the number of free inodes on the file system.
- fs_inodes_percent_free, the percentage of free inodes on the file system.
- fs_inodes_percent_reserved, the percentage of reserved inodes.
- fs_inodes_percent_used, the percentage of used inodes.
- fs_inodes_reserved, the number of reserved inodes.
- fs_inodes_used, the number of used inodes.
- fs_space_free, the number of free bytes.
- fs_space_percent_free, the percentage of free bytes.
- fs_space_percent_reserved, the percentage of reserved bytes.
- fs_space_percent_used, the percentage of used bytes.
- fs_space_reserved, the number of reserved bytes.
- fs_space_used, the number of used bytes.
System load¶
- load_longterm, the system load average over the last 15 minutes.
- load_midterm, the system load average over the last 5 minutes.
- load_shortterm, the system load average over the last minute.
Memory¶
- memory_buffered, the amount of buffered memory in bytes.
- memory_cached, the amount of cached memory in bytes.
- memory_free, the amount of free memory in bytes.
- memory_used, the amount of used memory in bytes.
Network¶
Metrics have an interface field that contains the interface name to which the metric applies. For example, ‘eth0’, ‘eth1’, and others.
- if_errors_rx, the number of errors per second detected when receiving from the interface.
- if_errors_tx, the number of errors per second detected when transmitting from the interface.
- if_octets_rx, the number of octets (bytes) received per second by the interface.
- if_octets_tx, the number of octets (bytes) transmitted per second by the interface.
- if_packets_rx, the number of packets received per second by the interface.
- if_packets_tx, the number of packets transmitted per second by the interface.
Processes¶
- processes_count, the number of processes in a given state. The metric has a state field (one of ‘blocked’, ‘paging’, ‘running’, ‘sleeping’, ‘stopped’ or ‘zombies’).
- processes_fork_rate, the number of processes forked per second.
Swap¶
- swap_cached, the amount of cached memory (in bytes) that is in the swap.
- swap_free, the amount of free memory (in bytes) that is in the swap.
- swap_io_in, the number of swap pages written per second.
- swap_io_out, the number of swap pages read per second.
- swap_used, the amount of used memory (in bytes) that is in the swap.
Users¶
- logged_users, the number of users currently logged in.
Apache¶
- apache_bytes, the number of bytes per second transmitted by the server.
- apache_connections, the current number of active connections.
- apache_idle_workers, the current number of idle workers.
- apache_requests, the number of requests processed per second.
- apache_workers_closing, the number of workers in closing state.
- apache_workers_dnslookup, the number of workers in DNS lookup state.
- apache_workers_finishing, the number of workers in finishing state.
- apache_workers_idle_cleanup, the number of workers in idle cleanup state.
- apache_workers_keepalive, the number of workers in keepalive state.
- apache_workers_logging, the number of workers in logging state.
- apache_workers_open, the number of workers in open state.
- apache_workers_reading, the number of workers in reading state.
- apache_workers_sending, the number of workers in sending state.
- apache_workers_starting, the number of workers in starting state.
- apache_workers_waiting, the number of workers in waiting state.
MySQL¶
Commands¶
- mysql_commands, the number of times per second a given statement has been executed. The metric has a statement field that contains the statement to which it applies. The values can be as follows:
  - change_db for the USE statement.
  - commit for the COMMIT statement.
  - flush for the FLUSH statement.
  - insert for the INSERT statement.
  - rollback for the ROLLBACK statement.
  - select for the SELECT statement.
  - set_option for the SET statement.
  - show_collations for the SHOW COLLATION statement.
  - show_databases for the SHOW DATABASES statement.
  - show_fields for the SHOW FIELDS statement.
  - show_master_status for the SHOW MASTER STATUS statement.
  - show_status for the SHOW STATUS statement.
  - show_tables for the SHOW TABLES statement.
  - show_variables for the SHOW VARIABLES statement.
  - show_warnings for the SHOW WARNINGS statement.
  - update for the UPDATE statement.
Handlers¶
- mysql_handler, the number of times per second a given handler has been executed. The metric has a handler field that contains the handler to which it applies. The values can be as follows:
  - commit for the internal COMMIT statements.
  - delete for the internal DELETE statements.
  - external_lock for the external locks.
  - read_first for the requests that read the first entry in an index.
  - read_key for the requests that read a row based on a key.
  - read_next for the requests that read the next row in key order.
  - read_prev for the requests that read the previous row in key order.
  - read_rnd for the requests that read a row based on a fixed position.
  - read_rnd_next for the requests that read the next row in the data file.
  - rollback for the requests that perform the rollback operation.
  - update for the requests that update a row in a table.
  - write for the requests that insert a row in a table.
Locks¶
- mysql_locks_immediate, the number of times per second the requests for table locks could be granted immediately.
- mysql_locks_waited, the number of times per second the requests for table locks had to wait.
Network¶
- mysql_octets_rx, the number of bytes per second received by the server.
- mysql_octets_tx, the number of bytes per second sent by the server.
Threads¶
- mysql_threads_cached, the number of threads in the thread cache.
- mysql_threads_connected, the number of currently open connections.
- mysql_threads_created, the number of threads created per second to handle connections.
- mysql_threads_running, the number of threads that are not sleeping.
Cluster¶
The following metrics are collected with the ‘SHOW STATUS’ statement. For details, see Percona documentation.
- mysql_cluster_connected, 1 when the node is connected to the cluster, 0 otherwise.
- mysql_cluster_local_cert_failures, the number of write sets that failed the certification test.
- mysql_cluster_local_commits, the number of write sets committed on the node.
- mysql_cluster_local_recv_queue, the number of write sets waiting to be applied.
- mysql_cluster_local_send_queue, the number of write sets waiting to be sent.
- mysql_cluster_ready, 1 when the node is ready to accept queries, 0 otherwise.
- mysql_cluster_received, the total number of write sets received from other nodes.
- mysql_cluster_received_bytes, the total size in bytes of write sets received from other nodes.
- mysql_cluster_replicated, the total number of write sets sent to other nodes.
- mysql_cluster_replicated_bytes, the total size in bytes of write sets sent to other nodes.
- mysql_cluster_size, the current number of nodes in the cluster.
- mysql_cluster_status, 1 when the node is ‘Primary’, 2 if ‘Non-Primary’, and 3 if ‘Disconnected’.
Slow queries¶
The following metric is collected with the ‘SHOW STATUS where Variable_name = 'Slow_queries'’ statement.
- mysql_slow_queries, the number of queries that have taken more than X seconds, where X is set by the MySQL configuration parameter ‘long_query_time’ (10 seconds by default).
RabbitMQ¶
Cluster¶
- rabbitmq_connections, the total number of connections.
- rabbitmq_consumers, the total number of consumers.
- rabbitmq_channels, the total number of channels.
- rabbitmq_exchanges, the total number of exchanges.
- rabbitmq_messages, the total number of messages which are ready to be consumed or not yet acknowledged.
- rabbitmq_queues, the total number of queues.
- rabbitmq_running_nodes, the total number of running nodes in the cluster.
- rabbitmq_disk_free, the free disk space.
- rabbitmq_disk_free_limit, the minimum amount of free disk space for RabbitMQ. When rabbitmq_disk_free drops below this value, all producers are blocked.
- rabbitmq_remaining_disk, the difference between rabbitmq_disk_free and rabbitmq_disk_free_limit.
- rabbitmq_used_memory, bytes of memory used by the whole RabbitMQ process.
- rabbitmq_vm_memory_limit, the maximum amount of memory allocated for RabbitMQ. When rabbitmq_used_memory uses more than this value, all producers are blocked.
- rabbitmq_remaining_memory, the difference between rabbitmq_vm_memory_limit and rabbitmq_used_memory.
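The two derived RabbitMQ metrics are plain differences, which a short Python sketch can make explicit. The function names and sample values below are hypothetical and only restate the definitions given above; they are not taken from the collector's code.

```python
def remaining_disk(disk_free: int, disk_free_limit: int) -> int:
    """rabbitmq_remaining_disk: free disk space above the configured limit."""
    return disk_free - disk_free_limit


def remaining_memory(vm_memory_limit: int, used_memory: int) -> int:
    """rabbitmq_remaining_memory: memory still available below the limit."""
    return vm_memory_limit - used_memory


def producers_blocked(disk_free: int, disk_free_limit: int,
                      used_memory: int, vm_memory_limit: int) -> bool:
    """Producers are blocked when either limit described above is crossed."""
    return disk_free < disk_free_limit or used_memory > vm_memory_limit


# Hypothetical sample values, in bytes.
print(remaining_disk(500_000_000, 50_000_000))
print(remaining_memory(4_000_000_000, 1_500_000_000))
print(producers_blocked(500_000_000, 50_000_000, 1_500_000_000, 4_000_000_000))
```

A negative result from either difference corresponds to the situation in which producers are blocked.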
HAProxy¶
The frontend and backend field values can be as follows:
- cinder-api
- glance-api
- glance-registry-api
- heat-api
- heat-cfn-api
- heat-cloudwatch-api
- horizon-web (when Horizon is deployed without TLS)
- horizon-https (when Horizon is deployed with TLS)
- keystone-public-api
- keystone-admin-api
- mysqld-tcp
- murano-api
- neutron-api
- nova-api
- nova-metadata-api
- nova-novncproxy-websocket
- sahara-api
- swift-api
Server¶
- haproxy_connections, the number of current connections.
- haproxy_pipes_free, the number of free pipes.
- haproxy_pipes_used, the number of used pipes.
- haproxy_run_queue, the number of connections waiting in the queue.
- haproxy_ssl_connections, the number of current SSL connections.
- haproxy_tasks, the number of tasks.
- haproxy_uptime, the HAProxy server uptime in seconds.
Frontends¶
The following metrics have a frontend field that contains the name of the front-end server:
- haproxy_frontend_bytes_in, the number of bytes received by the frontend.
- haproxy_frontend_bytes_out, the number of bytes transmitted by the frontend.
- haproxy_frontend_denied_requests, the number of denied requests.
- haproxy_frontend_denied_responses, the number of denied responses.
- haproxy_frontend_error_requests, the number of error requests.
- haproxy_frontend_response_1xx, the number of HTTP responses with 1xx code.
- haproxy_frontend_response_2xx, the number of HTTP responses with 2xx code.
- haproxy_frontend_response_3xx, the number of HTTP responses with 3xx code.
- haproxy_frontend_response_4xx, the number of HTTP responses with 4xx code.
- haproxy_frontend_response_5xx, the number of HTTP responses with 5xx code.
- haproxy_frontend_response_other, the number of HTTP responses with other code.
- haproxy_frontend_session_current, the number of current sessions.
- haproxy_frontend_session_total, the cumulative number of sessions.
Backends¶
The following metrics have a backend field that contains the name of the back-end server:
- haproxy_backend_bytes_in, the number of bytes received by the back end.
- haproxy_backend_bytes_out, the number of bytes transmitted by the back end.
- haproxy_backend_denied_requests, the number of denied requests.
- haproxy_backend_denied_responses, the number of denied responses.
- haproxy_backend_downtime, the total downtime in seconds.
- haproxy_backend_error_connection, the number of error connections.
- haproxy_backend_error_responses, the number of error responses.
- haproxy_backend_queue_current, the number of requests in queue.
- haproxy_backend_redistributed, the number of times a request was redispatched to another server.
- haproxy_backend_response_1xx, the number of HTTP responses with 1xx code.
- haproxy_backend_response_2xx, the number of HTTP responses with 2xx code.
- haproxy_backend_response_3xx, the number of HTTP responses with 3xx code.
- haproxy_backend_response_4xx, the number of HTTP responses with 4xx code.
- haproxy_backend_response_5xx, the number of HTTP responses with 5xx code.
- haproxy_backend_response_other, the number of HTTP responses with other code.
- haproxy_backend_retries, the number of times a connection to a server was retried.
- haproxy_backend_servers, the count of servers grouped by state. This metric has an additional state field that contains the state of the back ends (either ‘down’ or ‘up’).
- haproxy_backend_session_current, the number of current sessions.
- haproxy_backend_session_total, the cumulative number of sessions.
- haproxy_backend_status, the global back-end status where values 0 and 1 represent, respectively, DOWN (all back ends are down) and UP (at least one back end is up).
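The rule behind haproxy_backend_status can be sketched in a few lines of Python. The function name and the input dictionary (server counts keyed by the ‘up’ and ‘down’ states of haproxy_backend_servers) are hypothetical; how the collector actually derives the value is not stated here, so this only mirrors the rule described above.

```python
def backend_status(servers_by_state: dict) -> int:
    """Return 1 (UP) if at least one back-end server is up, else 0 (DOWN)."""
    return 1 if servers_by_state.get("up", 0) > 0 else 0


# Hypothetical per-state server counts for one back end.
print(backend_status({"up": 2, "down": 1}))
print(backend_status({"up": 0, "down": 3}))
```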
Memcached¶
- memcached_command_flush, the cumulative number of flush requests.
- memcached_command_get, the cumulative number of retrieval requests.
- memcached_command_set, the cumulative number of storage requests.
- memcached_command_touch, the cumulative number of touch requests.
- memcached_connections_current, the number of open connections.
- memcached_df_cache_free, the current number of free bytes to store items.
- memcached_df_cache_used, the current number of bytes used to store items.
- memcached_items_current, the current number of items stored.
- memcached_octets_rx, the total number of bytes read by this server from the network.
- memcached_octets_tx, the total number of bytes sent by this server to the network.
- memcached_ops_decr_hits, the number of successful decr requests.
- memcached_ops_decr_misses, the number of decr requests against missing keys.
- memcached_ops_evictions, the number of valid items removed from cache to free memory for new items.
- memcached_ops_hits, the number of keys that have been requested and found.
- memcached_ops_incr_hits, the number of successful incr requests.
- memcached_ops_incr_misses, the number of incr requests against missing keys.
- memcached_ops_misses, the number of items that have been requested and not found.
- memcached_percent_hitratio, the percentage of get command hits (in cache).
- memcached_ps_cputime_syst, the percentage of CPU time spent in system mode by memcached. It can be greater than 100% when the node has more than one CPU.
- memcached_ps_cputime_user, the percentage of CPU time spent in user mode by memcached. It can be greater than 100% when the node has more than one CPU.
For details, see the Memcached documentation.
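As an illustration of how a hit ratio such as memcached_percent_hitratio relates to the hit and miss counters, here is a minimal Python sketch. It assumes the ratio is hits divided by the sum of hits and misses; that formula, the function name, and the sample numbers are assumptions made for this example, not a statement of how the collector computes the metric.

```python
def hit_ratio_percent(hits: int, misses: int) -> float:
    """Percentage of get requests served from the cache (assumed formula)."""
    total = hits + misses
    if total == 0:
        # No get requests yet: report 0 rather than dividing by zero.
        return 0.0
    return hits / total * 100


# Hypothetical counter values.
print(hit_ratio_percent(75, 25))
```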
Libvirt¶
Every metric contains an instance_id field, which is the UUID of the instance for the Nova service.
CPU¶
- virt_cpu_time, the average amount of CPU time (in nanoseconds) allocated to the virtual instance in a second.
- virt_vcpu_time, the average amount of CPU time (in nanoseconds) allocated to the virtual CPU in a second. The metric contains a vcpu_number field which is the virtual CPU number.
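Because both metrics are expressed in nanoseconds of CPU time per second, they can be read as a utilization percentage. The following Python sketch shows the arithmetic; the function name and sample value are hypothetical.

```python
NANOSECONDS_PER_SECOND = 1_000_000_000


def cpu_time_to_percent(cpu_time_ns: float) -> float:
    """Convert nanoseconds of CPU time per second into a percentage of one CPU."""
    return cpu_time_ns / NANOSECONDS_PER_SECOND * 100


# Hypothetical sample: 250 ms of CPU time consumed in one second.
print(cpu_time_to_percent(250_000_000))
```

For virt_cpu_time, the result can exceed 100 when the instance has more than one virtual CPU, since the value covers the whole instance.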
Disk¶
Metrics have a device field that contains the virtual disk device to which the metric applies. For example, ‘vda’, ‘vdb’, and others.
- virt_disk_octets_read, the number of octets (bytes) read per second.
- virt_disk_octets_write, the number of octets (bytes) written per second.
- virt_disk_ops_read, the number of read operations per second.
- virt_disk_ops_write, the number of write operations per second.
Memory¶
- virt_memory_total, the total amount of memory (in bytes) allocated to the virtual instance.
Network¶
Metrics have an interface field that contains the interface name to which the metric applies. For example, ‘tap0dc043a6-dd’, ‘tap769b123a-2e’, and others.
- virt_if_dropped_rx, the number of dropped packets per second when receiving from the interface.
- virt_if_dropped_tx, the number of dropped packets per second when transmitting from the interface.
- virt_if_errors_rx, the number of errors per second detected when receiving from the interface.
- virt_if_errors_tx, the number of errors per second detected when transmitting from the interface.
- virt_if_octets_rx, the number of octets (bytes) received per second by the interface.
- virt_if_octets_tx, the number of octets (bytes) transmitted per second by the interface.
- virt_if_packets_rx, the number of packets received per second by the interface.
- virt_if_packets_tx, the number of packets transmitted per second by the interface.
OpenStack¶
Service checks¶
- openstack_check_api, the service’s API status, 1 if it is responsive, 0 otherwise. The metric contains a service field that identifies the OpenStack service being checked.
<service> is one of the following values with their respective resource checks:
- ‘ceilometer-api’: ‘/v2/capabilities’
- ‘cinder-api’: ‘/’
- ‘cinder-v2-api’: ‘/’
- ‘glance-api’: ‘/’
- ‘heat-api’: ‘/’
- ‘heat-cfn-api’: ‘/’
- ‘keystone-public-api’: ‘/’
- ‘neutron-api’: ‘/’
- ‘nova-api’: ‘/’
- ‘swift-api’: ‘/healthcheck’
- ‘swift-s3-api’: ‘/healthcheck’
Note
All checks except for Ceilometer are performed without authentication.
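The service-to-resource mapping above can be written down as a lookup table. The Python sketch below is only an illustration: the endpoint address is a made-up example and the helper name is hypothetical; only the service names and resource paths come from the list above.

```python
# Resource checked for each service, as listed above.
RESOURCE_CHECKS = {
    "ceilometer-api": "/v2/capabilities",
    "cinder-api": "/",
    "cinder-v2-api": "/",
    "glance-api": "/",
    "heat-api": "/",
    "heat-cfn-api": "/",
    "keystone-public-api": "/",
    "neutron-api": "/",
    "nova-api": "/",
    "swift-api": "/healthcheck",
    "swift-s3-api": "/healthcheck",
}


def check_url(endpoint: str, service: str) -> str:
    """Build the URL probed for a service (hypothetical helper)."""
    return endpoint.rstrip("/") + RESOURCE_CHECKS[service]


# The endpoint below is an assumed example address, not a real deployment.
print(check_url("http://192.0.2.10:8080/", "swift-api"))
```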
Compute¶
The following metrics are emitted per compute node:
- openstack_nova_free_disk, the disk space in GB available for new instances.
- openstack_nova_free_ram, the memory in MB available for new instances.
- openstack_nova_free_vcpus, the number of virtual CPUs available for new instances.
- openstack_nova_instance_creation_time, the time in seconds it took to launch a new instance.
- openstack_nova_instance_state, the number of instances which entered a given state (the value is always 1). The metric contains a state field.
- openstack_nova_running_instances, the number of running instances.
- openstack_nova_running_tasks, the number of tasks currently executed.
- openstack_nova_used_disk, the disk space in GB used by the instances.
- openstack_nova_used_ram, the memory in MB used by the instances.
- openstack_nova_used_vcpus, the number of virtual CPUs used by the instances.
The following metrics are retrieved from the Nova API and represent the aggregated values across all compute nodes.
- openstack_nova_total_free_disk, the total amount of disk space in GB available for new instances.
- openstack_nova_total_free_ram, the total amount of memory in MB available for new instances.
- openstack_nova_total_free_vcpus, the total number of virtual CPUs available for new instances.
- openstack_nova_total_running_instances, the total number of running instances.
- openstack_nova_total_running_tasks, the total number of tasks currently executed.
- openstack_nova_total_used_disk, the total amount of disk space in GB used by the instances.
- openstack_nova_total_used_ram, the total amount of memory in MB used by the instances.
- openstack_nova_total_used_vcpus, the total number of virtual CPUs used by the instances.
The following metrics are retrieved from the Nova API:
- openstack_nova_instances, the total count of instances in a given state. The metric contains a state field which is one of ‘active’, ‘deleted’, ‘error’, ‘paused’, ‘resumed’, ‘rescued’, ‘resized’, ‘shelved_offloaded’ or ‘suspended’.
The following metrics are retrieved from the Nova database:
- openstack_nova_service, the Nova service state (either 0 for ‘up’, 1 for ‘down’ or 2 for ‘disabled’). The metric contains a service field (one of ‘compute’, ‘conductor’, ‘scheduler’, ‘cert’ or ‘consoleauth’) and a state field (one of ‘up’, ‘down’ or ‘disabled’).
- openstack_nova_services, the total count of Nova services by state. The metric contains a service field (one of ‘compute’, ‘conductor’, ‘scheduler’, ‘cert’ or ‘consoleauth’) and a state field (one of ‘up’, ‘down’, or ‘disabled’).
Identity¶
The following metrics are retrieved from the Keystone API:
- openstack_keystone_roles, the total number of roles.
- openstack_keystone_tenants, the number of tenants by state. The metric contains a state field (either ‘enabled’ or ‘disabled’).
- openstack_keystone_users, the number of users by state. The metric contains a state field (either ‘enabled’ or ‘disabled’).
Volume¶
The following metrics are emitted per volume node:
- openstack_cinder_volume_creation_time, the time in seconds it took to create a new volume.
Note
When using Ceph as the back-end storage for volumes, the hostname value is always set to rbd.
The following metrics are retrieved from the Cinder API:
- openstack_cinder_snapshots, the number of snapshots by state. The metric contains a state field.
- openstack_cinder_snapshots_size, the total size (in bytes) of snapshots by state. The metric contains a state field.
- openstack_cinder_volumes, the number of volumes by state. The metric contains a state field.
- openstack_cinder_volumes_size, the total size (in bytes) of volumes by state. The metric contains a state field.
state is one of ‘available’, ‘creating’, ‘attaching’, ‘in-use’, ‘deleting’, ‘backing-up’, ‘restoring-backup’, ‘error’, ‘error_deleting’, ‘error_restoring’, or ‘error_extending’.
The following metrics are retrieved from the Cinder database:
- openstack_cinder_service, the Cinder service state (either 0 for ‘up’, 1 for ‘down’, or 2 for ‘disabled’). The metric contains a service field (one of ‘volume’, ‘backup’, ‘scheduler’) and a state field (one of ‘up’, ‘down’ or ‘disabled’).
- openstack_cinder_services, the total count of Cinder services by state. The metric contains a service field (one of ‘volume’, ‘backup’, ‘scheduler’) and a state field (one of ‘up’, ‘down’ or ‘disabled’).
Image¶
The following metrics are retrieved from the Glance API:
- openstack_glance_images, the number of images by state and visibility. The metric contains state and visibility fields.
- openstack_glance_images_size, the total size (in bytes) of images by state and visibility. The metric contains state and visibility fields.
- openstack_glance_snapshots, the number of snapshot images by state and visibility. The metric contains state and visibility fields.
- openstack_glance_snapshots_size, the total size (in bytes) of snapshots by state and visibility. The metric contains state and visibility fields.
state is one of ‘queued’, ‘saving’, ‘active’, ‘killed’, ‘deleted’, or ‘pending_delete’. visibility is either ‘public’ or ‘private’.
Network¶
The following metrics are retrieved from the Neutron API:
- openstack_neutron_floatingips, the total number of floating IP addresses.
- openstack_neutron_networks, the number of virtual networks by state. The metric contains a state field.
- openstack_neutron_ports, the number of virtual ports by owner and state. The metric contains owner and state fields.
- openstack_neutron_routers, the number of virtual routers by state. The metric contains a state field.
- openstack_neutron_subnets, the number of virtual subnets.
<state> is one of ‘active’, ‘build’, ‘down’ or ‘error’.
<owner> is one of ‘compute’, ‘dhcp’, ‘floatingip’, ‘floatingip_agent_gateway’, ‘router_interface’, ‘router_gateway’, ‘router_ha_interface’, ‘router_interface_distributed’, or ‘router_centralized_snat’.
The following metrics are retrieved from the Neutron database:
Note
These metrics are not collected when the Contrail plugin is deployed.
- openstack_neutron_agent, the Neutron agent state (either 0 for ‘up’, 1 for ‘down’, or 2 for ‘disabled’). The metric contains a service field (one of ‘dhcp’, ‘l3’, ‘metadata’, or ‘openvswitch’), and a state field (one of ‘up’, ‘down’ or ‘disabled’).
- openstack_neutron_agents, the total number of Neutron agents by service and state. The metric contains service (one of ‘dhcp’, ‘l3’, ‘metadata’ or ‘openvswitch’) and state (one of ‘up’, ‘down’ or ‘disabled’) fields.
API response times¶
- openstack_<service>_http_response_times, HTTP response time statistics. The statistics are min, max, sum, count, and upper_90 (90th percentile) over 10 seconds. The metric contains an http_method field, for example, ‘GET’, ‘POST’, and others, and an http_status field, for example, ‘2xx’, ‘4xx’, and others.
<service> is one of ‘cinder’, ‘glance’, ‘heat’, ‘keystone’, ‘neutron’ or ‘nova’.
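To make the five statistics concrete, the following Python sketch computes them for one window of response times. It is an illustration under stated assumptions: the function name and sample values are hypothetical, and upper_90 is taken here to be the largest sample among the lowest 90% of values, which may differ from the exact percentile method the collector uses.

```python
def summarize(response_times):
    """Return min, max, sum, count and upper_90 for one window of samples."""
    if not response_times:
        raise ValueError("at least one sample is required")
    ordered = sorted(response_times)
    count = len(ordered)
    # Number of samples in the lowest 90% (rounded up, so never less than 1).
    in_threshold = (count * 90 + 99) // 100
    return {
        "min": ordered[0],
        "max": ordered[-1],
        "sum": sum(ordered),
        "count": count,
        "upper_90": ordered[in_threshold - 1],
    }


# Hypothetical response times (seconds) observed in one 10-second window.
window = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
print(summarize(window))
```

The mean response time for the window can be derived from the emitted values as sum divided by count.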
Logs¶
- log_messages, the number of log messages per second for the given service and severity level. The metric contains service and level (one of ‘debug’, ‘info’, and others) fields.
Ceph¶
All Ceph metrics have a cluster field containing the name of the Ceph cluster (ceph by default).
For details, see Cluster monitoring and RADOS monitoring.
Cluster¶
- ceph_health, the health status of the entire cluster where values 1, 2, 3 represent OK, WARNING and ERROR, respectively.
- ceph_monitor_count, the number of ceph-mon processes.
- ceph_quorum_count, the number of ceph-mon processes participating in the quorum.
Pools¶
- ceph_pool_total_avail_bytes, the total available size in bytes for all pools.
- ceph_pool_total_bytes, the total number of bytes for all pools.
- ceph_pool_total_number, the total number of pools.
- ceph_pool_total_used_bytes, the total used size in bytes by all pools.
The following metrics have a pool field that contains the name of the Ceph pool.
- ceph_pool_bytes_used, the amount of data in bytes used by the pool.
- ceph_pool_max_avail, the available size in bytes for the pool.
- ceph_pool_objects, the number of objects in the pool.
- ceph_pool_op_per_sec, the number of operations per second for the pool.
- ceph_pool_pg_num, the number of placement groups for the pool.
- ceph_pool_read_bytes_sec, the number of bytes read per second for the pool.
- ceph_pool_size, the number of data replications for the pool.
- ceph_pool_write_bytes_sec, the number of bytes written per second for the pool.
Placement Groups¶
- ceph_pg_bytes_avail, the available size in bytes.
- ceph_pg_bytes_total, the cluster total size in bytes.
- ceph_pg_bytes_used, the data stored size in bytes.
- ceph_pg_data_bytes, the stored data size in bytes before it is replicated, cloned or snapshotted.
- ceph_pg_state, the number of placement groups in a given state. The metric contains a state field whose <state> value is a combination, separated by +, of two or more of the following states: creating, active, clean, down, replay, splitting, scrubbing, degraded, inconsistent, peering, repair, recovering, recovery_wait, backfill, backfill-wait, backfill_toofull, incomplete, stale, remapped.
- ceph_pg_total, the total number of placement groups.
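Because the state field of ceph_pg_state is a +-separated combination, counting placement groups that include one particular state means splitting the field first. A minimal Python sketch follows; the function name and the sample counts are hypothetical.

```python
def count_pgs_with_state(pg_counts: dict, wanted: str) -> int:
    """Sum ceph_pg_state values whose combined state includes `wanted`."""
    return sum(
        count
        for combined_state, count in pg_counts.items()
        if wanted in combined_state.split("+")
    )


# Hypothetical ceph_pg_state values keyed by their state field.
sample = {
    "active+clean": 120,
    "active+degraded": 5,
    "active+remapped+backfill": 3,
}
print(count_pgs_with_state(sample, "active"))
print(count_pgs_with_state(sample, "degraded"))
```

Splitting on + rather than using a substring match avoids counting backfill_toofull when looking for backfill.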
OSD Daemons¶
- ceph_osd_down, the number of OSD daemons DOWN.
- ceph_osd_in, the number of OSD daemons IN.
- ceph_osd_out, the number of OSD daemons OUT.
- ceph_osd_up, the number of OSD daemons UP.
The following metrics have an osd field that contains the OSD identifier:
- ceph_osd_apply_latency, apply latency in ms for the given OSD.
- ceph_osd_commit_latency, commit latency in ms for the given OSD.
- ceph_osd_total, the total size in bytes for the given OSD.
- ceph_osd_used, the data stored size in bytes for the given OSD.
OSD Performance¶
All the following metrics are retrieved per OSD daemon from the corresponding /var/run/ceph/ceph-osd.<ID>.asok socket by issuing the perf dump command.
All metrics have an osd field that contains the OSD identifier.
Note
These metrics are not collected when a node has both the ceph-osd and controller roles.
For details, see OSD performance counters.
- ceph_perf_osd_op, the number of client operations.
- ceph_perf_osd_op_in_bytes, the number of bytes received from clients for write operations.
- ceph_perf_osd_op_latency, the average latency in ms for client operations (including queue time).
- ceph_perf_osd_op_out_bytes, the number of bytes sent to clients for read operations.
- ceph_perf_osd_op_process_latency, the average latency in ms for client operations (excluding queue time).
- ceph_perf_osd_op_r, the number of client read operations.
- ceph_perf_osd_op_r_latency, the average latency in ms for read operations (including queue time).
- ceph_perf_osd_op_r_out_bytes, the number of bytes sent to clients for read operations.
- ceph_perf_osd_op_r_process_latency, the average latency in ms for read operations (excluding queue time).
- ceph_perf_osd_op_rw, the number of client read-modify-write operations.
- ceph_perf_osd_op_rw_in_bytes, the number of bytes per second received from clients for read-modify-write operations.
- ceph_perf_osd_op_rw_latency, the average latency in ms for read-modify-write operations (including queue time).
- ceph_perf_osd_op_rw_out_bytes, the number of bytes per second sent to clients for read-modify-write operations.
- ceph_perf_osd_op_rw_process_latency, the average latency in ms for read-modify-write operations (excluding queue time).
- ceph_perf_osd_op_rw_rlat, the average latency in ms for read-modify-write operations with readable/applied.
- ceph_perf_osd_op_w, the number of client write operations.
- ceph_perf_osd_op_wip, the number of replication operations currently being processed (primary).
- ceph_perf_osd_op_w_in_bytes, the number of bytes received from clients for write operations.
- ceph_perf_osd_op_w_latency, the average latency in ms for write operations (including queue time).
- ceph_perf_osd_op_w_process_latency, the average latency in ms for write operations (excluding queue time).
- ceph_perf_osd_op_w_rlat, the average latency in ms for write operations with readable/applied.
- ceph_perf_osd_recovery_ops, the number of recovery operations in progress.
Pacemaker¶
Resource location¶
- pacemaker_resource_local_active, 1 when the resource is located on the host reporting the metric, 0 otherwise. The metric contains a resource field which is one of ‘vip__public’, ‘vip__management’, ‘vip__vrouter_pub’, or ‘vip__vrouter’.
Clusters¶
The cluster metrics are emitted by the GSE plugins. For details, see Configuring alarms.
- cluster_node_status, the status of the node cluster. The metric contains a cluster_name field that identifies the node cluster.
- cluster_service_status, the status of the service cluster. The metric contains a cluster_name field that identifies the service cluster.
- cluster_status, the status of the global cluster. The metric contains a cluster_name field that identifies the global cluster.
The supported values for these metrics are:
- 0 for the Okay status.
- 1 for the Warning status.
- 2 for the Unknown status.
- 3 for the Critical status.
- 4 for the Down status.
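When consuming these metrics, it can help to decode the numeric value back into its status name. The Python sketch below encodes the table above as an enumeration; the class and function names are hypothetical.

```python
from enum import IntEnum


class ClusterStatus(IntEnum):
    """Numeric values of the cluster status metrics, as listed above."""
    OKAY = 0
    WARNING = 1
    UNKNOWN = 2
    CRITICAL = 3
    DOWN = 4


def describe(value: int) -> str:
    """Return the status name for a metric value; raises ValueError if unsupported."""
    return ClusterStatus(value).name.capitalize()


print(describe(0))
print(describe(3))
```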
Self-monitoring¶
System¶
The metrics have a service field with the name of the service to which they apply. The values can be: hekad, collectd, influxd, grafana-server or elasticsearch.
- lma_components_count_processes, the number of processes currently running.
- lma_components_count_threads, the number of threads currently running.
- lma_components_cputime_syst, the percentage of CPU time spent in system mode by the service. It can be greater than 100% when the node has more than one CPU.
- lma_components_cputime_user, the percentage of CPU time spent in user mode by the service. It can be greater than 100% when the node has more than one CPU.
- lma_components_disk_bytes_read, the number of bytes read from disk(s) per second.
- lma_components_disk_bytes_write, the number of bytes written to disk(s) per second.
- lma_components_disk_ops_read, the number of read operations from disk(s) per second.
- lma_components_disk_ops_write, the number of write operations to disk(s) per second.
- lma_components_memory_code, the physical memory devoted to executable code in bytes.
- lma_components_memory_data, the physical memory devoted to other than executable code in bytes.
- lma_components_memory_rss, the non-swapped physical memory used in bytes.
- lma_components_memory_vm, the virtual memory size in bytes.
- lma_components_pagefaults_majflt, major page faults per second.
- lma_components_pagefaults_minflt, minor page faults per second.
- lma_components_stacksize, the absolute value of the start address (the bottom) of the stack minus the address of the current stack pointer.
Heka pipeline¶
The metrics have two fields: name that contains the name of the decoder or filter as defined by Heka and type that is either decoder or filter.
The metrics for both types are as follows:
hekad_memory, the total memory in bytes used by the Sandbox.
hekad_msg_avg_duration, the average time in nanoseconds for processing the message.
hekad_msg_count, the total number of messages processed by the decoder. This resets to 0 when the process is restarted.
Additional metrics for filter type:
hekad_timer_event_avg_duration, the average time in nanoseconds for executing the timer_event function.
hekad_timer_event_count, the total number of executions of the timer_event function. This resets to 0 when the process is restarted.
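Counters such as hekad_msg_count and hekad_timer_event_count restart from 0 when the process restarts, so a plain difference between two samples can turn negative. The following sketch shows one common way to handle this when computing per-interval increases. It is an illustration only; the function name is hypothetical, and it assumes that any drop in value means a restart.

```python
def counter_deltas(samples):
    """Return the increase between consecutive counter samples.

    Assumption: a sample lower than the previous one means the process
    restarted and the counter began again from 0. Events that occurred
    between the last sample and the restart cannot be recovered.
    """
    deltas = []
    previous = None
    for value in samples:
        if previous is not None:
            if value >= previous:
                deltas.append(value - previous)
            else:
                # Counter was reset: everything counted so far is new.
                deltas.append(value)
        previous = value
    return deltas
```

With the samples 10, 25, 40, 5, 12, the drop from 40 to 5 is treated as a restart, and the increases are 15, 15, 5, and 7.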
Back-end checks¶
http_check, the API status of the back end: 1 if it is responsive, 0 if it is not. The metric contains a service field that identifies the LMA back-end service being checked.
<service> is one of the following values, depending on which Fuel plugins are deployed in the environment:
- ‘influxdb’
Elasticsearch¶
The following metrics give a simple view of the health of the cluster. For details, see Cluster health.
elasticsearch_cluster_active_primary_shards, the number of active primary shards.
elasticsearch_cluster_active_shards, the number of active shards.
elasticsearch_cluster_health, the health status of the entire cluster, where the values 1, 2, and 3 represent green, yellow, and red, respectively. The red status may also be reported when the Elasticsearch API returns an unexpected result, for example, a network failure.
elasticsearch_cluster_initializing_shards, the number of initializing shards.
elasticsearch_cluster_number_of_nodes, the number of nodes in the cluster.
elasticsearch_cluster_number_of_pending_tasks, the number of pending tasks.
elasticsearch_cluster_relocating_shards, the number of relocating shards.
elasticsearch_cluster_unassigned_shards, the number of unassigned shards.
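The encoding of elasticsearch_cluster_health described above can be sketched as a lookup with a fallback to red. This is not the collector's actual code; HEALTH_VALUES and health_value are hypothetical names, and the input is assumed to be the status string reported by the Elasticsearch cluster health API, or None when no usable answer was received.

```python
# Hypothetical names for illustration; not the collector's implementation.
HEALTH_VALUES = {"green": 1, "yellow": 2, "red": 3}


def health_value(api_status):
    """Map a cluster health status string to the metric value.

    Anything that is not a known status string (for example None after a
    network failure) is reported as red, as described above.
    """
    if isinstance(api_status, str) and api_status in HEALTH_VALUES:
        return HEALTH_VALUES[api_status]
    return HEALTH_VALUES["red"]
```

Falling back to red means that a value of 3 does not always indicate a red cluster; it can also mean that the health could not be determined.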
InfluxDB¶
The following metrics are extracted from the output of the show stats command. The values are reset to zero when InfluxDB is restarted.
cluster¶
The following metrics are only available if there is more than one node in the cluster:
influxdb_cluster_write_shard_points_requests, the number of requests for writing time series points to a shard.
influxdb_cluster_write_shard_requests, the number of requests for writing to a shard.
httpd¶
influxdb_httpd_failed_auths, the number of failed authentications.
influxdb_httpd_ping_requests, the number of ping requests.
influxdb_httpd_query_requests, the number of query requests received.
influxdb_httpd_query_response_bytes, the number of bytes returned to the client.
influxdb_httpd_requests, the number of requests received.
influxdb_httpd_write_points_ok, the number of points successfully written.
influxdb_httpd_write_request_bytes, the number of bytes received for write requests.
influxdb_httpd_write_requests, the number of write requests received.
write¶
influxdb_write_local_point_requests, the number of write points requests from the local data node.
influxdb_write_ok, the number of successful writes of consistency level.
influxdb_write_point_requests, the number of write points requests across all data nodes.
influxdb_write_remote_point_requests, the number of write points requests to remote data nodes.
influxdb_write_requests, the number of write requests across all data nodes.
influxdb_write_sub_ok, the number of successful points sent to subscriptions.
runtime¶
influxdb_garbage_collections, the number of garbage collections.
influxdb_go_routines, the number of goroutines.
influxdb_heap_idle, the number of bytes in idle spans.
influxdb_heap_in_use, the number of bytes in non-idle spans.
influxdb_heap_objects, the total number of allocated objects.
influxdb_heap_released, the number of bytes released to the operating system.
influxdb_heap_system, the number of bytes obtained from the system.
influxdb_memory_alloc, the number of bytes allocated and not yet freed.
influxdb_memory_frees, the number of free operations.
influxdb_memory_lookups, the number of pointer lookups.
influxdb_memory_mallocs, the number of malloc operations.
influxdb_memory_system, the number of bytes obtained from the system.
influxdb_memory_total_alloc, the number of bytes allocated (even if freed).