In many default installations, the ceph and varnish binaries require root or specific user permissions to gather metrics. Although the system can be configured either to run telegraf with root privileges or to share the same user/permissions as the measured software, that is not the default setup. In my opinion, the ability to run a binary with `sudo` makes the entire setup simpler, without additional intervention. My changes allow a configuration like:

```
binary = "/usr/bin/sudo /usr/bin/varnishstat"
ceph_binary = "/usr/bin/sudo /usr/bin/ceph"
```

With a proper sudo configuration, this reduces the number of changes needed to collect metrics.

I've also changed the type of the varnish metrics. It was `int`, but the varnish source code shows that all metrics are actually uint64 (https://github.com/varnishcache/varnish-cache/blob/master/include/tbl/vsc_f_main.h). I hit that issue with some metrics:

```
~# varnishstat -1 | grep -i MAIN.n_objectcore
MAIN.n_objectcore 18446744073709550877 . objectcore structs made
```
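To make the two changes above concrete, here is a minimal sketch in Go. It is not the code from this change: `splitCommand` and `parseCounter` are hypothetical helper names, and the whitespace-based split assumes that none of the configured paths contain spaces. It shows how a value such as `"/usr/bin/sudo /usr/bin/varnishstat"` could be separated into a program and its leading arguments, and why the counter shown above cannot be parsed as a signed 64-bit integer.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// splitCommand is a hypothetical helper: it splits a configured binary string
// on whitespace into the program to execute and its leading arguments.
// Assumption: paths contain no spaces (quoting is not handled).
func splitCommand(binary string) (string, []string) {
	parts := strings.Fields(binary)
	if len(parts) == 0 {
		return "", nil
	}
	return parts[0], parts[1:]
}

// parseCounter is a hypothetical helper: it parses a counter value as an
// unsigned 64-bit integer, matching the uint64 type used in the varnish source.
func parseCounter(s string) (uint64, error) {
	return strconv.ParseUint(s, 10, 64)
}

func main() {
	name, args := splitCommand("/usr/bin/sudo /usr/bin/varnishstat")
	fmt.Println("program:", name, "args:", args)

	// Value reported for MAIN.n_objectcore in the example above.
	const raw = "18446744073709550877"

	// A signed 64-bit parse rejects the value because it exceeds the int64 range.
	if _, err := strconv.ParseInt(raw, 10, 64); err != nil {
		fmt.Println("signed parse failed:", err)
	}

	// An unsigned 64-bit parse accepts it.
	v, err := parseCounter(raw)
	fmt.Println("unsigned parse:", v, err)
}
```

The split program name and arguments could then be passed to `exec.Command(name, append(args, "-1")...)` or similar; how the plugins actually invoke the binary is not shown here.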
README.md
Ceph Storage Input Plugin
Collects performance metrics from the MON and OSD nodes in a Ceph storage cluster.
The plugin works by scanning the configured SocketDir for OSD and MON socket files. When it finds a MON socket, it runs `ceph --admin-daemon $file perfcounters_dump`. For OSDs, it runs `ceph --admin-daemon $file perf dump`.
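The discovery step can be sketched as follows. This is an illustration, not the plugin's actual code: the `socket` type and the `classify` and `adminArgs` functions are hypothetical names, and the rule for deriving the id (strip prefix and suffix, then trim `.`, `-` and `_`) is an assumption about how the socket file name is parsed.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// socket describes one admin socket found in the socket directory.
// Hypothetical type used only for this sketch.
type socket struct {
	kind string // "mon" or "osd"
	id   string // identifier derived from the file name
	path string // full path to the socket file
}

// classify decides whether a file name is a MON or OSD socket based on the
// configured prefixes and suffix, and derives an id from the rest of the name.
// It returns false for files that match neither prefix or lack the suffix.
func classify(dir, name, monPrefix, osdPrefix, suffix string) (socket, bool) {
	if !strings.HasSuffix(name, suffix) {
		return socket{}, false
	}
	var kind, prefix string
	switch {
	case strings.HasPrefix(name, monPrefix):
		kind, prefix = "mon", monPrefix
	case strings.HasPrefix(name, osdPrefix):
		kind, prefix = "osd", osdPrefix
	default:
		return socket{}, false
	}
	// Assumed id rule: drop prefix and suffix, then trim separator characters.
	id := strings.TrimSuffix(strings.TrimPrefix(name, prefix), suffix)
	id = strings.Trim(id, ".-_")
	return socket{kind: kind, id: id, path: path.Join(dir, name)}, true
}

// adminArgs builds the arguments passed to the ceph binary for a socket:
// perfcounters_dump for MON sockets, perf dump for OSD sockets.
func adminArgs(s socket) []string {
	if s.kind == "mon" {
		return []string{"--admin-daemon", s.path, "perfcounters_dump"}
	}
	return []string{"--admin-daemon", s.path, "perf", "dump"}
}

func main() {
	// Example file names; a real scan would list the socket directory instead.
	names := []string{"ceph-mon.node-2.asok", "ceph-osd.0.asok", "unrelated.pid"}
	for _, n := range names {
		s, ok := classify("/var/run/ceph", n, "ceph-mon", "ceph-osd", "asok")
		if !ok {
			continue
		}
		fmt.Println(s.kind, s.id, adminArgs(s))
	}
}
```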
The resulting JSON is parsed and grouped into collections, based on top-level key. Top-level keys are used as collection tags, and all sub-keys are flattened. For example:
{
  "paxos": {
    "refresh": 9363435,
    "refresh_latency": {
      "avgcount": 9363435,
      "sum": 5378.794002000
    }
  }
}
Would be parsed into the following metrics, all of which would be tagged with collection=paxos:
- refresh = 9363435
- refresh_latency.avgcount = 9363435
- refresh_latency.sum = 5378.794002000
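The grouping and flattening described above can be sketched in Go as follows. This is a minimal illustration under stated assumptions, not the plugin's implementation: `parseDump` and `flatten` are hypothetical names, only numeric leaves are kept, and every value is stored as a float64.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// flatten walks a decoded JSON value and writes every numeric leaf into out,
// joining nested keys with "." (for example "refresh_latency.sum").
// Non-numeric leaves are ignored in this sketch.
func flatten(prefix string, v interface{}, out map[string]float64) {
	switch t := v.(type) {
	case map[string]interface{}:
		for k, child := range t {
			key := k
			if prefix != "" {
				key = prefix + "." + k
			}
			flatten(key, child, out)
		}
	case float64: // encoding/json decodes all JSON numbers as float64 here
		out[prefix] = t
	}
}

// parseDump decodes a perf dump and returns one flattened field set per
// top-level key; the top-level key plays the role of the collection tag.
func parseDump(data []byte) (map[string]map[string]float64, error) {
	var top map[string]interface{}
	if err := json.Unmarshal(data, &top); err != nil {
		return nil, err
	}
	collections := make(map[string]map[string]float64)
	for name, body := range top {
		sub, ok := body.(map[string]interface{})
		if !ok {
			continue // skip top-level values that are not objects
		}
		fields := make(map[string]float64)
		flatten("", sub, fields)
		collections[name] = fields
	}
	return collections, nil
}

func main() {
	dump := []byte(`{"paxos": {"refresh": 9363435,
		"refresh_latency": {"avgcount": 9363435, "sum": 5378.794002000}}}`)
	collections, err := parseDump(dump)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for collection, fields := range collections {
		for name, value := range fields {
			fmt.Printf("collection=%s %s=%v\n", collection, name, value)
		}
	}
}
```

Map iteration order in Go is not deterministic, so the fields may print in any order.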
Configuration:
# Collects performance metrics from the MON and OSD nodes in a Ceph storage cluster.
[[inputs.ceph]]
## All configuration values are optional, defaults are shown below
## location of ceph binary
ceph_binary = "/usr/bin/ceph"
## directory in which to look for socket files
socket_dir = "/var/run/ceph"
## prefix of MON and OSD socket files, used to determine socket type
mon_prefix = "ceph-mon"
osd_prefix = "ceph-osd"
## suffix used to identify socket files
socket_suffix = "asok"
Measurements & Fields:
All fields are collected under the ceph measurement and stored as float64s. For a full list of fields, see the sample perf dumps in ceph_test.go.
Tags:
All measurements will have the following tags:
- type: either 'osd' or 'mon' to indicate which type of node was queried
- id: a unique string identifier, parsed from the socket file name for the node
- collection: the top-level key under which these fields were reported. Possible values are:
  - for MON nodes:
    - cluster
    - leveldb
    - mon
    - paxos
    - throttle-mon_client_bytes
    - throttle-mon_daemon_bytes
    - throttle-msgr_dispatch_throttler-mon
  - for OSD nodes:
    - WBThrottle
    - filestore
    - leveldb
    - mutex-FileJournal::completions_lock
    - mutex-FileJournal::finisher_lock
    - mutex-FileJournal::write_lock
    - mutex-FileJournal::writeq_lock
    - mutex-JOS::ApplyManager::apply_lock
    - mutex-JOS::ApplyManager::com_lock
    - mutex-JOS::SubmitManager::lock
    - mutex-WBThrottle::lock
    - objecter
    - osd
    - recoverystate_perf
    - throttle-filestore_bytes
    - throttle-filestore_ops
    - throttle-msgr_dispatch_throttler-client
    - throttle-msgr_dispatch_throttler-cluster
    - throttle-msgr_dispatch_throttler-hb_back_server
    - throttle-msgr_dispatch_throttler-hb_front_server
    - throttle-msgr_dispatch_throttler-hbclient
    - throttle-msgr_dispatch_throttler-ms_objecter
    - throttle-objecter_bytes
    - throttle-objecter_ops
    - throttle-osd_client_bytes
    - throttle-osd_client_messages
Example Output:
telegraf -test -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d -input-filter ceph
* Plugin: ceph, Collection 1
> ceph,collection=paxos,id=node-2,role=openstack,type=mon accept_timeout=0,begin=14931264,begin_bytes.avgcount=14931264,begin_bytes.sum=180309683362,begin_keys.avgcount=0,begin_keys.sum=0,begin_latency.avgcount=14931264,begin_latency.sum=9293.29589,collect=1,collect_bytes.avgcount=1,collect_bytes.sum=24,collect_keys.avgcount=1,collect_keys.sum=1,collect_latency.avgcount=1,collect_latency.sum=0.00028,collect_timeout=0,collect_uncommitted=0,commit=14931264,commit_bytes.avgcount=0,commit_bytes.sum=0,commit_keys.avgcount=0,commit_keys.sum=0,commit_latency.avgcount=0,commit_latency.sum=0,lease_ack_timeout=0,lease_timeout=0,new_pn=0,new_pn_latency.avgcount=0,new_pn_latency.sum=0,refresh=14931264,refresh_latency.avgcount=14931264,refresh_latency.sum=8706.98498,restart=4,share_state=0,share_state_bytes.avgcount=0,share_state_bytes.sum=0,share_state_keys.avgcount=0,share_state_keys.sum=0,start_leader=0,start_peon=1,store_state=14931264,store_state_bytes.avgcount=14931264,store_state_bytes.sum=353119959211,store_state_keys.avgcount=14931264,store_state_keys.sum=289807523,store_state_latency.avgcount=14931264,store_state_latency.sum=10952.835724 1462821234814535148
> ceph,collection=throttle-mon_client_bytes,id=node-2,type=mon get=1413017,get_or_fail_fail=0,get_or_fail_success=0,get_sum=71211705,max=104857600,put=1413013,put_sum=71211459,take=0,take_sum=0,val=246,wait.avgcount=0,wait.sum=0 1462821234814737219
> ceph,collection=throttle-mon_daemon_bytes,id=node-2,type=mon get=4058121,get_or_fail_fail=0,get_or_fail_success=0,get_sum=6027348117,max=419430400,put=4058121,put_sum=6027348117,take=0,take_sum=0,val=0,wait.avgcount=0,wait.sum=0 1462821234814815661
> ceph,collection=throttle-msgr_dispatch_throttler-mon,id=node-2,type=mon get=54276277,get_or_fail_fail=0,get_or_fail_success=0,get_sum=370232877040,max=104857600,put=54276277,put_sum=370232877040,take=0,take_sum=0,val=0,wait.avgcount=0,wait.sum=0 1462821234814872064