Improve procstat readme

Daniel Nelson 2018-02-01 16:12:08 -08:00
parent a7571d5730
commit 23933e1139
2 changed files with 140 additions and 140 deletions


@ -1,140 +1,138 @@
# Telegraf plugin: procstat
# Procstat Input Plugin
#### Description
The procstat plugin can be used to monitor the system resource usage of one or more processes.
The procstat plugin can be used to monitor system resource usage by
individual processes using their /proc data.
Processes can be selected for monitoring using one of several methods:
- pidfile
- exe
- pattern
- user
- systemd_unit
- cgroup
Processes can be specified either by pid file, by executable name, by command
line pattern matching, by username, by systemd unit name, or by cgroup name/path
(in this order of priority). Procstat plugin will use `pgrep` when executable
name is provided to obtain the pid. Procstat plugin will transmit IO, memory,
cpu, file descriptor related measurements for every process specified. A prefix
can be set to isolate individual process specific measurements.
### Configuration:
The plugin will tag processes according to how they are specified in the configuration. If a pid file is used, a "pidfile" tag will be generated.
On the other hand, if an executable is used an "exe" tag will be generated. Possible tag names:
* pidfile
* exe
* pattern
* user
* systemd_unit
* cgroup
Additionally the plugin will tag processes by their PID (pid_tag = true in the config) and their process name:
* pid
* process_name
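The selector-to-tag mapping described above can be sketched in Go. This is a minimal illustration, not the plugin's code: the `selector` struct and the `tagFor` function are hypothetical names, and the order of checks is assumed to follow the priority listed in the readme (pid file, executable, pattern, user, systemd unit, cgroup).

```go
package main

import (
	"errors"
	"fmt"
)

// selector is a hypothetical stand-in for the plugin configuration;
// it holds only the options used to choose which processes to monitor.
type selector struct {
	PidFile     string
	Exe         string
	Pattern     string
	User        string
	SystemdUnit string
	CGroup      string
}

// tagFor returns the tag key and value for the first selector that is set,
// checking the options in the priority order given in the readme.
func tagFor(s selector) (string, string, error) {
	switch {
	case s.PidFile != "":
		return "pidfile", s.PidFile, nil
	case s.Exe != "":
		return "exe", s.Exe, nil
	case s.Pattern != "":
		return "pattern", s.Pattern, nil
	case s.User != "":
		return "user", s.User, nil
	case s.SystemdUnit != "":
		return "systemd_unit", s.SystemdUnit, nil
	case s.CGroup != "":
		return "cgroup", s.CGroup, nil
	}
	return "", "", errors.New("either exe, pid_file, user, pattern, systemd_unit, or cgroup must be specified")
}

func main() {
	key, value, err := tagFor(selector{Exe: "influxd"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s=%s\n", key, value)
}
```

With only `exe` set, the sketch yields the `exe` tag; if a pid file were also set, the `pidfile` tag would take precedence under the assumed ordering.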
### Windows
On Windows, only exe and pattern are supported. Both are implemented using WMI queries: exe matches against the Name field and pattern against the CommandLine field.
Windows Support:
* exe (WMI Name)
* pattern (WMI CommandLine)
This allows fuzzy matching, but only what is supported by [WMI query patterns](https://msdn.microsoft.com/en-us/library/aa392263(v=vs.85).aspx).
Example:
Windows fuzzy matching:
```
[[inputs.procstat]]
exe = "%influx%"
process_name="influxd"
prefix = "influxd"
```
### Linux
```
```toml
# Monitor process cpu and memory usage
[[inputs.procstat]]
exe = "influxd"
prefix = "influxd"
## PID file to monitor process
pid_file = "/var/run/nginx.pid"
## executable name (ie, pgrep <exe>)
# exe = "nginx"
## pattern as argument for pgrep (ie, pgrep -f <pattern>)
# pattern = "nginx"
## user as argument for pgrep (ie, pgrep -u <user>)
# user = "nginx"
## Systemd unit name
# systemd_unit = "nginx.service"
## CGroup name or path
# cgroup = "systemd/system.slice/nginx.service"
## override for process_name
## This is optional; default is sourced from /proc/<pid>/status
# process_name = "bar"
## Field name prefix
# prefix = ""
## Add PID as a tag instead of a field; useful to differentiate between
## processes whose tags are otherwise the same. Can create a large number
## of series, use judiciously.
# pid_tag = false
## Method to use when finding process IDs. Can be one of 'pgrep' or
## 'native'. The pgrep finder calls the pgrep executable in the PATH while
## the native finder performs the search directly in a manner dependent on the
## platform. Default is 'pgrep'
# pid_finder = "pgrep"
```
#### Windows support
Preliminary support for Windows has been added; however, you may prefer using
the `win_perf_counters` input plugin as a more mature alternative.
When using `pid_finder = "native"` on Windows, the pattern lookup method is
implemented as a WMI query. The pattern allows fuzzy matching using only
[WMI query patterns](https://msdn.microsoft.com/en-us/library/aa392263(v=vs.85).aspx):
```toml
[[inputs.procstat]]
pid_file = "/var/run/lxc/dnsmasq.pid"
pattern = "%influx%"
pid_finder = "native"
```
The above configuration would result in output like:
### Metrics:
- procstat
- tags:
- pid (when `pid_tag` is true)
- process_name
- pidfile (when defined)
- exe (when defined)
- pattern (when defined)
- user (when selected)
- systemd_unit (when defined)
- cgroup (when defined)
- fields:
- cpu_time (int)
- cpu_time_guest (float)
- cpu_time_guest_nice (float)
- cpu_time_idle (float)
- cpu_time_iowait (float)
- cpu_time_irq (float)
- cpu_time_nice (float)
- cpu_time_soft_irq (float)
- cpu_time_steal (float)
- cpu_time_stolen (float)
- cpu_time_system (float)
- cpu_time_user (float)
- cpu_usage (float)
- involuntary_context_switches (int)
- memory_data (int)
- memory_locked (int)
- memory_rss (int)
- memory_stack (int)
- memory_swap (int)
- memory_vms (int)
- nice_priority (int)
- num_fds (int, *telegraf* may need to be run as **root**)
- num_threads (int)
- pid (int)
- read_bytes (int, *telegraf* may need to be run as **root**)
- read_count (int, *telegraf* may need to be run as **root**)
- realtime_priority (int)
- rlimit_cpu_time_hard (int)
- rlimit_cpu_time_soft (int)
- rlimit_file_locks_hard (int)
- rlimit_file_locks_soft (int)
- rlimit_memory_data_hard (int)
- rlimit_memory_data_soft (int)
- rlimit_memory_locked_hard (int)
- rlimit_memory_locked_soft (int)
- rlimit_memory_rss_hard (int)
- rlimit_memory_rss_soft (int)
- rlimit_memory_stack_hard (int)
- rlimit_memory_stack_soft (int)
- rlimit_memory_vms_hard (int)
- rlimit_memory_vms_soft (int)
- rlimit_nice_priority_hard (int)
- rlimit_nice_priority_soft (int)
- rlimit_num_fds_hard (int)
- rlimit_num_fds_soft (int)
- rlimit_realtime_priority_hard (int)
- rlimit_realtime_priority_soft (int)
- rlimit_signals_pending_hard (int)
- rlimit_signals_pending_soft (int)
- signals_pending (int)
- voluntary_context_switches (int)
- write_bytes (int, *telegraf* may need to be run as **root**)
- write_count (int, *telegraf* may need to be run as **root**)
*NOTE: Resource limit > 2147483647 will be reported as 2147483647.*
### Example Output:
```
> procstat,pidfile=/var/run/lxc/dnsmasq.pid,process_name=dnsmasq rlimit_file_locks_soft=2147483647i,rlimit_signals_pending_hard=1758i,voluntary_context_switches=478i,read_bytes=307200i,cpu_time_user=0.01,cpu_time_guest=0,memory_swap=0i,memory_locked=0i,rlimit_num_fds_hard=4096i,rlimit_nice_priority_hard=0i,num_fds=11i,involuntary_context_switches=20i,read_count=23i,memory_rss=1388544i,rlimit_memory_rss_soft=2147483647i,rlimit_memory_rss_hard=2147483647i,nice_priority=20i,rlimit_cpu_time_hard=2147483647i,cpu_time=0i,write_bytes=0i,cpu_time_idle=0,cpu_time_nice=0,memory_data=229376i,memory_stack=135168i,rlimit_cpu_time_soft=2147483647i,rlimit_memory_data_hard=2147483647i,rlimit_memory_locked_hard=65536i,rlimit_signals_pending_soft=1758i,write_count=11i,cpu_time_iowait=0,cpu_time_steal=0,cpu_time_stolen=0,rlimit_memory_stack_soft=8388608i,cpu_time_system=0.02,cpu_time_guest_nice=0,rlimit_memory_locked_soft=65536i,rlimit_memory_vms_soft=2147483647i,rlimit_file_locks_hard=2147483647i,rlimit_realtime_priority_hard=0i,pid=828i,num_threads=1i,cpu_time_soft_irq=0,rlimit_memory_vms_hard=2147483647i,rlimit_realtime_priority_soft=0i,memory_vms=15884288i,rlimit_memory_stack_hard=2147483647i,cpu_time_irq=0,rlimit_memory_data_soft=2147483647i,rlimit_num_fds_soft=1024i,signals_pending=0i,rlimit_nice_priority_soft=0i,realtime_priority=0i
> procstat,exe=influxd,process_name=influxd rlimit_num_fds_hard=16384i,rlimit_signals_pending_hard=1758i,realtime_priority=0i,rlimit_memory_vms_hard=2147483647i,rlimit_signals_pending_soft=1758i,cpu_time_stolen=0,rlimit_memory_stack_hard=2147483647i,rlimit_realtime_priority_hard=0i,cpu_time=0i,pid=500i,voluntary_context_switches=975i,cpu_time_idle=0,memory_rss=3072000i,memory_locked=0i,rlimit_nice_priority_soft=0i,signals_pending=0i,nice_priority=20i,read_bytes=823296i,cpu_time_soft_irq=0,rlimit_memory_data_hard=2147483647i,rlimit_memory_locked_soft=65536i,write_count=8i,cpu_time_irq=0,memory_vms=33501184i,rlimit_memory_stack_soft=8388608i,cpu_time_iowait=0,rlimit_memory_vms_soft=2147483647i,rlimit_nice_priority_hard=0i,num_fds=29i,memory_data=229376i,rlimit_cpu_time_soft=2147483647i,rlimit_file_locks_soft=2147483647i,num_threads=1i,write_bytes=0i,cpu_time_steal=0,rlimit_memory_rss_hard=2147483647i,cpu_time_guest=0,cpu_time_guest_nice=0,cpu_usage=0,rlimit_memory_locked_hard=65536i,rlimit_file_locks_hard=2147483647i,involuntary_context_switches=38i,read_count=16851i,memory_swap=0i,rlimit_memory_data_soft=2147483647i,cpu_time_user=0.11,rlimit_cpu_time_hard=2147483647i,rlimit_num_fds_soft=16384i,rlimit_realtime_priority_soft=0i,cpu_time_system=0.27,cpu_time_nice=0,memory_stack=135168i,rlimit_memory_rss_soft=2147483647i
procstat,pidfile=/var/run/lxc/dnsmasq.pid,process_name=dnsmasq rlimit_file_locks_soft=2147483647i,rlimit_signals_pending_hard=1758i,voluntary_context_switches=478i,read_bytes=307200i,cpu_time_user=0.01,cpu_time_guest=0,memory_swap=0i,memory_locked=0i,rlimit_num_fds_hard=4096i,rlimit_nice_priority_hard=0i,num_fds=11i,involuntary_context_switches=20i,read_count=23i,memory_rss=1388544i,rlimit_memory_rss_soft=2147483647i,rlimit_memory_rss_hard=2147483647i,nice_priority=20i,rlimit_cpu_time_hard=2147483647i,cpu_time=0i,write_bytes=0i,cpu_time_idle=0,cpu_time_nice=0,memory_data=229376i,memory_stack=135168i,rlimit_cpu_time_soft=2147483647i,rlimit_memory_data_hard=2147483647i,rlimit_memory_locked_hard=65536i,rlimit_signals_pending_soft=1758i,write_count=11i,cpu_time_iowait=0,cpu_time_steal=0,cpu_time_stolen=0,rlimit_memory_stack_soft=8388608i,cpu_time_system=0.02,cpu_time_guest_nice=0,rlimit_memory_locked_soft=65536i,rlimit_memory_vms_soft=2147483647i,rlimit_file_locks_hard=2147483647i,rlimit_realtime_priority_hard=0i,pid=828i,num_threads=1i,cpu_time_soft_irq=0,rlimit_memory_vms_hard=2147483647i,rlimit_realtime_priority_soft=0i,memory_vms=15884288i,rlimit_memory_stack_hard=2147483647i,cpu_time_irq=0,rlimit_memory_data_soft=2147483647i,rlimit_num_fds_soft=1024i,signals_pending=0i,rlimit_nice_priority_soft=0i,realtime_priority=0i
procstat,exe=influxd,process_name=influxd rlimit_num_fds_hard=16384i,rlimit_signals_pending_hard=1758i,realtime_priority=0i,rlimit_memory_vms_hard=2147483647i,rlimit_signals_pending_soft=1758i,cpu_time_stolen=0,rlimit_memory_stack_hard=2147483647i,rlimit_realtime_priority_hard=0i,cpu_time=0i,pid=500i,voluntary_context_switches=975i,cpu_time_idle=0,memory_rss=3072000i,memory_locked=0i,rlimit_nice_priority_soft=0i,signals_pending=0i,nice_priority=20i,read_bytes=823296i,cpu_time_soft_irq=0,rlimit_memory_data_hard=2147483647i,rlimit_memory_locked_soft=65536i,write_count=8i,cpu_time_irq=0,memory_vms=33501184i,rlimit_memory_stack_soft=8388608i,cpu_time_iowait=0,rlimit_memory_vms_soft=2147483647i,rlimit_nice_priority_hard=0i,num_fds=29i,memory_data=229376i,rlimit_cpu_time_soft=2147483647i,rlimit_file_locks_soft=2147483647i,num_threads=1i,write_bytes=0i,cpu_time_steal=0,rlimit_memory_rss_hard=2147483647i,cpu_time_guest=0,cpu_time_guest_nice=0,cpu_usage=0,rlimit_memory_locked_hard=65536i,rlimit_file_locks_hard=2147483647i,involuntary_context_switches=38i,read_count=16851i,memory_swap=0i,rlimit_memory_data_soft=2147483647i,cpu_time_user=0.11,rlimit_cpu_time_hard=2147483647i,rlimit_num_fds_soft=16384i,rlimit_realtime_priority_soft=0i,cpu_time_system=0.27,cpu_time_nice=0,memory_stack=135168i,rlimit_memory_rss_soft=2147483647i
```
# Measurements
Note: prefix can be set by the user, per process.
Threads related measurement names:
- procstat_[prefix_]num_threads value=5
File descriptor related measurement names (*telegraf* needs to run as **root**):
- procstat_[prefix_]num_fds value=4
Priority related measurement names:
- procstat_[prefix_]realtime_priority value=0
- procstat_[prefix_]nice_priority value=20
Signals related measurement names:
- procstat_[prefix_]signals_pending value=0
Context switch related measurement names:
- procstat_[prefix_]voluntary_context_switches value=250
- procstat_[prefix_]involuntary_context_switches value=0
I/O related measurement names (*telegraf* needs to run as **root**):
- procstat_[prefix_]read_count value=396
- procstat_[prefix_]write_count value=1
- procstat_[prefix_]read_bytes value=1019904
- procstat_[prefix_]write_bytes value=1
CPU related measurement names:
- procstat_[prefix_]cpu_time value=0.01
- procstat_[prefix_]cpu_time_user value=0
- procstat_[prefix_]cpu_time_system value=0.01
- procstat_[prefix_]cpu_time_idle value=0
- procstat_[prefix_]cpu_time_nice value=0
- procstat_[prefix_]cpu_time_iowait value=0
- procstat_[prefix_]cpu_time_irq value=0
- procstat_[prefix_]cpu_time_soft_irq value=0
- procstat_[prefix_]cpu_time_steal value=0
- procstat_[prefix_]cpu_time_stolen value=0
- procstat_[prefix_]cpu_time_guest value=0
- procstat_[prefix_]cpu_time_guest_nice value=0
Memory related measurement names:
- procstat_[prefix_]memory_rss value=1777664
- procstat_[prefix_]memory_vms value=24227840
- procstat_[prefix_]memory_swap value=282624
- procstat_[prefix_]memory_data value=229376
- procstat_[prefix_]memory_stack value=135168
- procstat_[prefix_]memory_locked value=0
Resource limits:
- procstat_[prefix_]rlimit_cpu_time_hard value=2147483647
- procstat_[prefix_]rlimit_cpu_time_soft value=2147483647
- procstat_[prefix_]rlimit_file_locks_hard value=2147483647
- procstat_[prefix_]rlimit_file_locks_soft value=2147483647
- procstat_[prefix_]rlimit_memory_data_hard value=2147483647
- procstat_[prefix_]rlimit_memory_data_soft value=2147483647
- procstat_[prefix_]rlimit_memory_locked_hard value=65536
- procstat_[prefix_]rlimit_memory_locked_soft value=65536
- procstat_[prefix_]rlimit_memory_rss_hard value=2147483647
- procstat_[prefix_]rlimit_memory_rss_soft value=2147483647
- procstat_[prefix_]rlimit_memory_stack_hard value=2147483647
- procstat_[prefix_]rlimit_memory_stack_soft value=8388608
- procstat_[prefix_]rlimit_memory_vms_hard value=2147483647
- procstat_[prefix_]rlimit_memory_vms_soft value=2147483647
- procstat_[prefix_]rlimit_nice_priority_hard value=0
- procstat_[prefix_]rlimit_nice_priority_soft value=0
- procstat_[prefix_]rlimit_num_fds_hard value=16384
- procstat_[prefix_]rlimit_num_fds_soft value=16384
- procstat_[prefix_]rlimit_realtime_priority_hard value=0
- procstat_[prefix_]rlimit_realtime_priority_soft value=0
- procstat_[prefix_]rlimit_signals_pending_hard value=1758
- procstat_[prefix_]rlimit_signals_pending_soft value=1758
*NOTE: Due to a limitation in an underlying library Telegraf uses, any resource limit > 2147483647 will be misreported as 2147483647.*
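The `[prefix_]` notation used in the names above can be illustrated with a short Go sketch. It assumes the configured prefix is joined to each name with an underscore and that an empty prefix leaves names unchanged; `prefixFields` is a hypothetical helper, not a function from the plugin.

```go
package main

import "fmt"

// prefixFields returns a copy of fields with "<prefix>_" prepended to every
// name. An empty prefix leaves the names as they are. Hypothetical helper.
func prefixFields(prefix string, fields map[string]interface{}) map[string]interface{} {
	out := make(map[string]interface{}, len(fields))
	for name, value := range fields {
		if prefix != "" {
			name = prefix + "_" + name
		}
		out[name] = value
	}
	return out
}

func main() {
	fields := map[string]interface{}{"num_threads": 5, "num_fds": 4}
	fmt.Println(prefixFields("influxd", fields))
}
```

Setting a distinct prefix per `[[inputs.procstat]]` block is one way to keep the values for different processes apart when they would otherwise share the same names.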


@ -41,12 +41,6 @@ type Procstat struct {
}
var sampleConfig = `
## pidFinder can be pgrep or native
## pgrep tries to exec pgrep
## native will work on all platforms, unix systems will use regexp.
## Windows will use WMI calls with like queries
pid_finder = "native"
## Must specify one of: pid_file, exe, or pattern
## PID file to monitor process
pid_file = "/var/run/nginx.pid"
## executable name (ie, pgrep <exe>)
@ -63,12 +57,20 @@ var sampleConfig = `
## override for process_name
## This is optional; default is sourced from /proc/<pid>/status
# process_name = "bar"
## Field name prefix
prefix = ""
## comment this out if you want raw cpu_time stats
fielddrop = ["cpu_time_*"]
## This is optional; moves pid into a tag instead of a field
pid_tag = false
# prefix = ""
## Add PID as a tag instead of a field; useful to differentiate between
## processes whose tags are otherwise the same. Can create a large number
## of series, use judiciously.
# pid_tag = false
## Method to use when finding process IDs. Can be one of 'pgrep' or
## 'native'. The pgrep finder calls the pgrep executable in the PATH while
## the native finder performs the search directly in a manner dependent on the
## platform. Default is 'pgrep'
# pid_finder = "pgrep"
`
func (_ *Procstat) SampleConfig() string {
@ -308,7 +310,7 @@ func (p *Procstat) findPids() ([]PID, map[string]string, error) {
pids, err = p.cgroupPIDs()
tags = map[string]string{"cgroup": p.CGroup}
} else {
err = fmt.Errorf("Either exe, pid_file, user, or pattern has to be specified")
err = fmt.Errorf("Either exe, pid_file, user, pattern, systemd_unit, or cgroup must be specified")
}
return pids, tags, err