# ZFS plugin
This ZFS plugin provides metrics from your ZFS filesystems. It supports ZFS on
Linux and FreeBSD. It reads ZFS stats from `/proc/spl/kstat/zfs` on Linux and
from `sysctl` and `zpool` on FreeBSD.
### Configuration:
```toml
[[inputs.zfs]]
## ZFS kstat path. Ignored on FreeBSD
## If not specified, then default is:
# kstatPath = "/proc/spl/kstat/zfs"

## By default, telegraf gathers all zfs stats
## Override the stats list using the kstatMetrics array:
## For FreeBSD, the default is:
# kstatMetrics = ["arcstats", "zfetchstats", "vdev_cache_stats"]
## For Linux, the default is:
# kstatMetrics = ["abdstats", "arcstats", "dnodestats", "dbufcachestats",
#   "dmu_tx", "fm", "vdev_mirror_stats", "zfetchstats", "zil"]

## By default, don't gather zpool stats
# poolMetrics = false
```
### Measurements & Fields:
By default this plugin collects metrics about ZFS internals.
These metrics are either counters or sizes in bytes. They are stored in
the `zfs` measurement with the field names listed below.

If `poolMetrics` is enabled, additional metrics are gathered for
each pool.

- zfs

    With fields listed below.

#### ARC Stats (FreeBSD and Linux)
- arcstats_allocated (FreeBSD only)
- arcstats_anon_evict_data (Linux only)
- arcstats_anon_evict_metadata (Linux only)
- arcstats_anon_evictable_data (FreeBSD only)
- arcstats_anon_evictable_metadata (FreeBSD only)
- arcstats_anon_size
- arcstats_arc_loaned_bytes (Linux only)
- arcstats_arc_meta_limit
- arcstats_arc_meta_max
- arcstats_arc_meta_min (FreeBSD only)
- arcstats_arc_meta_used
- arcstats_arc_no_grow (Linux only)
- arcstats_arc_prune (Linux only)
- arcstats_arc_tempreserve (Linux only)
- arcstats_c
- arcstats_c_max
- arcstats_c_min
- arcstats_data_size
- arcstats_deleted
- arcstats_demand_data_hits
- arcstats_demand_data_misses
- arcstats_demand_hit_predictive_prefetch (FreeBSD only)
- arcstats_demand_metadata_hits
- arcstats_demand_metadata_misses
- arcstats_duplicate_buffers
- arcstats_duplicate_buffers_size
- arcstats_duplicate_reads
- arcstats_evict_l2_cached
- arcstats_evict_l2_eligible
- arcstats_evict_l2_ineligible
- arcstats_evict_l2_skip (FreeBSD only)
- arcstats_evict_not_enough (FreeBSD only)
- arcstats_evict_skip
- arcstats_hash_chain_max
- arcstats_hash_chains
- arcstats_hash_collisions
- arcstats_hash_elements
- arcstats_hash_elements_max
- arcstats_hdr_size
- arcstats_hits
- arcstats_l2_abort_lowmem
- arcstats_l2_asize
- arcstats_l2_cdata_free_on_write
- arcstats_l2_cksum_bad
- arcstats_l2_compress_failures
- arcstats_l2_compress_successes
- arcstats_l2_compress_zeros
- arcstats_l2_evict_l1cached (FreeBSD only)
- arcstats_l2_evict_lock_retry
- arcstats_l2_evict_reading
- arcstats_l2_feeds
- arcstats_l2_free_on_write
- arcstats_l2_hdr_size
- arcstats_l2_hits
- arcstats_l2_io_error
- arcstats_l2_misses
- arcstats_l2_read_bytes
- arcstats_l2_rw_clash
- arcstats_l2_size
- arcstats_l2_write_buffer_bytes_scanned (FreeBSD only)
- arcstats_l2_write_buffer_iter (FreeBSD only)
- arcstats_l2_write_buffer_list_iter (FreeBSD only)
- arcstats_l2_write_buffer_list_null_iter (FreeBSD only)
- arcstats_l2_write_bytes
- arcstats_l2_write_full (FreeBSD only)
- arcstats_l2_write_in_l2 (FreeBSD only)
- arcstats_l2_write_io_in_progress (FreeBSD only)
- arcstats_l2_write_not_cacheable (FreeBSD only)
- arcstats_l2_write_passed_headroom (FreeBSD only)
- arcstats_l2_write_pios (FreeBSD only)
- arcstats_l2_write_spa_mismatch (FreeBSD only)
- arcstats_l2_write_trylock_fail (FreeBSD only)
- arcstats_l2_writes_done
- arcstats_l2_writes_error
- arcstats_l2_writes_hdr_miss (Linux only)
- arcstats_l2_writes_lock_retry (FreeBSD only)
- arcstats_l2_writes_sent
- arcstats_memory_direct_count (Linux only)
- arcstats_memory_indirect_count (Linux only)
- arcstats_memory_throttle_count
- arcstats_meta_size (Linux only)
- arcstats_mfu_evict_data (Linux only)
- arcstats_mfu_evict_metadata (Linux only)
- arcstats_mfu_ghost_evict_data (Linux only)
- arcstats_mfu_ghost_evict_metadata (Linux only)
- arcstats_metadata_size (FreeBSD only)
- arcstats_mfu_evictable_data (FreeBSD only)
- arcstats_mfu_evictable_metadata (FreeBSD only)
- arcstats_mfu_ghost_evictable_data (FreeBSD only)
- arcstats_mfu_ghost_evictable_metadata (FreeBSD only)
- arcstats_mfu_ghost_hits
- arcstats_mfu_ghost_size
- arcstats_mfu_hits
- arcstats_mfu_size
- arcstats_misses
- arcstats_mru_evict_data (Linux only)
- arcstats_mru_evict_metadata (Linux only)
- arcstats_mru_ghost_evict_data (Linux only)
- arcstats_mru_ghost_evict_metadata (Linux only)
- arcstats_mru_evictable_data (FreeBSD only)
- arcstats_mru_evictable_metadata (FreeBSD only)
- arcstats_mru_ghost_evictable_data (FreeBSD only)
- arcstats_mru_ghost_evictable_metadata (FreeBSD only)
- arcstats_mru_ghost_hits
- arcstats_mru_ghost_size
- arcstats_mru_hits
- arcstats_mru_size
- arcstats_mutex_miss
- arcstats_other_size
- arcstats_p
- arcstats_prefetch_data_hits
- arcstats_prefetch_data_misses
- arcstats_prefetch_metadata_hits
- arcstats_prefetch_metadata_misses
- arcstats_recycle_miss (Linux only)
- arcstats_size
- arcstats_sync_wait_for_async (FreeBSD only)
#### Zfetch Stats (FreeBSD and Linux)
- zfetchstats_bogus_streams (Linux only)
- zfetchstats_colinear_hits (Linux only)
- zfetchstats_colinear_misses (Linux only)
- zfetchstats_hits
- zfetchstats_max_streams (FreeBSD only)
- zfetchstats_misses
- zfetchstats_reclaim_failures (Linux only)
- zfetchstats_reclaim_successes (Linux only)
- zfetchstats_streams_noresets (Linux only)
- zfetchstats_streams_resets (Linux only)
- zfetchstats_stride_hits (Linux only)
- zfetchstats_stride_misses (Linux only)
#### Vdev Cache Stats (FreeBSD)
- vdev_cache_stats_delegations
- vdev_cache_stats_hits
- vdev_cache_stats_misses
#### Pool Metrics (optional)
On Linux (reference: kstat accumulated time and queue length statistics):

- zfs_pool
    - nread (integer, bytes)
    - nwritten (integer, bytes)
    - reads (integer, count)
    - writes (integer, count)
    - wtime (integer, nanoseconds)
    - wlentime (integer, queuelength * nanoseconds)
    - wupdate (integer, timestamp)
    - rtime (integer, nanoseconds)
    - rlentime (integer, queuelength * nanoseconds)
    - rupdate (integer, timestamp)
    - wcnt (integer, count)
    - rcnt (integer, count)

On FreeBSD:

- zfs_pool
    - allocated (integer, bytes)
    - capacity (integer, percent)
    - dedupratio (float, ratio)
    - free (integer, bytes)
    - size (integer, bytes)
    - fragmentation (integer, percent)

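The Linux pool fields are cumulative, so they are most useful as differences between two collections. The following is a minimal Python sketch, not part of the plugin. It assumes the usual kstat I/O convention that `rlentime` accumulates queue length multiplied by time and `rtime` accumulates busy time. The function name `pool_io_rates`, the sample values, and the 10-second interval are hypothetical.

```python
NS_PER_S = 1_000_000_000


def pool_io_rates(prev, curr, elapsed_ns):
    """Derive per-interval rates from two zfs_pool samples (Linux fields)."""
    delta = {key: curr[key] - prev[key] for key in prev}
    return {
        "read_bytes_per_s": delta["nread"] * NS_PER_S / elapsed_ns,
        "write_bytes_per_s": delta["nwritten"] * NS_PER_S / elapsed_ns,
        # Assumed kstat convention: rlentime integrates queue length over time.
        "avg_run_queue_len": delta["rlentime"] / elapsed_ns,
        # Assumed kstat convention: rtime is the accumulated busy time.
        "run_busy_fraction": delta["rtime"] / elapsed_ns,
    }


# Hypothetical samples taken 10 seconds apart.
prev = {"nread": 0, "nwritten": 0, "rtime": 0, "rlentime": 0}
curr = {
    "nread": 10_000_000,
    "nwritten": 5_000_000,
    "rtime": 5_000_000_000,
    "rlentime": 20_000_000_000,
}
rates = pool_io_rates(prev, curr, elapsed_ns=10 * NS_PER_S)
print(rates)
```

In practice the elapsed time would come from the difference between the two collection timestamps.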
### Tags:
- ZFS stats (`zfs`) will have the following tag:
    - pools - A `::` concatenated list of all ZFS pools on the machine.

- Pool metrics (`zfs_pool`) will have the following tags:
    - pool - with the name of the pool which the metrics are for.
    - health - the health status of the pool. (FreeBSD only)

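Because the `pools` tag packs every pool name into one string, a consumer has to split it again. A small Python sketch of that step; `split_pools` is a hypothetical helper, and `tank` is a made-up second pool name:

```python
def split_pools(tag_value):
    """Return the pool names packed into the `pools` tag."""
    return [name for name in tag_value.split("::") if name]


print(split_pools("zroot"))
print(split_pools("zroot::tank"))
```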
### Example Output:
```
$ ./telegraf --config telegraf.conf --input-filter zfs --test
* Plugin: zfs, Collection 1
> zfs_pool,health=ONLINE,pool=zroot allocated=1578590208i,capacity=2i,dedupratio=1,fragmentation=1i,free=64456531968i,size=66035122176i 1464473103625653908
> zfs,pools=zroot arcstats_allocated=4167764i,arcstats_anon_evictable_data=0i,arcstats_anon_evictable_metadata=0i,arcstats_anon_size=16896i,arcstats_arc_meta_limit=10485760i,arcstats_arc_meta_max=115269568i,arcstats_arc_meta_min=8388608i,arcstats_arc_meta_used=51977456i,arcstats_c=16777216i,arcstats_c_max=41943040i,arcstats_c_min=16777216i,arcstats_data_size=0i,arcstats_deleted=1699340i,arcstats_demand_data_hits=14836131i,arcstats_demand_data_misses=2842945i,arcstats_demand_hit_predictive_prefetch=0i,arcstats_demand_metadata_hits=1655006i,arcstats_demand_metadata_misses=830074i,arcstats_duplicate_buffers=0i,arcstats_duplicate_buffers_size=0i,arcstats_duplicate_reads=123i,arcstats_evict_l2_cached=0i,arcstats_evict_l2_eligible=332172623872i,arcstats_evict_l2_ineligible=6168576i,arcstats_evict_l2_skip=0i,arcstats_evict_not_enough=12189444i,arcstats_evict_skip=195190764i,arcstats_hash_chain_max=2i,arcstats_hash_chains=10i,arcstats_hash_collisions=43134i,arcstats_hash_elements=2268i,arcstats_hash_elements_max=6136i,arcstats_hdr_size=565632i,arcstats_hits=16515778i,arcstats_l2_abort_lowmem=0i,arcstats_l2_asize=0i,arcstats_l2_cdata_free_on_write=0i,arcstats_l2_cksum_bad=0i,arcstats_l2_compress_failures=0i,arcstats_l2_compress_successes=0i,arcstats_l2_compress_zeros=0i,arcstats_l2_evict_l1cached=0i,arcstats_l2_evict_lock_retry=0i,arcstats_l2_evict_reading=0i,arcstats_l2_feeds=0i,arcstats_l2_free_on_write=0i,arcstats_l2_hdr_size=0i,arcstats_l2_hits=0i,arcstats_l2_io_error=0i,arcstats_l2_misses=0i,arcstats_l2_read_bytes=0i,arcstats_l2_rw_clash=0i,arcstats_l2_size=0i,arcstats_l2_write_buffer_bytes_scanned=0i,arcstats_l2_write_buffer_iter=0i,arcstats_l2_write_buffer_list_iter=0i,arcstats_l2_write_buffer_list_null_iter=0i,arcstats_l2_write_bytes=0i,arcstats_l2_write_full=0i,arcstats_l2_write_in_l2=0i,arcstats_l2_write_io_in_progress=0i,arcstats_l2_write_not_cacheable=380i,arcstats_l2_write_passed_headroom=0i,arcstats_l2_write_pios=0i,arcstats_l2_write_spa_mismatch=0i,arcstats_l2_write_trylock_fail=0i,arcstats_l2_writes_done=0i,arcstats_l2_writes_error=0i,arcstats_l2_writes_lock_retry=0i,arcstats_l2_writes_sent=0i,arcstats_memory_throttle_count=0i,arcstats_metadata_size=17014784i,arcstats_mfu_evictable_data=0i,arcstats_mfu_evictable_metadata=16384i,arcstats_mfu_ghost_evictable_data=5723648i,arcstats_mfu_ghost_evictable_metadata=10709504i,arcstats_mfu_ghost_hits=1315619i,arcstats_mfu_ghost_size=16433152i,arcstats_mfu_hits=7646611i,arcstats_mfu_size=305152i,arcstats_misses=3676993i,arcstats_mru_evictable_data=0i,arcstats_mru_evictable_metadata=0i,arcstats_mru_ghost_evictable_data=0i,arcstats_mru_ghost_evictable_metadata=80896i,arcstats_mru_ghost_hits=324250i,arcstats_mru_ghost_size=80896i,arcstats_mru_hits=8844526i,arcstats_mru_size=16693248i,arcstats_mutex_miss=354023i,arcstats_other_size=34397040i,arcstats_p=4172800i,arcstats_prefetch_data_hits=0i,arcstats_prefetch_data_misses=0i,arcstats_prefetch_metadata_hits=24641i,arcstats_prefetch_metadata_misses=3974i,arcstats_size=51977456i,arcstats_sync_wait_for_async=0i,vdev_cache_stats_delegations=779i,vdev_cache_stats_hits=323123i,vdev_cache_stats_misses=59929i,zfetchstats_hits=0i,zfetchstats_max_streams=0i,zfetchstats_misses=0i 1464473103634124908
```
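Each line of the test output holds a measurement name, its tags, its fields, and a timestamp. As a rough illustration of how to read it, the Python sketch below splits one such line into those parts. It is a deliberately simplified parser, not a full line protocol implementation: it assumes no escaped spaces or commas and no quoted string fields, which holds for the output shown here. `parse_line` is a hypothetical helper name.

```python
def parse_line(line):
    """Split one output line into (measurement, tags, fields, timestamp).

    Simplified: assumes no escaping and no quoted string field values.
    """
    line = line.lstrip("> ").strip()
    head, field_part, timestamp = line.split(" ")
    measurement, *tag_pairs = head.split(",")
    tags = dict(pair.split("=", 1) for pair in tag_pairs)
    fields = {}
    for pair in field_part.split(","):
        key, value = pair.split("=", 1)
        # A trailing "i" marks an integer; everything else is read as a float.
        fields[key] = int(value[:-1]) if value.endswith("i") else float(value)
    return measurement, tags, fields, int(timestamp)


sample = (
    "> zfs_pool,health=ONLINE,pool=zroot "
    "allocated=1578590208i,capacity=2i,dedupratio=1,fragmentation=1i,"
    "free=64456531968i,size=66035122176i 1464473103625653908"
)
measurement, tags, fields, timestamp = parse_line(sample)
print(measurement, tags["pool"], fields["allocated"] / fields["size"])
```

For this FreeBSD sample, `allocated` plus `free` equals `size`, which makes a handy sanity check on the parsed values.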
### Description
A short description for some of the metrics.

#### ARC Stats

`arcstats_hits` Total number of cache hits in the ARC.

`arcstats_misses` Total number of cache misses in the ARC.

`arcstats_demand_data_hits` Number of cache hits for demand data. This is what matters (is good) for your application/share.

`arcstats_demand_data_misses` Number of cache misses for demand data. This is what matters (is bad) for your application/share.

`arcstats_demand_metadata_hits` Number of cache hits for demand metadata. This matters (is good) for getting filesystem data (`ls`, `find`, …).

`arcstats_demand_metadata_misses` Number of cache misses for demand metadata. This matters (is bad) for getting filesystem data (`ls`, `find`, …).

`arcstats_prefetch_data_hits` The ZFS prefetcher tried to prefetch something, but it was already cached (boring).

`arcstats_prefetch_data_misses` The ZFS prefetcher prefetched something which was not in the cache (good job, could become a demand hit in the future).

`arcstats_prefetch_metadata_hits` Same as `arcstats_prefetch_data_hits`, but for metadata.

`arcstats_prefetch_metadata_misses` Same as `arcstats_prefetch_data_misses`, but for metadata.

`arcstats_mru_hits` Cache hit in the “most recently used cache”; we move this to the MFU cache.

`arcstats_mru_ghost_hits` Cache hit in the “most recently used ghost list”. We had this item in the cache, but evicted it; maybe we should increase the MRU cache size.

`arcstats_mfu_hits` Cache hit in the “most frequently used cache”; we move this to the beginning of the MFU cache.

`arcstats_mfu_ghost_hits` Cache hit in the “most frequently used ghost list”. We had this item in the cache, but evicted it; maybe we should increase the MFU cache size.

`arcstats_allocated` New data is written to the cache.

`arcstats_deleted` Old data is evicted (deleted) from the cache.

`arcstats_evict_l2_cached` We evicted something from the ARC, but it's still cached in the L2 if we need it.

`arcstats_evict_l2_eligible` We evicted something from the ARC, and it's not in the L2. This is sad (maybe we hadn't had enough time to store it there).

`arcstats_evict_l2_ineligible` We evicted something which cannot be stored in the L2.
Reasons could be:

- We have multiple pools, and we evicted something from a pool without an L2 device.
- The ZFS `secondarycache` property.

`arcstats_c` ARC target size. This is the size the system thinks the ARC should have.

`arcstats_size` Total size of the ARC.

`arcstats_l2_hits` Hits to the L2 cache. (It was not in the ARC, but in the L2 cache.)

`arcstats_l2_misses` Misses to the L2 cache. (It was not in the ARC, and not in the L2 cache.)

`arcstats_l2_size` Size of the L2 cache.

`arcstats_l2_hdr_size` Size of the metadata in the ARC (RAM) used to manage the L2 cache (to look up whether something is in the L2).

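A common use of the hit and miss counters is an ARC hit ratio. The Python sketch below is one way to compute it, assuming the counters are cumulative so that a ratio for a time window needs the difference between two samples. The function names are hypothetical, and the input numbers are the `arcstats_hits` and `arcstats_misses` values from the example output in this README.

```python
def hit_ratio(hits, misses):
    """Fraction of lookups served from the cache, or None without lookups."""
    total = hits + misses
    return hits / total if total else None


def interval_hit_ratio(prev, curr):
    """Hit ratio between two samples of the cumulative ARC counters."""
    return hit_ratio(
        curr["arcstats_hits"] - prev["arcstats_hits"],
        curr["arcstats_misses"] - prev["arcstats_misses"],
    )


# Counter values taken from the example output.
lifetime = hit_ratio(16515778, 3676993)
print(f"ARC hit ratio since the counters started: {lifetime:.3f}")
```

The same helpers work for the demand, prefetch, and L2 hit/miss pairs.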
#### Zfetch Stats

`zfetchstats_hits` Counts the number of cache hits to items which are in the cache because of the prefetcher.

`zfetchstats_misses` Counts the number of prefetch cache misses.

`zfetchstats_colinear_hits` Counts the number of cache hits to items which are in the cache because of the prefetcher (prefetched linear reads).

`zfetchstats_stride_hits` Counts the number of cache hits to items which are in the cache because of the prefetcher (prefetched stride reads).

#### Vdev Cache Stats (FreeBSD only)

Note: the vdev cache is deprecated in some ZFS implementations.

`vdev_cache_stats_hits` Hits to the vdev (device level) cache.

`vdev_cache_stats_misses` Misses to the vdev (device level) cache.

#### ABD Stats (Linux only)

ABD is a linear/scatter dual typed buffer for ARC.

`abdstats_linear_cnt` Number of linear ABDs which are currently allocated.

`abdstats_linear_data_size` Amount of data stored in all linear ABDs.

`abdstats_scatter_cnt` Number of scatter ABDs which are currently allocated.

`abdstats_scatter_data_size` Amount of data stored in all scatter ABDs.

#### DMU Stats (Linux only)

`dmu_tx_dirty_throttle` Counts when writes are throttled due to the amount of dirty data growing too large.

`dmu_tx_memory_reclaim` Counts when memory is low and throttling activity.

`dmu_tx_memory_reserve` Counts when the memory footprint of the txg exceeds the ARC size.

#### Fault Management Ereport errors (Linux only)

`fm_erpt-dropped` Counts when an error report cannot be created (e.g. available memory is too low).

#### ZIL (Linux only)

Note: ZIL measurements are system-wide, neither per-pool nor per-dataset.

`zil_commit_count` Counts when ZFS transactions are committed to a ZIL.