S.M.A.R.T. Input Plugin
Get metrics using the command line utility smartctl for S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology) storage devices. SMART is a monitoring system included in computer hard disk drives (HDDs) and solid-state drives (SSDs) that detects and reports on various indicators of drive reliability, with the intent of enabling the anticipation of hardware failures.
See smartmontools (https://www.smartmontools.org/).
If no devices are specified, the plugin will scan for SMART devices via the following command:
smartctl --scan
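As a rough illustration of how scan output can be turned into a device list, the sketch below assumes the usual `smartctl --scan` line shape (`<device> -d <type> # <comment>`) and drops any device named in an exclude list. The function name `parse_scan_output` and the matching rule (compare the first word of each entry) are assumptions made for this example, not the plugin's actual code.

```python
def parse_scan_output(output, excludes=()):
    """Return 'device -d type' strings found in `smartctl --scan` output."""
    devices = []
    for line in output.splitlines():
        # Everything after '#' is a human-readable comment.
        spec = line.split("#", 1)[0].strip()
        if not spec:
            continue
        # Assumption: an exclude entry matches on the device path only.
        name = spec.split()[0]
        if name in excludes:
            continue
        devices.append(spec)
    return devices


sample = (
    "/dev/sda -d scsi # /dev/sda, SCSI device\n"
    "/dev/sdb -d scsi # /dev/sdb, SCSI device\n"
)
print(parse_scan_output(sample, excludes=["/dev/sdb"]))
```

Each returned string has the same "device plus optional type" form that the devices option accepts.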
Metrics will be reported from the following smartctl command:
smartctl --info --attributes --health -n <nocheck> --format=brief <device>
This plugin supports smartmontools version 5.41 and above, but versions 5.41 and 5.42 might require setting nocheck; see the comment in the sample configuration.
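To make the relationship between the configuration options and the command line concrete, here is a minimal sketch that assembles the argument list shown above. The helper name `build_smartctl_command` is hypothetical, and prefixing with `sudo -n` (non-interactive sudo) is an assumption about how a privileged call might be made, not a statement of what the plugin does internally.

```python
def build_smartctl_command(device, nocheck="standby",
                           path="/usr/bin/smartctl", use_sudo=False):
    """Assemble the per-device smartctl invocation as an argument list."""
    args = [path, "--info", "--attributes", "--health",
            "-n", nocheck, "--format=brief"]
    # A device entry may carry a type, e.g. "/dev/ada0 -d atacam".
    args += device.split()
    if use_sudo:
        # Assumption: -n makes sudo fail instead of prompting for a password.
        args = ["sudo", "-n"] + args
    return args


print(build_smartctl_command("/dev/ada0 -d atacam", nocheck="never"))
```

Passing the result to `subprocess.run` as a list avoids shell quoting issues with device paths.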
To enable SMART on a storage device run:
smartctl -s on <device>
Configuration
# Read metrics from storage devices supporting S.M.A.R.T.
[[inputs.smart]]
## Optionally specify the path to the smartctl executable
# path = "/usr/bin/smartctl"
## On most platforms smartctl requires root access.
## Setting 'use_sudo' to true will make use of sudo to run smartctl.
## Sudo must be configured to allow the telegraf user to run smartctl
## without a password.
# use_sudo = false
## Skip checking disks in this power mode. Defaults to
## "standby" to not wake up disks that have stopped rotating.
## See --nocheck in the man pages for smartctl.
## smartctl version 5.41 and 5.42 have faulty detection of
## power mode and might require changing this value to
## "never" depending on your disks.
# nocheck = "standby"
## Gather detailed metrics for each SMART Attribute.
# attributes = false
## Optionally specify devices to exclude from reporting.
# excludes = [ "/dev/pass6" ]
## Optionally specify devices and device type. If unset,
## a scan (smartctl --scan) for S.M.A.R.T. devices will be
## done and all devices found will be included, except for
## those listed in excludes.
# devices = [ "/dev/ada0 -d atacam" ]
Permissions
This plugin runs smartctl, which may require additional permissions to execute successfully. Depending on the user/group permissions of the telegraf user executing this plugin, you may need to use sudo.
You will need the following in your telegraf config:
[[inputs.smart]]
use_sudo = true
You will also need to update your sudoers file:
$ visudo
# Add the following lines:
Cmnd_Alias SMARTCTL = /usr/bin/smartctl
telegraf ALL=(ALL) NOPASSWD: SMARTCTL
Defaults!SMARTCTL !logfile, !syslog, !pam_session
Metrics
- smart_device:
  - tags:
    - capacity
    - device
    - device_model
    - enabled
    - health
    - serial_no
    - wwn
  - fields:
    - exit_status
    - health_ok
    - read_error_rate
    - seek_error
    - temp_c
    - udma_crc_errors
- smart_attribute:
  - tags:
    - device
    - fail
    - flags
    - id
    - name
    - serial_no
    - wwn
  - fields:
    - exit_status
    - raw_value
    - threshold
    - value
    - worst
Flags
The interpretation of the tag flags is:
- K: auto-keep
- C: event count
- R: error rate
- S: speed/performance
- O: updated online
- P: prefailure warning
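The flags tag arrives as a short string such as `-O-RC-`, where a dash marks an unset flag. A small sketch of expanding it into the meanings listed above (the name `decode_flags` is hypothetical):

```python
FLAG_MEANINGS = {
    "P": "prefailure warning",
    "O": "updated online",
    "S": "speed/performance",
    "R": "error rate",
    "C": "event count",
    "K": "auto-keep",
}


def decode_flags(flags):
    """Expand a flags tag such as '-O-RC-' into readable meanings."""
    # Dashes and any unrecognized characters are skipped.
    return [FLAG_MEANINGS[ch] for ch in flags if ch in FLAG_MEANINGS]


print(decode_flags("-O-RC-"))
# ['updated online', 'error rate', 'event count']
```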
Exit Status
The exit_status field captures the exit status of the smartctl command, which is defined by a bitmask. For the interpretation of the bitmask, see the man page for smartctl.
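Because exit_status is a bitmask, a single value can report several conditions at once. The sketch below tests each bit in turn; the descriptions are paraphrased summaries of the smartctl man page as I understand it, so treat them as an assumption and check them against the man page for your installed version.

```python
# Bit number -> short summary (paraphrased; verify against `man smartctl`).
EXIT_STATUS_BITS = {
    0: "command line did not parse",
    1: "device open failed, or device is in a low-power mode",
    2: "a SMART or other ATA command failed, or a checksum error occurred",
    3: "SMART status check returned DISK FAILING",
    4: "prefail attributes found at or below threshold",
    5: "attributes were at or below threshold at some time in the past",
    6: "device error log contains records of errors",
    7: "device self-test log contains records of errors",
}


def decode_exit_status(status):
    """Return the summaries for every bit set in a smartctl exit status."""
    return [text for bit, text in EXIT_STATUS_BITS.items()
            if status & (1 << bit)]


for reason in decode_exit_status(64 | 128):
    print(reason)
```

An exit_status of 0 decodes to an empty list, meaning none of these conditions were reported.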
Device Names
Device names, e.g., /dev/sda, are not persistent, and may be subject to change across reboots or system changes. Instead, you can use the World Wide Name (WWN) or serial number to identify devices. On Linux, block devices can be referenced by the WWN in the following location: /dev/disk/by-id/.
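To see which kernel device name a WWN currently points to, the symlinks under /dev/disk/by-id/ can be resolved. This sketch assumes the common Linux convention of entries named `wwn-<id>` with `-partN` suffixes for partitions; the function name `wwn_to_device` is hypothetical.

```python
import glob
import os


def wwn_to_device(by_id_dir="/dev/disk/by-id"):
    """Map each whole-disk WWN link to the device node it resolves to."""
    mapping = {}
    for link in sorted(glob.glob(os.path.join(by_id_dir, "wwn-*"))):
        name = os.path.basename(link)
        if "-part" in name:
            continue  # skip partition links, keep whole disks only
        # Strip the "wwn-" prefix and follow the symlink.
        mapping[name[len("wwn-"):]] = os.path.realpath(link)
    return mapping


for wwn, device in wwn_to_device().items():
    print(wwn, "->", device)
```

On systems without /dev/disk/by-id/ the function simply returns an empty mapping.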
To run smartctl with sudo, create a wrapper script and use path in the configuration to execute it.
Troubleshooting
If this plugin is not working as expected for your SMART-enabled device, please run these commands and include the output in a bug report:
smartctl --scan
Run the following command, replacing NOCHECK with your configuration setting and DEVICE with a device from the previous command:
smartctl --info --health --attributes --tolerance=verypermissive --nocheck NOCHECK --format=brief -d DEVICE
Example Output
smart_device,enabled=Enabled,host=mbpro.local,device=rdisk0,model=APPLE\ SSD\ SM0512F,serial_no=S1K5NYCD964433,wwn=5002538655584d30,capacity=500277790720 udma_crc_errors=0i,exit_status=0i,health_ok=true,read_error_rate=0i,temp_c=40i 1502536854000000000
smart_attribute,serial_no=S1K5NYCD964433,wwn=5002538655584d30,id=199,name=UDMA_CRC_Error_Count,flags=-O-RC-,fail=-,host=mbpro.local,device=rdisk0 threshold=0i,raw_value=0i,exit_status=0i,value=200i,worst=200i 1502536854000000000
smart_attribute,device=rdisk0,serial_no=S1K5NYCD964433,wwn=5002538655584d30,id=240,name=Unknown_SSD_Attribute,flags=-O---K,fail=-,host=mbpro.local exit_status=0i,value=100i,worst=100i,threshold=0i,raw_value=0i 1502536854000000000