Rewrite configuration documentation (#5227)

2019-01-07 14:31:10 -08:00
parent 3621bcf5a6
commit 0ceb10e017
1 changed files with 301 additions and 235 deletions
--- a/docs/CONFIGURATION.md
+++ b/docs/CONFIGURATION.md
@@ -1,26 +1,37 @@
 # Configuration

-Telegraf's configuration file is written using
-[TOML](https://github.com/toml-lang/toml#toml).
+Telegraf's configuration file is written using [TOML][] and is composed of
+three sections: [global tags][], [agent][] settings, and [plugins][].

-[View the telegraf.conf config file with all available
-plugins](/etc/telegraf.conf).
+View the default [telegraf.conf][] config file with all available plugins.

-## Generating a Configuration File
+### Generating a Configuration File

 A default config file can be generated by telegraf:
-
-```
+```sh
 telegraf config > telegraf.conf
 ```

 To generate a file with specific inputs and outputs, you can use the
 --input-filter and --output-filter flags:

-```
+```sh
 telegraf --input-filter cpu:mem:net:swap --output-filter influxdb:kafka config
 ```

+### Configuration Loading
+
+The location of the configuration file can be set via the `--config` command
+line flag.
+
+When the `--config-directory` command line flag is used, files ending with
+`.conf` in the specified directory will also be included in the Telegraf
+configuration.
+
+On most systems, the default locations are `/etc/telegraf/telegraf.conf` for
+the main configuration file and `/etc/telegraf/telegraf.d` for the directory of
+configuration files.
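+
+As a minimal sketch of how the directory is used, a file such as the
+following could be placed in `/etc/telegraf/telegraf.d`.  The file name
+`cpu.conf` is only an assumption for illustration; any file ending in `.conf`
+in the directory given to `--config-directory` is included alongside the main
+configuration file:
+```toml
+# /etc/telegraf/telegraf.d/cpu.conf (hypothetical file name)
+[[inputs.cpu]]
+  percpu = false
+  totalcpu = true
+```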
+
 ### Environment Variables

 Environment variables can be used anywhere in the config file, simply prepend
@@ -66,81 +77,164 @@ parsed:
  password = "monkey123"
 ```

-### Configuration file locations
+### Intervals

-The location of the configuration file can be set via the `--config` command
-line flag.
-
-When the `--config-directory` command line flag is used files ending with
-`.conf` in the specified directory will also be included in the Telegraf
-configuration.
-
-On most systems, the default locations are `/etc/telegraf/telegraf.conf` for
-the main configuration file and `/etc/telegraf/telegraf.d` for the directory of
-configuration files.
+Intervals are durations of time.  Settings that support them take a string
+value combining an integer and a time unit.  Valid time units are
+`ns`, `us` (or `µs`), `ms`, `s`, `m`, `h`.
+```toml
+[agent]
+  interval = "10s"
+```

 ### Global Tags

-Global tags can be specified in the `[global_tags]` section of the config file
-in key="value" format. All metrics being gathered on this host will be tagged
-with the tags specified here.
+Global tags can be specified in the `[global_tags]` table in key="value"
+format. All metrics that are gathered will be tagged with the tags specified.

-### Agent Configuration
+```toml
+[global_tags]
+  dc = "us-east-1"
+```

-Telegraf has a few options you can configure under the `[agent]` section of the
-config.
+### Agent

-* **interval**: Default data collection interval for all inputs
-* **round_interval**: Rounds collection interval to 'interval'
-ie, if interval="10s" then always collect on :00, :10, :20, etc.
-* **metric_batch_size**: Telegraf will send metrics to output in batch of at
-most metric_batch_size metrics.
-* **metric_buffer_limit**: Telegraf will cache metric_buffer_limit metrics
-for each output, and will flush this buffer on a successful write.
-This should be a multiple of metric_batch_size and could not be less
-than 2 times metric_batch_size.
-* **collection_jitter**: Collection jitter is used to jitter
-the collection by a random amount.
-Each plugin will sleep for a random time within jitter before collecting.
-This can be used to avoid many plugins querying things like sysfs at the
-same time, which can have a measurable effect on the system.
-* **flush_interval**: Default data flushing interval for all outputs.
-You should not set this below
-interval. Maximum flush_interval will be flush_interval + flush_jitter
-* **flush_jitter**: Jitter the flush interval by a random amount.
-This is primarily to avoid
-large write spikes for users running a large number of telegraf instances.
-ie, a jitter of 5s and flush_interval 10s means flushes will happen every 10-15s.
-* **precision**:
-   By default or when set to "0s", precision will be set to the same
-   timestamp order as the collection interval, with the maximum being 1s.
-   Precision will NOT be used for service inputs. It is up to each individual
-   service input to set the timestamp at the appropriate precision.
-   Valid time units are "ns", "us" (or "µs"), "ms", "s".
+The agent table configures Telegraf and the defaults used across all plugins.

-* **logfile**: Specify the log file name. The empty string means to log to stderr.
-* **debug**: Run telegraf in debug mode.
-* **quiet**: Run telegraf in quiet mode (error messages only).
-* **hostname**: Override default hostname, if empty use os.Hostname().
-* **omit_hostname**: If true, do no set the "host" tag in the telegraf agent.
+- **interval**: Default data collection interval for all inputs.

-### Input Configuration
+- **round_interval**: Rounds the collection interval to `interval`.
+  For example, if interval="10s" then always collect on :00, :10, :20, etc.

-The following config parameters are available for all inputs:
+- **metric_batch_size**:
+  Telegraf will send metrics to outputs in batches of at most
+  metric_batch_size metrics.
+  This controls the size of writes that Telegraf sends to output plugins.

-* **interval**: How often to gather this metric. Normal plugins use a single
-global interval, but if one particular input should be run less or more often,
-you can configure that here.
-* **name_override**: Override the base name of the measurement.
-(Default is the name of the input).
-* **name_prefix**: Specifies a prefix to attach to the measurement name.
-* **name_suffix**: Specifies a suffix to attach to the measurement name.
-* **tags**: A map of tags to apply to a specific input's measurements.
+- **metric_buffer_limit**:
+  For failed writes, telegraf will cache metric_buffer_limit metrics for each
+  output, and will flush this buffer on a successful write. Oldest metrics
+  are dropped first when this buffer fills.
+  This buffer only fills when writes fail to output plugin(s).

-The [metric filtering](#metric-filtering) parameters can be used to limit what metrics are
+- **collection_jitter**:
+  Collection jitter is used to jitter the collection by a random amount.
+  Each plugin will sleep for a random time within jitter before collecting.
+  This can be used to avoid many plugins querying things like sysfs at the
+  same time, which can have a measurable effect on the system.
+
+- **flush_interval**:
+  Default flushing interval for all outputs. The maximum time between flushes
+  will be flush_interval + flush_jitter.
+
+- **flush_jitter**:
+  Jitter the flush interval by a random amount. This is primarily to avoid
+  large write spikes for users running a large number of telegraf instances.
+  For example, a jitter of 5s and a flush_interval of 10s means flushes will
+  happen every 10-15s.
+
+- **precision**:
+  Collected metrics are rounded to the precision specified as a duration.
+
+  Precision will NOT be used for service inputs. It is up to each individual
+  service input to set the timestamp at the appropriate precision.
+
+- **debug**:
+  Run telegraf with debug log messages.
+- **quiet**:
+  Run telegraf in quiet mode (error log messages only).
+- **logfile**:
+  Specify the log file name. The empty string means to log to stderr.
+
+- **hostname**:
+  Override the default hostname; if empty, use os.Hostname().
+- **omit_hostname**:
+  If set to true, do not set the "host" tag in the telegraf agent.
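+
+A sketch of an `[agent]` table that sets each of the options above.  The
+values are illustrative assumptions, not recommendations; adjust them for
+your environment:
+```toml
+[agent]
+  interval = "10s"
+  round_interval = true       # collect on :00, :10, :20, etc.
+  metric_batch_size = 1000
+  metric_buffer_limit = 10000
+  collection_jitter = "0s"
+  flush_interval = "10s"
+  flush_jitter = "5s"         # flushes happen every 10-15s
+  precision = "1s"            # round collected metrics to 1s
+  debug = false
+  quiet = false
+  logfile = ""                # empty string logs to stderr
+  hostname = ""               # empty string uses os.Hostname()
+  omit_hostname = false
+```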
+
+### Plugins
+
+Telegraf plugins are divided into 4 types: [inputs][], [outputs][],
+[processors][], and [aggregators][].
+
+Unlike the `global_tags` and `agent` tables, any plugin can be defined
+multiple times and each instance will run independently.  This allows you to
+have plugins defined with differing configurations as needed within a single
+Telegraf process.
+
+Each plugin has a unique set of configuration options; refer to the
+sample configuration for details.  Additionally, several options are available
+on any plugin depending on its type.
+
+### Input Plugins
+
+Input plugins gather and create metrics.  They support both polling and
+event-driven operation.
+
+Parameters that can be used with any input plugin:
+
+- **interval**: How often to gather this metric. Normal plugins use a single
+  global interval, but if one particular input should be run less or more
+  often, you can configure that here.
+- **name_override**: Override the base name of the measurement.  (Default is
+  the name of the input).
+- **name_prefix**: Specifies a prefix to attach to the measurement name.
+- **name_suffix**: Specifies a suffix to attach to the measurement name.
+- **tags**: A map of tags to apply to a specific input's measurements.
+
+The [metric filtering][] parameters can be used to limit what metrics are
 emitted from the input plugin.

-### Output Configuration
+#### Examples
+
+Use the name_suffix parameter to emit measurements with the name `cpu_total`:
+```toml
+[[inputs.cpu]]
+  name_suffix = "_total"
+  percpu = false
+  totalcpu = true
+```
+
+Use the name_override parameter to emit measurements with the name `foobar`:
+```toml
+[[inputs.cpu]]
+  name_override = "foobar"
+  percpu = false
+  totalcpu = true
+```
+
+Emit measurements with two additional tags: `tag1=foo` and `tag2=bar`
+
+> **NOTE**: With TOML, order matters.  Parameters belong to the last defined
+> table header, so place the `[inputs.cpu.tags]` table at the _end_ of the
+> plugin definition.
+```toml
+[[inputs.cpu]]
+  percpu = false
+  totalcpu = true
+  [inputs.cpu.tags]
+    tag1 = "foo"
+    tag2 = "bar"
+```
+
+Utilize `name_override`, `name_prefix`, or `name_suffix` config options to
+avoid measurement collisions when defining multiple plugins:
+```toml
+[[inputs.cpu]]
+  percpu = false
+  totalcpu = true
+
+[[inputs.cpu]]
+  percpu = true
+  totalcpu = false
+  name_override = "percpu_usage"
+  fielddrop = ["cpu_time*"]
+```
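+
+Override the collection interval for a single input.  This is a sketch with
+illustrative values: the `disk` input is gathered every 60s while the `cpu`
+input uses the agent default.  The `host_` prefix is a hypothetical choice
+and, assuming the input's default measurement name is `disk`, would produce
+the measurement name `host_disk`:
+```toml
+[agent]
+  interval = "10s"
+
+[[inputs.cpu]]
+  percpu = false
+  totalcpu = true
+
+[[inputs.disk]]
+  interval = "60s"
+  name_prefix = "host_"
+```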
+
+### Output Plugins
+
+Output plugins write metrics to a location.  Outputs commonly write to
+databases, network services, and messaging systems.
+
+Parameters that can be used with any output plugin:

 - **flush_interval**: The maximum time between flushes.  Use this setting to
  override the agent `flush_interval` on a per plugin basis.
@@ -150,42 +244,121 @@ emitted from the input plugin.
  Use this setting to override the agent `metric_buffer_limit` on a per plugin
  basis.

-The [metric filtering](#metric-filtering) parameters can be used to limit what metrics are
+The [metric filtering][] parameters can be used to limit what metrics are
 emitted from the output plugin.

-### Aggregator Configuration
+#### Examples

-The following config parameters are available for all aggregators:
+Override flush parameters for a single output:
+```toml
+[agent]
+  flush_interval = "10s"
+  metric_batch_size = 1000

-* **period**: The period on which to flush & clear each aggregator. All metrics
-that are sent with timestamps outside of this period will be ignored by the
-aggregator.
-* **delay**: The delay before each aggregator is flushed. This is to control
-how long for aggregators to wait before receiving metrics from input plugins,
-in the case that aggregators are flushing and inputs are gathering on the
-same interval.
-* **drop_original**: If true, the original metric will be dropped by the
-aggregator and will not get sent to the output plugins.
-* **name_override**: Override the base name of the measurement.
-(Default is the name of the input).
-* **name_prefix**: Specifies a prefix to attach to the measurement name.
-* **name_suffix**: Specifies a suffix to attach to the measurement name.
-* **tags**: A map of tags to apply to a specific input's measurements.
+[[outputs.influxdb]]
+  urls = [ "http://example.org:8086" ]
+  database = "telegraf"

-The [metric filtering](#metric-filtering) parameters can be used to limit what metrics are
+[[outputs.file]]
+  files = [ "stdout" ]
+  flush_interval = "1s"
+  metric_batch_size = 10
+```
+
+### Processor Plugins
+
+Processor plugins perform processing tasks on metrics and are commonly used to
+rename or apply transformations to metrics.  Processors are applied after the
+input plugins and before any aggregator plugins.
+
+Parameters that can be used with any processor plugin:
+
+- **order**: The order in which the processor(s) are executed. If this is not
+  specified then processor execution order will be random.
+
+The [metric filtering][] parameters can be used to limit what metrics are
+handled by the processor.  Excluded metrics are passed downstream to the next
+processor.
+
+#### Examples
+
+If the order in which processors are applied matters, you must set `order` on
+all involved processors:
+```toml
+[[processors.rename]]
+  order = 1
+  [[processors.rename.replace]]
+    tag = "path"
+    dest = "resource"
+
+[[processors.strings]]
+  order = 2
+  [[processors.strings.trim_prefix]]
+    tag = "resource"
+    prefix = "/api/"
+```
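+
+Apply a processor to a subset of metrics.  In this sketch, only metrics with
+the measurement name `cpu` are handled by the `printer` processor; all
+metrics, including the excluded ones, are still passed on to the output.  The
+output file path is an illustrative assumption:
+```toml
+[[processors.printer]]
+  namepass = ["cpu"]
+
+[[outputs.file]]
+  files = ["/tmp/metrics.out"]
+```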
+
+### Aggregator Plugins
+
+Aggregator plugins produce new metrics after examining metrics over a time
+period.  As the name suggests, they are commonly used to produce new aggregates
+such as mean/max/min metrics.  Aggregators operate on metrics after any
+processors have been applied.
+
+Parameters that can be used with any aggregator plugin:
+
+- **period**: The period on which to flush & clear each aggregator. All
+  metrics that are sent with timestamps outside of this period will be ignored
+  by the aggregator.
+- **delay**: The delay before each aggregator is flushed. This controls how
+  long aggregators wait to receive metrics from input plugins, in the case
+  that aggregators are flushing and inputs are gathering on the same
+  interval.
+- **drop_original**: If true, the original metric will be dropped by the
+  aggregator and will not get sent to the output plugins.
+- **name_override**: Override the base name of the measurement.  (Default is
+  the name of the input).
+- **name_prefix**: Specifies a prefix to attach to the measurement name.
+- **name_suffix**: Specifies a suffix to attach to the measurement name.
+- **tags**: A map of tags to apply to a specific input's measurements.
+
+The [metric filtering][] parameters can be used to limit what metrics are
 handled by the aggregator.  Excluded metrics are passed downstream to the next
 aggregator.

-### Processor Configuration
+#### Examples

-The following config parameters are available for all processors:
+Collect and emit the min/max of the system load1 metric every 30s, dropping
+the originals.
+```toml
+[[inputs.system]]
+  fieldpass = ["load1"] # collects system load1 metric.

-* **order**: This is the order in which the processor(s) get executed. If this
-is not specified then processor execution order will be random.
+[[aggregators.minmax]]
+  period = "30s"        # send & clear the aggregate every 30s.
+  drop_original = true  # drop the original metrics.

-The [metric filtering](#metric-filtering) parameters can be used to limit what metrics are
-handled by the processor.  Excluded metrics are passed downstream to the next
-processor.
+[[outputs.file]]
+  files = ["stdout"]
+```
+
+Collect and emit the min/max of the swap metrics every 30s, dropping the
+originals. The aggregator will not be applied to the system load metrics due
+to the `namepass` parameter.
+```toml
+[[inputs.swap]]
+
+[[inputs.system]]
+  fieldpass = ["load1"] # collects system load1 metric.
+
+[[aggregators.minmax]]
+  period = "30s"        # send & clear the aggregate every 30s.
+  drop_original = true  # drop the original metrics.
+  namepass = ["swap"]   # only "pass" swap metrics through the aggregator.
+
+[[outputs.file]]
+  files = ["stdout"]
+```

 <a id="measurement-filtering"></a>
 ### Metric Filtering
@@ -244,39 +417,9 @@ The inverse of `taginclude`. Tags with a tag key matching one of the patterns
 will be discarded from the metric.  Any tag can be filtered including global
 tags and the agent `host` tag.

-### Input Configuration Examples
-
-This is a full working config that will output CPU data to an InfluxDB instance
-at 192.168.59.103:8086, tagging measurements with dc="denver-1". It will output
-measurements at a 10s interval and will collect per-cpu data, dropping any
-fields which begin with `time_`.
-
-```toml
-[global_tags]
-  dc = "denver-1"
-
-[agent]
-  interval = "10s"
-
-# OUTPUTS
-[[outputs.influxdb]]
-  url = "http://192.168.59.103:8086" # required.
-  database = "telegraf" # required.
-
-# INPUTS
-[[inputs.cpu]]
-  percpu = true
-  totalcpu = false
-  # filter all fields beginning with 'time_'
-  fielddrop = ["time_*"]
-```
-
-#### Input Config: tagpass and tagdrop
-
-**NOTE** `tagpass` and `tagdrop` parameters must be defined at the _end_ of
-the plugin definition, otherwise subsequent plugin config options will be
-interpreted as part of the tagpass/tagdrop map.
+##### Filtering Examples

+Using tagpass and tagdrop:
 ```toml
 [[inputs.cpu]]
  percpu = true
@@ -295,7 +438,6 @@ interpreted as part of the tagpass/tagdrop map.
    # Globs can also be used on the tag values
    path = [ "/opt", "/home*" ]

-    
 [[inputs.win_perf_counters]]
  [[inputs.win_perf_counters.object]]
    ObjectName = "Network Interface"
@@ -308,11 +450,9 @@ interpreted as part of the tagpass/tagdrop map.
  # Don't send metrics where the Windows interface name (instance) begins with isatap or Local
  [inputs.win_perf_counters.tagdrop]
    instance = ["isatap*", "Local*"]
-    
 ```

-#### Input Config: fieldpass and fielddrop
-
+Using fieldpass and fielddrop:
 ```toml
 # Drop all metrics for guest & steal CPU usage
 [[inputs.cpu]]
@@ -325,8 +465,7 @@ interpreted as part of the tagpass/tagdrop map.
  fieldpass = ["inodes*"]
 ```

-#### Input Config: namepass and namedrop
-
+Using namepass and namedrop:
 ```toml
 # Drop all metrics about containers for kubelet
 [[inputs.prometheus]]
@@ -339,8 +478,7 @@ interpreted as part of the tagpass/tagdrop map.
  namepass = ["rest_client_*"]
 ```

-#### Input Config: taginclude and tagexclude
-
+Using taginclude and tagexclude:
 ```toml
 # Only include the "cpu" tag in the measurements for the cpu plugin.
 [[inputs.cpu]]
@@ -353,64 +491,7 @@ interpreted as part of the tagpass/tagdrop map.
  tagexclude = ["fstype"]
 ```

-#### Input config: prefix, suffix, and override
-
-This plugin will emit measurements with the name `cpu_total`
-
-```toml
-[[inputs.cpu]]
-  name_suffix = "_total"
-  percpu = false
-  totalcpu = true
-```
-
-This will emit measurements with the name `foobar`
-
-```toml
-[[inputs.cpu]]
-  name_override = "foobar"
-  percpu = false
-  totalcpu = true
-```
-
-#### Input config: tags
-
-This plugin will emit measurements with two additional tags: `tag1=foo` and
-`tag2=bar`
-
-NOTE: Order matters, the `[inputs.cpu.tags]` table must be at the _end_ of the
-plugin definition.
-
-```toml
-[[inputs.cpu]]
-  percpu = false
-  totalcpu = true
-  [inputs.cpu.tags]
-    tag1 = "foo"
-    tag2 = "bar"
-```
-
-#### Multiple inputs of the same type
-
-Additional inputs (or outputs) of the same type can be specified,
-just define more instances in the config file. It is highly recommended that
-you utilize `name_override`, `name_prefix`, or `name_suffix` config options
-to avoid measurement collisions:
-
-```toml
-[[inputs.cpu]]
-  percpu = false
-  totalcpu = true
-
-[[inputs.cpu]]
-  percpu = true
-  totalcpu = false
-  name_override = "percpu_usage"
-  fielddrop = ["cpu_time*"]
-```
-
-#### Output Configuration Examples:
-
+Metrics can be routed to different outputs using the metric name and tags:
 ```toml
 [[outputs.influxdb]]
  urls = [ "http://localhost:8086" ]
@@ -432,50 +513,35 @@ to avoid measurement collisions:
    cpu = ["cpu0"]
 ```

-#### Aggregator Configuration Examples:
-
-This will collect and emit the min/max of the system load1 metric every
-30s, dropping the originals.
-
+Routing metrics to different outputs based on the input.  Metrics are tagged
+with `influxdb_database` in the input, which is then used to select the
+output.  The tag is removed in the outputs before writing.
 ```toml
-[[inputs.system]]
-  fieldpass = ["load1"] # collects system load1 metric.
+[[outputs.influxdb]]
+  urls = ["http://influxdb.example.com"]
+  database = "db_default"
+  [outputs.influxdb.tagdrop]
+    influxdb_database = ["*"]

-[[aggregators.minmax]]
-  period = "30s"        # send & clear the aggregate every 30s.
-  drop_original = true  # drop the original metrics.
+[[outputs.influxdb]]
+  urls = ["http://influxdb.example.com"]
+  database = "db_other"
+  tagexclude = ["influxdb_database"]
+  [outputs.influxdb.tagpass]
+    influxdb_database = ["other"]

-[[outputs.file]]
-  files = ["stdout"]
+[[inputs.disk]]
+  [inputs.disk.tags]
+    influxdb_database = "other"
 ```

-This will collect and emit the min/max of the swap metrics every
-30s, dropping the originals. The aggregator will not be applied
-to the system load metrics due to the `namepass` parameter.
-
-```toml
-[[inputs.swap]]
-
-[[inputs.system]]
-  fieldpass = ["load1"] # collects system load1 metric.
-
-[[aggregators.minmax]]
-  period = "30s"        # send & clear the aggregate every 30s.
-  drop_original = true  # drop the original metrics.
-  namepass = ["swap"]   # only "pass" swap metrics through the aggregator.
-
-[[outputs.file]]
-  files = ["stdout"]
-```
-
-#### Processor Configuration Examples:
-
-Print only the metrics with `cpu` as the measurement name, all metrics are
-passed to the output:
-```toml
-[[processors.printer]]
-  namepass = "cpu"
-
-[[outputs.file]]
-  files = ["/tmp/metrics.out"]
-```
+[TOML]: https://github.com/toml-lang/toml#toml
+[global tags]: #global-tags
+[agent]: #agent
+[plugins]: #plugins
+[inputs]: #input-plugins
+[outputs]: #output-plugins
+[processors]: #processor-plugins
+[aggregators]: #aggregator-plugins
+[metric filtering]: #metric-filtering
+[telegraf.conf]: /etc/telegraf.conf