From 41d528c8ce02fe00ca42c281e506e1e20e60f378 Mon Sep 17 00:00:00 2001 From: Daniel Nelson Date: Mon, 17 Sep 2018 11:45:08 -0700 Subject: [PATCH] Split parser/serializer docs (#4690) --- README.md | 39 +- docs/DATA_FORMATS_INPUT.md | 1131 +------------------- docs/DATA_FORMATS_OUTPUT.md | 197 +--- docs/METRICS.md | 22 + docs/README.md | 21 + docs/TEMPLATE_PATTERN.md | 135 +++ plugins/inputs/statsd/README.md | 8 +- plugins/inputs/statsd/statsd.go | 2 +- plugins/parsers/EXAMPLE_README.md | 46 + plugins/parsers/collectd/README.md | 44 + plugins/parsers/csv/README.md | 104 ++ plugins/parsers/dropwizard/README.md | 171 +++ plugins/parsers/graphite/README.md | 48 + plugins/parsers/grok/README.md | 222 ++++ plugins/parsers/influx/README.md | 20 + plugins/parsers/json/README.md | 214 ++++ plugins/parsers/logfmt/README.md | 34 + plugins/parsers/nagios/README.md | 17 + plugins/parsers/value/README.md | 36 + plugins/parsers/wavefront/README.md | 20 + plugins/serializers/EXAMPLE_README.md | 46 + plugins/serializers/graphite/README.md | 51 + plugins/serializers/influx/README.md | 34 + plugins/serializers/json/README.md | 77 ++ plugins/serializers/splunkmetric/README.md | 4 +- 25 files changed, 1412 insertions(+), 1331 deletions(-) create mode 100644 docs/METRICS.md create mode 100644 docs/README.md create mode 100644 docs/TEMPLATE_PATTERN.md create mode 100644 plugins/parsers/EXAMPLE_README.md create mode 100644 plugins/parsers/collectd/README.md create mode 100644 plugins/parsers/csv/README.md create mode 100644 plugins/parsers/dropwizard/README.md create mode 100644 plugins/parsers/graphite/README.md create mode 100644 plugins/parsers/grok/README.md create mode 100644 plugins/parsers/influx/README.md create mode 100644 plugins/parsers/json/README.md create mode 100644 plugins/parsers/logfmt/README.md create mode 100644 plugins/parsers/nagios/README.md create mode 100644 plugins/parsers/value/README.md create mode 100644 plugins/parsers/wavefront/README.md create mode 100644 plugins/serializers/EXAMPLE_README.md create mode 100644 plugins/serializers/graphite/README.md create mode 100644 plugins/serializers/influx/README.md create mode 100644 plugins/serializers/json/README.md diff --git a/README.md b/README.md index 6ddb793ef..5bc830457 100644 --- a/README.md +++ b/README.md @@ -1,23 +1,19 @@ # Telegraf [![Circle CI](https://circleci.com/gh/influxdata/telegraf.svg?style=svg)](https://circleci.com/gh/influxdata/telegraf) [![Docker pulls](https://img.shields.io/docker/pulls/library/telegraf.svg)](https://hub.docker.com/_/telegraf/) -Telegraf is an agent written in Go for collecting, processing, aggregating, -and writing metrics. +Telegraf is an agent for collecting, processing, aggregating, and writing metrics. Design goals are to have a minimal memory footprint with a plugin system so -that developers in the community can easily add support for collecting metrics -. For an example configuration referencet from local or remote services. +that developers in the community can easily add support for collecting +metrics. -Telegraf is plugin-driven and has the concept of 4 distinct plugins: +Telegraf is plugin-driven and has the concept of 4 distinct plugin types: 1. [Input Plugins](#input-plugins) collect metrics from the system, services, or 3rd party APIs 2. [Processor Plugins](#processor-plugins) transform, decorate, and/or filter metrics 3. [Aggregator Plugins](#aggregator-plugins) create aggregate metrics (e.g. mean, min, max, quantiles, etc.) 4. 
[Output Plugins](#output-plugins) write metrics to various destinations -For more information on Processor and Aggregator plugins please [read this](./docs/AGGREGATORS_AND_PROCESSORS.md). - -New plugins are designed to be easy to contribute, -we'll eagerly accept pull +New plugins are designed to be easy to contribute, we'll eagerly accept pull requests and will manage the set of plugins that Telegraf supports. ## Contributing @@ -26,7 +22,7 @@ There are many ways to contribute: - Fix and [report bugs](https://github.com/influxdata/telegraf/issues/new) - [Improve documentation](https://github.com/influxdata/telegraf/issues?q=is%3Aopen+label%3Adocumentation) - [Review code and feature proposals](https://github.com/influxdata/telegraf/pulls) -- Answer questions on github and on the [Community Site](https://community.influxdata.com/) +- Answer questions and discuss here on github and on the [Community Site](https://community.influxdata.com/) - [Contribute plugins](CONTRIBUTING.md) ## Installation: @@ -42,7 +38,7 @@ Ansible role: https://github.com/rossmcdonald/telegraf Telegraf requires golang version 1.9 or newer, the Makefile requires GNU make. -1. [Install Go](https://golang.org/doc/install) >=1.9 +1. [Install Go](https://golang.org/doc/install) >=1.9 (1.10 recommended) 2. [Install dep](https://golang.github.io/dep/docs/installation.html) ==v0.5.0 3. Download Telegraf source: ``` @@ -86,44 +82,47 @@ These builds are generated from the master branch: See usage with: ``` -./telegraf --help +telegraf --help ``` #### Generate a telegraf config file: ``` -./telegraf config > telegraf.conf +telegraf config > telegraf.conf ``` #### Generate config with only cpu input & influxdb output plugins defined: ``` -./telegraf --input-filter cpu --output-filter influxdb config +telegraf --input-filter cpu --output-filter influxdb config ``` #### Run a single telegraf collection, outputing metrics to stdout: ``` -./telegraf --config telegraf.conf --test +telegraf --config telegraf.conf --test ``` #### Run telegraf with all plugins defined in config file: ``` -./telegraf --config telegraf.conf +telegraf --config telegraf.conf ``` #### Run telegraf, enabling the cpu & memory input, and influxdb output plugins: ``` -./telegraf --config telegraf.conf --input-filter cpu:mem --output-filter influxdb +telegraf --config telegraf.conf --input-filter cpu:mem --output-filter influxdb ``` +## Documentation -## Configuration +[Latest Release Documentation][release docs]. -See the [configuration guide](docs/CONFIGURATION.md) for a rundown of the more advanced -configuration options. +For documentation on the latest development code see the [documentation index][devel docs]. + +[release docs]: https://docs.influxdata.com/telegraf +[devel docs]: docs ## Input Plugins diff --git a/docs/DATA_FORMATS_INPUT.md b/docs/DATA_FORMATS_INPUT.md index ff9160812..b71650168 100644 --- a/docs/DATA_FORMATS_INPUT.md +++ b/docs/DATA_FORMATS_INPUT.md @@ -1,42 +1,24 @@ -# Telegraf Input Data Formats +# Input Data Formats -Telegraf is able to parse the following input data formats into metrics: +Telegraf contains many general purpose plugins that support parsing input data +using a configurable parser into [metrics][]. This allows, for example, the +`kafka_consumer` input plugin to process messages in either InfluxDB Line +Protocol or in JSON format. -1. [InfluxDB Line Protocol](#influx) -1. [JSON](#json) -1. [Graphite](#graphite) -1. [Value](#value), ie: 45 or "booyah" -1. [Nagios](#nagios) (exec input only) -1. [Collectd](#collectd) -1. 
[Dropwizard](#dropwizard) -1. [Grok](#grok) -1. [Logfmt](#logfmt) -1. [Wavefront](#wavefront) -1. [CSV](#csv) +- [InfluxDB Line Protocol](/plugins/parsers/influx) +- [Collectd](/plugins/parsers/collectd) +- [CSV](/plugins/parsers/csv) +- [Dropwizard](/plugins/parsers/dropwizard) +- [Graphite](/plugins/parsers/graphite) +- [Grok](/plugins/parsers/grok) +- [JSON](/plugins/parsers/json) +- [Logfmt](/plugins/parsers/logfmt) +- [Nagios](/plugins/parsers/nagios) +- [Value](/plugins/parsers/value), ie: 45 or "booyah" +- [Wavefront](/plugins/parsers/wavefront) -Telegraf metrics, similar to InfluxDB's [points][influxdb key concepts], are a -combination of four basic parts: - -[influxdb key concepts]: https://docs.influxdata.com/influxdb/v1.6/concepts/key_concepts/ - -1. Measurement Name -1. Tags -1. Fields -1. Timestamp - -These four parts are easily defined when using InfluxDB line-protocol as a -data format. But there are other data formats that users may want to use which -require more advanced configuration to create usable Telegraf metrics. - -Plugins such as `exec` and `kafka_consumer` parse textual data. Up until now, -these plugins were statically configured to parse just a single -data format. `exec` mostly only supported parsing JSON, and `kafka_consumer` only -supported data in InfluxDB line-protocol. - -But now we are normalizing the parsing of various data formats across all -plugins that can support it. You will be able to identify a plugin that supports -different data formats by the presence of a `data_format` config option, for -example, in the exec plugin: +Any input plugin containing the `data_format` option can use it to select the +desired parser: ```toml [[inputs.exec]] @@ -51,1081 +33,6 @@ example, in the exec plugin: ## more about them here: ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md data_format = "json" - - ## Additional configuration options go here -``` - -Each data_format has an additional set of configuration options available, which -I'll go over below. - -# Influx: - -There are no additional configuration options for InfluxDB [line protocol][]. The -metrics are parsed directly into Telegraf metrics. - -[line protocol]: https://docs.influxdata.com/influxdb/latest/write_protocols/line/ - -#### Influx Configuration: - -```toml -[[inputs.exec]] - ## Commands array - commands = ["/tmp/test.sh", "/usr/bin/mycollector --foo=bar"] - - ## measurement name suffix (for separating different commands) - name_suffix = "_mycollector" - - ## Data format to consume. - ## Each data format has its own unique set of configuration options, read - ## more about them here: - ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md - data_format = "influx" -``` - -# JSON: - -The JSON data format flattens JSON into metric _fields_. -NOTE: Only numerical values are converted to fields, and they are converted -into a float. strings are ignored unless specified as a tag_key (see below). - -So for example, this JSON: - -```json -{ - "a": 5, - "b": { - "c": 6 - }, - "ignored": "I'm a string" -} -``` - -Would get translated into _fields_ of a measurement: - -``` -myjsonmetric a=5,b_c=6 -``` - -The _measurement_ _name_ is usually the name of the plugin, -but can be overridden using the `name_override` config option. - -#### JSON Configuration: - -The JSON data format supports specifying "tag_keys", "json_string_fields", and "json_query". 
-If specified, keys in "tag_keys" and "json_string_fields" will be searched for in the root-level
-and any nested lists of the JSON blob. All int and float values are added to fields by default.
-If the key(s) exist, they will be applied as tags or fields to the Telegraf metrics.
-If "json_string_fields" is specified, the string will be added as a field.
-
-The "json_query" configuration is a gjson path to a JSON object or
-list of JSON objects. If this path leads to an array of values or a
-single data point, an error will be thrown. If this configuration
-is specified, only the result of the query will be parsed and returned as metrics.
-
-The "json_name_key" configuration specifies the key of the field whose value will be
-added as the metric name.
-
-Object paths are specified using gjson path format, which is denoted by object keys
-concatenated with "." to go deeper in nested JSON objects.
-Additional information on gjson paths can be found here: https://github.com/tidwall/gjson#path-syntax
-
-The JSON data format also supports extracting time values through the
-config "json_time_key" and "json_time_format". If "json_time_key" is set,
-"json_time_format" must be specified. The "json_time_key" describes the
-name of the field containing time information. The "json_time_format"
-must be a recognized Go time format.
-If parsing a Unix epoch timestamp in seconds, e.g. 1536092344.1, this config must be set to "unix" (case insensitive);
-the corresponding JSON value can have a decimal part and can be in either string or number JSON representation.
-If the value is in number representation, it will be treated as a double-precision float and may lose some precision.
-If the value is in string representation, there is no precision loss up to nanosecond precision; decimal positions beyond that will be dropped.
-If parsing a Unix epoch timestamp in milliseconds, e.g. 1536092344100, this config must be set to "unix_ms" (case insensitive);
-the corresponding JSON value must be a (long) integer in number JSON representation.
-If no year is provided, the metrics will have the current year.
-More info on time formats can be found here: https://golang.org/pkg/time/#Parse
-
-For example, if you had this configuration:
-
-```toml
-[[inputs.exec]]
-  ## Commands array
-  commands = ["/tmp/test.sh", "/usr/bin/mycollector --foo=bar"]
-
-  ## measurement name suffix (for separating different commands)
-  name_suffix = "_mycollector"
-
-  ## Data format to consume.
-  ## Each data format has its own unique set of configuration options, read
-  ## more about them here:
-  ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md
-  data_format = "json"
-
-  ## List of tag names to extract from JSON server response
-  tag_keys = [
-    "my_tag_1",
-    "my_tag_2"
-  ]
-
-  ## The json path specifying where to extract the metric name from
-  # json_name_key = ""
-
-  ## List of field names to extract from JSON and add as string fields
-  # json_string_fields = []
-
-  ## gjson query path to specify a specific chunk of JSON to be parsed with
-  ## the above configuration. If not specified, the whole file will be parsed.
- ## gjson query paths are described here: https://github.com/tidwall/gjson#path-syntax - # json_query = "" - - ## holds the name of the tag of timestamp - # json_time_key = "" - - ## holds the format of timestamp to be parsed - # json_time_format = "" ``` -with this JSON output from a command: - -```json -{ - "a": 5, - "b": { - "c": 6 - }, - "my_tag_1": "foo" -} -``` - -Your Telegraf metrics would get tagged with "my_tag_1" - -``` -exec_mycollector,my_tag_1=foo a=5,b_c=6 -``` - -If the JSON data is an array, then each element of the array is -parsed with the configured settings. Each resulting metric will -be output with the same timestamp. - -For example, if the following configuration: - -```toml -[[inputs.exec]] - ## Commands array - commands = ["/usr/bin/mycollector --foo=bar"] - - ## measurement name suffix (for separating different commands) - name_suffix = "_mycollector" - - ## Data format to consume. - ## Each data format has its own unique set of configuration options, read - ## more about them here: - ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md - data_format = "json" - - ## List of tag names to extract from top-level of JSON server response - tag_keys = [ - "my_tag_1", - "my_tag_2" - ] - - ## List of field names to extract from JSON and add as string fields - # json_string_fields = [] - - ## gjson query path to specify a specific chunk of JSON to be parsed with - ## the above configuration. If not specified, the whole file will be parsed - # json_query = "" - - ## holds the name of the tag of timestamp - json_time_key = "b_time" - - ## holds the format of timestamp to be parsed - json_time_format = "02 Jan 06 15:04 MST" -``` - -with this JSON output from a command: - -```json -[ - { - "a": 5, - "b": { - "c": 6, - "time":"04 Jan 06 15:04 MST" - }, - "my_tag_1": "foo", - "my_tag_2": "baz" - }, - { - "a": 7, - "b": { - "c": 8, - "time":"11 Jan 07 15:04 MST" - }, - "my_tag_1": "bar", - "my_tag_2": "baz" - } -] -``` - -Your Telegraf metrics would get tagged with "my_tag_1" and "my_tag_2" and fielded with "b_c" -The metric's time will be a time.Time object, as specified by "b_time" - -``` -exec_mycollector,my_tag_1=foo,my_tag_2=baz b_c=6 1136387040000000000 -exec_mycollector,my_tag_1=bar,my_tag_2=baz b_c=8 1168527840000000000 -``` - -If you want to only use a specific portion of your JSON, use the "json_query" -configuration to specify a path to a JSON object. - -For example, with the following config: -```toml -[[inputs.exec]] - ## Commands array - commands = ["/usr/bin/mycollector --foo=bar"] - - ## measurement name suffix (for separating different commands) - name_suffix = "_mycollector" - - ## Data format to consume. - ## Each data format has its own unique set of configuration options, read - ## more about them here: - ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md - data_format = "json" - - ## List of tag names to extract from top-level of JSON server response - tag_keys = ["first"] - - ## List of field names to extract from JSON and add as string fields - json_string_fields = ["last"] - - ## gjson query path to specify a specific chunk of JSON to be parsed with - ## the above configuration. 
If not specified, the whole file will be parsed - json_query = "obj.friends" - - ## holds the name of the tag of timestamp - # json_time_key = "" - - ## holds the format of timestamp to be parsed - # json_time_format = "" -``` - -with this JSON as input: -```json -{ - "obj": { - "name": {"first": "Tom", "last": "Anderson"}, - "age":37, - "children": ["Sara","Alex","Jack"], - "fav.movie": "Deer Hunter", - "friends": [ - {"first": "Dale", "last": "Murphy", "age": 44}, - {"first": "Roger", "last": "Craig", "age": 68}, - {"first": "Jane", "last": "Murphy", "age": 47} - ] - } -} -``` -You would recieve 3 metrics tagged with "first", and fielded with "last" and "age" - -``` -exec_mycollector, "first":"Dale" "last":"Murphy","age":44 -exec_mycollector, "first":"Roger" "last":"Craig","age":68 -exec_mycollector, "first":"Jane" "last":"Murphy","age":47 -``` - -# Value: - -The "value" data format translates single values into Telegraf metrics. This -is done by assigning a measurement name and setting a single field ("value") -as the parsed metric. - -#### Value Configuration: - -You **must** tell Telegraf what type of metric to collect by using the -`data_type` configuration option. Available options are: - -1. integer -2. float or long -3. string -4. boolean - -**Note:** It is also recommended that you set `name_override` to a measurement -name that makes sense for your metric, otherwise it will just be set to the -name of the plugin. - -```toml -[[inputs.exec]] - ## Commands array - commands = ["cat /proc/sys/kernel/random/entropy_avail"] - - ## override the default metric name of "exec" - name_override = "entropy_available" - - ## Data format to consume. - ## Each data format has its own unique set of configuration options, read - ## more about them here: - ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md - data_format = "value" - data_type = "integer" # required -``` - -# Graphite: - -The Graphite data format translates graphite _dot_ buckets directly into -telegraf measurement names, with a single value field, and without any tags. -By default, the separator is left as ".", but this can be changed using the -"separator" argument. For more advanced options, -Telegraf supports specifying "templates" to translate -graphite buckets into Telegraf metrics. - -Templates are of the form: - -``` -"host.mytag.mytag.measurement.measurement.field*" -``` - -Where the following keywords exist: - -1. `measurement`: specifies that this section of the graphite bucket corresponds -to the measurement name. This can be specified multiple times. -2. `field`: specifies that this section of the graphite bucket corresponds -to the field name. This can be specified multiple times. -3. `measurement*`: specifies that all remaining elements of the graphite bucket -correspond to the measurement name. -4. `field*`: specifies that all remaining elements of the graphite bucket -correspond to the field name. - -Any part of the template that is not a keyword is treated as a tag key. This -can also be specified multiple times. - -NOTE: `field*` cannot be used in conjunction with `measurement*`! - -#### Measurement & Tag Templates: - -The most basic template is to specify a single transformation to apply to all -incoming metrics. So the following template: - -```toml -templates = [ - "region.region.measurement*" -] -``` - -would result in the following Graphite -> Telegraf transformation. 
- -``` -us.west.cpu.load 100 -=> cpu.load,region=us.west value=100 -``` - -Multiple templates can also be specified, but these should be differentiated -using _filters_ (see below for more details) - -```toml -templates = [ - "*.*.* region.region.measurement", # <- all 3-part measurements will match this one. - "*.*.*.* region.region.host.measurement", # <- all 4-part measurements will match this one. -] -``` - -#### Field Templates: - -The field keyword tells Telegraf to give the metric that field name. -So the following template: - -```toml -separator = "_" -templates = [ - "measurement.measurement.field.field.region" -] -``` - -would result in the following Graphite -> Telegraf transformation. - -``` -cpu.usage.idle.percent.eu-east 100 -=> cpu_usage,region=eu-east idle_percent=100 -``` - -The field key can also be derived from all remaining elements of the graphite -bucket by specifying `field*`: - -```toml -separator = "_" -templates = [ - "measurement.measurement.region.field*" -] -``` - -which would result in the following Graphite -> Telegraf transformation. - -``` -cpu.usage.eu-east.idle.percentage 100 -=> cpu_usage,region=eu-east idle_percentage=100 -``` - -#### Filter Templates: - -Users can also filter the template(s) to use based on the name of the bucket, -using glob matching, like so: - -```toml -templates = [ - "cpu.* measurement.measurement.region", - "mem.* measurement.measurement.host" -] -``` - -which would result in the following transformation: - -``` -cpu.load.eu-east 100 -=> cpu_load,region=eu-east value=100 - -mem.cached.localhost 256 -=> mem_cached,host=localhost value=256 -``` - -#### Adding Tags: - -Additional tags can be added to a metric that don't exist on the received metric. -You can add additional tags by specifying them after the pattern. -Tags have the same format as the line protocol. -Multiple tags are separated by commas. - -```toml -templates = [ - "measurement.measurement.field.region datacenter=1a" -] -``` - -would result in the following Graphite -> Telegraf transformation. - -``` -cpu.usage.idle.eu-east 100 -=> cpu_usage,region=eu-east,datacenter=1a idle=100 -``` - -There are many more options available, -[More details can be found here](https://github.com/influxdata/influxdb/tree/master/services/graphite#templates) - -#### Graphite Configuration: - -```toml -[[inputs.exec]] - ## Commands array - commands = ["/tmp/test.sh", "/usr/bin/mycollector --foo=bar"] - - ## measurement name suffix (for separating different commands) - name_suffix = "_mycollector" - - ## Data format to consume. - ## Each data format has its own unique set of configuration options, read - ## more about them here: - ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md - data_format = "graphite" - - ## This string will be used to join the matched values. - separator = "_" - - ## Each template line requires a template pattern. It can have an optional - ## filter before the template and separated by spaces. It can also have optional extra - ## tags following the template. Multiple tags should be separated by commas and no spaces - ## similar to the line protocol format. There can be only one default template. - ## Templates support below format: - ## 1. filter + template - ## 2. filter + template + extra tag(s) - ## 3. filter + template with field key - ## 4. 
default template - templates = [ - "*.app env.service.resource.measurement", - "stats.* .host.measurement* region=eu-east,agent=sensu", - "stats2.* .host.measurement.field", - "measurement*" - ] -``` - -# Nagios: - -There are no additional configuration options for Nagios line-protocol. The -metrics are parsed directly into Telegraf metrics. - -Note: Nagios Input Data Formats is only supported in `exec` input plugin. - -#### Nagios Configuration: - -```toml -[[inputs.exec]] - ## Commands array - commands = ["/usr/lib/nagios/plugins/check_load -w 5,6,7 -c 7,8,9"] - - ## measurement name suffix (for separating different commands) - name_suffix = "_mycollector" - - ## Data format to consume. - ## Each data format has its own unique set of configuration options, read - ## more about them here: - ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md - data_format = "nagios" -``` - -# Collectd: - -The collectd format parses the collectd binary network protocol. Tags are -created for host, instance, type, and type instance. All collectd values are -added as float64 fields. - -For more information about the binary network protocol see -[here](https://collectd.org/wiki/index.php/Binary_protocol). - -You can control the cryptographic settings with parser options. Create an -authentication file and set `collectd_auth_file` to the path of the file, then -set the desired security level in `collectd_security_level`. - -Additional information including client setup can be found -[here](https://collectd.org/wiki/index.php/Networking_introduction#Cryptographic_setup). - -You can also change the path to the typesdb or add additional typesdb using -`collectd_typesdb`. - -#### Collectd Configuration: - -```toml -[[inputs.socket_listener]] - service_address = "udp://127.0.0.1:25826" - name_prefix = "collectd_" - - ## Data format to consume. - ## Each data format has its own unique set of configuration options, read - ## more about them here: - ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md - data_format = "collectd" - - ## Authentication file for cryptographic security levels - collectd_auth_file = "/etc/collectd/auth_file" - ## One of none (default), sign, or encrypt - collectd_security_level = "encrypt" - ## Path of to TypesDB specifications - collectd_typesdb = ["/usr/share/collectd/types.db"] - - # Multi-value plugins can be handled two ways. - # "split" will parse and store the multi-value plugin data into separate measurements - # "join" will parse and store the multi-value plugin as a single multi-value measurement. - # "split" is the default behavior for backward compatability with previous versions of influxdb. - collectd_parse_multivalue = "split" -``` - -# Dropwizard: - -The dropwizard format can parse the JSON representation of a single dropwizard metric registry. By default, tags are parsed from metric names as if they were actual influxdb line protocol keys (`measurement<,tag_set>`) which can be overriden by defining custom [measurement & tag templates](./DATA_FORMATS_INPUT.md#measurement--tag-templates). All field value types are supported, `string`, `number` and `boolean`. 
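-
-For instance, with the default settings a registry key such as
-`requests,service=api` is parsed like a line protocol key: a `requests`
-measurement with the tag `service=api`. A minimal illustration (the names and
-values here are examples only):
-
-```json
-{
-  "counters" : {
-    "requests,service=api" : {
-      "count" : 10
-    }
-  }
-}
-```
-
-becomes:
-
-```
-requests,metric_type=counter,service=api count=10
-```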
- -A typical JSON of a dropwizard metric registry: - -```json -{ - "version": "3.0.0", - "counters" : { - "measurement,tag1=green" : { - "count" : 1 - } - }, - "meters" : { - "measurement" : { - "count" : 1, - "m15_rate" : 1.0, - "m1_rate" : 1.0, - "m5_rate" : 1.0, - "mean_rate" : 1.0, - "units" : "events/second" - } - }, - "gauges" : { - "measurement" : { - "value" : 1 - } - }, - "histograms" : { - "measurement" : { - "count" : 1, - "max" : 1.0, - "mean" : 1.0, - "min" : 1.0, - "p50" : 1.0, - "p75" : 1.0, - "p95" : 1.0, - "p98" : 1.0, - "p99" : 1.0, - "p999" : 1.0, - "stddev" : 1.0 - } - }, - "timers" : { - "measurement" : { - "count" : 1, - "max" : 1.0, - "mean" : 1.0, - "min" : 1.0, - "p50" : 1.0, - "p75" : 1.0, - "p95" : 1.0, - "p98" : 1.0, - "p99" : 1.0, - "p999" : 1.0, - "stddev" : 1.0, - "m15_rate" : 1.0, - "m1_rate" : 1.0, - "m5_rate" : 1.0, - "mean_rate" : 1.0, - "duration_units" : "seconds", - "rate_units" : "calls/second" - } - } -} -``` - -Would get translated into 4 different measurements: - -``` -measurement,metric_type=counter,tag1=green count=1 -measurement,metric_type=meter count=1,m15_rate=1.0,m1_rate=1.0,m5_rate=1.0,mean_rate=1.0 -measurement,metric_type=gauge value=1 -measurement,metric_type=histogram count=1,max=1.0,mean=1.0,min=1.0,p50=1.0,p75=1.0,p95=1.0,p98=1.0,p99=1.0,p999=1.0 -measurement,metric_type=timer count=1,max=1.0,mean=1.0,min=1.0,p50=1.0,p75=1.0,p95=1.0,p98=1.0,p99=1.0,p999=1.0,stddev=1.0,m15_rate=1.0,m1_rate=1.0,m5_rate=1.0,mean_rate=1.0 -``` - -You may also parse a dropwizard registry from any JSON document which contains a dropwizard registry in some inner field. -Eg. to parse the following JSON document: - -```json -{ - "time" : "2017-02-22T14:33:03.662+02:00", - "tags" : { - "tag1" : "green", - "tag2" : "yellow" - }, - "metrics" : { - "counters" : { - "measurement" : { - "count" : 1 - } - }, - "meters" : {}, - "gauges" : {}, - "histograms" : {}, - "timers" : {} - } -} -``` -and translate it into: - -``` -measurement,metric_type=counter,tag1=green,tag2=yellow count=1 1487766783662000000 -``` - -you simply need to use the following additional configuration properties: - -```toml -dropwizard_metric_registry_path = "metrics" -dropwizard_time_path = "time" -dropwizard_time_format = "2006-01-02T15:04:05Z07:00" -dropwizard_tags_path = "tags" -## tag paths per tag are supported too, eg. -#[inputs.yourinput.dropwizard_tag_paths] -# tag1 = "tags.tag1" -# tag2 = "tags.tag2" -``` - - -For more information about the dropwizard json format see -[here](http://metrics.dropwizard.io/3.1.0/manual/json/). - -#### Dropwizard Configuration: - -```toml -[[inputs.exec]] - ## Commands array - commands = ["curl http://localhost:8080/sys/metrics"] - timeout = "5s" - - ## Data format to consume. - ## Each data format has its own unique set of configuration options, read - ## more about them here: - ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md - data_format = "dropwizard" - - ## Used by the templating engine to join matched values when cardinality is > 1 - separator = "_" - - ## Each template line requires a template pattern. It can have an optional - ## filter before the template and separated by spaces. It can also have optional extra - ## tags following the template. Multiple tags should be separated by commas and no spaces - ## similar to the line protocol format. There can be only one default template. - ## Templates support below format: - ## 1. filter + template - ## 2. filter + template + extra tag(s) - ## 3. 
filter + template with field key - ## 4. default template - ## By providing an empty template array, templating is disabled and measurements are parsed as influxdb line protocol keys (measurement<,tag_set>) - templates = [] - - ## You may use an appropriate [gjson path](https://github.com/tidwall/gjson#path-syntax) - ## to locate the metric registry within the JSON document - # dropwizard_metric_registry_path = "metrics" - - ## You may use an appropriate [gjson path](https://github.com/tidwall/gjson#path-syntax) - ## to locate the default time of the measurements within the JSON document - # dropwizard_time_path = "time" - # dropwizard_time_format = "2006-01-02T15:04:05Z07:00" - - ## You may use an appropriate [gjson path](https://github.com/tidwall/gjson#path-syntax) - ## to locate the tags map within the JSON document - # dropwizard_tags_path = "tags" - - ## You may even use tag paths per tag - # [inputs.exec.dropwizard_tag_paths] - # tag1 = "tags.tag1" - # tag2 = "tags.tag2" -``` - -# Grok: - -The grok data format parses line delimited data using a regular expression like -language. - -The best way to get acquainted with grok patterns is to read the logstash docs, -which are available here: - https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html - -The grok parser uses a slightly modified version of logstash "grok" -patterns, with the format: - -``` -%{[:][:]} -``` - -The `capture_syntax` defines the grok pattern that's used to parse the input -line and the `semantic_name` is used to name the field or tag. The extension -`modifier` controls the data type that the parsed item is converted to or -other special handling. - -By default all named captures are converted into string fields. -Timestamp modifiers can be used to convert captures to the timestamp of the -parsed metric. If no timestamp is parsed the metric will be created using the -current time. - -You must capture at least one field per line. - -- Available modifiers: - - string (default if nothing is specified) - - int - - float - - duration (ie, 5.23ms gets converted to int nanoseconds) - - tag (converts the field into a tag) - - drop (drops the field completely) - - measurement (use the matched text as the measurement name) -- Timestamp modifiers: - - ts (This will auto-learn the timestamp format) - - ts-ansic ("Mon Jan _2 15:04:05 2006") - - ts-unix ("Mon Jan _2 15:04:05 MST 2006") - - ts-ruby ("Mon Jan 02 15:04:05 -0700 2006") - - ts-rfc822 ("02 Jan 06 15:04 MST") - - ts-rfc822z ("02 Jan 06 15:04 -0700") - - ts-rfc850 ("Monday, 02-Jan-06 15:04:05 MST") - - ts-rfc1123 ("Mon, 02 Jan 2006 15:04:05 MST") - - ts-rfc1123z ("Mon, 02 Jan 2006 15:04:05 -0700") - - ts-rfc3339 ("2006-01-02T15:04:05Z07:00") - - ts-rfc3339nano ("2006-01-02T15:04:05.999999999Z07:00") - - ts-httpd ("02/Jan/2006:15:04:05 -0700") - - ts-epoch (seconds since unix epoch, may contain decimal) - - ts-epochnano (nanoseconds since unix epoch) - - ts-syslog ("Jan 02 15:04:05", parsed time is set to the current year) - - ts-"CUSTOM" - -CUSTOM time layouts must be within quotes and be the representation of the -"reference time", which is `Mon Jan 2 15:04:05 -0700 MST 2006`. -To match a comma decimal point you can use a period. For example `%{TIMESTAMP:timestamp:ts-"2006-01-02 15:04:05.000"}` can be used to match `"2018-01-02 15:04:05,000"` -To match a comma decimal point you can use a period in the pattern string. -See https://golang.org/pkg/time/#Parse for more details. 
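-
-As a sketch, a complete configuration using such a custom layout might look
-like the following (the log path is illustrative); it would parse lines such
-as `2018-01-02 15:04:05,000 value=42`:
-
-```toml
-[[inputs.file]]
-  files = ["/var/log/app.log"]
-  data_format = "grok"
-  ## Note the period in the custom layout standing in for the comma
-  ## decimal point in the input.
-  grok_patterns = ['%{TIMESTAMP_ISO8601:timestamp:ts-"2006-01-02 15:04:05.000"} value=%{NUMBER:value:int}']
-```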
- -Telegraf has many of its own [built-in patterns](./grok/patterns/influx-patterns), -as well as support for most of -[logstash's builtin patterns](https://github.com/logstash-plugins/logstash-patterns-core/blob/master/patterns/grok-patterns). -_Golang regular expressions do not support lookahead or lookbehind. -logstash patterns that depend on these are not supported._ - -If you need help building patterns to match your logs, -you will find the https://grokdebug.herokuapp.com application quite useful! - -#### Grok Configuration: -```toml -[[inputs.file]] - ## Files to parse each interval. - ## These accept standard unix glob matching rules, but with the addition of - ## ** as a "super asterisk". ie: - ## /var/log/**.log -> recursively find all .log files in /var/log - ## /var/log/*/*.log -> find all .log files with a parent dir in /var/log - ## /var/log/apache.log -> only tail the apache log file - files = ["/var/log/apache/access.log"] - - ## The dataformat to be read from files - ## Each data format has its own unique set of configuration options, read - ## more about them here: - ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md - data_format = "grok" - - ## This is a list of patterns to check the given log file(s) for. - ## Note that adding patterns here increases processing time. The most - ## efficient configuration is to have one pattern. - ## Other common built-in patterns are: - ## %{COMMON_LOG_FORMAT} (plain apache & nginx access logs) - ## %{COMBINED_LOG_FORMAT} (access logs + referrer & agent) - grok_patterns = ["%{COMBINED_LOG_FORMAT}"] - - ## Full path(s) to custom pattern files. - grok_custom_pattern_files = [] - - ## Custom patterns can also be defined here. Put one pattern per line. - grok_custom_patterns = ''' - ''' - - ## Timezone allows you to provide an override for timestamps that - ## don't already include an offset - ## e.g. 04/06/2016 12:41:45 data one two 5.43µs - ## - ## Default: "" which renders UTC - ## Options are as follows: - ## 1. Local -- interpret based on machine localtime - ## 2. "Canada/Eastern" -- Unix TZ values like those found in https://en.wikipedia.org/wiki/List_of_tz_database_time_zones - ## 3. UTC -- or blank/unspecified, will return timestamp in UTC - grok_timezone = "Canada/Eastern" -``` - -#### Timestamp Examples - -This example input and config parses a file using a custom timestamp conversion: - -``` -2017-02-21 13:10:34 value=42 -``` - -```toml -[[inputs.file]] - grok_patterns = ['%{TIMESTAMP_ISO8601:timestamp:ts-"2006-01-02 15:04:05"} value=%{NUMBER:value:int}'] -``` - -This example input and config parses a file using a timestamp in unix time: - -``` -1466004605 value=42 -1466004605.123456789 value=42 -``` - -```toml -[[inputs.file]] - grok_patterns = ['%{NUMBER:timestamp:ts-epoch} value=%{NUMBER:value:int}'] -``` - -This example parses a file using a built-in conversion and a custom pattern: - -``` -Wed Apr 12 13:10:34 PST 2017 value=42 -``` - -```toml -[[inputs.file]] - grok_patterns = ["%{TS_UNIX:timestamp:ts-unix} value=%{NUMBER:value:int}"] - grok_custom_patterns = ''' - TS_UNIX %{DAY} %{MONTH} %{MONTHDAY} %{HOUR}:%{MINUTE}:%{SECOND} %{TZ} %{YEAR} - ''' -``` - -For cases where the timestamp itself is without offset, the `timezone` config var is available -to denote an offset. By default (with `timezone` either omit, blank or set to `"UTC"`), the times -are processed as if in the UTC timezone. 
If specified as `timezone = "Local"`, the timestamp -will be processed based on the current machine timezone configuration. Lastly, if using a -timezone from the list of Unix [timezones](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones), -grok will offset the timestamp accordingly. - -#### TOML Escaping - -When saving patterns to the configuration file, keep in mind the different TOML -[string](https://github.com/toml-lang/toml#string) types and the escaping -rules for each. These escaping rules must be applied in addition to the -escaping required by the grok syntax. Using the Multi-line line literal -syntax with `'''` may be useful. - -The following config examples will parse this input file: - -``` -|42|\uD83D\uDC2F|'telegraf'| -``` - -Since `|` is a special character in the grok language, we must escape it to -get a literal `|`. With a basic TOML string, special characters such as -backslash must be escaped, requiring us to escape the backslash a second time. - -```toml -[[inputs.file]] - grok_patterns = ["\\|%{NUMBER:value:int}\\|%{UNICODE_ESCAPE:escape}\\|'%{WORD:name}'\\|"] - grok_custom_patterns = "UNICODE_ESCAPE (?:\\\\u[0-9A-F]{4})+" -``` - -We cannot use a literal TOML string for the pattern, because we cannot match a -`'` within it. However, it works well for the custom pattern. -```toml -[[inputs.file]] - grok_patterns = ["\\|%{NUMBER:value:int}\\|%{UNICODE_ESCAPE:escape}\\|'%{WORD:name}'\\|"] - grok_custom_patterns = 'UNICODE_ESCAPE (?:\\u[0-9A-F]{4})+' -``` - -A multi-line literal string allows us to encode the pattern: -```toml -[[inputs.file]] - grok_patterns = [''' - \|%{NUMBER:value:int}\|%{UNICODE_ESCAPE:escape}\|'%{WORD:name}'\| - '''] - grok_custom_patterns = 'UNICODE_ESCAPE (?:\\u[0-9A-F]{4})+' -``` - -#### Tips for creating patterns - -Writing complex patterns can be difficult, here is some advice for writing a -new pattern or testing a pattern developed [online](https://grokdebug.herokuapp.com). - -Create a file output that writes to stdout, and disable other outputs while -testing. This will allow you to see the captured metrics. Keep in mind that -the file output will only print once per `flush_interval`. - -```toml -[[outputs.file]] - files = ["stdout"] -``` - -- Start with a file containing only a single line of your input. -- Remove all but the first token or piece of the line. -- Add the section of your pattern to match this piece to your configuration file. -- Verify that the metric is parsed successfully by running Telegraf. -- If successful, add the next token, update the pattern and retest. -- Continue one token at a time until the entire line is successfully parsed. - -# Logfmt -This parser implements the logfmt format by extracting and converting key-value pairs from log text in the form `=`. -At the moment, the plugin will produce one metric per line and all keys -are added as fields. -A typical log -``` -method=GET host=influxdata.org ts=2018-07-24T19:43:40.275Z -connect=4ms service=8ms status=200 bytes=1653 -``` -will be converted into -``` -logfmt method="GET",host="influxdata.org",ts="2018-07-24T19:43:40.275Z",connect="4ms",service="8ms",status=200i,bytes=1653i - -``` -Additional information about the logfmt format can be found [here](https://brandur.org/logfmt). - -# Wavefront: - -Wavefront Data Format is metrics are parsed directly into Telegraf metrics. -For more information about the Wavefront Data Format see -[here](https://docs.wavefront.com/wavefront_data_format.html). 
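-
-For illustration, a metric in Wavefront format looks like this (the metric,
-source, and tag names are examples only):
-
-```
-system.cpu.loadavg.1m 0.03 1533529977 source=host42 dc=us-west
-```
-
-which yields a `system.cpu.loadavg.1m` metric carrying the `source` and `dc`
-tags.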
-
-There are no additional configuration options for Wavefront Data Format line-protocol.
-
-#### Wavefront Configuration:
-
-```toml
-[[inputs.exec]]
-  ## Commands array
-  commands = ["/tmp/test.sh", "/usr/bin/mycollector --foo=bar"]
-
-  ## measurement name suffix (for separating different commands)
-  name_suffix = "_mycollector"
-
-  ## Data format to consume.
-  ## Each data format has its own unique set of configuration options, read
-  ## more about them here:
-  ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md
-  data_format = "wavefront"
-```
-
-# CSV
-Parse out metrics from a CSV formatted table. By default, the parser assumes there is no header and
-will read data from the first line. If `csv_header_row_count` is set to anything besides 0, the parser
-will extract column names from that number of rows. Headers of more than 1 row will have their
-names concatenated together. Any unnamed columns will be ignored by the parser.
-
-The `csv_skip_rows` config indicates the number of rows to skip before looking for header information or data
-to parse. By default, no rows will be skipped.
-
-The `csv_skip_columns` config indicates the number of columns to be skipped before parsing data. These
-columns will not be read out of the header. Naming with `csv_column_names` will begin at the first
-parsed column after skipping the indicated columns. By default, no columns are skipped.
-
-To assign custom column names, the `csv_column_names` config is available. If the `csv_column_names`
-config is used, all columns must be named, as additional columns will be ignored. If `csv_header_row_count`
-is set to 0, `csv_column_names` must be specified. Names listed in `csv_column_names` will override names extracted
-from the header.
-
-The `csv_tag_columns` and `csv_field_columns` configs are available to add the column data to the metric.
-The name used to specify the column is the name in the header, or if specified, the corresponding
-name assigned in `csv_column_names`. If neither config is specified, no data will be added to the metric.
-
-Additional configs are available to dynamically name metrics and set custom timestamps. If the
-`csv_measurement_column` config is specified, the parser will use the value found in that column as
-the metric name. If `csv_timestamp_column` is specified, the parser will extract the timestamp from
-that column. If `csv_timestamp_column` is specified, the `csv_timestamp_format` must also be specified
-or an error will be thrown.
-
-#### CSV Configuration
-```toml
-  data_format = "csv"
-
-  ## Indicates how many rows to treat as a header. By default, the parser assumes
-  ## there is no header and will parse the first row as data. If set to anything more
-  ## than 1, column names will be concatenated with the name listed in the next header row.
-  ## If `csv_column_names` is specified, the column names in the header will be overridden.
-  # csv_header_row_count = 0
-
-  ## Indicates the number of rows to skip before looking for header information.
-  # csv_skip_rows = 0
-
-  ## Indicates the number of columns to skip before looking for data to parse.
-  ## These columns will be skipped in the header as well.
- # csv_skip_columns = 0 - - ## The seperator between csv fields - ## By default, the parser assumes a comma (",") - # csv_delimiter = "," - - ## The character reserved for marking a row as a comment row - ## Commented rows are skipped and not parsed - # csv_comment = "" - - ## If set to true, the parser will remove leading whitespace from fields - ## By default, this is false - # csv_trim_space = false - - ## For assigning custom names to columns - ## If this is specified, all columns should have a name - ## Unnamed columns will be ignored by the parser. - ## If `csv_header_row_count` is set to 0, this config must be used - csv_column_names = [] - - ## Columns listed here will be added as tags. Any other columns - ## will be added as fields. - csv_tag_columns = [] - - ## The column to extract the name of the metric from - ## By default, this is the name of the plugin - ## the `name_override` config overrides this - # csv_measurement_column = "" - - ## The column to extract time information for the metric - ## `csv_timestamp_format` must be specified if this is used - # csv_timestamp_column = "" - - ## The format of time data extracted from `csv_timestamp_column` - ## this must be specified if `csv_timestamp_column` is specified - # csv_timestamp_format = "" - ``` +[metrics]: /docs/METRICS.md diff --git a/docs/DATA_FORMATS_OUTPUT.md b/docs/DATA_FORMATS_OUTPUT.md index 609021656..c06ab4719 100644 --- a/docs/DATA_FORMATS_OUTPUT.md +++ b/docs/DATA_FORMATS_OUTPUT.md @@ -4,13 +4,14 @@ In addition to output specific data formats, Telegraf supports a set of standard data formats that may be selected from when configuring many output plugins. -1. [InfluxDB Line Protocol](#influx) -1. [JSON](#json) -1. [Graphite](#graphite) -1. [SplunkMetric](../plugins/serializers/splunkmetric/README.md) +1. [InfluxDB Line Protocol](/plugins/serializers/influx) +1. [JSON](/plugins/serializers/json) +1. [Graphite](/plugins/serializers/graphite) +1. [SplunkMetric](/plugins/serializers/splunkmetric) You will be able to identify the plugins with support by the presence of a `data_format` config option, for example, in the `file` output plugin: + ```toml [[outputs.file]] ## Files to write to, "stdout" is a specially handled file. @@ -22,191 +23,3 @@ You will be able to identify the plugins with support by the presence of a ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_OUTPUT.md data_format = "influx" ``` - -## Influx - -The `influx` data format outputs metrics using -[InfluxDB Line Protocol](https://docs.influxdata.com/influxdb/latest/write_protocols/line_protocol_tutorial/). -This is the recommended format unless another format is required for -interoperability. - -### Influx Configuration -```toml -[[outputs.file]] - ## Files to write to, "stdout" is a specially handled file. - files = ["stdout", "/tmp/metrics.out"] - - ## Data format to output. - ## Each data format has its own unique set of configuration options, read - ## more about them here: - ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_OUTPUT.md - data_format = "influx" - - ## Maximum line length in bytes. Useful only for debugging. - # influx_max_line_bytes = 0 - - ## When true, fields will be output in ascending lexical order. Enabling - ## this option will result in decreased performance and is only recommended - ## when you need predictable ordering while debugging. - # influx_sort_fields = false - - ## When true, Telegraf will output unsigned integers as unsigned values, - ## i.e.: `42u`. 
You will need a version of InfluxDB supporting unsigned - ## integer values. Enabling this option will result in field type errors if - ## existing data has been written. - # influx_uint_support = false -``` - -## Graphite - -The Graphite data format is translated from Telegraf Metrics using either the -template pattern or tag support method. You can select between the two -methods using the [`graphite_tag_support`](#graphite-tag-support) option. When set, the tag support -method is used, otherwise the [`template` pattern](#template-pattern) is used. - -#### Template Pattern - -The `template` option describes how Telegraf traslates metrics into _dot_ -buckets. The default template is: - -``` -template = "host.tags.measurement.field" -``` - -In the above template, we have four parts: - -1. _host_ is a tag key. This can be any tag key that is in the Telegraf -metric(s). If the key doesn't exist, it will be ignored. If it does exist, the -tag value will be filled in. -1. _tags_ is a special keyword that outputs all remaining tag values, separated -by dots and in alphabetical order (by tag key). These will be filled after all -tag keys are filled. -1. _measurement_ is a special keyword that outputs the measurement name. -1. _field_ is a special keyword that outputs the field name. - -**Example Conversion**: - -``` -cpu,cpu=cpu-total,dc=us-east-1,host=tars usage_idle=98.09,usage_user=0.89 1455320660004257758 -=> -tars.cpu-total.us-east-1.cpu.usage_user 0.89 1455320690 -tars.cpu-total.us-east-1.cpu.usage_idle 98.09 1455320690 -``` - -Fields with string values will be skipped. Boolean fields will be converted -to 1 (true) or 0 (false). - -#### Graphite Tag Support - -When the `graphite_tag_support` option is enabled, the template pattern is not -used. Instead, tags are encoded using -[Graphite tag support](http://graphite.readthedocs.io/en/latest/tags.html) -added in Graphite 1.1. The `metric_path` is a combination of the optional -`prefix` option, measurement name, and field name. - -The tag `name` is reserved by Graphite, any conflicting tags and will be encoded as `_name`. - -**Example Conversion**: -``` -cpu,cpu=cpu-total,dc=us-east-1,host=tars usage_idle=98.09,usage_user=0.89 1455320660004257758 -=> -cpu.usage_user;cpu=cpu-total;dc=us-east-1;host=tars 0.89 1455320690 -cpu.usage_idle;cpu=cpu-total;dc=us-east-1;host=tars 98.09 1455320690 -``` - -### Graphite Configuration - -```toml -[[outputs.file]] - ## Files to write to, "stdout" is a specially handled file. - files = ["stdout", "/tmp/metrics.out"] - - ## Data format to output. - ## Each data format has its own unique set of configuration options, read - ## more about them here: - ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_OUTPUT.md - data_format = "graphite" - - ## Prefix added to each graphite bucket - prefix = "telegraf" - ## Graphite template pattern - template = "host.tags.measurement.field" - - ## Support Graphite tags, recommended to enable when using Graphite 1.1 or later. - # graphite_tag_support = false -``` - -## JSON - -The JSON output data format output for a single metric is in the -form: -```json -{ - "fields": { - "field_1": 30, - "field_2": 4, - "field_N": 59, - "n_images": 660 - }, - "name": "docker", - "tags": { - "host": "raynor" - }, - "timestamp": 1458229140 -} -``` - -When an output plugin needs to emit multiple metrics at one time, it may use -the batch format. The use of batch format is determined by the plugin, -reference the documentation for the specific plugin. 
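-
-The batch format wraps the individual metric objects in a `metrics` array: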
-```json -{ - "metrics": [ - { - "fields": { - "field_1": 30, - "field_2": 4, - "field_N": 59, - "n_images": 660 - }, - "name": "docker", - "tags": { - "host": "raynor" - }, - "timestamp": 1458229140 - }, - { - "fields": { - "field_1": 30, - "field_2": 4, - "field_N": 59, - "n_images": 660 - }, - "name": "docker", - "tags": { - "host": "raynor" - }, - "timestamp": 1458229140 - } - ] -} -``` - -### JSON Configuration - -```toml -[[outputs.file]] - ## Files to write to, "stdout" is a specially handled file. - files = ["stdout", "/tmp/metrics.out"] - - ## Data format to output. - ## Each data format has its own unique set of configuration options, read - ## more about them here: - ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_OUTPUT.md - data_format = "json" - - ## The resolution to use for the metric timestamp. Must be a duration string - ## such as "1ns", "1us", "1ms", "10ms", "1s". Durations are truncated to - ## the power of 10 less than the specified units. - json_timestamp_units = "1s" -``` diff --git a/docs/METRICS.md b/docs/METRICS.md new file mode 100644 index 000000000..1c238e30a --- /dev/null +++ b/docs/METRICS.md @@ -0,0 +1,22 @@ +# Metrics + +Telegraf metrics are the internal representation used to model data during +processing. Metrics are closely based on InfluxDB's data model and contain +four main components: + +- **Measurement Name**: Description and namespace for the metric. +- **Tags**: Key/Value string pairs and usually used to identify the + metric. +- **Fields**: Key/Value pairs that are typed and usually contain the + metric data. +- **Timestamp**: Date and time associated with the fields. + +This metric type exists only in memory and must be converted to a concrete +representation in order to be transmitted or viewed. To acheive this we +provide several [output data formats][] sometimes referred to as +*serializers*. Our default serializer converts to [InfluxDB Line +Protocol][line protocol] which provides a high performance and one-to-one +direct mapping from Telegraf metrics. + +[output data formats]: /docs/DATA_FORMATS_OUTPUT.md +[line protocol]: /plugins/serializers/influx diff --git a/docs/README.md b/docs/README.md new file mode 100644 index 000000000..b7b55336c --- /dev/null +++ b/docs/README.md @@ -0,0 +1,21 @@ +# Telegraf + +- Concepts + - [Metrics][metrics] + - [Input Data Formats][parsers] + - [Output Data Formats][serializers] + - [Aggregators & Processors][aggproc] +- Administration + - [Configuration][conf] + - [Profiling][profiling] + - [Windows Service][winsvc] + - [FAQ][faq] + +[conf]: /docs/CONFIGURATION.md +[metrics]: /docs/METRICS.md +[parsers]: /docs/DATA_FORMATS_INPUT.md +[serializers]: /docs/DATA_FORMATS_OUTPUT.md +[aggproc]: /docs/AGGREGATORS_AND_PROCESSORS.md +[profiling]: /docs/PROFILING.md +[winsvc]: /docs/WINDOWS_SERVICE.md +[faq]: /docs/FAQ.md diff --git a/docs/TEMPLATE_PATTERN.md b/docs/TEMPLATE_PATTERN.md new file mode 100644 index 000000000..4244369d7 --- /dev/null +++ b/docs/TEMPLATE_PATTERN.md @@ -0,0 +1,135 @@ +# Template Patterns + +Template patterns are a mini language that describes how a dot delimited +string should be mapped to and from [metrics][]. + +A template has the form: +``` +"host.mytag.mytag.measurement.measurement.field*" +``` + +Where the following keywords can be set: + +1. `measurement`: specifies that this section of the graphite bucket corresponds +to the measurement name. This can be specified multiple times. +2. 
`field`: specifies that this section of the graphite bucket corresponds +to the field name. This can be specified multiple times. +3. `measurement*`: specifies that all remaining elements of the graphite bucket +correspond to the measurement name. +4. `field*`: specifies that all remaining elements of the graphite bucket +correspond to the field name. + +Any part of the template that is not a keyword is treated as a tag key. This +can also be specified multiple times. + +**NOTE:** `field*` cannot be used in conjunction with `measurement*`. + +### Examples + +#### Measurement & Tag Templates + +The most basic template is to specify a single transformation to apply to all +incoming metrics. So the following template: + +```toml +templates = [ + "region.region.measurement*" +] +``` + +would result in the following Graphite -> Telegraf transformation. + +``` +us.west.cpu.load 100 +=> cpu.load,region=us.west value=100 +``` + +Multiple templates can also be specified, but these should be differentiated +using _filters_ (see below for more details) + +```toml +templates = [ + "*.*.* region.region.measurement", # <- all 3-part measurements will match this one. + "*.*.*.* region.region.host.measurement", # <- all 4-part measurements will match this one. +] +``` + +#### Field Templates + +The field keyword tells Telegraf to give the metric that field name. +So the following template: + +```toml +separator = "_" +templates = [ + "measurement.measurement.field.field.region" +] +``` + +would result in the following Graphite -> Telegraf transformation. + +``` +cpu.usage.idle.percent.eu-east 100 +=> cpu_usage,region=eu-east idle_percent=100 +``` + +The field key can also be derived from all remaining elements of the graphite +bucket by specifying `field*`: + +```toml +separator = "_" +templates = [ + "measurement.measurement.region.field*" +] +``` + +which would result in the following Graphite -> Telegraf transformation. + +``` +cpu.usage.eu-east.idle.percentage 100 +=> cpu_usage,region=eu-east idle_percentage=100 +``` + +#### Filter Templates + +Users can also filter the template(s) to use based on the name of the bucket, +using glob matching, like so: + +```toml +templates = [ + "cpu.* measurement.measurement.region", + "mem.* measurement.measurement.host" +] +``` + +which would result in the following transformation: + +``` +cpu.load.eu-east 100 +=> cpu_load,region=eu-east value=100 + +mem.cached.localhost 256 +=> mem_cached,host=localhost value=256 +``` + +#### Adding Tags + +Additional tags can be added to a metric that don't exist on the received metric. +You can add additional tags by specifying them after the pattern. +Tags have the same format as the line protocol. +Multiple tags are separated by commas. + +```toml +templates = [ + "measurement.measurement.field.region datacenter=1a" +] +``` + +would result in the following Graphite -> Telegraf transformation. 
+
+```
+cpu.usage.idle.eu-east 100
+=> cpu_usage,region=eu-east,datacenter=1a idle=100
+```
+
+[metrics]: /docs/METRICS.md
diff --git a/plugins/inputs/statsd/README.md b/plugins/inputs/statsd/README.md
index 85cb4a46e..c1093bf39 100644
--- a/plugins/inputs/statsd/README.md
+++ b/plugins/inputs/statsd/README.md
@@ -10,7 +10,7 @@
 
   ## MaxTCPConnection - applicable when protocol is set to tcp (default=250)
   max_tcp_connections = 250
- 
+
   ## Enable TCP keep alive probes (default=false)
   tcp_keep_alive = false
 
@@ -45,7 +45,7 @@
   parse_data_dog_tags = false
 
   ## Statsd data translation templates, more info can be read here:
-  ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md#graphite
+  ## https://github.com/influxdata/telegraf/blob/master/docs/TEMPLATE_PATTERN.md
   # templates = [
   #     "cpu.* measurement*"
   # ]
@@ -227,5 +227,5 @@ mem.cached.localhost:256|g
 => mem_cached,host=localhost 256
 ```
 
-There are many more options available,
-[More details can be found here](https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md#graphite)
+Consult the [Template Patterns](/docs/TEMPLATE_PATTERN.md) documentation for
+additional details.
diff --git a/plugins/inputs/statsd/statsd.go b/plugins/inputs/statsd/statsd.go
index 60b55887e..6b0dd0b78 100644
--- a/plugins/inputs/statsd/statsd.go
+++ b/plugins/inputs/statsd/statsd.go
@@ -216,7 +216,7 @@ const sampleConfig = `
   parse_data_dog_tags = false
 
   ## Statsd data translation templates, more info can be read here:
-  ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md#graphite
+  ## https://github.com/influxdata/telegraf/blob/master/docs/TEMPLATE_PATTERN.md
   # templates = [
   #     "cpu.* measurement*"
   # ]
diff --git a/plugins/parsers/EXAMPLE_README.md b/plugins/parsers/EXAMPLE_README.md
new file mode 100644
index 000000000..b3c1bc2e2
--- /dev/null
+++ b/plugins/parsers/EXAMPLE_README.md
@@ -0,0 +1,46 @@
+# Example
+
+This description explains at a high level what the parser does and provides
+links to where additional information about the format can be found.
+
+### Configuration
+
+This section contains the sample configuration for the parser. Since a
+parser is not a standalone plugin, the sample configuration uses the `file` or
+`exec` input as the base config.
+
+```toml
+[[inputs.file]]
+  files = ["example"]
+
+  ## Data format to consume.
+  ## Each data format has its own unique set of configuration options, read
+  ## more about them here:
+  ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md
+  data_format = "example"
+
+  ## Describe variables using the standard SampleConfig style.
+  ## https://github.com/influxdata/telegraf/wiki/SampleConfig
+  example_option = "example_value"
+```
+
+#### example_option
+
+If an option requires a more expansive explanation than can be included inline
+in the sample configuration, it may be described here.
+
+### Metrics
+
+The optional Metrics section contains details about how the parser converts
+input data into Telegraf metrics.
+
+### Examples
+
+The optional Examples section can show an example conversion from the input
+format using InfluxDB Line Protocol as the reference format.
+
+For line-delimited text formats, a diff may be appropriate:
+```diff
+- cpu|host=localhost|source=example.org|value=42
++ cpu,host=localhost,source=example.org value=42
+```
diff --git a/plugins/parsers/collectd/README.md b/plugins/parsers/collectd/README.md
new file mode 100644
index 000000000..06f14d6d4
--- /dev/null
+++ b/plugins/parsers/collectd/README.md
@@ -0,0 +1,44 @@
+# Collectd
+
+The collectd format parses the collectd binary network protocol. Tags are
+created for host, instance, type, and type instance. All collectd values are
+added as float64 fields.
+
+For more information about the binary network protocol see
+[here](https://collectd.org/wiki/index.php/Binary_protocol).
+
+You can control the cryptographic settings with parser options. Create an
+authentication file and set `collectd_auth_file` to the path of the file, then
+set the desired security level in `collectd_security_level`.
+
+Additional information, including client setup, can be found
+[here](https://collectd.org/wiki/index.php/Networking_introduction#Cryptographic_setup).
+
+You can also change the path to the TypesDB or add additional TypesDB files
+using `collectd_typesdb`.
+
+### Configuration
+
+```toml
+[[inputs.file]]
+  files = ["example"]
+
+  ## Data format to consume.
+  ## Each data format has its own unique set of configuration options, read
+  ## more about them here:
+  ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md
+  data_format = "collectd"
+
+  ## Authentication file for cryptographic security levels
+  collectd_auth_file = "/etc/collectd/auth_file"
+  ## One of none (default), sign, or encrypt
+  collectd_security_level = "encrypt"
+  ## Paths to TypesDB specifications
+  collectd_typesdb = ["/usr/share/collectd/types.db"]
+
+  ## Multi-value plugins can be handled two ways.
+  ## "split" will parse and store the multi-value plugin data into separate measurements
+  ## "join" will parse and store the multi-value plugin as a single multi-value measurement.
+  ## "split" is the default behavior for backward compatibility with previous versions of InfluxDB.
+  collectd_parse_multivalue = "split"
+```
diff --git a/plugins/parsers/csv/README.md b/plugins/parsers/csv/README.md
new file mode 100644
index 000000000..532980991
--- /dev/null
+++ b/plugins/parsers/csv/README.md
@@ -0,0 +1,104 @@
+# CSV
+
+The `csv` parser creates metrics from a document containing comma-separated
+values.
+
+### Configuration
+
+```toml
+[[inputs.file]]
+  files = ["example"]
+
+  ## Data format to consume.
+  ## Each data format has its own unique set of configuration options, read
+  ## more about them here:
+  ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md
+  data_format = "csv"
+
+  ## Indicates how many rows to treat as a header. By default, the parser assumes
+  ## there is no header and will parse the first row as data. If set to anything more
+  ## than 1, column names will be concatenated with the name listed in the next header row.
+  ## If `csv_column_names` is specified, the column names in the header will be overridden.
+  csv_header_row_count = 0
+
+  ## For assigning custom names to columns
+  ## If this is specified, all columns should have a name
+  ## Unnamed columns will be ignored by the parser.
+  ## If `csv_header_row_count` is set to 0, this config must be used
+  csv_column_names = []
+
+  ## Indicates the number of rows to skip before looking for header information.
+  csv_skip_rows = 0
+
+  ## Indicates the number of columns to skip before looking for data to parse.
+  ## These columns will be skipped in the header as well.
+  csv_skip_columns = 0
+
+  ## The separator between csv fields
+  ## By default, the parser assumes a comma (",")
+  csv_delimiter = ","
+
+  ## The character reserved for marking a row as a comment row
+  ## Commented rows are skipped and not parsed
+  csv_comment = ""
+
+  ## If set to true, the parser will remove leading whitespace from fields
+  ## By default, this is false
+  csv_trim_space = false
+
+  ## Columns listed here will be added as tags. Any other columns
+  ## will be added as fields.
+  csv_tag_columns = []
+
+  ## The column to extract the name of the metric from
+  csv_measurement_column = ""
+
+  ## The column to extract time information for the metric
+  ## `csv_timestamp_format` must be specified if this is used
+  csv_timestamp_column = ""
+
+  ## The format of time data extracted from `csv_timestamp_column`
+  ## this must be specified if `csv_timestamp_column` is specified
+  csv_timestamp_format = ""
+```
+
+#### csv_timestamp_column, csv_timestamp_format
+
+By default the current time will be used for all created metrics. To set the
+time from the CSV document, you can use the `csv_timestamp_column` and
+`csv_timestamp_format` options together to set the time to a value in the parsed
+document.
+
+The `csv_timestamp_column` option specifies the column name containing the
+time value and `csv_timestamp_format` must be set to a Go "reference time"
+which is defined to be the specific time: `Mon Jan 2 15:04:05 MST 2006`.
+
+Consult the Go [time][time parse] package for details and additional examples
+on how to set the time format.
+
+[time parse]: https://golang.org/pkg/time/#Parse
+
+### Metrics
+
+One metric is created for each row with the columns added as fields. The type
+of the field is automatically determined based on the contents of the value.
+
+### Examples
+
+Config:
+```
+[[inputs.file]]
+  files = ["example"]
+  data_format = "csv"
+  csv_header_row_count = 1
+  csv_timestamp_column = "time"
+  csv_timestamp_format = "2006-01-02T15:04:05Z07:00"
+```
+
+Input:
+```
+measurement,cpu,time_user,time_system,time_idle,time
+cpu,cpu0,42,42,42,2018-09-13T13:03:28Z
+```
+
+Output:
+```
+cpu cpu=cpu0,time_user=42,time_system=42,time_idle=42 1536869008000000000
+```
diff --git a/plugins/parsers/dropwizard/README.md b/plugins/parsers/dropwizard/README.md
new file mode 100644
index 000000000..f0ff6d15c
--- /dev/null
+++ b/plugins/parsers/dropwizard/README.md
@@ -0,0 +1,171 @@
+# Dropwizard
+
+The `dropwizard` data format can parse the [Dropwizard JSON][dropwizard] representation of a single dropwizard metric registry. By default, tags are parsed from metric names as if they were actual InfluxDB line protocol keys (`measurement<,tag_set>`), which can be overridden by defining a custom [template pattern][templates]. All field value types are supported: `string`, `number`, and `boolean`.
+
+[templates]: /docs/TEMPLATE_PATTERN.md
+[dropwizard]: http://metrics.dropwizard.io/3.1.0/manual/json/
+
+### Configuration
+
+```toml
+[[inputs.file]]
+  files = ["example"]
+
+  ## Data format to consume.
+  ## Each data format has its own unique set of configuration options, read
+  ## more about them here:
+  ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md
+  data_format = "dropwizard"
+
+  ## Used by the templating engine to join matched values when cardinality is > 1
+  separator = "_"
+
+  ## Each template line requires a template pattern. It can have an optional
+  ## filter before the template, separated by spaces. It can also have optional
+  ## extra tags following the template. Multiple tags should be separated by
+  ## commas with no spaces, similar to the line protocol format. There can be
+  ## only one default template.
+  ## Templates support the following formats:
+  ## 1. filter + template
+  ## 2. filter + template + extra tag(s)
+  ## 3. filter + template with field key
+  ## 4. default template
+  ## By providing an empty template array, templating is disabled and measurements are parsed as InfluxDB line protocol keys (measurement<,tag_set>)
+  templates = []
+
+  ## You may use an appropriate [gjson path](https://github.com/tidwall/gjson#path-syntax)
+  ## to locate the metric registry within the JSON document
+  # dropwizard_metric_registry_path = "metrics"
+
+  ## You may use an appropriate [gjson path](https://github.com/tidwall/gjson#path-syntax)
+  ## to locate the default time of the measurements within the JSON document
+  # dropwizard_time_path = "time"
+  # dropwizard_time_format = "2006-01-02T15:04:05Z07:00"
+
+  ## You may use an appropriate [gjson path](https://github.com/tidwall/gjson#path-syntax)
+  ## to locate the tags map within the JSON document
+  # dropwizard_tags_path = "tags"
+
+  ## You may even use tag paths per tag
+  # [inputs.file.dropwizard_tag_paths]
+  #   tag1 = "tags.tag1"
+  #   tag2 = "tags.tag2"
+```
+
+### Examples
+
+A typical JSON of a dropwizard metric registry:
+
+```json
+{
+  "version": "3.0.0",
+  "counters" : {
+    "measurement,tag1=green" : {
+      "count" : 1
+    }
+  },
+  "meters" : {
+    "measurement" : {
+      "count" : 1,
+      "m15_rate" : 1.0,
+      "m1_rate" : 1.0,
+      "m5_rate" : 1.0,
+      "mean_rate" : 1.0,
+      "units" : "events/second"
+    }
+  },
+  "gauges" : {
+    "measurement" : {
+      "value" : 1
+    }
+  },
+  "histograms" : {
+    "measurement" : {
+      "count" : 1,
+      "max" : 1.0,
+      "mean" : 1.0,
+      "min" : 1.0,
+      "p50" : 1.0,
+      "p75" : 1.0,
+      "p95" : 1.0,
+      "p98" : 1.0,
+      "p99" : 1.0,
+      "p999" : 1.0,
+      "stddev" : 1.0
+    }
+  },
+  "timers" : {
+    "measurement" : {
+      "count" : 1,
+      "max" : 1.0,
+      "mean" : 1.0,
+      "min" : 1.0,
+      "p50" : 1.0,
+      "p75" : 1.0,
+      "p95" : 1.0,
+      "p98" : 1.0,
+      "p99" : 1.0,
+      "p999" : 1.0,
+      "stddev" : 1.0,
+      "m15_rate" : 1.0,
+      "m1_rate" : 1.0,
+      "m5_rate" : 1.0,
+      "mean_rate" : 1.0,
+      "duration_units" : "seconds",
+      "rate_units" : "calls/second"
+    }
+  }
+}
+```
+
+Would get translated into 4 different measurements:
+
+```
+measurement,metric_type=counter,tag1=green count=1
+measurement,metric_type=meter count=1,m15_rate=1.0,m1_rate=1.0,m5_rate=1.0,mean_rate=1.0
+measurement,metric_type=gauge value=1
+measurement,metric_type=histogram count=1,max=1.0,mean=1.0,min=1.0,p50=1.0,p75=1.0,p95=1.0,p98=1.0,p99=1.0,p999=1.0
+measurement,metric_type=timer count=1,max=1.0,mean=1.0,min=1.0,p50=1.0,p75=1.0,p95=1.0,p98=1.0,p99=1.0,p999=1.0,stddev=1.0,m15_rate=1.0,m1_rate=1.0,m5_rate=1.0,mean_rate=1.0
+```
+
+You may also parse a dropwizard registry from any JSON document which contains a dropwizard registry in some inner field.
+For example, to parse the following JSON document:
+
+```json
+{
+  "time" : "2017-02-22T14:33:03.662+02:00",
+  "tags" : {
+    "tag1" : "green",
+    "tag2" : "yellow"
+  },
+  "metrics" : {
+    "counters" : {
+      "measurement" : {
+        "count" : 1
+      }
+    },
+    "meters" : {},
+    "gauges" : {},
+    "histograms" : {},
+    "timers" : {}
+  }
+}
+```
+and translate it into:
+
+```
+measurement,metric_type=counter,tag1=green,tag2=yellow count=1 1487766783662000000
+```
+
+you simply need to use the following additional configuration properties:
+
+```toml
+dropwizard_metric_registry_path = "metrics"
+dropwizard_time_path = "time"
+dropwizard_time_format = "2006-01-02T15:04:05Z07:00"
+dropwizard_tags_path = "tags"
+## tag paths per tag are supported too, eg.
+#[inputs.yourinput.dropwizard_tag_paths]
+#  tag1 = "tags.tag1"
+#  tag2 = "tags.tag2"
+```
diff --git a/plugins/parsers/graphite/README.md b/plugins/parsers/graphite/README.md
new file mode 100644
index 000000000..b0b1127aa
--- /dev/null
+++ b/plugins/parsers/graphite/README.md
@@ -0,0 +1,48 @@
+# Graphite
+
+The Graphite data format translates graphite *dot* buckets directly into
+Telegraf measurement names, with a single value field, and without any tags.
+By default, the separator is left as `.`, but this can be changed using the
+`separator` argument. For more advanced options, Telegraf supports specifying
+[templates](#templates) to translate graphite buckets into Telegraf metrics.
+
+### Configuration
+
+```toml
+[[inputs.exec]]
+  ## Commands array
+  commands = ["/tmp/test.sh", "/usr/bin/mycollector --foo=bar"]
+
+  ## measurement name suffix (for separating different commands)
+  name_suffix = "_mycollector"
+
+  ## Data format to consume.
+  ## Each data format has its own unique set of configuration options, read
+  ## more about them here:
+  ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md
+  data_format = "graphite"
+
+  ## This string will be used to join the matched values.
+  separator = "_"
+
+  ## Each template line requires a template pattern. It can have an optional
+  ## filter before the template, separated by spaces. It can also have optional
+  ## extra tags following the template. Multiple tags should be separated by
+  ## commas with no spaces, similar to the line protocol format. There can be
+  ## only one default template.
+  ## Templates support the following formats:
+  ## 1. filter + template
+  ## 2. filter + template + extra tag(s)
+  ## 3. filter + template with field key
+  ## 4. default template
+  templates = [
+    "*.app env.service.resource.measurement",
+    "stats.* .host.measurement* region=eu-east,agent=sensu",
+    "stats2.* .host.measurement.field",
+    "measurement*"
+  ]
+```
+
+#### templates
+
+Consult the [Template Patterns](/docs/TEMPLATE_PATTERN.md) documentation for
+details.
diff --git a/plugins/parsers/grok/README.md b/plugins/parsers/grok/README.md
new file mode 100644
index 000000000..7b22d340e
--- /dev/null
+++ b/plugins/parsers/grok/README.md
@@ -0,0 +1,222 @@
+# Grok
+
+The grok data format parses line-delimited data using a regular-expression-like
+language.
+
+The best way to get acquainted with grok patterns is to read the logstash docs,
+which are available here:
+  https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html
+
+The grok parser uses a slightly modified version of logstash "grok"
+patterns, with the format:
+
+```
+%{<capture_syntax>[:<semantic_name>][:<modifier>]}
+```
+
+The `capture_syntax` defines the grok pattern that's used to parse the input
+line and the `semantic_name` is used to name the field or tag. The extension
+`modifier` controls the data type that the parsed item is converted to or
+other special handling.
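+
+For example (a sketch, assuming the standard logstash `NUMBER` pattern is
+available), the following uses all three parts: `NUMBER` is the capture
+syntax, `response_time` is the semantic name, and `float` is one of the
+modifiers listed below:
+
+```
+%{NUMBER:response_time:float}
+```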
+
+By default all named captures are converted into string fields.
+Timestamp modifiers can be used to convert captures to the timestamp of the
+parsed metric. If no timestamp is parsed the metric will be created using the
+current time.
+
+You must capture at least one field per line.
+
+- Available modifiers:
+  - string (default if nothing is specified)
+  - int
+  - float
+  - duration (ie, 5.23ms gets converted to int nanoseconds)
+  - tag (converts the field into a tag)
+  - drop (drops the field completely)
+  - measurement (use the matched text as the measurement name)
+- Timestamp modifiers:
+  - ts (This will auto-learn the timestamp format)
+  - ts-ansic ("Mon Jan _2 15:04:05 2006")
+  - ts-unix ("Mon Jan _2 15:04:05 MST 2006")
+  - ts-ruby ("Mon Jan 02 15:04:05 -0700 2006")
+  - ts-rfc822 ("02 Jan 06 15:04 MST")
+  - ts-rfc822z ("02 Jan 06 15:04 -0700")
+  - ts-rfc850 ("Monday, 02-Jan-06 15:04:05 MST")
+  - ts-rfc1123 ("Mon, 02 Jan 2006 15:04:05 MST")
+  - ts-rfc1123z ("Mon, 02 Jan 2006 15:04:05 -0700")
+  - ts-rfc3339 ("2006-01-02T15:04:05Z07:00")
+  - ts-rfc3339nano ("2006-01-02T15:04:05.999999999Z07:00")
+  - ts-httpd ("02/Jan/2006:15:04:05 -0700")
+  - ts-epoch (seconds since unix epoch, may contain decimal)
+  - ts-epochnano (nanoseconds since unix epoch)
+  - ts-syslog ("Jan 02 15:04:05", parsed time is set to the current year)
+  - ts-"CUSTOM"
+
+CUSTOM time layouts must be within quotes and be the representation of the
+"reference time", which is `Mon Jan 2 15:04:05 -0700 MST 2006`.
+To match a comma decimal point you can use a period in the pattern string.
+For example, `%{TIMESTAMP:timestamp:ts-"2006-01-02 15:04:05.000"}` can be used
+to match `"2018-01-02 15:04:05,000"`.
+See https://golang.org/pkg/time/#Parse for more details.
+
+Telegraf has many of its own [built-in patterns](./grok/patterns/influx-patterns),
+as well as support for most of
+[logstash's builtin patterns](https://github.com/logstash-plugins/logstash-patterns-core/blob/master/patterns/grok-patterns).
+_Golang regular expressions do not support lookahead or lookbehind.
+Logstash patterns that depend on these are not supported._
+
+If you need help building patterns to match your logs,
+you will find the https://grokdebug.herokuapp.com application quite useful!
+
+### Configuration
+
+```toml
+[[inputs.file]]
+  ## Files to parse each interval.
+  ## These accept standard unix glob matching rules, but with the addition of
+  ## ** as a "super asterisk". ie:
+  ##   /var/log/**.log     -> recursively find all .log files in /var/log
+  ##   /var/log/*/*.log    -> find all .log files with a parent dir in /var/log
+  ##   /var/log/apache.log -> only tail the apache log file
+  files = ["/var/log/apache/access.log"]
+
+  ## The dataformat to be read from files
+  ## Each data format has its own unique set of configuration options, read
+  ## more about them here:
+  ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md
+  data_format = "grok"
+
+  ## This is a list of patterns to check the given log file(s) for.
+  ## Note that adding patterns here increases processing time. The most
+  ## efficient configuration is to have one pattern.
+  ## Other common built-in patterns are:
+  ##   %{COMMON_LOG_FORMAT}   (plain apache & nginx access logs)
+  ##   %{COMBINED_LOG_FORMAT} (access logs + referrer & agent)
+  grok_patterns = ["%{COMBINED_LOG_FORMAT}"]
+
+  ## Full path(s) to custom pattern files.
+  grok_custom_pattern_files = []
+
+  ## Custom patterns can also be defined here. Put one pattern per line.
+  grok_custom_patterns = '''
+  '''
+
+  ## Timezone allows you to provide an override for timestamps that
+  ## don't already include an offset
+  ## e.g. 04/06/2016 12:41:45 data one two 5.43µs
+  ##
+  ## Default: "" which renders UTC
+  ## Options are as follows:
+  ##   1. Local            -- interpret based on machine localtime
+  ##   2. "Canada/Eastern" -- Unix TZ values like those found in https://en.wikipedia.org/wiki/List_of_tz_database_time_zones
+  ##   3. UTC              -- or blank/unspecified, will return timestamp in UTC
+  grok_timezone = "Canada/Eastern"
+```
+
+#### Timestamp Examples
+
+This example input and config parses a file using a custom timestamp conversion:
+
+```
+2017-02-21 13:10:34 value=42
+```
+
+```toml
+[[inputs.file]]
+  grok_patterns = ['%{TIMESTAMP_ISO8601:timestamp:ts-"2006-01-02 15:04:05"} value=%{NUMBER:value:int}']
+```
+
+This example input and config parses a file using a timestamp in unix time:
+
+```
+1466004605 value=42
+1466004605.123456789 value=42
+```
+
+```toml
+[[inputs.file]]
+  grok_patterns = ['%{NUMBER:timestamp:ts-epoch} value=%{NUMBER:value:int}']
+```
+
+This example parses a file using a built-in conversion and a custom pattern:
+
+```
+Wed Apr 12 13:10:34 PST 2017 value=42
+```
+
+```toml
+[[inputs.file]]
+  grok_patterns = ["%{TS_UNIX:timestamp:ts-unix} value=%{NUMBER:value:int}"]
+  grok_custom_patterns = '''
+    TS_UNIX %{DAY} %{MONTH} %{MONTHDAY} %{HOUR}:%{MINUTE}:%{SECOND} %{TZ} %{YEAR}
+  '''
+```
+
+For cases where the timestamp itself is without offset, the `grok_timezone`
+config option is available to denote an offset. By default (with
+`grok_timezone` either omitted, blank, or set to `"UTC"`), the times are
+processed as if in the UTC timezone. If specified as `grok_timezone = "Local"`,
+the timestamp will be processed based on the current machine timezone
+configuration. Lastly, if using a timezone from the list of Unix
+[timezones](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones),
+grok will offset the timestamp accordingly.
+
+#### TOML Escaping
+
+When saving patterns to the configuration file, keep in mind the different TOML
+[string](https://github.com/toml-lang/toml#string) types and the escaping
+rules for each. These escaping rules must be applied in addition to the
+escaping required by the grok syntax. Using the multi-line literal
+syntax with `'''` may be useful.
+
+The following config examples will parse this input file:
+
+```
+|42|\uD83D\uDC2F|'telegraf'|
+```
+
+Since `|` is a special character in the grok language, we must escape it to
+get a literal `|`. With a basic TOML string, special characters such as
+backslash must be escaped, requiring us to escape the backslash a second time.
+
+```toml
+[[inputs.file]]
+  grok_patterns = ["\\|%{NUMBER:value:int}\\|%{UNICODE_ESCAPE:escape}\\|'%{WORD:name}'\\|"]
+  grok_custom_patterns = "UNICODE_ESCAPE (?:\\\\u[0-9A-F]{4})+"
+```
+
+We cannot use a literal TOML string for the pattern, because we cannot match a
+`'` within it. However, it works well for the custom pattern.
+
+```toml
+[[inputs.file]]
+  grok_patterns = ["\\|%{NUMBER:value:int}\\|%{UNICODE_ESCAPE:escape}\\|'%{WORD:name}'\\|"]
+  grok_custom_patterns = 'UNICODE_ESCAPE (?:\\u[0-9A-F]{4})+'
+```
+
+A multi-line literal string allows us to encode the pattern:
+```toml
+[[inputs.file]]
+  grok_patterns = ['''
+    \|%{NUMBER:value:int}\|%{UNICODE_ESCAPE:escape}\|'%{WORD:name}'\|
+  ''']
+  grok_custom_patterns = 'UNICODE_ESCAPE (?:\\u[0-9A-F]{4})+'
+```
+
+#### Tips for creating patterns
+
+Writing complex patterns can be difficult; here is some advice for writing a
+new pattern or testing a pattern developed [online](https://grokdebug.herokuapp.com).
+
+Create a file output that writes to stdout, and disable other outputs while
+testing. This will allow you to see the captured metrics. Keep in mind that
+the file output will only print once per `flush_interval`.
+
+```toml
+[[outputs.file]]
+  files = ["stdout"]
+```
+
+- Start with a file containing only a single line of your input.
+- Remove all but the first token or piece of the line.
+- Add the section of your pattern to match this piece to your configuration file.
+- Verify that the metric is parsed successfully by running Telegraf.
+- If successful, add the next token, update the pattern and retest.
+- Continue one token at a time until the entire line is successfully parsed.
+
diff --git a/plugins/parsers/influx/README.md b/plugins/parsers/influx/README.md
new file mode 100644
index 000000000..51c0106e6
--- /dev/null
+++ b/plugins/parsers/influx/README.md
@@ -0,0 +1,20 @@
+# InfluxDB Line Protocol
+
+There are no additional configuration options for InfluxDB [line protocol][].
+The metrics are parsed directly into Telegraf metrics.
+
+[line protocol]: https://docs.influxdata.com/influxdb/latest/write_protocols/line/
+
+### Configuration
+
+```toml
+[[inputs.file]]
+  files = ["example"]
+
+  ## Data format to consume.
+  ## Each data format has its own unique set of configuration options, read
+  ## more about them here:
+  ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md
+  data_format = "influx"
+```
+
diff --git a/plugins/parsers/json/README.md b/plugins/parsers/json/README.md
new file mode 100644
index 000000000..fa0d767ff
--- /dev/null
+++ b/plugins/parsers/json/README.md
@@ -0,0 +1,214 @@
+# JSON
+
+The JSON data format parses a [JSON][json] object or an array of objects into
+metric fields.
+
+**NOTE:** All JSON numbers are converted to float fields. JSON strings are
+ignored unless specified in the `tag_keys` or `json_string_fields` options.
+
+### Configuration
+
+```toml
+[[inputs.file]]
+  files = ["example"]
+
+  ## Data format to consume.
+  ## Each data format has its own unique set of configuration options, read
+  ## more about them here:
+  ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md
+  data_format = "json"
+
+  ## Query is a GJSON path that specifies a specific chunk of JSON to be
+  ## parsed; if not specified, the whole document will be parsed.
+  ##
+  ## GJSON query paths are described here:
+  ## https://github.com/tidwall/gjson#path-syntax
+  json_query = ""
+
+  ## Tag keys is an array of keys that should be added as tags.
+  tag_keys = [
+    "my_tag_1",
+    "my_tag_2"
+  ]
+
+  ## String fields is an array of keys that should be added as string fields.
+  json_string_fields = []
+
+  ## Name key is the key to use as the measurement name.
+  json_name_key = ""
+
+  ## Time key is the key containing the time that should be used to create the
+  ## metric.
+  json_time_key = ""
+
+  ## Time format is the time layout that should be used to interpret the
+  ## json_time_key. The format must be `unix`, `unix_ms`, or a time layout
+  ## in the Go "reference time" format.
+  ## ex: json_time_format = "Mon Jan 2 15:04:05 -0700 MST 2006"
+  ##     json_time_format = "2006-01-02T15:04:05Z07:00"
+  ##     json_time_format = "unix"
+  ##     json_time_format = "unix_ms"
+  json_time_format = ""
+```
+
+#### json_query
+
+The `json_query` is a [GJSON][gjson] path that can be used to limit the
+portion of the overall JSON document that should be parsed. The result of the
+query should contain a JSON object or an array of objects.
+
+Consult the GJSON [path syntax][gjson syntax] for details and examples.
+
+#### json_time_key, json_time_format
+
+By default the current time will be used for all created metrics. To set the
+time from the JSON document, you can use the `json_time_key` and
+`json_time_format` options together to set the time to a value in the parsed
+document.
+
+The `json_time_key` option specifies the key containing the time value and
+`json_time_format` must be set to `unix`, `unix_ms`, or the Go "reference
+time" which is defined to be the specific time: `Mon Jan 2 15:04:05 MST 2006`.
+
+Consult the Go [time][time parse] package for details and additional examples
+on how to set the time format.
+
+### Examples
+
+#### Basic Parsing
+
+Config:
+```toml
+[[inputs.file]]
+  files = ["example"]
+  name_override = "myjsonmetric"
+  data_format = "json"
+```
+
+Input:
+```json
+{
+  "a": 5,
+  "b": {
+    "c": 6
+  },
+  "ignored": "I'm a string"
+}
+```
+
+Output:
+```
+myjsonmetric a=5,b_c=6
+```
+
+#### Name, Tags, and String Fields
+
+Config:
+```toml
+[[inputs.file]]
+  files = ["example"]
+  json_name_key = "name"
+  tag_keys = ["my_tag_1"]
+  json_string_fields = ["my_field"]
+  data_format = "json"
+```
+
+Input:
+```json
+{
+  "a": 5,
+  "b": {
+    "c": 6,
+    "my_field": "description"
+  },
+  "my_tag_1": "foo",
+  "name": "my_json"
+}
+```
+
+Output:
+```
+my_json,my_tag_1=foo a=5,b_c=6,my_field="description"
+```
+
+#### Arrays
+
+If the JSON data is an array, then each object within the array is parsed with
+the configured settings.
+
+Config:
+```toml
+[[inputs.file]]
+  files = ["example"]
+  data_format = "json"
+  json_time_key = "b_time"
+  json_time_format = "02 Jan 06 15:04 MST"
+```
+
+Input:
+```json
+[
+  {
+    "a": 5,
+    "b": {
+      "c": 6,
+      "time":"04 Jan 06 15:04 MST"
+    }
+  },
+  {
+    "a": 7,
+    "b": {
+      "c": 8,
+      "time":"11 Jan 07 15:04 MST"
+    }
+  }
+]
+```
+
+Output:
+```
+file a=5,b_c=6 1136387040000000000
+file a=7,b_c=8 1168527840000000000
+```
+
+#### Query
+
+The `json_query` option can be used to parse a subset of the document.
+ +Config: +```toml +[[inputs.file]] + files = ["example"] + data_format = "json" + tag_keys = ["first"] + json_string_fields = ["last"] + json_query = "obj.friends" +``` + +Input: +```json +{ + "obj": { + "name": {"first": "Tom", "last": "Anderson"}, + "age":37, + "children": ["Sara","Alex","Jack"], + "fav.movie": "Deer Hunter", + "friends": [ + {"first": "Dale", "last": "Murphy", "age": 44}, + {"first": "Roger", "last": "Craig", "age": 68}, + {"first": "Jane", "last": "Murphy", "age": 47} + ] + } +} +``` + +Output: +``` +file,first=Dale last="Murphy",age=44 +file,first=Roger last="Craig",age=68 +file,first=Jane last="Murphy",age=47 +``` + +[gjson]: https://github.com/tidwall/gjson +[gjson syntax]: https://github.com/tidwall/gjson#path-syntax +[json]: https://www.json.org/ +[time parse]: https://golang.org/pkg/time/#Parse diff --git a/plugins/parsers/logfmt/README.md b/plugins/parsers/logfmt/README.md new file mode 100644 index 000000000..fb3a565b3 --- /dev/null +++ b/plugins/parsers/logfmt/README.md @@ -0,0 +1,34 @@ +# Logfmt + +The `logfmt` data format parses data in [logfmt] format. + +[logfmt]: https://brandur.org/logfmt + +### Configuration + +```toml +[[inputs.file]] + files = ["example"] + + ## Data format to consume. + ## Each data format has its own unique set of configuration options, read + ## more about them here: + ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md + data_format = "logfmt" + + ## Set the name of the created metric, if unset the name of the plugin will + ## be used. + metric_name = "logfmt" +``` + +### Metrics + +Each key/value pair in the line is added to a new metric as a field. The type +of the field is automatically determined based on the contents of the value. + +### Examples + +``` +- method=GET host=example.org ts=2018-07-24T19:43:40.275Z connect=4ms service=8ms status=200 bytes=1653 ++ logfmt method="GET",host="example.org",ts="2018-07-24T19:43:40.275Z",connect="4ms",service="8ms",status=200i,bytes=1653i +``` diff --git a/plugins/parsers/nagios/README.md b/plugins/parsers/nagios/README.md new file mode 100644 index 000000000..e9be6a0dd --- /dev/null +++ b/plugins/parsers/nagios/README.md @@ -0,0 +1,17 @@ +# Nagios + +The `nagios` data format parses the output of nagios plugins. + +### Configuration + +```toml +[[inputs.exec]] + ## Commands array + commands = ["/usr/lib/nagios/plugins/check_load -w 5,6,7 -c 7,8,9"] + + ## Data format to consume. + ## Each data format has its own unique set of configuration options, read + ## more about them here: + ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md + data_format = "nagios" +``` diff --git a/plugins/parsers/value/README.md b/plugins/parsers/value/README.md new file mode 100644 index 000000000..db184d4e8 --- /dev/null +++ b/plugins/parsers/value/README.md @@ -0,0 +1,36 @@ +# Value + +The "value" data format translates single values into Telegraf metrics. This +is done by assigning a measurement name and setting a single field ("value") +as the parsed metric. + +### Configuration + +You **must** tell Telegraf what type of metric to collect by using the +`data_type` configuration option. Available options are: + +1. integer +2. float or long +3. string +4. boolean + +**Note:** It is also recommended that you set `name_override` to a measurement +name that makes sense for your metric, otherwise it will just be set to the +name of the plugin. 
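+
+For example, with the configuration below and a hypothetical reading of
+`1024` from the entropy file, the parsed metric (shown in InfluxDB Line
+Protocol) would be:
+
+```
+entropy_available value=1024i
+```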
+
+```toml
+[[inputs.exec]]
+  ## Commands array
+  commands = ["cat /proc/sys/kernel/random/entropy_avail"]
+
+  ## override the default metric name of "exec"
+  name_override = "entropy_available"
+
+  ## Data format to consume.
+  ## Each data format has its own unique set of configuration options, read
+  ## more about them here:
+  ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md
+  data_format = "value"
+  data_type = "integer" # required
+```
+
diff --git a/plugins/parsers/wavefront/README.md b/plugins/parsers/wavefront/README.md
new file mode 100644
index 000000000..ab7c56eed
--- /dev/null
+++ b/plugins/parsers/wavefront/README.md
@@ -0,0 +1,20 @@
+# Wavefront
+
+Metrics in the Wavefront Data Format are parsed directly into Telegraf metrics.
+For more information about the Wavefront Data Format see
+[here](https://docs.wavefront.com/wavefront_data_format.html).
+
+### Configuration
+
+There are no additional configuration options for the Wavefront Data Format.
+
+```toml
+[[inputs.file]]
+  files = ["example"]
+
+  ## Data format to consume.
+  ## Each data format has its own unique set of configuration options, read
+  ## more about them here:
+  ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md
+  data_format = "wavefront"
+```
diff --git a/plugins/serializers/EXAMPLE_README.md b/plugins/serializers/EXAMPLE_README.md
new file mode 100644
index 000000000..11965c07f
--- /dev/null
+++ b/plugins/serializers/EXAMPLE_README.md
@@ -0,0 +1,46 @@
+# Example
+
+This description explains at a high level what the serializer does and
+provides links to where additional information about the format can be found.
+
+### Configuration
+
+This section contains the sample configuration for the serializer. Since a
+serializer is not a standalone plugin, the sample configuration uses the
+`file` or `http` output as the base config.
+
+```toml
+[[outputs.file]]
+  files = ["stdout"]
+
+  ## Describe variables using the standard SampleConfig style.
+  ## https://github.com/influxdata/telegraf/wiki/SampleConfig
+  example_option = "example_value"
+
+  ## Data format to output.
+  ## Each data format has its own unique set of configuration options, read
+  ## more about them here:
+  ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_OUTPUT.md
+  data_format = "example"
+```
+
+#### example_option
+
+If an option requires a more expansive explanation than can be included inline
+in the sample configuration, it may be described here.
+
+### Metrics
+
+The optional Metrics section contains details about how the serializer converts
+Telegraf metrics into output.
+
+### Example
+
+The optional Example section can show an example conversion to the output
+format using InfluxDB Line Protocol as the reference format.
+
+For line-delimited text formats, a diff may be appropriate:
+```diff
+- cpu,host=localhost,source=example.org value=42
++ cpu|host=localhost|source=example.org|value=42
+```
diff --git a/plugins/serializers/graphite/README.md b/plugins/serializers/graphite/README.md
new file mode 100644
index 000000000..031dee376
--- /dev/null
+++ b/plugins/serializers/graphite/README.md
@@ -0,0 +1,51 @@
+# Graphite
+
+The Graphite data format is translated from Telegraf metrics using either the
+template pattern or the tag support method. You can select between the two
+methods using the [`graphite_tag_support`](#graphite_tag_support) option. When
+enabled, the tag support method is used; otherwise the
+[template pattern](#templates) is used.
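+
+As a sketch of the template method (assuming the template
+`host.tags.measurement.field` and no `prefix` set), the `host` tag becomes the
+first element of the bucket, any remaining tags follow, and the measurement
+and field names complete the path:
+
+```
+cpu,cpu=cpu0,host=tars usage_idle=98.09 1455320660004257758
+=>
+tars.cpu0.cpu.usage_idle 98.09 1455320660
+```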
+
+### Configuration
+
+```toml
+[[outputs.file]]
+  ## Files to write to, "stdout" is a specially handled file.
+  files = ["stdout", "/tmp/metrics.out"]
+
+  ## Data format to output.
+  ## Each data format has its own unique set of configuration options, read
+  ## more about them here:
+  ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_OUTPUT.md
+  data_format = "graphite"
+
+  ## Prefix added to each graphite bucket
+  prefix = "telegraf"
+  ## Graphite template pattern
+  template = "host.tags.measurement.field"
+
+  ## Support Graphite tags, recommended to enable when using Graphite 1.1 or later.
+  # graphite_tag_support = false
+```
+
+#### graphite_tag_support
+
+When the `graphite_tag_support` option is enabled, the template pattern is not
+used. Instead, tags are encoded using
+[Graphite tag support](http://graphite.readthedocs.io/en/latest/tags.html)
+added in Graphite 1.1. The `metric_path` is a combination of the optional
+`prefix` option, measurement name, and field name.
+
+The tag `name` is reserved by Graphite; any conflicting tags will be encoded
+as `_name`.
+
+**Example Conversion**:
+```
+cpu,cpu=cpu-total,dc=us-east-1,host=tars usage_idle=98.09,usage_user=0.89 1455320660004257758
+=>
+cpu.usage_user;cpu=cpu-total;dc=us-east-1;host=tars 0.89 1455320690
+cpu.usage_idle;cpu=cpu-total;dc=us-east-1;host=tars 98.09 1455320690
+```
+
+#### templates
+
+Consult the [Template Patterns](/docs/TEMPLATE_PATTERN.md) documentation for
+details.
diff --git a/plugins/serializers/influx/README.md b/plugins/serializers/influx/README.md
new file mode 100644
index 000000000..d97fd42c8
--- /dev/null
+++ b/plugins/serializers/influx/README.md
@@ -0,0 +1,34 @@
+# Influx
+
+The `influx` data format outputs metrics into [InfluxDB Line Protocol][line
+protocol]. This is the recommended format unless another format is required
+for interoperability.
+
+### Configuration
+
+```toml
+[[outputs.file]]
+  ## Files to write to, "stdout" is a specially handled file.
+  files = ["stdout", "/tmp/metrics.out"]
+
+  ## Data format to output.
+  ## Each data format has its own unique set of configuration options, read
+  ## more about them here:
+  ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_OUTPUT.md
+  data_format = "influx"
+
+  ## Maximum line length in bytes. Useful only for debugging.
+  influx_max_line_bytes = 0
+
+  ## When true, fields will be output in ascending lexical order. Enabling
+  ## this option will result in decreased performance and is only recommended
+  ## when you need predictable ordering while debugging.
+  influx_sort_fields = false
+
+  ## When true, Telegraf will output unsigned integers as unsigned values,
+  ## i.e.: `42u`. You will need a version of InfluxDB supporting unsigned
+  ## integer values. Enabling this option will result in field type errors if
+  ## existing data has been written.
+  influx_uint_support = false
+```
+
+[line protocol]: https://docs.influxdata.com/influxdb/latest/write_protocols/line_protocol_tutorial/
diff --git a/plugins/serializers/json/README.md b/plugins/serializers/json/README.md
new file mode 100644
index 000000000..08bb9d4f7
--- /dev/null
+++ b/plugins/serializers/json/README.md
@@ -0,0 +1,77 @@
+# JSON
+
+The `json` output data format converts metrics into JSON documents.
+
+### Configuration
+
+```toml
+[[outputs.file]]
+  ## Files to write to, "stdout" is a specially handled file.
+  files = ["stdout", "/tmp/metrics.out"]
+
+  ## Data format to output.
+  ## Each data format has its own unique set of configuration options, read
+  ## more about them here:
+  ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_OUTPUT.md
+  data_format = "json"
+
+  ## The resolution to use for the metric timestamp. Must be a duration string
+  ## such as "1ns", "1us", "1ms", "10ms", "1s". Durations are truncated to
+  ## the power of 10 less than the specified units.
+  json_timestamp_units = "1s"
+```
+
+### Examples:
+
+Standard form:
+```json
+{
+  "fields": {
+    "field_1": 30,
+    "field_2": 4,
+    "field_N": 59,
+    "n_images": 660
+  },
+  "name": "docker",
+  "tags": {
+    "host": "raynor"
+  },
+  "timestamp": 1458229140
+}
+```
+
+When an output plugin needs to emit multiple metrics at one time, it may use
+the batch format. The use of batch format is determined by the plugin;
+refer to the documentation for the specific plugin.
+```json
+{
+  "metrics": [
+    {
+      "fields": {
+        "field_1": 30,
+        "field_2": 4,
+        "field_N": 59,
+        "n_images": 660
+      },
+      "name": "docker",
+      "tags": {
+        "host": "raynor"
+      },
+      "timestamp": 1458229140
+    },
+    {
+      "fields": {
+        "field_1": 30,
+        "field_2": 4,
+        "field_N": 59,
+        "n_images": 660
+      },
+      "name": "docker",
+      "tags": {
+        "host": "raynor"
+      },
+      "timestamp": 1458229140
+    }
+  ]
+}
+```
diff --git a/plugins/serializers/splunkmetric/README.md b/plugins/serializers/splunkmetric/README.md
index 02d69db66..e00286e57 100644
--- a/plugins/serializers/splunkmetric/README.md
+++ b/plugins/serializers/splunkmetric/README.md
@@ -79,7 +79,7 @@ The following aspects of the token can be overriden with tags:
 * source
 
 You can either use `[global_tags]` or using a more advanced configuration as documented [here](https://github.com/influxdata/telegraf/blob/master/docs/CONFIGURATION.md).
- 
+
 Such as this example which overrides the index just on the cpu metric:
 ```toml
 [[inputs.cpu]]
@@ -122,7 +122,7 @@ TIMESTAMP_FIELDS = time
 TIME_FORMAT = %s.%3N
 ```
 
-An example configuration of a file based output is: 
+An example configuration of a file based output is:
 ```toml
 # Send telegraf metrics to file(s)