# Logparser Input Plugin

The `logparser` plugin streams and parses the given logfiles. Currently it
has the capability of parsing "grok" patterns from logfiles, which also supports
regex patterns.

### Configuration:

```toml
[[inputs.logparser]]
  ## Log files to parse.
  ## These accept standard unix glob matching rules, but with the addition of
  ## ** as a "super asterisk". ie:
  ##   /var/log/**.log     -> recursively find all .log files in /var/log
  ##   /var/log/*/*.log    -> find all .log files with a parent dir in /var/log
  ##   /var/log/apache.log -> only tail the apache log file
  files = ["/var/log/apache/access.log"]

  ## Read files that currently exist from the beginning. Files that are created
  ## while telegraf is running (and that match the "files" globs) will always
  ## be read from the beginning.
  from_beginning = false

  ## Method used to watch for file updates. Can be either "inotify" or "poll".
  # watch_method = "inotify"

  ## Parse logstash-style "grok" patterns:
  ##   Telegraf built-in parsing patterns: https://goo.gl/dkay10
  [inputs.logparser.grok]
    ## This is a list of patterns to check the given log file(s) for.
    ## Note that adding patterns here increases processing time. The most
    ## efficient configuration is to have one pattern per logparser.
    ## Other common built-in patterns are:
    ##   %{COMMON_LOG_FORMAT}   (plain apache & nginx access logs)
    ##   %{COMBINED_LOG_FORMAT} (access logs + referrer & agent)
    patterns = ["%{COMBINED_LOG_FORMAT}"]

    ## Name of the output measurement.
    measurement = "apache_access_log"

    ## Full path(s) to custom pattern files.
    custom_pattern_files = []

    ## Custom patterns can also be defined here. Put one pattern per line.
    custom_patterns = '''
    '''

    ## Timezone allows you to provide an override for timestamps that
    ## don't already include an offset
    ## e.g. 04/06/2016 12:41:45 data one two 5.43µs
    ##
    ## Default: "" which renders UTC
    ## Options are as follows:
    ##   1. Local             -- interpret based on machine localtime
    ##   2. "Canada/Eastern"  -- Unix TZ values like those found in https://en.wikipedia.org/wiki/List_of_tz_database_time_zones
    ##   3. UTC               -- or blank/unspecified, will return timestamp in UTC
    timezone = "Canada/Eastern"
```

### Grok Parser

The best way to get acquainted with grok patterns is to read the logstash docs,
which are available here:
https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html

The Telegraf grok parser uses a slightly modified version of logstash "grok"
patterns, with the format

```
%{<capture_syntax>[:<semantic_name>][:<modifier>]}
```

The `capture_syntax` defines the grok pattern that's used to parse the input
line and the `semantic_name` is used to name the field or tag. The extension
`modifier` controls the data type that the parsed item is converted to or
other special handling.

By default all named captures are converted into string fields.
Timestamp modifiers can be used to convert captures to the timestamp of the
parsed metric. If no timestamp is parsed the metric will be created using the
current time.

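For example, given a hypothetical input line such as `GET /index.html 200`, a
pattern like the one below captures three fields named `method`, `path`, and
`code`, all stored as strings by default (the modifiers listed next can change
that):

```
%{WORD:method} %{URIPATH:path} %{NUMBER:code}
```
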
- Available modifiers:
  - string (default if nothing is specified)
  - int
  - float
  - duration (ie, 5.23ms gets converted to int nanoseconds; see the sketch after this list)
  - tag (converts the field into a tag)
  - drop (drops the field completely)
- Timestamp modifiers:
  - ts (This will auto-learn the timestamp format)
  - ts-ansic ("Mon Jan _2 15:04:05 2006")
  - ts-unix ("Mon Jan _2 15:04:05 MST 2006")
  - ts-ruby ("Mon Jan 02 15:04:05 -0700 2006")
  - ts-rfc822 ("02 Jan 06 15:04 MST")
  - ts-rfc822z ("02 Jan 06 15:04 -0700")
  - ts-rfc850 ("Monday, 02-Jan-06 15:04:05 MST")
  - ts-rfc1123 ("Mon, 02 Jan 2006 15:04:05 MST")
  - ts-rfc1123z ("Mon, 02 Jan 2006 15:04:05 -0700")
  - ts-rfc3339 ("2006-01-02T15:04:05Z07:00")
  - ts-rfc3339nano ("2006-01-02T15:04:05.999999999Z07:00")
  - ts-httpd ("02/Jan/2006:15:04:05 -0700")
  - ts-epoch (seconds since unix epoch, may contain decimal)
  - ts-epochnano (nanoseconds since unix epoch)
  - ts-"CUSTOM"

CUSTOM time layouts must be within quotes and be the representation of the
"reference time", which is `Mon Jan 2 15:04:05 -0700 MST 2006`.
See https://golang.org/pkg/time/#Parse for more details.

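The non-timestamp modifiers can be combined in a single pattern. As a minimal
sketch (the file path, field names, and the `APP_DURATION` custom pattern are
hypothetical, not built-ins), this config stores `method` as a tag, converts a
duration such as `5.23ms` into an integer nanosecond field, and discards a
request id:

```toml
[[inputs.logparser]]
  files = ["/var/log/app/timing.log"]
  [inputs.logparser.grok]
    ## matches hypothetical lines like: GET abc123 took 5.23ms
    patterns = ['%{WORD:method:tag} %{WORD:reqid:drop} took %{APP_DURATION:took:duration}']
    custom_patterns = '''
APP_DURATION %{NUMBER}(?:ns|us|µs|ms|s)
'''
```
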
Telegraf has many of its own
[built-in patterns](./grok/patterns/influx-patterns),
as well as supporting
[logstash's builtin patterns](https://github.com/logstash-plugins/logstash-patterns-core/blob/master/patterns/grok-patterns).

If you need help building patterns to match your logs,
you will find the https://grokdebug.herokuapp.com application quite useful!

#### Timestamp Examples

This example input and config parses a file using a custom timestamp conversion:

```
2017-02-21 13:10:34 value=42
```

```toml
[[inputs.logparser]]
  [inputs.logparser.grok]
    patterns = ['%{TIMESTAMP_ISO8601:timestamp:ts-"2006-01-02 15:04:05"} value=%{NUMBER:value:int}']
```

This example input and config parses a file using a timestamp in unix time:

```
1466004605 value=42
1466004605.123456789 value=42
```

```toml
[[inputs.logparser]]
  [inputs.logparser.grok]
    patterns = ['%{NUMBER:timestamp:ts-epoch} value=%{NUMBER:value:int}']
```

This example parses a file using a built-in conversion and a custom pattern:

```
Wed Apr 12 13:10:34 PST 2017 value=42
```

```toml
[[inputs.logparser]]
  [inputs.logparser.grok]
    patterns = ["%{TS_UNIX:timestamp:ts-unix} value=%{NUMBER:value:int}"]
    custom_patterns = '''
      TS_UNIX %{DAY} %{MONTH} %{MONTHDAY} %{HOUR}:%{MINUTE}:%{SECOND} %{TZ} %{YEAR}
    '''
```

For cases where the timestamp itself is without offset, the `timezone` config var is available
to denote an offset. By default (with `timezone` either omitted, blank, or set to `"UTC"`), the times
are processed as if in the UTC timezone. If specified as `timezone = "Local"`, the timestamp
will be processed based on the current machine timezone configuration. Lastly, if using a
timezone from the list of Unix [timezones](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones), the logparser grok will attempt to offset
the timestamp accordingly. See test cases for more detailed examples.

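For instance, a sketch along these lines (reusing the custom-timestamp pattern
from above; the timezone choice is arbitrary) would interpret the offset-less
timestamps as US Eastern time rather than UTC:

```toml
[[inputs.logparser]]
  [inputs.logparser.grok]
    patterns = ['%{TIMESTAMP_ISO8601:timestamp:ts-"2006-01-02 15:04:05"} value=%{NUMBER:value:int}']
    ## the parsed timestamps carry no offset, so shift them from America/New_York
    timezone = "America/New_York"
```
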
#### TOML Escaping

When saving patterns to the configuration file, keep in mind the different TOML
[string](https://github.com/toml-lang/toml#string) types and the escaping
rules for each. These escaping rules must be applied in addition to the
escaping required by the grok syntax. Using the multi-line literal string
syntax with `'''` may be useful.

The following config examples will parse this input file:

```
|42|\uD83D\uDC2F|'telegraf'|
```

Since `|` is a special character in the grok language, we must escape it to
get a literal `|`. With a basic TOML string, special characters such as
backslash must be escaped, requiring us to escape the backslash a second time.

```toml
[[inputs.logparser]]
  [inputs.logparser.grok]
    patterns = ["\\|%{NUMBER:value:int}\\|%{UNICODE_ESCAPE:escape}\\|'%{WORD:name}'\\|"]
    custom_patterns = "UNICODE_ESCAPE (?:\\\\u[0-9A-F]{4})+"
```

We cannot use a literal TOML string for the pattern, because we cannot match a
`'` within it. However, it works well for the custom pattern.

```toml
[[inputs.logparser]]
  [inputs.logparser.grok]
    patterns = ["\\|%{NUMBER:value:int}\\|%{UNICODE_ESCAPE:escape}\\|'%{WORD:name}'\\|"]
    custom_patterns = 'UNICODE_ESCAPE (?:\\u[0-9A-F]{4})+'
```

A multi-line literal string allows us to encode the pattern:

```toml
[[inputs.logparser]]
  [inputs.logparser.grok]
    patterns = ['''
      \|%{NUMBER:value:int}\|%{UNICODE_ESCAPE:escape}\|'%{WORD:name}'\|
    ''']
    custom_patterns = 'UNICODE_ESCAPE (?:\\u[0-9A-F]{4})+'
```

### Tips for creating patterns

Writing complex patterns can be difficult. Here is some advice for writing a
new pattern or testing a pattern developed [online](https://grokdebug.herokuapp.com).

Create a file output that writes to stdout, and disable other outputs while
testing. This will allow you to see the captured metrics. Keep in mind that
the file output will only print once per `flush_interval`.

```toml
[[outputs.file]]
  files = ["stdout"]
```

- Start with a file containing only a single line of your input.
- Remove all but the first token or piece of the line.
- Add the section of your pattern to match this piece to your configuration
  file (a minimal starting configuration is sketched after this list).
- Verify that the metric is parsed successfully by running Telegraf.
- If successful, add the next token, update the pattern and retest.
- Continue one token at a time until the entire line is successfully parsed.

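For the first iteration, a starting configuration might look like the sketch
below; the file path, pattern, and measurement name are placeholders to
replace with your own:

```toml
[[inputs.logparser]]
  files = ["/tmp/test.log"]
  ## read the existing single-line test file on startup
  from_beginning = true
  [inputs.logparser.grok]
    ## match only the first token while iterating on the pattern
    patterns = ['%{WORD:first_token}']
    measurement = "pattern_test"
```
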
### Additional Resources

- https://www.influxdata.com/telegraf-correlate-log-metrics-data-performance-bottlenecks/