Improve logparser README (#2664)

This commit is contained in:
Daniel Nelson 2017-04-14 13:47:43 -07:00 committed by GitHub
parent 4df8b034bf
commit f005ea4a27
2 changed files with 121 additions and 22 deletions

View File

@ -1,6 +1,6 @@
# logparser Input Plugin # Logparser Input Plugin
The logparser plugin streams and parses the given logfiles. Currently it only The `logparser` plugin streams and parses the given logfiles. Currently it
has the capability of parsing "grok" patterns from logfiles, which also supports has the capability of parsing "grok" patterns from logfiles, which also supports
regex patterns. regex patterns.
@ -37,35 +37,28 @@ regex patterns.
''' '''
``` ```
## Grok Parser ### Grok Parser
The grok parser uses a slightly modified version of logstash "grok" patterns,
with the format
```
%{<capture_syntax>[:<semantic_name>][:<modifier>]}
```
Telegraf has many of it's own
[built-in patterns](https://github.com/influxdata/telegraf/blob/master/plugins/inputs/logparser/grok/patterns/influx-patterns),
as well as supporting
[logstash's builtin patterns](https://github.com/logstash-plugins/logstash-patterns-core/blob/master/patterns/grok-patterns).
The best way to get acquainted with grok patterns is to read the logstash docs, The best way to get acquainted with grok patterns is to read the logstash docs,
which are available here: which are available here:
https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html
The Telegraf grok parser uses a slightly modified version of logstash "grok"
patterns, with the format
If you need help building patterns to match your logs, ```
you will find the http://grokdebug.herokuapp.com application quite useful! %{<capture_syntax>[:<semantic_name>][:<modifier>]}
```
The `capture_syntax` defines the grok pattern that's used to parse the input
line and the `semantic_name` is used to name the field or tag. The extension
`modifier` controls the data type that the parsed item is converted to or
other special handling.
By default all named captures are converted into string fields. By default all named captures are converted into string fields.
Modifiers can be used to convert captures to other types or tags.
Timestamp modifiers can be used to convert captures to the timestamp of the Timestamp modifiers can be used to convert captures to the timestamp of the
parsed metric. parsed metric. If no timestamp is parsed the metric will be created using the
current time.
- Available modifiers: - Available modifiers:
- string (default if nothing is specified) - string (default if nothing is specified)
@ -91,7 +84,112 @@ Timestamp modifiers can be used to convert captures to the timestamp of the
- ts-epochnano (nanoseconds since unix epoch) - ts-epochnano (nanoseconds since unix epoch)
- ts-"CUSTOM" - ts-"CUSTOM"
CUSTOM time layouts must be within quotes and be the representation of the CUSTOM time layouts must be within quotes and be the representation of the
"reference time", which is `Mon Jan 2 15:04:05 -0700 MST 2006` "reference time", which is `Mon Jan 2 15:04:05 -0700 MST 2006`
See https://golang.org/pkg/time/#Parse for more details. See https://golang.org/pkg/time/#Parse for more details.
Telegraf has many of its own
[built-in patterns](./grok/patterns/influx-patterns),
as well as supporting
[logstash's builtin patterns](https://github.com/logstash-plugins/logstash-patterns-core/blob/master/patterns/grok-patterns).
If you need help building patterns to match your logs,
you will find the https://grokdebug.herokuapp.com application quite useful!
#### Timestamp Examples
This example input and config parses a file using a custom timestamp conversion:
```
2017-02-21 13:10:34 value=42
```
```toml
[[inputs.logparser]]
[inputs.logparser.grok]
patterns = ['%{TIMESTAMP_ISO8601:timestamp:ts-"2006-01-02 15:04:05"} value=%{NUMBER:value:int}']
```
This example parses a file using a built-in conversion and a custom pattern:
```
Wed Apr 12 13:10:34 PST 2017 value=42
```
```toml
[[inputs.logparser]]
[inputs.logparser.grok]
patterns = ["%{TS_UNIX:timestamp:ts-unix} value=%{NUMBER:value:int}"]
custom_patterns = '''
TS_UNIX %{DAY} %{MONTH} %{MONTHDAY} %{HOUR}:%{MINUTE}:%{SECOND} %{TZ} %{YEAR}
'''
```
#### TOML Escaping
When saving patterns to the configuration file, keep in mind the different TOML
[string](https://github.com/toml-lang/toml#string) types and the escaping
rules for each. These escaping rules must be applied in addition to the
escaping required by the grok syntax. Using the Multi-line line literal
syntax with `'''` may be useful.
The following config examples will parse this input file:
```
|42|\uD83D\uDC2F|'telegraf'|
```
Since `|` is a special character in the grok language, we must escape it to
get a literal `|`. With a basic TOML string, special characters such as
backslash must be escaped, requiring us to escape the backslash a second time.
```toml
[[inputs.logparser]]
[inputs.logparser.grok]
patterns = ["\\|%{NUMBER:value:int}\\|%{UNICODE_ESCAPE:escape}\\|'%{WORD:name}'\\|"]
custom_patterns = "UNICODE_ESCAPE (?:\\\\u[0-9A-F]{4})+"
```
We cannot use a literal TOML string for the pattern, because we cannot match a
`'` within it. However, it works well for the custom pattern.
```toml
[[inputs.logparser]]
[inputs.logparser.grok]
patterns = ["\\|%{NUMBER:value:int}\\|%{UNICODE_ESCAPE:escape}\\|'%{WORD:name}'\\|"]
custom_patterns = 'UNICODE_ESCAPE (?:\\u[0-9A-F]{4})+'
```
A multi-line literal string allows us to encode the pattern:
```toml
[[inputs.logparser]]
[inputs.logparser.grok]
patterns = ['''
\|%{NUMBER:value:int}\|%{UNICODE_ESCAPE:escape}\|'%{WORD:name}'\|
''']
custom_patterns = 'UNICODE_ESCAPE (?:\\u[0-9A-F]{4})+'
```
### Tips for creating patterns
Writing complex patterns can be difficult, here is some advice for writing a
new pattern or testing a pattern developed [online](https://grokdebug.herokuapp.com).
Create a file output that writes to stdout, and disable other outputs while
testing. This will allow you to see the captured metrics. Keep in mind that
the file output will only print once per `flush_interval`.
```toml
[[outputs.file]]
files = ["stdout"]
```
- Start with a file containing only a single line of your input.
- Remove all but the first token or piece of the line.
- Add the section of your pattern to match this piece to your configuration file.
- Verify that the metric is parsed successfully by running Telegraf.
- If successful, add the next token, update the pattern and retest.
- Continue one token at a time until the entire line is successfully parsed.
### Additional Resources
- https://www.influxdata.com/telegraf-correlate-log-metrics-data-performance-bottlenecks/

View File

@ -168,6 +168,7 @@ func (p *Parser) ParseLine(line string) (telegraf.Metric, error) {
} }
if len(values) == 0 { if len(values) == 0 {
log.Printf("D! Grok no match found for: %q", line)
return nil, nil return nil, nil
} }