Improve logparser README (#2664)

2017-04-14 13:47:43 -07:00
parent 4df8b034bf
commit f005ea4a27
2 changed files with 121 additions and 22 deletions
--- a/plugins/inputs/logparser/README.md
+++ b/plugins/inputs/logparser/README.md
@@ -1,6 +1,6 @@
-# logparser Input Plugin
+# Logparser Input Plugin

-The logparser plugin streams and parses the given logfiles. Currently it only
+The `logparser` plugin streams and parses the given logfiles. Currently it
 has the capability of parsing "grok" patterns from logfiles, which also supports
 regex patterns.

@@ -37,35 +37,28 @@ regex patterns.
    '''
 ```

-## Grok Parser
-
-The grok parser uses a slightly modified version of logstash "grok" patterns,
-with the format
-
-```
-%{<capture_syntax>[:<semantic_name>][:<modifier>]}
-```
-
-Telegraf has many of it's own
-[built-in patterns](https://github.com/influxdata/telegraf/blob/master/plugins/inputs/logparser/grok/patterns/influx-patterns),
-as well as supporting
-[logstash's builtin patterns](https://github.com/logstash-plugins/logstash-patterns-core/blob/master/patterns/grok-patterns).
-
+### Grok Parser

 The best way to get acquainted with grok patterns is to read the logstash docs,
 which are available here:
  https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html

+The Telegraf grok parser uses a slightly modified version of logstash "grok"
+patterns, with the format

-If you need help building patterns to match your logs,
-you will find the http://grokdebug.herokuapp.com application quite useful!
+```
+%{<capture_syntax>[:<semantic_name>][:<modifier>]}
+```

+The `capture_syntax` defines the grok pattern used to parse the input line,
+and the `semantic_name` names the resulting field or tag. The `modifier`
+extension controls the data type that the parsed item is converted to, or
+requests other special handling.

 By default all named captures are converted into string fields.
-Modifiers can be used to convert captures to other types or tags.
 Timestamp modifiers can be used to convert captures to the timestamp of the
- parsed metric.
-
+parsed metric. If no timestamp is parsed, the metric is created using the
+current time.

 - Available modifiers:
  - string   (default if nothing is specified)
@@ -91,7 +84,112 @@ Timestamp modifiers can be used to convert captures to the timestamp of the
  - ts-epochnano     (nanoseconds since unix epoch)
  - ts-"CUSTOM"

-
 CUSTOM time layouts must be within quotes and be the representation of the
 "reference time", which is `Mon Jan 2 15:04:05 -0700 MST 2006`
 See https://golang.org/pkg/time/#Parse for more details.
+
+Telegraf has many of its own
+[built-in patterns](./grok/patterns/influx-patterns),
+as well as supporting
+[logstash's builtin patterns](https://github.com/logstash-plugins/logstash-patterns-core/blob/master/patterns/grok-patterns).
+
+If you need help building patterns to match your logs,
+you will find the https://grokdebug.herokuapp.com application quite useful!
+
+#### Timestamp Examples
+
+This example input and config parses a file using a custom timestamp conversion:
+
+```
+2017-02-21 13:10:34 value=42
+```
+
+```toml
+[[inputs.logparser]]
+  [inputs.logparser.grok]
+    patterns = ['%{TIMESTAMP_ISO8601:timestamp:ts-"2006-01-02 15:04:05"} value=%{NUMBER:value:int}']
+```
+
+This example parses a file using a built-in conversion and a custom pattern:
+
+```
+Wed Apr 12 13:10:34 PST 2017 value=42
+```
+
+```toml
+[[inputs.logparser]]
+  [inputs.logparser.grok]
+    patterns = ["%{TS_UNIX:timestamp:ts-unix} value=%{NUMBER:value:int}"]
+    custom_patterns = '''
+      TS_UNIX %{DAY} %{MONTH} %{MONTHDAY} %{HOUR}:%{MINUTE}:%{SECOND} %{TZ} %{YEAR}
+    '''
+```
+
+#### TOML Escaping
+
+When saving patterns to the configuration file, keep in mind the different TOML
+[string](https://github.com/toml-lang/toml#string) types and the escaping
+rules for each. These escaping rules must be applied in addition to the
+escaping required by the grok syntax. Using the multi-line literal string
+syntax with `'''` may be useful.
+
+The following config examples will parse this input file:
+
+```
+|42|\uD83D\uDC2F|'telegraf'|
+```
+
+Since `|` is a special character in the grok language, we must escape it to
+get a literal `|`.  With a basic TOML string, special characters such as
+backslash must be escaped, requiring us to escape the backslash a second time.
+
+```toml
+[[inputs.logparser]]
+  [inputs.logparser.grok]
+    patterns = ["\\|%{NUMBER:value:int}\\|%{UNICODE_ESCAPE:escape}\\|'%{WORD:name}'\\|"]
+    custom_patterns = "UNICODE_ESCAPE (?:\\\\u[0-9A-F]{4})+"
+```
+
+We cannot use a literal TOML string for the pattern, because a literal string
+cannot contain the `'` that the pattern must match. However, it works well for
+the custom pattern.
+```toml
+[[inputs.logparser]]
+  [inputs.logparser.grok]
+    patterns = ["\\|%{NUMBER:value:int}\\|%{UNICODE_ESCAPE:escape}\\|'%{WORD:name}'\\|"]
+    custom_patterns = 'UNICODE_ESCAPE (?:\\u[0-9A-F]{4})+'
+```
+
+A multi-line literal string allows us to encode the pattern:
+```toml
+[[inputs.logparser]]
+  [inputs.logparser.grok]
+    patterns = ['''
+	  \|%{NUMBER:value:int}\|%{UNICODE_ESCAPE:escape}\|'%{WORD:name}'\|
+	''']
+    custom_patterns = 'UNICODE_ESCAPE (?:\\u[0-9A-F]{4})+'
+```
+
+### Tips for creating patterns
+
+Writing complex patterns can be difficult. Here is some advice for writing a
+new pattern or testing a pattern developed [online](https://grokdebug.herokuapp.com).
+
+Create a file output that writes to stdout, and disable other outputs while
+testing.  This will allow you to see the captured metrics.  Keep in mind that
+the file output will only print once per `flush_interval`.
+
+```toml
+[[outputs.file]]
+  files = ["stdout"]
+```
+
+- Start with a file containing only a single line of your input.
+- Remove all but the first token or piece of the line.
+- Add the section of your pattern to match this piece to your configuration file.
+- Verify that the metric is parsed successfully by running Telegraf.
+- If successful, add the next token, update the pattern and retest.
+- Continue one token at a time until the entire line is successfully parsed.
+
+### Additional Resources
+
+- https://www.influxdata.com/telegraf-correlate-log-metrics-data-performance-bottlenecks/
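
The timestamp examples added to the README rely on Go's "reference time" layouts, which are easy to get wrong. The following is a minimal sketch, not the plugin's code: it calls `time.Parse` directly with the two layouts from the examples. `parseCustom` is a hypothetical helper, and the mapping of `ts-unix` to `time.UnixDate` is an assumption that this diff does not confirm.

```go
package main

import (
	"fmt"
	"time"
)

// parseCustom is a hypothetical helper that parses value with a Go
// reference-time layout, i.e. the kind of string that follows `ts-` in a
// CUSTOM timestamp modifier.
func parseCustom(layout, value string) (time.Time, error) {
	return time.Parse(layout, value)
}

func main() {
	// Layout taken from the first README example: ts-"2006-01-02 15:04:05".
	t1, err := parseCustom("2006-01-02 15:04:05", "2017-02-21 13:10:34")
	if err != nil {
		panic(err)
	}
	fmt.Println(t1.Format(time.RFC3339))

	// Assumption: the built-in ts-unix modifier corresponds to time.UnixDate
	// ("Mon Jan _2 15:04:05 MST 2006").
	t2, err := parseCustom(time.UnixDate, "Wed Apr 12 13:10:34 PST 2017")
	if err != nil {
		panic(err)
	}
	fmt.Println(t2.Format("2006-01-02 15:04:05"))

	// A value that does not fit the layout produces an error.
	if _, err := parseCustom("2006-01-02 15:04:05", "21/02/2017"); err != nil {
		fmt.Println("layout mismatch:", err)
	}
}
```

Trying a layout in isolation like this can be quicker than restarting Telegraf for every change to a `ts-"CUSTOM"` modifier.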
--- a/plugins/inputs/logparser/grok/grok.go
+++ b/plugins/inputs/logparser/grok/grok.go
@@ -168,6 +168,7 @@ func (p *Parser) ParseLine(line string) (telegraf.Metric, error) {
 	}

 	if len(values) == 0 {
+		log.Printf("D! Grok no match found for: %q", line)
 		return nil, nil
 	}
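
The new debug message pairs well with the token-by-token workflow from the README tips, because it shows exactly which line failed to match. The sketch below illustrates the message format only. `lineRE` is a hand-written regular expression standing in for the grok pattern from the TOML escaping example (the real plugin expands `%{NUMBER}`, `%{WORD}` and so on itself), and `parseLine` and `noMatchMessage` are hypothetical helpers, not functions from `grok.go`.

```go
package main

import (
	"fmt"
	"regexp"
)

// lineRE is an assumed, hand-expanded equivalent of the README pattern
// \|%{NUMBER:value:int}\|%{UNICODE_ESCAPE:escape}\|'%{WORD:name}'\|.
// The Go raw string plays the same role as a TOML literal string: the
// backslashes reach the regex engine unchanged.
var lineRE = regexp.MustCompile(`^\|(\d+)\|((?:\\u[0-9A-F]{4})+)\|'(\w+)'\|$`)

// noMatchMessage formats a line with the same format string as the
// log.Printf call added in this commit.
func noMatchMessage(line string) string {
	return fmt.Sprintf("D! Grok no match found for: %q", line)
}

// parseLine returns the captured groups, or nil and the debug message when
// the line does not match.
func parseLine(line string) ([]string, string) {
	m := lineRE.FindStringSubmatch(line)
	if m == nil {
		return nil, noMatchMessage(line)
	}
	return m[1:], ""
}

func main() {
	lines := []string{
		`|42|\uD83D\uDC2F|'telegraf'|`, // sample line from the README
		`42 telegraf`,                  // does not match the pattern
	}
	for _, line := range lines {
		fields, msg := parseLine(line)
		if fields == nil {
			fmt.Println(msg)
			continue
		}
		fmt.Println(fields)
	}
}
```

Because `%q` prints the line as a quoted Go string, backslashes and other special characters in the input appear escaped in the log output, which is worth keeping in mind when comparing the message against the original file.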