This changes the current use of the InfluxDB client to instead use a
baked-in client that uses the fasthttp library.
This allows for significantly smaller allocations, the re-use of http
body buffers, and the re-use of the actual bytes of the line-protocol
metric representations.
We were having problems with telegraf talking to
carbon-relay-ng using the graphite output. When
the carbon-relay-ng server restarted the connection
the telegraf side would go into CLOSE_WAIT but telegraf
would continue to send statistics through the connection.
Reading around it seems you need to a read from the connection
and see a EOF error. We've implemented this and added a test
that replicates roughly the error we were having.
Pair: @whpearson @joshmyers
If we write a batch of points and get a "field type conflict" error
message in return, we should drop the entire batch of points because
this indicates that one or more points have a type that doesnt match the
database.
These errors will never go away on their own, and InfluxDB will
successfully write the points that dont have a conflict.
closes#2245
* Fix for broken librato output
These errors are delightful, but I'd rather avoid them:
```
Error parsing /etc/telegraf/telegraf.conf, line 2: field corresponding to `api_user' is not defined in `*librato.Librato'
```
* Fixed bad format from last commit
main reasons behind this:
- make adding/removing tags cheap
- make adding/removing fields cheap
- make parsing cheaper
- make parse -> decorate -> write out bytes metric flow much faster
Refactor serializer to use byte buffer
* NATS output plug-in now retries to reconnect forever after a lost connection.
* NATS input plug-in now retries to reconnect forever after a lost connection.
* Fixes#1953
in this commit:
- centralize logging output handler.
- set global Info/Debug/Error log levels based on config file or flags.
- remove per-plugin debug arg handling.
- add a I!, D!, or E! to every log message.
- add configuration option to specify where to send logs.
closes#1786
And use them in the prometheus output plugin.
Still need to test the prometheus output plugin.
Also need to actually create typed metrics in the system plugins.
closes#1683
closes#1539
First version of http put working
Refactored code to separate http handling from opentsdb module. Added batching support.
Fixed tag cleaning in http output and refactored telnet output.
Removed useless struct.
Fixed current unittest and added a new one.
Added benchmark test to test json serialization. Made sure http client would reuse connection.
Ran go fmt on opentsdb sources.
Updated README file
Removed useHttp in favor of parsing host string to determine the right API to use for sending metrics. Also renamed BatchSize to HttpBatchSize to better convey that it is only used when using Http API.
Updated changelog
Fixed format issues.
Removed TagSet type to make it more explicit.
Fixed unittest after removing TagSet type.
Revert "Updated changelog"
This reverts commit 24dba5520008d876b5a8d266c34a53e8805cc5f5.
Added PR under 1.1 release.
add missing redis metrics
This makes sure that all redis metrics are present without having to use a hard-coded list of what metrics to pull in.
* separate hello and authenticate functions, force connection close at end of write cycle so we don't hold open idle connections, which has the benefit of mostly removing the chance of getting hopelessly connection lost
* update changelog, though this will need to be updated again to merge into telegraf master
* bump instrumental agent version
* fix test to deal with better better connect/reconnect logic and changed ident & auth handshake
* Update CHANGELOG.md
correct URL from instrumental fork to origin and put the change in the correct part of the file
* go fmt
* Split out Instrumental tests for invalid metric and value.
* Ensure nothing remains on the wire after final test.
* Force valid metric names by replacing invalid parts with underscores.
* Multiple invalid characters being joined into a single udnerscore.
* Adjust comment to what happens.
* undo split hello and auth commands, to reduce roundtrips
* Split out Instrumental tests for invalid metric and value.
* Ensure nothing remains on the wire after final test.
* Force valid metric names by replacing invalid parts with underscores.
* Multiple invalid characters being joined into a single udnerscore.
* add an entry to CHANGELOG for easy merging upstream
* go fmt variable alignment
* remove some bugfixes from changelog which now more properly are in a different section.
* remove headers and whitespace should should have been removed with the last commit
* Source improvement for librato output
Build the source from the list of tag instead of a configuration specified
single tag
Graphite Serializer:
* make buildTags public
* make sure not to use empty tags
Librato output:
* Improve Error handling for librato API base on error or debug flag
* Send Metric per Batch (max 300)
* use Graphite BuildTag function to generate source
The change is made that it should be retro compatible
Metric sample:
server=127.0.0.1 port=80 state=leader env=test
measurement.metric_name value
service_n.metric_x
Metric before with source tags set as "server":
source=127.0.0.1
test.80.127_0_0_1.leader.measurement.metric_name
test.80.127_0_0_1.leader.service_n.metric_x
Metric now:
source=test.80.127.0.0.1.leader
measurement.metric_name
service_n.metric_x
As you can see the source in the "new" version is much more precise
That way when filter (only from source) you can filter by env or any other tags
* Using template to specify which tagsusing for source, default concat all
tags
* revert change in graphite serializer
* better documentation, change default for template
* fmt
* test passing with new host as default tags
* use host tag in api integration test
* Limit 80 char per line, change resolution to be a int in the sample
* fmt
* remove resolution, doc for template
* fmt
1. in prometheus client, do not check for invalid characters anymore,
because we are already replacing all invalid characters with regex
anyways.
2. in win_perf_counters, sanitize field name _and_ measurement name.
Also add '%' to the list of sanitized characters, because this character
is invalid for most output plugins, and can also easily cause string
formatting issues throughout the stack.
3. All '%' will now be translated to 'Percent'
closes#1430
closes#1412
separate hello and authenticate functions,
force connection close at end of write cycle so we don't
hold open idle connections,
which has the benefit of mostly removing
the chance of getting hopelessly connection lost
bump instrumental agent version
fix test to deal with better better connect/reconnect logic and changed ident & auth handshake
Update CHANGELOG.md
correct URL from instrumental fork to origin and put the change in the correct part of the file
go fmt
undo split hello and auth commands, to reduce roundtrips
Adding precision rounding to the accumulator. This means that now every
input metric will get rounded at collection, rather than at write (and
only for the influxdb output).
This feature is disabled for service inputs, because service inputs
should be in control of their own timestamps & precisions.
* Use shared AWS credential configuration.
* Cloudwatch dimension wilcards
* Allow configuring cache_ttl for cloudwatch metrics.
* Allow for wildcard in dimension values to select all available metrics.
* Use internal.Duration for CacheTTL and go fmt
* Refactor to not use embedded structs for config.
* Update AWS plugin READMEs with credentials details, update Changelog.
* Fix changelog after rebasing to master and 0.13.1 release.
* Fix changelog after rebase.
changes:
- -sample-config will now comment out all but a few default plugins.
- config file parse errors will output path to bad conf file.
- cleanup 80-char line-length and some other style issues.
- default package conf file will now have all plugins, but commented
out.
closes#199closes#944
The following configuration is now possible
## CompressionCodec represents the various compression codecs
recognized by Kafka in messages.
## "none" : No compression
## "gzip" : Gzip compression
## "snappy" : Snappy compression
# compression_codec = "none"
## RequiredAcks is used in Produce Requests to tell the broker how
many replica acknowledgements it must see before responding
## "none" : the producer never waits for an acknowledgement from the
broker. This option provides the lowest latency but the weakest
durability guarantees (some data will be lost when a server fails).
## "leader" : the producer gets an acknowledgement after the leader
replica has received the data. This option provides better durability
as the client waits until the server acknowledges the request as
successful (only messages that were written to the now-dead leader but
not yet replicated will be lost).
## "leader_and_replicas" : the producer gets an acknowledgement after
all in-sync replicas have received the data. This option provides the
best durability, we guarantee that no messages will be lost as long as
at least one in sync replica remains.
# required_acks = "leader_and_replicas"
## The total number of times to retry sending a message
# max_retry = "3"
if i understand the prometheus data model correctly, the current output
for this plugin is unusable
prometheus only accepts a single value per measurement. prior to this change, the range loop
causes a measurement to end up w/ a random value
for instance:
net,dc=sjc1,grp_dashboard=1,grp_home=1,grp_hwy_fetcher=1,grp_web_admin=1,host=sjc1-b4-8,hw=app,interface=docker0,state=live
bytes_recv=477596i,bytes_sent=152963303i,drop_in=0i,drop_out=0i,err_in=0i,err_out=0i,packets_recv=7231i,packets_sent=11460i
1457121990003778992
this 'net' measurent would have all it's tags copied to prometheus
labels, but any of 152963303, or 0, or 7231 as a value for
'net' depending on which field is last in the map iteration
this change expands the fields into new measurements by appending
the field name to the influxdb measurement name.
ie, the above example results with 'net' dropped and new measurements
to take it's place:
net_bytes_recv
net_bytes_sent
net_drop_in
net_err_in
net_packets_recv
net_packets_sent
i hope this can be merged, i love telegraf's composability of tags and
filtering
- Check and return error from NewBatchPoints to prevent runtime panic if
user provides an unparsable precision time unit in config.
- Provide correct sample config precision examples.
- Update etc/telegraf.conf precision comment.
closes#715
* Customizable 'separator' option instead of hard-coded '_'
* String values are sent as "State" instead of "Metric", preventing
Riemann from rejecting them
* Riemann service name is set to an (ugly) combination of input name &
(sorted) tags' values...this allows connecting different events for
the same input together on the Riemann side
closes#642
This constitutes a large change in how we will parse different data
formats going forward (for the plugins that support it)
This is working off @henrypfhu's changes.