The existing ceph input plugin only has access to the local admin daemon socket
on the local host, and as such has access to a limited subset of data. This
extends the plugin to use CLI commands to get access to the full spread of Ceph
data. This patch collects global OSD map and IO statistics, PG state, and per-pool
IO and utilization statistics.
closes #1513
* Some improvements in the mesos input plugin:
Removing unneeded statistics prefix for task metrics,
Adding framework id tags to each task metric,
Adding state (leader/follower) tags to master metrics,
Making sure the slave metrics are tagged with slave
* typo, replacing cpus_total with elected to determine leader
* Remove remaining statistics_ from sample
* using timestamp from mesos as metric timestamp
* change mesos-tasks to mesos_tasks, measurement
* change measurement name in test
* Replace follower by standby
* separate hello and authenticate functions, force connection close at end of write cycle so we don't hold open idle connections, which has the benefit of mostly removing the chance of the connection being hopelessly lost
* update changelog, though this will need to be updated again to merge into telegraf master
* bump instrumental agent version
* fix test to deal with better connect/reconnect logic and changed ident & auth handshake
* Update CHANGELOG.md
correct URL from instrumental fork to origin and put the change in the correct part of the file
* go fmt
* Split out Instrumental tests for invalid metric and value.
* Ensure nothing remains on the wire after final test.
* Force valid metric names by replacing invalid parts with underscores.
* Multiple invalid characters being joined into a single underscore.
* Adjust comment to what happens.
* undo split hello and auth commands, to reduce roundtrips
* add an entry to CHANGELOG for easy merging upstream
* go fmt variable alignment
* remove some bugfixes from changelog which now more properly are in a different section.
* remove headers and whitespace that should have been removed with the last commit
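The name-sanitizing rule above (invalid parts replaced with underscores, with a run of several invalid characters collapsed into one underscore) can be sketched in Go as follows. The allowed character set here (letters, digits, `.`, `_`, `-`) is an assumption for illustration, not Instrumental's documented rule, and `sanitizeMetricName` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"regexp"
)

// invalidRun matches one or more consecutive characters outside an ASSUMED
// allowed set (letters, digits, '.', '_', '-'). The real rule may differ.
var invalidRun = regexp.MustCompile(`[^a-zA-Z0-9_.\-]+`)

// sanitizeMetricName replaces each run of invalid characters with a single
// underscore, so several bad characters in a row become one '_'.
func sanitizeMetricName(name string) string {
	return invalidRun.ReplaceAllString(name, "_")
}

func main() {
	fmt.Println(sanitizeMetricName("cpu usage%%idle")) // cpu_usage_idle
}
```

Matching the whole run with `+` rather than single characters is what joins multiple invalid characters into one underscore.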
* Source improvement for librato output
Build the source from the list of tags instead of a single tag specified
in the configuration
Graphite Serializer:
* make buildTags public
* make sure not to use empty tags
Librato output:
* Improve error handling for librato API based on error or debug flag
* Send metrics in batches (max 300 per batch)
* use Graphite BuildTag function to generate source
The change is intended to be backward compatible
Metric sample:
server=127.0.0.1 port=80 state=leader env=test
measurement.metric_name value
service_n.metric_x
Metric before, with the source tag set to "server":
source=127.0.0.1
test.80.127_0_0_1.leader.measurement.metric_name
test.80.127_0_0_1.leader.service_n.metric_x
Metric now:
source=test.80.127.0.0.1.leader
measurement.metric_name
service_n.metric_x
As you can see, the source in the "new" version is much more precise.
That way, when filtering (by source only), you can filter by env or any other tag.
* Using a template to specify which tags to use for the source; the default
concatenates all tags
* revert change in graphite serializer
* better documentation, change default for template
* fmt
* test passing with new host as default tags
* use host tag in api integration test
* Limit 80 chars per line, change resolution to be an int in the sample
* fmt
* remove resolution, doc for template
* fmt
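Sending metrics in batches of at most 300 amounts to splitting the metric list into consecutive chunks. A minimal generic sketch follows; `batch` is a hypothetical helper and does not reflect the plugin's own metric types or request code. It needs Go 1.18+ for generics.

```go
package main

import "fmt"

// maxBatchSize mirrors the "max 300" limit from the change notes.
const maxBatchSize = 300

// batch splits items into consecutive slices of at most size elements.
// Hypothetical helper; the real plugin batches its own metric type.
func batch[T any](items []T, size int) [][]T {
	if size <= 0 {
		size = len(items) // guard against an infinite loop; one batch
	}
	var out [][]T
	for start := 0; start < len(items); start += size {
		end := start + size
		if end > len(items) {
			end = len(items)
		}
		out = append(out, items[start:end])
	}
	return out
}

func main() {
	metrics := make([]string, 650)
	for i, b := range batch(metrics, maxBatchSize) {
		fmt.Printf("batch %d: %d metrics\n", i, len(b))
	}
}
```

With 650 metrics this yields batches of 300, 300 and 50; each batch would then be sent as one API request.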
* Fix problem with metrics when ping returns Destination net unreachable
Add test case TestUnreachablePingGather
Add percent_reply_loss
Fix some other tests
* Add errors measurement
* fix problem with ping reply "TTL expired in transit" (use a regex for a more specific condition: TTL appears in the line, but it is not a valid reply)
add test case for "TTL expired in transit" - TestTTLExpiredPingGather
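The TTL check above can be sketched as a regex that only counts genuine echo replies, from which a reply-loss percentage follows. This assumes Windows-style ping output, where a valid reply carries `TTL=<n>` while "TTL expired in transit" and "Destination net unreachable" lines also begin with "Reply from". The function name and the exact formula for `percent_reply_loss` are assumptions, not the plugin's code.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// validReply matches "TTL=<number>", which (ASSUMPTION) appears only on
// genuine echo replies, not on "TTL expired in transit" error replies.
var validReply = regexp.MustCompile(`TTL=\d+`)

// percentReplyLoss counts valid replies in ping output and returns the
// percentage of sent packets that did not receive one.
func percentReplyLoss(output string, sent int) float64 {
	if sent == 0 {
		return 0
	}
	valid := 0
	for _, line := range strings.Split(output, "\n") {
		if validReply.MatchString(line) {
			valid++
		}
	}
	return float64(sent-valid) / float64(sent) * 100
}

func main() {
	out := `Reply from 8.8.8.8: bytes=32 time=10ms TTL=117
Reply from 10.0.0.1: TTL expired in transit.
Reply from 192.168.1.1: Destination net unreachable.
Reply from 8.8.8.8: bytes=32 time=11ms TTL=117`
	fmt.Println(percentReplyLoss(out, 4)) // 50
}
```

A plain `strings.Contains(line, "TTL")` would wrongly count the "TTL expired in transit" line, which is why the more specific pattern is used.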
This log format is likely to be removed in a future
InfluxDB release, so we should not be recommending that users base any
of their log parsing infra on it.
* Ping for windows
* En ping output
* Code format
* Code review
* Default timeout
* Fix problem with standard error when no data is received (exit status 1)
1. in prometheus client, do not check for invalid characters anymore,
because we are already replacing all invalid characters with regex
anyways.
2. in win_perf_counters, sanitize field name _and_ measurement name.
Also add '%' to the list of sanitized characters, because this character
is invalid for most output plugins, and can also easily cause string
formatting issues throughout the stack.
3. All '%' will now be translated to 'Percent'
closes #1430
closes #1499
closes #1019
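The sanitizing order described above matters: `%` has to become `Percent` before the generic invalid-character replacement runs, or it would be turned into an underscore and lost. A minimal sketch, assuming an allowed set of letters, digits and `_` (the actual win_perf_counters character list may differ); `sanitize` is a hypothetical name.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// invalidChars is an ASSUMED set of characters to replace; the real
// plugin's list may differ.
var invalidChars = regexp.MustCompile(`[^a-zA-Z0-9_]+`)

// sanitize translates '%' to "Percent" first, then replaces remaining runs
// of invalid characters with '_'. The same function would be applied to
// both field names and measurement names.
func sanitize(name string) string {
	name = strings.ReplaceAll(name, "%", "Percent")
	return invalidChars.ReplaceAllString(name, "_")
}

func main() {
	fmt.Println(sanitize("% Processor Time")) // Percent_Processor_Time
}
```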
Do not try to guess the HAProxy stats URL; just add ";csv" at the end of the
URL if not present.
Signed-off-by: tgermain <timothee.germain@corp.ovh.com>
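The URL rule above is small enough to show directly. This sketch assumes a simple suffix check; `statsURL` is a hypothetical helper name, not the plugin's function.

```go
package main

import (
	"fmt"
	"strings"
)

// statsURL appends ";csv" to a HAProxy stats URL unless it already ends
// with it. No other guessing about the URL is attempted.
func statsURL(u string) string {
	if !strings.HasSuffix(u, ";csv") {
		u += ";csv"
	}
	return u
}

func main() {
	fmt.Println(statsURL("http://localhost:1936/haproxy?stats"))
	// http://localhost:1936/haproxy?stats;csv
}
```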
* add initial support to allow self-signed certs
When using self-signed certificates, metrics collection will fail. This allows
the user to specify in the input configuration file whether to skip
certificate verification. This is functionally identical to `curl -k`.
At some point this functionality should be moved to the agent as it is
already implemented identically in several different input plugins.
* Add initial comment strings to remove noise
These should be properly fleshed out at some point to ensure
code completeness
* refactor to use generic helper function
* fix import statement against fork
* update changelog
closes #1436
This also fixes the bad behavior of waiting until runtime to return log
parsing pattern compile errors when a pattern was simply unfound.
closes #1418
Also protect against user error when the telegraf user does not have
permission to open the provided file. We will now error and exit in this
case, rather than silently waiting to get permission to open it.
* Add mandrill webhook.
* Store the id of the msg as part of event.
Signed-off-by: Cyril Duez <cyril@stormz.me>
Signed-off-by: François de Metz <francois@stormz.me>
* Decode body to get the mandrill_events.
Signed-off-by: Cyril Duez <cyril@stormz.me>
Signed-off-by: François de Metz <francois@stormz.me>
* Handle HEAD request.
Signed-off-by: Cyril Duez <cyril@stormz.me>
Signed-off-by: François de Metz <francois@stormz.me>
* Add the README.
Signed-off-by: Cyril Duez <cyril@stormz.me>
Signed-off-by: François de Metz <francois@stormz.me>
* Add mandrill_webhooks to the README.
Signed-off-by: Cyril Duez <cyril@stormz.me>
Signed-off-by: François de Metz <francois@stormz.me>
* Update changelog.
Signed-off-by: Cyril Duez <cyril@stormz.me>
Signed-off-by: François de Metz <francois@stormz.me>
* Run gofmt.
Signed-off-by: Cyril Duez <cyril@stormz.me>
Signed-off-by: François de Metz <francois@stormz.me>
closes #1412
separate hello and authenticate functions,
force connection close at end of write cycle so we don't
hold open idle connections,
which has the benefit of mostly removing
the chance of the connection being hopelessly lost
bump instrumental agent version
fix test to deal with better connect/reconnect logic and changed ident & auth handshake
Update CHANGELOG.md
correct URL from instrumental fork to origin and put the change in the correct part of the file
go fmt
undo split hello and auth commands, to reduce roundtrips
This is for better thread-safety when running with multiple outputs,
which can cause very odd panics at very high loads
Primarily this is to address #1432.
closes #1432
closes #1289
Signed-off-by: François de Metz <francois@stormz.me>
Signed-off-by: Cyril Duez <cyril@stormz.me>
Rename internals struct.
Signed-off-by: François de Metz <francois@stormz.me>
Signed-off-by: Cyril Duez <cyril@stormz.me>
Update changelog.
Signed-off-by: François de Metz <francois@stormz.me>
Signed-off-by: Cyril Duez <cyril@stormz.me>
Update READMEs and CHANGELOG.
Signed-off-by: François de Metz <francois@stormz.me>
Signed-off-by: Cyril Duez <cyril@stormz.me>
Update SampleConfig.
Update the config format.
Update telegraf config.
Update the webhooks README.
Update changelog.
Update the changelog with an upgrade path.
Update default ports.
Fix indent.
Check for nil value on AvailableWebhooks.
Check for CanInterface.
* Allow for TLS connections to ElasticSearch
Extremely similar implementation to the HTTP JSON module's
implementation of the same code.
* Changelog update
I added Rows/Logs max size counters for tracking databases that do not have autogrowth enabled. The counters return numbers in 8KB pages since there are a few special values (such as -1 for no max size) that can't directly be multiplied by 8192 to get size in bytes.
Also added Rows/Logs size in 8KB pages from the same system table for comparison, even though these return the same sizes as those already collected from sys.dm_io_virtual_file_stats.
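The reason the counters stay in pages can be made concrete: a valid page count converts to bytes by multiplying by 8192, but a marker such as -1 (no max size) would become a meaningless negative byte count. A hypothetical conversion helper, which treats only negative values as special; any other special values are not handled here.

```go
package main

import "fmt"

// pageSize is the SQL Server data page size in bytes (8 KB).
const pageSize = 8192

// pagesToBytes converts a page count to bytes. Negative values such as -1
// (no max size) are markers rather than sizes, so they are reported as
// not convertible instead of being multiplied.
func pagesToBytes(pages int64) (int64, bool) {
	if pages < 0 {
		return 0, false
	}
	return pages * pageSize, true
}

func main() {
	if b, ok := pagesToBytes(128); ok {
		fmt.Println(b) // 1048576
	}
	if _, ok := pagesToBytes(-1); !ok {
		fmt.Println("no max size")
	}
}
```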
Adding precision rounding to the accumulator. This means that now every
input metric will get rounded at collection, rather than at write (and
only for the influxdb output).
This feature is disabled for service inputs, because service inputs
should be in control of their own timestamps & precisions.
* WIP: Initial support for ZFS on FreeBSD
* Added build directives
* Ignore 'kstatPath' config option on FreeBSD
* Added tests for ZFS FreeBSD input plugin.
* Updated the README to conform with the guidelines and added FreeBSD info
* Fixed indents
* Spell check
- Updated README/CHANGELOG
- Added links to further info to input README
- Reduced lines to 80 chars
Removing input declaration from SampleConfig
Moved PR to unreleased section of changelog
closes #1165
- Collects conntrack stats from the configured directories and files.
Applying PR feedback:
- Rebased onto master
- Updated README/CHANGELOG
- Limited lines to 80 chars
- Improved plugin docs and README
- added a dummy notlinux build file
Fixed up CHANGELOG and README after rebase
closes #1164
Allow using glob patterns in the command list in the configuration. This enables,
for example, placing all commands in a single directory and using /path/to/dir/*.sh
as one of the commands to run all shell scripts in that directory.
Glob patterns are applied on every run of the commands, so matching commands can
be added without restarting telegraf.
closes #1142
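Re-applying glob patterns on every run can be sketched with `filepath.Glob`: entries without glob metacharacters are kept as plain commands, and patterns are expanded fresh each time, so a newly added script is picked up on the next run. This simplifies by treating a whole entry containing `*`, `?` or `[` as a file pattern (command arguments are not handled), and `expandCommands` is a hypothetical name.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// expandCommands resolves glob patterns in the configured command list.
// ASSUMPTION: an entry containing glob metacharacters is a file pattern
// as a whole; arguments after the path are not handled in this sketch.
func expandCommands(entries []string) []string {
	var cmds []string
	for _, e := range entries {
		if !strings.ContainsAny(e, "*?[") {
			cmds = append(cmds, e) // plain command, run as-is
			continue
		}
		matches, err := filepath.Glob(e)
		if err != nil {
			continue // malformed pattern; skipped in this sketch
		}
		cmds = append(cmds, matches...) // may be empty if nothing matches yet
	}
	return cmds
}

func main() {
	// Called on every gather, so no restart is needed for new scripts.
	fmt.Println(expandCommands([]string{"/path/to/dir/*.sh", "echo hello"}))
}
```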
* Use shared AWS credential configuration.
* Cloudwatch dimension wildcards
* Allow configuring cache_ttl for cloudwatch metrics.
* Allow for wildcard in dimension values to select all available metrics.
* Use internal.Duration for CacheTTL and go fmt
* Refactor to not use embedded structs for config.
* Update AWS plugin READMEs with credentials details, update Changelog.
* Fix changelog after rebasing to master and 0.13.1 release.
* Fix changelog after rebase.
* Report rollbar events.
Signed-off-by: Cyril Duez <cyril@stormz.me>
Signed-off-by: François de Metz <francois@stormz.me>
* Fix indent with go fmt.
* Add test for rollbar webhooks.
* Report more data from new_item event.
* Handle new deploy webhook.
Signed-off-by: Cyril Duez <cyril@stormz.me>
Signed-off-by: François de Metz <francois@stormz.me>
* Update default port.
* Add readme.
* Add rollbar_webhooks to the readme.
* Add rollbar_webhooks to plugins list.
* Add tag level for new_item event.
* Update readme.
* Update changelog.
* Adding Varnish HTTP Cache input plugin
* Applying PR feedback
- Linked to varnish in input README
- Updated README/CHANGELOG
- Cleaned up sampleConfig to remove formatting
- Shortened lines to 80 chars (except where test input requires long strings)
- Using internal.RunTimeout to wrap call to varnishstat
- Added dummy file for windows
Also changing the net_response and http_response plugins to only accept
duration strings for their timeout parameters. This is a breaking config
file change.
closes #1214
Being able to override the process_name in the procstat module
is useful for daemonized perl, ruby, erlang etc. processes. This
allows for manually setting process_name rather than it being set to
the interpreter/VM of the process.
Allow using glob patterns in the command list in the configuration. This enables,
for example, placing all commands in a single directory and using /path/to/dir/*.sh
as one of the commands to run all shell scripts in that directory.
Glob patterns are applied on every run of the commands, so matching commands can
be added without restarting telegraf.
closes #1127
First is to write an internal CombinedOutput and Run function with a
timeout.
Second, the following instances of command runners need to have timeouts:
plugins/inputs/ping/ping.go
125: out, err := c.CombinedOutput()
plugins/inputs/exec/exec.go
91: if err := cmd.Run(); err != nil {
plugins/inputs/ipmi_sensor/command.go
31: err := cmd.Run()
plugins/inputs/sysstat/sysstat.go
194: out, err := cmd.CombinedOutput()
plugins/inputs/leofs/leofs.go
185: defer cmd.Wait()
plugins/inputs/sysstat/sysstat.go
282: if err := cmd.Wait(); err != nil {
closes #1067
Lustre Jobstats allows for RPCs to be tagged with a value, such
as a job's ID. This allows for per job statistics. This plugin
collects statistics and tags the data with the jobid.
closes #1107
Allow overriding the "server" tag on metrics with the specified
value. Can be used to give a more user-friendly value for the server
name.
closes #1093
this is so that we don't call os.Stat twice for every file matched
by Match(). Also changing the behavior to _not_ return the name of a
file that doesn't exist if it's not a glob.
Network metrics are important, so this adds a block with a few network counters, plus a link to the full list of counter names, to the Generic Queries examples in plugins/inputs/win_perf_counters/README.md
- renaming cont_name and cont_image to container_name and
container_image.
- cont_id is now a field, called container_id
- docker_cpu, docker_mem, docker_net measurements have been renamed to
docker_container_cpu, docker_container_mem, and docker_container_net
closes #1014
closes #1052