Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/add datadog #915

Merged
merged 38 commits into from
Feb 13, 2019
Merged

Feature/add datadog #915

merged 38 commits into from
Feb 13, 2019

Conversation

mstoykov
Copy link
Contributor

@mstoykov mstoykov commented Jan 31, 2019

Fixes on top of #893

@codecov-io
Copy link

codecov-io commented Jan 31, 2019

Codecov Report

Merging #915 into master will increase coverage by 0.56%.
The diff coverage is 89.81%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #915      +/-   ##
==========================================
+ Coverage   69.82%   70.38%   +0.56%     
==========================================
  Files         112      118       +6     
  Lines        8823     9088     +265     
==========================================
+ Hits         6161     6397     +236     
- Misses       2261     2285      +24     
- Partials      401      406       +5
Impacted Files Coverage Δ
cmd/collectors.go 0% <0%> (ø) ⬆️
stats/datadog/collector.go 100% <100%> (ø)
stats/statsd/common/api.go 100% <100%> (ø)
stats/statsd/collector.go 100% <100%> (ø)
lib/options.go 91.81% <100%> (+0.45%) ⬆️
cmd/config.go 41.05% <100%> (+1.26%) ⬆️
stats/statsd/common/config.go 100% <100%> (ø)
stats/statsd/common/collector.go 84.53% <84.53%> (ø)
stats/statsd/common/testutil/test_helper.go 97.95% <97.95%> (ø)
core/engine.go 92.99% <0%> (-0.94%) ⬇️
... and 6 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 45f9913...1c93908. Read the comment docs.

Copy link
Member

@na-- na-- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Various minor and not-so-minor issues noted inline. There is also a generic issue that this code doesn't have even a single test...

cmd/collectors.go Show resolved Hide resolved
Both are very similar but DataDog has a concept of tags. By default both send on `localhost:8125` and currently only UDP is supported as transport.
In order to change this you can use the `K6_DATADOG_ADDR` or `K6_STATSD_ADDR` env variable which has to be in the format of `address:port`.
The new outputs also can add a `namespace` which a prefix before all the samples with `K6_DATADOG_NAMESPACE` or `K6_STATSD_NAMESPACE` respectively. By default the value is `k6.`, notice the dot at the end.
In the case of DataDog there is an additional configuration `K6_DATADOG_TAG_WHITELIST` which by default is equal to `status, method`. This is a comma separated list of tags that should be sent to DataDog. All other tags that k6 emits are discarded. This is done because DataDog does indexing on top of the tags and some highly variable tags like `vu` and `iter` will lead to problems with the service.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure we should limit the tags only to status and method - all of those tags (with the possible exception of url and maybe name) seem like good candidates to also be included

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@robingustafsson Any opinion on this ? I am agnostic just want to note that people do complain that datadog gets with a lot of tags. We probably won't hit it given that we wouldn't have all that many but still

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can start with this small set of whitelisted tags and then if it turns out users need more by default we can expand later. The only other tags that I would consider useful is the url and name that @na-- thinks should be excluded 😄. Given the kind of product that Datadog offers (being a long time customer) I don't believe anyone would really use it as a general purpose load testing result analysis tool. I can see though how you'd use it to store monitoring type of results but then you'd also likely set up custom metrics to track rather than just the default metrics.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@robingustafsson I don't think we should exclude both url and name, only url. Since they are either the same, or the user explicitly set name because url was too dynamic.
To reduce confusion, I think that the default value for TagWhitelist should be proto, subproto, status, method, name, group, check, error, tls_version (i.e. the DefaultSystemTagList, but without any highly variable things (and maybe with error_code instead of error, once that's merged?) )

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@na-- Ah ok, misread you. I'm not sure we need such an expansive tag whitelist though to start with for Datadog, but I'm not strongly opposed to it either 🙂Yes, error_code would be preferred over error, once available, if you ask me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm something else that I realized - I'm not sure if a whitelist is actually the best approach here. It will work for system tags, but any custom user tags will be silently filtered, which isn't the best UX. If I, as a user, explicitly set a certain tag value on an HTTP request, I usually want it and can depend on it later for filtering the data...

In the InfluxDB collector, this is handled by transforming any highly variable tags into (non-indexable) fields. This is great, since user-supplied data won't be lost, while at the same time the InfluxDB server won't be overwhelmed with indexing highly-variable data.

I'm not very familiar with Datadog, but a quick check doesn't reveal any similar functionality there... So if no such functionality exists in Datadog, I wonder whether a blacklist (to block the most highly-variable tags like vu, iter, url, and error (once error_code is merged)) will be more user-friendly than the current whitelist. Or whether we need such an option at all, considering that users can just set the emitted tags directly via the global --system-tags option?

stats/datadog/collector.go Outdated Show resolved Hide resolved
stats/statsd/common/client.go Outdated Show resolved Hide resolved
stats/statsd/common/api.go Outdated Show resolved Hide resolved
stats/statsd/common/collector.go Outdated Show resolved Hide resolved
stats/statsd/common/collector.go Outdated Show resolved Hide resolved
stats/statsd/common/client.go Outdated Show resolved Hide resolved
stats/statsd/common/collector.go Outdated Show resolved Hide resolved
stats/statsd/common/config.go Outdated Show resolved Hide resolved
Copy link
Member

@na-- na-- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About unit tests: not sure how much we can test this, but we should at least have some... even on that tag filtering logic

stats/datadog/collector.go Outdated Show resolved Hide resolved
for _, entry := range data {
if err := c.dispatch(entry); err != nil {
// No need to return error if just one metric didn't go through
c.logger.WithError(err).Warnf("Error while sending metric %s", entry.Metric)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like it could be very spammy in some situations... It's better to collect the number of errors and display only a single warning message like "we couldn't dispatch X out of Y metrics to datadog" or something like that. And maybe the individual errors could still be debug messages, so they're not shown by default, but users can see them with verbose mode enabled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this will lead to people wondering where their metrics are and it will make it harder to debug. If people complain that it is way too spammy in situations which are okay to not be spammy we can always make it less spammy

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Come on, this line can generate literally thousands of warning messages - not very useful. I'm not saying to suppress those errors, just to aggregate them and say "1234 metrics weren't sent" every 1 second, instead of 1234 messages every second...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my argument is that knowing what didn't get sent may help you debug it ... hiding the error behind why we couldn't send something and just telling you we couldn't will lead to questions of "why do I have no metrics?" and "what does 'we couldn't sent your metrics' mean?".

If someone is continuously getting 1234 messages, that we could not sent the data, they told us to sent, the error message might help them debug it. In case someone doesn't care that they can't sent the majority of they are metrics ... maybe they should not ask us to do it ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Read my original comment again - I don't want to totally suppress the messages, just aggregate them by default. If they run k6 with verbose mode enabled, users will see the original error messages, however many there are. We can even suggest turning on the verbose mode in the aggregated error message. Or we can show the first 5 or 10 error messages and just mention that there are 1229 more 😄

I'm fine with a lot of different variants, as long as we don't bury the user in thousands of (likely similar) error messages on the console every second by default, which is terrible UX.

stats/datadog/collector.go Outdated Show resolved Hide resolved
stats/statsd/common/config.go Outdated Show resolved Hide resolved
stats/statsd/common/config.go Outdated Show resolved Hide resolved
@@ -8,6 +8,18 @@ You can now specify a file for all things logged by `console.log` to get written

Thanks to @cheesedosa for both proposing and implementing this!

### New result outputs: statsd and DataDog (#915)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion for slightly expanded release notes, with some minor typos and formatting fixes:

### New result outputs: StatsD and Datadog (#915)

You can now output any metrics k6 collects to StatsD or Datadog by running `k6 run --out statsd script.js` or `k6 run --out datadog script.js` respectively. Both are very similar, but Datadog has a concept of metric tags, the key-value metadata pairs that will allow you to distinguish between requests for different URLs, response statuses, different groups, etc.

Some details:
- By default both outputs send metrics to a local agent listening on `localhost:8125` (currently only UDP is supported as a transport). You can change this address via the `K6_DATADOG_ADDR` or `K6_STATSD_ADDR` environment variables, by setting their values in the format of `address:port`.
- The new outputs also support adding a `namespace` - a prefix before all the metric names. You can set it via the `K6_DATADOG_NAMESPACE` or `K6_STATSD_NAMESPACE` environment variables respectively. Its default value is `k6.` - notice the dot at the end.
- You can configure how often data batches are sent via the  `K6_STATSD_PUSH_INTERVAL` / `K6_DATADOG_PUSH_INTEVAL` environment variables. The default value is `1s`.
- Another performance tweak can be done by changing the default buffer size of 20 through `K6_STATSD_BUFFER_SIZE` / `K6_DATADOG_BUFFER_SIZE`.
- In the case of Datadog, there is an additional configuration `K6_DATADOG_TAG_WHITELIST`, which by default is equal to `status,method,group`. This is a comma separated list of tags that should be sent to Datadog. All other metric tags that k6 emits are discarded. This is done because Datadog does indexing on top of the tags and some highly variable tags like `vu` and `iter` will lead to problems with the service.

rendered:

New result outputs: StatsD and Datadog (#915)

You can now output any metrics k6 collects to StatsD or Datadog by running k6 run --out statsd script.js or k6 run --out datadog script.js respectively. Both are very similar, but Datadog has a concept of metric tags, the key-value metadata pairs that will allow you to distinguish between requests for different URLs, response statuses, different groups, etc.

Some details:

  • By default both outputs send metrics to a local agent listening on localhost:8125 (currently only UDP is supported as a transport). You can change this address via the K6_DATADOG_ADDR or K6_STATSD_ADDR environment variables, by setting their values in the format of address:port.
  • The new outputs also support adding a namespace - a prefix before all the metric names. You can set it via the K6_DATADOG_NAMESPACE or K6_STATSD_NAMESPACE environment variables respectively. Its default value is k6. - notice the dot at the end.
  • You can configure how often data batches are sent via the K6_STATSD_PUSH_INTERVAL / K6_DATADOG_PUSH_INTEVAL environment variables. The default value is 1s.
  • Another performance tweak can be done by changing the default buffer size of 20 through K6_STATSD_BUFFER_SIZE / K6_DATADOG_BUFFER_SIZE.
  • In the case of Datadog, there is an additional configuration K6_DATADOG_TAG_WHITELIST, which by default is equal to status,method,group. This is a comma separated list of tags that should be sent to Datadog. All other metric tags that k6 emits are discarded. This is done because Datadog does indexing on top of the tags and some highly variable tags like vu and iter will lead to problems with the service.

Copy link
Member

@na-- na-- Feb 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I still think we should change the default value to be the same as infuxdb's (i.e. all default system tags, minus the ones marked as tagsAsFields (i.e. the most variable ones))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like @robingustafsson to weight in on this - I personally have no real opinion apart from at some point @ivoreis decided those are fine I don't know if they've discussed it of github for example

@na-- na-- mentioned this pull request Feb 1, 2019
Copy link
Member

@na-- na-- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants