Feature/add datadog #915
Conversation
There is no reason for them to be separate
Codecov Report
@@ Coverage Diff @@
## master #915 +/- ##
==========================================
+ Coverage 69.82% 70.38% +0.56%
==========================================
Files 112 118 +6
Lines 8823 9088 +265
==========================================
+ Hits 6161 6397 +236
- Misses 2261 2285 +24
- Partials 401 406 +5
Continue to review full report at Codecov.
Various minor and not-so-minor issues noted inline. There is also a general issue: this code doesn't have even a single test...
release notes/upcoming.md
Outdated
Both are very similar but DataDog has a concept of tags. By default both send on `localhost:8125` and currently only UDP is supported as transport.
In order to change this you can use the `K6_DATADOG_ADDR` or `K6_STATSD_ADDR` env variable, which has to be in the format of `address:port`.
The new outputs can also add a `namespace`, a prefix before all the samples, with `K6_DATADOG_NAMESPACE` or `K6_STATSD_NAMESPACE` respectively. By default the value is `k6.`, notice the dot at the end.
In the case of DataDog there is an additional configuration `K6_DATADOG_TAG_WHITELIST`, which by default is equal to `status, method`. This is a comma-separated list of tags that should be sent to DataDog. All other tags that k6 emits are discarded. This is done because DataDog does indexing on top of the tags and some highly variable tags like `vu` and `iter` will lead to problems with the service.
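The whitelist behaviour described above could be sketched roughly like this (a minimal illustration only; `filterTags` and its signature are hypothetical, not the actual k6 implementation):

```go
package main

import (
	"fmt"
	"strings"
)

// filterTags keeps only the tags whose keys appear in the
// comma-separated whitelist (e.g. "status, method") and discards
// the rest. Illustrative sketch, not the actual k6 code.
func filterTags(tags map[string]string, whitelist string) map[string]string {
	allowed := map[string]bool{}
	for _, k := range strings.Split(whitelist, ",") {
		allowed[strings.TrimSpace(k)] = true
	}
	out := map[string]string{}
	for k, v := range tags {
		if allowed[k] {
			out[k] = v
		}
	}
	return out
}

func main() {
	tags := map[string]string{
		"status": "200", "method": "GET", "vu": "12", "iter": "345",
	}
	fmt.Println(filterTags(tags, "status, method")) // → map[method:GET status:200]
}
```

With `K6_DATADOG_TAG_WHITELIST=status, method`, a sample carrying `vu` and `iter` tags would thus be sent with only its `status` and `method` tags.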
Not sure we should limit the tags only to `status` and `method` - all of those tags (with the possible exception of `url` and maybe `name`) seem like good candidates to also be included.
@robingustafsson Any opinion on this? I am agnostic; I just want to note that people do complain that Datadog has problems with a lot of tags. We probably won't hit that, given that we wouldn't have all that many, but still.
I think we can start with this small set of whitelisted tags, and then if it turns out users need more by default we can expand it later. The only other tags that I would consider useful are the `url` and `name` that @na-- thinks should be excluded 😄. Given the kind of product that Datadog offers (being a long time customer), I don't believe anyone would really use it as a general purpose load testing result analysis tool. I can see though how you'd use it to store monitoring-type results, but then you'd also likely set up custom metrics to track rather than just the default metrics.
@robingustafsson I don't think we should exclude both `url` and `name`, only `url`, since they are either the same, or the user explicitly set `name` because `url` was too dynamic.
To reduce confusion, I think that the default value for `TagWhitelist` should be `proto, subproto, status, method, name, group, check, error, tls_version` (i.e. the `DefaultSystemTagList`, but without any highly variable things, and maybe with `error_code` instead of `error`, once that's merged?).
@na-- Ah ok, misread you. I'm not sure we need such an expansive tag whitelist to start with for Datadog, but I'm not strongly opposed to it either 🙂 Yes, `error_code` would be preferred over `error`, once available, if you ask me.
Hmm something else that I realized - I'm not sure if a whitelist is actually the best approach here. It will work for system tags, but any custom user tags will be silently filtered, which isn't the best UX. If I, as a user, explicitly set a certain tag value on an HTTP request, I usually want it and can depend on it later for filtering the data...
In the InfluxDB collector, this is handled by transforming any highly variable tags into (non-indexable) fields. This is great, since user-supplied data won't be lost, while at the same time the InfluxDB server won't be overwhelmed with indexing highly-variable data.
I'm not very familiar with Datadog, but a quick check doesn't reveal any similar functionality there... So if no such functionality exists in Datadog, I wonder whether a blacklist (to block the most highly-variable tags like `vu`, `iter`, `url`, and `error` (once `error_code` is merged)) would be more user-friendly than the current whitelist. Or whether we need such an option at all, considering that users can just set the emitted tags directly via the global `--system-tags` option?
About unit tests: not sure how much we can test this, but we should at least have some... even on that tag filtering logic
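One way to cover the filtering logic is a table-driven test. A sketch of the cases (here as a runnable program; in the repo this would live in a `_test.go` file using the `testing` package, and `keepWhitelisted` is a hypothetical stand-in for the actual filtering function):

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// keepWhitelisted is the hypothetical function under test: it drops
// every tag whose key is not in the comma-separated whitelist.
func keepWhitelisted(tags map[string]string, whitelist string) map[string]string {
	allowed := map[string]bool{}
	for _, k := range strings.Split(whitelist, ",") {
		allowed[strings.TrimSpace(k)] = true
	}
	out := map[string]string{}
	for k, v := range tags {
		if allowed[k] {
			out[k] = v
		}
	}
	return out
}

func main() {
	cases := []struct {
		in        map[string]string
		whitelist string
		want      map[string]string
	}{
		// non-whitelisted tags are dropped
		{map[string]string{"status": "200", "vu": "1"}, "status,method", map[string]string{"status": "200"}},
		// empty input stays empty
		{map[string]string{}, "status", map[string]string{}},
		// empty whitelist drops everything
		{map[string]string{"method": "GET"}, "", map[string]string{}},
	}
	for i, c := range cases {
		if got := keepWhitelisted(c.in, c.whitelist); !reflect.DeepEqual(got, c.want) {
			panic(fmt.Sprintf("case %d: got %v, want %v", i, got, c.want))
		}
	}
	fmt.Println("all cases pass")
}
```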
stats/statsd/common/collector.go
Outdated
for _, entry := range data {
	if err := c.dispatch(entry); err != nil {
		// No need to return error if just one metric didn't go through
		c.logger.WithError(err).Warnf("Error while sending metric %s", entry.Metric)
	}
}
This seems like it could be very spammy in some situations... It's better to collect the number of errors and display only a single warning message like "we couldn't dispatch X out of Y metrics to datadog" or something like that. And maybe the individual errors could still be debug
messages, so they're not shown by default, but users can see them with verbose mode enabled.
I think that this will lead to people wondering where their metrics are, and it will make it harder to debug. If people complain that it is way too spammy in situations where it's fine not to be, we can always make it less spammy later.
Come on, this line can generate literally thousands of warning messages - not very useful. I'm not saying to suppress those errors, just to aggregate them and say "1234 metrics weren't sent" every 1 second, instead of 1234 messages every second...
My argument is that knowing what didn't get sent may help you debug it... Hiding the underlying error and just telling users we couldn't send something will lead to questions like "why do I have no metrics?" and "what does 'we couldn't send your metrics' mean?".
If someone is continuously getting 1234 messages that we could not send the data they told us to send, the error messages might help them debug it. And if someone doesn't care that the majority of their metrics can't be sent... maybe they should not ask us to send them?
Read my original comment again - I don't want to totally suppress the messages, just aggregate them by default. If they run k6 with verbose mode enabled, users will see the original error messages, however many there are. We can even suggest turning on the verbose mode in the aggregated error message. Or we can show the first 5 or 10 error messages and just mention that there are 1229 more 😄
I'm fine with a lot of different variants, as long as we don't bury the user in thousands of (likely similar) error messages on the console every second by default, which is terrible UX.
release notes/upcoming.md
Outdated
@@ -8,6 +8,18 @@ You can now specify a file for all things logged by `console.log` to get written
Thanks to @cheesedosa for both proposing and implementing this!
### New result outputs: statsd and DataDog (#915)
Suggestion for slightly expanded release notes, with some minor typos and formatting fixes:
### New result outputs: StatsD and Datadog (#915)
You can now output any metrics k6 collects to StatsD or Datadog by running `k6 run --out statsd script.js` or `k6 run --out datadog script.js` respectively. Both are very similar, but Datadog has a concept of metric tags, the key-value metadata pairs that will allow you to distinguish between requests for different URLs, response statuses, different groups, etc.
Some details:
- By default both outputs send metrics to a local agent listening on `localhost:8125` (currently only UDP is supported as a transport). You can change this address via the `K6_DATADOG_ADDR` or `K6_STATSD_ADDR` environment variables, by setting their values in the format of `address:port`.
- The new outputs also support adding a `namespace` - a prefix before all the metric names. You can set it via the `K6_DATADOG_NAMESPACE` or `K6_STATSD_NAMESPACE` environment variables respectively. Its default value is `k6.` - notice the dot at the end.
- You can configure how often data batches are sent via the `K6_STATSD_PUSH_INTERVAL` / `K6_DATADOG_PUSH_INTEVAL` environment variables. The default value is `1s`.
- Another performance tweak can be done by changing the default buffer size of 20 through `K6_STATSD_BUFFER_SIZE` / `K6_DATADOG_BUFFER_SIZE`.
- In the case of Datadog, there is an additional configuration `K6_DATADOG_TAG_WHITELIST`, which by default is equal to `status,method,group`. This is a comma separated list of tags that should be sent to Datadog. All other metric tags that k6 emits are discarded. This is done because Datadog does indexing on top of the tags and some highly variable tags like `vu` and `iter` will lead to problems with the service.
And I still think we should change the default value to be the same as InfluxDB's (i.e. all default system tags, minus the ones marked as `tagsAsFields`, i.e. the most variable ones).
I would like @robingustafsson to weigh in on this - I personally have no real opinion, apart from that at some point @ivoreis decided those are fine. I don't know if they discussed it off GitHub, for example.
LGTM
Fixes on top of #893