Fix zero-filled and wrongly emitted metrics #710

na-- · 2018-07-12T08:32:46Z

The fix is not to emit metric samples when a VU's context is canceled. This should fix #708, but it still needs unit tests and more checks in general. For example, are there some metrics we actually want to emit even from VUs that are canceled?

na-- · 2018-07-12T08:39:14Z

Here's a specific example where I'm not sure if we should "fix" the old behavior: the data_sent, data_received and iteration_duration metrics that are emitted at the end of an iteration. If a VU is canceled halfway through its final iteration, don't we still want those metrics?

codecov-io · 2018-07-12T08:44:27Z

Codecov Report

Merging #710 into master will decrease coverage by <.01%.
The diff coverage is 46.96%.

@@            Coverage Diff             @@
##           master     #710      +/-   ##
==========================================
- Coverage   64.39%   64.39%   -0.01%     
==========================================
  Files         101      101              
  Lines        8277     8302      +25     
==========================================
+ Hits         5330     5346      +16     
- Misses       2599     2608       +9     
  Partials      348      348

| Impacted Files | Coverage Δ | |
|---|---|---|
| stats/stats.go | `55.5% <0%> (-1.85%)` | ⬇️ |
| lib/netext/dialer.go | `36.23% <0%> (-1.65%)` | ⬇️ |
| js/modules/k6/ws/ws.go | `72.13% <100%> (+0.7%)` | ⬆️ |
| js/runner.go | `79.52% <100%> (+0.6%)` | ⬆️ |
| js/modules/k6/k6.go | `88.09% <100%> (+0.14%)` | ⬆️ |
| js/modules/k6/http/http_request.go | `80% <100%> (ø)` | ⬆️ |
| js/modules/k6/metrics/metrics.go | `93.33% <100%> (ø)` | ⬆️ |
| core/local/local.go | `77.89% <71.42%> (-1.22%)` | ⬇️ |
| core/engine.go | `91.94% <0%> (+2.36%)` | ⬆️ |

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f4875fb...48e84ce. Read the comment docs.

luizbafilho · 2018-07-12T16:22:09Z

In my opinion, it is OK to lose a single iteration's metrics.

na-- · 2018-07-12T17:00:42Z

Maybe, but it won't be just a single iteration; it will be one iteration for every VU scaled down, so anywhere from 0 to hundreds. Also, imagine that when a VU's context is canceled in the middle of its iteration due to scaling the number of VUs down, some of the HTTP requests in that iteration have already completed and have had their metrics sent to the engine. So we might want to send at least data_sent and data_received, and maybe even the iteration duration...

luizbafilho · 2018-07-12T17:05:08Z

Yeah, but hundreds out of many thousands is not that much. I fear that trying not to lose a single metric will lead to another huge refactoring.

na-- · 2018-07-12T17:12:45Z

No, in this case the fix is very simple: for metrics we want to preserve even when a VU's context is canceled, I just leave them the old way instead of using the helper function I added in this PR. That is, revert to using `state.Samples <- someSampleContainer` instead of the new `stats.PushIfNotCancelled(ctx, state.Samples, someSampleContainer)`. In essence - un-"fix" them 😄

robingustafsson · 2018-07-16T10:58:56Z

We should definitely not lose data on the number of requests sent, or on data sent and received, as we use those in the cloud execution functionality (with IP address info) to track which systems we're hitting with traffic and how much, for abuse prevention and audit trails.

na-- · 2018-07-16T12:01:31Z

Hmm, after thinking about this a bit more, I realized that if we send the data_sent and data_received metrics, with the current implementation we'll also send iteration_duration, which we definitely shouldn't do for VUs stopped in the middle of an iteration, because it could quite heavily skew the rest of the iteration_duration stats down... So either we leave it as it currently is in this pull request (don't send any metric samples once a VU/iteration is cancelled) or I can add an extra parameter to `netext/Dialer.GetTrail()` to exclude the iteration_duration metric.

na-- · 2018-07-17T09:39:21Z

While finishing this up and writing the test, I realized that with the real-time metrics I'd also inadvertently reverted the decision we made in #652... That is, now even unfinished iterations emit an iterations metric 😑 ... I'll fix that as well and also test for it.

na-- · 2018-07-17T09:53:34Z

Slight correction to the previous statement. Things work as expected when duration is specified: any metrics emitted after the specified duration are discarded, including iterations. The problem with iterations happens when there are stages that scale down the number of VUs. The VU iterations that are canceled in the middle of their execution (but still before the actual test ends) due to that scaling down emit the unexpected iterations metric.

na-- · 2018-07-17T12:54:55Z

With the latest commit this should hopefully be done. I'll take another look tomorrow just in case, and at least one more pair of eyes would be very helpful, but I think I've fixed all of the issues that I know of. Beyond fixing the bugs, with this patch metric emission when scaling down VUs should actually be a lot better than it was before we had real-time metrics.

robingustafsson

LGTM

Do not emit metric samples when a VU's context is canceled.

5fcfde1

This should fix #708

na-- requested review from luizbafilho and robingustafsson July 12, 2018 08:32

na-- added 2 commits July 17, 2018 14:26

Merge branch 'master' into rt-samples-fix

a149cdb

Fix the remaining issues from the real-time metrics refactoring

48e84ce

na-- changed the title ~~[WIP] Fix zero-filled metrics~~ Fix zero-filled and wrongly emitted metrics Jul 17, 2018

robingustafsson approved these changes Jul 22, 2018

na-- merged commit 9531175 into master Jul 23, 2018

na-- deleted the rt-samples-fix branch July 23, 2018 05:35

na-- mentioned this pull request Oct 3, 2018

Do not send interrupted iteartions' duration to the cloud ingest API #795

Merged
