Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix zero-filled and wrongly emitted metrics #710

Merged
merged 3 commits into from
Jul 23, 2018
Merged

Fix zero-filled and wrongly emitted metrics #710

merged 3 commits into from
Jul 23, 2018

Conversation

na--
Copy link
Member

@na-- na-- commented Jul 12, 2018

The fix is not to emit metric samples when a VU's context is canceled. This should fix #708, but it still needs unit tests and more checks in general. For example, are there some metrics we actually want to emit even from VUs that are canceled?

@na--
Copy link
Member Author

na-- commented Jul 12, 2018

Here's a specific example where I'm not sure if we should "fix" the old behavior. That's the metric for data_sent, data_received and iteration_duration that's emitted at the end of an iteration. If a VU is canceled halfway through its final iteration, don't we want those metrics?

@codecov-io
Copy link

codecov-io commented Jul 12, 2018

Codecov Report

Merging #710 into master will decrease coverage by <.01%.
The diff coverage is 46.96%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #710      +/-   ##
==========================================
- Coverage   64.39%   64.39%   -0.01%     
==========================================
  Files         101      101              
  Lines        8277     8302      +25     
==========================================
+ Hits         5330     5346      +16     
- Misses       2599     2608       +9     
  Partials      348      348
Impacted Files Coverage Δ
stats/stats.go 55.5% <0%> (-1.85%) ⬇️
lib/netext/dialer.go 36.23% <0%> (-1.65%) ⬇️
js/modules/k6/ws/ws.go 72.13% <100%> (+0.7%) ⬆️
js/runner.go 79.52% <100%> (+0.6%) ⬆️
js/modules/k6/k6.go 88.09% <100%> (+0.14%) ⬆️
js/modules/k6/http/http_request.go 80% <100%> (ø) ⬆️
js/modules/k6/metrics/metrics.go 93.33% <100%> (ø) ⬆️
core/local/local.go 77.89% <71.42%> (-1.22%) ⬇️
core/engine.go 91.94% <0%> (+2.36%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f4875fb...48e84ce. Read the comment docs.

@luizbafilho
Copy link
Contributor

In my opinion, it is ok to lose a single iteration metrics.

@na--
Copy link
Member Author

na-- commented Jul 12, 2018

Maybe, but it won't be just a single iteration, it will be one iteration for every VU scaled down, so anywhere from 0 to hundreds. Also, imagine that when a VU's context is canceled in the middle of its iteration due to scaling the number of VUs down, some of the HTTP requests in that iteration have already been completed and have their metrics sent to the engine. So we might want to send at least data_sent and data_received, and maybe even the iteration duration...

@luizbafilho
Copy link
Contributor

Yeah, but hundreds from many thousands is not that much. I fear that to accomplish not losing a single metric, will lead to another huge refactoring.

@na--
Copy link
Member Author

na-- commented Jul 12, 2018

No, in this case the fix is very simple, for metrics we want to preserve even when a VU's context is canceled, I just leave them the old way instead of using the helper function I did in this PR. That is, revert to using state.Samples <- someSampleContainer instead of the new stats.PushIfNotCancelled(ctx, state.Samples, someSampleContainer). In essence - un-"fix" them 😄

@robingustafsson
Copy link
Member

We should definitely not loose data on the number of requests sent, as well as data sent and received as we use those in the cloud execution functionality (with IP address info) to track what systems we're hitting with traffic and how much, for abuse prevention and audit trails.

@na--
Copy link
Member Author

na-- commented Jul 16, 2018

Hmm after thinking about this a bit more, I realized that if we send the data_sent, data_received metrics, with the current implementation we'll also send the iteration_duration, which we definitely shouldn't do for VUs stopped in the middle of an iteration, because it could quite heavily skew the rest of the iteration_duration stats down... So either we leave it as currently is in this pull request (don't send any metric samples once a VU/iteration is cancelled) or I can add an extra parameter to netext/Dialer.GetTrail() to exclude the iteration_duration metric.

@na--
Copy link
Member Author

na-- commented Jul 17, 2018

While finishing this up and writing the test, I realized that with the real-time metrics I'd also inadvertently reverted the decision we made in #652... That is, now even unfinished iterations emit an iterations metric 😑 ... I'll fix that as well and also test for it.

@na--
Copy link
Member Author

na-- commented Jul 17, 2018

Slight correction to the previous statement. Things works as expected when duration is specified, any metrics emitted after the specified duration are discarded, including iterations. The problem with iterations happens when there are stages that scale down the number of VUs. The VU iterations that are canceled in the middle of their execution (bit still before the actual test ends) due to that scaling down emit the unexpected iterations metric.

@na-- na-- changed the title [WIP] Fix zero-filled metrics Fix zero-filled and wrongly emitted metrics Jul 17, 2018
@na--
Copy link
Member Author

na-- commented Jul 17, 2018

With the latest commit this should hopefully be done. I'll take another look tomorrow just in case, and at least another pair of eyes would be very helpful, but I think that I've fixed all of the issues that I know of. And not only fixed the bugs, but with this patch the metrics emission when scaling down VUs would actually be a lot better than before we had real-time metrics.

Copy link
Member

@robingustafsson robingustafsson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@na-- na-- merged commit 9531175 into master Jul 23, 2018
@na-- na-- deleted the rt-samples-fix branch July 23, 2018 05:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Some metrics are returning 0 as minimum value
4 participants