[WIP] Add InterruptedIterations metric #1769
Conversation
Some UI questions:
and the metric in the summary like this:
(screenshot omitted)
lib/executor/helpers.go (outdated)
// TODO: Use enum?
// TODO: How to distinguish interruptions from signal (^C)?
// tags["cause"] = "duration"
// state.Samples <- stats.Sample{
This is where we would handle the 3rd scenario of executors interrupting the iteration. While it worked in manual tests and in TestMetricsEmission, it would cause other tests to hang (presumably because they were setting a nil channel), and I'm not sure it wouldn't be racy, so I commented it out for now. Is this the right place/approach, or would we have to receive two contexts here, or something more complicated?
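For context, a minimal sketch of what the commented-out block above might become, assuming the stats/metrics helpers from the k6 tree of that era (the function name emitInterruptedIteration is mine, not the PR's):

// assumed imports: context, time, github.com/loadimpact/k6/stats, github.com/loadimpact/k6/lib/metrics
func emitInterruptedIteration(globalCtx context.Context, samples chan<- stats.SampleContainer, cause string) {
	sample := stats.Sample{
		Metric: metrics.InterruptedIterations,
		Time:   time.Now(),
		Tags:   stats.IntoSampleTags(&map[string]string{"cause": cause}),
		Value:  1,
	}
	select {
	case samples <- sample: // guarded by the global context, not the per-VU run context
	case <-globalCtx.Done():
		// metrics processing has already shut down; drop the sample
	}
}

The select mirrors what stats.PushIfNotDone already does, just keyed off the global context so the send cannot block after the run context is cancelled.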
Digging through the context chains we're using, this shouldn't be racy, since the Samples channel is closed only when the globalCtx is done, which happens after all metrics processing is finished, so it should be usable even if this VU "run context" is done. Yet some of our tests use the same context for both, e.g.:
This explains why it works in real-world tests but not in the test suite. I'll see if I can fix the tests then.
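A tiny self-contained illustration of that context hierarchy (variable names are illustrative, not k6's actual code):

package main

import (
	"context"
	"fmt"
)

func main() {
	// globalCtx outlives the per-VU "run context"; the Samples channel would
	// only be closed after globalCtx is done and metrics processing has finished.
	globalCtx, globalCancel := context.WithCancel(context.Background())
	runCtx, runCancel := context.WithCancel(globalCtx)

	samples := make(chan int, 1)

	runCancel() // the iteration gets interrupted...
	<-runCtx.Done()

	select {
	case samples <- 1: // ...but emitting a sample is still safe at this point
		fmt.Println("sample emitted after the run context was cancelled")
	case <-globalCtx.Done():
	}

	globalCancel() // only after this would the Samples channel be closed
	close(samples)
}

Tests that pass the same context as both the run context and the global context collapse this ordering, which is what makes the emit race with the channel being closed.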
This avoids outputting "GoError" as mentioned in #877 (comment)
This is definitely racy and causes hanging in tests, so it needs a different approach.
This isn't racy in real world tests since the Samples channel isn't closed until the global context is done.
Force-pushed from 9697bab to c12f449.
Codecov Report
@@ Coverage Diff @@
## master #1769 +/- ##
==========================================
+ Coverage 71.47% 71.48% +0.01%
==========================================
Files 178 178
Lines 13777 13829 +52
==========================================
+ Hits 9847 9886 +39
- Misses 3317 3329 +12
- Partials 613 614 +1
@@ -288,6 +288,14 @@ func (e *Engine) processMetrics(globalCtx context.Context, processMetricsAfterRu
	if !e.NoThresholds {
		e.processThresholds()
	}
	if iim, ok := e.Metrics[metrics.InterruptedIterations.Name]; ok {
This map lookup is racy according to the current TestSentReceivedMetrics failure. So I guess we'll need to rethink or abandon this hack...
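For illustration only (the PR doesn't do this, and the Engine has no such lock): the conventional way to make a concurrent map lookup like this safe is to guard the map with a sync.RWMutex, roughly:

// assumed imports: sync, github.com/loadimpact/k6/stats
var metricsMu sync.RWMutex // hypothetical lock guarding the Engine's Metrics map

func lookupMetric(metricsMap map[string]*stats.Metric, name string) (*stats.Metric, bool) {
	metricsMu.RLock()
	defer metricsMu.RUnlock()
	m, ok := metricsMap[name]
	return m, ok
}

Whether that is worth doing depends on the discussion below about keeping the metric at all.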
I am pretty sure this breaks any output, as you are only emitting the 1 values to them...
I guess it depends on how this metric is aggregated. If their system allows it, they could factor it in with the iterations metric and get a percentage that way, which is kind of what we're doing here for the summary.
It makes no sense to me to emit the 0 value of this metric considering it's equivalent to iterations, and it would add a considerable amount of output data and affect performance. A 5s/5VU test with an empty default and no sleep does 2115298 iterations on my machine and generates a 1.5 GB JSON file when emitting the 0 value. The same test without it manages to do 3203002 iterations and generate a 1.9 GB JSON file, but that's because it's doing ~50% more iterations.
Considering we already have complaints about outputting too much data and interest in #1321, I think we should rather work on #1321 before we consider adding another default metric. Though we should probably rethink duplicating iterations...
How about, instead of a new metric, we simply reuse iterations and add an interrupted: "<cause>" tag to it? If the tag is omitted, it's a complete iteration; if it's set, the iteration was interrupted with the specific cause. WYT?
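A rough sketch of what that tag-based emit could look like, assuming the stats.PushIfNotDone and stats.IntoSampleTags helpers from the k6 stats package of that era (emitIteration is a made-up name, not code from the PR):

// assumed imports: context, time, github.com/loadimpact/k6/stats, github.com/loadimpact/k6/lib/metrics
func emitIteration(ctx context.Context, samples chan<- stats.SampleContainer, cause string) {
	tags := map[string]string{}
	if cause != "" {
		tags["interrupted"] = cause // tag omitted entirely for complete iterations
	}
	stats.PushIfNotDone(ctx, samples, stats.Sample{
		Metric: metrics.Iterations,
		Time:   time.Now(),
		Tags:   stats.IntoSampleTags(&tags),
		Value:  1,
	})
}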
I agree with prioritizing #1321, but that will probably require quite a lot of refactoring, IMO even for an MVP.
Reusing iterations was discussed (I think), and we decided against it because the primary usage for this will be to check that a percentage of the iterations aren't actually interrupted, which isn't possible with the current thresholds if the metric isn't a Rate... so :(.
I think maybe we should just open another PR with the changes that are in this PR (the test fixes and the lib.Exception) and merge that, both to have them and to make this PR easier to rebase, and then work on #1321 so we can merge this afterwards.
> Reusing iterations was discussed (I think), and we decided against it because the primary usage for this will be to check that a percentage of the iterations aren't actually interrupted, which isn't possible with the current thresholds if the metric isn't a Rate
Then maybe that should be a feature? I'll see what I can do.
@@ -348,7 +348,7 @@ func TestBind(t *testing.T) {
	}},
	{"Error", bridgeTestErrorType{}, func(t *testing.T, obj interface{}, rt *goja.Runtime) {
		_, err := RunString(rt, `obj.error()`)
-		assert.Contains(t, err.Error(), "GoError: error")
+		assert.Contains(t, err.Error(), "error")
I think this can be moved to another PR that we can merge today and make this ... more focused ;)
Sure, will do.
See #1775.
lib/executor/helpers.go (outdated)
tags["cause"] = er.Cause() | ||
} | ||
logger.Error(err.Error()) | ||
case fmt.Stringer: |
This was deliberately written as fmt.Stringer because I didn't want to add one more place where lib, and lib/executor specifically, depend on goja. So I think it is better if we don't actually start using it, and you also need to fix the fact that exceptions now don't have source=stacktrace ;)
I do think now is a good time to actually add an error type (under lib), probably just an interface "Exception" that is implemented in the js package and wraps goja.Exception, so that we don't use it directly.
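A hedged sketch of that shape (the StackTrace method and file layout are my assumptions; goja.Exception's Error/String methods are used as I remember them, message-only vs. message plus stack):

// lib/exception.go
type Exception interface {
	error
	// StackTrace returns the rendered JS stack trace for logging.
	StackTrace() string
}

// js/exception.go
// assumed import: github.com/dop251/goja
type exception struct{ inner *goja.Exception }

func (e exception) Error() string      { return e.inner.Error() }  // message only
func (e exception) StackTrace() string { return e.inner.String() } // message plus stack trace

With something like this, lib/executor only ever sees lib.Exception and never imports goja.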
Hmm, sure, I'll give that a try, thanks.
I managed to remove the goja dependency in d52efc4, but I'm not happy with how it turned out...
I wanted to treat all errors and exceptions consistently, avoid logging in runFn(), and delegate that to getIterationRunner(), but this took an embarrassing amount of trial and error, and I'm still not sure all errors will be handled correctly. :-/
Some notes:
- I didn't think making lib.Exception an interface was needed, as it's generic enough to have a single implementation.
- This needs a lot more tests to ensure errors are rendered properly.
- The source=stacktrace logger field doesn't make sense to me. If anything, the source should be the same as the "cause" value, as "stacktrace" isn't really a "source".
- Should all errors output the full stack trace? I was trying to keep it consistent with master, but maybe errors from fail() should only output a short single-line trace with the last stack frame.
This was motivated by wanting to remove the goja dependency from the lib/executor package, see #1769 (comment). The idea is for runFn to always return a lib.Exception error that can handle all JS errors and Go panics consistently. In practice I'm not sure if this hack is worth it, as it might mess up the handling of some errors...
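A minimal sketch of that idea, with newException as an assumed constructor rather than the PR's actual code:

// assumed imports: fmt, runtime/debug
func runFn(fn func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			// wrap Go panics so callers see the same error shape as JS exceptions
			err = newException(fmt.Errorf("panic: %v", r), "panic", string(debug.Stack()))
		}
	}()
	if callErr := fn(); callErr != nil {
		return newException(callErr, "error", "")
	}
	return nil
}

getIterationRunner can then log every failure the same way, using the exception's cause and stack trace instead of type-switching on goja errors.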
We didn't merge this as-is because we realized we needed #1321 (or something like it) for this PR not to introduce performance problems. At this point it has so many merge conflicts that it'd be easier to start from scratch, so I'll close it to reduce noise.
These are partial changes for #877, but I'm creating a draft PR to clear up a few things and get some feedback. It's probably easier to review this commit by commit.
Currently only scenarios 1 and 2 from this comment are working (cause: error and cause: fail). cause: interrupted is commented out as it needs a different approach, and I'm not sure how to handle the last ^C scenario (cause: signal would make sense for that).
I settled on using a Rate metric for interrupted_iterations, but opted to hack around actually emitting the 0 value in order to avoid an explosion of metric data, as this would be a duplicate of iterations, and given that #1321 is still open we shouldn't exacerbate that problem. See e2b88ab.
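For reference, the metric declaration in lib/metrics would presumably look roughly like this (a sketch assuming the stats.New constructor used for the other built-in metrics; the exact form in the PR may differ):

// lib/metrics (sketch)
InterruptedIterations = stats.New("interrupted_iterations", stats.Rate)

Declaring it as a Rate is what makes thresholds like "no more than X% of iterations interrupted" expressible, per the discussion above.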