-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enhance Parent circuit breaker error message #32056
Enhance Parent circuit breaker error message #32056
Conversation
This adds information about either the current real usage (if tracking "real" memory usage) or the child breaker usages to the exception message when the parent circuit breaker trips. The messages now look like: ``` [parent] Data too large, data for [my_request] would be [211288064/201.5mb], which is larger than the limit of [209715200/200mb], usages [request=157286400/150mb, fielddata=54001664/51.5mb, in_flight_requests=0/0b, accounting=0/0b] ``` Or when tracking real memory usage: ``` [parent] Data too large, data for [request] would be [251/251b], which is larger than the limit of [200/200b], real usage: [181/181b], new bytes reserved: [70/70b] ```
Pinging @elastic/es-core-infra |
I noticed a few of these recently and thought having the message show the sub-usages could be useful. |
@elasticmachine test this please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the new error message is definitely better than before. Thanks! I left a comment / question around the calculation of the value.
throw new CircuitBreakingException(message, totalUsed, parentLimit); | ||
parentLimit + "/" + new ByteSizeValue(parentLimit) + "]"); | ||
if (this.trackRealMemoryUsage) { | ||
final long realUsage = currentMemoryUsage(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling currentMemoryUsage()
again here is tricky because the value is likely different than the one that cause the breaker to trip. To get the correct value, we could calculate it as final long realUsage = totalUsed - newBytesReserved
which would basically revert the calculation in #parentUsed(long)
but that also doesn't feel entirely right. Alternatively, we could also return a more structured representation which returns the memory usage as a dedicated property. Wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about this, but I thought that giving the real memory usage at exception construction time was the least-worst option (as part as plain-text is concerned).
For a structured error, I do like the idea of making this exception more structured, it's just a matter of coming up with the structure.
We currently have:
- bytesWanted
- bytesLimit
How about adding
- currentUsage (we could always retrieve it even if "track real memory" is disabled, just for message purposes)
- breakerUsages - a map of the current breakers to their usages
The only downside for this is that we already have logic for serializing the CircuitBreakingException
, and to make this structured we'd need a new exception (ParentCircuitBreakingException
) since it doesn't make sense to add this additional structure to every circuit breaker trip (or maybe it does? what do you think?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For a structured error, I do like the idea of making this exception more structured, it's just a matter of coming up with the structure.
My point was not to change CircuitBreakingException
but rather to have #parentUsed(long)
return a more structured representation, something along the lines of:
private static class ParentMemUsage {
private final long baseUsage;
private final long reservation;
private final long totalUsage;
public ParentMemUsage(long baseUsage, long reservation, long totalUsage) {
// ...
}
// ... getters here ...
}
private long parentUsed(long newBytesReserved) {
if (this.trackRealMemoryUsage) {
long current = currentMemoryUsage();
return new ParentMemUsage(current, newBytesReserved, current + newBytesReserved);
} else {
long parentEstimated = 0;
for (CircuitBreaker breaker : this.breakers.values()) {
parentEstimated += breaker.getUsed() * breaker.getOverhead();
}
return new ParentMemUsage(parentEstimated, newBytesReserved, parentEstimated);
}
}
We could then just call ParentMemUsage#getBaseUsage()
in order to retrieve the original value again in case an exception occurs. A downside of this is that we allocate an additional object but it might be possible that the compiler is able to optimize it away (by inlining #parentUsed(long)
and then eliminating the allocation via escape analysis).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I gave this a shot, let me know what you think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for iterating! LGTM
* Enhance Parent circuit breaker error message This adds information about either the current real usage (if tracking "real" memory usage) or the child breaker usages to the exception message when the parent circuit breaker trips. The messages now look like: ``` [parent] Data too large, data for [my_request] would be [211288064/201.5mb], which is larger than the limit of [209715200/200mb], usages [request=157286400/150mb, fielddata=54001664/51.5mb, in_flight_requests=0/0b, accounting=0/0b] ``` Or when tracking real memory usage: ``` [parent] Data too large, data for [request] would be [251/251b], which is larger than the limit of [200/200b], real usage: [181/181b], new bytes reserved: [70/70b] ``` * Only call currentMemoryUsage once by returning structured object
* es/6.x: (24 commits) Fix broken backport Switch full-cluster-restart to new style Requests (#32140) Fix multi level nested sort (#32204) MINOR: Remove unused `IndexDynamicSettings` (#32237) (#32248) [Tests] Remove QueryStringQueryBuilderTests#toQuery class assertions (#32236) Switch rolling restart to new style Requests (#32147) Enhance Parent circuit breaker error message (#32056) [ML] Use default request durability for .ml-state index (#32233) Enable testing in FIPS140 JVM (#31666) (#32231) Remove indices stats timeout from monitoring docs TESTS: Check for Netty resource leaks (#31861) (#32225) Rename ranking evaluation response section (#32166) Dependencies: Upgrade to joda time 2.10 (#32160) Backport SSL context names (#30953) to 6.x (#32223) Require Gradle 4.9 as minimum version (#32200) Detect old trial licenses and mimic behaviour (#32209) Painless: Simplify Naming in Lookup Package (#32177) add support for write index resolution when creating/updating documents (#31520) A replica can be promoted and started in one cluster state update (#32042) Rest test - allow for snapshots to take 0 milliseconds ...
* es/master: (23 commits) Switch full-cluster-restart to new style Requests (#32140) [DOCS] Clarified that you must remove X-Pack plugin when upgrading from pre-6.3. (#32016) Remove BouncyCastle dependency from runtime (#32193) INGEST: Extend KV Processor (#31789) (#32232) INGEST: Make a few Processors callable by Painless (#32170) Add region ISO code to GeoIP Ingest plugin (#31669) [Tests] Remove QueryStringQueryBuilderTests#toQuery class assertions (#32236) Make sure that field aliases count towards the total fields limit. (#32222) Switch rolling restart to new style Requests (#32147) muting failing test for internal auto date histogram to avoid failure before fix is merged MINOR: Remove unused `IndexDynamicSettings` (#32237) Fix multi level nested sort (#32204) Enhance Parent circuit breaker error message (#32056) [ML] Use default request durability for .ml-state index (#32233) Remove indices stats timeout from monitoring docs Rename ranking evaluation response section (#32166) Dependencies: Upgrade to joda time 2.10 (#32160) Remove aliases resolution limitations when security is enabled (#31952) Ensure that field aliases cannot be used in multi-fields. (#32219) TESTS: Check for Netty resource leaks (#31861) ...
This adds information about either the current real usage (if tracking "real"
memory usage) or the child breaker usages to the exception message when the
parent circuit breaker trips.
The messages now look like:
Or when tracking real memory usage: