MSQ: Report the warning directly as an error if none of it is allowed by the user #13198
Thanks for the PR @LakshSingla. Left some comments.
msqErrorReport.getFault().getCodeWithMessage()
);
}
if (expectedMSQFaultClass != null) {
Shouldn't we always assert the MSQ fault?
Is there any specific reason to assert only on the MSQ fault class?
With the unparseable exceptions, the exception message contained nonstandard characters (from the erroneous line) which I didn't want to use in the code file.
We could change the getMessage call at line 863 to use a pattern matcher. That way you need not write out the whole string.
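A minimal sketch of the pattern-matching idea, assuming the test only needs to compare the fault's message as a plain string. `FaultMessageAssertions`, `assertMessageMatches`, and `literalPrefix` are hypothetical names, not existing Druid test helpers. The stable part of the message is quoted with `Pattern.quote`, and the part that echoes the malformed input is left as a wildcard, so the nonstandard characters never have to appear in the test file.

```java
import java.util.regex.Pattern;

/** Hypothetical test helper: checks a fault message against a regex instead of an exact string. */
final class FaultMessageAssertions
{
  private FaultMessageAssertions()
  {
  }

  /**
   * Throws AssertionError unless the whole message matches the regex. DOTALL lets a wildcard
   * also cover line breaks that may appear in the echoed input.
   */
  static void assertMessageMatches(String regex, String actualMessage)
  {
    if (!Pattern.compile(regex, Pattern.DOTALL).matcher(actualMessage).matches()) {
      throw new AssertionError("Message [" + actualMessage + "] does not match [" + regex + "]");
    }
  }

  /** Builds a regex that requires the given literal prefix and accepts any text after it. */
  static String literalPrefix(String prefix)
  {
    return Pattern.quote(prefix) + ".*";
  }
}
```

In the test, `assertMessageMatches(literalPrefix("<stable part of the message>"), message)` would replace the exact-string comparison, applied to whatever message string the existing assertion compares today.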
Or change the source test file.
This got me thinking: if we have a malformed line, we can get a ParseException carrying several MBs of input.
Multiple such lines would blow up our report.
Should we chomp the input to, let's say, 300 bytes or so in ParseException#75?
cc @jon-wei
Stack traces of innocuous warnings and errors can also cause a similar issue. Should this be taken up in a separate PR? Also, 300 bytes seems too small to me; something like 4 KB might work better, WDYT? (I think we would be fine with an even larger limit, since the number of warnings sent to the controller is limited.)
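A minimal sketch of the truncation idea for the follow-up work, assuming the limit is a UTF-8 byte budget (for example 4096 for the 4 KB suggested here). `InputTruncator` and `truncateUtf8` are hypothetical names; this is not the code at ParseException#75. The prefix is cut on a code-point boundary so a multi-byte character from the malformed line is never split in half.

```java
/**
 * Hypothetical helper: caps how much of an offending input is kept in a parse-error message.
 * The budget is measured in UTF-8 bytes and never splits a code point.
 */
final class InputTruncator
{
  private InputTruncator()
  {
  }

  /**
   * Returns input unchanged if it fits in maxBytes of UTF-8; otherwise returns the longest
   * whole-code-point prefix that fits, followed by marker. The marker is not counted in maxBytes.
   */
  static String truncateUtf8(String input, int maxBytes, String marker)
  {
    if (input == null) {
      return null;
    }
    int usedBytes = 0;
    int index = 0;
    while (index < input.length()) {
      final int codePoint = input.codePointAt(index);
      final int codePointBytes = utf8Length(codePoint);
      if (usedBytes + codePointBytes > maxBytes) {
        return input.substring(0, index) + marker;
      }
      usedBytes += codePointBytes;
      index += Character.charCount(codePoint);
    }
    return input;
  }

  /** UTF-8 length of a code point; a lone surrogate is counted as 3, an upper bound. */
  private static int utf8Length(int codePoint)
  {
    if (codePoint < 0x80) {
      return 1;
    } else if (codePoint < 0x800) {
      return 2;
    } else if (codePoint < 0x10000) {
      return 3;
    }
    return 4;
  }
}
```

Counting bytes per code point avoids encoding the whole (possibly multi-MB) input just to measure it; the loop stops as soon as the budget is exhausted.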
Yup, we can do that outside this PR.
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/ControllerImpl.java
workerError(errorReport1);
break;
}
}
We should also throw an IllegalStateException if we do not find the error code in the workerWarnings.
Makes sense, or else the controller would fail to stop.
I was thinking: let the worker impl directly hit the worker error endpoint when the limit is 0. WDYT?
Trying out the changes, that seems like a much better alternative.
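A minimal sketch of the worker-side routing discussed in this thread, assuming a warning's code maps to a limit and a limit of 0 means "disallowed". `WarningSink`, `WorkerErrorSink`, and `DisallowedWarningRouter` are hypothetical stand-ins for the real publisher and controller client, whose signatures are not reproduced here. Counting against non-zero limits is deliberately left out so the routing decision stays visible.

```java
import java.util.Map;

/** Hypothetical sink for warnings that stay within their limit. */
interface WarningSink
{
  void publishWarning(String errorCode, String message);
}

/** Hypothetical stand-in for posting to the controller's worker-error endpoint. */
interface WorkerErrorSink
{
  void postWorkerError(String errorCode, String message);
}

/**
 * Hypothetical router: a warning whose code has a limit of 0 is reported directly as the
 * worker error; every other warning goes to the normal warning path.
 */
final class DisallowedWarningRouter
{
  private final Map<String, Long> errorCodeToLimit;
  private final WarningSink warningSink;
  private final WorkerErrorSink errorSink;

  DisallowedWarningRouter(Map<String, Long> errorCodeToLimit, WarningSink warningSink, WorkerErrorSink errorSink)
  {
    this.errorCodeToLimit = errorCodeToLimit;
    this.warningSink = warningSink;
    this.errorSink = errorSink;
  }

  /** Returns true if the warning was reported as the worker error instead of as a warning. */
  boolean report(String errorCode, String message)
  {
    final Long limit = errorCodeToLimit.get(errorCode);
    if (limit != null && limit == 0L) {
      // Disallowed: surface the warning itself as the error instead of a later TooManyWarnings.
      errorSink.postWorkerError(errorCode, message);
      return true;
    }
    warningSink.publishWarning(errorCode, message);
    return false;
  }
}
```

Deciding this on the worker means the controller never has to search its collected warnings for the disallowed code, which is the situation where a missing code would otherwise need an IllegalStateException.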
ImmutableMap.of(CannotParseExternalDataFault.CODE, maxVerboseParseExceptions),
disallowedWarningCode,
controllerClient,
id(),
nit: id, host, and controllerClient should be the first arguments.
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/WorkerImpl.java
...uery/src/main/java/org/apache/druid/msq/indexing/error/MSQWarningReportLimiterPublisher.java
try {
  controllerClient.postWorkerError(workerId, MSQErrorReport.fromException(workerId, host, stageNumber, e));
}
catch (IOException e2) {
Can we throw a new RE with the message "Failed to post the worker error xyz to the controller"? That way the worker will terminate with a relevant error.
Nit: rename from e2 to postException?
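A minimal sketch combining both suggestions: the caught exception is named `postException`, and the failure is rethrown with a message that says which error could not be delivered, keeping the `IOException` as the cause. Plain `RuntimeException` stands in for Druid's `RE` here, and `ErrorReportClient`, `WorkerErrorReporter`, and `postOrFail` are hypothetical; the report is a `String` rather than an `MSQErrorReport` to keep the sketch self-contained.

```java
import java.io.IOException;

/** Hypothetical client; the real controller client's signature may differ. */
interface ErrorReportClient
{
  void postWorkerError(String workerId, String report) throws IOException;
}

final class WorkerErrorReporter
{
  private WorkerErrorReporter()
  {
  }

  /** Posts the error report, or fails the worker with a descriptive exception if posting fails. */
  static void postOrFail(ErrorReportClient client, String workerId, String report)
  {
    try {
      client.postWorkerError(workerId, report);
    }
    catch (IOException postException) {
      // Name what could not be delivered and keep the original IOException as the cause.
      throw new RuntimeException(
          "Failed to post the worker error [" + report + "] to the controller",
          postException
      );
    }
  }
}
```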
final ConcurrentHashMap<String, Long> errorCodeToCurrentCount = new ConcurrentHashMap<>();
private final MSQWarningReportPublisher delegate;
private final long totalLimit;
private final Map<String, Long> errorCodeToVerboseCountLimit;
nit: errorCodeToLimit seems like a better variable name to me.
We really do not need verboseCountLimit, as that's a concept outside this class.
I think errorCodeToLimit is slightly misleading, because more error codes than the ones in this map can be present in the controller. However, I do get your point. I will add clearer javadocs so that this ambiguity isn't present, and revert the change.
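A minimal sketch of how field-level javadocs could remove the ambiguity around a name like errorCodeToLimit, together with the per-code and total counting such a limiter performs. `WarningLimiter` and `tryAcquire` are hypothetical; this is not the class from this PR, and it only decides whether a warning may be forwarded.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of per-code and total warning limits. */
final class WarningLimiter
{
  /** Warnings seen so far for each error code, including codes with no entry in errorCodeToLimit. */
  private final ConcurrentHashMap<String, Long> errorCodeToCurrentCount = new ConcurrentHashMap<>();

  /** Warnings forwarded so far across all error codes. */
  private final AtomicLong totalForwarded = new AtomicLong();

  /** Maximum number of warnings forwarded across all error codes. */
  private final long totalLimit;

  /**
   * Per-code caps. Only codes present as keys are capped individually; any other code that
   * workers report is bounded only by totalLimit.
   */
  private final Map<String, Long> errorCodeToLimit;

  WarningLimiter(long totalLimit, Map<String, Long> errorCodeToLimit)
  {
    this.totalLimit = totalLimit;
    this.errorCodeToLimit = errorCodeToLimit;
  }

  /** Records one warning of the given code and returns true if it should be forwarded. */
  boolean tryAcquire(String errorCode)
  {
    final long seenForCode = errorCodeToCurrentCount.merge(errorCode, 1L, Long::sum);
    final Long codeLimit = errorCodeToLimit.get(errorCode);
    if (codeLimit != null && seenForCode > codeLimit) {
      return false;
    }
    // Atomically take a slot from the total budget only if one is left.
    final long before = totalForwarded.getAndUpdate(count -> count < totalLimit ? count + 1 : count);
    return before < totalLimit;
  }
}
```

`merge` and `getAndUpdate` keep each counter update atomic without an explicit lock, which matters because warnings can arrive from several threads at once.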
Minor comments.
LGTM (+1 non binding)
/**
 * Creates a publisher which publishes the warnings to the controller if they have not yet exceeded the allowed limit.
 * Moreover, if a warning is disallowed, i.e. its limit is set to 0, then the publisher directly reports the warning
Nit: It would be much better if we explained each parameter. We really do not need to mention the tooManyWarningsFault piece, as it's just confusing.
Description
In MSQ, there can be an upper limit on the number of worker warnings. For example, for parse exceptions encountered while parsing external data, the user can specify the maximum number of parse exceptions allowed before the query fails with an error of type `TooManyWarnings`. This PR makes it so that if the user disallows warnings of a certain type, i.e. the limit is 0 (or the query is executing in `strict` mode), then instead of throwing a `TooManyWarnings` error, the warning itself is surfaced directly as the error, saving the user the hassle of going through the warning reports.

Key changed/added classes in this PR
ControllerImpl

This PR has: