-
Notifications
You must be signed in to change notification settings - Fork 14k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KAFKA-15755: LeaveGroupResponse v0 - v2 loses its member under certain error conditions #14635
Conversation
KIP-848 introduced this check, but we have seen with both Sarama and Librdkafka clients when issuing LeaveGroup and getting an error, the single member gets lost somewhere in the broker between the request and the response. instead of raising an exception when there are 0 members in the response, let it pass like it did before KIP-848.
@wolfchimneyrock Thanks for the PR. I am not sure to fully understand the issue yet. Could you please elaborate a bit more about the condition leading to it? |
there is more context here: |
I'm not sure I can elaborate on what the underlying error response that would have been sent with the LeaveGroup is, since the current code raises that exception without printing the topLevelError also it is very rare, out of around 800k LeaveGroup requests per week since our update to broker 3.4.1 I have seen this 400 times. |
Here is a broker trace:
I see that the request has |
Thanks @wolfchimneyrock. We will take a look into it. |
@wolfchimneyrock Thanks again for the PR. We verified and your suggestion makes sense. I have two asks:
|
sure, I've applied for an apache JIRA account, and I'll start on a unit test. |
6699b81
to
afb2679
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wolfchimneyrock Thanks for the PR! I left a few comments for consideration.
@@ -62,7 +62,7 @@ public LeaveGroupResponse(LeaveGroupResponseData data, short version) { | |||
if (version >= 3) { | |||
this.data = data; | |||
} else { | |||
if (data.members().size() != 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was looking a bit more into this issue. I think that we could harden a bit the logic because we could effectively get no members only when there is a top level error. How about this?
if (data.errorCode() != Errors.NONE.code()) {
this.data = new LeaveGroupResponseData().setErrorCode(data.errorCode());
} else {
if (data.members().size() != 1) {
throw new UnsupportedVersionException("LeaveGroup response version " + version +
" can only contain one member, got " + data.members().size() + " members.");
}
this.data = new LeaveGroupResponseData().setErrorCode(data.members().get(0).errorCode());
}
@dongnuo123 What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to use
if (data.errorCode() != Errors.NONE.code() && data.members().size() == 0) {
this.data = new LeaveGroupResponseData().setErrorCode(data.errorCode());
} else {...
I can't think of a situation where the top level error code is set and the response has multiple members but just to make sure.
@@ -165,4 +167,33 @@ public void testEqualityWithMemberResponses() { | |||
assertEquals(primaryResponse.hashCode(), reversedResponse.hashCode()); | |||
} | |||
} | |||
|
|||
@Test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: You could use the following to parameterized the test.
@ParameterizedTest
@ApiKeyVersionsSource(apiKey = ApiKeys.LEAVE_GROUP)
@wolfchimneyrock Would you have time to address my comments in the coming days? |
yes, though I thought you were waiting on comment from @dongnuo123 |
@wolfchimneyrock Understood. If you agree with my suggestion, I think that we could just do it. Sorry for the confusion. |
657e365
to
87dd4c8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wolfchimneyrock Thanks for the update! I left a few nits for consideration.
|
||
if (version < 3) { | ||
assertThrows(UnsupportedVersionException.class, | ||
() -> new LeaveGroupResponse(data, version)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Could we indent this line with 4 spaces in order to remain consistent with the existing code?
() -> new LeaveGroupResponse(data, version)); | ||
} else { | ||
LeaveGroupResponse response = new LeaveGroupResponse(data, version); | ||
assertEquals(Errors.NONE, response.topLevelError()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Should we also assert the members here?
|
||
if (version < 3) { | ||
assertThrows(UnsupportedVersionException.class, | ||
() -> new LeaveGroupResponse(data, version)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Could we also intend this one with 4 spaces?
() -> new LeaveGroupResponse(data, version)); | ||
} else { | ||
LeaveGroupResponse response = new LeaveGroupResponse(data, version); | ||
assertEquals(Errors.NONE, response.topLevelError()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Should we also assert members here?
looks like all the jenkins agents ran out of disk space. After your comments, the tests still pass locally for me. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! thanks for the patch, @wolfchimneyrock. Waiting on the build to merge it.
I cannot get a clean build for this one. @wolfchimneyrock Could you please merge trunk to your branch? |
@wolfchimneyrock I just did it. Let's see how the next build goes. |
…n error conditions (#14635) This patch fixes a bug in the LeaveGroupResponse construction. Basically, when a top level error is set, no members are expected but the current check always requires one for versions prior to version 3. Reviewers: David Jacot <djacot@confluent.io> (cherry picked from commit 3fd6293)
…n error conditions (#14635) This patch fixes a bug in the LeaveGroupResponse construction. Basically, when a top level error is set, no members are expected but the current check always requires one for versions prior to version 3. Reviewers: David Jacot <djacot@confluent.io> (cherry picked from commit 3fd6293)
…n error conditions (#14635) This patch fixes a bug in the LeaveGroupResponse construction. Basically, when a top level error is set, no members are expected but the current check always requires one for versions prior to version 3. Reviewers: David Jacot <djacot@confluent.io> (cherry picked from commit 3fd6293)
Merged to trunk, 3.6, 3.5 and 3.4. |
…n error conditions (apache#14635) This patch fixes a bug in the LeaveGroupResponse construction. Basically, when a top level error is set, no members are expected but the current check always requires one for versions prior to version 3. Reviewers: David Jacot <djacot@confluent.io> (cherry picked from commit 3fd6293)
…n error conditions (apache#14635) This patch fixes a bug in the LeaveGroupResponse construction. Basically, when a top level error is set, no members are expected but the current check always requires one for versions prior to version 3. Reviewers: David Jacot <djacot@confluent.io>
…n error conditions (apache#14635) This patch fixes a bug in the LeaveGroupResponse construction. Basically, when a top level error is set, no members are expected but the current check always requires one for versions prior to version 3. Reviewers: David Jacot <djacot@confluent.io>
…n error conditions (apache#14635) This patch fixes a bug in the LeaveGroupResponse construction. Basically, when a top level error is set, no members are expected but the current check always requires one for versions prior to version 3. Reviewers: David Jacot <djacot@confluent.io>
KAFKA-14367 introduced this check, but we have seen since upgrading to broker 3.4.1 with both Sarama and Librdkafka clients when issuing LeaveGroup and getting an error, the single member gets lost somewhere in the broker between the request and the response.
instead of raising an exception when there are 0 members in the response, let it pass like it did before KAFKA-14367.
More detailed description of your change,
if necessary. The PR title and PR message become
the squashed commit message, so use a separate
comment to ping reviewers.
Summary of testing strategy (including rationale)
for the feature or bug fix. Unit and/or integration
tests are expected for any behaviour change and
system tests should be considered for larger changes.
Committer Checklist (excluded from commit message)