LeaveGroup bug #4402

Closed
5 of 7 tasks
wolfchimneyrock opened this issue Aug 22, 2023 · 1 comment · May be fixed by #4403

Comments

@wolfchimneyrock
Contributor

Description

After upgrading our Kafka brokers to 3.4.1, we started seeing some UnsupportedVersionException errors in the broker logs:

[2023-08-21 21:50:47,887] ERROR [KafkaApi-50478] Unexpected error handling request RequestHeader(apiKey=LEAVE_GROUP, apiVersion=1, clientId=rdkafka, correlationId=50, headerVersion=1) -- LeaveGroupRequestData(groupId='<REDACTED>', memberId='rdkafka-72bc6db8-0909-4851-bf7e-514e3cdef376', members=[]) with context RequestContext(header=RequestHeader(apiKey=LEAVE_GROUP, apiVersion=1, clientId=rdkafka, correlationId=50, headerVersion=1), connectionId='<REDACTED>', clientAddress=<REDACTED>, principal=<REDACTED>, listenerName=ListenerName(PLAINTEXT), securityProtocol=PLAINTEXT, clientInformation=ClientInformation(softwareName=confluent-kafka-python, softwareVersion=2.2.0-rdkafka-2.2.0), fromPrivilegedListener=false, principalSerde=Optional[<REDACTED>]) (kafka.server.KafkaApis)
java.util.concurrent.CompletionException: org.apache.kafka.common.errors.UnsupportedVersionException: LeaveGroup response version 1 can only contain one member, got 0 members.
	at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:315)
	at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:320)
	at java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:936)
	at java.base/java.util.concurrent.CompletableFuture.uniHandleStage(CompletableFuture.java:950)
	at java.base/java.util.concurrent.CompletableFuture.handle(CompletableFuture.java:2340)
	at kafka.server.KafkaApis.handleLeaveGroupRequest(KafkaApis.scala:1796)
	at kafka.server.KafkaApis.handle(KafkaApis.scala:196)
	at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:75)
	at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: org.apache.kafka.common.errors.UnsupportedVersionException: LeaveGroup response version 1 can only contain one member, got 0 members.
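To gauge how often this happens and which clients trigger it, broker log lines like the one above could be tallied with a short script. This is a minimal sketch that assumes the log line format shown in this issue; `summarize_request_errors` and both regular expressions are hypothetical helpers, not part of Kafka or librdkafka.

```python
import re
from collections import Counter

# Patterns are assumptions based on the broker log line quoted in this issue.
HEADER_RE = re.compile(
    r"RequestHeader\(apiKey=(?P<api_key>\w+), "
    r"apiVersion=(?P<api_version>\d+), clientId=(?P<client_id>[^,]+),"
)
SOFTWARE_RE = re.compile(
    r"softwareName=(?P<name>[^,]+), softwareVersion=(?P<version>[^)]+)\)"
)


def summarize_request_errors(lines):
    """Count 'Unexpected error handling request' lines per API and client."""
    counts = Counter()
    for line in lines:
        if "Unexpected error handling request" not in line:
            continue
        header = HEADER_RE.search(line)
        if header is None:
            continue
        software = SOFTWARE_RE.search(line)
        key = (
            header["api_key"],
            int(header["api_version"]),
            software["name"] if software else "unknown",
            software["version"] if software else "unknown",
        )
        counts[key] += 1
    return counts
```

Feeding it the lines of a broker log file (for example `summarize_request_errors(open(path))`, with `path` being wherever your broker writes its log) returns a counter keyed by API key, API version, client software name and client software version.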

This happens relatively infrequently. As far as I can tell, there is nothing special about the consumer configuration:

config = {
    "bootstrap.servers": BROKER_ENDPOINT,
    "group.id": CONSUMER_GROUP_NAME,
    "enable.partition.eof": False,
}
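To capture client-side logs around the group leave, the same configuration could be extended with librdkafka's `debug` property. This is only a sketch: the endpoint and group name are placeholder values, and the choice of the `cgrp` and `protocol` debug contexts is an assumption about what is most relevant here.

```python
# Placeholder values; substitute your own broker endpoint and group name.
BROKER_ENDPOINT = "localhost:9092"
CONSUMER_GROUP_NAME = "example-group"

config = {
    "bootstrap.servers": BROKER_ENDPOINT,
    "group.id": CONSUMER_GROUP_NAME,
    "enable.partition.eof": False,
    # Assumption: consumer group and protocol request logging are the
    # contexts most likely to show the LeaveGroup exchange.
    "debug": "cgrp,protocol",
}
```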

A similar issue was raised and fixed in Sarama:

IBM/sarama#2486

There, they implemented version 3 of the LeaveGroup protocol. I suspect that the Kafka broker no longer takes care to handle LeaveGroup v0 - v1 requests correctly in all cases.
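For reference, the difference between the older request layout and version 3 can be sketched as follows. This is an illustration of the request body only (no request header), based on the Kafka protocol definition for LeaveGroup; the function names are hypothetical and this is not code from librdkafka or Sarama.

```python
import struct
from typing import Optional


def _string(value: str) -> bytes:
    """Kafka STRING: int16 length followed by UTF-8 bytes."""
    raw = value.encode("utf-8")
    return struct.pack(">h", len(raw)) + raw


def _nullable_string(value: Optional[str]) -> bytes:
    """Kafka NULLABLE_STRING: length -1 encodes null."""
    if value is None:
        return struct.pack(">h", -1)
    return _string(value)


def encode_leave_group_request(
    group_id: str,
    member_id: str,
    api_version: int,
    group_instance_id: Optional[str] = None,
) -> bytes:
    """Encode a LeaveGroup request body for versions 0 through 3."""
    if 0 <= api_version <= 2:
        # v0-v2: group_id followed by a single member_id.
        return _string(group_id) + _string(member_id)
    if api_version == 3:
        # v3: group_id followed by an array of members, each with a
        # member_id and a nullable group_instance_id.
        member = _string(member_id) + _nullable_string(group_instance_id)
        return _string(group_id) + struct.pack(">i", 1) + member
    raise ValueError(f"unsupported LeaveGroup version in this sketch: {api_version}")
```

The sketch stops at version 3 because later versions use a different (flexible) encoding.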

Also, according to the Kafka protocol definition for LeaveGroup, librdkafka appears to parse the LeaveGroup response for ApiVersion 1 incorrectly: there should be a ThrottleTime int32 before the ErrorCode int16. This is likely the cause of some flaky test results we've experienced.
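The response layout in question (ThrottleTime int32, then ErrorCode int16) can be sketched as a small parser. This is an illustration of the wire format for versions 0 through 2 only, with a hypothetical function name; it is not librdkafka's parsing code.

```python
import struct


def parse_leave_group_response(body: bytes, api_version: int) -> dict:
    """Parse a LeaveGroup response body (without the response header)."""
    if not 0 <= api_version <= 2:
        raise ValueError(f"unsupported LeaveGroup version in this sketch: {api_version}")
    offset = 0
    throttle_time_ms = None
    if api_version >= 1:
        # v1 and later put ThrottleTime (int32, big-endian) first.
        (throttle_time_ms,) = struct.unpack_from(">i", body, offset)
        offset += 4
    # ErrorCode (int16, big-endian) follows.
    (error_code,) = struct.unpack_from(">h", body, offset)
    return {"throttle_time_ms": throttle_time_ms, "error_code": error_code}
```

Reading a version 1 body with the version 0 layout would interpret the first two bytes of ThrottleTime as the error code, which is the kind of misparse described above.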

Checklist

IMPORTANT: We will close issues where the checklist has not been completed.

Please provide the following information:

  • librdkafka version (release number or git tag): 2.2.0
  • Apache Kafka version: 3.4.1
  • librdkafka client configuration
  • Operating system: RHEL7
  • Provide logs (with debug=.. as necessary) from librdkafka
  • Provide broker log excerpts
  • Critical issue
@emasab
Collaborator

emasab commented Oct 6, 2023

Hello @wolfchimneyrock
librdkafka is parsing ThrottleTime correctly for version 1, here.

I think you looked at rd_kafka_handle_LeaveGroup, but that code is never used and needs to be removed.

That Java exception corresponds to this code. But the check doesn't seem correct: in version 1 there are no members to write at all, rather than exactly one.
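To make the failure mode concrete, here is a small Python model of the check implied by the exception message. It is reconstructed from the message text in the broker log and is an assumption, not the broker's actual Java code; `build_leave_group_response` and `UnsupportedVersionError` are hypothetical names.

```python
class UnsupportedVersionError(Exception):
    """Stand-in for the broker's UnsupportedVersionException."""


def build_leave_group_response(member_responses, api_version):
    """Model of building a LeaveGroup response from per-member results."""
    if api_version >= 3:
        # v3 and later carry a per-member result list.
        return {"members": list(member_responses)}
    # Older versions have no member list on the wire, so the model
    # insists on exactly one member result to derive the error code from.
    if len(member_responses) != 1:
        raise UnsupportedVersionError(
            f"LeaveGroup response version {api_version} can only contain "
            f"one member, got {len(member_responses)} members."
        )
    return {"error_code": member_responses[0]["error_code"]}
```

In this model, an empty result list for a version 1 request raises the same message seen in the log, even though a version 1 response only needs an error code.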

It should be faster to fix this on the broker side. In librdkafka we're currently focused on the new consumer group protocol, which doesn't have LeaveGroup.

2 participants