Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KAFKA-10251: increase timeout for consumeing records #10228

Merged
merged 1 commit into from
Mar 2, 2021

Conversation

showuon
Copy link
Contributor

@showuon showuon commented Feb 27, 2021

We need to wait for the translation state changed to READY to start consuming the records. But we didn't have any way to change the transationManager state in client, so we can just wait. I've confirmed that if we try another time, we can pass the tests. My test code is like this:

var isFailed = false
    try {
      pollRecordsUntilTrue(consumer, pollAction,
        waitTimeMs = waitTimeMs,
        msg = s"Consumed ${records.size} records before timeout instead of the expected $numRecords records")
    } catch {
      case e: AssertionFailedError => {
        isFailed = true
        System.err.println(s"Consumed ${records.size} records before timeout instead of the expected $numRecords records")
      }
    }

    if (isFailed) {
      pollRecordsUntilTrue(consumer, pollAction,
        waitTimeMs = 30000,
        msg = s"Consumed ${records.size} records before timeout instead of the expected $numRecords records")

      // if we go to this step, it means it passed in 2nd try
      fail("failed at 1st try")
    }

And they failed with failed at 1st try, which confirmed that we can pass the tests by increasing the timeout. Currently, we wait 15 seconds, increase to 30 seconds. Thanks.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@showuon
Copy link
Contributor Author

showuon commented Feb 27, 2021

@abbccdda @ableegoldman @mjsax , could you help review this PR? Thanks.

Copy link
Contributor

@ableegoldman ableegoldman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not that familiar with this test but bumping the timeout from 15s to 30s sounds reasonable regardless.

Were you able to reproduce this test failure on trunk locally?

@showuon
Copy link
Contributor Author

showuon commented Mar 2, 2021

@ableegoldman , to investigate this flaky test, I have a drafted PR, to output some debug log on jenkins build. This is what I mentioned in the PR description: failed at 1st try

https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-9724/148/testReport/junit/kafka.api/TransactionsBounceTest/Build___JDK_15___testWithGroupMetadata__/

So, I cannot reproduce it on my local env, instead, I reproduce it on jenkins env.

Thank you.

@ableegoldman
Copy link
Contributor

Three unrelated failures:

Build / JDK 15 / org.apache.kafka.connect.integration.RebalanceSourceConnectorsIntegrationTest.testMultipleWorkersRejoining
Build / JDK 15 / kafka.api.AuthorizerIntegrationTest.testFetchFollowerRequest()
Build / JDK 15 / kafka.server.ListOffsetsRequestTest.testResponseIncludesLeaderEpoch()

@ableegoldman
Copy link
Contributor

Since this test is failing pretty regularly, I think we should go ahead and merge this now. A 30s timeout is pretty standard and if there is a real issue, just bumping the timeout from 15s to 30s should not cover it up. We can reopen the ticket if new failures are seen

@ableegoldman ableegoldman merged commit 420162d into apache:trunk Mar 2, 2021
ijuma added a commit to ijuma/kafka that referenced this pull request Mar 2, 2021
* apache-github/trunk: (37 commits)
  KAFKA-10357: Extract setup of changelog from Streams partition assignor (apache#10163)
  KAFKA-10251: increase timeout for consuming records (apache#10228)
  KAFKA-12394; Return `TOPIC_AUTHORIZATION_FAILED` in delete topic response if no describe permission (apache#10223)
  MINOR: Disable transactional/idempotent system tests for Raft quorums (apache#10224)
  KAFKA-10766: Unit test cases for RocksDBRangeIterator (apache#9717)
  KAFKA-12289: Adding test cases for prefix scan in InMemoryKeyValueStore (apache#10052)
  KAFKA-12268: Implement task idling semantics via currentLag API (apache#10137)
  MINOR: Time and log producer state recovery phases (apache#10241)
  MINOR: correct the error message of validating uint32 (apache#10193)
  MINOR: Format the revoking active log output in `StreamsPartitionAssignor` (apache#10242)
  KAFKA-12323 Follow-up: Refactor the unit test a bit (apache#10205)
  MINOR: Remove stack trace of the lock exception in a debug log4j (apache#10231)
  MINOR: Word count should account for extra whitespaces between words (apache#10229)
  MINOR; Small refactor in `GroupMetadata` (apache#10236)
  KAFKA-10340: Proactively close producer when cancelling source tasks (apache#10016)
  KAFKA-12329; kafka-reassign-partitions command should give a better error message when a topic does not exist (apache#10141)
  KAFKA-12254: Ensure MM2 creates topics with source topic configs (apache#10217)
  MINOR: fix kafka-metadata-shell.sh (apache#10226)
  KAFKA-12374: Add missing config sasl.mechanism.controller.protocol (apache#10199)
  KAFKA-10101: Fix edge cases in Log.recoverLog and LogManager.loadLogs (apache#8812)
  ...
@showuon showuon deleted the KAFKA-10251 branch March 16, 2021 08:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants