Flaky test: TestConsumerSeekByTimeOnPartitionedTopic #971

Open
BewareMyPower opened this issue Feb 28, 2023 · 1 comment

@BewareMyPower
Contributor

See example failure

logs:
logs.txt

stacks:
stacks.txt

@shibd
Member

shibd commented Mar 9, 2023

The root cause is that messages delivered after the seek may be discarded when messageCh is cleared. This causes message loss, so the test blocks forever in the Receive call:

msg, err := consumer.Receive(ctx)

In short, once SeekByTime succeeds on a sub-consumer, that sub-consumer immediately starts sending messages from the new position to messageCh. The loop that clears messageCh afterwards can therefore discard these post-seek messages as well, and they are lost.

func (c *consumer) SeekByTime(time time.Time) error {
	c.Lock()
	defer c.Unlock()
	var errs error
	// run SeekByTime on every partition of topic
	for _, cons := range c.consumers {
		if err := cons.SeekByTime(time); err != nil {
			msg := fmt.Sprintf("unable to SeekByTime for topic=%s subscription=%s", c.topic, c.Subscription())
			errs = pkgerrors.Wrap(newError(SeekFailed, err.Error()), msg)
		}
	}
	// clear messageCh
	for len(c.messageCh) > 0 {
		<-c.messageCh
	}
	return errs
}
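The ordering problem can be reproduced without a broker. The sketch below is a simplified, hypothetical model, not the client's real types: `partitionConsumer`, `seekByTime` and `seekThenDrain` are made-up names, and the asynchronous post-seek dispatch is replaced by a synchronous send so the outcome is deterministic. It keeps the same order as `SeekByTime` above (seek every partition, then drain `messageCh`) and shows that the drain throws away the post-seek messages together with the stale ones.

```go
package main

import "fmt"

// partitionConsumer is a hypothetical stand-in for a per-partition consumer.
// After a successful seek it starts delivering messages from the new
// position into the shared channel.
type partitionConsumer struct {
	out       chan<- string
	afterSeek []string // messages that would be delivered after the seek
}

// seekByTime models a successful seek. The send is synchronous here so the
// race is deterministic: the new messages reach the channel before the
// parent consumer drains it.
func (p *partitionConsumer) seekByTime() error {
	for _, m := range p.afterSeek {
		p.out <- m
	}
	return nil
}

// seekThenDrain mirrors the order used in consumer.SeekByTime: seek every
// partition first, then empty messageCh. It returns the discarded messages.
func seekThenDrain(messageCh chan string, consumers []*partitionConsumer) []string {
	for _, c := range consumers {
		_ = c.seekByTime()
	}
	var dropped []string
	for len(messageCh) > 0 {
		dropped = append(dropped, <-messageCh)
	}
	return dropped
}

func main() {
	messageCh := make(chan string, 10)
	// Stale messages that were buffered before the seek.
	for _, m := range []string{"hello-890", "hello-891", "hello-892", "hello-893"} {
		messageCh <- m
	}
	consumers := []*partitionConsumer{
		{out: messageCh, afterSeek: []string{"hello-0"}},
		{out: messageCh, afterSeek: []string{"hello-99"}},
	}
	dropped := seekThenDrain(messageCh, consumers)
	fmt.Println("dropped:", dropped)
	fmt.Println("left for Receive:", len(messageCh))
}
```

Running it should print all six messages as dropped, including `hello-0` and `hello-99`, and leave nothing for a subsequent receive, which matches the "+++ clear messages" log lines below.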

In the logs below, hello-0 and hello-99 are messages delivered after the seek:

time="2023-03-09T21:15:26+08:00" level=info msg="+++ clear messageCh: 10" topic="persistent://public/default/my-topic-432510000"
time="2023-03-09T21:15:26+08:00" level=info msg="+++ clear messages: hello-890 publish time: <nil>" topic="persistent://public/default/my-topic-432510000"
time="2023-03-09T21:15:26+08:00" level=info msg="+++ clear messages: hello-891 publish time: <nil>" topic="persistent://public/default/my-topic-432510000"
time="2023-03-09T21:15:26+08:00" level=info msg="+++ clear messages: hello-892 publish time: <nil>" topic="persistent://public/default/my-topic-432510000"
time="2023-03-09T21:15:26+08:00" level=info msg="+++ clear messages: hello-893 publish time: <nil>" topic="persistent://public/default/my-topic-432510000"
time="2023-03-09T21:15:26+08:00" level=info msg="+++ clear messages: hello-0 publish time: <nil>" topic="persistent://public/default/my-topic-432510000"
time="2023-03-09T21:15:26+08:00" level=info msg="+++ clear messages: hello-99 publish time: <nil>" topic="persistent://public/default/my-topic-432510000"

The same issue exists in the Java client. For more information, refer to the PIP: apache/pulsar#16757

This flaky failure does not happen often, so I think we can wait for this PIP to be completed.
