Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix e2e producer #268

Merged
merged 5 commits into from
Nov 3, 2024
Merged

Conversation

bachmanity1
Copy link
Contributor

image

I've noticed that when working with clusters having several controller and broker nodes, retrieving cluster metadata may occasionally return stale data. This issue is evident when, for example, an end-to-end test topic is created, and the topic metadata is retrieved immediately afterward. In this case, the metadata request may return error code 3, which corresponds to the UNKNOWN_TOPIC_OR_PARTITION error. This error likely occurs when a client connects to a broker that is not yet aware of the new topic because it hasn't consumed the latest data from the __cluster_metadata topic.

In summary, when the end-to-end code attempts to populate the value of s.partitionCount, it might encounter the UNKNOWN_TOPIC_OR_PARTITION error, even if the topic actually exists. This results in a partition count of zero, preventing the producer from sending any data.

if err != nil {
return fmt.Errorf("could not get topic metadata after validation: %w", err)
}
partitions := len(topicMetadata.Topics[0].Partitions)
Copy link
Contributor Author

@bachmanity1 bachmanity1 Jul 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

even if the err above is nil the topicMetadata.Topics[0].ErrorCode is not guaranteed to be zero.

@bachmanity1
Copy link
Contributor Author

Hi @weeco. Can you please have a look?

@weeco
Copy link
Contributor

weeco commented Aug 8, 2024

@bachmanity1 I'll try to look at your PR soon, but I'm currently very busy with other things that have higher priority. Even though this change is just a few lines, I'll have to dig deeper to see if this could cause other sort of issues before approving & merging the PR.

@bachmanity1
Copy link
Contributor Author

bachmanity1 commented Sep 3, 2024

Hi @weeco,

I wanted to let you know that I've updated this PR. I've realized that my previous approach didn't account for scenarios where the broker count is updated.

I also wanted to discuss the partitionsPerBroker config. I believe it should be clarified that this configuration is only applied when a topic is created and does not take effect when the broker count changes. If the original intent was for this setting to apply even when the broker count changes, we’d need to adjust the topic reconciliation logic accordingly. However, I think that handling cases where the broker count decreases could be quite tricky.

By the way, it seems there’s a bug preventing new partitions from being created when new brokers are added, as reported in #257. Could you take a look at that PR too? Thanks!

@bachmanity1
Copy link
Contributor Author

Hi @weeco, is there any chance to get this reviewed anytime soon?

@weeco
Copy link
Contributor

weeco commented Nov 3, 2024

Thanks for your PR @bachmanity1 and sorry for the long waiting time. I added some minor things but generally LGTM

@weeco weeco self-requested a review November 3, 2024 15:37
@weeco weeco merged commit 7bed85e into redpanda-data:master Nov 3, 2024
2 checks passed
@bachmanity1 bachmanity1 deleted the fix-e2e-producer branch November 3, 2024 16:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants