
alibabacloud recommends against using Sarama? #2036

Closed
KeenCN opened this issue Sep 22, 2021 · 9 comments
Labels
stale Issues and pull requests without any recent activity

Comments

@KeenCN

KeenCN commented Sep 22, 2021

I saw in the alibabacloud documentation that the Sarama library is not recommended by them. Is there a plan to fix these three problems?

source:
https://www.alibabacloud.com/help/faq-detail/266782.htm

Quote:

The following known problems exist with the Go client that is developed with the Sarama library:

  1. When a partition is added to a topic, the Go client developed with the Sarama library cannot detect the new partition or consume messages from the partition. Only after the client is restarted can it consume messages from the new partition.

  2. When the client subscribes to more than two topics at the same time, the client may fail to consume messages from some partitions.

  3. If the policy for resetting consumer offsets of the Go client developed with the Sarama library is set to Oldest(earliest), the client may start to consume messages from the earliest offset when the client breaks down or the broker version is upgraded. This is because the out_of_range class is implemented in the client.

@KeenCN KeenCN changed the title https://www.alibabacloud.com/help/faq-detail/266782.htm I saw from alibabacloud that Sarama library is not recommended. Do we have a repair plan for these three problems? Sep 22, 2021
@dnwe dnwe changed the title I saw from alibabacloud that Sarama library is not recommended. Do we have a repair plan for these three problems? alibabacloud recommends against using Sarama? Sep 22, 2021
@dnwe
Collaborator

dnwe commented Sep 22, 2021

@KeenCN this is the first time I have seen this document. I'd have to search to see whether whoever is responsible for it ever raised any issues for these problems, but it seems somewhat out of date, and for the other two it's unclear what is being described.

RE: 1., I believe that issue was fixed in Version 1.24.1 (2019-10-31) under #1525.

RE: 2., I'm not sure what issue they are referring to here; I'm assuming they saw some behaviour where the fairness of message consumption between topics wasn't as balanced as they would like? It's not something I've personally seen before, but we could investigate if someone can put together a functional test or example that reproduces whatever they were seeing.

RE: 3., again I'm not sure exactly what they are describing here. If the broker is restarted and truncates a partition such that the consumer gets an out-of-range error, then the broker consumer will stop due to ErrOffsetOutOfRange, and if Consumer.Return.Errors is set it will return that error to the user to decide what to do. Then, when they reconnect, the offset reset configuration should only be used in the event that their consumer group didn't have a committed offset on the __consumer_offsets topic that is still valid post-truncation.
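
For reference, the two settings mentioned above look roughly like this; a minimal sketch, not taken from this thread, assuming the Shopify/sarama import path current at the time (the helper names newConsumerConfig and watchErrors are placeholders):

import (
	"log"

	"github.com/Shopify/sarama"
)

// newConsumerConfig starts from the oldest offset only when no valid committed
// offset exists, and surfaces consumer errors (such as ErrOffsetOutOfRange
// after a truncation) on the Errors() channel instead of dropping them.
func newConsumerConfig() *sarama.Config {
	cfg := sarama.NewConfig()
	cfg.Consumer.Offsets.Initial = sarama.OffsetOldest
	cfg.Consumer.Return.Errors = true
	return cfg
}

// watchErrors lets the application decide how to react to consumer errors.
func watchErrors(group sarama.ConsumerGroup) {
	for err := range group.Errors() {
		log.Printf("consumer group error: %v", err)
	}
}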

@onlyice

onlyice commented Oct 13, 2021

I can confirm that Sarama can detect new partition(s), but it happens only after a metadata refresh, which by default occurs at a 10-minute interval. That could be the cause of the misunderstanding behind the first issue.
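
For context, the refresh interval is configurable; a minimal sketch, not from this thread (the 1-minute value is only an illustration, not a recommendation):

import (
	"time"

	"github.com/Shopify/sarama"
)

// newClientConfig shortens the metadata refresh interval so that newly added
// partitions are noticed sooner than the default 10 minutes.
func newClientConfig() *sarama.Config {
	cfg := sarama.NewConfig()
	cfg.Metadata.RefreshFrequency = 1 * time.Minute // default is 10 minutes
	return cfg
}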

@1998729

1998729 commented Nov 1, 2021

You can implement this yourself; here is a simple example:

import (
	"errors"
	"strings"
	"time"

	"github.com/Shopify/sarama"
	"k8s.io/klog/v2"
)

// topicList tracks the set of topics this consumer currently knows about.
type topicList map[string]struct{}

// fetch refreshes the topic list from the cluster and reports whether any new
// topic matching prefix has appeared since the last call.
func (t topicList) fetch(client sarama.Client, prefix string) (bool, error) {
	if t == nil {
		return false, errors.New("topic list must not be nil")
	}

	topics, err := client.Topics()
	if err != nil {
		return false, err
	}

	changed := false
	for _, topic := range topics {
		if strings.Contains(topic, prefix) {
			if _, ok := t[topic]; !ok {
				changed = true
			}
			t[topic] = struct{}{}
		}
	}

	return changed, nil
}

// poll periodically re-fetches the topic list and, when a new matching topic
// appears, calls handler once and returns. It also returns when done is
// closed or the underlying client has been closed.
func (t *topicList) poll(c sarama.Client, prefix string, interval time.Duration, handler func(), done <-chan struct{}) {
	for {
		select {
		case <-done:
			klog.Infof("poll stopped because consumer group was closed\n")
			return
		case <-time.After(interval):
			klog.Infof("start polling new topic list, interval is %v\n", interval)
			if c.Closed() {
				klog.Infof("poll stopped because client was closed\n")
				return
			}

			changed, err := t.fetch(c, prefix)
			if err != nil {
				klog.Errorf("failed to fetch new topic list! error: %v\n", err)
				continue
			}

			if changed {
				klog.Infof("new topics found, calling handler\n")
				handler()
				return
			}
		}
	}
}

And then poll it from the consume loop:

// Sketch of the caller: ctx and cancelFn come from context.WithCancel
// (recreated per iteration in the full program), donePoll is closed when the
// consumer group shuts down, client is a sarama.Client, and input carries the
// config, handler and topicList defined above.
for {
	consumerGroup, err := sarama.NewConsumerGroup(input.config.Servers, input.config.GroupID, input.saramaConf)
	if err != nil {
		klog.Errorf("failed to create consumer group: %v\n", err)
		return
	}

	// poll cancels the Consume context (via cancelFn) once a new matching
	// topic shows up, so this loop rebuilds the topic list and re-subscribes.
	go input.topicList.poll(client, input.config.TopicPrefix, input.config.TopicPollInterval, cancelFn, donePoll)

	topics := make([]string, 0, len(input.topicList))
	for t := range input.topicList {
		topics = append(topics, t)
	}
	klog.Infof("initial topic list (len %d) is: %v\n", len(topics), topics)

	if err = consumerGroup.Consume(ctx, topics, input.handler); err != nil {
		klog.Errorf("consume error: %v\n", err)
	}
}

@erenming

erenming commented Sep 2, 2022

I can confirm that Sarama can detect new partition(s), but it happens only after a metadata refresh, which by default occurs at a 10-minute interval. That could be the cause of the misunderstanding behind the first issue.

Yes, you're right. And we should note that in that case (e.g. when new partitions are detected), the session will exit, so users should always retry the Consume method. This is explained in detail here.
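
A minimal sketch of that retry loop, assuming a group created with sarama.NewConsumerGroup and an application-provided sarama.ConsumerGroupHandler (the names here are placeholders, not from this thread):

import (
	"context"
	"log"

	"github.com/Shopify/sarama"
)

// consumeLoop re-enters Consume each time the session ends, e.g. after a
// rebalance triggered by newly detected partitions, until ctx is cancelled.
func consumeLoop(ctx context.Context, group sarama.ConsumerGroup, topics []string, handler sarama.ConsumerGroupHandler) {
	for {
		if err := group.Consume(ctx, topics, handler); err != nil {
			log.Printf("consume error: %v", err)
		}
		if ctx.Err() != nil {
			return // shutting down
		}
	}
}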

@cheaster062211

cheaster062211 commented Mar 30, 2023

I can confirm that Sarama can detect new partitions, but it happens only after a metadata refresh, which by default occurs at a 10-minute interval. That could be the cause of the misunderstanding behind the first issue.

You're right. We should note that in that case (e.g. when new partitions are detected), the session will exit, so users should always retry the Consume method. This is explained in detail here.

if oldTopicToPartitionNum, err = c.topicToPartitionNumbers(topics); err != nil {
	return
}

If oldTopicToPartitionNum has already changed, the subsequent checks are based on the changed number and cannot trigger a rebalance. In extreme cases, oldTopicToPartitionNum may not reflect the partition numbers at the time the session was created, making the subsequent checks invalid. The root cause is that there is no timer reset in the backgroundMetadataUpdater function: refreshMetadata is executed before loopCheckPartitionNumbers, so the partition number has already been refreshed.


@github-actions github-actions bot added the stale Issues and pull requests without any recent activity label Aug 20, 2023
@dnwe
Collaborator

dnwe commented Aug 20, 2023

PR #2563 is also relevant here

@github-actions

Thank you for taking the time to raise this issue. However, it has not had any activity on it in the past 90 days and will be closed in 30 days if no updates occur.
Please check if the main branch has already resolved the issue since it was raised. If you believe the issue is still valid and you would like input from the maintainers then please comment to ask for it to be reviewed.

@github-actions github-actions bot added the stale Issues and pull requests without any recent activity label Nov 19, 2023
@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Jan 5, 2024