Channel closed issue during consumption since #312 #317
Comments
Could this issue be resolved by setting a greater value for the outbound channel size?
I saw that there is a default of 100, but which value do I need to put? I wonder; I will give it a try.
So, I tried to set the value and I have a few questions. To me, this breaks the developer experience and adds unnecessary complexity:

#[tracing::instrument]
pub async fn create_client(
opts: &Opts,
batch_size: Option<usize>,
) -> Result<Pulsar<TokioExecutor>, pulsar::Error> {
Pulsar::builder(&opts.endpoint, TokioExecutor)
.with_allow_insecure_connection(true)
.with_auth(Authentication {
name: "token".to_string(),
data: opts.token.to_owned().into_bytes(),
})
.with_outbound_channel_size(
batch_size
.map(|batch_size| batch_size * 2)
.unwrap_or(10_000),
)
.map_err(|err| pulsar::Error::Custom(err.to_string()))?
.with_connection_retry_options(ConnectionRetryOptions {
connection_timeout: Duration::from_millis(opts.retry.timeout),
keep_alive: Duration::from_millis(opts.retry.keep_alive),
min_backoff: Duration::from_millis(opts.retry.min_backoff),
max_backoff: Duration::from_millis(opts.retry.min_backoff),
max_retries: opts.retry.max_retries,
})
.build()
.instrument(info_span!("Pulsar::builder.build"))
.await
}

instead of

#[tracing::instrument]
pub async fn create_client(
opts: &Opts,
batch_size: Option<usize>,
) -> Result<Pulsar<TokioExecutor>, pulsar::Error> {
Pulsar::builder(&opts.endpoint, TokioExecutor)
.with_allow_insecure_connection(true)
.with_auth(Authentication {
name: "token".to_string(),
data: opts.token.to_owned().into_bytes(),
})
.with_outbound_channel_size(
batch_size
.map(|batch_size| batch_size * 2)
.unwrap_or(10_000),
)
.with_connection_retry_options(ConnectionRetryOptions {
connection_timeout: Duration::from_millis(opts.retry.timeout),
keep_alive: Duration::from_millis(opts.retry.keep_alive),
min_backoff: Duration::from_millis(opts.retry.min_backoff),
max_backoff: Duration::from_millis(opts.retry.min_backoff),
max_retries: opts.retry.max_retries,
})
.build()
.instrument(info_span!("Pulsar::builder.build"))
.await
}
So far, computing the outbound channel size as twice the batch size seems to work quite well.
Besides, I wonder why I am impacted by the change to the outbound (bounded) channel, as we only use the non-blocking method to send messages 🤔
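(As a minimal sketch of why a bounded channel affects even non-blocking sends, using a plain tokio::sync::mpsc channel as a stand-in rather than whatever channel type pulsar-rs actually uses internally: once the channel is full, a non-blocking send fails immediately instead of queueing the message.)

```rust
use tokio::sync::mpsc;

#[tokio::main]
async fn main() {
    // Capacity 2 stands in for the bounded outbound channel.
    let (tx, _rx) = mpsc::channel::<u32>(2);

    // The first sends succeed because the channel still has room...
    tx.try_send(1).unwrap();
    tx.try_send(2).unwrap();

    // ...but once it is full, a non-blocking send returns an error right away
    // instead of applying back pressure, so the caller has to deal with it.
    assert!(tx.try_send(3).is_err());
}
```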
Hello @BewareMyPower, please correct me if I am wrong. This is my understanding of the bounded channel introduced with #312 and of why it works well with twice the batch size. First, the introduction of the bounded channel affects only producers. To me, this behavior is there to implement back pressure on production when the Pulsar broker is under heavy load and responds slowly. Is that correct? If so, I have no idea for now how to implement the following pattern: I am suggesting that we move away from sending a SlowDown error and instead rely on the Rust asynchronous environment. The idea is that when the broker is under heavy load, the future that wants to produce a message should simply be left pending (i.e. return Poll::Pending) until the channel has capacity again.

As for why twice the batch size works well, it is due to this line: 918c25a#diff-ec6805b5ebfa2ad15066ddd189a37d48cc5abd2516ec350d3d4a2ea1f2c11d19R165
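(To make the suggested pattern concrete, here is a minimal sketch, again with tokio::sync::mpsc standing in for the producer's outbound channel and not taken from pulsar-rs itself: an awaited send on a bounded channel simply stays pending while the channel is full, so a slow broker naturally slows the producer down without any SlowDown error being surfaced.)

```rust
use std::time::Duration;
use tokio::sync::mpsc;

#[tokio::main]
async fn main() {
    // Small capacity so the channel fills up quickly.
    let (tx, mut rx) = mpsc::channel::<u32>(2);

    // A deliberately slow consumer, playing the role of a broker under heavy load.
    let broker = tokio::spawn(async move {
        while let Some(v) = rx.recv().await {
            tokio::time::sleep(Duration::from_millis(50)).await;
            println!("handled message {v}");
        }
    });

    // The producer side: `send(..).await` suspends the future (not the thread)
    // whenever the channel is full, which is back pressure expressed purely
    // through the async runtime instead of an explicit error.
    for v in 0..10 {
        tx.send(v).await.expect("receiver dropped");
    }

    drop(tx);
    broker.await.unwrap();
}
```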
Honestly, I didn't look into the details of this error code through which pulsar-rs deals with back pressure. But your point makes sense to me; this approach looks very hacky.
I have a theory of what's happening here, but I couldn't find the exact cause. The message to ACK/NACK a consumed message is also pushed into the bounded channel. When messages arrive at a high rate, the client emits ACK messages at a high rate as well. That triggers the slowdown error, and the code doesn't handle the error properly, which then crashes the engine. @FlorentinDUBOIS Could you enable debug-level logging for pulsar-rs and see if this line is triggered? https://github.com/streamnative/pulsar-rs/blob/master/src/consumer/topic.rs#L126
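(A hypothetical sketch of that theory, not pulsar-rs's actual code: imagine the ACK path pushes into the same bounded channel with a non-blocking send and treats a full channel as a fatal error; under a high message rate the consumer engine then dies on the first full ACK send, whereas falling back to an awaited send would keep the ACK and apply back pressure instead. The `send_ack` helper and the channel type here are assumptions made purely for illustration.)

```rust
use tokio::sync::mpsc::{error::TrySendError, Sender};

// Hypothetical helper, standing in for the code path that forwards an ACK
// to the connection. The error-handling choice is the point of the sketch.
async fn send_ack(tx: &Sender<u64>, message_id: u64) -> Result<(), String> {
    match tx.try_send(message_id) {
        Ok(()) => Ok(()),
        // Fragile behaviour (the theory above): bubbling the "full" case up as
        // a fatal error kills the consumer engine under load.
        // Err(TrySendError::Full(_)) => Err("slow down".to_string()),
        // Forgiving behaviour: wait for capacity instead of failing the engine.
        Err(TrySendError::Full(id)) => tx.send(id).await.map_err(|e| e.to_string()),
        Err(TrySendError::Closed(_)) => Err("channel closed".to_string()),
    }
}

#[tokio::main]
async fn main() {
    let (tx, mut rx) = tokio::sync::mpsc::channel::<u64>(1);
    // Drain the channel in the background so the awaited send can complete.
    tokio::spawn(async move { while rx.recv().await.is_some() {} });
    send_ack(&tx, 42).await.unwrap();
}
```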
I think it's here, but I'm not sure. @FlorentinDUBOIS A minimal example to reproduce the error would be very helpful.
Just pushed a potential fix above. I couldn't test it since I can't reproduce the issue. @FlorentinDUBOIS Can you help test the fix?
Hello @cirias 👋, thank you for the fix. I will try it this morning with the debug level enabled.
@cirias I have tried your fix and it works pretty well: I was able to read and produce up to 18k msg/s. I couldn't go above that, as logging is slow 😅. I will approve your fix, merge it, and close this issue. Thanks a lot for the debugging.
@cirias Without the debug logging level, I was able to read and produce up to 27k msg/s, which is pretty good. As I have merged your fix, I am closing this issue as well.
Hello @cirias and @BewareMyPower 👋,
Since the introduction of #312, I keep having the following error in production at Clever Cloud:
This error is blocking consumption; the workaround was to fork and revert the following commit: 918c25a
As I did not have the context of this feature, could you enlighten me?