Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix panic: cannot consume from pending buffer #303

Merged
merged 3 commits into from
Oct 20, 2024
Merged

Fix panic: cannot consume from pending buffer #303

merged 3 commits into from
Oct 20, 2024

Conversation

NobodyXu
Copy link
Collaborator

@NobodyXu NobodyXu commented Oct 15, 2024

Fixed #298

This PR changes the decoder do_poll_read impl, to not advance the buffer on first flush.

The panic is because the decoder try to advance the buffer before polling the underlying buf reader.

In this PR I tried a different fix, by making sure buf.consume is always called, even on error.
I suspect that previously we didn't consume the buffer on error, and that might have caused the same data to be decompressed again.

I can't think of anywhere else that could fail, the decoder implementation looks alright.

Fixed #298

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>
@NobodyXu
Copy link
Collaborator Author

@Turbo87 Can you try this PR please?

@Turbo87
Copy link

Turbo87 commented Oct 15, 2024

yep, I'll give it a try.

@robjtede robjtede added the A-semver-patch bug fixes label Oct 15, 2024
@Turbo87
Copy link

Turbo87 commented Oct 16, 2024

Bildschirmfoto 2024-10-16 um 09 32 57

unfortunately still failing 😢

@NobodyXu
Copy link
Collaborator Author

Thanks.

Is the software open-source?

Can I have a look at the code and the test?

@Turbo87
Copy link

Turbo87 commented Oct 16, 2024

Can I have a look at the code and the test?

the code yes, the test no. unfortunately we don't have a test that reproduces it. I can only run it on our staging environment where I can reproduce it. our test suite runs with an in-memory object_store instance instead of the S3 implementation we use on staging and production.

Signed-off-by: Jiahao XU <30436523+NobodyXu@users.noreply.github.com>
Signed-off-by: Jiahao XU <30436523+NobodyXu@users.noreply.github.com>
@NobodyXu
Copy link
Collaborator Author

I've found the cause of the panic.

It is because the decoder try to advance the buffer before polling the underlying buf reader.

cc @Turbo87 I've updated the PR, can you try again please?

Thank you!

@Turbo87
Copy link

Turbo87 commented Oct 16, 2024

the panic appears to be gone, but we're now seeing an "interval out of range" error result without a stacktrace. I will have to improve our logging a bit to figure out where exactly that is coming from.

@NobodyXu
Copy link
Collaborator Author

cc @robjtede Shall we merge and publish this for now, since it at least fixes the panic for @Turbo87

@Turbo87
Copy link

Turbo87 commented Oct 19, 2024

we're now seeing an "interval out of range" error

it turns out that this was a bug on our side, related to how we calculate exponential backoff for failed jobs.

I can confirm that #303 appears to fix the issue for us! 🎉

thanks again! :)

@NobodyXu
Copy link
Collaborator Author

Thank you!

1 similar comment
@NobodyXu

This comment has been minimized.

@NobodyXu
Copy link
Collaborator Author

cc @robjtede let's get this merged and cut a new release, as it is confirmed to fix the panic

@NobodyXu NobodyXu added this pull request to the merge queue Oct 20, 2024
@NobodyXu
Copy link
Collaborator Author

I will get this merged and ask for review in the release PR.

Merged via the queue into main with commit 014e6e4 Oct 20, 2024
16 checks passed
@NobodyXu NobodyXu deleted the fix/panic branch October 20, 2024 06:36
@robjtede
Copy link
Member

Ahh wonderful, yes, lets get this out today.

@Turbo87
Copy link

Turbo87 commented Oct 20, 2024

thanks again for the investigation, fix and release! I just merged the latest update into crates.io :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

panic: cannot consume from pending buffer
3 participants