Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(swarm): prevent overflow in keep-alive computation #4644

Merged
merged 11 commits into from
Oct 18, 2023

Conversation

zrus
Copy link
Contributor

@zrus zrus commented Oct 13, 2023

Description

Add safe check to prevent Delay::reset() panic when duration overflow.

Fixes: #4641.

Attributions

Co-authored-by: Leonz bellerophon00530@gmail.com
Co-authored-by: Max Inden mail@max-inden.de

Notes & open questions

Change checklist

  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • A changelog entry has been made in the appropriate crates

Copy link
Member

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the fast turnaround time. Overall looks good to me.

@thomaseizinger mind taking a look given that this is quite subtle?

@zrus can you add an entry to swarm/CHANGELOG.md?

swarm/src/connection.rs Outdated Show resolved Hide resolved
Copy link
Member

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the follow-up. Again, let's give Thomas a chance to review. The more eyes on this subtle code path, the better.

swarm/CHANGELOG.md Outdated Show resolved Hide resolved
Copy link
Contributor

@thomaseizinger thomaseizinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Embarrassing that we didn't catch this in the first patch.

I'd like to test this better. In #4595, we are extracting this big code block into a function. Would you be up for copying that to this PR and writing some prop-tests using quickcheck for it?

@zrus
Copy link
Contributor Author

zrus commented Oct 14, 2023

I'd like to test this better. In #4595, we are extracting this big code block into a function. Would you be up for copying that to this PR and writing some prop-tests using quickcheck for it?

I have done it. But I have not added any tests yet. Please help me on this.

Copy link
Contributor

@thomaseizinger thomaseizinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Regarding tests, are you familiar with quickcheck? https://github.com/BurntSushi/quickcheck

swarm/src/connection.rs Outdated Show resolved Hide resolved
@zrus
Copy link
Contributor Author

zrus commented Oct 14, 2023

Regarding tests, are you familiar with quickcheck? https://github.com/BurntSushi/quickcheck

I have not used it before. Could you guide me?

@thomaseizinger
Copy link
Contributor

Regarding tests, are you familiar with quickcheck? BurntSushi/quickcheck

I have not used it before. Could you guide me?

Sure! You'll like this :)

The idea of quickcheck is to property-based testing by generate test-cases automatically asserting no the code having certain properties. We can use that for this code to check that regardless of which inputs we generate, the code never pancis but always yields a valid Shutdown value.

You can find an example test in the connection.rs file: max_negotiating_inbound_streams.

To write your own test, you need to create a prop function (name doesn't matter) that takes the inputs you want quickcheck to permute. The function needs to call compute_new_shutdown. To start with, we don't need to make any further assertions, simply not panicking is good already!

We'll need a couple of things:

  1. Add an implementation of quickcheck::Arbitrary to KeepAlive. The implementation should be gated by #[cfg(test)].
  2. Generalize the KeepAliveUntilConnectionHandler to take an entire KeepAlive and not just an Instant.
  3. Write a quickcheck-test with a prop function that takes a KeepAlive and a Duration:
    • The test should make a new KeepAliveUntilConnectionHandler with the passed in KeepAlive
    • Call compute_new_shutdown with Shutdown::None, the handler and the provided Duration as idle_timeout

That should get us started. We can then think about whether we want to permute the existing Shutdown as well which is a bit trickier.

@thomaseizinger
Copy link
Contributor

Tagging @leonzchang for visibility of related effort. We cherry-picked some of your code to make testing this easier :)

I added you as a co-author to the PR description to correctly attribute your contribution!

@mergify
Copy link
Contributor

mergify bot commented Oct 16, 2023

This pull request has merge conflicts. Could you please resolve them @zrus? 🙏

@mxinden mxinden mentioned this pull request Oct 17, 2023
4 tasks
@mxinden
Copy link
Member

mxinden commented Oct 18, 2023

In order to unblock #4625, I am looking into the quickcheck test right now.

Panic reproduced with test without patch.
mxinden
mxinden previously approved these changes Oct 18, 2023
Copy link
Member

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your work!

thomaseizinger
thomaseizinger previously approved these changes Oct 18, 2023
Copy link
Contributor

@thomaseizinger thomaseizinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mxinden !

Added some ideas that we can tackle in a follow-up :)

swarm/src/connection.rs Outdated Show resolved Hide resolved
swarm/src/connection.rs Outdated Show resolved Hide resolved
swarm/src/connection.rs Show resolved Hide resolved
@thomaseizinger
Copy link
Contributor

@mxinden There are merge conflicts unfortunately.

@mergify mergify bot dismissed stale reviews from mxinden and thomaseizinger October 18, 2023 09:19

Approvals have been dismissed because the PR was updated after the send-it label was applied.

mxinden
mxinden previously approved these changes Oct 18, 2023
Copy link
Member

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thomaseizinger assuming that with all comments addressed, you are fine with me merging here.

@mergify mergify bot dismissed mxinden’s stale review October 18, 2023 11:24

Approvals have been dismissed because the PR was updated after the send-it label was applied.

@mergify mergify bot merged commit c112703 into libp2p:master Oct 18, 2023
71 of 72 checks passed
@zrus
Copy link
Contributor Author

zrus commented Oct 22, 2023

I apologize for disappearing suddenly from last conversation about the test. I was too busy last week. And thank both of you, @thomaseizinger, @mxinden, for the great work and being patient with me.

@thomaseizinger
Copy link
Contributor

I apologize for disappearing suddenly from last conversation about the test. I was too busy last week. And thank both of you, @thomaseizinger, @mxinden, for the great work and being patient with me.

No worries at all! Normally nothing is super urgent but we wanted to start the next breaking release and this felt important enough to be patched before that :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

.with_idle_connection_timeout(Duration::from_secs(u64::MAX)) still a bug.
3 participants