Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

litep2p/discovery: Fix memory leak in litep2p.public_addresses() #5998

Merged
merged 16 commits into from
Oct 16, 2024

Conversation

lexnv
Copy link
Contributor

@lexnv lexnv commented Oct 9, 2024

This PR ensures that the litep2p.public_addresses() never grows indefinitely.

  • effectively fixes subtle memory leaks
  • fixes authority DHT records being dropped due to size limits being exceeded
  • provides a healthier subset of public addresses to the /identify protocol

This PR adds a new ExternalAddressExpired event to the litep2p/discovery process.

Substrate uses an LRU address_confirmations bounded to 32 address entries.
The oldest entry is propagated via the ExternalAddressExpired event when a new address is added to the list (if capacity is exceeded).

The expired address is then removed from the litep2p.public_addresses(), effectively limiting its size to 32 entries (the size of address_confirmations LRU).

cc @paritytech/networking @alexggh

lexnv added 3 commits October 9, 2024 18:43
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
@lexnv lexnv added T0-node This PR/Issue is related to the topic “node”. I2-bug The node fails to follow expected behavior. labels Oct 9, 2024
@lexnv lexnv self-assigned this Oct 9, 2024
substrate/client/network/src/litep2p/discovery.rs Outdated Show resolved Hide resolved
Co-authored-by: Bastian Köcher <git@kchr.de>
@lexnv lexnv added the A4-needs-backport Pull request must be backported to all maintained releases. label Oct 10, 2024
lexnv added 5 commits October 10, 2024 15:04
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
@dmitry-markin
Copy link
Contributor

Why do we have so many (>32) external addresses confirmed? Is there a linked issue in polkadot-sdk?

prdoc/pr_5998.prdoc Outdated Show resolved Hide resolved
substrate/client/network/src/litep2p/mod.rs Outdated Show resolved Hide resolved
@lexnv
Copy link
Contributor Author

lexnv commented Oct 11, 2024

Why do we have so many (>32) external addresses confirmed? Is there a linked issue in polkadot-sdk?

This issue was reproduced in Rococo on litep2p validators, don't believe we have an gitissue for it. It looks like over time peers report via identify protocol connections on different ports.
Libp2p has a few events on which we clean up the addresses (AddressExpired / ExpiredListenAddr / ListenerClosed / port-reuse logic).

lexnv and others added 7 commits October 11, 2024 14:56
Co-authored-by: Dmitry Markin <dmitry@markin.tech>
Co-authored-by: Dmitry Markin <dmitry@markin.tech>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
@lexnv lexnv enabled auto-merge October 15, 2024 16:09
@alexggh
Copy link
Contributor

alexggh commented Oct 16, 2024

Why do we have so many (>32) external addresses confirmed? Is there a linked issue in polkadot-sdk?

This is the long standing issue for this happening on rococo: #3519 (comment), we never uncovered why so many external addresses accumulate on rococo.

@alexggh alexggh closed this Oct 16, 2024
auto-merge was automatically disabled October 16, 2024 06:49

Pull request was closed

@alexggh alexggh reopened this Oct 16, 2024
@alexggh alexggh enabled auto-merge October 16, 2024 06:50
@alexggh alexggh added this pull request to the merge queue Oct 16, 2024
Merged via the queue into master with commit 38aa4b7 Oct 16, 2024
324 of 328 checks passed
@alexggh alexggh deleted the lexnv/bound-dht-addreses branch October 16, 2024 08:18
@paritytech-cmd-bot-polkadot-sdk

Created backport PR for stable2407:

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin backport-5998-to-stable2407
git worktree add --checkout .worktree/backport-5998-to-stable2407 backport-5998-to-stable2407
cd .worktree/backport-5998-to-stable2407
git reset --hard HEAD^
git cherry-pick -x 38aa4b755c5bb332436950363a9826a91de4b2ef
git push --force-with-lease

github-actions bot pushed a commit that referenced this pull request Oct 16, 2024
…5998)

This PR ensures that the `litep2p.public_addresses()` never grows
indefinitely.
- effectively fixes subtle memory leaks
- fixes authority DHT records being dropped due to size limits being
exceeded
- provides a healthier subset of public addresses to the `/identify`
protocol

This PR adds a new `ExternalAddressExpired` event to the
litep2p/discovery process.

Substrate uses an LRU `address_confirmations` bounded to 32 address
entries.
The oldest entry is propagated via the `ExternalAddressExpired` event
when a new address is added to the list (if capacity is exceeded).

The expired address is then removed from the
`litep2p.public_addresses()`, effectively limiting its size to 32
entries (the size of `address_confirmations` LRU).

cc @paritytech/networking @alexggh

---------

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Co-authored-by: Bastian Köcher <git@kchr.de>
Co-authored-by: Dmitry Markin <dmitry@markin.tech>
(cherry picked from commit 38aa4b7)
@paritytech-cmd-bot-polkadot-sdk

Successfully created backport PR for stable2409:

@dmitry-markin
Copy link
Contributor

Why do we have so many (>32) external addresses confirmed? Is there a linked issue in polkadot-sdk?

This is the long standing issue for this happening on rococo: #3519 (comment), we never uncovered why so many external addresses accumulate on rococo.

In litep2p backend we have the requirement that at least 2 different peers should report the same external address for it to become confirmed. So, in theory, it should be better with litep2p backend.

fairax pushed a commit to UniqueNetwork/polkadot-sdk that referenced this pull request Oct 30, 2024
lexnv added a commit that referenced this pull request Nov 15, 2024
…5998)

This PR ensures that the `litep2p.public_addresses()` never grows
indefinitely.
- effectively fixes subtle memory leaks
- fixes authority DHT records being dropped due to size limits being
exceeded
- provides a healthier subset of public addresses to the `/identify`
protocol

This PR adds a new `ExternalAddressExpired` event to the
litep2p/discovery process.

Substrate uses an LRU `address_confirmations` bounded to 32 address
entries.
The oldest entry is propagated via the `ExternalAddressExpired` event
when a new address is added to the list (if capacity is exceeded).

The expired address is then removed from the
`litep2p.public_addresses()`, effectively limiting its size to 32
entries (the size of `address_confirmations` LRU).

cc @paritytech/networking @alexggh

---------

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Co-authored-by: Bastian Köcher <git@kchr.de>
Co-authored-by: Dmitry Markin <dmitry@markin.tech>
0xbillw added a commit to CESSProject/polkadot-sdk that referenced this pull request Dec 23, 2024
…stable2409` for a temporary apply to the downstream project.
0xbillw added a commit to CESSProject/cess that referenced this pull request Dec 24, 2024
…ry leak about `litep2p` on `stable2409` branch

[PR #5998](paritytech/polkadot-sdk#5998) in the official Polkadot repository addresses this memory leak issue, but it has not been adopted because it constitutes a major change for the `stable2409` branch.
0xbillw added a commit to CESSProject/cess that referenced this pull request Dec 25, 2024
…ry leak about `litep2p` on `stable2409` branch

[PR #5998](paritytech/polkadot-sdk#5998) in the official Polkadot repository addresses this memory leak issue, but it has not been adopted because it constitutes a major change for the `stable2409` branch.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A4-needs-backport Pull request must be backported to all maintained releases. I2-bug The node fails to follow expected behavior. T0-node This PR/Issue is related to the topic “node”.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants