Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge unstable branch to deneb 20231005 #4808

Merged

Conversation

jimmygchen
Copy link
Member

@jimmygchen jimmygchen commented Oct 5, 2023

Issue Addressed

Merge latest unstable branch to deneb-free-blobs.

Main conflicts / additional changes are:

  • CI: Update network-tests to use the same toolings as unstable for installing Rust and running tests.
  • CI: always run make install-lcli, as the pre-installed lcli on self-hosted runner isn't compatible with Deneb.
  • mock_builder: re-implemented Deneb changes on top of the unstable version (without mev-rs).

realbigsean and others added 13 commits October 3, 2023 03:37
unblock sigp#4755

Co-authored-by: Paul Hauner <paul@paulhauner.com>
Co-authored-by: realbigsean <seananderson33@gmail.com>
## Issue Addressed

sigp#4675 

## Proposed Changes

 - Update local ENR (**only port numbers**) with local addresses received from libp2p (via `SwarmEvent::NewListenAddr`)
 - Only use the zero port for CLI tests

## Additional Info

### See Also ###

 - sigp#4705 
 - sigp#4402 
 - sigp#4745
Attempting to improve our CI speeds as its recently been a pain point.

Major changes:

 - Use a github action to pull stable/nightly rust rather than building it each run
 - Shift test suite to `nexttest` https://github.com/nextest-rs/nextest for CI
 
 UPDATE:

So I've iterated on some changes, and although I think its still not optimal I think this is a good base to start from. Some extra things in this PR:
- Shifted where we pull rust from. We're now using this thing: https://github.com/moonrepo/setup-rust . It's got some interesting cache's built in, but was not seeing the gains that Jimmy managed to get. In either case tho, it can pull rust, cargofmt, clippy, cargo nexttest all in < 5s. So I think it's worthwhile. 
- I've grouped a few of the check-like tests into a single test called `code-test`. Although we were using github runners in parallel which may be faster, it just seems wasteful. There were like 4-5 tests, where we would pull lighthouse, compile it, then run an action, like clippy, cargo-audit or fmt. I've grouped these into a single action, so we only compile lighthouse once, then in each step we run the checks. This avoids compiling lighthouse like 5 times.
- Ive made doppelganger tests run on our local machines to avoid pulling foundry, building and making lcli which are all now baked into the images. 
- We have sccache and do not incremental compile lighthouse

Misc bonus things:
- Cargo update
- Fix web3 signer openssl keys which is required after a cargo update
- Use mock_instant in an LRU cache test to avoid non-deterministic test
- Remove race condition in building web3signer tests

There's still some things we could improve on. Such as downloading the EF tests every run and the web3-signer binary, but I've left these to be out of scope of this PR. I think the above are meaningful improvements.



Co-authored-by: Paul Hauner <paul@paulhauner.com>
Co-authored-by: realbigsean <seananderson33@gmail.com>
Co-authored-by: antondlr <anton@delaruelle.net>
## Proposed Changes

- only use LH types to avoid build issues
- use warp instead of axum for the server to avoid importing the dep

## Additional Info

- wondering if we can move the `execution_layer/test_utils` to its own crate and import it as a dev dependency
- this would be made easier by separating out our engine API types into their own crate so we can use them in the test crate
- or maybe we can look into using reth types for the engine api if they are in their own crate


Co-authored-by: realbigsean <seananderson33@gmail.com>
…hed (sigp#4569)

## Issue Addressed
sigp#4543
## Proposed Changes
- Removes `NotBanned` from `BanResult`, implements `Display` and `std::error::Error` for `BanResult` and changes `ban_result` return type to `Option<BanResult>` which helps returning `BanResult` on `handle_established_inbound_connection`  
- moves the check from for banned peers from `on_connection_established` to `handle_established_inbound_connection` to start addressing sigp#4543.
- Removes `allow_block_list` as it's now redundant? Not sure about this one but if `PeerManager` keeps track of the banned peers, no need to send a `Swarm` event for `alow_block_list` to also keep that list right? 
 
## Questions

-  sigp#4543 refers:
>  More specifically, implement the connection limit behaviour inside the peer manager.

@AgeManning do you mean copying `libp2p::connection_limits::Behaviour`'s code into `PeerManager`/ having it as an inner `NetworkBehaviour` of `PeerManager`/other? If it's the first two, I think it probably makes more sense to have it as it is as it's less code to maintain.

> Also implement the banning of peers inside the behaviour, rather than passing messages back up to the swarm.

I tried to achieve this, but we still need to pass the `PeerManagerEvent::Banned` swarm event as `DiscV5` handles it's node and ip management internally and I did not find a method to query if a peer is banned. Is there anything else we can do from here?

https://github.com/sigp/lighthouse/blob/33976121601d9d31d26c1bb2e1f0676e9c4d24fc/beacon_node/lighthouse_network/src/discovery/mod.rs#L931-L940

Same as the question above, I did not find a way to check if `DiscV5` has the peer banned, so that we could check here and avoid sending `Swarm` events

https://github.com/sigp/lighthouse/blob/33976121601d9d31d26c1bb2e1f0676e9c4d24fc/beacon_node/lighthouse_network/src/peer_manager/network_behaviour.rs#L168-L178

Is there a chance we try to dial a peer that has been banned previously? 

Thanks!
## Issue Addressed

Right now lighthouse accepts zero as enr ports. Since enr ports should be reachable, zero ports should be rejected here

## Proposed Changes

- update the config to use `NonZerou16` as an ENR port for all enr-related fields.
- the enr builder from config now sets the enr to the listening port only if the enr port is not already set (prev behaviour) and the listening port is not zero (new behaviour)
- reject zero listening ports when used with `enr-match`. 
- boot node now rejects listening port as zero, since those are advertised.
- generate-bootnode-enr also rejected zero listening ports for the same reason.
- update local network scripts

## Additional Info

Unrelated, but why do we overwrite `enr-x-port` values with listening ports if `enr-match` is present? we prob should only do this for enr values that are not already set.
## Issue Addressed

N/A

## Proposed Changes

Removing the two Teku mainnet bootnodes that are being sunset.

## Additional Info

We are leaving only these two bootnodes: https://github.com/eth-clients/eth2-networks/blob/master/shared/mainnet/bootstrap_nodes.txt#L10-L11
## Issue Addressed

N/A

## Proposed Changes

We were currently downscoring a peer for sending us a block that we already have in fork choice. This is unnecessary as we get duplicates in lighthouse only when
1. We published the block, so the block is already in fork choice
2. We imported the same block over rpc

In both scenarios, the peer who sent us the block over gossip is not at fault.

This isn't exploitable as valid duplicates will get dropped by the gossipsub duplicate filter
## Issue Addressed

We've had a report of sync committee performance suffering with the beacon processor HTTP API prioritisations.

## Proposed Changes

Increase the priority of `/eth/v1/beacon/blocks/head/root` requests, which are used by the validator client to form sync committee messages, here:

https://github.com/sigp/lighthouse/blob/441fc1691b69f9edc4bbdc6665f3efab16265c9b/validator_client/src/sync_committee_service.rs#L181-L188

Additionally, avoid loading the blinded block in all but the `block_id=block_root` case. I'm not sure why we were doing this previously, I suspect it was just an oversight during the implementation of the `finalized` status on API requests.

## Additional Info

I think this change should have minimal negative impact as:

- The block root endpoint is quick to compute (a few ms max).
- Only the priority of `head` requests is increased. Analytical processes that are making lots of block root requests for past slots are unable to DoS the beacon processor, as their requests will still be processed after attestations.
## Issue Addressed

This PR closes sigp#3237

## Proposed Changes

Remove topic weight of old topics when the fork happens.

## Additional Info

- Divided `NetworkService::start()` into `NetworkService::build()` and `NetworkService::start()` for ease of testing.
…igp#4758)

## Issue Addressed
- Close sigp#4596 

## Proposed Changes
- Add `Filter::recover` to handle rejections specifically as 404 NOT FOUND

Please list or describe the changes introduced by this PR.

## Additional Info

Similar to PR sigp#3836
## Issue Addressed

Closes sigp#4712

## Proposed Changes

Exit aggregation step early if no validator is aggregator. This avoids an unnecessary request to the beacon node and more importantly fixes noisy errors if Lighthouse VC is used with other clients such as Lodestar and Prysm.

## Additional Info

Related issue ChainSafe/lodestar#5553
# Conflicts:
#	.github/workflows/test-suite.yml
#	Cargo.lock
#	beacon_node/execution_layer/Cargo.toml
#	beacon_node/execution_layer/src/test_utils/mock_builder.rs
#	beacon_node/execution_layer/src/test_utils/mod.rs
#	beacon_node/network/src/service/tests.rs
#	consensus/types/src/builder_bid.rs
@jimmygchen jimmygchen added deneb ready-for-review The code is ready for review labels Oct 5, 2023
jimmygchen and others added 4 commits October 5, 2023 16:38
## Issue Addressed

While reviewing sigp#4801 I noticed that our use of `take_while` in the block replayer means that if a state root iterator _with gaps_ is provided, some additonal state roots will be dropped unnecessarily. In practice the impact is small, because once there's _one_ state root miss, the whole tree hash cache needs to be built anyway, and subsequent misses are less costly. However this was still a little inefficient, so I figured it's better to fix it.

## Proposed Changes

Use [`peeking_take_while`](https://docs.rs/itertools/latest/itertools/trait.Itertools.html#method.peeking_take_while) to avoid consuming the next element when checking whether it satisfies the slot predicate.

## Additional Info

There's a gist here that shows the basic dynamics in isolation: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=40b623cc0febf9ed51705d476ab140c5. Changing the `peeking_take_while` to a `take_while` causes the assert to fail. Similarly I've added a new test `block_replayer_peeking_state_roots` which fails if the same change is applied inside `get_state_root`.
Copy link
Member

@realbigsean realbigsean left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks jimmy !

@realbigsean realbigsean merged commit 203ac65 into sigp:deneb-free-blobs Oct 5, 2023
24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
deneb ready-for-review The code is ready for review
Projects
None yet
Development

Successfully merging this pull request may close these issues.