
Feat: decouple CFE from SC events #4382

Merged: 10 commits from feat/cfe-events-pallet into main, Jan 18, 2024
Conversation

@msgmaxim (Contributor) commented Jan 4, 2024

Pull Request

Closes PRO-1091

Checklist

Please conduct a thorough self-review before opening the PR.

  • I am confident that the code works.
  • I have updated documentation where appropriate.

Summary

After discussion with @dandanlen, I decided to create a separate pallet for CFE events, so that all events live in one place and we don't need to repeat the mechanism for cleaning up events etc. (the alternative I first considered was having the CFE read storage items from separate pallets).

All events are stored in the CfeEvents storage item, which can be queried by block number. In the SC Observer, once we receive a new finalized header, we now query all events for that block. Events are stored for 20 blocks, after which they are removed; this should give the CFE enough time to retrieve them.
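For illustration, a minimal sketch of the shape described here, with assumed names, hasher, and hooks; the real pallet differs, and the per-block index is in fact removed later in this thread:

#[frame_support::pallet]
pub mod pallet {
	use frame_support::pallet_prelude::*;
	use frame_system::pallet_prelude::*;

	#[pallet::config]
	pub trait Config: frame_system::Config {
		/// Stand-in for the CfeEvent enum.
		type CfeEvent: Member + Parameter;
	}

	#[pallet::pallet]
	#[pallet::without_storage_info]
	pub struct Pallet<T>(_);

	/// CFE events keyed by the block in which they were emitted.
	#[pallet::storage]
	pub type CfeEvents<T: Config> =
		StorageMap<_, Twox64Concat, BlockNumberFor<T>, Vec<T::CfeEvent>, ValueQuery>;

	/// Retention window taken from the description above.
	pub const EVENT_RETENTION_BLOCKS: u32 = 20;

	impl<T: Config> Pallet<T> {
		/// Called by other pallets to notify the CFE.
		pub fn emit(event: T::CfeEvent) {
			CfeEvents::<T>::append(frame_system::Pallet::<T>::block_number(), event);
		}
	}

	#[pallet::hooks]
	impl<T: Config> Hooks<BlockNumberFor<T>> for Pallet<T> {
		fn on_initialize(now: BlockNumberFor<T>) -> Weight {
			// Prune events that have fallen out of the retention window,
			// so the CFE has ~20 blocks to query a block's events.
			if now > EVENT_RETENTION_BLOCKS.into() {
				CfeEvents::<T>::remove(now - EVENT_RETENTION_BLOCKS.into());
			}
			Weight::zero()
		}
	}
}

The SC Observer side is then just a storage query of CfeEvents at the finalized block it received.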

All events are defined in the CfeEvent enum. I imagine we can create simple tests that decode/encode each variant from/to some hardcoded values to ensure that we don't accidentally make incompatible changes (probably in a separate PR).
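Something like the following, perhaps; this sketch assumes CfeEvent derives Encode, Decode, PartialEq and Debug, and uses a placeholder payload rather than real hardcoded bytes:

#[test]
fn cfe_event_encoding_is_stable() {
	use codec::{Decode, Encode};

	// One hardcoded value per variant in the real test; a placeholder here.
	let event = CfeEvent::EthThresholdSignatureRequest(Default::default());

	// These bytes would be generated once and committed, so that any
	// incompatible change to the encoding fails the assertion below.
	let frozen: Vec<u8> = event.encode();

	assert_eq!(CfeEvent::decode(&mut frozen.as_slice()).expect("valid encoding"), event);
}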

I created a few traits, some of which are generic over chain or crypto, so that our generic code (broadcaster, threshold signing, etc.) can use them while running slightly different code for each. For the most part I was able to avoid duplicating code.

I'm pretty sure this doesn't require a runtime storage migration, but let me know if I'm wrong. I tested upgrading the runtime from main to this branch and it worked as expected, initialising CfeEvents with the default value (an empty container).

@msgmaxim msgmaxim requested review from dandanlen and kylezs January 4, 2024 07:03
match_event! {event, {
CfeEvent::EthThresholdSignatureRequest(req) => {
handle_signing_request::<_, _, _, EthereumInstance>(
scope,
A reviewer (Contributor) commented on the snippet above:

formatting

@msgmaxim msgmaxim force-pushed the feat/cfe-events-pallet branch from 571210a to 77e996c Compare January 9, 2024 05:59
codecov bot commented Jan 9, 2024

Codecov Report

Attention: 209 lines in your changes are missing coverage. Please review.

Comparison is base (08a1a47) 72% compared to head (191434d) 72%.

Files Patch % Lines
engine/src/state_chain_observer/sc_observer/mod.rs 20% 97 Missing and 4 partials ⚠️
state-chain/pallets/cf-cfe-interface/src/lib.rs 57% 21 Missing and 10 partials ⚠️
state-chain/cfe-events/src/lib.rs 19% 1 Missing and 16 partials ⚠️
...chain/pallets/cf-cfe-interface/src/benchmarking.rs 0% 16 Missing ⚠️
state-chain/cf-integration-tests/src/network.rs 92% 11 Missing and 1 partial ⚠️
state-chain/traits/src/mocks/cfe_interface_mock.rs 80% 7 Missing and 4 partials ⚠️
...tate-chain/pallets/cf-cfe-interface/src/weights.rs 50% 8 Missing ⚠️
state-chain/pallets/cf-broadcast/src/tests.rs 84% 5 Missing ⚠️
state-chain/traits/src/lib.rs 0% 3 Missing ⚠️
state-chain/chains/src/btc.rs 0% 1 Missing ⚠️
... and 4 more
Additional details and impacted files
@@          Coverage Diff          @@
##            main   #4382   +/-   ##
=====================================
- Coverage     72%     72%   -0%     
=====================================
  Files        396     401    +5     
  Lines      67561   67659   +98     
=====================================
+ Hits       48756   48798   +42     
- Misses     16363   16392   +29     
- Partials    2442    2469   +27     


@msgmaxim msgmaxim force-pushed the feat/cfe-events-pallet branch 5 times, most recently from fcdb6df to 39098a5 Compare January 11, 2024 05:00
@msgmaxim msgmaxim force-pushed the feat/cfe-events-pallet branch from 39098a5 to b546f91 Compare January 11, 2024 05:18
@msgmaxim msgmaxim marked this pull request as ready for review January 11, 2024 05:19
@msgmaxim msgmaxim changed the title WIP: cfe event emitter pallet Feat: decouple CFE from SC events Jan 11, 2024
@kylezs (Contributor) left a comment:

🙌 product team requests to update events can be accommodated much more easily now

@@ -756,6 +770,7 @@ construct_runtime!(
{
System: frame_system,
Timestamp: pallet_timestamp,
CfeEventEmitter: pallet_cf_cfe_event_emitter,
@msgmaxim (author) commented on this line:

Moved back to the end of the list (now using custom order of pallet execution instead)

@msgmaxim (Contributor, author) commented:

So just to summarise: in this PR we need a way to clear events at an exact block boundary (so that if we request storage for a block hash, we only get events generated in that block). The current solution is/was to put the CFE events pallet before any pallet that might emit a CFE event and clear the events in on_initialize. However, as discussed on Discord, this is problematic because we don't want to change the indexes of the existing pallets. I see a few options for how to proceed:

a) Go back to the original approach where we store events in a map so that events from different blocks can be queried separately (i.e. reverting 9760930)

b) Clear events in the earliest pallet that currently emits CFE events, so that we can be sure that no new events are lost. @AlastairHolmes also suggested modifying the existing system pallet to do something similar (or even moving the storage for the CFE events there).

Any other suggestions? Kyle suggested that you might have some thoughts @dandanlen.

@dandanlen (Collaborator) commented Jan 15, 2024

We can customise the execution order of the hooks.

In the runtime.rs we have:

pub type Executive = frame_executive::Executive<
	Runtime,
	Block,
	frame_system::ChainContext<Runtime>,
	Runtime,
	AllPalletsWithSystem,
	PalletMigrations,
>;

We can replace AllPalletsWithSystem with our own custom tuple with the pallets in any order we please (same thing as we do with PalletMigrations). This will then execute all the hooks in that order.
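For illustration, that might look roughly like this (PalletExecutionOrder is an invented name, the tuple is abbreviated, and the pallet name follows the later rename to cfe-interface):

/// Custom hook-execution order; must be kept in sync with construct_runtime!.
pub type PalletExecutionOrder = (
	System,
	Timestamp,
	CfeInterface, // cleared first, before any other pallet can emit CFE events
	Environment,
	Flip,
	Emissions,
	// ...the remaining pallets, in the existing declaration order
);

pub type Executive = frame_executive::Executive<
	Runtime,
	Block,
	frame_system::ChainContext<Runtime>,
	Runtime,
	PalletExecutionOrder,
	PalletMigrations,
>;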

The only downside I can think of is that we need to remember to maintain this, i.e. when we add a new pallet, its hooks need to be added to the custom tuple. This also applies to e.g. integration tests: we need to replicate the same execution order instead of relying on the default.

edit: another downside is that the hooks are executed after runtime upgrades, meaning we would be unable to emit any CFE events during a runtime upgrade. For this reason, the system events are in fact deleted before anything else (before any hooks and before the runtime upgrade). We should consider doing the same for the CFE events, as Alastair suggested, but this would involve editing either frame_executive::Executive or the system pallet itself. I'm happy to use the hooks for now, but in the long run changing the Executive/System pallet seems like the more robust approach.

The benefit is that we don't need to worry about re-ordering pallets in the runtime declaration. As you mentioned @msgmaxim, this will continue to cause issues whenever we want to add e.g. a new chain.


A third option would be to make the breaking change and re-order the pallets, but to use pre-defined explicit indices with some gaps for future additions, i.e.:

construct_runtime!(
	pub struct Runtime
	{
		System: frame_system = 0,
		Timestamp: pallet_timestamp,
		CfeEventEmitter: pallet_cf_cfe_event_emitter,
		// 3: TBC
		// 4: TBC
		Environment: pallet_cf_environment = 5,
		Flip: pallet_cf_flip,
		Emissions: pallet_cf_emissions,
		// etc..
	}
);

This would buy us some time but will eventually also run into the same limitation (at some point we might run out of free indices between pallets).

@dandanlen (Collaborator) left a comment:

FWIW I would prefer a less restrictive pallet name, for example cfe-interface. Long term, we might use this pallet for things other than events, for example to group CFE-specific storage values, or to host CFE-specific extrinsics (for example the version number for compatibility checks, or submitting/storing the peer info, which is used only by the CFE, not by the runtime).


benchmarks! {

clear_events {
@dandanlen (Collaborator) commented on the snippet above:
I think this will break when we generate new benchmarks: it needs to be called remove_events_for_block, or alternatively the method in WeightInfo needs to be renamed to clear_events. (I actually prefer the latter, given that you removed the per-block index.)
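For context: with the FRAME benchmarks! macro, the block's name is what the generated WeightInfo method is called, so the two have to line up. A rough sketch, not the PR's code:

benchmarks! {
	// This name becomes `fn clear_events() -> Weight` on the generated
	// WeightInfo trait, which the pallet calls when weighing its hooks.
	clear_events {
		// setup: pre-populate the events storage
	}: {
		// code being measured: clearing the stored events
	}
}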

@msgmaxim (author) replied:
Updated in 0b710a2

payload: broadcast_attempt.transaction_payload.clone(),
});

// TODO: consider removing this
@dandanlen (Collaborator) commented:
Why would we not remove it? I guess polkaJS, debugging?

The same reviewer later added:

edit - I saw there was another comment thread about this, think I agree with keeping it for now

}
}

pub trait CfeEventEmitterT<T: Chainflip> {
@dandanlen (Collaborator) commented:

Why not something like CfePeerRegistration?

We don't need to assume that the underlying implementation emits events.

Similarly for the other traits. Could be CfeMultisigRequest, CfeBroadcastRequest.
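For illustration, the granular shape being suggested might look roughly like this; only the trait names come from this thread, while the generics and method signatures are invented:

// Sketch only: one narrow trait per kind of CFE notification.
pub trait CfeMultisigRequest<T: Chainflip, C: CryptoScheme> {
	fn keygen_request(req: KeygenRequest<C>);
	fn signature_request(req: SigningRequest<C>);
}

pub trait CfeBroadcastRequest<T: Chainflip, Ch: Chain> {
	fn broadcast_request(tx: TransactionFor<Ch>);
}

pub trait CfePeerRegistration<T: Chainflip> {
	fn peer_registered(validator_id: T::ValidatorId);
	fn peer_deregistered(validator_id: T::ValidatorId);
}

Pallets would then depend only on these traits, so the runtime can wire them to the CFE events pallet while tests plug in simple mocks.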

@msgmaxim (author) replied:

My thinking was that the same trait would be used for many potentially unrelated events (ideally all of them, but because some pallets are generic over either Crypto or Chain, I had to create separate traits for those). I can see that CfeMultisigRequest is likely to cover any current/future Crypto-related events, but I wonder whether we would want to add more events from non-generic pallets to CfeEventEmitterT. Would we then prefer separate traits for different events? (I don't really mind either way.)

I will rename the traits as you suggest (we can easily rename them again if needed).

@dandanlen (Collaborator) replied:

FWIW I would prefer more granular traits over a single 'god trait'. Makes it easier to mock, makes the intent clearer etc.

I also feel it's worth distinguishing the interface from the implementation. It doesn't matter that it emits an event; for example, you mentioned we might remove peer registration events in favour of storage writes. What matters is that when you use an implementation of this trait, you expect to notify the engine of a peer update. Whether it's an event, a storage write, or some mock that simply writes to a log is not really relevant.

@msgmaxim (author) replied:

Renamed traits in d2fdc66

@msgmaxim (Contributor, author) commented:

> The only downside I can think of is that we need to remember to maintain this

If we forget, does this mean the hooks for that pallet won't be executed? If that's the case, then I don't think it's a big deal, since we will certainly notice that something doesn't work.

> another downside is that the hooks are executed after runtime upgrades, meaning we would be unable to emit any CFE events during a runtime upgrade

Do you mean in migration code? It seems unlikely that there will be a need for that. Even if we do need something like that, I imagine it wouldn't be too difficult to come up with a workaround.

> We can replace AllPalletsWithSystem with our own custom tuple with the pallets in any order we please (same thing as we do with PalletMigrations). This will then execute all the hooks in that order.

Sounds like a reasonable approach. Should I go ahead with this, or do we want to think about it more? @dandanlen

@dandanlen (Collaborator) commented:

I think a custom tuple gives us the quickest win right now. Also, we'll need this anyway whenever we add a new chain.

We will need to remain aware of the runtime-upgrade weirdness: say we want to trigger a threshold signature or a broadcast during a runtime upgrade; the current approach would not allow this.

I'll open a Linear issue to customize the executor/system pallet events reset mechanism.

@msgmaxim (author) commented:

> I think a custom tuple gives us the quickest win right now. Also, we'll need this anyway whenever we add a new chain.

Made this change here: 0b710a2. Tested on localnet; it seems to work as expected. The order in the tuple is exactly what was used before, with the CFE interface pallet moved up to right after the Timestamp pallet.

@dandanlen (Collaborator) commented:

It looks like some integration tests are failing. It could be related to the change in ordering... either way, all the tests are failing at the same assertion, so it should (hopefully!) be a simple fix. @syan095 might be able to help out here; he's quite familiar with the integration tests. (Also, it looks like his recent PR conflicts with this one.)

@dandanlen (Collaborator) added:

LGTM otherwise 🎉

@msgmaxim (author) commented:

Fixed the integration tests in 71b2087 and resolved the conflict with main, @dandanlen.

@dandanlen (Collaborator) left a comment:

@kylezs still needs to approve.

@dandanlen dandanlen mentioned this pull request Jan 18, 2024
@kylezs (Contributor) left a comment:

🙌

@msgmaxim msgmaxim merged commit acc6698 into main Jan 18, 2024
42 of 43 checks passed
@msgmaxim msgmaxim deleted the feat/cfe-events-pallet branch January 18, 2024 11:32
syan095 added a commit that referenced this pull request Jan 19, 2024
…itness-swap

* origin/main: (24 commits)
  fix: restore correct restriction on redemption expiry (PRO-1072)
  Feat/migrate-to-polkadot-sdk-repo (#4361)
  chore: fix RUSTSEC-2024-0003 (#4426)
  Feat: decouple CFE from SC events (#4382)
  chore: add docker tag prefix to `chainflip-ingress-egress-tacker` 🏷️ (#4427)
  refactor/PRO-1101: rid of swapping minimum swap amount (#4419)
  refactor(ingress-egress-tracker): remove unnecessary fields (#4425)
  Improved the retry queue storage (#4420)
  fix: bump spec version command only bumps when necessary (#4422)
  refactor: use yargs for all try_runtime_command options (#4423)
  fix: await finalisation before starting broker (#4412)
  chore: debug logs to see get_raw_txs query (#4413)
  doc: correct env vars (#4416)
  fix: pool_orders rpc filters empty orders (PRO-1039) (#4377)
  Produce an event in case the swap yields a zero egress amount (#4410)
  fix: don't have conflicting redis port with localnet (#4415)
  Ability to specify output for the subcommands, other than `/dev/null` (#4411)
  chore: increase limit on max number of bitcoin payloads in a ceremony to theoretical maximum (#4396)
  refactor(ingress-egress-tracker): configurable expiry times [WEB-761] (#4406)
  fix: use existing script for upgrade job (#4403)
  ...

# Conflicts:
#	Cargo.lock
#	state-chain/primitives/Cargo.toml