Relayer Concurrency Architecture #121
Some thoughts after today's discussion about the relayer concurrency and the concerns about its complexity ^. Here is one possibility, collecting bits from today's conversations and suggestions:
Regarding the scale numbers for the active builders, the count should naturally be proportional to the number of channels. Let's say in a block at height ... We should still think the details through, but at first look:
(*) We need some background LC thread to keep the latest headers within the trusting period. The event handler will also still need to keep the on-chain client states within the trusting period. This is something we haven't discussed today, but I think it's a minor issue and we can minimize the work here by just doing updates from the LC thread, even if some might be redundant.
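A minimal sketch of what such a background LC thread could look like, assuming a simple request/response channel; `fetch_and_verify_latest` and `verify_to_height` are hypothetical stand-ins for the actual header download/verification logic, not the relayer's API:

```rust
use std::sync::mpsc::{channel, Receiver, RecvTimeoutError, Sender};
use std::thread;
use std::time::Duration;

// Hypothetical request type: the relayer asks for verified headers on demand.
enum LcRequest {
    VerifyToHeight { height: u64, reply: Sender<u64> },
}

// Stubs standing in for "download + verify headers".
fn fetch_and_verify_latest() -> u64 { 0 }
fn verify_to_height(height: u64) -> u64 { height }

fn spawn_light_client_thread(trusting_period: Duration) -> Sender<LcRequest> {
    let (tx, rx): (Sender<LcRequest>, Receiver<LcRequest>) = channel();
    // Refresh well before the trusting period elapses, e.g. at 1/3 of it.
    let refresh_every = trusting_period / 3;

    thread::spawn(move || loop {
        match rx.recv_timeout(refresh_every) {
            // On-demand request: verifying up to `height` also refreshes the
            // stored state, so waiting again restarts the refresh window.
            Ok(LcRequest::VerifyToHeight { height, reply }) => {
                let verified = verify_to_height(height);
                let _ = reply.send(verified);
            }
            // No request within the window: keep the trusted state fresh.
            Err(RecvTimeoutError::Timeout) => {
                fetch_and_verify_latest();
            }
            // All senders dropped: shut the thread down.
            Err(RecvTimeoutError::Disconnected) => break,
        }
    });

    tx
}

fn main() {
    let lc = spawn_light_client_thread(Duration::from_secs(3600));
    let (reply_tx, reply_rx) = channel();
    lc.send(LcRequest::VerifyToHeight { height: 42, reply: reply_tx }).unwrap();
    println!("verified up to height {}", reply_rx.recv().unwrap());
}
```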
My understanding of the complexity in the first diagram is that it's hard to see complete relayer flows (eg. receive event -> query state -> query light client -> tx, etc.) by looking at the EventHandler - we'd need to build a ton of little state machines in the EventHandler to track/execute those flows clearly. In the end, it seems this would be a prime candidate for async/await - we'd be able to code complete relayer sequences directly, so it'd be easy to see the flow, and it would all execute async. This might be what we ultimately want to work towards, but probably not where we start.
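To make the "code complete relayer sequences directly" idea concrete, here is a rough sketch assuming an async runtime such as tokio; `ChainHandle`, `LightClient` and their methods are made-up stand-ins, not the actual relayer API:

```rust
struct IbcEvent { height: u64 }
struct ChainHandle;
struct LightClient;

impl ChainHandle {
    async fn query_channel_state(&self, _ev: &IbcEvent) -> String { "OPEN".to_string() }
    async fn send_tx(&self, msgs: Vec<String>) -> Result<(), String> {
        println!("submitting {} message(s)", msgs.len());
        Ok(())
    }
}

impl LightClient {
    async fn verified_header(&self, height: u64) -> u64 { height }
}

// One complete relayer flow, written top to bottom; the runtime can interleave
// many of these concurrently without hand-written state machines.
async fn relay_event(src_lc: &LightClient, dst: &ChainHandle, ev: IbcEvent) -> Result<(), String> {
    let _state = dst.query_channel_state(&ev).await;             // query counterparty state
    let header_height = src_lc.verified_header(ev.height).await; // fetch a verified header
    dst.send_tx(vec![
        format!("MsgUpdateClient@{header_height}"),
        "MsgRecvPacket".to_string(),
    ])
    .await
}

#[tokio::main]
async fn main() -> Result<(), String> {
    relay_event(&LightClient, &ChainHandle, IbcEvent { height: 42 }).await
}
```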
Re the new proposal: I wasn't really sure what was meant by "object associated with an event" here. Can you clarify what we would create a thread for, and how often?

One idea that might be extremely simple would be to start with 2 long-lived synchronous threads for each pair of chains A and B: one relaying from A to B, the other from B to A. For each chain we're subscribed to, we would route its incoming events into queues for each other chain, and have one thread reading off each queue and doing complete relayer flows, something like the sketch below. I think this would eliminate the need for a central Event Handler thread altogether.

The Event Router would manage some internal state so it can figure out which events are for which chains. It may need to query the chain to figure out eg. clients/conns/chans it doesn't know about yet. And each relayer thread from A->X would need to be able to query A's light client.

We'd then have one synchronous thread handling all logic for relaying from chain A to chain B, so we'd never race with ourselves, eg. on client updates. Conceivably, for each block on chain A, we could submit one tx to chain B with msgs for all relevant events in that block.
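A minimal sketch of the per-direction-thread idea, using plain std threads and mpsc queues; the event and route types are hypothetical, and the complete relayer flow is reduced to a print statement:

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Sender};
use std::thread::{self, JoinHandle};

// Hypothetical event type; in practice the router would resolve the destination
// from its internal client/connection/channel state (or by querying the chain).
struct IbcEvent { dst_chain: String, data: String }

// One long-lived relay thread per (source, destination) pair. Each thread runs
// the complete flow (queries, light client, build msgs, submit tx) for its direction.
fn spawn_relay_thread(src: String, dst: String) -> (Sender<IbcEvent>, JoinHandle<()>) {
    let (tx, rx) = channel::<IbcEvent>();
    let handle = thread::spawn(move || {
        for ev in rx {
            // Complete relayer flow for one event, executed synchronously.
            println!("relaying {} from {src} to {dst}", ev.data);
        }
    });
    (tx, handle)
}

fn main() {
    let chains = ["A".to_string(), "B".to_string()];

    // One queue (and thread) per ordered pair: A->B and B->A.
    let mut routes: HashMap<(String, String), Sender<IbcEvent>> = HashMap::new();
    let mut handles = Vec::new();
    for src in &chains {
        for dst in &chains {
            if src != dst {
                let (tx, h) = spawn_relay_thread(src.clone(), dst.clone());
                routes.insert((src.clone(), dst.clone()), tx);
                handles.push(h);
            }
        }
    }

    // The Event Router: an event seen on chain A is pushed onto the A -> dst queue.
    let ev = IbcEvent { dst_chain: "B".into(), data: "SendPacket".into() };
    routes[&("A".to_string(), ev.dst_chain.clone())].send(ev).unwrap();

    // Dropping the senders lets the relay threads drain their queues and exit.
    drop(routes);
    for h in handles { let _ = h.join(); }
}
```

Since each direction is handled by exactly one thread, all work for A -> B is strictly sequential, which is what avoids racing with ourselves on client updates.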
We build one state machine with fewer than 10 states, with one function per state. This should cover all connection, channel, and packet handling.
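For illustration, here is a sketch of the "one function per state" shape; the state names and transitions are made up, not the actual connection/channel/packet states:

```rust
// Illustrative states only; the real FSM would have connection/channel/packet
// specific states and would react to events and query responses.
#[derive(Debug, Clone, Copy)]
enum BuilderState {
    Init,
    QueryPending,
    HeadersPending,
    TxPending,
    Done,
}

struct MessageBuilder { state: BuilderState }

impl MessageBuilder {
    // One function per state; each returns the next state.
    fn step(&mut self) {
        self.state = match self.state {
            BuilderState::Init => self.on_init(),
            BuilderState::QueryPending => self.on_query_response(),
            BuilderState::HeadersPending => self.on_headers(),
            BuilderState::TxPending => self.on_tx_confirmed(),
            BuilderState::Done => BuilderState::Done,
        };
    }

    fn on_init(&mut self) -> BuilderState { BuilderState::QueryPending }
    fn on_query_response(&mut self) -> BuilderState { BuilderState::HeadersPending }
    fn on_headers(&mut self) -> BuilderState { BuilderState::TxPending }
    fn on_tx_confirmed(&mut self) -> BuilderState { BuilderState::Done }
}

fn main() {
    let mut builder = MessageBuilder { state: BuilderState::Init };
    for _ in 0..4 {
        builder.step();
        println!("builder is now in state {:?}", builder.state);
    }
}
```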
I'm not sure how this state machine approach would look with async/await; we would need to handle incoming events and future cancellation.
Potentially, for every IBC event in a block we would create a thread, similar to how we create a "builder message" object in the initial proposal. Packet events belonging to the same channel are grouped to create a single thread/builder. You don't need to know the destination chain, since all logic and queries are done from the thread/builder.
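A small sketch of the per-channel grouping, with a made-up `PacketEvent` shape; each resulting group would get its own builder/thread:

```rust
use std::collections::HashMap;

// Hypothetical packet-event shape; only the fields needed for grouping are shown.
struct PacketEvent { port_id: String, channel_id: String, sequence: u64 }

// Group all packet events of one block by (port, channel): each group would be
// handed to a single builder/thread rather than one thread per event.
fn group_by_channel(events: Vec<PacketEvent>) -> HashMap<(String, String), Vec<PacketEvent>> {
    let mut groups: HashMap<(String, String), Vec<PacketEvent>> = HashMap::new();
    for ev in events {
        groups
            .entry((ev.port_id.clone(), ev.channel_id.clone()))
            .or_default()
            .push(ev);
    }
    groups
}

fn main() {
    let events = vec![
        PacketEvent { port_id: "transfer".into(), channel_id: "channel-0".into(), sequence: 1 },
        PacketEvent { port_id: "transfer".into(), channel_id: "channel-0".into(), sequence: 2 },
        PacketEvent { port_id: "transfer".into(), channel_id: "channel-1".into(), sequence: 1 },
    ];
    for ((port, chan), group) in group_by_channel(events) {
        // One builder (or thread) per channel, handling all its packets together.
        let seqs: Vec<u64> = group.iter().map(|e| e.sequence).collect();
        println!("{port}/{chan}: packet sequences {seqs:?}");
    }
}
```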
Having the Event Router query the chain is a problem, as it would block the Event Router.
This may sound appealing in this one sentence, but it will still have to deal with timeouts, errors, and retries. In the Go relayer, goroutines are spawned and things are retried a number of times between sleeps.
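For reference, the retry-between-sleeps pattern could look roughly like this; the attempt count, delay and the fake "submit tx" operation are placeholders:

```rust
use std::thread::sleep;
use std::time::Duration;

// Retry an operation up to `max_attempts` times, sleeping between attempts.
fn retry<T, E, F>(mut op: F, max_attempts: u32, delay: Duration) -> Result<T, E>
where
    F: FnMut() -> Result<T, E>,
{
    let mut attempt = 1;
    loop {
        match op() {
            Ok(v) => return Ok(v),
            Err(e) if attempt >= max_attempts => return Err(e),
            Err(_) => {
                attempt += 1;
                sleep(delay); // back off before trying again
            }
        }
    }
}

fn main() {
    let mut calls = 0;
    // A fake "submit tx" that fails twice before succeeding.
    let result = retry(
        || {
            calls += 1;
            if calls < 3 { Err("tx failed") } else { Ok("tx committed") }
        },
        5,
        Duration::from_millis(100),
    );
    println!("result after {calls} call(s): {result:?}");
}
```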
For connections, the A->B thread may need to update the client on A. And of course there would be races with other relayers.
Trying to clarify my own thoughts; hope this will help others as well. Maybe the most important thing worth understanding is that we are discussing two orthogonal problems here:
Takeaways so far:
When it comes to moving forward, some simple questions that could guide us are:
I must apologize here for not providing a more precise recommendation that builds on the momentum here. After reading the ADR a few times and following along with the discussion here, I still find it very difficult to understand the underlying motivation for the suggested approach.

It is very difficult to validate a concurrency design without identifying the bottlenecks and mapping a proposed design onto them. Specifically, if our bottlenecks are IO and signature verification, those are the bottlenecks that need to be saturated for an "optimal" solution. However, I suspect those bottlenecks are not equally important; it could be that IO dominates performance before signature verification becomes a bottleneck. And so if we were to introduce progressive levels of performance (which I suspect we should), we might want to do IO concurrently first, before we worry about signature verification.

Another component that I would look for in validating a concurrency design is a proposal of how it will make complex scenarios testable. I think it makes sense to start with a tested synchronous implementation to move with confidence towards a more performant implementation, and ideally the test harness/cases would still apply. It isn't clear to me how any of the above would be tested, and so, to be frank, it's not really possible for me to consider it a good idea until that bit is clear.

So my regretfully broad recommendation here is: identify the bottlenecks, and at least form a credible hypothesis over their interactions along with a solid test regimen. Then frame the proposal around how it would solve those problems, because that would provide us with the confidence we need to evaluate and make different tradeoffs.
This is superseded by #326.
Summary
Describe and implement the relayer concurrency architecture for the first release of the relayer.
Problem Definition
There are a few potential threads required by the event-based relayer: global, per chain, and/or per relaying path. In general, congestion or failure in the communication with one chain should not affect the communication with other chains or with unrelated paths.
The issue should be addressed with a minimal proposal; improvements will be added in the next releases.
Proposal
Here is one proposal; the final decision and implementation may look different.
- `config executor` - a single thread that runs upon start and triggers initiation messages for objects included in the config file. These are disjoint from the chain IBC events, something like `ConnOpenStart` or `ChannOpenStart`, that could potentially cause the relayer to build and send `ConnOpenInit` or `ChannOpenInit` messages to the chain. It should work even if these events are received by the `event handler` at the same time as the live chain IBC events. In other words, no synchronization with the start of other threads should be required.
- `chain monitor` - one thread per chain that registers with the chain and forwards the notification messages to the `event handler`. Currently the relayer registers for `Tx` and `Block` notifications. It then extracts the IBC events from the `Tx` and also generates a "NewBlock" event for the block. Note that a notification may include multiple IBC events, and that they arrive before the `NewBlock` event.
- `chain querier` - these should be short-lived threads that block on ABCI queries and send the response back to the event handler. Or maybe this is a per-chain thread in the first release? The event relayer should not block forever on a full channel to this thread.
- `chain light client` - one thread per chain that receives relayer requests, downloads and verifies headers, and returns the minimal header set in the response. In addition, this thread should ensure the latest locally stored state does not expire, i.e. when some timer fires it fetches and verifies the latest header. The timeout should be a fraction of the trusting period, and the timer should be reset by the on-demand relayer requests.
- `chain tx` - a per-chain thread that submits IBC messages and packets to the chain. Some sort of confirmation is expected, but we still need to clarify whether and how the relayer acts on it, waits for an IBC event as confirmation, or times out.
- `relayer thread` - one global thread that includes the:
  - `event handler` that:
    - creates the `message builder` objects
    - dispatches incoming events to the appropriate `message builder`
    - collects the requests from the `message builder`s and forwards them to the appropriate querier, light client or tx threads
  - `message builder` - runs within the same context as the event handler. There is one per builder object, i.e. instances of connections, channels and packets. It implements an FSM that triggers new queries as required and deals with "conflicting" events caused by race conditions. Note: the CPU load is minimal and will probably be fine even when we add the (succinct) proof verification later. The IO performed by the `event handler` is also minimal, as it involves only inter-thread communication... again, if there are no blocking channels. Otherwise we revise.
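To tie the proposal together, here is a rough, heavily simplified sketch of how the `event handler`, the `message builder`s, and the per-chain querier/light client/tx threads could be wired with channels; all types, messages and the builder logic here are placeholders, not the actual design:

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

// Placeholder notification and request types.
enum Event {
    Ibc { object_id: String, data: String },
    NewBlock { chain: String, height: u64 },
}
enum Request { Query(String), LightClient(String), Tx(String) }

struct MessageBuilder { object_id: String }

impl MessageBuilder {
    // FSM step: consume an event, possibly emit requests for the other threads.
    fn handle(&mut self, data: String) -> Vec<Request> {
        vec![
            Request::Query(format!("state of {}", self.object_id)),
            Request::LightClient(format!("headers for {}", self.object_id)),
            Request::Tx(format!("msgs for {data}")),
        ]
    }
}

// The single relayer/event-handler thread: reads notifications from the chain
// monitor threads, routes them to per-object message builders, and forwards the
// builders' requests to the querier / light client / tx threads.
fn event_handler(events: Receiver<Event>, querier: Sender<Request>, lc: Sender<Request>, tx: Sender<Request>) {
    let mut builders: HashMap<String, MessageBuilder> = HashMap::new();
    for event in events {
        match event {
            Event::Ibc { object_id, data } => {
                let builder = builders
                    .entry(object_id.clone())
                    .or_insert(MessageBuilder { object_id });
                for req in builder.handle(data) {
                    match req {
                        Request::Query(_) => { let _ = querier.send(req); }
                        Request::LightClient(_) => { let _ = lc.send(req); }
                        Request::Tx(_) => { let _ = tx.send(req); }
                    }
                }
            }
            Event::NewBlock { chain, height } => {
                println!("new block on {chain} at height {height}");
            }
        }
    }
}

fn main() {
    let (ev_tx, ev_rx) = channel();
    let (q_tx, _q_rx) = channel();
    let (lc_tx, _lc_rx) = channel();
    let (tx_tx, tx_rx) = channel();

    ev_tx.send(Event::Ibc { object_id: "channel-0".into(), data: "SendPacket".into() }).unwrap();
    ev_tx.send(Event::NewBlock { chain: "A".into(), height: 100 }).unwrap();
    drop(ev_tx);

    event_handler(ev_rx, q_tx, lc_tx, tx_tx);
    while let Ok(Request::Tx(msg)) = tx_rx.try_recv() {
        println!("tx thread would submit: {msg}");
    }
}
```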