Hermes relaying modes: push versus pull relaying #2850

Closed
2 of 12 tasks
adizere opened this issue Nov 9, 2022 · 4 comments · Fixed by #3323
Assignees
Labels
I: configuration Internal: related to Hermes configuration O: new-feature Objective: cause to add a new feature or support O: usability Objective: cause to improve the user experience (UX) and ease using the product
Milestone

Comments

@adizere
Member

adizere commented Nov 9, 2022

Summary of problem

Some Hermes users are running the relayer with clear_interval = 1. This effectively means that the relayer does not rely on the WebSocket (WS) and instead scans each chain every block, relaying packets through the periodic packet clearing functionality.
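
As a rough illustration of why clear_interval = 1 amounts to scanning on every block, here is a minimal Rust sketch. It assumes the interval is counted in blocks and that 0 disables periodic clearing; `should_clear` is a hypothetical helper, not a function taken from the Hermes codebase.

```rust
/// Decide whether periodic packet clearing should run at `height`.
/// Assumption: `clear_interval` is measured in blocks and 0 means "disabled".
pub fn should_clear(height: u64, clear_interval: u64) -> bool {
    clear_interval != 0 && height % clear_interval == 0
}
```

With an interval of 1 the check passes at every height, so clearing runs on each new block regardless of what the WebSocket delivers.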

In general, it's not good to allow users to run Hermes in ways that are not properly documented and tested, or in environments for which we didn't design it.

I learned about this use-case from a user who was transferring NFTs across networks. They could not rely on the WebSocket functionality because the NFTs were sent from a WASM contract, which does not emit events tagged with the module = ibc attribute, hence Hermes does not catch those events. A similar problem was reported by this other user (#2809), who is also using WASM contracts for triggering transfers.
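
To make the failure mode concrete, here is a hedged sketch of an attribute-based filter. The `Event` type, the `is_tagged_ibc` and `push_filter` helpers, and the exact attribute value are assumptions made for illustration; they do not reproduce Hermes's actual event handling.

```rust
/// A simplified chain event: a type plus key/value attributes.
/// Hypothetical type for illustration only.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Event {
    pub kind: String,
    pub attributes: Vec<(String, String)>,
}

impl Event {
    pub fn new(kind: &str, attributes: &[(&str, &str)]) -> Self {
        Event {
            kind: kind.to_string(),
            attributes: attributes
                .iter()
                .map(|(k, v)| (k.to_string(), v.to_string()))
                .collect(),
        }
    }
}

/// True only if the event carries the attribute `module = ibc`.
pub fn is_tagged_ibc(event: &Event) -> bool {
    event
        .attributes
        .iter()
        .any(|(k, v)| k == "module" && v == "ibc")
}

/// Keep only the events a push-style subscription would see
/// under the assumed `module = ibc` filter.
pub fn push_filter(events: &[Event]) -> Vec<&Event> {
    events.iter().filter(|e| is_tagged_ibc(e)).collect()
}
```

Under this assumption, an event emitted on behalf of a contract without the `module = ibc` tag is silently dropped by the filter, and only a periodic scan of chain state would find the pending packet.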

Proposal

I think we should consider introducing a native feature that allows Hermes to bypass the Websocket and run in a "periodic packet clearing mode" only. For the moment, let's distinguish push relaying from pull relaying (TODO: figure out more appropriate names).

  • push: the current start functionality
  • pull: a more basic packet relaying mode that skips the WS connection
    • one way to think of pull relaying is that it's like batch processing (versus online/push processing)

It's unclear what the pull mode would look like, eg:

  • new CLI?
  • keep the start CLI but add an option --periodic or something like that?
  • investigate also if anything important (client refresh, misbehaviour detection, channel workers) is not functioning correctly in the absence of the websocket, and clarify that pull mode has limited functionality
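
One way to picture the second option is a mode switch derived from the command line. The sketch below is purely illustrative: `--periodic` is the placeholder name floated above, not an existing Hermes flag, and `RelayMode` and `mode_from_args` are hypothetical names.

```rust
/// The two relaying modes discussed in this issue (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RelayMode {
    /// Current `start` behaviour: react to events pushed over the WebSocket.
    Push,
    /// Skip the WebSocket and rely on periodic packet clearing only.
    Pull,
}

impl RelayMode {
    /// Whether this mode needs a WebSocket connection to the chain.
    pub fn uses_websocket(self) -> bool {
        matches!(self, RelayMode::Push)
    }
}

/// Pick the mode from CLI arguments; `--periodic` is an assumed flag name.
pub fn mode_from_args<I, S>(args: I) -> RelayMode
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    if args.into_iter().any(|a| a.as_ref() == "--periodic") {
        RelayMode::Pull
    } else {
        RelayMode::Push
    }
}
```

Keeping push as the default would leave existing `start` invocations unchanged, while the pull variant makes the reduced functionality an explicit opt-in.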

Next steps

Note the milestone for this is v1.4, so not an immediate priority.

This is just a meta issue to track some next steps and feasibility:

  • @adizere to gather more feedback from operators on the production use-case for pull-based mode
    • could this mode be useful beyond just relaying packets from WASM contracts?
    • is it problematic if the pull mode does not perform misbehaviour detection, or channel and connection handshakes?
    • is there any utility in keeping the current push-mode behavior? Likely there is utility in allowing relayers to quickly catch events and relay (i.e., optimize for latency). In the context of relayer fees & MEV, latency is critical.
    • pull mode is clearly useful for relayers that act as backups (on channels that have a lot of redundancy)
  • @soareschen & @ljoss17 and @seanchen1991 to consider the implications and feasibility of a pull-based design in the context of new architecture
    • is the pull mode possible with v1.5 / v2 ?
  • @romac and @ancazamfir to consider the implications of a pull mode in the context of IBC node design
  • @ljoss17 to consider the implications for the integration testing framework: the framework would need to support testing of the pull-based mode
  • @seanchen1991 to add a section to the Hermes guide detailing the tradeoffs between push and pull methodologies and why operators might choose one over the other

Related issues


For Admin Use

  • Not duplicate issue
  • Appropriate labels applied
  • Appropriate milestone (priority) applied
  • Appropriate contributors tagged
  • Contributor assigned/self-assigned
@adizere adizere added this to the v1.4 milestone Nov 9, 2022
@adizere adizere added O: new-feature Objective: cause to add a new feature or support O: usability Objective: cause to improve the user experience (UX) and ease using the product I: configuration Internal: related to Hermes configuration labels Nov 9, 2022
@soareschen
Contributor

It sounds like what you want is poll-based relaying, as opposed to the current push-based (event-based) relaying. For the new relayer design, this should just be a matter of switching from a push-based event source to a poll-based event source. Otherwise, we could implement multiple event source implementations and allow relaying based on contract-specific events.

For the current v1 relayer, we could refactor the init_subscriptions function so that it can instead spawn background tasks that periodically poll the chain state and send the collected events to the MPSC channel. Alternatively, we could spawn a different task that doesn't take any event source and only periodically calls handle_clear_packet to clear the packets.
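
A minimal sketch of the first alternative, using only the Rust standard library. The `ChainPoller` trait and `spawn_poll_source` function are hypothetical stand-ins, events are modelled as plain strings, and none of this mirrors the real init_subscriptions code.

```rust
use std::sync::mpsc::{self, Receiver};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical stand-in for "query the chain state and collect new events".
pub trait ChainPoller: Send + 'static {
    fn poll(&mut self) -> Vec<String>;
}

/// Spawn a background thread that polls every `interval` and forwards each
/// batch over an MPSC channel. Empty batches are forwarded too, so that a
/// dropped receiver is noticed on the next iteration and the thread exits.
pub fn spawn_poll_source<P: ChainPoller>(
    mut poller: P,
    interval: Duration,
) -> (Receiver<Vec<String>>, JoinHandle<()>) {
    let (tx, rx) = mpsc::channel();
    let handle = thread::spawn(move || loop {
        let batch = poller.poll();
        if tx.send(batch).is_err() {
            break; // the consumer is gone: stop polling
        }
        thread::sleep(interval);
    });
    (rx, handle)
}
```

The consumer side stays the same as with a WebSocket source: it reads batches from the receiver, which is what would let the rest of the relayer remain unaware of where events come from.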

@adizere adizere changed the title Hermes relaying modes: online versus offline relaying Hermes relaying modes: push versus pull relaying Nov 11, 2022
@adizere
Member Author

adizere commented Nov 11, 2022

Two examples of CW IBC that users provided where the Websocket functionality doesn't work:

@kaisbaccour

kaisbaccour commented Apr 15, 2023

I shared a tweet here, and it turned out that the suggestion is already being discussed, so I would like to add my motivation.

  • Not many node runners share their WS endpoint publicly, and if I just want to test things and relay a couple of packets, I don't want to run two nodes on two chains (because I don't have the HW resources, or because it just takes a lot of time).
  • This point is a suspicion that I am not sure about. I am trying to relay a randomness beacon from Nois Network over IBC to other chains. The IBC protocol is a custom nois-v6 and the channel operates on a wasm port. I feel like Hermes might only be relaying when there is a clearing event (but I am not sure about this). Running ts-relayer, which doesn't support WS and just pulls periodically, seems to perform better in this case. The performance issue is only a suspicion on my side.
    This is a dashboard where we request randomness from uni-6 to nois every minute, and this is the time it takes: https://monitoring.noislabs.com/d/tuD0SeYVk/nois-performance?orgId=1&from=1681411658387&to=1681450908145 Notice that at some point it got fast and consistent. That's when I started ts-relayer on my local Mac, pulling periodically every 3 seconds. Then I turned the ts-relayer off and it became slow again (maybe Nois relayers, who mainly run Hermes, set the clear interval to a high value).
    I think covering pull-based relaying helps achieve two things:
  1. adds a good combination for all cases where the push fails (until there's an easy way to customise the push events; maybe the edge cases of push misses are not that rare, I don't know)
  2. allows for inefficient but easier-to-set-up relaying.

@romac
Member

romac commented Apr 17, 2023

@kaisbaccour Thanks for sharing your experience with us! This all makes sense! We are tracking the issue with wasm events in #3190 and #2809. I am actually working on switching to a poll-based model which eschews the WebSocket stream, which should solve the problem you are seeing: #3209

Projects
Status: ✅ Done
5 participants