
Hermes caching layer MVP #1908

Closed
4 of 6 tasks
adizere opened this issue Feb 22, 2022 · 4 comments · Fixed by #1932
Labels
I: logic (Internal: related to the relaying logic), O: usability (Objective: cause to improve the user experience (UX) and ease using the product)
Milestone
v0.12.1

Comments

@adizere
Member

adizere commented Feb 22, 2022

Crate

ibc-relayer

Summary

Add general support for a caching layer in Hermes.

Problem Definition

The motivation for the caching layer is to reduce pressure on full nodes.

Proposal & Acceptance Criteria

  • Hermes should cache the static data it queries from a chain and uses most frequently (a minimal sketch follows this list)
    • using the profiling feature and the telemetry data may help track down the most frequent queries
    • note that there is already a minimal cache implemented, which stores connection, channel, and client identifier data used to filter by client trust threshold
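
A minimal sketch of the TTL-based approach this proposes, assuming a hand-rolled map keyed by query identifier; the `TtlCache` type and `get_or_try_insert_with` method are illustrative names, not the actual ibc-relayer implementation (which landed via #1932 and may be structured differently):

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::time::{Duration, Instant};

/// A minimal TTL cache: entries are served from memory until they expire,
/// after which the caller falls back to querying the full node again.
struct TtlCache<K, V> {
    ttl: Duration,
    entries: HashMap<K, (Instant, V)>,
}

impl<K: Eq + Hash, V: Clone> TtlCache<K, V> {
    fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// Return the cached value if it has not yet expired.
    fn get(&self, key: &K) -> Option<V> {
        self.entries.get(key).and_then(|(inserted, value)| {
            if inserted.elapsed() < self.ttl {
                Some(value.clone())
            } else {
                None
            }
        })
    }

    /// Insert or refresh an entry, resetting its TTL.
    fn insert(&mut self, key: K, value: V) {
        self.entries.insert(key, (Instant::now(), value));
    }

    /// Serve from the cache, or run the chain query and cache its result.
    fn get_or_try_insert_with<E>(
        &mut self,
        key: K,
        query: impl FnOnce() -> Result<V, E>,
    ) -> Result<V, E> {
        if let Some(hit) = self.get(&key) {
            return Ok(hit);
        }
        let value = query()?;
        self.insert(key, value.clone());
        Ok(value)
    }
}
```

A chain endpoint could then wrap its queries with something like `cache.get_or_try_insert_with(channel_id, || query_channel_from_node(...))` (hypothetical call), so that only cache misses hit the full node.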

For Admin Use

  • Not duplicate issue
  • Appropriate labels applied
  • Appropriate milestone (priority) applied
  • Appropriate contributors tagged
  • Contributor assigned/self-assigned
@adizere added the I: logic, O: usability, and P-high labels on Feb 22, 2022
@adizere added this to the v0.12.1 milestone on Feb 22, 2022
@ancazamfir
Collaborator

Great work in a short time! Just one observation; maybe you have discussed this already. I think we also need to maintain/update the cache via the IBC events.
One problem now for example is that I can create a connection in Init-Try state, start the relayer with channel mode enabled, and then separately finish the connection handshake. In this case the relayer will not relay on channel events over that connection because it thinks the connection is not open. This will not be fixed by the connection TTL, as I don't think we have the equivalent of clear packets at channel level (i.e. we don't rescan for unfinished channel handshakes).
The other case is a stale channel that Hermes thinks is open but that has been closed.

I see that the TTL for the client state and latest height is relatively small, so the stale cache is probably less noticeable. But we should consider fixing this as well.

@adizere
Member Author

adizere commented Mar 3, 2022

I think we also need to maintain/update the cache via the IBC events.

We were debating this yesterday, but the architectural implications of making the cache accessible from the websocket (supervisor code) were not clear. It may actually be easy to pass the cache (after it is created, somewhere here) to the supervisor, but we haven't tried. The idea we toyed with was to automatically update the ClientState.latest_height from the websocket upon detecting ClientUpdate events, but there was no time to work it through.

One problem now for example

Is this a problem that was introduced with PR #1932? I think we only cache channels and connections once they are in Open state. I don't fully understand the scenario yet.

The other case is a stale channel that Hermes thinks is open but that has been closed.

We debated the implications of this problem and reduced the TTL from 10 min to 1 min for channel ends. We decided that the cost of Hermes retrying packet submission on that (closed) channel for up to a minute before it finally detects the channel closure is a decent tradeoff for the moment.

@ancazamfir
Collaborator

We were debating this yesterday, but the architectural implications of making the cache accessible from the websocket (supervisor code) were not clear. It may actually be easy to pass the cache (after it is created, somewhere here) to the supervisor, but we haven't tried. The idea we toyed with was to automatically update the ClientState.latest_height from the websocket upon detecting ClientUpdate events, but there was no time to work it through.

Yes, I think we can get the chain latest_height from NewBlock and ClientState.latest_height from ClientUpdate.
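
A rough sketch of that event-to-cache mapping, with simplified stand-in types rather than the relayer's actual `IbcEvent` variants or cache structures, just to make the proposed data flow concrete:

```rust
use std::collections::HashMap;

// Simplified stand-ins; the relayer's real event and cache types differ.
// This only sketches how the supervisor's websocket events could keep
// cached heights in sync without extra queries to the full node.
type ClientId = String;
type Height = u64;

enum Event {
    NewBlock { height: Height },
    ClientUpdate { client_id: ClientId, consensus_height: Height },
    Other,
}

#[derive(Default)]
struct HeightCache {
    /// Latest height of the chain itself, refreshed from `NewBlock`.
    chain_latest_height: Option<Height>,
    /// `ClientState.latest_height` per client, refreshed from `ClientUpdate`.
    client_latest_heights: HashMap<ClientId, Height>,
}

impl HeightCache {
    /// Apply one websocket event to the cache.
    fn apply(&mut self, event: &Event) {
        match event {
            Event::NewBlock { height } => {
                self.chain_latest_height = Some(*height);
            }
            Event::ClientUpdate { client_id, consensus_height } => {
                self.client_latest_heights
                    .insert(client_id.clone(), *consensus_height);
            }
            Event::Other => {}
        }
    }
}
```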

Is this a problem that was introduced with PR #1932? I think we only cache channels and connections once they are in Open state. I don't fully understand the scenario yet.

You are right. I just tested with an older gaia version and it's fine. I was hitting cosmos/ibc-go#601, which is not included in gaia v6.0.3.

The other case is a stale channel that Hermes thinks is open but that has been closed.

We debated the implications of this problem and reduced the TTL from 10 min to 1 min for channel ends. We decided that the cost of Hermes retrying packet submission on that (closed) channel for up to a minute before it finally detects the channel closure is a decent tradeoff for the moment.

Agreed, for the short term it's OK. My question was about the long term. We get the IBC events anyway and may as well use them for cache updates. We could reduce the queries to proofs only. The drawback is that we need to deal with the websocket reconnects.

@romac
Member

romac commented Mar 4, 2022

We get the IBC events anyway and may as well use them for cache updates. We could reduce the queries to proofs only. The drawback is that we need to deal with the websocket reconnects.

Yeah, maybe we can invalidate the cache when we get disconnected.
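
A minimal sketch of what that could look like, assuming hypothetical cache types (not the relayer's actual API): on a websocket disconnect, drop every entry that event-driven updates were keeping fresh, and let subsequent lookups repopulate from fresh queries.

```rust
use std::collections::HashMap;

/// Placeholder payload; stands in for whatever the cache stores per channel.
#[derive(Default, Clone)]
struct ChannelEnd;

/// Stand-ins for the per-chain caches discussed above (channel ends,
/// client heights, latest chain height); not the actual ibc-relayer types.
#[derive(Default)]
struct ChainCaches {
    channel_ends: HashMap<String, ChannelEnd>,
    client_heights: HashMap<String, u64>,
    latest_height: Option<u64>,
}

impl ChainCaches {
    /// Called when the websocket to the node drops: while disconnected we may
    /// have missed the IBC events that were keeping these entries in sync, so
    /// the safe option is to discard everything and let the next lookups
    /// repopulate the caches from fresh queries against the full node.
    fn invalidate_on_disconnect(&mut self) {
        self.channel_ends.clear();
        self.client_heights.clear();
        self.latest_height = None;
    }
}
```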
