Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

autopilot fallback address #1039

Merged
merged 34 commits into from
Jan 11, 2024
Merged

autopilot fallback address #1039

merged 34 commits into from
Jan 11, 2024

Conversation

sampocs
Copy link
Collaborator

@sampocs sampocs commented Dec 28, 2023

Closes: #XXX

Context and purpose of the change

We now use a hashed address as the sender of the outbound transfer during autopilot liquid stake and forward. However, we need to gracefully hand ack failures and timeouts.

In the event of an ack failure, we should send the tokens to a fallback address (which will be the original receiver address of the autopilot memo). In the event of a timeout, we just retry infinitely.

Brief Changelog

  • Added keeper functions to store the fallback address
  • Added OnAckPacket for autopilot that will handle the bank send to the fallback address during an ack failure
  • Added OnTimeoutPacket for autopilot that will handle the infinite retries
  • Added relevant unit tests

Testing

Timeout

  • Hard coded the timeout timestamp in the original transfer submission code to be 1 second pass the block time (to force a timeout)
  • Started dockernet and sent an autopilot LS and forward tx
  • Confirmed via logging that the original packet timed out and the retry was submitted
  • Confirmed via balance checks that the retry landed

Ack Error

  • Started dockernet and sent an autopilot tx with an invalid IBC reciever address (to force an ack error)
  • Confirmed via balance queries that
    • (1) uatom left the gaia address
    • (2)stuatom appeared in escrow address (after the forwarding transfer was submitted)
    • (3) the stuatom moved from the escrow address to the fallback address (during the ack error handling)

@sampocs sampocs marked this pull request as ready for review December 29, 2023 01:12
x/autopilot/types/keys.go Show resolved Hide resolved
x/autopilot/keeper/ibc.go Outdated Show resolved Hide resolved
Copy link
Contributor

@asalzmann asalzmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have two main comments

  • why not use icacallbacks to bank send to the fallback address?
  • on second thought, the infinite retry approach seems bad because it could lead to full stride blocks (with a queue of messages that is impossible to process and relayers running out of funds)

x/autopilot/keeper/ibc.go Show resolved Hide resolved
x/autopilot/keeper/fallback.go Show resolved Hide resolved
x/autopilot/module_ibc.go Show resolved Hide resolved
x/autopilot/keeper/ibc.go Outdated Show resolved Hide resolved
Copy link
Contributor

@riley-stride riley-stride left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed with particular attention to ibc.go and fallback.go.

Looking very good. Thanks for the thorough comments and replies on the PR threads, they helped with the review.

The main thing I'd want to see to get more confidence is some integration tests that test the various fallback address code paths. Two cases that come to mind: (1) LS&forward + ack failure => check tokens landed in fallback addr (2) LS&forward + timeout => check tokens are retrying infinitely (not sure how to check this, maybe see the balances oscillating at the freq of the retry period). Of these, (1) seems by far most important to integration test.

x/autopilot/keeper/fallback.go Show resolved Hide resolved
x/autopilot/keeper/fallback.go Outdated Show resolved Hide resolved
x/autopilot/keeper/fallback.go Show resolved Hide resolved
x/autopilot/keeper/ibc.go Show resolved Hide resolved
@sampocs
Copy link
Collaborator Author

sampocs commented Jan 6, 2024

@asalzmann @riley-stride

I took a very rough first pass at using callbacks in #1047; however, it's far from functional right now.

You can see the PR description for details, but the TLDR is it seems like it would be a pretty big effort to add callbacks to autopilot since it's in the transfer stack.

The PR should help illustrate the trade off of what the solution would look like with callbacks vs how it's implemented in this PR. ​​_​I'm going to pause on this for now until you get back to me on which approach you'd prefer​_​​.

My two cents atm is that the callbacks approach does seem to maybe clarify things a tad, but it's mostly just a swapping of keeper boilerplate for callbacks boilerplate, and I'm not sure it simplifies things enough to justify the effort required to overcome the challenge described above.

Also, the PR description is probably quite confusing, but if any of you are planning to work on this this weekend, I can put a loom together for ya to help clarify.

@riley-stride
Copy link
Contributor

riley-stride commented Jan 7, 2024

@asalzmann @riley-stride

I took a very rough first pass at using callbacks in #1047; however, it's far from functional right now.

You can see the PR description for details, but the TLDR is it seems like it would be a pretty big effort to add callbacks to autopilot since it's in the transfer stack.

The PR should help illustrate the trade off of what the solution would look like with callbacks vs how it's implemented in this PR. ​​_​_​I'm going to pause on this for now until you get back to me on which approach you'd prefer​_​_​​.

My two cents atm is that the callbacks approach does seem to maybe clarify things a tad, but it's mostly just a swapping of keeper boilerplate for callbacks boilerplate, and I'm not sure it simplifies things enough to justify the effort required to overcome the challenge described above.

Also, the PR description is probably quite confusing, but if any of you are planning to work on this this weekend, I can put a loom together for ya to help clarify.

Thanks for the loom, that was very helpful.

I lean towards keeping this approach for now, and potentially refactoring in a future upgrade.

I'm a bit further from the middleware stack so can't speak on this confidently, but seems like sifting through the assumptions baked into the middleware stack (outlined in your video) to re-wire middleware stack three could be dangerous to do on a fast timeline.

It ​_does_​ however feel like the more technically correct solution, so long term we should probably move toward it.

Don't feel strongly though, @sampocs and @asalzmann's opinions should carry much more weight here as they both understand the middleware stack more deeply.

@sampocs
Copy link
Collaborator Author

sampocs commented Jan 9, 2024

Discussed offline and decided on the same as above. Callbacks approach seems too complex atm, but we can revist our mdidleware stack later

Copy link
Contributor

@riley-stride riley-stride left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Removing CheckAcknowledgementStatus and having OnTimeoutPacket send to the Fallback address simplifies the PR meaningfully imo.

Copy link
Contributor

@asalzmann asalzmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm! main changes I reviewed

  • use the icacallbacks ack parsing function
  • use a timeout and don't retry forwards

@sampocs sampocs changed the base branch from sam/autopilot-hash-sender-2 to main January 10, 2024 23:32
@sampocs sampocs added the A:automerge Automatically merge PR once checks pass label Jan 10, 2024
@mergify mergify bot merged commit b9be32b into main Jan 11, 2024
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A:automerge Automatically merge PR once checks pass C:app-wiring C:stakeibc
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants