Fix Hermes retrying mechanism not regenerating messages #1951
Based on manual testing, it looks like regenerating the operational data for the transaction for a pending tx hash stops indefinite retries/resubmissions:
Left some suggestions for improvements, nothing critical.
The approach is sound, testing it automatically is difficult so we'll do that in separate work. I will test this manually.
@ancazamfir ready for you to take a look when you have time.
I will test this more, but so far something seems wrong. I still have to look at the code. I have a local setup where I managed to reproduce the original problem. The sequence of steps is:
With master I can see the indefinite retries.
Ah, actually I had a non-zero (very high) clearing interval in the config. Once I set
What would happen if there are simulation or CheckTx failures? We don't have pending txs in this case. I think there is a separate short retry loop, but if that fails then we cannot clear those packets anymore. The other problem is the following test:
Here is a log summary:
@ancazamfir Thanks for digging into this! Is there any action to be taken based on your findings?
After discussing with @ancazamfir:
@seanchen1991 Could you please look into (1) whenever you have time (not urgent)? cf. "The other problem is the following test" in #1951 (comment).
Note that if
Not sure what we would mean by this.
Sorry, what I meant was that
Found what appears to be a pretty obvious bug in the following method:
  pub fn push_back(&self, val: T) {
-     self.0.acquire_write().push_front(val)
+     self.0.acquire_write().push_back(val)
  }
@ancazamfir I wasn't able to quite reproduce the log outputs you were seeing in order to corroborate that this fix indeed addresses the reversal of txs that you observed. Either you could pull these changes and re-trace the steps you took, or you could add some idiot-proof instructions for me to re-trace your steps on my end 🙂
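To illustrate why this one-word change matters, here is a minimal sketch of the bug. It is not the Hermes implementation: the `Queue` wrapper below is a hypothetical stand-in built on `std::sync::RwLock<VecDeque<T>>`, the standard `write().unwrap()` call stands in for the `acquire_write()` helper seen in the diff, and `push_back_buggy` and `drain_in_order` are names invented for the example.

```rust
use std::collections::VecDeque;
use std::sync::RwLock;

/// Hypothetical stand-in for a shared queue of pending items.
/// The real type and its locking helper are not shown in this PR page.
pub struct Queue<T>(RwLock<VecDeque<T>>);

impl<T> Queue<T> {
    pub fn new() -> Self {
        Queue(RwLock::new(VecDeque::new()))
    }

    /// Pre-fix behaviour: despite the name, the value goes to the head,
    /// so items come out in the reverse of their submission order.
    pub fn push_back_buggy(&self, val: T) {
        self.0.write().unwrap().push_front(val)
    }

    /// Post-fix behaviour: the value goes to the tail, preserving FIFO order.
    pub fn push_back(&self, val: T) {
        self.0.write().unwrap().push_back(val)
    }

    /// Empties the queue from front to back (helper for the example only).
    pub fn drain_in_order(&self) -> Vec<T> {
        self.0.write().unwrap().drain(..).collect()
    }
}
```

Pushing `tx1`, `tx2`, `tx3` through `push_back_buggy` and draining yields them in reverse, which would be consistent with the reversal of txs described above; the same sequence through `push_back` drains in submission order.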
Just tested and 74e7689 fixed it! Thanks! The setup is pretty rough: it requires a custom gaia image that includes a hack in tendermint block execution, plus some patches in hermes to deal with the app height vs blockchain height issue. Then poking around with the timeout in the
Seems like 1 and 2 are done. @romac @ancazamfir: Should we block this PR on point 4, or shall we move forward here?
Point 4 can wait, given that there are operators using tx confirmations. Let's move forward!
lgtm, thanks @seanchen1991!!
Looks good. Unless Romain or Anca want to add anything, I think this is ready. Great work Sean!
…ems#1951)

* Sketching out refactor
* Only relay operational data if `clear_interval` == 0
* Back to making changes to `process_pending`
* Pass `clear_interval` parameter to `process_pending` fn
* Add RelayPath parameter to `process_pending` fn
* Make `regenerate_operational_data` private.
* Call `process_pending` such that operational data can now be regenerated
* Fix clippy warnings
* Remove unnecessary comment
* Add changelog entry
* Update doc comment for `regenerate_operational_data` method
* Replace `clear_interval` param with `do_resubmit` in fn signatures
* Improve doc comments for the `process_pending` fn
* Introduce `Resubmit` type instead of boolean
* Document the interaction between `clear_interval` and `tx_confirmation` parameters
* Fix incorrect comment
* Switch from if on `Resubmit` to match
* Fix Queue::push_back method

Co-authored-by: Romain Ruetschi <romain@informal.systems>
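The commit list above mentions replacing a boolean with a `Resubmit` type derived from `clear_interval` and matching on it. A rough sketch of that pattern follows. Only the name `Resubmit` and the `clear_interval == 0` condition come from the commit messages; the variant names, `from_clear_interval`, the `Action` enum, and `on_timeout` are assumptions made up for illustration and may not match the merged code.

```rust
/// Whether a timed-out pending transaction should be resubmitted.
/// Variant names are assumed; the commit list only names the type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Resubmit {
    Yes,
    No,
}

impl Resubmit {
    /// Assumed constructor: resubmit only when periodic clearing is
    /// disabled (`clear_interval == 0`), per the commit
    /// "Only relay operational data if `clear_interval` == 0".
    pub fn from_clear_interval(clear_interval: u64) -> Self {
        if clear_interval == 0 {
            Resubmit::Yes
        } else {
            Resubmit::No
        }
    }
}

/// Hypothetical outcome for a pending tx whose confirmation timed out.
#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    /// Regenerate the operational data and submit it again.
    Regenerate,
    /// Give up on this tx and leave the packets to periodic clearing.
    Drop,
}

/// Chooses the action with a `match` rather than an `if` on a boolean,
/// so adding a variant later forces every call site to be revisited.
pub fn on_timeout(resubmit: Resubmit) -> Action {
    match resubmit {
        Resubmit::Yes => Action::Regenerate,
        Resubmit::No => Action::Drop,
    }
}
```

A dedicated enum keeps call sites readable (`Resubmit::Yes` instead of a bare `true`), which is presumably the motivation for the change from `do_resubmit`.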
Closes: #1792 & #2074