htlcswitch: send an error before marking a channel w/ local data loss #7312

Roasbeef · 2023-01-12T01:08:51Z

Change Description

In this commit, we move to also send an error before we "fail" our channel when we realize that either we've lost data, or are recovering from some partial state.

This is the first half of the set of changes needed to fix: #6017. To start with, we'll ensure that the new lnd nodes always send the error once the receive a chan reest and determine that they're behind. In the second half of the fix, we'll start to hold off on force closing until the other party sends an error. We opt to do this in two parts as: there're many lnd nodes out there that still rely on the current behavior. Once we get another update cycle under our belt, we can start to catch up our behavior fully.

Steps to Test

Interop test with Eclair+CLN+LDK to ensure that the SCB flow still works.

The SCB flow in this PR should also still work with the existing lnd node versions as well.

Pull Request Checklist

Testing

Your PR passes all CI checks.
Tests covering the positive and negative (error paths) are included.
Bug fixes contain tests triggering the bug to prevent regressions.

Code Style and Documentation

The change obeys the Code Documentation and Commenting guidelines, and lines wrap at 80.
Commits follow the Ideal Git Commit Structure.
Any new logging statements use an appropriate subsystem and logging level.
There is a change description in the release notes, or [skip ci] in the commit message for small changes.

📝 Please see our Contribution Guidelines for further guidance.

Fixes #7301

In this commit, we move to also send an error before we "fail" our channel when we realize that either we've lost data, or are recovering from some partial state.

Crypt-iQ · 2023-01-13T15:15:31Z

htlcswitch/link.go

+				// continue.
+				//
+				// TODO(roasbeef): use reliable sender here?
+				l.sendFatalError()


I think this should be before the call to MarkDataLoss? I don't believe MarkBorked is called in peer/brontide.go for this since PermanentFailure will be false. It also seems like an Error is already sent since ErrRecoveryError is used in the call to l.fail(...) below. Is the issue here that the p2p connection may go down after MarkDataLoss, but before we eventually call ShouldSendToPeer in handleLinkFailure?

Yeah I actually had this initially, but the rationale was that by the time we send the Error they might have up already, which would cause some of the logic below to short circuit.

Is the issue here that the p2p connection may go down after MarkDataLoss, but before we eventually call ShouldSendToPeer in handleLinkFailure?

My understanding going into this change was that other implementations started to mostly ignore what we send for chan reest, even if it's all very invalid. So we now need to send the Error as that's the only thing they'll force close off of.

IIUC, on restart, we'll also repeat this loop all over again, w/ the MarkDataLoss then being a noop.

Tracing the pipeline a bit more, you're right that we'll eventually call handleLinkFailure in, which'll then fail since it's sending out ErrRecoveryError.

Roasbeef · 2023-01-13T20:41:37Z

Ok, based on the above comment, this PR is unnecessary: we always send the error when we detect that we're behind and go into SCB or recovery mode. So then an open question is: what behavioral change did the other implementations make that caused the SCB flow to no longer work for some users.

htlcswitch: send an error before marking a channel w/ local data loss

412f296

In this commit, we move to also send an error before we "fail" our channel when we realize that either we've lost data, or are recovering from some partial state.

Roasbeef added safety General label for issues/PRs related to the safety of using the software spec interop interop with other implementations SCB Related to static channel backup labels Jan 12, 2023

yyforyongyu self-requested a review January 12, 2023 11:44

Crypt-iQ reviewed Jan 13, 2023

View reviewed changes

Roasbeef closed this Jan 13, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

htlcswitch: send an error before marking a channel w/ local data loss #7312

htlcswitch: send an error before marking a channel w/ local data loss #7312

Roasbeef commented Jan 12, 2023 •

edited

Loading

Crypt-iQ Jan 13, 2023

Roasbeef Jan 13, 2023

Roasbeef Jan 13, 2023

Roasbeef commented Jan 13, 2023

htlcswitch: send an error before marking a channel w/ local data loss #7312

htlcswitch: send an error before marking a channel w/ local data loss #7312

Conversation

Roasbeef commented Jan 12, 2023 • edited Loading

Change Description

Steps to Test

Pull Request Checklist

Testing

Code Style and Documentation

Crypt-iQ Jan 13, 2023

Choose a reason for hiding this comment

Roasbeef Jan 13, 2023

Choose a reason for hiding this comment

Roasbeef Jan 13, 2023

Choose a reason for hiding this comment

Roasbeef commented Jan 13, 2023

Roasbeef commented Jan 12, 2023 •

edited

Loading