Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve splice_locked retransmission logic #1223

Open
t-bast opened this issue Jan 27, 2025 · 2 comments
Open

Improve splice_locked retransmission logic #1223

t-bast opened this issue Jan 27, 2025 · 2 comments

Comments

@t-bast
Copy link
Collaborator

t-bast commented Jan 27, 2025

The splice spec currently requires retransmitting splice_locked on reconnection if:

A receiving node:
  - MUST send `splice_locked` for the most recent splice transaction that reached
    acceptable depth, unless `splice_locked` was already sent for this transaction
    and `next_commitment_number` is strictly greater than the `commitment_number + 1`
    that was used when `splice_locked` was previously sent (which indicates that
    the previous `splice_locked` was received).

Those requirements are a bit hard to express and mean that we may be retransmitting splice_locked even though our peer has already received it. It also creates a slightly weird race condition if one side has not received splice_locked and tries using the channel immediately on reconnection, which we detail below. The goal of this issue is to improve the splice_locked retransmission requirements to get rid of those issues.

splice_locked / update_add_htlc reconnect race condition

We start with Alice and Bob having no pending splice, and thus the following initial active commitments:

+------------+
| FundingTx1 |
+------------+

Alice initiates a splice, but disconnects before Bob receives her splice_locked:

   Alice                           Bob
     | splice_init                  |
     |----------------------------->|
     |                   splice_ack |
     |<-----------------------------|
     |       <interactive-tx>       |
     |<---------------------------->|
     |                              |
     |      <splice confirms>       |
     |                              |
     | splice_locked                |
     |----------------------------->|
     |                splice_locked |
     |              X---------------|
     |                              |
     |       <disconnection>        |
     |                              |

At that point, the splice tx is fully locked for Bob, but not for Alice who is waiting for Bob's splice_locked.

Alice's active commitments are:

+------------+        +------------+
| FundingTx1 |------->| FundingTx2 |
+------------+        +------------+

Bob can forget the previous commitment since the splice is locked, and his active commitments are just:

+------------+
| FundingTx2 |
+------------+

On reconnection, Bob will re-send splice_locked, but if Alice wants to use the channel right after receiving channel_reestablish, there is a race condition:

   Alice                           Bob
     |                              |
     | channel_reestablish          |
     |----------------------------->|
     |          channel_reestablish |
     |<-----------------------------|
     | update_add_htlc              |
     |---------------X              |
     | commit_sig                   | batch_size=2, funding_tx=funding_tx_1
     |---------------X              |
     | commit_sig                   | batch_size=2, funding_tx=funding_tx_2
     |---------------X              |
     |                splice_locked |
     |              X---------------|
     | update_add_htlc              |
     |              --------------->|
     | commit_sig                   | IGNORED: unknown funding tx
     |              --------------->|
     | commit_sig                   |
     |              --------------->|
     |                splice_locked |
     |<---------------              |
     |               revoke_and_ack |
     |<-----------------------------|
     |                   commit_sig | batch_size=1, funding_tx=funding_tx_2
     |<-----------------------------|
     | revoke_and_ack               |
     |----------------------------->|

Now that Alice has received Bob's splice_locked, she can forget fundingTx1.
Bob can safely ignore Alice's commit_sig for fundingTx1, so this race condition is harmless.

But for taproot channels it's more complicated: Alice will not be able to create commit_sig because she will be missing nonces!
The main issue is that splice_locked retransmission isn't atomic with channel_reestablish, even though it conceptually should: how do we fix that?

Option 1: add current_splice_locked field to channel_reestablish

A simple option would be to add a current_splice_locked TLV in the channel_reestablish message containing the latest funding_txid for which:

  • we have received splice_locked
  • we have sent or are ready to send splice_locked

In the example above, Alice would have set it to funding_tx_1 while Bob would have set it to funding_tx_2.

When receiving Alice's channel_reestablish:

  • Bob would know that she hasn't received his splice_locked.
  • Bob would thus retransmit it before any other channel message.

When receiving Bob's channel_reestablish:

  • Alice would know that he received her splice_locked.
  • Alice would also know that Bob has sent splice_locked and will retransmit it immediately.
  • She could then immediately prune funding_tx_1 and only send commit_sig for funding_tx_2.

A nice side-effect of this change is that we only retransmit splice_locked when we know that our peer hasn't already received it.
This is simpler than the current requirement to retransmit until the commitment_number has increased (which implicitly tells you that your peer received your splice_locked).

Option 2: include splice_locked inside channel_reestablish

Another option would be to remove the retransmission of splice_locked entirely and instead include the whole splice_locked message inside channel_reestablish, making it atomic.
It works, but it feels hackish: why not do the same with channel_ready retransmission if we're doing it for splice_locked?
Also, including a whole message in another message's TLV field doesn't feel very elegant...but it guarantees that we atomically receive all the data we need ¯_(ツ)_/¯.

A drawback for this is that we will keep unnecessarily including this splice_locked until the commitment_number has increased, even when our peer has received it already.

Option 3: delay messages until splice_locked is received

Without taproot channels, this race condition is harmless, so we could decide to do nothing about it.

With taproot channels, Alice would detect in Bob's channel_reestablish that she is missing nonces for funding_tx_1.
She would then implicitly know that it means that Bob has sent splice_locked and will retransmit it.
She could then pause the channels and avoid sending update_add_htlc while she waits for this splice_locked message.

This sounds brittle, I'm not a big fan of this solution.

@t-bast
Copy link
Collaborator Author

t-bast commented Jan 27, 2025

@ddustin I'd like to discuss this during today's spec meeting if you can have a look at this issue!

t-bast added a commit to t-bast/bolts that referenced this issue Jan 28, 2025
If one side sent `splice_locked` and the other side is ready to send
its own `splice_locked` while they are disconnected, this creates a
race condition on reestablish because `splice_locked` is retransmitted
after `channel_reestablish`, and other channel updates can be inserted
by the other node before receiving `splice_locked`. This will be an
issue for taproot channels, because nonces will be missing.
This race condition is described in more details in lightning#1223.

We fix this race condition by adding a TLV to `channel_reestablish`
that provides information about the latest locked transaction. This
additional information also makes it easier to detect when we need
to retransmit our previous `splice_locked`.
@t-bast
Copy link
Collaborator Author

t-bast commented Jan 28, 2025

I have specified option 1 in 2c1b500, this simplifies the retransmission requirements and resolves this race condition, let me know what you think!

t-bast added a commit to t-bast/bolts that referenced this issue Feb 12, 2025
If one side sent `splice_locked` and the other side is ready to send
its own `splice_locked` while they are disconnected, this creates a
race condition on reestablish because `splice_locked` is retransmitted
after `channel_reestablish`, and other channel updates can be inserted
by the other node before receiving `splice_locked`. This will be an
issue for taproot channels, because nonces will be missing.
This race condition is described in more details in lightning#1223.

We fix this race condition by adding a TLV to `channel_reestablish`
that provides information about the latest locked transaction. This
additional information also makes it easier to detect when we need
to retransmit our previous `splice_locked`.
remyers added a commit to remyers/eclair that referenced this issue Feb 14, 2025
This TLV is relevant for splice funding txs only.

When the `my_current_funding_locked_txid` TLV attribute confirms the latest funding tx we prune previous funding transaction similar to receiving `splice_locked` from our peer for that txid.

When we receive `your_last_funding_locked_txid` that does not match our latest confirmed funding tx, then we know our peer did not receive our last `splice_locked` and retransmit it.

We also retransmit `splice_locked` after `channel_reestablish` if we have not received `announcement_signatures` for the latest confirmed funding tx. This is needed to prompt our peer to also retransmit their own `splice_locked` and `announcement_signatures`.

This ensures both sides will have exchanged `splice_locked` after a disconnect and will be relevant for simple taproot channels to exchange nonces.

Note: This changes previous behavior for retransmitting `splice_locked` after a disconnect. Previous behavior was susceptible to a race condition if one node sent a channel update after `channel_reestablish`, but before receiving `splice_locked` from a peer that had confirmed the latest funding tx while offline.

cf. lightning/bolts#1223
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant