Beat [3/4]: prepare resolvers to handle the blockbeat
#9276
base: yy-feature-blockbeat
Conversation
Minor point: it isn't inheritance, it's composition, with the inner resolver replacing the outer one. Go technically doesn't have inheritance at all.
I didn't review the first two PRs, so one thing I'm missing context on is: why does Resolve need to be split up in order to be incorporated into the new blockbeat architecture? Is it that, since we don't know which resolvers will be launched ahead of time, we can't set up the blockbeat consumer DAG upfront?
contractcourt/contract_resolver.go (Outdated)

	prefix)

	// If the ShortChanID is empty, we use the ChannelPoint instead.
	if r.ShortChanID.IsDefault() {
But the scid isn't used above?
yikes yeah removed since SCID isn't that useful for lookups
@@ -304,23 +247,14 @@ func (h *htlcSuccessResolver) resolveRemoteCommitOutput() (
		return nil, err
	}

	// Once the transaction has received a sufficient number of
	// confirmations, we'll mark ourselves as fully resolved and exit.
	h.resolved = true
Is this attribute no longer used? IIRC, it has a role for the subset of outputs that are still sent to the nursery.
yeah it's moved into checkpointClaim
@@ -373,6 +307,7 @@ func (h *htlcSuccessResolver) checkpointClaim(spendTx *chainhash.Hash,
	}

	// Finally, we checkpoint the resolver with our report(s).
	h.resolved = true
Ah nvm I see it here restored. Though IIRC, with the current control flow, we shouldn't do this unconditionally.
yeah, the slight difference is we put it in checkpointClaim, after we've done NotifyFinalHtlcEvent, and moved it closer to Checkpoint as it's where we write to disk.
		return err
	}

	h.outputIncubating = true
Same comment here re this state value. I do think it's possible to make the resolvers 100% stateless, by relying on chain notifications to figure out if something has been mined or not. The nursery might need to be queried distinctly though.
Yes! I also realized this after reviewing #9062. What's more, because the sweeper now handles re-offered inputs by first checking the db and mempool to grab the existing tx, it means it's safe to re-offer the same inputs during startup. And since deep down the nursery also uses the sweeper, it means we can easily make all the resolvers stateless.
Atm another usage of outputIncubating is to decide whether we should resolve the stage one or stage two HTLC, which can also be changed to rely on chain notifications. So the changes would be something like,
- subscribe to the spend of the htlc tx at the very beginning, when initializing the resolver
- when we Resolve it, if the htlc tx is already spent, subscribe to the spend of the htlc tx's output
  - if that output is spent, we are done here
  - otherwise, wait for the spend to resolve the output.
Basically we just outsource the info provided by outputIncubating to chain notifications, further simplifying the resolvers here. The only part I'm not sure about is the neutrino backend: since there's no mempool, it can only rely on the sweeper's store to decide whether an input is new or not, which means we may need to store more info there (currently only the txid).
I actually plan to do this after the blockbeat is in, to limit the scope. Rough idea is 1) refactor a bit to get rid of outputIncubating; 2) do a db migration to remove outputIncubating (maybe not needed?); 3) completely remove the nursery! WDYT?
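(For illustration only: a minimal sketch of the stateless flow described above, assuming a notifier shaped like lnd's chainntnfs.ChainNotifier and its RegisterSpendNtfn; the function and variable names are made up for the example, not code from this PR.)

```go
package contractcourt

import (
	"fmt"

	"github.com/btcsuite/btcd/wire"
	"github.com/lightningnetwork/lnd/chainntnfs"
)

// statelessStageCheck sketches how the stage-one/stage-two decision could be
// driven purely by spend notifications instead of a persisted
// outputIncubating flag.
func statelessStageCheck(notifier chainntnfs.ChainNotifier,
	htlcOutpoint wire.OutPoint, pkScript []byte, heightHint uint32) error {

	// Subscribe to the spend of the HTLC output on the commitment tx as
	// soon as the resolver is initialized.
	spendNtfn, err := notifier.RegisterSpendNtfn(
		&htlcOutpoint, pkScript, heightHint,
	)
	if err != nil {
		return err
	}

	// If the output was already spent (e.g. by the second-level tx), the
	// notifier replays the historical spend, so no on-disk flag is needed
	// to know we are in stage two.
	spend := <-spendNtfn.Spend
	fmt.Printf("HTLC output %v spent by %v, resolving stage two\n",
		htlcOutpoint, spend.SpenderTxHash)

	// Next (not shown): subscribe to the spend of the second-level output
	// and wait for it, mirroring the steps listed above.
	return nil
}
```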
Basically we just outsource the info provided by outputIncubating to chain notification, further simplify the resolvers here. The only part I'm not sure is for neutrino backend, since there's no mempool, it can only rely on the sweeper's store to decide whether an input is new or not, which means we may need to store more info there (currently only the txid).
does our SpendRegistration also consider mempool spends? I thought for this we implemented waitForMempoolOrBlockSpend, so I am not sure why the mempool is relevant here and how it could cause problems for neutrino.
why the mempool is relevant here and could cause problems for neutrino.
because of this,
Line 1212 in fa309c9
	state, rbfInfo := s.decideStateAndRBFInfo(input.input.OutPoint())
//
// NOTE: Part of the ContractResolver interface.
func (c *commitSweepResolver) IsResolved() bool {
	return c.resolved
I think we can move this method to the contract kit, that way we hide the underlying atomic details.
done
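For reference, a minimal sketch of what hiding the atomic behind the shared kit could look like; the field and method names (contractResolverKit, markResolved) follow the discussion above but are only assumptions about the final shape, not the merged code.

```go
package contractcourt

import "sync/atomic"

// contractResolverKit holds fields shared by all resolvers. Keeping the
// atomic here lets concrete resolvers use the accessors without touching
// sync/atomic directly.
type contractResolverKit struct {
	// resolved reflects whether the contract has been fully resolved.
	resolved atomic.Bool
}

// IsResolved returns true if the resolver has fully resolved its contract.
//
// NOTE: Part of the ContractResolver interface.
func (k *contractResolverKit) IsResolved() bool {
	return k.resolved.Load()
}

// markResolved flags the resolver as fully resolved.
func (k *contractResolverKit) markResolved() {
	k.resolved.Store(true)
}
```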
@@ -204,7 +204,7 @@ func (h *htlcIncomingContestResolver) Resolve() (ContractResolver, error) {
	log.Infof("%T(%v): HTLC has timed out (expiry=%v, height=%v), "+
		"abandoning", h, h.htlcResolution.ClaimOutpoint,
		h.htlcExpiry, currentHeight)
-	h.resolved = true
+	h.resolved.Store(true)
MarkResolved?
done
@@ -1294,13 +1294,13 @@ func (h *htlcTimeoutResolver) resolveTimeoutTxOutput(op wire.OutPoint) error {
// Launch creates an input based on the details of the outgoing htlc resolution
// and offers it to the sweeper.
func (h *htlcTimeoutResolver) Launch() error {
-	if h.launched {
+	if h.launched.Load() {
Same here re read/write methods to encapsulate.
yeah I like it, done!
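Extending the same sketch, the launched flag and the Launch guard might look roughly like this; the accessor names are assumed and the sweeper hand-off is elided.

```go
package contractcourt

import "sync/atomic"

// contractResolverKit (continued sketch): launched sits next to resolved so
// concrete resolvers never touch sync/atomic directly.
type contractResolverKit struct {
	resolved atomic.Bool
	launched atomic.Bool
}

func (k *contractResolverKit) isLaunched() bool { return k.launched.Load() }
func (k *contractResolverKit) markLaunched()    { k.launched.Store(true) }

// htlcTimeoutResolver is reduced to its kit for this sketch.
type htlcTimeoutResolver struct {
	contractResolverKit
}

// Launch offers the timeout output to the sweeper exactly once; repeated
// calls from later blockbeats become no-ops.
func (h *htlcTimeoutResolver) Launch() error {
	if h.isLaunched() {
		return nil
	}
	h.markLaunched()

	// ... build the sweep input and hand it to the sweeper ...
	return nil
}
```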
	c.activeResolversLock.Lock()
	c.activeResolvers = resolvers
	c.activeResolversLock.Unlock()

	// Launch all resolvers.
	c.launchResolvers()
In a recent PR, we eliminated a circular waiting condition related to tap channels (in the litd context). With this method now blocking to launch resolvers, we'll need to make sure we don't reintroduce such a livelock.
do we know why it was blocking? In this version, Launch only sends the input to the sweeper, and the waiting for the spend is handled in Resolve.
contractcourt/channel_arbitrator.go (Outdated)

	var reports []*ContractReport
-	for _, resolver := range c.activeResolvers {
+	for _, resolver := range c.resolvers() {
Where was the race? At a glance, everything looks to be mutex protected.
hmm, looks like it no longer exists due to the fix of resolved and launched; now removed.
			Budget: budget,

			// For second level success tx, there's no rush
			// to get it confirmed, so we use a nil
Nit: I think nil is not technically correct here; we use a None Option.
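Assuming the sweeper's Params carries the deadline as lnd's fn.Option (as the nit suggests), a sketch of the intended comment/value might read like this; the exact field names are assumptions:

```go
package contractcourt

import (
	"github.com/btcsuite/btcd/btcutil"
	"github.com/lightningnetwork/lnd/fn"
	"github.com/lightningnetwork/lnd/sweep"
)

// successSweepParams sketches the request for the second-level success
// output: "no rush" is expressed as a None Option for the deadline rather
// than a nil value.
func successSweepParams(budget btcutil.Amount) sweep.Params {
	return sweep.Params{
		Budget: budget,

		// No deadline for the second-level success tx; the sweeper
		// falls back to its default when the Option is None.
		DeadlineHeight: fn.None[int32](),
	}
}
```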
	err := h.DeliverResolutionMsg(ResolutionMsg{
		SourceChan: h.ShortChanID,
		HtlcIndex:  h.htlc.HtlcIndex,
		Failure:    failureMsg,
	})
	if err != nil {
		return err
hmm I think this can also be done where we resolve the remote commitment? Because this method is called checkpointClaim, which imo refers to the h.Checkpoint func?
don't we call this method in resolveRemoteCommitOutput?
yes, but tbh I do not understand why we are not calling DeliverResolutionMsg in resolveRemoteCommitOutput rather than doing this with an if clause here; to me it does not logically connect to the name checkpointClaim?
actually it can be placed there, let me try and see if the itests pass
Just my opinion; if you think it fits here as well, leave it :)
Change looks great, had my final pass. I would tend to go even further with the refactoring and try to not call the Launch methods every time a block comes in.
In general this PR is a really nice refactor, makes things now super easy to read 👌
contractcourt/contract_resolver.go (Outdated)

	// launched specifies whether the resolver has been launched. Calling
	// `Launch` will be a no-op if this is true.
	launched bool
I still don't understand why we launch the resolvers every time we receive a blockbeat. Isn't it enough to launch them when we resolve the contracts, which happens when the commitment tx confirms, or when the resolvers are relaunched?
@@ -116,139 +114,60 @@ func (h *htlcSuccessResolver) ResolverKey() []byte {
// anymore. Every HTLC has already passed through the incoming contest resolver
General question: why aren't we using the sweeper result channel to check when the sweep is confirmed for the htlc resolvers, at least for the anchor channel case?
We def should, not sure why it wasn't used in the first place.
ok, probably worth mentioning a todo or something so we don't forget it?
	h.log.Debugf("launching contest resolver...")

	// Query the preimage and apply it if we already know it.
	applied, err := h.findAndapplyPreimage()
I wonder if we cannot just let the Resolve() method handle this, do you need this one here?
We need to find the preimage here if it's available, otherwise we cannot create the sweep tx, and Resolve is not bounded by the blockbeat.
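A rough sketch of that flow, reusing the findAndapplyPreimage name from the diff above; the stub types exist only to make the control flow readable and are not the real resolver definitions.

```go
package contractcourt

// Stubs for the sketch: the real resolvers carry many more fields.
type htlcSuccessResolver struct{}

func (h *htlcSuccessResolver) Launch() error { return nil }

type htlcIncomingContestResolver struct {
	*htlcSuccessResolver
}

// findAndapplyPreimage (name from the diff above) reports whether the
// preimage is already known and applied.
func (h *htlcIncomingContestResolver) findAndapplyPreimage() (bool, error) {
	return false, nil
}

// Launch must check the preimage here: without it we cannot build the
// success sweep, and Resolve itself is not bounded by the blockbeat.
func (h *htlcIncomingContestResolver) Launch() error {
	applied, err := h.findAndapplyPreimage()
	if err != nil {
		return err
	}

	// No preimage yet: nothing to offer to the sweeper. Resolve keeps
	// watching for the preimage or the HTLC expiry.
	if !applied {
		return nil
	}

	// Preimage known: fall through to the success path, which offers the
	// input to the sweeper.
	return h.htlcSuccessResolver.Launch()
}
```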
	//
	// TODO(yy): move this logic to link and let the preimage be accessed
	// via the preimage beacon.
	resolution, err := h.Registry.NotifyExitHopHtlc(
Relates to my question above: do we really need the IncomingContest resolver to do this in the Launch method? Feels like it is already part of its Resolve method?
		return err
	}

	if uint32(bestHeight) < h.htlcResolution.Expiry {
I wonder why we cannot do this in the Resolve method? The resolver gets notified with each block and acts accordingly.
@@ -176,13 +176,13 @@ var _ ContractResolver = (*anchorResolver)(nil)

// Launch offers the anchor output to the sweeper.
func (c *anchorResolver) Launch() error {
-	if c.launched {
+	if c.isLaunched() {
		c.log.Tracef("already launched")
Is this concurrency case due to the tests, or is there also the possibility in the normal program flow that Launch is called concurrently?
yeah it's possible multiple places call this.
@@ -3117,6 +3181,9 @@ func (c *ChannelArbitrator) handleBlockbeat(beat chainio.Blockbeat) error {
		}
	}

	// Launch all active resolvers when a new blockbeat is received.
I think we should not do this, it feels hacky.
how is this hacky? Or what is the alternative?
So cmiiw, but looking at the code we only need to relaunch the resolvers after every blockbeat because of the two contest resolvers. Given this fact, I would say we should probably try to target this problem from that point of view, maybe?
In general all the contest resolvers still have the RegisterBlockEpochNtfn, which kinda circumvents the blockbeat notification.
htlcIncomingContestResolver: in its Resolve method we still listen to the other block source, act accordingly and launch the htlcTimeoutResolver, so why do we need to trigger this one also via the blockbeat notification? This seems redundant; everything should be triggered only by the blockbeat. Same thing for htlcOutgoingContestResolver.
So maybe we can funnel the blockbeat notification into the resolver and then trigger the correct timeout/success resolver that way?
Maybe I am still missing something, so happy to hear your explanation 🙏
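A very rough, purely hypothetical sketch of the funneling idea (handleBeat does not exist in this PR; the types here are stubs standing in for the real resolvers):

```go
package contractcourt

// Stub types standing in for the real resolvers.
type htlcTimeoutResolver struct{}

type htlcOutgoingContestResolver struct {
	*htlcTimeoutResolver
	expiry uint32
}

// handleBeat (hypothetical) is called by the channel arbitrator with the
// blockbeat height: it returns the timeout resolver once expiry is reached,
// and nil while we should keep waiting for a preimage spend, replacing the
// resolver's own RegisterBlockEpochNtfn subscription.
func (h *htlcOutgoingContestResolver) handleBeat(height uint32) *htlcTimeoutResolver {
	if height < h.expiry {
		// Still before expiry: keep watching for a preimage spend.
		return nil
	}

	// Expiry hit: hand over to the timeout resolver, which the
	// arbitrator launches on this same beat.
	return h.htlcTimeoutResolver
}
```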
We now put the outpoint in the resolver's logging so it's easier to debug.
This commit adds a few helper methods to decide how the htlc output should be spent.
This commit is a pure refactor which moves the sweep handling logic into the new methods.
This commit refactors the `Resolve` method by adding two resolver handlers to handle waiting for spending confirmations.
This commit adds new methods to handle making sweep requests based on the spending path used by the outgoing htlc output.
This commit adds checkpoint methods in `htlcTimeoutResolver`, which are similar to those used in `htlcSuccessResolver`.
This commit adds more methods to handle resolving the spending of the output based on different spending paths.
We will use this and its following commits to break the original `Resolve` methods into two parts - the first part is moved to a new method `Launch`, which handles sending a sweep request to the sweeper. The second part remains in `Resolve`, which is mainly waiting for a spending tx. The breach resolver currently doesn't do anything in its `Launch` since the sweeping of justice outputs is not handled by the sweeper yet.
This commit breaks the `Resolve` into two parts - the first part is moved into a `Launch` method that handles sending sweep requests, and the second part remains in `Resolve`, which handles waiting for the spend. Since we are using both the utxo nursery and the sweeper at the same time, to make sure this change doesn't break the existing behavior, we implement `Launch` as follows:
- zero-fee htlc - handled by the sweeper
- direct output from the remote commit - handled by the sweeper
- legacy htlc - handled by the utxo nursery
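A condensed, hypothetical sketch of the dispatch this commit message describes; the helper methods are made-up stand-ins for the real sweeper/nursery calls.

```go
package contractcourt

// Sketch only: the helpers below are placeholders for the real calls.
type htlcSuccessResolver struct{ launched bool }

func (h *htlcSuccessResolver) isZeroFeeOutput() bool          { return false }
func (h *htlcSuccessResolver) isRemoteCommitOutput() bool     { return false }
func (h *htlcSuccessResolver) sweepSuccessTx() error          { return nil } // sweeper path
func (h *htlcSuccessResolver) sweepRemoteCommitOutput() error { return nil } // sweeper path
func (h *htlcSuccessResolver) incubateLegacyHtlc() error      { return nil } // nursery path

// Launch sends the sweep request; Resolve (not shown) then only waits for
// the resulting spend.
func (h *htlcSuccessResolver) Launch() error {
	if h.launched {
		return nil
	}
	h.launched = true

	switch {
	// Zero-fee (anchor) second-level HTLC: handled by the sweeper.
	case h.isZeroFeeOutput():
		return h.sweepSuccessTx()

	// Direct output on the remote commitment: handled by the sweeper.
	case h.isRemoteCommitOutput():
		return h.sweepRemoteCommitOutput()

	// Legacy HTLC: keep the existing utxo nursery flow for now.
	default:
		return h.incubateLegacyHtlc()
	}
}
```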
When calling `NotifyExitHopHtlc` it is allowed to pass a chan to subscribe to the HTLC's resolution when it's settled. However, this method will also return immediately if there's already a resolution, which means it behaves like a notifier and a getter. If the caller decides to only use the getter to do a non-blocking lookup, it can pass a nil subscriber chan to bypass the notification.
A minor refactor is done to support implementing `Launch`.
This commit makes `resolved` an atomic bool to avoid a data race. This field is now defined in `contractResolverKit` to avoid code duplication.
In this commit, we break the old `launchResolvers` into two steps - step one is to launch the resolvers synchronously, and step two is to actually wait for the resolvers to be resolved. This is critical as in the following commit we will require the resolvers to be launched at the same blockbeat when a force close event is sent by the chain watcher.
We need to offer the outgoing htlc one block earlier to make sure when the expiry height hits, the sweeper will not miss sweeping it in the same block. This also means the outgoing contest resolver now only does one thing - watch for preimage spend till height expiry-1, which can easily be moved into the timeout resolver instead in the future.
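Illustratively, the "one block early" rule the commit message describes amounts to a check like this sketch (names assumed):

```go
package contractcourt

// readyForTimeoutSweep sketches the hand-off point: offer the outgoing HTLC
// at expiry-1 so the sweeper can broadcast in the very block where the
// expiry height hits, instead of lagging one block behind.
func readyForTimeoutSweep(bestHeight, expiry uint32) bool {
	return bestHeight >= expiry-1
}
```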
Depends on blockbeat #8894. NOTE: itest is fixed in the final PR.
Turns out mounting blockbeat in ChannelArbitrator can be quite challenging (unit tests, itest, etc). This PR attempts to implement it in hopefully the least disruptive way - only chainWatcher implements Consumer, and the contract resolvers are kept stateless (in terms of blocks). The main changes are,
- the Resolve method is broken into two steps: 1) Launch the resolver, which handles sending the sweep request, and 2) Resolve the resolver, which handles monitoring the spending of the output.
- chainWatcher implements Consumer in the following PR.

Alternatives
The original attempt was to make the resolvers subscribe to a blockbeat chan, as implemented in #8717. The second attempt was to make the resolvers also blockbeat Consumers, as implemented here. This third approach is chosen as 1) it greatly limits the scope, otherwise a bigger refactor of the channel arbitrator may be needed, and 2) the resolvers can be made stateless in terms of blocks and be fully managed by the channel arbitrator. In other words, when a new block arrives, the channel arbitrator decides whether to launch the resolvers or not, so the resolvers themselves don't need this block info.
In fact, there are only two resolvers that subscribe to blocks: the incoming contest resolver, which uses the block height to decide whether to give up resolving an expired incoming htlc; and the outgoing contest resolver, which uses the block height to choose to transform itself into a timeout resolver. IMO if we can remove the inheritance pattern used in contest resolver -> timeout/success resolver and manage to transform resolvers in the channel arbitrator, we can further remove those two block subscriptions. For now, we can leave them there as they have little impact on the block consumption order enforced by the blockbeat.
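As a reference for the split described above, a minimal sketch of what the two-step resolver contract amounts to; the real ContractResolver interface in lnd has more methods (ResolverKey, Checkpoint, encoding, etc.), so this only shows the shape of the change, not the actual definition.

```go
package contractcourt

// ContractResolver (simplified): the single blocking Resolve call is split
// into a non-blocking Launch plus a waiting Resolve.
type ContractResolver interface {
	// Launch sends the sweep request for the output (if any) to the
	// sweeper. It is called by the channel arbitrator on a blockbeat
	// and must not block.
	Launch() error

	// Resolve waits for the output to be spent/confirmed and returns
	// the next resolver in the chain, or nil when fully resolved.
	Resolve() (ContractResolver, error)

	// IsResolved reports whether the contract has been fully resolved.
	IsResolved() bool
}
```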