-
Notifications
You must be signed in to change notification settings - Fork 186
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
query+neutrino: use query dispatcher for GetBlock and GetCFilter #273
Conversation
d853b50
to
a182aad
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice improvement! Did a first pass just to load some context, didn't see anything that stood out as being problematic. I assume we'll battle test all these improvements mostly indirectly through the lnd itests?
Ah, that's a good point - I should open an LND PR pointing to all these changes so that the neutrino itest suite can be run 👍 EDIT: cool, doing that here for now |
b3ecf6e
to
72c016e
Compare
c1691b6
to
7824721
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me 🎉
2aadf9c
to
2870613
Compare
hmmm - ok so I think the unit-race fail is a flake. Gonna re-run it... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 💯
I saw that the unit tests were failing in #69 but they all passed locally.
ah - let me open one that only points to this and not the others. Just to isolate it a bit. |
ok, I've updated ellemouton/lnd#69 to just point to this PR. fingers crossed for unit tests 🤞 |
erg - realised we will need to make an update in btcwallet aswell since it uses the old WorkManager struct which is now an interface 🙃 |
ok the LND unit tests seem to consistently fail (at least on GH) when pointing neutrino to this PR. I will need to investigate. I will ping on here once things are working 👍 |
2870613
to
cf6d843
Compare
alrighty! after searching for waaaayyy too many hours - I found the issue 🙈 The issue is that: the worker manager, as it currently is, doesnt have any form of "maximum number of retries" and so if any query fails, it just reschedules the job so that it can be tried again later. This is not the behaviour that the rescan code expects especially during a re-org. The rescan function wants the query to error so that it can retry itself. The reason we want the calling function to control retries is cause perhaps the query failed due to a re-org and so the actual query itself needs to be changed. So what was happening in the tests that were failing is: a query was started to get a filter for a certain block hash, then a re-org happens, and then the query manager just keeps trying the same query with the old block hash forever. So now I have added a commit that allows the caller to set the maximum number of retries. gonna re-request review since a new commit was added (it is the second commit) LND things now passing: https://github.com/ellemouton/lnd/actions/runs/5046731469/jobs/9052661472?pr=69 |
aa9aec2
to
df858a8
Compare
Ok y'all - I think this is finally ready for another look :) The One thing i'm not sure how to deal with and would love a second opinion on: the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice, LGTM 🎉
Can we turn ellemouton/lnd#69 into an upstream PR so we can track CI there as well?
query/workmanager_test.go
Outdated
wm, workers := startWorkManager(t, numWorkers) | ||
workMgr, workers := startWorkManager(t, numWorkers) | ||
|
||
wm, ok := workMgr.(*workManager) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: could use require.IsType(t, &workManager{}, workMgr)
for a nicer failure message if this ever is false
. Also, there's a (pre-existing) typo in the godoc comment of this test function.
Need to check out the updated diff, but is it not possible to create a 1:1 mapping between them? Or is this about documenting a potential breaking API change? |
|
||
// Prepare the query options. | ||
queryOpts := []query.QueryOption{ | ||
query.Encoding(qo.encoding), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At a glance the option mapping here looks to be lossy, but need to dig deeper to see if it'll matter in practice (eg: we don't need things like PeerConnectTimeout
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it's defs lossy. See my overall review comment on the trickiness of the mapping between these two sets of
functional options
@@ -619,23 +745,20 @@ func (s *ChainService) prepareCFiltersQuery(blockHash chainhash.Hash, | |||
} | |||
bestHeight := int64(bestBlock.Height) | |||
|
|||
qo := defaultQueryOptions() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think rather than remove the old set of options, we should instead seek to unify them. If we need to add extra options that are relevant to the main package, but not the query
package, then we can use a type alias here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for this method specifically, the old options arent remove, just moved one call back. See where this method is called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we need to add extra options that are relevant to the main package, but not the query package, then we can use a type alias here.
can you elaborate on "we can use a type alias here"?
Ah ok, I see the distinction now: Lines 140 to 151 in d209c5e
vs Lines 35 to 48 in d209c5e
|
In this commit, the worker manager is adjusted to only retry a request a certain maximum number of times. This is done so that a smooth transition can be made from call sights currently using the neutrino.queryChainServicePeers method to using the query.WorkManager instead so that the same default behaviour can be expected. In order to also cater for call sights currently using the query.WorkManager, a NoRetryMax optional paramater is added that can used to disable the retry maximum.
Introduce a `WorkManager` interface that encapsulates the Dispatcher interface along with `Start` and `Stop` methods. The previous `WorkManager` struct is converted to `workManager` which is an implementation of the new `WorkManager` interface. The `ChainService` struct is then adapted to hold the new interface instead of the struct. This will make testing easier in future.
Use the work dipatcher interface from the query package to make getdata requests instead of using the old queryPeers function.
Rename the `query` variable in `GetCFilter` so that the `query` package can be imported and used cleanly.
Use the work dispatcher query interface instead of the old queryPeers method for making getcfilter requests. This ensures that the queries are made to the most responsive peers.
Since both GetBlock and GetCFilter now make use of the work dispatcher instead of the `queryPeers` function, we can now remove the `queryPeers` and `queryChainServicePeers` functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but is it not possible to create a 1:1 mapping between them?
I've tried various things to do this but it always ends up being quite contrived. So for example:
- the query dispatcher makes use of a dynamic query timeout meaning that it tries a low timeout and increases it if needed. This also means that this timeout is not exposed to the caller. But the neutrino query options have a
Timeout
option used to set the static timeout that was used in the now deleted query method. So does this then mean we add aWithStaticQueryTimout
option to the query dispatcher and lose the nice dynamic timeout feature there?
Also, just in general: mapping one set of functional options to another is tricky cause it is hard to tell what the user specifically set/requested and what was set by default. So it is hard to know what is ok to override in the original set of options.
Or is this about documenting a potential breaking API change?
So im trying to determine if we should do a whole thing to make the options mappable or if we should rather just add extra documentation around the functional options saying "This one wont apply for GetBlock" or w/e. And then the other option is to break the API completely and have a new set of functional options.
@@ -619,23 +745,20 @@ func (s *ChainService) prepareCFiltersQuery(blockHash chainhash.Hash, | |||
} | |||
bestHeight := int64(bestBlock.Height) | |||
|
|||
qo := defaultQueryOptions() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for this method specifically, the old options arent remove, just moved one call back. See where this method is called.
|
||
// Prepare the query options. | ||
queryOpts := []query.QueryOption{ | ||
query.Encoding(qo.encoding), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it's defs lossy. See my overall review comment on the trickiness of the mapping between these two sets of
functional options
@@ -619,23 +745,20 @@ func (s *ChainService) prepareCFiltersQuery(blockHash chainhash.Hash, | |||
} | |||
bestHeight := int64(bestBlock.Height) | |||
|
|||
qo := defaultQueryOptions() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we need to add extra options that are relevant to the main package, but not the query package, then we can use a type alias here.
can you elaborate on "we can use a type alias here"?
df858a8
to
663cacc
Compare
@Roasbeef - re-requesting just to get your opinion on how to move forward with the 2 separate/conflicting query option sets |
Yeah! done here: lightningnetwork/lnd#7788 |
@Roasbeef: review reminder |
So I think we may be able to unify things by inserting a type param here: there'd be a set of "static" values (say the timeout, etc), then a value I don't think this should block the rest of this PR series though, as upfront we've been able to identify some option abstraction leaks while ensuring the existing behavior is properly capture as is. |
Agreed that in most cases we do want a dynamic timeout vs a static one. In the past, we had some hard coded timeouts tuned by devs in SF, then those in South America needed to retune things as certain timeouts were no longer valid across the higher latency environment (was mobile also). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 🦠
lightninglabs/neutrino#273 Author: Elle Mouton <elle.mouton@gmail.com> Date: Tue May 9 12:33:19 2023 +0200 and btcsuite#884 Author: Olaoluwa Osuntokun <laolu32@gmail.com> Date: Fri Aug 4 11:46:12 2023 -0700
…locksync Plus lightninglabs/neutrino#273 Author: Elle Mouton <elle.mouton@gmail.com> Date: Mon May 22 15:53:49 2023 +0200 chain: PrunedBlockDispatcher bugfix.
lightninglabs/neutrino#273 Author: Elle Mouton <elle.mouton@gmail.com> Date: Tue May 9 12:33:19 2023 +0200 and btcsuite#884 Author: Olaoluwa Osuntokun <laolu32@gmail.com> Date: Fri Aug 4 11:46:12 2023 -0700
…locksync Plus lightninglabs/neutrino#273 Author: Elle Mouton <elle.mouton@gmail.com> Date: Mon May 22 15:53:49 2023 +0200 chain: PrunedBlockDispatcher bugfix.
This PR refactors the
GetBlock
andGetCFilter
methods so that they use thework dispatcher for their queries instead of using the old
queryPeers
function.The function is now removed bringing us one step closer to removing all query
logic from the main package.