Initial chain access implementation #9
Conversation
Force-pushed from d548d1c to 766a0a7.
Force-pushed from 766a0a7 to b42e3e3.
Mostly questions.
src/access.rs
Outdated
let mut unconfirmed_registered_txs = Vec::new();

for txid in registered_txs {
	if let Some(tx_status) = client.get_tx_status(&txid)? {
What are each of these calls - are they hitting a server somewhere, or are they hitting a local cache? When does that cache get updated? How do we add new entries to the cache?
Yep, they are all just requests directly hitting the Esplora server for now. While these calls are kind of specific to getting the current status of a transaction, we may want to look into using the BDK wallet cache to retrieve some data when we tackle #3.
Oof, so this is gonna be a really slow function? There's a lot of serial requests. Is there some way to speed that up?
Well, Esplora doesn't support any kind of batching really, but we could at least make some of the functionality run concurrently? As it stands, I grouped the method into three scopes anyway (essentially corresponding to best_block_updated, transactions_confirmed, transaction_unconfirmed). Could these three run in separate threads without an issue? The Confirm::transactions_confirmed docs say: "However, in the event of a chain reorganization, it must not be called with a header that is no longer in the chain as of the last call to best_block_updated." But how bad would it be to let all three run concurrently rather than in order, especially if we check again for updates at the end of the method?
Also, I could take the road of BDK's wallet sync and spawn a number of threads to allow for concurrent client requests, e.g., to get_tx_status.
I understood that we were doing things async, does BDK not support rust-async esplora requests? Spinning up more threads probably isn't worth it, but if we can do parallel requests by just polling multiple futures together that likely is.
More generally, I wonder seriously about what happens if we get a reorg in the middle of this function, or between calls for a single transaction. I didn't carefully review it, but at first glance I'm a bit dubious that we meet the LDK requirements of topological sorting of transactions (and block-disconnections) in such a case.
I understood that we were doing things async, does BDK not support rust-async esplora requests? Spinning up more threads probably isn't worth it, but if we can do parallel requests by just polling multiple futures together that likely is.
No, currently all is blocking. rust-esplora-client supports async, but using this would complicate things quite a bit, IIUC. For example, we'd need to run LdkLiteChainAccess in its own dedicated thread and then start communicating with it, e.g., via message passing. Happy to explore this further in a follow-up, but to keep things a bit simpler, I'd like to keep it blocking for now.
More generally, I wonder seriously about what happens if we get a reorg in the middle of this function, or between calls for a single transaction. I didn't carefully review it, but at first glance I'm a bit dubious that we meet the LDK requirements of topological sorting of transactions (and block-disconnections) in such a case.
Mh, are there any (undocumented) requirements that I'm not aware of? I tried to consider the following:
- Notify of new blocks; if there are intermediate blocks, feeding just the newest one is fine.
- Notify of newly confirmed transactions, ensuring inter- and intra-block ordering.
- Notify of newly unconfirmed transactions.
No, currently all is blocking. rust-esplora-client supports async, but using this would complicate things quite a bit, IIUC. For example, we'd need to run LdkLiteChainAccess in its own dedicated thread and then start communicating with it, e.g., via message passing. Happy to explore this further in a follow-up, but to keep things a bit simpler, I'd like to keep it blocking for now.
Yea, we can do that as a followup, but I don't see why we're building blocking things to begin with here. If we want to end up async, ISTM we should just start there; converting everything is a pain. The nice thing with async is we don't have to spawn a new thread for everything, which ends up with a lot of overhead. I'm not sure why we'd have more message passing with an async implementation in LDKLite than with a sync one - most calls would be correctly done in a sync context.
Alright, now went async with 35bad47.
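For illustration, once the client is async, concurrent requests become a matter of polling several futures together rather than spawning threads. A minimal sketch, assuming the async client of rust-esplora-client exposes an async get_tx_status returning Result and using the futures crate (not the PR's actual code):

use bitcoin::Txid;
use esplora_client::{AsyncClient, Error, TxStatus};

// Fetch the status of many registered transactions concurrently by polling
// all request futures together on the same task via join_all, instead of
// issuing one serial request per transaction.
async fn fetch_statuses(
	client: &AsyncClient, txids: &[Txid],
) -> Result<Vec<TxStatus>, Error> {
	let futs = txids.iter().map(|txid| client.get_tx_status(txid));
	let results = futures::future::join_all(futs).await;
	// Fail the whole batch on the first error; the next sync can retry.
	results.into_iter().collect()
}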
Force-pushed from 7182aca to 553d966.
Rebased on main.
src/access.rs
Outdated
	}
}

// TODO: check whether new outputs have been registered by now and process them
Is this required? Won't it just sync in the next sync call?
Well, if I understand @TheBlueMatt's concerns correctly, double-checking if something changed while we were running this is the least we can do here?
IMO, this shouldn't be required.
If we called sync from outside this chain-access interface, we synced according to the txs and outputs available at that point in time.
When we release the lock, another set of txs and outputs may have been added, but those should be handled in the next call; we can't make sure to sync everything at this point.
	Ok(())
}

pub(crate) async fn sync(&self, confirmables: Vec<&(dyn Confirm + Sync)>) -> Result<(), Error> {
Just for my understanding, the top-level function will eventually block on this, right? And that is before starting the next sync call?
No, we spawn a new background task via the tokio runtime. This task runs in a loop, however, and is currently set to sleep 5 secs between each sync (attempt).
Can this go wrong?
For example, if we start this sync every 1 sec and each sync takes 10 secs to complete, multiple syncs would be active at the same time.
Normally that would be fine, but since each sync acquires resource locks, multiple instances of this sync cannot run in parallel or independently of each other.
In effect, they run serially, each waiting for the previous one to complete. This can create an ever-growing queue of waiting tasks if completionTime > syncTriggerInterval.
Is this understanding correct?
No, no, it's just a single loop, so no concurrent sync attempts should be spawned; it will always wait 5 secs between finishing the last attempt and starting the next one.
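Schematically, the background task amounts to something like this (a minimal sketch assuming a tokio runtime; sync_chain is a hypothetical stand-in for the actual sync call, not the PR's exact code):

use std::time::Duration;

// Single background loop: the sleep only starts after the previous sync
// attempt has finished, so sync attempts never overlap.
async fn background_sync_loop() {
	loop {
		tokio::time::sleep(Duration::from_secs(5)).await;
		if let Err(e) = sync_chain().await {
			eprintln!("Chain sync failed: {:?}", e);
		}
	}
}

// Hypothetical stand-in so the sketch is self-contained.
async fn sync_chain() -> Result<(), ()> {
	Ok(())
}

The task would then be spawned exactly once, e.g. via tokio::spawn(background_sync_loop()), so only a single instance of the loop ever exists.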
(The chosen parameters are, however, entirely up for debate.)
src/access.rs
Outdated
if tx_status.confirmed {
	if let Some(tx) = client.get_tx(&txid).await? {
		if let Some(block_height) = tx_status.block_height {
			let block_header = client.get_header(block_height).await?;
I think this is race-y - if there's a reorg while we're processing this transaction we'll get the header at height X, but that may no longer be the block in which this transaction was confirmed.
Yes, it is, looking at these two lines only. However, if the transaction had been reorged out, the next call to get_merkle_proof would fail. I now added an explicit check that the height didn't change in between.
Mh, but this likely won't catch the case in which the transaction is also included in the new tip. In that case we could end up reporting the old header. To fix that, we probably want to query the block by hash here; I just opened an upstream PR that would allow us to do so (bitcoindevkit/rust-esplora-client#17).
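A sketch of what the by-hash variant could look like once that upstream change is available (get_header_by_hash is the assumed new client method; illustrative only, continuing the quoted snippet):

// Fetch the header by the hash Esplora reported for this transaction,
// rather than by height, so a reorg between the two requests can't make us
// pick up a header from a different block at the same height.
if let Some(block_hash) = tx_status.block_hash {
	let block_header = client.get_header_by_hash(&block_hash).await?;
	// block_header now provably belongs to the block referenced by tx_status.
}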
- I think the check block_height == merkle_proof.block_height doesn't help and is misleading, because the reorged new block could also have the same height?
- And even with get_merkle_proof(), the block could always get reorged after that. So we mainly rely on the next sync to fix this for us.
- So the fix is to get the header from the block_hash inside of tx_status, right? For now, what we can do is compare the block_hash from tx_status and the one from the header? That will remove any unintended confirms with a mismatched header.
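For illustration, the suggested check could slot into the snippet above roughly like this (variable names taken from the quoted code; a sketch, not the final implementation):

// Only treat the transaction as confirmed if the header fetched by height
// matches the block hash Esplora reported in tx_status. A mismatch means a
// reorg happened between the two requests; skip the transaction for now and
// let the next sync round pick it up.
if let Some(expected_hash) = tx_status.block_hash {
	if block_header.block_hash() != expected_hash {
		continue;
	}
}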
Right, yea, we'll need upstream to fix this. A simpler fix, if it's available, would be to extract the block header from the merkle proof. That would also reduce the number of requests we have to send to the server... I think we could then replace the whole block with a single request to get a merkle proof (and understand that the tx is not confirmed if the request fails?).
Right, yea, we'll need upstream to fix this. A simpler fix, if it's available, would be to extract the block header from the merkle proof.
Unfortunately, it is not available. While there is an alternate (so far unimplemented) API method that would return the merkle proof in Bitcoind's merkleblock format, which would feature the header, this in turn wouldn't give us the pos, IIUC.
Note: also need to carefully review for compliance with https://docs.rs/lightning/latest/lightning/chain/trait.Confirm.html#order
Will review this more carefully later today.
I don't think this approach is going to work with the LDK ordering requirements. We'll need to somewhat rethink what we're doing here. (a) we need to connect transactions in topological order. That should at least be somewhat easy, I think. (b) we need to do disconnections first, and then refuse to do any connections if the chain reorgs out from under us while we're running, which may imply somehow ensuring any headers we're connecting a tx from are in the chain when we started?
I agree that should be straightforward, if we apply the change to ...
For the time being, could we track all block hashes we get from the ...
I don't see how this helps with topological ordering? We need to sort tx connections into the order they appear on the chain, basically, but I think that's doable here.
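For illustration, one way to establish that order (a rough sketch with made-up types; pos would be the transaction's index within its block, e.g. as returned by the merkle proof):

use bitcoin::{BlockHeader, Transaction};

// Hypothetical record of a transaction we found to be newly confirmed.
struct ConfirmedTx {
	tx: Transaction,
	block_header: BlockHeader,
	block_height: u32,
	// Index of the transaction within its block.
	pos: usize,
}

// Sort by block height first, then by in-block position, so the results can
// be fed to Confirm::transactions_confirmed in the order they appear on-chain.
fn sort_confirmed(mut txs: Vec<ConfirmedTx>) -> Vec<ConfirmedTx> {
	txs.sort_unstable_by_key(|c| (c.block_height, c.pos));
	txs
}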
What's the plan for testing this?
bdk = { git = "https://github.com/tnull/bdk", branch="feat/use-external-esplora-client", features = ["use-esplora-ureq", "key-value-db"]}
bdk = { git = "https://github.com/tnull/bdk", branch="feat/use-external-esplora-client", features = ["use-esplora-reqwest", "key-value-db"]}
How comfortable are we with taking on the reqwest dependency? @TheBlueMatt I believe you had a concern a while back: lightningdevkit/rust-lightning#763 (comment)
I think that's a bit out of date these days. I'd really love to avoid taking an HTTP dependency; I think for something like this it's generally overkill. But we probably do need to take a TLS dependency, and at the end of the day it's not our stuff anyway - we can't pick what BDK offers for their async HTTP client.
What's the plan wrt ensuring the code is safe? Are we intending to merge it before we get to that point and fix it later (if so, I think we need a Big Fat Warning both in the file and in the README), or do we intend to leave this as a PR until we get there?
I'd prefer to merge it and revisit it if we still find it to be lacking and have a clear path forward. However, before we think about merging, I want to address two main things to bring this as close as possible to being safe:
In parallel, I'm exploring whether a push-based approach based on Electrum is feasible and preferable.
I imagine unit tests for individual parts of the chain access could be a bit tricky (and would likely amount to replicating coverage from upstream projects), but I'll implement end-to-end tests using the ...
Now making use of lightningdevkit/rust-lightning#1796 and also implemented yet another API method upstream (bitcoindevkit/rust-esplora-client#28) that allows us to retrieve a ...
Let me know when you want another review pass here.
Force-pushed from 18e7fe6 to d5df624.
Alright, I iterated once more on the newer approach and added additional checks. I now squashed fixups and split out the wallet functionality into its own commit. I'll probably split out the wallet entirely into its own dedicated module in the future, but avoided doing so here since I'd otherwise need to replicate a good part of #26 here. Should be ready for another round.
I had copy-pasted a lot of this in here. We ran into an issue where a channel would never become ready because we never marked it confirmed.
Right, the need to re-register transactions (or just keep monitoring everything returned by Filter) has been discussed above (see #9 (comment)).
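For context, the bookkeeping behind that is essentially remembering everything ever handed to us via the Filter interface and re-checking it on every sync round. A rough sketch with hypothetical names (not the PR's code):

use std::sync::Mutex;
use bitcoin::{Script, Txid};

// Anything registered via lightning::chain::Filter is stored here and
// re-checked on every sync, so a transaction registered before it confirmed
// keeps being monitored until we eventually report it as confirmed.
struct FilterQueue {
	watched_txs: Mutex<Vec<(Txid, Script)>>,
}

impl FilterQueue {
	fn register_tx(&self, txid: Txid, script_pubkey: Script) {
		self.watched_txs.lock().unwrap().push((txid, script_pubkey));
	}
}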
Closing this in favor of the upstreamed version: lightningdevkit/rust-lightning#1870. Will open another PR for the wallet parts ASAP.
Based on #8. This PR adds an initial implementation for LdkLiteChainAccess, which provides the necessary interfaces for accessing chain data.