
Initial chain access implementation #9

Closed

Conversation


@tnull tnull commented Sep 1, 2022

Based on #8.

This PR adds an initial implementation for LdkLiteChainAccess, which provides the necessary interfaces for accessing chain data.

@TheBlueMatt TheBlueMatt left a comment

Mostly questions.

src/access.rs Outdated
let mut unconfirmed_registered_txs = Vec::new();

for txid in registered_txs {
	if let Some(tx_status) = client.get_tx_status(&txid)? {

What are each of these calls - are they hitting a server somewhere, or are they hitting a local cache? When does that cache get updated? How do we add new entries to the cache?

Collaborator Author

Yep, they are all just requests directly hitting the Esplora server for now. While these calls are fairly specific to getting the current status of a transaction, we may want to look into using the BDK wallet cache to retrieve some data when we tackle #3.

Oof, so this is gonna be a really slow function? There's a lot of serial requests. Is there some way to speed that up?

Collaborator Author

@tnull tnull Sep 15, 2022

Well, Esplora doesn't really support any kind of batching, but we could at least make some of the functionality run concurrently. As it stands, I grouped the method into three scopes anyway (essentially corresponding to best_block_updated, transactions_confirmed, and transaction_unconfirmed). Could these three run in separate threads without an issue? The Confirm::transactions_confirmed docs say: "However, in the event of a chain reorganization, it must not be called with a header that is no longer in the chain as of the last call to best_block_updated." But how bad would it be to let all three run concurrently rather than in order, especially if we check again for updates at the end of the method?

Also, I could take the road of BDK's wallet sync and spawn a number of threads to allow for concurrent client requests, e.g., to get_tx_status.

I understood that we were doing things async, does BDK not support rust-async esplora requests? Spinning up more threads probably isn't worth it, but if we can do parallel requests by just polling multiple futures together that likely is.

More generally, I wonder seriously about what happens if we get a reorg in the middle of this function, or between calls for a single transaction. I didn't carefully review it, but at first glance I'm a bit dubious that we meet the LDK requirements of topological sorting of transactions (and block-disconnections) in such a case.
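To illustrate the "polling multiple futures together" idea, here is a minimal sketch (not the PR's code; the client type and the Option-returning get_tx_status signature are assumptions based on the snippet quoted at the top of this thread):

use bitcoin::Txid;
use esplora_client::{AsyncClient, Error, TxStatus};
use futures::future::join_all;

// Issue one status lookup per registered txid and poll them together so the
// HTTP requests overlap instead of running back-to-back.
async fn fetch_statuses(
	client: &AsyncClient,
	registered_txs: &[Txid],
) -> Result<Vec<(Txid, Option<TxStatus>)>, Error> {
	let lookups = registered_txs.iter().map(|txid| async move {
		// Assumed to mirror the blocking call above, i.e. `None` means the
		// server does not know the transaction (yet).
		let status = client.get_tx_status(txid).await?;
		Ok::<_, Error>((*txid, status))
	});
	// `join_all` drives all lookups concurrently; collecting into a `Result`
	// surfaces the first error, if any.
	join_all(lookups).await.into_iter().collect()
}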

Collaborator Author

> I understood that we were doing things async, does BDK not support rust-async esplora requests? Spinning up more threads probably isn't worth it, but if we can do parallel requests by just polling multiple futures together that likely is.

No, currently all is blocking. rust-esplora-client supports async, but using this would complicate things quite a bit, IIUC. For example, we'd need to run LdkLiteChainAccess in its dedicated thread and then start communicating with it, e.g., via message passing. Happy to explore this further in a follow-up, but to keep things a bit more simple, I'd like to keep it blocking for now.

> More generally, I wonder seriously about what happens if we get a reorg in the middle of this function, or between calls for a single transaction. I didn't carefully review it, but at first glance I'm a bit dubious that we meet the LDK requirements of topological sorting of transactions (and block-disconnections) in such a case.

Mh, are there any (undocumented) requirements that I'm not aware of? I tried to consider the following:

  1. Notify of new blocks; if there are intermediary blocks, feeding just the newest one is fine.
  2. Notify of newly confirmed transactions, ensuring inter- and intra-block ordering.
  3. Notify of newly unconfirmed transactions.

> No, currently all is blocking. rust-esplora-client supports async, but using this would complicate things quite a bit, IIUC. For example, we'd need to run LdkLiteChainAccess in its dedicated thread and then start communicating with it, e.g., via message passing. Happy to explore this further in a follow-up, but to keep things a bit more simple, I'd like to keep it blocking for now.

Yea, we can do that as a follow-up, but I don't see why we're building blocking things to begin with here. If we want to end up async, ISTM we should just start there, since converting everything later is a pain. The nice thing with async is we don't have to spawn a new thread for everything, which ends up with a lot of overhead. I'm not sure why we'd have more message passing with an async implementation in LDKLite than with a sync one - most calls would be correctly done in a sync context.

Collaborator Author

Alright, now went async with 35bad47.

@tnull tnull force-pushed the 2022-09-initial-chain-access branch 4 times, most recently from 7182aca to 553d966 on September 19, 2022 16:50

tnull commented Sep 19, 2022

Rebased on main.

src/access.rs Outdated
}
}

// TODO: check whether new outputs have been registered by now and process them
Contributor

Is this required?
Won't it just sync in the next sync call?

Collaborator Author

Well, if I understand @TheBlueMatt's concerns correctly, double-checking if something changed while we were running this is the least we can do here?

Contributor

IMO, this shouldn't be required.
If we called sync from outside this chain-access interface, we synced according to the txs and outputs available at that point in time.
When we release the lock, another set of txs and outputs can be added, but that should be handled in the next call. We cannot make sure to sync everything at this place.

Ok(())
}

pub(crate) async fn sync(&self, confirmables: Vec<&(dyn Confirm + Sync)>) -> Result<(), Error> {
Contributor

Just for my understanding: the top-level function will eventually block on this, right?
And that happens before starting the next sync call?

Collaborator Author

No, we spawn a new background task via the tokio runtime. This task does run in a loop, however, currently set to sleep 5 seconds between sync attempts.

Contributor

Can this go wrong?
For example, if we start this sync every 1 second but each sync takes 10 seconds to complete, multiple syncs would be active at the same instant.
Normally that would be fine, but since each sync acquires resource locks, multiple instances of this sync cannot run in parallel or independently of each other.
In effect, they run serially, each waiting for the previous one to complete first. This can create an ever-growing queue of waiting tasks if completionTime > syncTriggerInterval.

Is this understanding correct?

Collaborator Author

No, no, it's just a single loop, so no concurrent sync attempts should be spawned; it would always wait 5 seconds between finishing the last attempt and starting the next one:

https://github.com/lightningdevkit/ldk-lite/blob/389f960d7533fcc3c57071d4dc15549d14de538b/src/lib.rs#L488-L524
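For illustration, a minimal sketch of that loop shape (placeholder names, not the linked lib.rs code): one task, one loop, and a fixed pause between the end of one attempt and the start of the next, so attempts never overlap.

use std::future::Future;
use std::time::Duration;

// Runs sync attempts strictly one after another; a failed attempt is simply
// retried after the usual pause, so no backlog of concurrent syncs can build up.
async fn background_sync<F, Fut>(mut sync_once: F)
where
	F: FnMut() -> Fut,
	Fut: Future<Output = Result<(), ()>>,
{
	loop {
		let _ = sync_once().await;
		tokio::time::sleep(Duration::from_secs(5)).await;
	}
}

Spawned once via the tokio runtime, this gives exactly the single background task described above.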

Collaborator Author

(The chosen parameters are however entirely up for debate)

src/access.rs Outdated
if tx_status.confirmed {
	if let Some(tx) = client.get_tx(&txid).await? {
		if let Some(block_height) = tx_status.block_height {
			let block_header = client.get_header(block_height).await?;

I think this is race-y - if there's a reorg while we're processing this transaction we'll get the header at height X, but that may no longer be the block in which this transaction was confirmed.

Collaborator Author

@tnull tnull Oct 6, 2022

Yes it is, looking at these two lines only. However, if the transaction had been reorged out, the next call to get_merkle_proof would fail. I now added an explicit check that the height didn't change in between.

Mh, but this likely won't catch the case in which the transaction is also included in the new tip. In that case we could end up reporting the old header. To fix that we probably want to query the block by hash here; I just opened an upstream PR that would allow us to do so (bitcoindevkit/rust-esplora-client#17).

Contributor

@G8XSU G8XSU Oct 7, 2022

  • I think the check block_height == merkle_proof.block_height doesn't help and is misleading, because the reorged-in block could also have the same height?

  • And even with get_merkle_proof(), the block could always get reorged after that. So we mainly rely on the next sync to fix this for us.

  • So the fix is to get the header from the block_hash inside tx_status, right?
    For now, what we could do is compare the block_hash from tx_status with the one from the header (see the sketch below)? That would remove any unintended confirms with a mismatched header.
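A hedged sketch of that cross-check (the BlockHeader name follows the pre-0.30 rust-bitcoin naming; the Option-typed block hash mirrors the tx_status fields discussed above):

use bitcoin::{BlockHash, BlockHeader};

// Only treat `block_header` (fetched by height) as the confirming block if its
// hash matches the block hash Esplora reported in the transaction's status.
fn header_matches_status(block_header: &BlockHeader, status_block_hash: Option<BlockHash>) -> bool {
	match status_block_hash {
		// Re-hashing the fetched header catches the case where a reorg swapped
		// in a different block at the same height.
		Some(expected) => block_header.block_hash() == expected,
		// No block hash in the status means the tx is not (or no longer) confirmed.
		None => false,
	}
}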

Right, yea, we'll need upstream to fix this. A simpler fix, if it's available, would be to extract the block header from the merkle proof. That would also reduce the number of requests we have to send to the server... I think we could then replace the whole block with a single request to get a merkle proof (and understand that the tx is not confirmed if the request fails?).

Collaborator Author

> Right, yea, we'll need upstream to fix this. A simpler fix, if it's available, would be to extract the block header from the merkle proof.

Unfortunately, it is not available. While there is an alternative (so far unimplemented) API method that would return the merkle proof in Bitcoind's merkleblock format, which would include the header, that in turn wouldn't give us the pos, IIUC.

@TheBlueMatt

Note: also need to carefully review for compliance with https://docs.rs/lightning/latest/lightning/chain/trait.Confirm.html#order

@TheBlueMatt

Will review this more carefully later today.


@TheBlueMatt TheBlueMatt left a comment

I don't think this approach is going to work with the LDK ordering requirements. We'll need to somewhat rethink what we're doing here. (a) we need to connect transactions in topological order. That should at least be somewhat easy, I think. (b) we need to do disconnections first, and then refuse to do any connections if the chain reorgs out from under us while we're running, which may imply somehow ensuring any headers we're connecting a tx from are in the chain when we started?


tnull commented Oct 24, 2022

> (a) we need to connect transactions in topological order. That should at least be somewhat easy, I think.

I agree that should be straightforward, if we apply the change to Confirm::get_relevant_txids (see #9 (comment)).

> (b) we need to do disconnections first, and then refuse to do any connections if the chain reorgs out from under us while we're running, which may imply somehow ensuring any headers we're connecting a tx from are in the chain when we started?

For the time being, could we track all block hashes we get from the get_tx_status/get_merkle_proof calls and then have a dedicated section where we only mark the transactions confirmed once all of them are still in the best chain and the tip is still equal to the one we started with? This is still somewhat race-y, but possibly less so, and hopefully it would only add 1-2 extra queries in the normal case, as syncs should happen more frequently than new blocks arrive.
Moreover, I then don't quite see a big difference from the (unavoidable) case in which we get a reorg after we're done syncing: in both cases we'd have stale information about the confirmation status until the next sync fixes it.
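As a rough sketch of that guard (not the PR's code; get_tip_hash is assumed to be the esplora-client call returning the current tip hash):

use bitcoin::{BlockHash, Txid};
use esplora_client::{AsyncClient, Error};

// Collect would-be confirmations during the sync pass and only hand them on if
// the chain tip is still the one we started from; otherwise drop them and let
// the next sync round retry from scratch.
async fn apply_if_tip_unchanged(
	client: &AsyncClient,
	tip_at_start: BlockHash,
	confirmed: Vec<(Txid, u32)>, // (txid, confirmation height) gathered earlier
	apply: impl FnOnce(Vec<(Txid, u32)>),
) -> Result<bool, Error> {
	let tip_now = client.get_tip_hash().await?;
	if tip_now == tip_at_start {
		apply(confirmed);
		Ok(true)
	} else {
		// A reorg happened mid-sync; report nothing rather than possibly stale data.
		Ok(false)
	}
}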

@tnull tnull requested a review from jkczyz October 24, 2022 10:14
@TheBlueMatt

> I agree that should be straightforward, if we apply the change to Confirm::get_relevant_txids (see #9 (comment)).

I don't see how this helps with topological ordering? We need to sort tx connection into the order they appear on the chain, basically, but I think that's doable here.
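For the sorting part, a small illustration (plain sorting, no LDK calls; the tuple layout is just for the example): ordering newly confirmed transactions by (height, position-in-block) gives chain order across blocks, and position within a valid block already respects parent-before-child order.

use bitcoin::Transaction;

// Sort newly confirmed transactions so they can be fed to
// `Confirm::transactions_confirmed` block by block, in chain order, with the
// in-block (topological) order preserved inside each block.
fn sort_for_confirmation(
	mut confirmed: Vec<(u32 /* block height */, usize /* position in block */, Transaction)>,
) -> Vec<(u32, usize, Transaction)> {
	confirmed.sort_by_key(|(height, pos, _)| (*height, *pos));
	confirmed
}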


@jkczyz jkczyz left a comment

What's the plan for testing this?

-bdk = { git = "https://github.com/tnull/bdk", branch="feat/use-external-esplora-client", features = ["use-esplora-ureq", "key-value-db"]}
+bdk = { git = "https://github.com/tnull/bdk", branch="feat/use-external-esplora-client", features = ["use-esplora-reqwest", "key-value-db"]}
Contributor

How comfortable are we with taking on the reqwest dependency? @TheBlueMatt I believe you had a concern a while back: lightningdevkit/rust-lightning#763 (comment)

I think that's a bit out of date these days. I'd really love to avoid taking an HTTP dependency (I think for something like this it's generally overkill), but we probably do need to take a TLS dependency, and at the end of the day it's not our stuff anyway: we can't pick what BDK offers for their async HTTP client.

@TheBlueMatt

What's the plan wrt ensuring the code is safe? Are we intending to merge it before we get to that point and fix it later (if so I think we need a Big Fat Warning both in the file and in the README) or do we intend to leave this as a PR until we get there?


tnull commented Nov 3, 2022

> What's the plan wrt ensuring the code is safe? Are we intending to merge it before we get to that point and fix it later (if so I think we need a Big Fat Warning both in the file and in the README) or do we intend to leave this as a PR until we get there?

I'd prefer to merge it and revisit it if we still find it to be lacking and have a clear path forward. However, before we think about merging, I want to address two main things to bring this as close as possible to being safe:

  • Build in additional checks for detecting inconsistencies/reorgs, as I think we can mostly get there by doing consistency checks between all data returned by API calls.
  • Based on that, find a better loop construction that allows us to restart the sync as soon as we detect any inconsistencies (see the sketch after this list). This should also allow us to process any leftover/dependent transactions queued up in the previous iteration, ensuring the right ordering. It should also help discern permanent vs. temporary failures (i.e., do we ever want to ? out of the function, or even panic?)
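A sketch of that permanent-vs.-temporary split (the error variants are hypothetical, not the PR's types):

#[derive(Debug)]
enum SyncFailure {
	// The chain moved under us mid-sync; simply restarting the pass is safe.
	Inconsistency,
	// The Esplora server was unreachable or returned an error; retry later.
	Transient(String),
	// Something retrying will not fix (e.g. persistently malformed data); surface it.
	Permanent(String),
}

fn should_retry(failure: &SyncFailure) -> bool {
	match failure {
		SyncFailure::Inconsistency | SyncFailure::Transient(_) => true,
		SyncFailure::Permanent(_) => false,
	}
}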

In parallel I'm exploring whether a push-based approach based on Electrum is feasible and preferable.


tnull commented Nov 3, 2022

> What's the plan for testing this?

I imagine unit tests for individual parts of the chain access could be a bit tricky (and would likely amount to replicating coverage from upstream projects), but I'll implement end-to-end tests using the electrsd/bitcoind crates, likely reusing the approach from the Esplora client (cf. https://github.com/bitcoindevkit/rust-esplora-client/blob/e24663b6bc8536fef60a77681f78792d770ea315/src/lib.rs#L218-L298 ff.)


tnull commented Nov 4, 2022

Now making use of lightningdevkit/rust-lightning#1796 and also implemented yet another API method upstream (bitcoindevkit/rust-esplora-client#28) that allows us to retrieve a MerkleBlock, which reduces the number of needed calls. Also DRYed up the code a bit, will have another go at it though.

@TheBlueMatt

Let me know when you want another review pass here.

@tnull tnull force-pushed the 2022-09-initial-chain-access branch from 18e7fe6 to d5df624 on November 7, 2022 15:33

tnull commented Nov 7, 2022

Alright, I iterated once more on the newer approach and added additional checks. I now squashed the fixups and split out the wallet functionality into its own commit. I'll probably split the wallet out entirely into its own dedicated module in the future, but avoided doing so here since I'd otherwise need to replicate a good part of #26 here.

Should be ready for another round.

@benthecarman

benthecarman commented Nov 22, 2022

I had copy-pasted a lot of this in here. We ran into an issue where a channel would never become ready because we never mark it confirmed: watched_transactions would never have anything added to it, so they would never be checked even if they were queued. Not sure if I was using it wrong or what, but we added *self.watched_transactions.lock().unwrap() = to_watch; (diff here) to the end of sync_unconfirmed_transactions, which fixed the issue.
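Condensed, the fix amounts to the following (simplified sketch; to_watch and sync_unconfirmed_transactions come from the comment above, the standalone helper shape is hypothetical):

use std::sync::Mutex;
use bitcoin::Txid;

// After deciding which previously watched transactions are still unconfirmed,
// write that set back so the next sync round keeps checking them. Without this
// write-back the watch list stays empty and, e.g., a channel funding
// transaction is never re-checked and never reported as confirmed.
fn finish_unconfirmed_sync(watched_transactions: &Mutex<Vec<Txid>>, to_watch: Vec<Txid>) {
	*watched_transactions.lock().unwrap() = to_watch;
}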


tnull commented Nov 22, 2022

> I had copy-pasted a lot of this in here. We ran into an issue where a channel would never become ready because we never mark it confirmed: watched_transactions would never have anything added to it, so they would never be checked even if they were queued. Not sure if I was using it wrong or what, but we added *self.watched_transactions.lock().unwrap() = to_watch; (diff here) to the end of sync_unconfirmed_transactions, which fixed the issue.

Right, the need to re-register transactions (or to just keep monitoring everything returned by Filter) has been discussed above (see #9 (comment)).


tnull commented Nov 24, 2022

Closing this in favor of the upstreamed version: lightningdevkit/rust-lightning#1870.

Will open another PR for the wallet parts ASAP.

@tnull tnull closed this Nov 24, 2022