-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kvdb wrapper for etcd #4015
kvdb wrapper for etcd #4015
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent work! Once this lands, lnd
will take a large step towards a greater degree of robustness and reliability for more critical operations.
I've completed an initial pass through this draft PR, doing only a shallow pass over the unit tests as some of the behavior/API may change throughput the early review process. With that said though, upon an initial pass this PR appears to have rather solid unit test coverage! In the future we may want to extend the current default walletdb
interface-level tests to encompass the scenarios covered in this PR.
After my initial pass, I have (so far) one core question:
- How will errors be surfaced to the caller? By errors here, I mean terminal errors, so us running into errors that mean we _cannot) proceed at all. An example of such an error would be something like
etcd
crashing mid transaction.- Along the way to settling on a definite design/policy for this aspect of the interface, we may want to re-examine the way we handle database errors globally within
lnd
. - As is right now, in a few areas, we'll actually proceed with normal operations of the database if a persistence operation fails. This means that we may risk entering a dangerous or invalid state if say the disk fills up as we go to commit a new invoice.
- Ideally, we should halt all operations if we encounters a persistence failure. By halt here I mean enter into a clean shutdown procedure, and fail to start if the persistence error persists (pun intended ;)). We can possibly implement this via a
panic
at the db-call site, with arecover
at the top-level goroutine.
- Along the way to settling on a definite design/policy for this aspect of the interface, we may want to re-examine the way we handle database errors globally within
One aspect of this PR I had trouble grokking was the logic implementing the various range scan operations. It's likely the case that I need to revisit the etcd
client documentation to properly understand the defined API and various options.
|
||
kv := b.tx.stm.First(prefix) | ||
for kv != nil { | ||
if err := cb([]byte(kv.key[prefixLen:]), []byte(kv.val)); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bounds check key before accessing prefixLen
, both here and below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure we need prefix check here? stm.First
will either return nil
or a match in which case it contains prefix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bhandras completed an initial pass!
my main concerns right now are about the path/key format. iiuc, i have the following observations:
- storing binary data directly can collide with the control characters. this means there is some overhead in storage, e.g. 2x for hex, 4/3x for base64
- the path prefix is serialized for every leaf value, which incurs a blow up esp. for highly nested data since we store path data linear in the number of leaves.
i don't have a great solution for these issues, perhaps this is what you were referring when saying it is awkward to implement the kvdb
interface using etcd? it would be interesting to take a real-world lnd database and compute how much space it would occupy using this format and how much overhead we are grappling with.
channeldb/kvdb/etcd/stm.go
Outdated
// First returns the first k/v that begins with prefix or nil if there's | ||
// no such k/v pair. If the key is found it is inserted to the txn's | ||
// read set. | ||
First(prefix string) *KV |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any reason the interface operates on strings and not byte slices?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nice to use byte slices but we either need to convert everything to string due to etcd's interface or etcd does this anyway. One other thing that makes this hard to implement nicely is that we use two maps for storing gets/puts before commit and we need string keys for them and also need to alloc/copy anyway (both for inserting and retrieval where user gets a value).
I believe for these reasons staying with good old immutable strings is a better choice (for now). Open to suggestions if you have a better approach.
return nil | ||
} | ||
|
||
return []byte(val) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similarly we may have to decode hex/base64 back into binary if we needed to convert it when storing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understand what you meant here. If you referred to binary keys, that's done.
Can be rebased now that the |
79d3cb6
to
03e8d72
Compare
a0ea0c6
to
a92b90a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR is getting much closer! Completed a new pass with the latest design of components like buckets and the rest. I'll be switching over to an active testing phase now, where I may modify the last commit slightly to be closer to how we intend things to be run (graph stored locally as an example). This modified structure should work for now (for a new node), but we'll need to make sure the eventual migration tool is able to handle the switch over.
out <- err | ||
}() | ||
|
||
return <-out |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, this will likely block a daemon shutdown until the transaction commits or it runs into a fatal error, which I exactly what we want I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: in testing i was unable to shut down the daemon without kill -9
due to the large read requests. we can follow up with passing cancel contexts into the queries so we can break out if needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great idea and we should do it, but imho this is perhaps a bit out of scope for this PR as to achieve this we need to refactor the init order to be able to derive the private key before the channeldb is constructed.
Ah yes, good point. Indeed sounds like something for a follow up.
In that case, I think we should proceed! LGTM 💰
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 🌊
Needs a rebase! Excited for what comes after this 😈
This commit adds an extended STM, similar to what available in etcd's clientv3 module. This incarnation of said STM supports additional features, like positioning in key intervals while taking into account deletes and writes as well. This is a preliminary work to support all features of the kvdb interface.
This commit adds a full interface implementation of the walletdb/kvdb interface with detailed tests.
This commit adds walletdb interface test suite to the existing test set.
The commit itslef adds stats to the transactions such that we can see how LND behaves with and etcd backend.
This commit extends lncfg to support user specified database backend. This supports configuration for both bolt and etcd (while only allowing one or the other).
This commit adds support for user configured channeldb backend.
This commit reduces the compare set size the STM will submit in transactions by adding only the bucket keys along the bucket path to a specific lock set. This lock set then used to filter the read set, effectively removing all read only keys from the transaction predicate that are not bucket keys. By tracking if a read-write tx actually changes something, we can also "bump" the mod revision of the bucket keys. With this trick we essentially implement a read-write lock for our bucket structure greatly reducing transaction processing time.
This commit adds the ExtendedBackend interface which is an extension to the walletdb.DB interface. This paves the way to using etcd.db.View and etcd.db.Update in the global View and Update functions without much code rewrite.
This commit extends lncfg with etcd TLS config and passes these parameters to the etcd client upon construction.
This commit separates all etcd related sources (sans a few stubs and config) from the rest of the source tree and makes compilation conditional depending on whether the kvdb_etcd build tag is specified.
This commit extends etcd db with namespaces without additional storage space requirements. This is simply done by instead of using an all zero root bucket id, we use the sha256 hash of the name space as our root bucket id.
Is the db time_to_open reported by LND under-reported after this merge?
I'm used to a few seconds, but not milliseconds, especially after a reboot. Before merge:
After merge:
|
In this commit, we fix a regression in our DB open time logging that was introduced in lightningnetwork#4015. Obtaining the target backend from the configuration will actually also open the database, so we need to include that in the time delta as well.
@githorray nice catch, should be fixed in this PR: #4309 |
In this commit, we fix a regression in our DB open time logging that was introduced in lightningnetwork#4015. Obtaining the target backend from the configuration will actually also open the database, so we need to include that in the time delta as well.
This PR builds on top of channeldb/kvdb: introduce new KV-store database abstraction and adds a kvdb wrapper for etcd.
The PR is still somewhat WIP, but mostly ready for initial reviews.
Missing: