-
Notifications
You must be signed in to change notification settings - Fork 664
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ArchiveDB #1911
ArchiveDB #1911
Conversation
220b315
to
d49c883
Compare
x/archivedb/db.go
Outdated
// If the value does not exists or it was actually removed an error is returned. | ||
// Otherwise a value does exists it will be returned, alongside with the height | ||
// at which it was updated prior the requested height. | ||
func (db *archiveDB) Get(key []byte, height uint64) ([]byte, uint64, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very curious what the benchmarked performance on this is.
Wondering if it makes sense to add a cache to avoid iteration for commonly used key + heights?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@patrick-ogrady The iterator will consume just a single entry at most. I don't think it would be overkill to be honest; I will create some benchmark scripts with some real load and many million entries to measure performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we also add the golang style benchmarks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be in favor of making any caching a follow-on PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like a nice start.
x/archivedb/db.go
Outdated
// If the value does not exists or it was actually removed an error is returned. | ||
// Otherwise a value does exists it will be returned, alongside with the height | ||
// at which it was updated prior the requested height. | ||
func (db *archiveDB) Get(key []byte, height uint64) ([]byte, uint64, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we also add the golang style benchmarks?
2e1bc97
to
0369c4b
Compare
@darioush No, because when iterating I know the last 9 bytes are outside of the Prefix, and the last byte is |
My guess is that we'll want to use bbolt (https://github.com/etcd-io/bbolt) as the underlying db to get wicked fast iterations on the read path (it is a B-Tree). We should compare with Level/Pebble to see if that is true in practice with large DBs and random reads. |
0369c4b
to
b37846e
Compare
@patrick-ogrady Yes. @StephenButtolph's idea was to the MVP out ASAP with leveldb and our interface but we find the fastest database engine before going to master with archivedb. I will play with bbolt today. |
Might be pushing Bolt to it's limits. We should also consider https://github.com/erigontech/mdbx-go (which is just a wrapper around https://libmdbx.dqdkfa.ru/) |
As Stephen also mentions, this would be for iterating over keys that share a prefix or in lexicographical order (eg, at a given height). (not referring to iterating over the DB for the most recent record for a single key) |
I was actually talking about a bug in the current |
Please add:
as a regression test. |
Signed-off-by: Cesar <137245636+nytzuga@users.noreply.github.com> Co-authored-by: aaronbuchwald <aaron.buchwald56@gmail.com>
Signed-off-by: Cesar <137245636+nytzuga@users.noreply.github.com> Co-authored-by: Stephen Buttolph <stephen@avalabs.org> Co-authored-by: Richard Pringle <rpring9@gmail.com> Co-authored-by: Darioush Jalali <darioush.jalali@avalabs.org> Co-authored-by: Dan Laine <daniel.laine@avalabs.org>
Unless otherwise notice, all operations, by default, are performed using the last known height. Every operation has their counterpart function with a given height.
Remove the unimplemented features, pretty much anything related with iterators
1. Remove height safety checks to make the code simpler 2. Use single byte metadata as keys 3. Use `NewIteratorWithStartAndPrefix` to make iteration simpler. Introduced `parseDBKeyPrefix` to extract the prefix from a dbKey
1556ef2
to
e2d2daf
Compare
The database user can now write at any height
e2d2daf
to
811afef
Compare
x/archivedb/prefix_test.go
Outdated
var ( | ||
key = []byte("key") | ||
value = []byte("value") | ||
maliciousKey = newDBKey(key, 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: could we pick a different name for maliciousKey
?
require.Equal(value, val) | ||
height, err := db.GetHeight(key) | ||
require.NoError(err) | ||
require.Equal(uint64(1), height) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we also get the maliciousKey
at the end of the iteration?
require.Equal(uint64(1), writer.Height()) | ||
require.NoError(writer.Write()) | ||
|
||
for i := 0; i < 10000; i++ { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems we could get the same benefit with a shorter loop like 10 or 100 since this is not a benchmark.
require.Equal(uint64(1), height) | ||
} | ||
|
||
func TestDBMoreEfficientLookups(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how is this test different from the one above?
x/archivedb/db_test.go
Outdated
for _, test := range tests { | ||
db, err := getBasicDB() | ||
require.NoError(t, err) | ||
test(t, db) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice 👍
x/archivedb/batch.go
Outdated
c.db.lock.Lock() | ||
defer c.db.lock.Unlock() | ||
|
||
batch := c.db.inner.NewBatch() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we take a batch from the inner db when the batch
struct is created then directly call put on the batch from the inner db with the modified keys when put/delete operations are done?
seems it would simplify this file a bit.
} | ||
} | ||
|
||
if err := database.PutUInt64(batch, heightKey, c.height); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should enforce that this is > what is currently in the database?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the DB should expose the height for atomic-ness... but if the user wants to modify something at an older height I think it's fine to let them
This is the MVP for #1715.
This PR creates a thin database layer on top of database.Database. ArchiveDb is an append only database which stores all state changes happening at every block height. Each record is stored in such way to perform both fast inserts and selects.
Currently its API is quite simple, it has two main functions, one to create a Batch write with a given block height, inside this batch entries can be added with a given value or they can be deleted. It also provides a Get function that takes a given key and a height.
The way it works is as follows:
When requesting
Get(foo, 9)
it will return an errNotFound error because foo was not defined at block height 9, it was defined later. When callingGet(foo, 99)
it will return a tuple("foo's value is bar", 10)
returning the value offoo
at height 99 (which was set at height 10). If requestingGet(foo, 2000)
it will return an error becausefoo
was deleted at heightThis that should be considered
foo
at height 100) twice, it should error and the whole write batch should fail.errKeyDeleted
error and return the height where the deletion happened. Thoughts @aaronbuchwald @StephenButtolph ?Why this should be merged
How this works
How this was tested
Unit tests