-
Notifications
You must be signed in to change notification settings - Fork 77
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposal: Ingest External SST #187
Comments
That's because we should make transactions stored in WAL contiguously to achieve atomic. Badger will check the consistency of WAL and truncate it if necessary. I think maybe it's not necessary to use transaction and consider write channel. We could use oracle to perform conflict check and get commit timestamp, and after getting the commit timestamp successfully, we just update the file and place it on level 0? |
The api of oracle needs a txn struct, I just don't want to break the api. And the conflict check of oracle needs write keys hash of transaction which is fields Line 177 in 475e17b
Line 46 in 475e17b
Due to the particularity of level0, put SST just in level0 may cause significant read amplification and write stall. Put files in lower level can reduce read/write amplification caused by compaction but will cause space amplification. I have a look to the strategy of rocksdb.
It makes sence and I forget the third point~. For the second point, it's because rocksdb protects new keys in higher level which is unnecessary in agatedb. I'll try and see the result of ingest a file in rocksdb. |
We certainly should record write keys for the conflict check. I mean add method for oracle to get commit timestamp for the ingesting file scenario.
Yes it will cause write stall if there are many files being ingested in the same time, but in my opinion, it will not cause read amplification since we always read every SSTable in the LSM tree. Maybe we could refer to compacting to level base and consider to put the file to level base first? |
Okay, I got it.
Oh yes, I was wrong for level0's read amplification but the higher we put ingested files the more read/write amplification cause by compaction we get. What does level base mean? Target for L0 to compact to or else? |
Yes, the target level into which level 0 will merge its data. |
About how to check files to ingested overlap with running compaction output. I saw we have Suppose a running compaction pick
The So I can't use Any good ideas? I think we can extend |
I think that's ok. By the way, I think this problem only happens when ingesting files, and will not appear in current implementation. |
Yes, it only happens when ingesting files. I didn't make myself clear. |
This issue describe a possible way to ingest external SST file to a running agatedb instance just as rocksdb.
Background
TiKV use ingest external SST feature for some use case. To integrate agatedb into TiKV, we need this feature.
From Creating and Ingesting SST files · facebook/rocksdb Wiki (github.com), ingesting a SST needs follow steps:
Things go different in agatedb.
Analysis
Step 1 and step 4 are the same.
For step 3, it is unnecessary to flush memtable when overlap for agatedb. Agatedb reads all levels when
get
and picks the latest one (less or equal to read_ts), so it's okay to have new keys below old keys in LSM-tree. This really makes the whole process efficient. (Please point out if I'm wrong)For step 2, it is the most difficult: How to protect ACID during and after ingestion?
Agatedb has ACID transaction (SSI). The architecture is totally different from rocksdb. We have a read_ts and a commit_ts and we have conflict check when commit.
Possible Implementation
Main idea: Regard ingesting SST files as writing large range of keys and make the whole process a transaction.
Ingesting files shall be an expensive job.
Time Point: before commit
LevelsController
then move or copy files to DB dir with allocated file id(name).Check files before starting a txn.
Time Point: commit
CommittedTxn
inCommitInfo
which contains smallest and largest keys in each ingest files.This is for fast but inaccurate conflict check, otherwise we need to calc every key hash in SST which really takes time.
global_version
LevelHandler
, and will block read process of other txn)global_version
is stored in manifest.A question here. I see we protect the order of Requests send to write channel the same as commit_ts and I wonder why? For WAL in order?
If ingest job can be out-of-order, I think it's possible and better to make ingest job running in current thread and not sending to write channel.
You can see that the whole commit process has only one small I/O — append to manifest. But when picking level, it takes time if there are many tables in this level and this happens under a write lock of
LevelHandler
.Conflict Check
There are two ways for conflict check as I described in
TP: commit (step 1)
.CommittedTxn
inCommitInfo
.And for any update txn, when adding read keys, mark smallest and largest read key for this txn. When checking conflict, beside checking hash, also check if read range overlap with ingest range.
This is really inaccurate and will cause many txn conflict but I think it makes sense.
The second way needs to refactor transaction much more.
(A more simple way is to break ACID…hhh)
Global Version
This concept is the same as rocksdb. For ingested files, all keys has wrong version and it's unreasonable to modify each. When file has a global version, every key in this file has this version.
In latest rocksdb, this value is stored in manifest to avoid random write to ingested files.
We need to add a field in meta.proto and update Table iterator (also block iterator) to fit this feature.
global_version
may have ambiguity. Maybe another name when impl.Discuss
The text was updated successfully, but these errors were encountered: