storage: build SSTs from KV_BATCH snapshot
Incrementally build SSTs from the batches sent in a KV_BATCH snapshot. This logic is only on the receiver side for ease of testing and compatibility.

The complication of subsumed replicas that are not fully contained by the current replica is also handled. The following is an example of this case happening.

    a       b       c       d
    |---1---|-------2-------|  S1
    |---1-------------------|  S2
    |---1-----------|---3---|  S3

Since the merge is the first operation to happen, a follower could be down before it completes. It is reasonable for r1-snapshot from S3 to subsume both r1 and r2 in S1. Note that it's impossible for a replica to subsume anything to its left.

The maximum number of SSTs created using this strategy is 4 + SR + 2, where SR is the number of subsumed replicas.

- Three SSTs get created when the snapshot is being received (range local keys, replicated range-id local keys, and user keys).
- One SST is constructed for the unreplicated range-id local keys when the snapshot is being applied.
- One SST is constructed for every subsumed replica to clear the range-id local keys. These SSTs consist of one range deletion tombstone and one RaftTombstoneKey.
- A maximum of two SSTs for all subsumed replicas are constructed to account for the case of subsumed replicas that are not fully contained. We need to delete the key space of the subsumed replicas that we did not delete in the previous SSTs. We need one for the range-local keys and one for the user keys. These SSTs consist of normal tombstones, one range deletion tombstone, or they could be empty.

This commit also introduces a cluster setting "kv.snapshot_sst.sync_size" which defines the maximum SST chunk size before fsync-ing. Fsync-ing is necessary to prevent the OS from accumulating such a large buffer that it blocks unrelated small/fast writes for a long time when it flushes.

Release note (performance improvement): Snapshots sent between replicas are now applied more efficiently and use less memory.
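
The SST count and the chunked fsync described above can be illustrated with a small sketch. It is not the code from this commit: the names writeSyncer, chunkSyncWriter, countingSink, Finish, and maxSnapshotSSTs are hypothetical, and syncSize only stands in for the role of "kv.snapshot_sst.sync_size". The sketch assumes the destination exposes Write and Sync, as *os.File does.

```go
package main

import "io"

// writeSyncer is a hypothetical interface for a destination that can be
// written to and flushed to stable storage. *os.File satisfies it.
type writeSyncer interface {
	io.Writer
	Sync() error
}

// chunkSyncWriter (hypothetical) calls Sync whenever at least syncSize bytes
// have been written since the last sync, so the OS never accumulates a large
// dirty buffer for this file.
type chunkSyncWriter struct {
	dst      writeSyncer
	syncSize int64 // assumed to play the role of kv.snapshot_sst.sync_size
	unsynced int64 // bytes written since the last Sync
}

// Write forwards p to the destination and syncs once the threshold is reached.
func (w *chunkSyncWriter) Write(p []byte) (int, error) {
	n, err := w.dst.Write(p)
	w.unsynced += int64(n)
	if err != nil {
		return n, err
	}
	if w.unsynced >= w.syncSize {
		if err := w.dst.Sync(); err != nil {
			return n, err
		}
		w.unsynced = 0
	}
	return n, nil
}

// Finish syncs any bytes written after the last chunk boundary.
func (w *chunkSyncWriter) Finish() error {
	if w.unsynced == 0 {
		return nil
	}
	if err := w.dst.Sync(); err != nil {
		return err
	}
	w.unsynced = 0
	return nil
}

// countingSink is an in-memory stand-in for a file that records how many
// bytes were written and how often Sync was called.
type countingSink struct {
	written int
	syncs   int
}

func (s *countingSink) Write(p []byte) (int, error) {
	s.written += len(p)
	return len(p), nil
}

func (s *countingSink) Sync() error {
	s.syncs++
	return nil
}

// maxSnapshotSSTs computes the upper bound 4 + SR + 2 from the message above.
func maxSnapshotSSTs(subsumedReplicas int) int {
	const (
		receivedSSTs     = 3 // range local, replicated range-id local, user keys
		unreplicatedSSTs = 1 // unreplicated range-id local keys, built at apply time
		leftoverSSTs     = 2 // range-local and user keys of not fully contained replicas
	)
	return receivedSSTs + unreplicatedSSTs + subsumedReplicas + leftoverSSTs
}
```

To write to disk, pass an *os.File as dst in place of countingSink. With two subsumed replicas, maxSnapshotSSTs(2) evaluates to 4 + 2 + 2 = 8.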
1 parent 56c7a56, commit b320ff5
Showing 14 changed files with 835 additions and 266 deletions.