sstable: reduce block cache fragmentation #3508
Conversation
Force-pushed from 7beb636 to 70be36b.
The internal fragmentation reduction is mostly observable for small key-value pairs, where data is packed efficiently near the 32KiB threshold. In these cases the 32B of metadata for a block cache entry can push the allocation past 32KiB, causing jemalloc to allocate from the 40KiB size class instead of the 32KiB class. For example, a 32750B block plus 32B of cache metadata is a 32782B allocation, which no longer fits the 32KiB (32768B) class and lands in the 40KiB (40960B) class, wasting 8178B. We expect that these changes will not reduce fragmentation for large key-value pairs; however, they should not introduce any memory regressions. The following tests involved running kv-50 workloads on a single cockroach node with 4 vCPUs and 16GB of RAM, with varying key-value sizes. To illustrate the fragmentation improvements and shorten the test duration, the block cache size was reduced to 400MB.

TL;DR

Note: Fragmentation was computed as the difference between the size of each allocation's size class and the requested allocation size.

Test 1 (KV Block Size: 1-16B)
- Previous System: 14393295 allocations
  - Note: For small key-value pairs a large number of allocations fall into the 40KiB size class, and we are wasting ~8MiB of space on these allocations.
- New Heuristics: 30157396 allocations

Test 2 (KV Block Size: 1-1024B)
- Previous System: 20823143 allocations
- New Heuristics: 20823143 allocations

Test 3 (KV Block Size: 1KiB-16KiB)
- Previous System: 5883656 allocations
- New Heuristics: 7904419 allocations
  - Note: Even with the new heuristics many allocations fall into the 40KiB size class, due to the minimum size threshold (29.4KiB, i.e. 90% of the 32KiB target) required to flush an sstable block to disk. However, the size of the KV blocks also helps reduce internal fragmentation in the larger size classes.
Force-pushed from afe5e62 to c72080f.
My recollection is that the block flushing heuristics allow the target block size to be exceeded if the block isn't too full. Specifically, the `sizeThreshold` for a block is computed as 99% of the target block size. If the block is smaller than that threshold, a new key-value pair is always allowed to be added. 99% of 32768 is 32440, so a new key-value pair larger than 328 bytes will almost always bump us past the target block size. This is independent of the value metadata size. If we know the target block size and the jemalloc size classes, I think we could enhance `shouldFlush` to make a more informed decision that optimizes for reducing internal fragmentation (i.e. make whichever choice of flushing or not flushing minimizes internal fragmentation).
Reviewable status: 0 of 3 files reviewed, 1 unresolved discussion (waiting on @CheranMahalingam)
internal/cache/value_invariants.go
line 20 at r1 (raw file):
```go
// NewValueMetadataSize returns the number of bytes of metadata allocated for
// a cache entry.
func NewValueMetadataSize() int {
```
Nit: I think this can be a `const ValueMetadataSize = 0`, with a little effort of sprinkling the other definitions in files that have appropriate build constraints.
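A minimal sketch of the build-constraint approach being suggested (the file names and tag sets here are illustrative, not the actual Pebble layout; the 32B/0B values come from later in this thread):

```go
// value_metadata_cgo.go
//go:build cgo

package cache

// ValueMetadataSize is the number of bytes of metadata allocated for a cache
// entry when the manually managed allocator is in use.
const ValueMetadataSize = 32
```

```go
// value_metadata_nocgo.go
//go:build !cgo

package cache

// Without cgo the Go runtime manages Value allocations, so no extra metadata
// is prepended to the entry.
const ValueMetadataSize = 0
```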
Force-pushed from c176db1 to 825ca11.
During my experiments I noticed that at runtime the `sizeThreshold` is now 90% of the target block size (29492B); we seem to be using the default threshold (https://github.com/cockroachdb/pebble/blob/master/internal/base/options.go#L11). In this case there is about 3KiB of leeway; however, in cases where the key-value pair is larger, such as 4KiB, this threshold gets in the way and contributes to internal fragmentation. For key-value pairs of 10-11KiB this isn't problematic, since they will push the block over the 32KiB target but land just under the 40KiB size class.
The simplest solution I can think of, when the block size is below the threshold, is a check to see whether adding the next key-value pair would decrease fragmentation (i.e. `32768 - blockSize > 40960 - futureBlockSize`). However, I am not sure what the behaviour should be for large key-value pairs that push the block size beyond 40KiB, and there is an argument that blocks above 32KiB could have less internal fragmentation if we waited until the size was closer to 40KiB. I am wondering whether we still want to pursue a size-class aware heuristic.
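For concreteness, a worked instance of that check with hypothetical numbers: suppose `blockSize` is 30000B and the next key-value pair adds 9000B, so `futureBlockSize` is 39000B. Flushing now wastes 32768 − 30000 = 2768B in the 32KiB class, while flushing after the addition wastes 40960 − 39000 = 1960B in the 40KiB class; since 2768 > 1960, the check says to keep filling the block.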
Reviewable status: 0 of 4 files reviewed, all discussions resolved (waiting on @CheranMahalingam)
Ah, you're correct that the default target block size threshold is 90%. I looked at the code really quickly and misread it. I also misread your testing: I thought "KV Block" referred to the target block size (which was confusing given the numbers you provided), but it really refers to the size of a key+value.
A size-class aware heuristic shouldn't be difficult. Do you think it wouldn't be worthwhile? The jemalloc size classes can either be hardcoded or injected via an Option (the Option approach would probably be better). When deciding whether to flush, we check whether the new KV would bump us up to the next size class from the one the block currently fits in. If it would, we look to see which is smaller: `nextSizeClass(currentBlockSize) - currentBlockSize` or `nextSizeClass(futureBlockSize) - futureBlockSize`.
When the target block size is exceeded we can also keep adding KVs until the next size class would be exceeded (we don't have to flush the block immediately once it exceeds the target).
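A minimal sketch of that comparison, assuming a hypothetical `nextSizeClass` helper over the injected size classes (this is illustrative, not the committed implementation):

```go
// nextSizeClass returns the smallest size class >= n from a sorted,
// ascending slice of size classes (e.g. ..., 32768, 40960, ...).
func nextSizeClass(n int, sizeClasses []int) (int, bool) {
	for _, c := range sizeClasses {
		if c >= n {
			return c, true
		}
	}
	return 0, false
}

// shouldFlushNow reports whether flushing the current block wastes less of
// its size class than flushing after the next KV is appended.
func shouldFlushNow(currentBlockSize, futureBlockSize int, sizeClasses []int) bool {
	cur, ok1 := nextSizeClass(currentBlockSize, sizeClasses)
	fut, ok2 := nextSizeClass(futureBlockSize, sizeClasses)
	if !ok1 || !ok2 {
		return true // no applicable size class; fall back to flushing
	}
	// Flush now only if the current block wastes no more of its size class
	// than the future block would waste of its (possibly larger) class.
	return cur-currentBlockSize <= fut-futureBlockSize
}
```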
PS: About the new internal/cache code: did you mean to revert the changes to `options.go`?
Reviewable status: 0 of 4 files reviewed, all discussions resolved (waiting on @CheranMahalingam)
Force-pushed from 825ca11 to 458a389.
I can try a size-class aware heuristic and see how it performs with key-value pair sizes that introduce a lot of fragmentation. Just for clarification, should the `shouldFlush` logic look something like the sketch below? I am not sure how we want to prioritize reducing internal fragmentation vs. keeping the sstable block size close to the 32KiB target, or how performant this code path has to be.
```go
if len(sizeClassHints) > 0 {
	if blockSize > targetBlockSize {
		nextSizeClass := sizeClassFunc(blockSize)
		// Bound the future block size to the first size class above the
		// target (40KiB) to prevent a rare scenario where blockSize keeps
		// increasing without flushing because fragmentation keeps decreasing.
		if nextSizeClass-blockSize > nextSizeClass-futureBlockSize &&
			futureBlockSize <= nextSizeClass &&
			nextSizeClass == sizeClassFunc(targetBlockSize+1) {
			// Wait, because we can definitely reduce fragmentation in the future.
			return false
		}
		return true
	}
	// Below the target: remove the sizeThreshold heuristic.
	if futureBlockSize > targetBlockSize {
		if sizeClassFunc(futureBlockSize)-futureBlockSize < sizeClassFunc(blockSize)-blockSize {
			// Option that reduces fragmentation, with a small risk of
			// flushing a very large sstable block.
			return false
		}
		return true
	}
	return false
}
// ... existing logic ...
```
Sorry, I reverted my changes while experimenting with having the Cockroach layer adjust the block size instead, due to the significant number of test failures this change causes in Pebble; however, I found some workarounds in the latest revision.
Reviewable status: 0 of 10 files reviewed, all discussions resolved (waiting on @CheranMahalingam)
Yes, something like those heuristics is what I'm imagining. We do call `shouldFlush` on every KV pair, but I think you can add fast-path checks to make this very fast.
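A hedged sketch of what such a fast path might look like (names mirror the pseudocode above; this is illustrative, not the committed implementation):

```go
// Fast path: if even the block with the next KV appended stays at or below
// the target, we never flush, so skip the size-class lookups entirely.
if futureBlockSize <= targetBlockSize {
	return false
}
// Slow path: consult the size classes as in the sketch above.
```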
Reviewable status: 0 of 10 files reviewed, all discussions resolved (waiting on @CheranMahalingam)
Force-pushed from 458a389 to 1b1901e.
The size-class aware heuristics show a significant improvement in fragmentation for general workloads where the key-value pair size has a large range. However, these improvements do not directly correspond to a reduction in memory usage. The tests were run on a machine with 16 vCPUs and 64GB of RAM, where the block cache size was set to 16GB (we also ran the following on the previous setup, with similar results). We ran a kv-50 workload with key-value pair sizes ranging from 1B to 16KiB and enough concurrency/batching to saturate the cache.

TL;DR
| Version | Fragmentation (B/Alloc) | Space Wasted Per Allocation (%) | Jemalloc Allocated Bytes (GiB) | Jemalloc Total Bytes (GiB) |
|---|---|---|---|---|
| Current | 3755 | 11.26 | 16.621 | 19.096 |
| New Heuristics | 1730 | 6.21 | 15.785 | 18.726 |
| Improvement | | +5.05% | +5.03% | +1.94% |
Results
Current
- Allocations: 16187686
- Average Allocation Size (B): 33358

Jemalloc Stats
- Average Allocated Bytes (GiB): 16.621
- Average Resident Bytes (GiB): 19.096

| Bin Size (KiB) | Count | Fragmentation (B/Alloc) |
|---|---|---|
| 64 | 695 | 5024 |
| 56 | 4181 | 4816 |
| 48 | 1590425 | 6409 |
| 40 | 8343194 | 4658 |
| 32 | 5284819 | 1554 |
| 28 | 18039 | 2153 |
| 24 | 22626 | 2257 |
| 20 | 17984 | 2031 |
| 16 | 11573 | 920 |
New Heuristics
- Allocations: 16207665
- Average Allocation Size (B): 27867

Jemalloc Stats
- Average Allocated Bytes (GiB): 15.785
- Average Resident Bytes (GiB): 18.726

| Bin Size (KiB) | Count | Fragmentation (B/Alloc) |
|---|---|---|
| 64 | 986 | 5173 |
| 56 | 6166 | 4151 |
| 48 | 101524 | 5263 |
| 40 | 2285861 | 1492 |
| 32 | 5855568 | 1818 |
| 28 | 3766744 | 1689 |
| 24 | 2492485 | 1776 |
| 20 | 740086 | 1346 |
| 16 | 13756 | 953 |
Reviewable status: 0 of 13 files reviewed, all discussions resolved (waiting on @CheranMahalingam)
I didn't scrutinize the implementation, but the numbers here look compelling. I'm going to leave it to the Storage folks to do a detailed code review.
I'm surprised that Jemalloc Total Bytes didn't fall by the same amount as Jemalloc Allocated Bytes.
Reviewable status: 0 of 13 files reviewed, all discussions resolved (waiting on @CheranMahalingam)
Force-pushed from 1b1901e to f874ba5.
Reviewed 2 of 4 files at r2, 1 of 8 files at r3, 2 of 5 files at r4, all commit messages.
Reviewable status: 5 of 13 files reviewed, 1 unresolved discussion (waiting on @CheranMahalingam and @jbowens)
sstable/writer.go
line 1656 at r5 (raw file):
```go
newSize += 4                            // varint for shared prefix length
newSize += uvarintLen(uint32(keyLen))   // varint for unshared key bytes
newSize += uvarintLen(uint32(valueLen)) // varint for value size
```
Worth adding a code comment that `newSize` is also an upper bound, even though it is tighter than `newEstimatedSize`, since it does not account for key prefix compression.
For very long keys with a long shared prefix and tiny values, such as secondary indexes, this over-estimation could be a problem in that it could result in tiny blocks. I am not sure what to do about this, given we don't want to incur the expense of computing the shared prefix (which we do in `blockWriter.storeWithOptionalValuePrefix`). We may want to use the `sizeThreshold` after all, and reduce it to something smaller, like 60% (and return false if `estimatedBlockSize <= sizeThreshold`).
I was expecting to see `cache.ValueMetadataSize` accounted for here in adjusting `estimatedBlockSize`, instead of in options. It will naturally bump up the effective `BlockSize` if someone sets a very tiny value, since we will add at least one entry; and after that entry is added the block will be bigger than `BlockSize`, and all subsequent attempts to add will fall through to the size-class based comparisons. Was there a reason to do it in options.go? The logic there seems not quite right, in that if someone sets a block size of 2 bytes, they should be getting a tiny block, and not a block size of `DefaultBlockSize`.
Force-pushed from f874ba5 to 852a5db.
Reviewable status: 4 of 14 files reviewed, 2 unresolved discussions (waiting on @jbowens and @sumeerbhola)
sstable/writer.go
line 1656 at r5 (raw file):
Previously, sumeerbhola wrote…

> Worth adding a code comment that `newSize` is also an upper bound, even though it is tighter than `newEstimatedSize`, since it does not account for key prefix compression.
>
> For very long keys with a long shared prefix and tiny values, such as secondary indexes, this over-estimation could be a problem in that it could result in tiny blocks. I am not sure what to do about this, given we don't want to incur the expense of computing the shared prefix (which we do in `blockWriter.storeWithOptionalValuePrefix`). We may want to use the `sizeThreshold` after all, and reduce it to something smaller, like 60% (and return false if `estimatedBlockSize <= sizeThreshold`).
>
> I was expecting to see `cache.ValueMetadataSize` accounted for here in adjusting `estimatedBlockSize`, instead of in options. It will naturally bump up the effective `BlockSize` if someone sets a very tiny value, since we will add at least one entry; and after that entry is added the block will be bigger than `BlockSize`, and all subsequent attempts to add will fall through to the size-class based comparisons. Was there a reason to do it in options.go? The logic there seems not quite right, in that if someone sets a block size of 2 bytes, they should be getting a tiny block, and not a block size of `DefaultBlockSize`.
Nice, I had overlooked how the shared prefix is computed later. Do you have a preference for what that size threshold should be? I have left it at 60% (~19KiB), but we can increase it if we want to prioritize having block sizes closer to 32KiB, or if we want to shift the bin size distribution up (i.e. larger blocks -> fewer allocations in a saturated cache -> less overhead from allocator metadata).
You're right, pushing the metadata accounting down to the flush heuristics also avoids breaking existing tests.
sstable/writer.go
line 1649 at r6 (raw file):
```go
// us at risk of overestimating the size and flushing small blocks. We
// mitigate this by imposing a minimum size restriction.
sizeClassAwareThreshold := (targetBlockSize*base.SizeClassAwareBlockSizeThreshold + 99) / 100
```
This change is awkward because we already store size thresholds in the `Writer` and pass that value down to `shouldFlushWithHints`. We could make those thresholds dynamic by checking whether the size class hints are empty; however, if we are then forced to fall back to the old heuristics, the original 90% threshold is lost. Wondering whether there is a cleaner way to express this.
Can you rerun the last experiment with these changes to validate that the results are still good?
Reviewed 2 of 10 files at r6.
Reviewable status: 6 of 14 files reviewed, 4 unresolved discussions (waiting on @CheranMahalingam and @jbowens)
sstable/writer.go
line 1638 at r6 (raw file):
```go
// loaded into the block cache we also allocate space for the cache entry
// metadata. The new allocation of size ~1052B may now only fit within a
// 2048B class, which increases fragmentation.
```
nit: increases internal fragmentation.
sstable/writer.go
line 1649 at r6 (raw file):
Previously, CheranMahalingam wrote…

> This change is awkward because we already store size thresholds in the `Writer` and pass that value down to `shouldFlushWithHints`. We could make those thresholds dynamic by checking whether the size class hints are empty; however, if we are then forced to fall back to the old heuristics, the original 90% threshold is lost. Wondering whether there is a cleaner way to express this.
Let's just compute this once in `Writer`, wrap the thresholds in a `flushDecisionOptions` struct with three fields (`blockSize`, `blockSizeThreshold`, `sizeClassAwareThreshold`), and pass that options struct down.
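A minimal sketch of the suggested struct (the field names come from this thread; the types and comments are assumptions):

```go
// flushDecisionOptions bundles the precomputed flush thresholds so they can
// be computed once in Writer and passed down to shouldFlushWithHints.
type flushDecisionOptions struct {
	blockSize               int
	blockSizeThreshold      int // e.g. 90% of blockSize, for the legacy heuristic
	sizeClassAwareThreshold int // e.g. 60% of blockSize, the size-class-aware minimum
}
```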
sstable/writer.go
line 1674 at r6 (raw file):
```go
if blockSizeWithMetadata < targetBlockSize {
	newSizeClass, ok := blockSizeClass(newSize, sizeClassHints)
	if !ok || newSizeClass-newSize >= sizeClass-blockSizeWithMetadata {
```
nit: shouldn't `!ok` result in returning false? We should then continue adding to this block, and will eventually fall into one of the earlier cases that returns true. I realize `!ok` won't happen in a properly configured system.
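A hedged sketch of the suggested tweak, using the identifiers from the snippet above:

```go
newSizeClass, ok := blockSizeClass(newSize, sizeClassHints)
if !ok {
	// No size-class hint covers newSize: keep adding to this block; one of
	// the earlier cases will eventually return true.
	return false
}
```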
Force-pushed from d30fb06 to 6511784.
I can confirm that there are no significant regressions following these changes. There is a slight bump in fragmentation for the 40KiB size class, which can be attributed to the 60% size threshold. There is also a slight increase in bytes allocated for the new heuristics, and in total bytes for the old heuristics, as reported by jemalloc. However, note that I ran this experiment almost 2x longer than the previous one to make sure that the computed averages were stable.
| Version | Fragmentation (B/Alloc) | Space Wasted Per Allocation (%) | Jemalloc Allocated Bytes (GiB) | Jemalloc Total Bytes (GiB) |
|---|---|---|---|---|
| Current | 3760 | 11.26 | 16.66 | 19.43 |
| New Heuristics | 1778 | 6.22 | 15.97 | 18.80 |
| Improvement | | +5.04% | +4.16% | +3.22% |
Raw Results

[Per-bin tables for Current and Heuristics v3 were attached as collapsed details in the original view and are not recoverable here.]
Reviewable status: 3 of 14 files reviewed, 1 unresolved discussion (waiting on @jbowens and @sumeerbhola)
I am guessing all the numbers above are based on an instrumented version of the commit in this PR. Can you drop a link to that commit in a PR comment, along with the workload you ran, so that we can easily reproduce these results if we need to in the future?
Reviewed 1 of 10 files at r6, 3 of 4 files at r7, 1 of 1 files at r9, all commit messages.
Reviewable status: 8 of 14 files reviewed, 3 unresolved discussions (waiting on @CheranMahalingam and @jbowens)
sstable/writer_test.go
line 675 at r9 (raw file):
```go
td.ScanArgs(t, "key-size", &keySize)
td.ScanArgs(t, "val-size", &valSize)
td.ScanArgs(t, "block-size-with-metadata", &blockSize)
```
nit: this is not the block size with metadata, yes?
sstable/writer_test.go
line 703 at r9 (raw file):
```go
blockSize:               targetSize,
blockSizeThreshold:      sizeThreshold,
sizeClassAwareThreshold: sizeThreshold,
```
Can you add some test cases where `sizeClassAwareThreshold` is different from `blockSizeThreshold`?
Force-pushed from 6c053d7 to aa02ea5.
Currently, the sstable writer contains heuristics to flush sstable blocks once the size reaches a specified threshold. In CRDB this is defined as 32KiB. However, when these blocks are loaded into memory, additional metadata is allocated, sometimes pushing the total allocation beyond the 32KiB threshold. Since CRDB uses jemalloc, these allocations fall into a 40KiB size class, which leads to significant internal fragmentation. In addition, since the system is unaware of these size classes, we cannot design heuristics that prioritize reducing memory fragmentation. Reducing internal fragmentation can help reduce CRDB's memory footprint. This commit decrements the target block size to prevent internal fragmentation for small key-value pairs, and adds support for optionally specifying size classes, enabling a new set of heuristics that reduce internal fragmentation for workloads with larger key-value pairs. Fixes: cockroachdb#999.
Force-pushed from aa02ea5 to 96d6f61.
Pebble commit: CheranMahalingam@43a74ee
Cockroach commit: CheranMahalingam/cockroach@7ce83ae
Reproduction Steps
```sh
# Spin up a single-node cluster with sufficient memory.
$ roachprod create --gce-machine-type=n2-standard-16 $CLUSTER -n 1
$ roachprod put $CLUSTER stats.sh
$ roachprod start $CLUSTER

# Run the kv-50 workload.
$ roachprod run $CLUSTER:1 -- './cockroach workload init kv {pgurl:1}'
$ roachprod run $CLUSTER:1 -- './cockroach workload run kv --read-percent 50 --tolerate-errors --max-block-bytes 16384 --min-block-bytes 1 --concurrency 256 --batch 10 --insert-count 100000 {pgurl:1}'

# SSH into the node to check progress.
$ roachprod ssh $CLUSTER:1

# Wait until the block cache has saturated in the DB console and sufficient
# time has passed; I used 30000000 allocations as a stopping point.
$ cat logs/cockroach.stdout.log | grep "ALLOC" | wc -l
$ cat logs/cockroach.stdout.log > flush

# The script makes some assumptions (documented inside it) to get an
# accurate average for the jemalloc stats.
$ ./stats.sh
```
Reviewable status: 3 of 15 files reviewed, 3 unresolved discussions (waiting on @jbowens and @sumeerbhola)
sstable/writer_test.go
line 675 at r9 (raw file):
Previously, sumeerbhola wrote…

> nit: this is not the block size with metadata, yes?
This is a hacky way of dealing with the fact that with certain builds the metadata size is 0 (cgo disabled, or a set of flags defined in `value_cgo.go`), and we want `blockSizeWithMetadata` to be consistent. Then I decrement by `cache.ValueMetadataSize` (0 or 32) to compute the block size for the flush call.
I am realizing that this has been a huge pain for testing, and to avoid future friction I have set `cache.ValueMetadataSize` to 32 for all builds. If cgo were ever disabled in production, our heuristics would break anyway.
Note that there is still a large point of friction preventing extending the tests, because the value of `cache.ValueMetadataSize` is reduced for 32-bit builds.
sstable/writer_test.go
line 703 at r9 (raw file):
Previously, sumeerbhola wrote…

> Can you add some test cases where `sizeClassAwareThreshold` is different from `blockSizeThreshold`?
Done.
Reviewed 1 of 4 files at r7, 2 of 6 files at r10, 1 of 2 files at r11.
Reviewable status: 7 of 15 files reviewed, 3 unresolved discussions (waiting on @CheranMahalingam and @jbowens)
sstable/writer_test.go
line 675 at r9 (raw file):
Previously, CheranMahalingam wrote…

> I am realizing that this has been a huge pain for testing, and to avoid future friction I have set `cache.ValueMetadataSize` to 32 for all builds. If cgo were ever disabled in production, our heuristics would break anyway.
This is fine.
This patch makes use of the pebble size-class aware flush heuristics introduced in cockroachdb/pebble#3508. The experimentation there showed improvements in internal fragmentation. Fixes cockroachdb#123090. Release note: None
125793: storage: use jemalloc size classes when initializing pebble r=RaduBerinde a=aadityasondhi This patch makes use of the pebble size-class aware flush heuristics introduced in cockroachdb/pebble#3508. The experimentation there showed improvements in internal fragmentation. Fixes #123090. Release note: None Co-authored-by: Aaditya Sondhi <20070511+aadityasondhi@users.noreply.github.com>