Zram can be configured more optimally by using lz4 instead of zstd1 #1570

ahydronous · 2024-08-29T11:49:52Z

Describe the bug

Zram is currently configured to use zstd1, which is suboptimal

What did you expect to happen?

I've spent an inordinate amount of time optimizing zram on my system.

Gathered wisdoms

Swap is usually written sequentially but read randomly
Zram writes don't matter nearly as much as because Linux page cache data is already in RAM, and RAM>RAM transfers are absurdly quick
Zram hardcoded blocksize is 4K
For system memory performance applications care jack about bandwidth, latency (CAS etc in hardware, IOPS in software) is what matters
You still want a swapfile, even with zram, due to the fact that you're gonna have incompressible or very idle pages that you want to evict from RAM. This goes double for the Deck, which will be RAM starved on some games due to the GPU claiming a lot of RAM. More on this later.

IOPS benchmark on Samsung 970 EVO Plus 1TB

- lz4:      2 030 000 (!)
- zstd1:    820 000
- 970 EVO:  15 300

Compression ratios on mixed data

- lz4:    2.1
- zstd1:  2.9

This is very relevant for the Deck, because <12GB is right where in a lot of scenarios the benefits of extra memory from zstd1 start to outstrip the latency benefits of lz4.
Valve probably has a lot of profiled data, but as far as I've been able to tell, even the heaviest games don't go much over 4GB of VRAM.

Swappiness

Swappiness can be derived via formula. On kernel.org, they state

For example, if the random IO against the swap device is on average 2x faster than IO from the filesystem, swappiness should be 133 (x + 2x = 200, 2x = 133.33).

You can reduce that to (yx = 200 -x), where y is filesystem-to-swap IO ratio.
With the 970 Evo Plus as example again, we have aforementioned read IOPS values. 970EVO vs. lz4 = 15 300 / 2 030 000 = 0.008, so 0.008 is our ratio.
We plug that in, 0.008x = 200 -x = 198.4, and we get vm.swappiness=198.

Page clusters

These are logarithmic. With zram, you get noticeable latency improvements with 1 page, vm.page-cluster=0

Writeback device (backing swap partition)

https://www.kernel.org/doc/html/v5.9/admin-guide/blockdev/zram.html#writeback
Remember how I mentioned still needing a swapfile?
Here is where it gets slightly more convoluted.

Zram currently only accepts swap partitions as a writeback device.
This swap partition can also not be the "ordinary" swap partition of the system, although I don't know if the Deck uses the swapfile for anything beyond. Thankfully you can replace the swapfile with the swap partition, effectively costing no extra space
Currently, marking pages idle and evicting them has to be invoked via commensurate commands. This could be done via either cron or perhaps a systemd service. This will also need a writeback limit set to prevent wearing out SSDs, especially the 64GB eMMC Deck.
Two links about configuring page eviction with writeback:
RFE: Actually use the writeback device systemd/zram-generator#164
https://android.googlesource.com/platform/frameworks/base/+/master/services/core/java/com/android/server/ZramWriteback.java

Extra

There is also secondary algorithm recompression, although I have not yet tried this out and it is only in the newer kernels.
https://www.kernel.org/doc/html/latest/admin-guide/blockdev/zram.html#recompression

Output of `rpm-ostree status`

No response

Hardware

No response

Extra information or context

No response

The text was updated successfully, but these errors were encountered:

KyleGospo · 2024-08-29T17:26:13Z

Thanks! Will be digging into this more, but for now:
5ef67b4

ahydronous · 2024-09-03T19:32:35Z

This should help a lot with understanding and tweaking various Virtual Memory settings @KyleGospo : https://gist.github.com/ahydronous/7ceaa00df96ef99131600edd4c2c73f2

fiftydinar · 2024-09-09T17:24:43Z

Question

What is (more) preferred?

Increased possible amount of programs that can be open at the same time without crashing? (bandwidth efficiency)
or Lower latency (better responsiveness)?

My answer

Focus on lower latency without regression in bandwidth efficiency.

What are the best configuration values?

That generally depends on each PC configuration & usage scenario.

With the current approach, we cannot satisfy every usage scenario & PC configuration, because custom values are statically written only once during boot.

Examples

It is desirable to want more ZRAM swapiness during heavy usage scenario (bandwidth efficiency),
while with light-medium usage scenario you want less ZRAM swapiness (lower latency).

It is desirable to want ZSTD ZRAM compression for low-RAM configurations (bandwidth efficiency),
while with sufficient RAM configurations, you want LZ4 (lower latency).

etc, feel free to show more examples.

Implementation

I looked through @ahydronous's gist & I applied all values from there (except swapiness, where I use 180), to my custom image.

Here's how that looks:

Memory tweaks:
https://github.com/fiftydinar/gidro-os/blob/b172d940c85cfa7a988010e2598281138674d290/files/0-system/usr/bin/memory-tweaks-gidro

https://github.com/fiftydinar/gidro-os/blob/b172d940c85cfa7a988010e2598281138674d290/files/systemd/system/memory-tweaks-gidro.service

Dirty centisecs:
https://github.com/fiftydinar/gidro-os/blob/b172d940c85cfa7a988010e2598281138674d290/files/0-system/usr/bin/dirty-centisecs

https://github.com/fiftydinar/gidro-os/blob/b172d940c85cfa7a988010e2598281138674d290/files/systemd/system/dirty-centisecs.service

You can notice that MaxPerfWiz tries to adjust some dynamic memory values to be as ideal as possible for all configurations, like

vm.dirty_expire_centisecs
vm.dirty_writeback_centisecs
vm.dirty_bytes (or ratio)
vm.dirty_backround_bytes (or ratio)

This can be improved further.

Tuned can also dynamically change sysctl values depending on some scenarios, so that can also possibly work well.

EPOCHvoyager · 2024-10-05T15:25:54Z

As Bazzite is now shipping with TuneD, I took the liberty of creating a configuration for it with the values linked above; based on the TuneD balanced profile, for testing purposes. Do note that these values are relevant to this hardware specific setup - so, 16GB of RAM and a CPU with 16 threads:

/etc/tuned/profiles/balanced-tweaked/tuned.conf

#
# tuned configuration
#

[main]
summary = General non-specialized tuned profile with memory tweaks
include = balanced

[sysctl]
# Values taken from:
# https://gist.github.com/ahydronous/7ceaa00df96ef99131600edd4c2c73f2
# NOTE: Certain values are omited due to already being Bazzite defaults.
vm.dirty_background_bytes = 209715200
vm.dirty_bytes = 419430400
vm.vfs_cache_pressure = 66
vm.dirty_expire_centisecs = 500
vm.dirty_writeback_centisecs = 250

# Convention says KiB of RAM * 0.01
# From MPW: RAM / num_of_logical_cores * 0.058
# 16777216 KiB / 8 threads * 0.058 = 121634
vm.min_free_kbytes = 60817

tduck973564 · 2024-10-06T05:45:58Z

Would it be possible to use a TuneD profile to adapt things like the algorithm and swappiness based on system specs?

EPOCHvoyager · 2024-10-08T14:14:24Z

Would it be possible to use a TuneD profile to adapt things like the algorithm and swappiness based on system specs?

I believe the way dynamic tuning works with TuneD might be coded into it directly, can't seem to find any word on configuring it. It's mainly meant for things like changing the CPU governor and things of that sort that change with system load, from what I gather.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Zram can be configured more optimally by using lz4 instead of zstd1 #1570

Zram can be configured more optimally by using lz4 instead of zstd1 #1570

ahydronous commented Aug 29, 2024 •

edited

Loading

KyleGospo commented Aug 29, 2024

ahydronous commented Sep 3, 2024

fiftydinar commented Sep 9, 2024 •

edited

Loading

EPOCHvoyager commented Oct 5, 2024 •

edited

Loading

tduck973564 commented Oct 6, 2024

EPOCHvoyager commented Oct 8, 2024

Zram can be configured more optimally by using lz4 instead of zstd1 #1570

Zram can be configured more optimally by using lz4 instead of zstd1 #1570

Comments

ahydronous commented Aug 29, 2024 • edited Loading

Describe the bug

What did you expect to happen?

Benchmarks on zstd-1 vs lz4

An explanation of vm.swappiness

Overcommitting memory (zram being bigger than RAM size) is good

Gathered wisdoms

IOPS benchmark on Samsung 970 EVO Plus 1TB

Compression ratios on mixed data

Swappiness

Page clusters

Writeback device (backing swap partition)

Extra

Output of rpm-ostree status

Hardware

Extra information or context

KyleGospo commented Aug 29, 2024

ahydronous commented Sep 3, 2024

fiftydinar commented Sep 9, 2024 • edited Loading

Question

My answer

What are the best configuration values?

Examples

Implementation

EPOCHvoyager commented Oct 5, 2024 • edited Loading

tduck973564 commented Oct 6, 2024

EPOCHvoyager commented Oct 8, 2024

ahydronous commented Aug 29, 2024 •

edited

Loading

Benchmarks on `zstd-1` vs `lz4`

Output of `rpm-ostree status`

fiftydinar commented Sep 9, 2024 •

edited

Loading

EPOCHvoyager commented Oct 5, 2024 •

edited

Loading