
auto compression #5928

Closed
wants to merge 1 commit into from

Conversation

n1kl commented Mar 27, 2017

As part of my master's thesis at the University of Hamburg I have worked on improving ZFS throughput through compression. I would now like to share my 3 feature branches with the community.

  1. lz4fast lz4fast compression #5927
  2. autocompression (current)
  3. qos Quality of service for ZFS + improvement through compression #5929

Description

This patch adds auto as a new ZFS compression type:
zfs set compression=auto

Motivation and Context

Which compression algorithm is best for high throughput? The answer depends on the hardware in use.
If compression takes too long, the disk sits idle. If compression is faster than the disk can write, the CPU sits idle, because compression and writing to disk happen in parallel.
Auto compression tries to keep both as busy as possible.
The disk load is observed through the vdev queue: if the queue is empty, a fast algorithm with a lower compression ratio such as lz4 is used; if the queue is full, gzip-[1-9] can spend more CPU time for a higher compression ratio.
The existing zio_dva_throttle might conflict with the concept described above, so it is recommended to deactivate zio_dva_throttle.
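
To make the idea concrete, here is a minimal sketch in C of the decision rule described above. It is not code from the patch: the function, its parameters, and the way the queued byte count would be obtained are hypothetical; only the ZIO_COMPRESS_* values come from the existing zio_compress enum.

/*
 * Minimal sketch of the queue-based decision rule, not the code from the
 * patch.  The function, its parameters, and the source of the queued byte
 * count are hypothetical; the enum values are from enum zio_compress.
 */
static enum zio_compress
auto_pick_compress(uint64_t queued_write_bytes, uint64_t low_mark,
    uint64_t high_mark)
{
	if (queued_write_bytes <= low_mark)
		return (ZIO_COMPRESS_LZ4);	/* disk is starving: compress fast */
	if (queued_write_bytes >= high_mark)
		return (ZIO_COMPRESS_GZIP_9);	/* disk is backed up: spend CPU */
	return (ZIO_COMPRESS_GZIP_1);		/* in between: moderate effort */
}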

Benchmark

Copying a file from tmpfs to ZFS.

8 Cores:

Name     Ratio   MB/s
auto     0.44    245
gzip-1   0.43    255
lz4      0.58    195
off      1       99

1 Core:

Name     Ratio   MB/s
auto     0.56    151
gzip-1   0.43    51
lz4      0.58    179
off      1       99

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)

Branch overlapping changes (feature, compress values)

The patch has read-only backward compatibility via the newly introduced SPA_FEATURE_COMPRESS_AUTO feature. The feature activation procedure is the same as in my other branches.
Because of the limited namespace of BP_GET_COMPRESS() (128 values), the first part of the zio_compress enum is reserved for values that may appear in block pointers and datasets, and the second part for values that may only appear in datasets. This is an alternative suggestion to #3908.
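
Purely as an illustration of such a split (the identifiers below are invented for this description and are not the ones used in the patch), the layout could look like this:

/*
 * Illustration only; these are not the identifiers from the patch.
 * The first group must stay below 128 so it fits the block pointer's
 * compression field; the second group can only appear as a dataset
 * property value and is mapped to a concrete on-disk algorithm before
 * a block pointer is written.
 */
enum example_zio_compress {
	/* values that may be stored in a block pointer or a dataset */
	EXAMPLE_COMPRESS_INHERIT = 0,
	EXAMPLE_COMPRESS_ON,
	EXAMPLE_COMPRESS_OFF,
	EXAMPLE_COMPRESS_LZ4,
	EXAMPLE_COMPRESS_GZIP_1,
	/* ... */
	EXAMPLE_COMPRESS_BP_FUNCTIONS,		/* end of block-pointer range */

	/* values that may only be stored as a dataset property */
	EXAMPLE_COMPRESS_AUTO = EXAMPLE_COMPRESS_BP_FUNCTIONS,
	EXAMPLE_COMPRESS_FUNCTIONS
};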

Checklist:

  • My code follows the ZFS on Linux code style requirements.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.
  • Change has been approved by a ZFS on Linux member.

behlendorf added the Type: Performance label Mar 27, 2017
ahrens (Member) commented Mar 27, 2017

This is a neat idea.

I don't see how it conflicts with zio_dva_throttle. The allocation throttle should be configured to send at least enough i/os to each device to fill its queue (zfs_vdev_async_write_max_active i/os), so I wouldn't think it would impact your measurement. But if it's true that your code is incompatible with the zio_dva_throttle, you'll need to resolve that before integration.

I would suggest using separate enums for the blkptr_t's compression field and the property value's compression field, as is done in #3908.

A review comment from a member on the quoted man page excerpt below:

pool.

This feature becomes \fBactive\fR as soon as it is used on one dataset and will
return to being \fBenabled\fB once all filesystems that have ever had their compression set to

Typo: should be enabled\fR (the closing \fB should be \fR).

@n1kl n1kl force-pushed the autocompression branch 3 times, most recently from df94bbc to 7cd574f Compare March 29, 2017 15:35
n1kl (Author) commented Mar 29, 2017

@ahrens
Thank you for reviewing.

For performance it is important that the vdev queue never runs empty.
To ensure this, the auto algorithm needs a minimum queue size (in bytes) before it starts, as seen in the code below (module/zfs/compress_auto.c [61]).

uint32_t max_queue_depth = zfs_vdev_async_write_max_active *
    zfs_vdev_queue_depth_pct / 100;

/* keep at least 25 ZIOs in queue * compression factor about 2 = average 50 */
uint64_t buffer = size * (max_queue_depth / 4);

if (vd_queued_size_write >= buffer) {
	vd_queued_size_write -= buffer;
} else {
	vd_queued_size_write = 0;
}

If the compression factor is >4, there would have to be more than 100 ZIOs in the queue; the zio_dva_throttle then becomes active and we would stick to lz4.
I wanted to point out this possible interference.
Reducing the queue buffer increases compression but also increases the risk of the queue running empty if there is high variance in the compression duration.

An alternative approach is to monitor vqc_active and count the number of ZIOs inside the queue, but this is riskier because ZIO sizes before compression and disk block sizes vary.
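
For comparison, here is a rough sketch of that alternative, assuming the vdev queue structures as they looked around the time of this PR; the helper name and the threshold parameter are made up.

/*
 * Rough sketch of the vqc_active alternative, not code from the patch.
 * The helper name and threshold are invented; vq_class[].vqc_active is
 * the per-priority count of active ZIOs in the vdev queue as the
 * structures looked around the time of this PR.
 */
static boolean_t
async_write_queue_busy(vdev_t *vd, uint32_t min_active)
{
	vdev_queue_t *vq = &vd->vdev_queue;

	return (vq->vq_class[ZIO_PRIORITY_ASYNC_WRITE].vqc_active >=
	    min_active);
}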

jumbi77 (Contributor) commented Jun 24, 2017

Is that a feature similar to smart compression (https://reviews.csiden.org/r/266/)? By the way, does anyone know why that was discarded?

n1kl (Author) commented Jun 26, 2017

@jumbi77 From the description of smart compression, it turns compression off for a while when data is classified as incompressible, to save computational resources. That is a further improvement that could also be added to this feature in the future.
At the moment the main focus is on automatically choosing a faster compression algorithm for faster storage devices, lower CPU power, and highly compressible data, and a slower algorithm for slower storage devices, higher CPU power, and less compressible data, to achieve the best possible throughput.

@RubenKelevra

Thanks for taking the time and effort to implement this; I bet this will majorly improve the usability of compression on desktop computers.

Did you test whether this improves system load when saving files while the system is busy? That is one of the major blockers for compression, IMHO.

Does it support more than gzip-1 at the moment? gzip-5 or even gzip-7 often gives much better compression on things like text files or software libraries.

These questions are just out of curiosity and should not block any attempt to merge this once the rebase is completed.

RubenKelevra commented Apr 8, 2018

So, I dug into your code to answer the question. We have support for gzip compression offloading, so there might be machines out there that can handle much faster gzip compression than before, and gzip-1 might not fully utilize the cards. Is it possible and reasonable to add more than one gzip compression level?

What do you think about renaming this feature? I feel "auto compression" might need more explanation than, for example, "adaptive compression".

What do you (all) think about changing the default for new pools to enable this feature? It should increase performance without hurting CPU load much.

What is your opinion on this, @behlendorf?

behlendorf added the Status: Inactive label Sep 25, 2018
ahrens added the Status: Revision Needed label Sep 27, 2018
ahrens (Member) commented Jan 8, 2019

superseded by #7560
