
Shrink vec after block compression #98

Merged · 1 commit merged into PSeitz:main on Apr 4, 2023

Conversation

@CosmicHorrorDev (Contributor) commented Apr 4, 2023

I was a bit surprised that my reported memory usage was higher after compressing a bunch of highly compressible in-memory data, not lower. It took a bit of digging to realize that the lz4_flex::compress family of functions allocates a buffer sized for the maximum possible output.

Demo:

const MIB: f32 = 1_024.0 * 1_024.0;

fn main() {
    let input = vec![0; 100 * 1_024 * 1_024];
    let compressed = lz4_flex::compress(&input);

    // Prints: len 0.39 MiB - cap 110.00 MiB
    // 😱
    println!(
        "len {:.2} MiB - cap {:.2} MiB",
        compressed.len() as f32 / MIB,
        compressed.capacity() as f32 / MIB
    );
}

This change just shrinks the vec before returning it.
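
For anyone curious, the idea boils down to releasing the unused worst-case capacity before the vec is handed back. A minimal sketch of that idea (not the exact diff; it uses the block module's public compress_into and get_maximum_output_size helpers to stand in for the internals):

use lz4_flex::block::{compress_into, get_maximum_output_size};

fn compress_shrunk(input: &[u8]) -> Vec<u8> {
    // Allocate for the worst case, like the block compressor does today.
    let mut compressed = vec![0u8; get_maximum_output_size(input.len())];
    let len = compress_into(input, &mut compressed).expect("buffer sized for the worst case");
    compressed.truncate(len);
    // The gist of this PR: hand the unused tail back to the allocator.
    compressed.shrink_to_fit();
    compressed
}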

It's a bit unfortunate that compressing allocates such a large buffer upfront. I get that on Linux (and macOS, I think) this won't really consume all the extra memory, since zeroed pages aren't committed until they're touched. On Windows, though, I believe everything gets committed upfront (although I haven't tested that). It's not a big deal for me anymore, since I've switched to the LZ4 frame format instead (which works great!)
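
For reference, the switch looked roughly like this. A minimal sketch assuming the default frame feature is enabled (the sizes here just mirror the demo above):

use std::io::Write;
use lz4_flex::frame::FrameEncoder;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let input = vec![0u8; 100 * 1_024 * 1_024];

    // The frame format streams the input through fixed-size blocks, so the
    // encoder never needs a worst-case allocation for the whole payload.
    let mut encoder = FrameEncoder::new(Vec::new());
    encoder.write_all(&input)?;
    let compressed = encoder.finish()?;

    println!("compressed to {} bytes", compressed.len());
    Ok(())
}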

@codecov (bot) commented Apr 4, 2023

Codecov Report

Merging #98 (8a2bc71) into main (be2ea76) will not change coverage.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main      #98   +/-   ##
=======================================
  Coverage   89.39%   89.39%           
=======================================
  Files          11       11           
  Lines        2179     2179           
=======================================
  Hits         1948     1948           
  Misses        231      231           
Impacted Files          Coverage Δ
src/block/compress.rs   98.11% <100.00%> (ø)


@PSeitz (Owner) commented Apr 4, 2023

The block format is not really the right choice for data that large; maybe that should be clearer in the docs.
The global compress function should be removed, or replaced with one that uses the frame format.

The upfront allocation is done for performance reasons; without it, compression would be significantly slower.
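
For callers who do want the block format without paying a fresh worst-case allocation on every call, the block module already exposes the pieces to reuse one buffer. A sketch of that pattern (a suggestion, not an official recommendation):

use lz4_flex::block::{compress_into, get_maximum_output_size};

fn compress_many(chunks: &[Vec<u8>]) -> Vec<Vec<u8>> {
    // One scratch buffer, resized in place instead of a fresh
    // worst-case allocation per call.
    let mut scratch = Vec::new();
    chunks
        .iter()
        .map(|chunk| {
            scratch.resize(get_maximum_output_size(chunk.len()), 0);
            let len = compress_into(chunk, &mut scratch).expect("scratch sized for the worst case");
            // Copy out just the compressed bytes; the scratch keeps its capacity.
            scratch[..len].to_vec()
        })
        .collect()
}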

@PSeitz merged commit 29914d4 into PSeitz:main Apr 4, 2023
@CosmicHorrorDev (Contributor, Author) commented:

> The block format is not really the right choice for data that large; maybe that should be clearer in the docs.

Ahhh, that would explain my confusion 😅

I was using it for in-memory data that would regularly get into the tens of MiBs, and I didn't notice the problem until I checked a memory profile.

> The global compress function should be removed, or replaced with one that uses the frame format.

This was part of my confusion as well. I was using them because they were more convenient, and I figured the block format was the preferred one since it's what is exported from the crate root.


Thanks for all the feedback! I've got some more API considerations, but I'll open a new issue tomorrow since I'm off to bed
