
Shrink vec after block compression #98

Merged · 1 commit merged into PSeitz:main on Apr 4, 2023

Conversation

@CosmicHorrorDev (Contributor) commented Apr 4, 2023

I was a bit surprised that my reported memory usage was higher after compressing a bunch of highly compressible in-memory data, not lower. It took a bit of digging to realize that the lz4_flex::compress family of functions allocates a buffer sized for the maximum possible output.

Demo:

const MIB: f32 = 1_024.0 * 1_024.0;

fn main() {
    let input = vec![0; 100 * 1_024 * 1_024];
    let compressed = lz4_flex::compress(&input);

    // Prints: len 0.39 MiB - cap 110.00 MiB
    // 😱
    println!(
        "len {:.2} MiB - cap {:.2} MiB",
        compressed.len() as f32 / MIB,
        compressed.capacity() as f32 / MIB
    );
}

This change just shrinks the vec before returning it.
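
For anyone curious, the idea boils down to releasing the unused worst-case capacity before the vec is handed back. A minimal sketch of that idea (not the exact diff; it uses the block module's public compress_into and get_maximum_output_size helpers to stand in for the internals):

use lz4_flex::block::{compress_into, get_maximum_output_size};

fn compress_shrunk(input: &[u8]) -> Vec<u8> {
    // Allocate for the worst case, like the block compressor does today.
    let mut compressed = vec![0u8; get_maximum_output_size(input.len())];
    let len = compress_into(input, &mut compressed).expect("buffer sized for the worst case");
    compressed.truncate(len);
    // The gist of this PR: hand the unused tail back to the allocator.
    compressed.shrink_to_fit();
    compressed
}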

It's a bit unfortunate that compressing allocates such a large buffer upfront. I get that on Linux (and macOS, I think) this won't really consume all the extra memory, since zeroed pages aren't committed until they're touched. On Windows, though, I believe everything gets committed upfront (although I haven't tested that). It's not a big deal for me anymore, since I've switched to the LZ4 frame format instead (which works great!)
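
For reference, the switch looked roughly like this. A minimal sketch assuming the default frame feature is enabled (the sizes here just mirror the demo above):

use std::io::Write;
use lz4_flex::frame::FrameEncoder;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let input = vec![0u8; 100 * 1_024 * 1_024];

    // The frame format streams the input through fixed-size blocks, so the
    // encoder never needs a worst-case allocation for the whole payload.
    let mut encoder = FrameEncoder::new(Vec::new());
    encoder.write_all(&input)?;
    let compressed = encoder.finish()?;

    println!("compressed to {} bytes", compressed.len());
    Ok(())
}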

@codecov (bot) commented Apr 4, 2023

Codecov Report

Merging #98 (8a2bc71) into main (be2ea76) will not change coverage.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main      #98   +/-   ##
=======================================
  Coverage   89.39%   89.39%           
=======================================
  Files          11       11           
  Lines        2179     2179           
=======================================
  Hits         1948     1948           
  Misses        231      231           
Impacted Files          Coverage Δ
src/block/compress.rs   98.11% <100.00%> (ø)


@PSeitz (Owner) commented Apr 4, 2023

The block format is not really the right choice for data that large; maybe that should be clearer in the docs.
The global compress function should be removed, or replaced with one that uses the frame format.

The upfront allocation is done for performance reasons; without it, compression would be significantly slower.
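
For callers who do want the block format without paying a fresh worst-case allocation on every call, the block module already exposes the pieces to reuse one buffer. A sketch of that pattern (a suggestion, not an official recommendation):

use lz4_flex::block::{compress_into, get_maximum_output_size};

fn compress_many(chunks: &[Vec<u8>]) -> Vec<Vec<u8>> {
    // One scratch buffer, resized in place instead of a fresh
    // worst-case allocation per call.
    let mut scratch = Vec::new();
    chunks
        .iter()
        .map(|chunk| {
            scratch.resize(get_maximum_output_size(chunk.len()), 0);
            let len = compress_into(chunk, &mut scratch).expect("scratch sized for the worst case");
            // Copy out just the compressed bytes; the scratch keeps its capacity.
            scratch[..len].to_vec()
        })
        .collect()
}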

@PSeitz merged commit 29914d4 into PSeitz:main Apr 4, 2023
@CosmicHorrorDev (Contributor, Author) commented:

> The block format is not really the right choice for data that large; maybe that should be clearer in the docs.

Ahhh, that would explain my confusion 😅

I was using it for in-memory data that would regularly get into the tens of MiBs, and I didn't notice the problem until I checked a memory profile.

> The global compress function should be removed, or replaced with one that uses the frame format.

This was part of my confusion as well. I was using them because they were more convenient, and I figured the block format was the preferred one since it's what is exported from the crate root.


Thanks for all the feedback! I've got some more API considerations, but I'll open a new issue tomorrow since I'm off to bed
