
Compression of larger files takes really long #5

Open
Krensi opened this issue Sep 18, 2023 · 3 comments
Comments

@Krensi

Krensi commented Sep 18, 2023

Hi!

I have issues compressing larger files (400kB and upwards).
Again, I compared the results with the C library, and there even a ~800kB file compresses just fine. Using the test files below (some e-book files), I can observe that compression time increases sharply with file size. Can you observe this behavior too?

Thank you again and greetings,
Christian

text_file_1_medium.txt
text_file_2.txt
text_file_1.txt

@snakehand
Owner

The implementation is really minimal, and quite stupid when it comes to dictionary searching. It basically does a linear search in the window, so as the file and window size grow, it is expected that compression times will increase. Speeding it up would be easy with dictionaries or other data structures that use dynamic memory allocation, but that is not an option in a no_std crate. I can, however, try to think of an option to speed up compression if the user provides a buffer that can be used for holding lookup-related data. The initial use case was to compress small messages on a radio link, not several hundred kB of text :-)

@Krensi
Author

Krensi commented Sep 20, 2023

Ah, I see. The C library requires an internal buffer whose size is related to the window size (see the snippet below).

typedef struct {
    uint16_t input_size;        /* bytes in input buffer */
    uint16_t input_index;       /* offset to next unprocessed input byte */
    uint16_t output_count;      /* how many bytes to output */
    uint16_t output_index;      /* index for bytes to output */
    uint16_t head_index;        /* head of window buffer */
    uint8_t state;              /* current state machine node */
    uint8_t current_byte;       /* current byte of input */
    uint8_t bit_index;          /* current bit index */

#if HEATSHRINK_DYNAMIC_ALLOC
    /* Fields that are only used if dynamically allocated. */
    uint8_t window_sz2;         /* window buffer bits */
    uint8_t lookahead_sz2;      /* lookahead bits */
    uint16_t input_buffer_size; /* input buffer size */

    /* Input buffer, then expansion window buffer */
    uint8_t buffers[];
#else
    /* Input buffer, then expansion window buffer */
    uint8_t buffers[(1 << HEATSHRINK_DECODER_WINDOW_BITS(_))
        + HEATSHRINK_DECODER_INPUT_BUFFER_SIZE(_)];
#endif
} heatshrink_decoder;

So maybe this internal buffer is what enables the higher processing speed.
Maybe I can come up with a pull request that adds support for an optional buffer or something similar. That way I could finally remove the dependency on the heatshrink C library from my Rust code.
Unfortunately, I need to process files of up to 500kB, because the original size of the firmware to be compressed is in this range.

Thanks for your reply and explanation :-)

@jcdubois
Contributor

jcdubois commented Oct 15, 2023

I believe the C implementation provides an optional search_index that speeds up compression quite a bit (it does not apply to the decompression side).

See https://spin.atomicobject.com/2014/01/13/lightweight-indexing-for-embedded-systems/ for details.

This is enabled with HEATSHRINK_USE_INDEX in the C version.

I guess you could try disabling the search index in the C version (by unsetting HEATSHRINK_USE_INDEX in the heatshrink config file) and comparing its performance with that of the Rust version. The two should then be similar, since the Rust version is missing the search index feature.

3 participants