Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

seekable_format no zstd header compressing empty string to stream #3058

Closed
wants to merge 1 commit into from
Closed

seekable_format no zstd header compressing empty string to stream #3058

wants to merge 1 commit into from

Conversation

yhoogstrate
Copy link
Contributor

Dear zstd maintainers,

I believe I ran into a small bug in zstd_seekable. When compressing an empty string, the output of the compressed data does not include the zstd seekable header:

// Example 1, compressing "test"

char *data_in = (char *) malloc_or_die(255 * sizeof(char));
data_in = "test\0";

char *data_out = (char *) malloc_or_die(255 * 255 * sizeof(char));

ZSTD_seekable_CStream *s = ZSTD_seekable_createCStream();
size_t const initResult = ZSTD_seekable_initCStream(s, 1, 1, 1024 * 1024);

ZSTD_inBuffer input = { data_in, strlen(data_in), 0 };
ZSTD_outBuffer output = { data_out, 255*255, 0 };

size_t toRead = ZSTD_seekable_compressStream(s, &output, &input);
size_t const remainingToFlush = ZSTD_seekable_endStream(s, &output);

printf("size compressed output: %i\n",output.pos);

ZSTD_seekable_freeCStream(s);
// Example 2, compressing "" (empty string)

char *data_in = (char *) malloc_or_die(255 * sizeof(char));
data_in = "\0";

char *data_out = (char *) malloc_or_die(255 * 255 * sizeof(char));

ZSTD_seekable_CStream *s = ZSTD_seekable_createCStream();
size_t const initResult = ZSTD_seekable_initCStream(s, 1, 1, 1024 * 1024);

ZSTD_inBuffer input = { data_in, strlen(data_in), 0 };
ZSTD_outBuffer output = { data_out, 255*255, 0 };

size_t toRead = ZSTD_seekable_compressStream(s, &output, &input);
size_t const remainingToFlush = ZSTD_seekable_endStream(s, &output);

printf("size compressed output: %i\n",output.pos);

ZSTD_seekable_freeCStream(s);

results in the following two compressed strings:

test 1:

00000000: 28b5 2ffd 0048 2100 0074 6573 745e 2a4d  (./..H!..test^*M
00000010: 1815 0000 000d 0000 0004 0000 0039 8167  .............9.g
00000020: db01 0000 0080 b1ea 928f                 ..........
test 2:

00000000: 5e2a 4d18 0900 0000 0000 0000 80b1 ea92  ^*M.............
00000010: 8f                                       .

In case an empty string is compressed (test 2), the compressed output does not start with the ZSTD MAGIC but with a skippable frame (5e2a 4d18) directly.

I think I have found the issue and resolved it in this PR. After the patch, it outputs the following:

00000000: 28b5 2ffd 2000 0100 005e 2a4d 1815 0000  (./. ....^*M....
00000010: 0009 0000 0000 0000 0099 e9d8 5101 0000  ............Q...
00000020: 0080 b1ea 928f                           ......

The PR also contains a test case testing for the ZSTD MAGIC in the compressed output of an empty string.

all best,

Youri

@yhoogstrate
Copy link
Contributor Author

Hi @iburinoc @Cyan4973,

From looking into the git history, I think you are the most technically grounded committers regarding this part of the codebase. Would you like to take a look at it? I believe it's a rather simple commit, but the patch would be really helpful in proceeding with our bioinformatics DNA sequence compressor.

kind regards,

Youri

@daniellerozenblit
Copy link
Contributor

Closed and re-opened the PR in order to trigger the CI tests to run again.

@yhoogstrate
Copy link
Contributor Author

Closing this in favor of a superseding PR by @daniellerozenblit

@Cyan4973 Cyan4973 mentioned this pull request Feb 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants