Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Not reproducible compression results with LZ4 #1939

Closed
TheLevti opened this issue Feb 27, 2021 · 15 comments · Fixed by laravel/framework#36412
Closed

Not reproducible compression results with LZ4 #1939

TheLevti opened this issue Feb 27, 2021 · 15 comments · Fixed by laravel/framework#36412
Assignees

Comments

@TheLevti
Copy link

TheLevti commented Feb 27, 2021

I just tested the LZ4 and ZSTD compression options against the REDLOCK algorithm with the laravel cache lock library and it seems like I have the same problem as in #1477. ZSTD worked out of the box with different compression levels.

For LZ4 I am not able to get the same result by calling lz4_compress($string, $compressionLevel) with any compression level. Most of the time the difference is the first byte, which seems to be off.

With compression level 0 (default lz4), I get the following result:

1614439016.383115 [0 10.150.2.1:56450] "SET" "laravel_database_laravel_cache:foo" "\xef\x10\x00\x00\x00\xf0\x011r2fM79GOE1goTTj" "EX" "10" "NX"
1614439016.383433 [0 10.150.2.1:56450] "EVAL" "if redis.call(\"get\",KEYS[1]) == ARGV[1] then\n    return redis.call(\"del\",KEYS[1])\nelse\n    return 0\nend" "1" "laravel_database_laravel_cache:foo" "\x10\x00\x00\x00\xf0\x011r2fM79GOE1goTTj"
1614439016.383451 [0 lua] "get" "laravel_database_laravel_cache:foo"

As you can see the first SET and the EVAL ARGV[1] should be the same, they are but just the first view bytes are different. For the redlock algorithm the user needs to serialize/compress the arguments himself for the EVAL to work, because phpredis is not doing that for the user. For LZF, ZSTD it works fine. As a result the key is never deleted and thus the lock never released until it times out.

Expected behaviour

1614439016.383115 [0 10.150.2.1:56450] "SET" "laravel_database_laravel_cache:foo" "\xef\x10\x00\x00\x00\xf0\x011r2fM79GOE1goTTj" "EX" "10" "NX"

and

1614439016.383433 [0 10.150.2.1:56450] "EVAL" "if redis.call(\"get\",KEYS[1]) == ARGV[1] then\n    return redis.call(\"del\",KEYS[1])\nelse\n    return 0\nend" "1" "laravel_database_laravel_cache:foo" "\x10\x00\x00\x00\xf0\x011r2fM79GOE1goTTj"

should have the same value.

Actual behaviour

phpredis compresses the string differently than the php extension lz4 by calling lz4_compress. (https://github.com/kjdev/php-ext-lz4).

I'm seeing this behaviour on

  • OS: Ubuntu 20.04
  • Redis: 5.0.9
  • PHP: 8.0.2
  • phpredis: 5.3.3

Steps to reproduce, backtrace or example script

This is the EVAL script that is running to remove the locked key. See #1477 for examples.

if redis.call("get",KEYS[1]) == ARGV[1] then
    return redis.call("del",KEYS[1])
else
    return 0
end
``

### I've checked
- [x] There is no similar issue from other users
- [x] Issue isn't fixed in `develop` branch
@TheLevti TheLevti changed the title Not reproducible compression results with LZ4 and ZSTD Not reproducible compression results with LZ4 Feb 27, 2021
@yatsukhnenko
Copy link
Member

phpredis uses LZ4_compress_HC for compression_level greater than zero and less than REDIS_LZ4_MAX_CLEVEL and LZ4_compress_default for all other levels. Also REDIS_LZ4_MAX_CLEVEL depends on version of liblz4
php_lz4_compress uses LZ4_compress_default for compression level 0 and LZ4_compress_HC for levels from 0 to PHP_LZ4_CLEVEL_MAX inclusively.
So the difference should be only when max clevel is used.
@TheLevti could you test different compression levels?

@TheLevti
Copy link
Author

phpredis uses LZ4_compress_HC for compression_level greater than zero and less than REDIS_LZ4_MAX_CLEVEL and LZ4_compress_default for all other levels. Also REDIS_LZ4_MAX_CLEVEL depends on version of liblz4
php_lz4_compress uses LZ4_compress_default for compression level 0 and LZ4_compress_HC for levels from 0 to PHP_LZ4_CLEVEL_MAX inclusively.
So the difference should be only when max clevel is used.
@TheLevti could you test different compression levels?

I tested several compression levels with lz4. Let me post the result here, maybe it will help you. So far I was not able to find a level that produces the same value as the phpredis extension.

@TheLevti
Copy link
Author

phpredis uses LZ4_compress_HC for compression_level greater than zero and less than REDIS_LZ4_MAX_CLEVEL and LZ4_compress_default for all other levels. Also REDIS_LZ4_MAX_CLEVEL depends on version of liblz4
php_lz4_compress uses LZ4_compress_default for compression level 0 and LZ4_compress_HC for levels from 0 to PHP_LZ4_CLEVEL_MAX inclusively.
So the difference should be only when max clevel is used.
@TheLevti could you test different compression levels?

So basically this is my test using laravel's redis cache locking mechanism:

$client->setOption(Redis::OPT_SERIALIZER, Redis::SERIALIZER_NONE);
$client->setOption(Redis::OPT_COMPRESSION, Redis::COMPRESSION_LZ4);
$client->setOption(Redis::OPT_COMPRESSION_LEVEL, 0);

$lock = $store->lock('foo', 10);
$this->assertTrue($lock->get());
// Do something while holding the lock...
$lock->release();
$this->assertNull($store->lockConnection()->get($store->getPrefix().'foo')); // Make sure that the key is gone after our lock was released.

Test result:

image

Redis monitor result:

1614450355.729803 [0 10.150.2.1:58410] "SET" "laravel_database_laravel_cache:foo" "\xef\x10\x00\x00\x00\xf0\x01qN1EMnzkvtPonkJi" "EX" "10" "NX"
1614450355.730312 [0 10.150.2.1:58410] "EVAL" "if redis.call(\"get\",KEYS[1]) == ARGV[1] then\n    return redis.call(\"del\",KEYS[1])\nelse\n    return 0\nend" "1" "laravel_database_laravel_cache:foo" "\x10\x00\x00\x00\xf0\x01qN1EMnzkvtPonkJi"
1614450355.730342 [0 lua] "get" "laravel_database_laravel_cache:foo"
1614450355.730460 [0 10.150.2.1:58410] "GET" "laravel_database_laravel_cache:foo"

As you can see there is no lua execution that calls del as for example with LZF it looks like this:

1614450513.021563 [0 10.150.2.1:58432] "SET" "laravel_database_laravel_cache:foo" "\x0f26jsySpY1HOFwAqe" "EX" "10" "NX"
1614450513.021865 [0 10.150.2.1:58432] "EVAL" "if redis.call(\"get\",KEYS[1]) == ARGV[1] then\n    return redis.call(\"del\",KEYS[1])\nelse\n    return 0\nend" "1" "laravel_database_laravel_cache:foo" "\x0f26jsySpY1HOFwAqe"
1614450513.021885 [0 lua] "get" "laravel_database_laravel_cache:foo"
1614450513.021891 [0 lua] "del" "laravel_database_laravel_cache:foo"
1614450513.021935 [0 10.150.2.1:58432] "GET" "laravel_database_laravel_cache:foo"

@TheLevti
Copy link
Author

Level 1:

1614450641.541730 [0 10.150.2.1:58512] "SET" "laravel_database_laravel_cache:foo" "\xef\x10\x00\x00\x00\xf0\x01uSdBIUgLxv1sIg57" "EX" "10" "NX"
1614450641.542022 [0 10.150.2.1:58512] "EVAL" "if redis.call(\"get\",KEYS[1]) == ARGV[1] then\n    return redis.call(\"del\",KEYS[1])\nelse\n    return 0\nend" "1" "laravel_database_laravel_cache:foo" "\x10\x00\x00\x00\xf0\x01uSdBIUgLxv1sIg57"
1614450641.542040 [0 lua] "get" "laravel_database_laravel_cache:foo"
1614450641.542099 [0 10.150.2.1:58512] "GET" "laravel_database_laravel_cache:foo"

Level 3:

1614450691.033702 [0 10.150.2.1:58518] "SET" "laravel_database_laravel_cache:foo" "\xef\x10\x00\x00\x00\xf0\x01hmYI3UBmjpd5lr0b" "EX" "10" "NX"
1614450691.033941 [0 10.150.2.1:58518] "EVAL" "if redis.call(\"get\",KEYS[1]) == ARGV[1] then\n    return redis.call(\"del\",KEYS[1])\nelse\n    return 0\nend" "1" "laravel_database_laravel_cache:foo" "\x10\x00\x00\x00\xf0\x01hmYI3UBmjpd5lr0b"
1614450691.033959 [0 lua] "get" "laravel_database_laravel_cache:foo"
1614450691.034053 [0 10.150.2.1:58518] "GET" "laravel_database_laravel_cache:foo"

Level 12:

1614450731.977182 [0 10.150.2.1:58528] "SET" "laravel_database_laravel_cache:foo" "\xef\x10\x00\x00\x00\xf0\x01M2J1qHUB2Zh5XkQr" "EX" "10" "NX"
1614450731.977468 [0 10.150.2.1:58528] "EVAL" "if redis.call(\"get\",KEYS[1]) == ARGV[1] then\n    return redis.call(\"del\",KEYS[1])\nelse\n    return 0\nend" "1" "laravel_database_laravel_cache:foo" "\x10\x00\x00\x00\xf0\x01M2J1qHUB2Zh5XkQr"
1614450731.977488 [0 lua] "get" "laravel_database_laravel_cache:foo"
1614450731.977553 [0 10.150.2.1:58528] "GET" "laravel_database_laravel_cache:foo"

Interestingly I get an exception when using lvl 20 for example:
image

I saw in the C code that it should overwrite the level to lvl 12 and still execute. I see no error when I use 0 though.

@TheLevti
Copy link
Author

TheLevti commented Feb 28, 2021

This is the code that tries to mimic the phpredis extension's serialization/compression for the eval call. For serialization its straight forward by just calling the extension's exposed utility method. For compression its a bit cumbersome. But it works with LZF and ZSTD. We had a similar issue with LZF and it was fixed (#1477) so phpredis and lzf both compress in the same way. It would be nice to achieve the same result with the lz4 implementation.

/**
 * Release the lock.
 *
 * @return bool
 */
public function release()
{
    /* If a serialization mode such as "php" or "igbinary" and/or a
     * compression mode such as "lzf" or "zstd" is enabled, the owner
     * must be serialized and/or compressed by us, because phpredis does
     * not do this for the eval command.
     *
     * Name must not be modified!
     */
    $owner = $this->redis->client()->_serialize($this->owner);

    /* Once the phpredis extension exposes a compress function like the
     * above `_serialize()` function, we should switch to it to guarantee
     * consistency in the way the extension serializes and compresses to
     * avoid the need to check each compression option ourselves.
     *
     * @see https://github.com/phpredis/phpredis/issues/1938
     */
    if ($this->compressed()) {
        if ($this->lzfCompressed()) {
            $owner = \lzf_compress($owner);
        } elseif ($this->zstdCompressed()) {
            $owner = \zstd_compress(
                $owner,
                $this->redis->client()->getOption(Redis::OPT_COMPRESSION_LEVEL)
            );
        } elseif ($this->lz4Compressed()) {
            $owner = \lz4_compress(
                $owner,
                $this->redis->client()->getOption(Redis::OPT_COMPRESSION_LEVEL)
            );
        } else {
            throw new UnexpectedValueException(sprintf(
                'Unknown phpredis compression in use (%d). Unable to release lock.',
                $this->redis->client()->getOption(Redis::OPT_COMPRESSION)
            ));
        }
    }

    return (bool) $this->redis->eval(LuaScripts::releaseLock(), 1, $this->name, $owner);
}

@TheLevti
Copy link
Author

TheLevti commented Mar 14, 2021

Another example I noticed while adding compression/serialization support to laravel phpredis integration:

image

For some reason there is always some random letter appended when using lz4_compress, which is not gone when reading with phpredis with lz4 enabled. Only the first hex characters are stripped away. The dump seen here in the screenshot is from calling lz4 on the argument list when passing down to eval(). Currently it is impossible to use LZ4 together with any eval call, because phpredis or lz4 produce inconsistent result. It would be really nice if both extensions work the same. Again, we had the same issue with LZF and this got fixed. LZF and ZSTD work perfectly fine although there is some inconsistency with ZSTD. I will create another ticket for it, but its usable at least.

@TheLevti
Copy link
Author

Anyone got an idea how to resolve this? This is currently a blocker to add compression support to the laravel framework. At least if someone could give me a hint where to look, I would try myself.

@michael-grunder
Copy link
Member

Hi @TheLevti,

The binary mismatch in LZ4 data is because I attach a crc8 + length header as a quick sanity check. I honestly can't remember why I wrote it that way, but unfortunately, we can't just change it because it would break backward compatibility.

I'm happy to help get this working for you though.

One solution would be to add something like OPT_COMPRESSION_UNCHECKED which would by default be disabled. Another possibility would be to create _compress and _uncompress helper methods that you could call.

@TheLevti
Copy link
Author

TheLevti commented Mar 29, 2021

Hi @TheLevti,

The binary mismatch in LZ4 data is because I attach a crc8 + length header as a quick sanity check. I honestly can't remember why I wrote it that way, but unfortunately, we can't just change it because it would break backward compatibility.

I'm happy to help get this working for you though.

One solution would be to add something like OPT_COMPRESSION_UNCHECKED which would by default be disabled. Another possibility would be to create _compress and _uncompress helper methods that you could call.

The last option would be a perfect solution. Or both. For the utility function I even have opened an issue already: #1938

And thank you for the hint with crc. Maybe i can just do 1:1 on php side to achieve the same result. But yes if a helper function can be added, it would solve all our problems ^^

@michael-grunder michael-grunder self-assigned this Mar 31, 2021
michael-grunder added a commit that referenced this issue Apr 21, 2021
This commit splits compression and serialization into two distinct parts
and adds some utility functions so the user can compress/uncompress
or pack/unpack data explicily.

See #1939
@michael-grunder
Copy link
Member

michael-grunder commented Apr 21, 2021

It's still a WIP but you can check out the changes pushed above.

Not sure what the final API should look like but I've added _compress, _uncompress, _pack, and _unpack helper methods that can be tested.

I'm going to see what it would take to do the UNCHECKED option.

_pack and _unpack might be overkill since one could replicate the functionality with _compress and _serialize.

Edit: Turns out there was a reason I used a checksum and a header. It's because there's no way to distinguish between insufficient space in the buffer and malformed data with LZ4 (related LZ4 issue)

It may be possible to accomplish unchecked decompression by growing the buffer up to some reasonable multiple of the compressed length when hitting an error.

@TheLevti
Copy link
Author

It's still a WIP but you can check out the changes pushed above.

Not sure what the final API should look like but I've added _compress, _uncompress, _pack, and _unpack helper methods that can be tested.

I'm going to see what it would take to do the UNCHECKED option.

_pack and _unpack might be overkill since one could replicate the functionality with _compress and _serialize.

Edit: Turns out there was a reason I used a checksum and a header. It's because there's no way to distinguish between insufficient space in the buffer and malformed data with LZ4 (related LZ4 issue)

It may be possible to accomplish unchecked decompression by growing the buffer up to some reasonable multiple of the compressed length when hitting an error.

Thank you very much, I will try out these the coming days and hope can give some helpful feedback. If we have the pack/unpack helpers or even separated by compress/serialize, it in the end really does not matter how the extension modifies the values and whether it matches the other php extensions. Its just important to be able to reproduce the same results as phpredis to be able to pass correct values to the eval call.

Pack/unpack would be nice, because then we would not need to care about the order in which it needs to be modified. But its not that important if you say its too much work. I am already super happy to be able to have a similar compression helper as we currently have with _serialize.

@TheLevti
Copy link
Author

TheLevti commented May 2, 2021

It's still a WIP but you can check out the changes pushed above.

Not sure what the final API should look like but I've added _compress, _uncompress, _pack, and _unpack helper methods that can be tested.

I'm going to see what it would take to do the UNCHECKED option.

_pack and _unpack might be overkill since one could replicate the functionality with _compress and _serialize.

Edit: Turns out there was a reason I used a checksum and a header. It's because there's no way to distinguish between insufficient space in the buffer and malformed data with LZ4 (related LZ4 issue)

It may be possible to accomplish unchecked decompression by growing the buffer up to some reasonable multiple of the compressed length when hitting an error.

Ok I finally found the time to fork the repo and build your branch. All my tests for the laravel pull request that were waiting for the convenience functions are passing now. I had troubles with LZ4 and ZSTD and with the _pack function all problems are gone and tests pass now.

Code with manual packing logic + not working LZ4:

image

Code after your addition:

image

Finally uncommented tests:

image

Test executions without LZ4 and with this big logic block to properly manually pack (last execution is with LZ4 and a single line call):

image

Also the redlock algorithm is now working with lz4 enabled.

@TheLevti
Copy link
Author

Can we create a pull request out of your branch? For me the utility functions provided there solve all problems.

@michael-grunder
Copy link
Member

I'll create a PR this weekend, @yatsukhnenko can find all my bugs, and then we'll get it merged.

michael-grunder added a commit that referenced this issue Jun 22, 2021
This commit splits compression and serialization into two distinct parts
and adds some utility functions so the user can compress/uncompress
or pack/unpack data explicily.

See #1939
michael-grunder added a commit that referenced this issue Oct 5, 2021
This commit splits compression and serialization into two distinct parts
and adds some utility functions so the user can compress/uncompress
or pack/unpack data explicily.

See #1939
@yatsukhnenko
Copy link
Member

Resolved

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants