Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add zstd long range option #17184

Merged
merged 3 commits into from
Jan 13, 2021
Merged

Conversation

ygrek
Copy link
Contributor

@ygrek ygrek commented Nov 18, 2020

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Changelog category (leave one):

  • New feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

add support for zstd long option for better compression of string columns to save space

Detailed description / Documentation draft:

ZSTD compression codec now takes second optional argument - window size (expressed as power of 2). If this argument is missing or specified as zero - zstd will use default window size.

@@ -22,7 +22,7 @@ struct MergeTreeWriterSettings
MergeTreeWriterSettings(const Settings & global_settings, bool can_use_adaptive_granularity_,
size_t aio_threshold_, bool blocks_are_granules_size_ = false)
: min_compress_block_size(global_settings.min_compress_block_size)
, max_compress_block_size(global_settings.min_compress_block_size)
, max_compress_block_size(global_settings.max_compress_block_size)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -33,7 +33,7 @@ class IColumn;

#define COMMON_SETTINGS(M) \
M(UInt64, min_compress_block_size, 65536, "The actual size of the block to compress, if the uncompressed data less than max_compress_block_size is no less than this value and no less than the volume of data for one mark.", 0) \
M(UInt64, max_compress_block_size, 1048576, "The maximum size of blocks of uncompressed data before compressing for writing to a table.", 0) \
M(UInt64, max_compress_block_size, 8048576, "The maximum size of blocks of uncompressed data before compressing for writing to a table.", 0) \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not change it by default, because it will slow down point queries.

@ygrek
Copy link
Contributor Author

ygrek commented Dec 8, 2020

currently failing fast tests are
01200_mutations_memory_consumption (which also fails on master) with

01200_mutations_memory_consumption:                                     [ FAIL ] - return code 60
[devbox1] 2020.12.08 23:40:04.368173 [ 117234 ] {8c8e0ab5-dc07-49bf-9f44-095afdc0bfc0} <Error> executeQuery: Code: 60, e.displayText() = DB::Exception: Table system.part_log doesn't exist. (version 20.12.1.1) (from 127.0.0.1:40216) (in query: SELECT DISTINCT read_bytes >= peak_memory_usage FROM system.part_log WHERE event_type = 'MutatePart' AND table = 'table_with_single_pk' AND database = currentDatabase(); ), Stack trace (when copying this message, always include the lines below):

and
01593_concurrent_alter_mutations_kill_many_replicas which I cannot reproduce :

$ ./tests/clickhouse-test 01593_concurrent_alter_mutations_kill_many_replicas
Using queries from '/home/user/clickhouse/tests/queries' directory
Connecting to ClickHouse server... OK
Won't run stateful tests because test data wasn't loaded.
No tests were run.

@ygrek ygrek force-pushed the add_zstd_long_range branch 2 times, most recently from 68390c6 to 8dc7d12 Compare December 11, 2020 23:24
@ygrek ygrek marked this pull request as ready for review December 11, 2020 23:25
@ygrek ygrek force-pushed the add_zstd_long_range branch from 8dc7d12 to 5d8869a Compare December 11, 2020 23:42
@abyss7
Copy link
Contributor

abyss7 commented Dec 22, 2020

Is it ready for review? Can you fix the title and description then.

@abyss7 abyss7 self-assigned this Dec 22, 2020
@ygrek ygrek changed the title [wip] add zstd long range option add zstd long range option Dec 28, 2020
@ygrek
Copy link
Contributor Author

ygrek commented Dec 28, 2020

updated

@ygrek ygrek force-pushed the add_zstd_long_range branch from 5d8869a to 0abf724 Compare December 28, 2020 23:55
Copy link
Contributor

@abyss7 abyss7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code is good, but there is a couple of minor issues. The main request is to add a simple test that will cover your new code at least: like reading and writing with custom window log param.

if (arguments->children.size() > 1)
{
const auto * window_literal = children[1]->as<ASTLiteral>();
if (!window_literal)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're not checking if this is really an integer at this point

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, this code does the same as other compression codecs (and same what is done for first parameters in zstd codec, above)

src/Compression/CompressionCodecZSTD.cpp Outdated Show resolved Hide resolved
src/Compression/CompressionCodecZSTD.cpp Outdated Show resolved Hide resolved
src/Compression/CompressionCodecZSTD.cpp Show resolved Hide resolved
@ygrek
Copy link
Contributor Author

ygrek commented Dec 30, 2020

test added

@ygrek ygrek force-pushed the add_zstd_long_range branch from 72765cd to 4a7854d Compare January 8, 2021 22:05
@ygrek ygrek requested a review from abyss7 January 8, 2021 22:05
@abyss7 abyss7 merged commit 8f2a830 into ClickHouse:master Jan 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants