
Reduced memory overhead of preparing LZ4-compressed data for server. #110

Conversation

@Enmk Enmk commented Oct 30, 2021

Do not compress the whole serialized block at once; instead, compress it in reasonably sized chunks.
This removes some temporary buffers and reduces memory pressure.

Also minor refactoring:

  • moved all serialization-format code to WireFormat class.
  • removed CodedOutputStream and CodedInputStream classes.

Memory usage

The test INSERTs a single block of 5 columns and then SELECTs the rows back from the server. Both binaries were built in RelWithDebInfo mode.

  • initial - measured almost right after program start.
  • before INSERTing - measured once the block is prepared in memory, but before the actual call to Client::Insert.
  • after INSERTing - right after Client::Insert returns; NOTE: the original Block is still in memory for validation.
  • after SELECTing - right after client.Select("SELECT * FROM ..."); NOTE: the original Block is still in memory for validation.

Original implementation

// 10M rows:
Block of: U64 UInt64, S String, FS FixedString(8), LCS LowCardinality(String), LCFS LowCardinality(FixedString(8)) with 10000000rows
RSS
                     initial    value :    4550656      peak:    4550656
            before INSERTing    value :  476459008      peak:  503341056
             after INSERTing    value :  476602368      peak: 1117761536
             after SELECTing    value :  545083392      peak: 1277018112

// 100M rows:
Block of: U64 UInt64, S String, FS FixedString(8), LCS LowCardinality(String), LCFS LowCardinality(FixedString(8)) with 100000000rows
unknown file: Failure
C++ exception with description "DB::Exception: Unexpected packet Data received from client" thrown in the test body.

Version in this PR

// 10M rows:
Block of: U64 UInt64, S String, FS FixedString(8), LCS LowCardinality(String), LCFS LowCardinality(FixedString(8)) with 10000000rows
RSS
                     initial    value :    5767168      peak:    5767168
            before INSERTing    value :  477839360      peak:  504721408
             after INSERTing    value :  477839360      peak:  504721408
             after SELECTing    value :  546471936      peak: 1278402560

// 100M rows:
Block of: U64 UInt64, S String, FS FixedString(8), LCS LowCardinality(String), LCFS LowCardinality(FixedString(8)) with 100000000rows
RSS
                     initial    value :    5857280      peak:    5857280
            before INSERTing    value : 4714172416      peak: 4850671616
             after INSERTing    value : 4714172416      peak: 4850671616
             after SELECTing    value : 5696409600      peak: 12135694336

Comparison

Since the original version failed to insert 100M rows, we compare memory usage on the 10M-row run.
As the numbers show, the original implementation peaks at 1117761536 bytes during insertion, of which 503341056 belong to the original Block residing in memory. That leaves 1117761536 - 503341056 = 614420480 bytes (~0.57 GiB) of memory used solely to prepare and send the data to the server.

The modified implementation uses an insignificant amount of extra memory, untraceable with the current approach; it remains undetectable even when inserting 100M rows.

Conclusion

The modified version (presented in this PR) uses O(1) extra memory, versus O(n) for the original implementation.

@traceon traceon self-assigned this Nov 3, 2021
traceon commented Nov 3, 2021

@Enmk I see some seemingly conflicting changes between this and #109. Let's merge #109 before fully reviewing this one.

@Enmk Enmk force-pushed the memory_optimization_on_send_and_receive_lz4_blocks branch from 009b8af to b148348 on November 12, 2021 14:56
@traceon traceon left a comment


Left inline comments.

Inline comments (since resolved) were left on:
  • ut/stream_ut.cpp
  • clickhouse/base/coded.cpp
  • clickhouse/base/coded.h
  • clickhouse/base/compressed.cpp (2 comments)
  • clickhouse/base/wire_format.h
  • clickhouse/base/wire_format.cpp (4 comments)

traceon commented Nov 12, 2021

Could you please provide some numbers for the improvements (and the absence of regressions)? E.g. "before vs after" perf test results.

if (estimated_compressed_buffer_size <= 0)
throw std::runtime_error("Failed to estimate compressed buffer size, LZ4 error: " + std::to_string(estimated_compressed_buffer_size));

compressed_buffer_.resize(estimated_compressed_buffer_size + HEADER_SIZE + EXTRA_COMPRESS_BUFFER_SIZE);

This is low-hanging fruit for optimization: resize without initialization.

@traceon traceon merged commit b10d71e into ClickHouse:master Nov 15, 2021