Reduce Keeper memory usage #59002

antonio2368 · 2024-01-19T15:36:40Z

Changelog category (leave one):

Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Keeper improvement: reduce Keeper's memory usage for stored nodes.

Documentation entry for user-facing changes

Documentation is written (mandatory for new features)

Information about CI checks: https://clickhouse.com/docs/en/development/continuous-integration/

robot-clickhouse · 2024-01-19T15:39:43Z

This is an automated comment for commit 274c128 with a description of the existing statuses. It is updated for the latest CI run.

❌ Click here to open a full report in a separate page

Successful checks

Check name	Description	Status
AST fuzzer	Runs randomly generated queries to catch program errors. The build type is optionally given in parenthesis. If it fails, ask a maintainer for help	✅ success
ClickBench	Runs [ClickBench](https://github.com/ClickHouse/ClickBench/) with instant-attach table	✅ success
ClickHouse Keeper Jepsen	There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS	✅ success
ClickHouse build check	Builds ClickHouse in various configurations for use in further steps. You have to fix the builds that fail. Build logs often have enough information to fix the error, but you might have to reproduce the failure locally. The cmake options can be found in the build log by grepping for cmake. Use these options and follow the general build process	✅ success
Compatibility check	Checks that clickhouse binary runs on distributions with old libc versions. If it fails, ask a maintainer for help	✅ success
Docker server and keeper images	There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS	✅ success
Docs check	There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS	✅ success
Fast tests	There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS	✅ success
Flaky tests	Checks whether newly added or modified tests are flaky by running them repeatedly, in parallel, with more randomization. Functional tests are run 100 times with address sanitizer and additional randomization of thread scheduling. Integration tests are run up to 10 times. If a new test fails at least once, or runs too long, this check will be red. We don't allow flaky tests, read the doc	✅ success
Install packages	Checks that the built packages are installable in a clear environment	✅ success
Integration tests	The integration tests report. In parenthesis the package type is given, and in square brackets are the optional part/total tests	✅ success
Mergeable Check	Checks if all other necessary checks are successful	✅ success
Performance Comparison	Measure changes in query performance. The performance test report is described in detail here. In square brackets are the optional part/total tests	✅ success
SQLTest	There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS	✅ success
SQLancer	Fuzzing tests that detect logical bugs with SQLancer tool	✅ success
Sqllogic	Runs ClickHouse on the sqllogic test set against sqlite and checks that all statements pass	✅ success
Stateful tests	Runs stateful functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc	✅ success
Stress test	Runs stateless functional tests concurrently from several clients to detect concurrency-related errors	✅ success
Style check	There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS	✅ success
Unit tests	Runs the unit tests for different release types	✅ success

Check name	Description	Status
CI running	A meta-check that indicates the running CI. Normally, it's in success or pending state. The failed status indicates some problems with the PR	⏳ pending
Stateless tests	Runs stateless functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc	❌ failure
Upgrade check	Runs stress tests on server version from last release and then tries to upgrade it to the version from the PR. It checks if the new server can successfully startup without any errors, crashes or sanitizer asserts	❌ failure

nikitamikhaylov · 2024-01-19T16:17:25Z

src/Coordination/SnapshotableHashTable.h

 };

 template <class V>
 class SnapshotableHashTable
 {
 private:
+    struct GlobalArena
+    {
+        char * alloc(const size_t size)


Why not return std::unique_ptr<char[]>?

antonio2368 · 2024-01-19

Because I don't want to store that pointer anywhere except inside the StringRef key; a separate owning pointer would only take extra space.
Also, this way it stays compatible with the other Arenas and can be used with helpers such as copyStringInArena, so I can swap the arena implementation easily if needed.
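To illustrate the reply above, here is a minimal sketch of an arena-shaped wrapper over plain new/delete whose raw pointer lives only inside a non-owning key. The names `KeyRef`, `GlobalArenaSketch`, `copyKey` and `roundTrip` are hypothetical stand-ins for this example, not the PR's actual code; only the `alloc(size)` shape is taken from the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string_view>

// Hypothetical stand-in for StringRef: a non-owning pointer + length pair.
struct KeyRef
{
    const char * data = nullptr;
    size_t size = 0;
};

// Hypothetical arena-like wrapper over global new/delete.
// It keeps no state, so the only copy of each pointer is the one in the key.
struct GlobalArenaSketch
{
    char * alloc(const size_t size) { return new char[size]; }

    void free(const char * ptr, size_t /*size*/)
    {
        assert(ptr != nullptr);
        delete[] ptr;
    }
};

// Copies `s` into memory obtained from any arena exposing alloc(size)
// and returns a reference to the copy. The caller must free it explicitly.
template <typename Arena>
KeyRef copyKey(std::string_view s, Arena & arena)
{
    char * place = arena.alloc(s.size());
    if (!s.empty())
        std::memcpy(place, s.data(), s.size());
    return KeyRef{place, s.size()};
}

// Usage: allocate a key, read it back, release it through the arena.
inline bool roundTrip(std::string_view path)
{
    GlobalArenaSketch arena;
    KeyRef key = copyKey(path, arena);
    const bool same = std::string_view(key.data, key.size) == path;
    arena.free(key.data, key.size);
    return same;
}
```

Because `copyKey` is templated on the arena type, swapping in another allocator with the same `alloc` signature would not change the call sites. The trade-off is that ownership is manual: every key has to be released exactly once.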

hanfei1991 · 2024-01-23T13:12:34Z

src/Coordination/SnapshotableHashTable.h

+    /// Arena used for keys
+    /// we don't use std::string because it uses 24 bytes (because of SSO)
+    /// we want to always allocate the key on heap and use StringRef to it
+    GlobalArena arena;


I am curious why we used ArenaWithFreeLists before.

Now we use GlobalArena, but why not Common/Allocator.h? It looks like it has the same features, but is tracked by MemoryTracker.

antonio2368 · 2024-01-23

All our new/delete calls are tracked by MemoryTracker.
Since this is a simple new/delete call, I didn't see a reason to use something more complex like Allocator.h.

> I am curious why we used ArenaWithFreeLists before.

I'm not sure what the benefits were, but IMO it had huge downsides.
ArenaWithFreeLists doesn't work for global usage like this because it doesn't deallocate memory on free; it just adds the chunk to a free list. E.g. if you create 50m nodes and delete all of them, all the memory allocated for their paths stays allocated for no reason.
ArenaWithFreeLists has bins of size 2^n. The overhead is much worse for larger bins, and we end up with a lot of unused memory. With the global allocator we use jemalloc's bins, which are much closer to the real size we want to allocate (https://jemalloc.net/jemalloc.3.html#size_classes).
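The bin-size argument can be made concrete with a rough calculation. The sketch below compares rounding a request up to the next power of two with a simplified model of the size classes in the linked jemalloc page. The model assumes 16-byte spacing up to 128 bytes and four classes per doubling above that; it is an approximation for illustration, not jemalloc's actual code, and all function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Smallest power of two >= n: the bin size in a simplified 2^n free-list arena.
constexpr size_t roundUpPow2(size_t n)
{
    size_t r = 1;
    while (r < n)
        r <<= 1;
    return r;
}

// Simplified model of jemalloc-style size classes (an assumption of this sketch):
// multiples of 16 up to 128 bytes, then four evenly spaced classes per doubling.
constexpr size_t roundUpSizeClass(size_t n)
{
    if (n <= 16)
        return 16;
    if (n <= 128)
        return (n + 15) / 16 * 16;
    // The interval (upper/2, upper] is split into four steps of upper/8 bytes.
    const size_t step = roundUpPow2(n) / 8;
    return (n + step - 1) / step * step;
}

// Usage: unused bytes for a 65-byte path under each scheme.
inline void example()
{
    assert(roundUpPow2(65) - 65 == 63);      // 128-byte bin
    assert(roundUpSizeClass(65) - 65 == 15); // 80-byte class
}
```

Under this model the gap widens with size: a 1025-byte request lands in a 2048-byte power-of-two bin but in a 1280-byte class, which matches the point that the overhead is worse for larger bins.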

hanfei1991 · 2024-01-23T13:14:09Z

src/Coordination/SnapshotableHashTable.h

@@ -288,6 +363,7 @@ class SnapshotableHashTable

    void clear()
    {
+        clearOutdatedNodes();


This function seems to clear the whole map. Why clear outdated nodes first?

antonio2368 · 2024-01-23

Because the key is shared, and we don't want to deallocate the same key twice.
We first clear all the duplicates created for the snapshot with clearOutdatedNodes, and then all the remaining nodes.
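The ordering described above can be sketched with a toy table in which an outdated node kept for a snapshot shares its key pointer with the active version. Everything here (`TableSketch`, `Node`, the `free_key` flag, the `live_keys` counter) is hypothetical and heavily simplified; it only shows why dropping the duplicates first leaves exactly one owner per key.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <list>

static int live_keys = 0; // outstanding key allocations, tracked for the demo

char * allocKey(const char * s)
{
    ++live_keys;
    const size_t n = std::strlen(s) + 1;
    char * p = new char[n];
    std::memcpy(p, s, n);
    return p;
}

void freeKey(const char * p)
{
    assert(live_keys > 0);
    --live_keys;
    delete[] p;
}

struct Node
{
    const char * key;
    int value;
    bool active;   // false: kept only for an in-progress snapshot
    bool free_key; // true: this outdated node is the last user of `key`
};

struct TableSketch
{
    std::list<Node> nodes;

    void insert(const char * key, int value) { nodes.push_back({allocKey(key), value, true, false}); }

    // Update while a snapshot is running: the old version stays in the list
    // and the new version reuses the same key pointer.
    void updateDuringSnapshot(std::list<Node>::iterator it, int new_value)
    {
        it->active = false;
        nodes.push_back({it->key, new_value, true, false});
    }

    // Drops snapshot duplicates; frees a key only if no active node uses it.
    void clearOutdatedNodes()
    {
        for (auto it = nodes.begin(); it != nodes.end();)
        {
            if (!it->active)
            {
                if (it->free_key)
                    freeKey(it->key);
                it = nodes.erase(it);
            }
            else
                ++it;
        }
    }

    // After the duplicates are gone, every remaining node owns its key exactly once.
    void clear()
    {
        clearOutdatedNodes();
        for (const auto & node : nodes)
            freeKey(node.key);
        nodes.clear();
    }
};
```

Without the `clearOutdatedNodes()` call, the loop in `clear()` would visit both the old and the new version of an updated node and release the shared key twice.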

hanfei1991 · 2024-01-23T13:14:32Z

src/Coordination/SnapshotableHashTable.h

            approximate_data_size += node.key.size;
            approximate_data_size += node.value.sizeInBytes();
        }
    }

-    uint64_t keyArenaSize() const { return arena.allocatedBytes(); }
+    uint64_t keyArenaSize() const { return 0; }


Why return 0? Do we still need this metric?

antonio2368 · 2024-01-23

We don't, but I don't want to break backward compatibility for now by removing this row from the output of the mntr command.

antonio2368 added the jepsen-test label (Need to test this PR with jepsen tests) Jan 19, 2024

robot-clickhouse added the pr-improvement label (Pull request with some product improvements) Jan 19, 2024

antonio2368 force-pushed the keeper-reduce-memory branch 2 times, most recently from 94ddbd1 to da3153c Compare January 19, 2024 16:03

nikitamikhaylov reviewed Jan 19, 2024

Reduce Keeper memory usage

0132455

antonio2368 force-pushed the keeper-reduce-memory branch from da3153c to 0132455 Compare January 19, 2024 17:01

hanfei1991 self-assigned this Jan 22, 2024

Merge branch 'master' into keeper-reduce-memory

59f9abc

antonio2368 force-pushed the keeper-reduce-memory branch from 92cbe9a to 82dce23 Compare January 22, 2024 10:45

Free memory

274c128

antonio2368 force-pushed the keeper-reduce-memory branch from 82dce23 to 274c128 Compare January 22, 2024 10:48

hanfei1991 reviewed Jan 23, 2024

antonio2368 added the pr-must-backport-cloud label Jan 23, 2024

hanfei1991 approved these changes Jan 24, 2024

antonio2368 merged commit 34463fd into master Jan 24, 2024
262 of 270 checks passed

antonio2368 deleted the keeper-reduce-memory branch January 24, 2024 10:54

robot-clickhouse-ci-2 added the pr-backports-created-cloud label Jan 24, 2024

robot-ch-test-poll1 added the pr-synced-to-cloud label (The PR is synced to the cloud repo) Jan 24, 2024

antonio2368 mentioned this pull request Feb 12, 2024

Reduce size of node in Keeper even more #59592

Merged

1 task
