cloud_storage: concurrent directory walking
This speeds up cache trimming by 2 orders of magnitude even on a lightly loaded reactor, at the cost of a higher number of open fds (up to 2K more: one for the directory, one for the stat call in `walk_accumulator::visit`). It also puts more pressure on the reactor (an extra 1K futures/tasks), the IO subsystem, and the syscall thread.

"Lightly loaded reactor" test case:

```cpp
SEASTAR_THREAD_TEST_CASE(empty_dir) {
    cloud_storage::recursive_directory_walker w;
    const std::filesystem::path target_dir = "/tmp/recdirtest";

    stress_config cfg;
    cfg.min_spins_per_scheduling_point = 1;
    cfg.max_spins_per_scheduling_point = 1;
    cfg.num_fibers = 1;
    auto mgr = stress_fiber_manager{};
    BOOST_REQUIRE(mgr.start(cfg));

    access_time_tracker tracker;
    auto result = w.walk(target_dir.native(), tracker).get();
    mgr.stop().get();
}
```

`/tmp/recdirtest` is filled by the following script:

```py
import os
import xxhash
from pathlib import Path

for folders in range(0, 600000):
    x = xxhash.xxh32()
    x.update(f'folder{folders}')
    d = x.hexdigest()
    Path(f"./{d}/kafka").mkdir(parents=True, exist_ok=True)
    open(f'./{d}/kafka/segment.bin', 'a').close()
```

(cherry picked from commit ee34998)
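For readers unfamiliar with the trade-off described above, here is a minimal Python sketch of a directory walk with bounded concurrency. It is not the Seastar implementation from this commit: `walk_concurrently`, `max_concurrency`, and the `asyncio` semaphore are hypothetical stand-ins that only illustrate how a cap on in-flight directory scans bounds the number of directories open at once. Unlike the commit, the sketch does not bound the number of pending tasks.

```python
import asyncio
import os


async def walk_concurrently(root: str, max_concurrency: int = 1000) -> tuple[int, int]:
    """Return (file_count, total_bytes) under root, scanning directories concurrently."""
    # Hypothetical cap: at most this many directories are being scanned at once.
    limit = asyncio.Semaphore(max_concurrency)

    def scan(path: str) -> tuple[list[str], int, int]:
        # Blocking work for one directory: list it and stat its files.
        subdirs: list[str] = []
        files = 0
        size = 0
        with os.scandir(path) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    subdirs.append(entry.path)
                else:
                    files += 1
                    size += entry.stat(follow_symlinks=False).st_size
        return subdirs, files, size

    async def visit(path: str) -> tuple[int, int]:
        # Hold a slot only while this directory is open.
        async with limit:
            subdirs, files, size = await asyncio.to_thread(scan, path)
        # Recurse after releasing the slot, so a parent never blocks its children.
        for sub_files, sub_size in await asyncio.gather(*(visit(d) for d in subdirs)):
            files += sub_files
            size += sub_size
        return files, size

    return await visit(root)
```

Calling `asyncio.run(walk_concurrently("/tmp/recdirtest"))` would walk the test directory; a smaller `max_concurrency` lowers the number of simultaneously open directories at the cost of a slower walk.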