[v23.3.x] cloud_storage: concurrent directory walking #19816

Merged

vbotbuildovich
Collaborator

Backport of PR #18758

This makes it possible to walk directories concurrently.
The concurrent walk itself will be added in a subsequent commit.

(cherry picked from commit ec0e855)
One step closer to concurrent walking.

(cherry picked from commit f856d46)
This speeds up cache trimming by 2 orders of magnitude, even on a lightly
loaded reactor. The cost is a higher number of open fds (up to 2K more:
one for the directory, one for the stat call in
`walk_accumulator::visit`) and higher pressure on the reactor (an extra
1K futures/tasks), the IO subsystem, and the syscall thread.
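
To illustrate the trade-off described above, here is a minimal Python sketch of a directory walk with bounded concurrency. It is not the Seastar implementation from this PR: `scan_dir`, `walk_concurrently`, and `max_concurrent` are hypothetical names, and a thread pool stands in for reactor futures. The point it shows is that each in-flight directory scan holds a directory handle and issues stat calls, so a higher concurrency limit buys speed at the cost of more open fds and more pending tasks.

```python
import os
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def scan_dir(path):
    """List one directory and return (subdirectories, [(file_path, size)])."""
    subdirs, files = [], []
    with os.scandir(path) as entries:  # one open directory handle per scan
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                subdirs.append(entry.path)
            else:
                # One stat per file, loosely analogous to the stat call
                # the commit message attributes to walk_accumulator::visit.
                size = entry.stat(follow_symlinks=False).st_size
                files.append((entry.path, size))
    return subdirs, files


def walk_concurrently(root, max_concurrent=8):
    """Walk `root`, scanning at most `max_concurrent` directories at once."""
    files = []
    todo = [root]      # directories discovered but not yet scanned
    pending = set()    # scans currently in flight
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        while todo or pending:
            # Top up the in-flight scans, never exceeding the limit.
            while todo and len(pending) < max_concurrent:
                pending.add(pool.submit(scan_dir, todo.pop()))
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                subdirs, found = future.result()
                todo.extend(subdirs)
                files.extend(found)
    return files
```

Calling `walk_concurrently("/tmp/recdirtest", max_concurrent=64)` would return a list of `(path, size)` pairs. In this sketch, setting `max_concurrent=1` degenerates to a sequential walk, which is the baseline that a concurrent walk improves on.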

"Lightly loaded reactor" test case:

```cpp
SEASTAR_THREAD_TEST_CASE(empty_dir) {
    cloud_storage::recursive_directory_walker w;
    const std::filesystem::path target_dir = "/tmp/recdirtest";

    // A single stress fiber with one spin per scheduling point keeps the
    // reactor only lightly loaded.
    stress_config cfg;
    cfg.min_spins_per_scheduling_point = 1;
    cfg.max_spins_per_scheduling_point = 1;
    cfg.num_fibers = 1;
    auto mgr = stress_fiber_manager{};
    BOOST_REQUIRE(mgr.start(cfg));

    // Walk the pre-populated directory while the stress fiber is running.
    access_time_tracker tracker;
    auto result = w.walk(target_dir.native(), tracker).get();

    mgr.stop().get();
}
```

/tmp/recdirtest is filled by the following script:

```py
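import xxhash
from pathlib import Path

# Paths are relative, so run this from inside the target directory.
for i in range(600000):
    # Hash the folder name to get a pseudo-random directory name.
    d = xxhash.xxh32(f"folder{i}".encode()).hexdigest()
    Path(f"./{d}/kafka").mkdir(parents=True, exist_ok=True)
    # Create an empty segment file in each leaf directory.
    open(f"./{d}/kafka/segment.bin", "a").close()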
```

(cherry picked from commit ee34998)
@vbotbuildovich vbotbuildovich added this to the v23.3.x-next milestone Jun 12, 2024
@vbotbuildovich vbotbuildovich added the kind/backport PRs targeting a stable branch label Jun 12, 2024
@nvartolomei nvartolomei merged commit 8126c40 into redpanda-data:v23.3.x Jun 17, 2024
16 of 17 checks passed
@BenPope BenPope modified the milestones: v23.3.x-next, v23.3.18 Jun 26, 2024