[v24.1.x] cloud_storage: concurrent directory walking #19815

vbotbuildovich
Collaborator

Backport of PR #18758

This makes it possible to walk directories concurrently.
The concurrent walk itself is added in a subsequent commit.

(cherry picked from commit ec0e855)
One step closer to concurrent walking.

(cherry picked from commit f856d46)
This speeds up cache trimming by two orders of magnitude, even on a
lightly loaded reactor. The cost is a higher number of open fds (an
increase of up to 2K: one for the directory and one for the stat call
in `walk_accumulator::visit`) and more pressure on the reactor (an
extra 1K futures/tasks), the IO subsystem, and the syscall thread.
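The trade-off described above (more directories in flight means more open fds and more pending tasks) can be illustrated with a small, self-contained sketch. This is not the Seastar implementation from this PR: it is a Python `asyncio` analogue, and `walk_concurrently`, `_list_dir`, and `MAX_CONCURRENT_DIRS` are hypothetical names. The cap of 1000 is an assumption chosen to echo the "extra 1K" figure; the actual limit used by the walker is not stated here.

```python
import asyncio
import os

# Assumed cap, chosen only to echo the "extra 1K futures/tasks" figure above.
MAX_CONCURRENT_DIRS = 1000


def _list_dir(path):
    """Blocking helper: list one directory.

    Returns (subdirectories, [(file_path, size_in_bytes), ...]).
    """
    subdirs, files = [], []
    with os.scandir(path) as it:  # one directory handle is open while we iterate
        for entry in it:
            if entry.is_dir(follow_symlinks=False):
                subdirs.append(entry.path)
            else:
                size = entry.stat(follow_symlinks=False).st_size
                files.append((entry.path, size))
    return subdirs, files


async def walk_concurrently(root, limit=MAX_CONCURRENT_DIRS):
    """Walk `root`, listing at most `limit` directories at the same time."""
    sem = asyncio.Semaphore(limit)
    files = []

    async def visit(path):
        # Hold a slot only while this directory is being listed, so that
        # deep trees cannot deadlock waiting on their own ancestors.
        async with sem:
            subdirs, found = await asyncio.to_thread(_list_dir, path)
        files.extend(found)
        # Fan out into the children; each one takes its own slot.
        await asyncio.gather(*(visit(d) for d in subdirs))

    await visit(str(root))
    return files
```

Calling `asyncio.run(walk_concurrently("/tmp/recdirtest"))` would return a list of `(path, size)` pairs. Note that `asyncio.to_thread` (Python 3.9+) runs on the default thread pool, so the effective parallelism in this sketch is also bounded by the pool size, not only by the semaphore.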

"Lightly loaded reactor" test case:

```cpp
SEASTAR_THREAD_TEST_CASE(empty_dir) {
    cloud_storage::recursive_directory_walker w;
    const std::filesystem::path target_dir = "/tmp/recdirtest";

    // Light background load: a single stress fiber doing one spin per
    // scheduling point.
    stress_config cfg;
    cfg.min_spins_per_scheduling_point = 1;
    cfg.max_spins_per_scheduling_point = 1;
    cfg.num_fibers = 1;
    auto mgr = stress_fiber_manager{};
    BOOST_REQUIRE(mgr.start(cfg));

    // Walk the pre-populated directory tree while the stress fiber runs.
    access_time_tracker tracker;
    auto result = w.walk(target_dir.native(), tracker).get();

    mgr.stop().get();
}
```

`/tmp/recdirtest` is populated by running the following script from inside that directory (the script uses relative paths):

```py
import xxhash
from pathlib import Path

# Create 600,000 hash-named directories, each holding one empty segment file.
for folder in range(600000):
    x = xxhash.xxh32()
    x.update(f'folder{folder}')
    d = x.hexdigest()
    Path(f"./{d}/kafka").mkdir(parents=True, exist_ok=True)
    open(f'./{d}/kafka/segment.bin', 'a').close()
```
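For quick local experiments, a scaled-down version of this fixture may be convenient. The sketch below is a stand-in under stated assumptions, not part of the PR: it swaps `xxhash` for the standard library's `zlib.crc32` so it has no third-party dependency (directory names will therefore differ from the ones the script above produces), and `make_fixture` with its `root` and `count` parameters is a hypothetical helper.

```python
import zlib
from pathlib import Path


def make_fixture(root, count):
    """Create up to `count` hash-named directories under `root`.

    Each directory gets an empty kafka/segment.bin, mirroring the layout
    of the original script. Returns the number of distinct top-level
    directories, which can be smaller than `count` if hashes collide.
    """
    root = Path(root)
    names = set()
    for i in range(count):
        # crc32 is an assumption standing in for xxh32; names are 8 hex digits.
        name = f"{zlib.crc32(f'folder{i}'.encode()):08x}"
        kafka_dir = root / name / "kafka"
        kafka_dir.mkdir(parents=True, exist_ok=True)
        (kafka_dir / "segment.bin").touch()
        names.add(name)
    return len(names)
```

Unlike the original script, this variant takes the target directory as an argument instead of relying on the current working directory, for example `make_fixture("/tmp/recdirtest-small", 1000)` (the path here is only an illustration).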

(cherry picked from commit ee34998)
@vbotbuildovich vbotbuildovich requested a review from a team as a code owner June 12, 2024 13:33
@vbotbuildovich vbotbuildovich added this to the v24.1.x-next milestone Jun 12, 2024
@vbotbuildovich vbotbuildovich added the kind/backport PRs targeting a stable branch label Jun 12, 2024
@piyushredpanda piyushredpanda merged commit 27e87b7 into redpanda-data:v24.1.x Jun 14, 2024
18 checks passed
@piyushredpanda piyushredpanda modified the milestones: v24.1.x-next, v24.1.8 Jun 14, 2024
Labels
area/redpanda kind/backport PRs targeting a stable branch