-
-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
walk.rs: queue -> stack #450
Conversation
Change the pool of "files to search" from queue to a stack. This causes ripgrep to approximate depth-first search instead of breadth-first search. This dramatically reduces the size of the pool, since most directories are much more "wide" than they are "deep". As a result, ripgrep uses ~45% less peak memory: Before: /usr/bin/time -f '%M' rg PM_RESUME > /dev/null ... 8876 After: /usr/bin/time -f '%M' rg PM_RESUME > /dev/null 16240 Throughput improves a tiny bit: Before: linux_literal (pattern: PM_RESUME) ---------------------------------- rg (ignore)* 0.376 +/- 0.003 (lines: 16)* After: linux_literal (pattern: PM_RESUME) ---------------------------------- rg (ignore)* 0.371 +/- 0.004 (lines: 16)* Related to BurntSushi#152.
OK, so I finally had a chance to manually QA this. If I run |
Great, thanks for taking a look! I've tested it on my end and can also confirm that master is more consistent. I don't really have a great explanation; it's rather tricky to diff two types of undefined behaviors. I guess this PR is a bit open-ended: how do you defined consistency? Master is more likely to output in the same order from one run to another. But "stack" will usually group directories together. If you only run ripgrep once, which output is better? Master:
Stack:
|
Right, if you qualify it with "which is better if you only run ripgrep once," then I agree that this PR is better. But I'm not sure that I buy that qualification... |
I did a little bit of a back-of-the-envelope test here on how different these two actually are. I ran 1,000 greps of PM_SUSPEND of Linux through current HEAD (45e850a) and through the PR branch here (d4fa015), both with SIMD/AVX optimizations, on a machine with 4 CPU cores. HEAD showed very little overlap; of the 1000 entries, there are 998 unique md5sums. The PR branch shows no overlap at all; all 1000 entries are unique. As you might expect, there is also no overlap between the two sets. To get a rough measure of "compactness", I looked at the number of times that top level directories were listed not in a row. (that is: "a/foo, a/bar, b/foo/, a/bar" would have a score of 3) Master:
PR:
So, the PR seems to produce more concise results on average, though the spread is a little worse. Food for thought! |
@hlieberman Thanks for that analysis! I think for now I'm going to close this PR. It's a neat idea, and we might want to revisit it, but I don't feel that there's compelling enough evidence to switch at this point. |
Change the pool of "files to search" from queue to a stack. This causes
ripgrep to approximate depth-first search instead of breadth-first
search. This dramatically reduces the size of the pool, since most
directories are much more "wide" than they are "deep".
As a result, ripgrep uses ~45% less peak memory:
Before:
After:
Throughput improves a tiny bit:
Before:
After:
Related to #152.