Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

walk.rs: queue -> stack #450

Closed
wants to merge 1 commit into from
Closed

Conversation

bmalehorn
Copy link
Contributor

Change the pool of "files to search" from queue to a stack. This causes
ripgrep to approximate depth-first search instead of breadth-first
search. This dramatically reduces the size of the pool, since most
directories are much more "wide" than they are "deep".

As a result, ripgrep uses ~45% less peak memory:

Before:

/usr/bin/time -f '%M' rg PM_RESUME > /dev/null
...
8876

After:

/usr/bin/time -f '%M' rg PM_RESUME > /dev/null
16240

Throughput improves a tiny bit:

Before:

linux_literal (pattern: PM_RESUME)
----------------------------------
rg (ignore)*  0.376 +/- 0.003 (lines: 16)*

After:

linux_literal (pattern: PM_RESUME)
----------------------------------
rg (ignore)*  0.371 +/- 0.004 (lines: 16)*

Related to #152.

Change the pool of "files to search" from queue to a stack. This causes
ripgrep to approximate depth-first search instead of breadth-first
search. This dramatically reduces the size of the pool, since most
directories are much more "wide" than they are "deep".

As a result, ripgrep uses ~45% less peak memory:

Before:

    /usr/bin/time -f '%M' rg PM_RESUME > /dev/null
    ...
    8876

After:

    /usr/bin/time -f '%M' rg PM_RESUME > /dev/null
    16240

Throughput improves a tiny bit:

Before:

    linux_literal (pattern: PM_RESUME)
    ----------------------------------
    rg (ignore)*  0.376 +/- 0.003 (lines: 16)*

After:

    linux_literal (pattern: PM_RESUME)
    ----------------------------------
    rg (ignore)*  0.371 +/- 0.004 (lines: 16)*

Related to BurntSushi#152.
@BurntSushi
Copy link
Owner

OK, so I finally had a chance to manually QA this. If I run rg PM_SUSPEND in a checkout of the Linux kernel, then the results appear much more consistent from run-to-run in current master than they do with this PR. Is this something you expected to happen?

@bmalehorn
Copy link
Contributor Author

Great, thanks for taking a look!

I've tested it on my end and can also confirm that master is more consistent. I don't really have a great explanation; it's rather tricky to diff two types of undefined behaviors.

I guess this PR is a bit open-ended: how do you defined consistency? Master is more likely to output in the same order from one run to another. But "stack" will usually group directories together. If you only run ripgrep once, which output is better?

Master:

include/linux/ide.h
Documentation/dev-tools/sparse.rst
drivers/ide/ide-pm.c
include/uapi/linux/apm_bios.h
arch/x86/kernel/apm_32.c
Documentation/translations/zh_CN/sparse.txt
drivers/input/mouse/cyapa.h
drivers/usb/mtu3/mtu3_hw_regs.h
drivers/mtd/maps/pcmciamtd.c
drivers/net/wireless/intersil/hostap/hostap_cs.c

Stack:

drivers/mtd/maps/pcmciamtd.c
drivers/ide/ide-pm.c
drivers/net/wireless/intersil/hostap/hostap_cs.c
drivers/usb/mtu3/mtu3_hw_regs.h
drivers/input/mouse/cyapa.h
Documentation/translations/zh_CN/sparse.txt
Documentation/dev-tools/sparse.rst
arch/x86/kernel/apm_32.c
include/uapi/linux/apm_bios.h
include/linux/ide.h

@BurntSushi
Copy link
Owner

Right, if you qualify it with "which is better if you only run ripgrep once," then I agree that this PR is better. But I'm not sure that I buy that qualification...

@hlieberman
Copy link

I did a little bit of a back-of-the-envelope test here on how different these two actually are.

I ran 1,000 greps of PM_SUSPEND of Linux through current HEAD (45e850a) and through the PR branch here (d4fa015), both with SIMD/AVX optimizations, on a machine with 4 CPU cores.

HEAD showed very little overlap; of the 1000 entries, there are 998 unique md5sums. The PR branch shows no overlap at all; all 1000 entries are unique. As you might expect, there is also no overlap between the two sets.

To get a rough measure of "compactness", I looked at the number of times that top level directories were listed not in a row. (that is: "a/foo, a/bar, b/foo/, a/bar" would have a score of 3)

Master:

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  13.00   13.00   15.00   15.66   16.25   28.00

master

PR:

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   7.00    9.00   11.00   14.21   21.00   35.00 

stack

So, the PR seems to produce more concise results on average, though the spread is a little worse.

Food for thought!

@BurntSushi
Copy link
Owner

@hlieberman Thanks for that analysis! I think for now I'm going to close this PR. It's a neat idea, and we might want to revisit it, but I don't feel that there's compelling enough evidence to switch at this point.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants