Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

avoid using an Arc for quiet_matched #141

Closed
wants to merge 1 commit into from

Conversation

lilydjwg
Copy link
Contributor

@lilydjwg lilydjwg commented Oct 1, 2016

I'm sorry, the result isn't stable, and I can't tell whether it's faster or slower....

This makes it a little faster:

from 19.60s user 0.28s system 377% cpu 5.267 total
to   17.99s user 0.30s system 371% cpu 4.924 total
@lilydjwg lilydjwg closed this Oct 1, 2016
@BurntSushi
Copy link
Owner

Yeah, I wouldn't expect this to make a difference. An Arc should only impose overhead when it's cloned, not when it's accessed. We only clone it once for each worker, which means the overhead is likely imperceptible.

Can you explain more about your benchmark?

@lilydjwg
Copy link
Contributor Author

lilydjwg commented Oct 1, 2016

I just use perf to see where it takes the most time.

sudo perf record -F 99 -g -- rg xxx >/dev/null

then

sudo perf report

and press a on the function that uses the most of time to see assembly. Not very easy to read, but there is a mfence instruction, which clearly corresponds to the fence(SeqCst) call inside the steal() function.

a

Today rg took more than one minute to complete a search, so I tried to find out why. The time reduced to several seconds when I built the new master.

@BurntSushi
Copy link
Owner

Profiles can be hard to read, and in this case, it is probably misleading you. The profile reveals a lot of time spent in the work queue because it is spinning waiting for work. That is, the directory iterator can't produce work fast enough for the search workers.

The time reduction on master almost certainly means that the directory you're searching has a very large gitignore file. I implemented a crude stopgap measure to make this faster, but I'm still working on it.

If I'm right, then you should notice a large improvement by running with the -u flag.

One way to extract a less confusing profile is to pass -j1. This will cause single threaded search to, which doesn't use a work queue at all.

@lilydjwg
Copy link
Contributor Author

lilydjwg commented Oct 1, 2016

So there is a spinning lock? It makes my fan spin loudly....I prefer it to block instead.

There is no gitignore file, though it's in a (huge) git repo---the "community" repo from Arch Linux. -u indeed makes it much faster.

I take the source code outside the repo, and it's slow the same.

With -j1 rg uses only one core and it takes less time! Though it's still much slower than ag --workers 1. And with -j1, the CPU time is spent on regex::re_bytes::Regex::shortest_match_at and memchr and regex_syntax::Expr::simplify::simp.

@BurntSushi
Copy link
Owner

Could you please tell me how you're running your test? What command did you run to get the data? What rg commands are you running?

So there is a spinning lock? It makes my fan spin loudly....I prefer it to block instead.

This is how lock free queues work. We should fix the producer.

@lilydjwg
Copy link
Contributor Author

lilydjwg commented Oct 1, 2016

I run this in the percona-server-5.7.14-8 directory

time ./rg utf8mb4 >/dev/null

This is how lock free queues work. We should fix the producer.

Does this mean that, if I search on slow disks, it takes a lot of CPU time to wait?

@BurntSushi
Copy link
Owner

Could you please tell me how to get the data you're searching? What command should I run?

Does this mean that, if I search on slow disks, it takes a lot of CPU time to wait?

No, it doesn't mean that. If the disks are slow and the corpora isn't memory, then the search workers will also be slower.

@lilydjwg
Copy link
Contributor Author

lilydjwg commented Oct 1, 2016

Extract this tarball.

No, it doesn't mean that. If the disks are slow and the corpora isn't memory, then the search workers will also be slower.

I see, thanks for the explanation :-)

@BurntSushi
Copy link
Owner

There is no gitignore file, though it's in a (huge) git repo---the "community" repo from Arch Linux. -u indeed makes it much faster.

Could you show me how to get this data as well? And the rg command you're running?

(I'm asking because it looks like there's more going on here. The only difference between rg PAT and rg -u PAT is that the latter doesn't look at ignore files.)

@BurntSushi
Copy link
Owner

For percona-server, it seems clear to me that you're running into #134. There is a giant .gitignore file in the root directory of that tarball. This is where rg is spending all of its time. Since directory traversal is single threaded, it can't keep up with the search workers, so the search workers are constantly asking for more work, causing the queue to spin and use up CPU.

@lilydjwg
Copy link
Contributor Author

lilydjwg commented Oct 1, 2016

Oh, I see! I used ag -g to search for gitignore, but it didn't search for hidden files either!

@BurntSushi
Copy link
Owner

BurntSushi commented Oct 1, 2016

Indeed. On percona-server, I get these timings, with my current code (which has a few more optimizations than master does):

[andrew@Cheetah percona-server-5.7.14-8] time rg utf8mb4 | wc -l
3114

real    0m0.813s
user    0m6.163s
sys     0m0.183s
[andrew@Cheetah percona-server-5.7.14-8] time ag utf8mb4 | wc -l
3210

real    0m1.687s
user    0m2.263s
sys     0m0.930s

(The counts are off because I'm pretty sure ag is getting the .gitignore file wrong in some places.)

If we ask both tools to skip ignore files, then they both get a lot faster, but rg still wins:

[andrew@Cheetah percona-server-5.7.14-8] time rg -u utf8mb4 | wc -l
3117

real    0m0.118s
user    0m0.323s
sys     0m0.350s
[andrew@Cheetah percona-server-5.7.14-8] time ag -u utf8mb4 | wc -l
3214

real    0m0.399s
user    0m0.617s
sys     0m1.053s

The difference is almost entirely attributable to ag's use of memory maps. Watch what happens when we ask rg to use memory maps:

[andrew@Cheetah percona-server-5.7.14-8] time rg -u --mmap utf8mb4 | wc -l
3117

real    0m0.354s
user    0m0.347s
sys     0m0.763s

@BurntSushi
Copy link
Owner

@lilydjwg FYI, I made a few (important) edits to my previous comment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants