-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ripgrep doesn't use a buffered writer in single threaded mode (!) #955
Comments
Neat! Thanks for filing this sort of issue. I'd love to do an analysis, but I can't really do anything without the corpus. Can you provide one? If you can't share the one you described, can you reproduce the performance issue on something we can both see? If not, I'll unfortunately just have to close this. |
Can you agree to destroy the corpus after testing? It should be relatively sanitized but at the same time it is from my company logs and I would rather not have it floating about.
…-Harrison
On Jun 18, 2018, at 11:04, Andrew Gallant ***@***.***> wrote:
Neat! Thanks for filing this sort of issue. I'd love to do an analysis, but I can't really do anything without the corpus. Can you provide one? If you can't share the one you described, can you reproduce the performance issue on something we can both see?
If not, I'll unfortunately just have to close this.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub, or mute the thread.
|
@hgoscenski Yeah, that's fine. It's probably too big to email, but if you upload it somewhere (that you can then delete after I download) and email me the link, then I can do that. I will make sure to delete whatever you give me before closing this issue. My email is on this page: https://blog.burntsushi.net/about/ |
Nice find! I was able to reproduce this in a more emphatic way on the open subtitle data set:
If we suppress output and just count the matches, then all is well in the world:
So this is seemingly a performance bug in the printer? Well, if that were true, then
... yikes. That's no good. This is specifically a case that I tested when initially building ripgrep. My guess is that there is a regression lurking somewhere. So, I did some time travel. It turns out that ripgrep 0.2.9 doesn't have this regression (so my recollection is still in tact, it seems), but ripgrep 0.3.0 does:
OK, so bisecting between these two tags, it appears that commit e8a30cb introduced the regression. This makes sense to an extent, since this is the commit that re-worked colored output and tty handling. My suspicion at this point is that I somehow disabled buffered writing in the single threaded case. Now that I have an idea of what the bug is, I'm looking more closely on master. Indeed, this is how we get the stdout handle:
All this does is use Sigh. This is a bad bug. I am really surprised that this hasn't been caught. I suppose this somewhat validates the assumption that ripgrep's output is often much much smaller than what it searches. Technically, this bug is also present in the multithreaded searcher, but is less pronounced because the multithreaded searcher buffers the entire output of each file in memory before printing it. This will be interesting to fix because of colors... Brainstorming:
|
@hgoscenski I've deleted your data from my system. |
@BurntSushi thank you! Let me know if you need anything else from me. |
Specifically, this will use a buffered writer when not printing to a tty. This fixes a long standing performance regression where ripgrep would slow down dramatically if it needed to report a lot of matches. Fixes #955
This commit removes an unconditional extra regex search that is in fact not always necessary. This can result in a 2x performance improvement in cases where ripgrep reports many matches. The fix itself isn't ideal, but we continue to punt on cleaning up the printer until it is rewritten for libripgrep, which is happening Real Soon Now. Fixes #955
This commit removes an unconditional extra regex search that is in fact not always necessary. This can result in a 2x performance improvement in cases where ripgrep reports many matches. The fix itself isn't ideal, but we continue to punt on cleaning up the printer until it is rewritten for libripgrep, which is happening Real Soon Now. Fixes #955
What version of ripgrep are you using?
λ rg --version ripgrep 0.8.1 -SIMD -AVX
How did you install ripgrep?
Via brew.
What operating system are you using ripgrep on?
MacOS 10.13.5 (17F77)
Describe your question, feature request, or bug.
Performance of ripgrep seems to be substantially slower than GNU grep on a log file I am parsing. Logs are of the format (sanitized):
File Size:
Terminal output with rg, grep, ggrep
https://gist.github.com/hgoscenski/059ab6394b3c0520e37d34c448b69d98
The text was updated successfully, but these errors were encountered: