--vimgrep is much slower than without #999
Comments
Could we please try to reproduce this problem on a repository of code that we both have access to? I propose this commit of the Linux kernel. In general, I do observe Without
With
We can also see that memory usage doesn't change too much, with
To test the hypothesis that the output buffers are to blame for the increased memory usage, we can force ripgrep to run in single-threaded mode, which removes the use of output buffers and instead streams matches to stdout as they are found:
In this case, we get identical memory usage. To get an idea of the bandwidth involved here, let's measure the size of the output:
So here's what's happening here. First and foremost, On top of all of this, In my view, For example, #244 discusses a JSON output format. The JSON output format does not duplicate matching lines for every single match. Instead, it shows the matching lines exactly once (equivalent to Finally, I'd like to note that if you're invoking ripgrep in a way that causes it to produce GBs of data, then there is no reasonable expectation that it should do so instantly. The right thing to do here is to change how you invoke ripgrep or put some sort of cap on how much data you actually read from ripgrep. You can in fact stop reading data at any time you like and stop ripgrep in its tracks.
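As a rough illustration of the "cap how much you read" advice, here is a minimal Python sketch. The function name `read_capped` and its parameters are made up for this example and are not part of ripgrep. It reads at most `max_bytes` from a child process's stdout and then kills the child, so no further output is produced.

```python
import subprocess
import sys


def read_capped(cmd, max_bytes, chunk_size=65536):
    """Run cmd and return at most max_bytes of its stdout, then stop it.

    Hypothetical helper for illustration only.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    chunks = []
    remaining = max_bytes
    try:
        while remaining > 0:
            chunk = proc.stdout.read(min(chunk_size, remaining))
            if not chunk:  # child exited before the cap was reached
                break
            chunks.append(chunk)
            remaining -= len(chunk)
    finally:
        # Stop reading and terminate the child so it cannot keep producing data.
        proc.stdout.close()
        proc.kill()
        proc.wait()
    return b"".join(chunks)
```

In an app, `cmd` might be something like `["rg", "--vimgrep", "e"]` with a cap of a few megabytes; the right cap depends on how many results the UI can usefully show.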
Grrr... sorry for all the distraction. If that's the case then I see no problem. I was under the impression that "message: match" happened once per match and included full line content each time. I see now that's not the case. It seemed like a crazy way to do things, but since --vimgrep included full line text with each match I thought the spec just copied that approach. It might help future readers to change that documentation to plural:
I think this issue can be closed. I understand why --vimgrep is so much slower (in particular on minified JavaScript files). And because of the way that --vimgrep output is structured there's no way to really make it faster. My whole point was to just make sure not to design the JSON format with that same weakness. And of course you hadn't, I just got confused reading the singular/plurals in the spec.
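To put rough numbers on why repeating the full line for each match hurts on minified files, here is a small back-of-the-envelope sketch in Python. It is a simplified model, not ripgrep's actual output code: it ignores the path, line and column prefix on each output line and assumes ASCII text, so one character is one byte. Both function names are invented for the example.

```python
def per_match_output_bytes(line, needle):
    """Bytes emitted if the whole line is printed once per match."""
    matches = line.count(needle)  # non-overlapping occurrences
    return matches * (len(line) + 1)  # +1 for the trailing newline


def per_line_output_bytes(line, needle):
    """Bytes emitted if a matching line is printed exactly once."""
    return len(line) + 1 if needle in line else 0


# A 10,000-character line where every character matches: the per-match
# format grows with (matches x line length), the per-line format does not.
minified = "e" * 10_000
print(per_match_output_bytes(minified, "e"))  # 100010000
print(per_line_output_bytes(minified, "e"))   # 10001
```

Under this model a single 10 KB line made entirely of matches already produces about 100 MB of output in the per-match format, which fits with the slowdown on minified files described above.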
Aye. Thanks for the feedback!
@jessegrosjean wrote this from #244 (comment)
@BurntSushi I'm blown away at the speed of ripgrep in terminal. I'm trying not to lose that speed when running from my app. I think the JSON format (as I understand it) makes large searches that rg handles easily from the terminal near impossible to perform using the JSON API. I think including a full line of context with each match is just too much bandwidth for some cases.
For example:
My test case is pathological: search for e in my home directory. It generates 4,655,585 results. Crazy, but this is just the kind of thing that a user might try to see if an app works and isn't buggy. And in fact it's what made me so impressed with ripgrep. When I run ripgrep on my home directory I see:
Wow! Fast. But then if I do:
My computer starts to die. I kill the process after 20 seconds as it starts to run out of memory. I "think" the problem is that, unlike the default command, --vimgrep returns a full line for each result. It’s just too much bandwidth when you have many results.
I was asking for this, but I think you are correct, it’s too complicated, and would still require too much overlapping bandwidth in some cases. For example imagine the case where the user searches for e in a giant minified.js file.
Better I think is to just provide some options for what data is included in “match” (from the JSON API). What about an option to just omit the “match.lines” value?
My app could then generate the initial list of results quickly (by omitting the “lines” values). And then lazily (as matches are scrolled through the view) load and highlight the actually matched text.
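A minimal sketch of what that lazy step could look like on the app side, in Python. It assumes the app already holds a file path, a 1-based line number and a byte range within the line for each match; the function `load_match` and that record shape are hypothetical and not something ripgrep provides.

```python
from itertools import islice


def load_match(path, line_number, start, end):
    """Fetch one line on demand and return (line, matched_bytes).

    line_number is 1-based; start and end are byte offsets within the
    line, with end exclusive. Hypothetical helper for illustration only.
    """
    with open(path, "rb") as f:
        # Skip ahead to the requested line without keeping earlier lines.
        line = next(islice(f, line_number - 1, line_number), None)
    if line is None:
        raise ValueError(f"{path} has fewer than {line_number} lines")
    return line, line[start:end]
```

The view would call this only for rows that scroll into sight. Note that for a giant minified file the single line is itself large, so a real app might prefer to read just a window around the match.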