maximum line length #129
Comments
A very good use case is searching in minified js/css. |
@kaushalmodi Yup. I'm pretty sure that was precisely the thing I hit. :-) |
I'd rather see a Linux-like approach to this. Pipe the file through something like 'tr' first, then pipe the output through 'cut', like this: cat file.js | sed 's/;/;\n/g' | rg whatever | cut -c1-100 (Edited to correct the command line to use sed.) |
@tjayrush What is the purpose of In any case, |
I meant for 'tr' to convert some character into new lines (I forgot to finish editing my comment) to alleviate the long line problem. If 'rg' can already limit to the context that's better. The larger point was that instead of adding a new command line option, pipe the output (or input) into another already-existing command line program. Thomas Jay Rush
|
I think my comment still stands. Also, rg searches recursively by default. In that case, it doesn't make I don't think you need to argue in favor of piping. I'm all for it. But it On Oct 25, 2016 2:56 PM, "Thomas Jay Rush" notifications@github.com wrote:
|
OK, in the interest of moving forward, here is a proposed specification. Add a new flag, I mentioned above it might be cool to only show the context around each match (within a line), but I think that's a complicated feature that deserves its own issue/motivation/specification. I think the above flag solves the most pressing problem. |
ag has |
Spec: Make the length of printed lines limitable with the conventional COLUMNS environment variable. Every output line should consist of contiguous input. The amount of context before the first match on an output line should equal the amount after the last. Print at least one match on every output line. |
I would prefer to not add more code that does this to ripgrep. There's already too much of it and it has been a terrible pain to maintain. It's sucked up unbelievable amounts of time. Let's just use
How does this interact with character encodings? You can't just arbitrarily cut a byte sequence, since you may wind up cutting a UTF-8 sequence in half, for example. While the input isn't necessarily valid UTF-8, our output better be valid UTF-8 if the input is also valid UTF-8.
How does this generalize to multiple matches on the same line? What about lines emitted from |
But what can go wrong? Explicit setting is meant for editor plugins.
This paragraph only applies to the case of the single one (on the output line). Lines from |
I spent my entire weekend dealing with tty stuff on Windows. I don't want to do it again. |
It's already been dealt with in mingw-w64, and isn't really a Rust/ripgrep issue, as they do obey the tty API. The tty fiddling (size/isatty) should be in a stable separate crate. |
@forgottenswitch ripgrep has to work with MSVC too. I don't really feel like debating this much further. I'd like to leave it out for now. We can revisit it later. |
All printing is done in UTF-8 characters. |
Consider the following line where
Its UTF-8 encoding looks like this:
Now consider a If I see, OK, you updated your comment. You want the |
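The encoding concern can be made concrete with a small sketch. It is only an illustration: the string "héllo" and the helper name `prefix_is_valid_utf8` are made up here and are not the example line from the comment above. It shows that cutting a valid UTF-8 line at an arbitrary byte offset can leave an incomplete multi-byte sequence.

```rust
/// Returns true if the first `n` bytes of `line` still form valid UTF-8.
/// `n` is clamped to the length of the line.
/// (Hypothetical helper for illustration, not part of ripgrep.)
fn prefix_is_valid_utf8(line: &str, n: usize) -> bool {
    let bytes = line.as_bytes();
    let end = n.min(bytes.len());
    // Cutting in the middle of a multi-byte sequence makes this fail.
    std::str::from_utf8(&bytes[..end]).is_ok()
}
```

In "héllo" the "é" occupies two bytes, so a cut after byte 2 splits it, while cuts after byte 1 or byte 3 fall on character boundaries.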
OK, will update the pull when done. |
Here are some thoughts that I would like you to consider:
|
Spec: Make the length of printed lines limitable with the conventional COLUMNS environment variable. Every output line should consist of contiguous input. The amount of context before the first match on an output line should equal the amount after the last. Print at least one match on every output line.
Lines from All printing is done in UTF-8 characters. |
Unclear. Implementing it should shed more light. |
This would be extremely useful. I've just searched a directory which happened to include a few I agree A few suggestions:
|
They don't show up in Also, |
COLUMNS works fine on linux:
Is anyone serious really using another OS and trying to work in the terminal? In 2017, I mean really? |
|
What OS is that?
|
It is linux, and |
Yes, I see the problem now: COLUMNS is available in the shell but not in a subprocess.
|
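A minimal sketch of what reading COLUMNS from a child process involves, assuming the usual situation where an interactive shell maintains COLUMNS as a shell variable without exporting it. The helper names `parse_columns` and `columns_from_env` are hypothetical. The point is that the lookup has to tolerate the variable being absent, empty or malformed.

```rust
use std::env;

/// Parse a COLUMNS-style value.
/// Returns None when the value is missing, empty, non-numeric or zero.
/// (Hypothetical helper for illustration, not ripgrep's code.)
fn parse_columns(value: Option<&str>) -> Option<usize> {
    value
        .and_then(|v| v.trim().parse::<usize>().ok())
        .filter(|&n| n > 0)
}

/// Read COLUMNS from this process's environment. In a subprocess this is
/// often None, because the shell did not export the variable.
fn columns_from_env() -> Option<usize> {
    parse_columns(env::var("COLUMNS").ok().as_deref())
}
```

Running a program that calls `columns_from_env()` from a shell where COLUMNS has not been exported would typically yield `None`, which matches the behavior described in the comment above.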
We absolutely, positively, cannot make it the default. We could make it the default when ripgrep is emitting to a tty (like we've done for colors, line numbers and match groupings). We can't make it the true default because ripgrep (like grep) is useful for filtering data as well, and a tool that automatically drops lines just because they are too long does not sound good to me. I am still wary about making it the default even for just emitting to a tty. I worry that dropping data is a bridge too far. However, we've already kind of crossed that bridge by filtering corpora using I would be fine with using Finally, when |
@forgottenswitch I would like to move away from "fitting" the lines. I think we should consider two options: simple truncation or complete hiding. |
I expected that as the simpler way, but isn't it the #129 (comment) with columns counted as graphemes? |
I would say truncation is preferable, and "columns" in the terminal (whatever that translates to) would be best, at least for me. |
@forgottenswitch I'm sorry, but I don't understand. This comment mentions "fitting" the lines and looking at matches. What I'm trying to say is this: if we can avoid writing a specification that needs to do something that is encoding aware, then this feature becomes much simpler. The @samuelcolvin "columns" in the terminal is a visual thing. A single column can contain an arbitrary number of bytes because of combining characters in Unicode. Checking if every line satisfies this limit is not feasible because of the performance hit it requires. |
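To illustrate why "columns" and bytes differ, here is a small sketch using only the standard library. The function name `byte_and_char_len` is made up for this example. Counting actual display columns would need grapheme and width data that the standard library does not provide, so this only contrasts bytes with Unicode scalar values.

```rust
/// Returns (byte length, number of Unicode scalar values) for `s`.
/// Neither number is the display width: "e" followed by a combining acute
/// accent (U+0301) is 3 bytes and 2 scalar values, yet it is normally
/// rendered in a single terminal column.
/// (Hypothetical helper for illustration.)
fn byte_and_char_len(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}
```

A byte limit is cheap because `s.len()` is constant time, whereas any character-aware count has to walk the whole line.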
The reason why I like having |
I understand why you don't want to apply it by default; that's fine with me. For length, I guess just bytes is fine. The primary use case is just to make viewing easier, where a line which shows 111 columns instead of 120 is no problem. I'm guessing 99% of ripgrep usage is on ASCII code anyway. |
@samuelcolvin Possibly. I do track mentions of ripgrep on the Internet, and there seems to be quite a few mentions of it in otherwise Chinese, Russian and Japanese publications. |
Backtrack to the beginning of the last character? Also, Fitting was to improve both readability and the pager's scrolling of overly long lines. Maybe ripgrep should just consider any file having a line longer than 256 bytes a binary. |
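The backtracking idea can be sketched as follows. This is an assumption about what was meant rather than code from any of the linked branches, and `truncate_to_char_boundary` is a hypothetical name. If the byte limit falls inside a multi-byte character, step back until the cut lands on a character boundary.

```rust
/// Truncate `line` to at most `max_bytes` bytes without splitting a
/// UTF-8 encoded character.
/// (Hypothetical helper for illustration, not ripgrep's code.)
fn truncate_to_char_boundary(line: &str, max_bytes: usize) -> &str {
    if line.len() <= max_bytes {
        return line;
    }
    let mut end = max_bytes;
    // Offset 0 is always a boundary, so this loop terminates.
    while !line.is_char_boundary(end) {
        end -= 1;
    }
    &line[..end]
}
```

Since a UTF-8 character is at most four bytes long, the loop steps back at most three times, so the result can be up to three bytes shorter than the limit.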
@forgottenswitch you seem determined to make this complicated. :-) Surely it's best to start with the simplest and most obvious solution and make it more complicated when required. UTF-8 counting would be nice but isn't required; anything else is surely excessive at this stage. |
OK, I tried addressing a different problem: printing matches even in huge lines, instead of omitting them. |
Context before a match could be ellipsized, and total number of matches on a line limited. |
I have to say I wouldn't find just dropping lines very useful; I still may want to know that these lines matched, but I also want to be able to see other matches. I hacked something quick-n-dirty together in https://github.com/RalfJung/ripgrep/tree/longlines, but I'm not proposing this as a solution to anything, I was just looking for an excuse to write some Rust code ;) The simplest thing I can come up with that I'd actually like to use is something where we can set a Of course, something that actually ellipsizes long lines and prints them as "... context match1 context ... context match2 context ..." would be better, but then as you discussed above things quickly become really complicated because unicode. |
@RalfJung I think I like that idea. It would have to be disabled by default though (and perhaps enabled by default when emitting to a tty). |
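One possible shape for such a policy, purely as a sketch: an explicit setting always wins, and a default limit applies only when output goes to a terminal. The function names and the value 150 are illustrative assumptions, not anything decided in this thread. `std::io::IsTerminal` is the standard library trait for the tty check.

```rust
use std::io::IsTerminal;

/// Decide which line-length limit to apply, if any.
/// - An explicit user setting always wins.
/// - Otherwise a default applies only when writing to a terminal.
/// - When output is piped, nothing is dropped.
/// (Hypothetical policy for illustration.)
fn effective_limit(
    explicit: Option<usize>,
    stdout_is_tty: bool,
    tty_default: usize,
) -> Option<usize> {
    match explicit {
        Some(n) => Some(n),
        None if stdout_is_tty => Some(tty_default),
        None => None,
    }
}

/// Apply the policy to the real stdout. The default of 150 is arbitrary.
fn limit_for_stdout(explicit: Option<usize>) -> Option<usize> {
    effective_limit(explicit, std::io::stdout().is_terminal(), 150)
}
```

Keeping the decision in a pure function like `effective_limit` makes the "never drop data when piping" rule easy to check without needing a real terminal.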
All right, I prototyped this at RalfJung@9618f87. This is just about the behavior if the option is set (to 80), obviously; what remains to be done is adding the CLI argument, deciding about the default, and wiring the printer to that. There are some open questions however already on this level:
EDIT: Slightly updated version at RalfJung@50c07fc |
I posted implementation above; it does not seem to be so. |
Wow, impressive! That does look nice indeed. I somehow missed these links before, sorry. |
I would like to avoid the contextual display. That should be a separate issue. I do not want to bring that code (if at all) until libripgrep is done. The code and the feature are too complex. @RalfJung I don't have rock solid answers for your questions, but here's an attempt:
|
So should it also still show the number of matches? That'd be extra work, unless there is a way I don't know about to get this from |
@RalfJung |
Right, I could use a closure or implement the trait for a custom type that also does the counting... but that will cost some performance, it seems, since we want to be called for every match and hence cannot use the |
This permits setting the maximum line width with respect to the number of bytes in a line. Omitted lines (whether part of a match, replacement or context) are replaced with a message stating that the line was elided. Fixes #129
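The behavior described in this commit message can be approximated by the sketch below. It is not ripgrep's implementation. The function name `render_line` and the placeholder text are assumptions for illustration, and the real message wording may differ.

```rust
/// Return the text to print for `line` under an optional byte limit.
/// Lines longer than the limit are replaced by a placeholder instead of
/// being truncated, so no encoding-aware cutting is needed.
/// (Hypothetical helper; the placeholder text is made up.)
fn render_line(line: &str, max_bytes: Option<usize>) -> String {
    match max_bytes {
        // `len()` counts bytes, matching the byte-based limit described above.
        Some(limit) if line.len() > limit => String::from("[omitted long line]"),
        _ => line.to_string(),
    }
}
```

Because the limit is compared against `line.len()`, the check costs nothing extra per line and sidesteps the UTF-8 truncation questions raised earlier in the thread.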
I'd like ripgrep to have the ability to either hide or trim lines that are very long. Some lines take up my entire screen and are borderline useless to look at. It's possible that finding an intelligent way to shorten them would be best, since my guess is that the actual matched text is much smaller than the full line. However, this is harder to implement.
I don't think this should be enabled by default. It seems a little surprising for ripgrep to hide lines like that. In general, I like the work flow of, "run a search, see huge lines, confirm that I don't care about them and run ripgrep again with an option to hide them." It may however be plausible to enable this limit if results are being dumped to a terminal (we already enable colors, line numbers and file headings).