This repository has been archived by the owner on Dec 15, 2022. It is now read-only.

Can't search across lines with .* regex #303

Open
izuzak opened this issue Oct 24, 2014 · 31 comments

@izuzak
Contributor

izuzak commented Oct 24, 2014

Originally reported by @cydrobolt over at atom/atom#3892


Neither regex nor normal find can search across lines. E.g.:


something="ex
ists"


<p>blablablabla
blablablabla</p>

For the second example, a regex of <p>.*</p> should have matched the text. However, it does not work, because it is spread across two lines.
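A minimal JavaScript sketch of the behaviour described above, assuming the search pattern ends up as an ordinary JavaScript `RegExp` (the variable names are only illustrative):

```javascript
// The second example from the report, with a line feed between the two lines.
const text = "<p>blablablabla\nblablablabla</p>";

// In a JavaScript regex without the `s` flag, `.` matches any character
// except line terminators, so `.*` cannot get past the end of the first line.
const dotPattern = /<p>.*<\/p>/;

// The same pattern succeeds once the line break is removed.
const singleLine = text.replace("\n", "");

console.log(dotPattern.test(text));       // false
console.log(dotPattern.test(singleLine)); // true
```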

@benogle benogle changed the title Can't search across lines Can't search across lines with .* regex Oct 24, 2014
@burabure

burabure commented Nov 5, 2014

If you try to do something like

<p>(.*|\n)*?</p>

on the current buffer, it actually crashes (at least if the content is more than a couple of lines long)

Ubuntu 14.04
atom 0.141.0

@redfellow

Ubuntu 14.04 crash still exists.

@harai
Contributor

harai commented Jul 7, 2015

It still crashes.

  • Ubuntu 15.04
  • Atom 1.0.0
  • find-and-replace 0.174.1

@dead-claudia

To be honest, I don't think that crash is easily fixable. Try running that regex through grep and see what happens. If it doesn't hang on a file of about 30 lines, then there is likely a very difficult perf bug in V8 or (highly unlikely) Atom's text editor. I would be surprised if that's the case, though, considering that is literally "any number of a group of the least number of characters consisting of either a newline or the largest group of non-line-ending characters you can get". That's a lot of work to do, and it's not the easiest to even statically compile that regex to infer that it's matching any set of characters that don't include any line-ending character other than a line feed. That means the regex doesn't match other line-breaking characters, i.e. carriage return, (the obscure line-breaking code points) U+2028 and U+2029, etc. Another thing is that even Sublime, etc. tend to choke a little on regular expressions.
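The point about other line-breaking characters can be checked with a small JavaScript sketch. It deliberately uses the non-nested form `(.|\n)*?` rather than the pattern reported to crash, and compares it with a class that matches any character; the sample strings and names are made up for illustration:

```javascript
const dotOrLineFeed = /<p>(.|\n)*?<\/p>/;  // `.` or a line feed, nothing else
const anyCharacter = /<p>[\s\S]*?<\/p>/;   // whitespace or non-whitespace, i.e. anything

// The same paragraph broken by three different line terminators.
const samples = {
  lf: "<p>one\ntwo</p>",
  crlf: "<p>one\r\ntwo</p>",
  lineSeparator: "<p>one\u2028two</p>",
};

const results = {};
for (const [name, text] of Object.entries(samples)) {
  results[name] = {
    dotOrLineFeed: dotOrLineFeed.test(text),
    anyCharacter: anyCharacter.test(text),
  };
}

// Only the LF sample matches the first pattern; all three match the second.
console.log(results);
```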

Regular expression engines are extremely slow to begin with, and V8's Irregexp engine is one of the few that isn't atrociously slow. (It's faster than most POSIX-based regex implementations, and it's faster than Perl's highly optimized, highly flexible one.)

I would say one way, probably the best way, to curb the crashes is to add a delay between the last character being typed and the regex finally being executed, even as little as 200 milliseconds. I couldn't tell you how many times I've had Atom crash in the middle of me typing out a regex, simply because the incomplete one happened to match a third of the code. The other thing is that most editors don't run regular expressions as they're typed; they run them via a dialog or similar. Atom is rather unique in having this problem.
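The delay suggested here is usually called debouncing. A rough JavaScript sketch follows. It is not find-and-replace's actual code: `debounce`, `runSearch` and `searchWhenIdle` are hypothetical names, and the injectable `timers` argument exists only so the sketch can be exercised without real waiting.

```javascript
// Returns a wrapper that postpones `fn` until `delayMs` have passed without
// another call. By default it uses the global timer functions.
function debounce(fn, delayMs, timers = {
  set: (callback, ms) => setTimeout(callback, ms),
  clear: (handle) => clearTimeout(handle),
}) {
  let handle = null;
  return (...args) => {
    // A new keystroke cancels the search scheduled by the previous one.
    if (handle !== null) timers.clear(handle);
    handle = timers.set(() => {
      handle = null;
      fn(...args);
    }, delayMs);
  };
}

// Hypothetical search routine standing in for the editor's real one.
function runSearch(source, text) {
  try {
    return new RegExp(source).test(text);
  } catch (error) {
    return false; // an incomplete pattern such as "<p>(" is not valid yet
  }
}

// Would be called on every keystroke; only the last call within 200 ms runs.
const searches = [];
const searchWhenIdle = debounce((source) => {
  searches.push(runSearch(source, "<p>hello</p>"));
}, 200);
```

With this shape, half-typed patterns are mostly never executed, because each keystroke replaces the pending search instead of adding another one.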

@acusti

acusti commented Jul 20, 2015

I don’t know if this is worth its own issue, but the general problem of not being able to do a multiline search without converting your search to a regular expression is a painful one. The need to use “replace in project” to modify every instance of a multiline chunk of code in a project comes up frequently enough that it would be great if the editor could handle it. There have been a few times when I just wanted to paste a code chunk into the find field and a different one into the replace field.

@dbolton

dbolton commented Jan 7, 2016

@burabure If you delete the asterisk inside the parentheses, you'll get the same matches without the crash (<p>(.|\n)*?</p>). Better yet is the following regular expression, which matches line breaks regardless of platform (e.g. carriage return or newline):

<p>[\s\S]*?</p>

\s matches white space including line breaks, and \S matches anything that is not a white space. Unlike some languages, JavaScript doesn't have a way to flag that you want dots to match a newline. So maybe Atom can replace dots with [\s\S] under the hood to match newlines.
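A rough JavaScript sketch of what "replace dots with [\s\S] under the hood" could look like. `expandDots` is a hypothetical helper, not anything Atom provides; it skips escaped characters and character classes, and does not attempt to cover every corner of the regex grammar.

```javascript
// Rewrites every unescaped `.` outside a character class to `[\s\S]`.
function expandDots(source) {
  let out = "";
  let inClass = false;
  for (let i = 0; i < source.length; i++) {
    const ch = source[i];
    if (ch === "\\") {
      // Copy an escape such as `\.` or `\n` through untouched.
      out += ch + (source[i + 1] || "");
      i++;
    } else if (inClass) {
      // Inside [...] a dot is already a literal dot; leave it alone.
      if (ch === "]") inClass = false;
      out += ch;
    } else if (ch === "[") {
      inClass = true;
      out += ch;
    } else if (ch === ".") {
      out += "[\\s\\S]";
    } else {
      out += ch;
    }
  }
  return out;
}

const rewritten = expandDots("<p>.*?</p>");
console.log(rewritten); // <p>[\s\S]*?</p>
console.log(new RegExp(rewritten).test("<p>blablablabla\nblablablabla</p>")); // true
```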

@guillochon

guillochon commented Jun 27, 2016

This bug is really bad. I typed (.*|\n)*? into my find field and it crashed Atom, but find still has that pattern entered, so it crashes every time I launch a new search now! How do I clear the search history?

Edit: Looks like it's working now after restarting Atom a few times, not sure what changed.

@menocomp

menocomp commented Jul 6, 2016

@dbolton <p>(.|\n)*?</p> only works in one file!!! I tried it in folders and did not work!

@dead-claudia

Should there be a multiline option? I think that's probably the best resolution, since there may be cases when you don't intend to match across lines.

@winstliu
Contributor

VSCode and Atom are two different projects, so both issues should remain open.

@sekmo

sekmo commented May 18, 2017

No news after three years? :-)

@steviesama

steviesama commented Aug 1, 2017

I don't know if it's a complete solution but I got something working. Pretty strange I thought, and I'm not sure about the limitations because it's hackity, but here's how I refactored a chunk of code I had in more places than I should have.

The first snip shows what I was matching: it always matches within the same window, but never across multiple files. However, if you hit enter with the pattern as I have it here, it will match all instances of the text.
[Screenshot: example]

Below is a snip of the search matching in all 40 places.
[Screenshot: example]

This is pretty strange. But I noticed that the first line is always fine. Then to get to the next as well as every line thereafter, you need to start doing a pattern, at least the way I'm doing it. Shown below:

\s*[text to match]*

\s* covers all the upcoming whitespace, though I should mention that I used (\s)* or (\s*) in mine, as I also wanted to match whatever indentation was present. Put your search text inside a character class, always terminate it with *, and your search will be found.

I found it strange that the character class worked, but I figured it had something to do with how it was finding them, so I tried * after each character on lines after the first... and that worked too. Snip below.

var style = \{*\s*w*i*d*t*h*
[Screenshot: example]

Well, I hope that was helpful to someone. I was able to use sub-grouping to change the follow-up matching and everything without a hitch.

@steviesama

steviesama commented Aug 1, 2017

@isiahmeadows As for the multiline option, since it doesn't let you search across multiple lines, I think with what I found above, that seems to basically make multiline an explicit option.

@dead-claudia

@steviesama Good point. Maybe better to add an option to, short-term, transform . to [^], and long-term, use /s (which is currently an ES proposal, but V8 has recently started shipping it by default).
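Both ideas can be tried directly in JavaScript. The sketch below feature-detects the `s` flag, since engines that predate it throw on unknown flags; `supportsDotAll` is an illustrative helper name.

```javascript
// Feature-detect the `s` (dotAll) flag.
function supportsDotAll() {
  try {
    new RegExp(".", "s");
    return true;
  } catch (error) {
    return false;
  }
}

const text = "<p>blablablabla\nblablablabla</p>";

// Short-term idea: `[^]` is an empty negated class, which JavaScript treats
// as "any character at all", line terminators included.
const withNegatedClass = /<p>[^]*<\/p>/.test(text);

// Long-term idea: keep `.` as written and add the `s` flag where available.
const withDotAll = supportsDotAll()
  ? new RegExp("<p>.*</p>", "s").test(text)
  : null; // flag unavailable in this engine

console.log({ withNegatedClass, withDotAll });
```

Note that `[^]` is specific to JavaScript's regex flavor; many other engines reject an empty negated class.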

@dead-claudia

And maybe make that option ". matches newlines" or something like that.

@ghost

ghost commented Jan 4, 2018

What's the status on this?

@winstliu
Contributor

winstliu commented Jan 4, 2018

I'm not aware of any attempts to fix this issue; however, we would be interested in reviewing PRs that address it without regressing performance. The current library we use for searching files is atom/scandal, where I believe files were intentionally broken up into chunks to improve search performance.
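To illustrate why chunking matters, here is a JavaScript sketch comparing a scan that sees one line at a time with a scan over the whole text. It only assumes that a chunk boundary can fall inside a would-be match; it is not atom/scandal's actual code, and both function names are hypothetical.

```javascript
// Illustration only: tests each line separately, as a chunked search might.
function findInLines(text, source) {
  const regex = new RegExp(source);
  const hits = [];
  text.split("\n").forEach((line, index) => {
    if (regex.test(line)) hits.push(index + 1);
  });
  return hits;
}

// Scans the whole text at once and reports the line where each match starts.
function findInWholeText(text, source) {
  const regex = new RegExp(source, "g");
  const hits = [];
  let match;
  while ((match = regex.exec(text)) !== null) {
    hits.push(text.slice(0, match.index).split("\n").length);
    if (match[0] === "") regex.lastIndex += 1; // avoid looping on empty matches
  }
  return hits;
}

const text = "<div>\n<p>blablablabla\nblablablabla</p>\n</div>";
const source = "<p>[\\s\\S]*?</p>";

console.log(findInLines(text, source));     // []
console.log(findInWholeText(text, source)); // [ 2 ]
```

The whole-text scan finds the match but has to hold the entire file in memory, which is the performance trade-off mentioned above.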

@dead-claudia

Found an issue there, but no PR.

@g3ar

g3ar commented Jul 19, 2018

Have the same issue. We need to have "Multiline" find option.

@artheus

artheus commented Jul 24, 2018

👍 I agree that this is something that is needed.

@jinglesthula

Although this has been painful enough for long enough, I think we're nearly out of the woods. The proposal went to Stage 4 seven months ago, and the kangax tables list it as an ES2018 feature http://kangax.github.io/compat-table/es2016plus/. I don't know the guts of Atom to know even what JS engine it's running or what ES features are supported, but I suspect we're either at the point (or will be very soon) where we could just have a button added on the Atom find UI to include the s flag.

@jinglesthula

Mmm.. yeah. We're all probably naively thinking "how hard could it be?", but the realities of performance and scaling when dealing with large files aren't trivial. I wonder if other editors' approaches could be looked at to see how they accomplish it. For now, remembering to use [^] or \s* may be the easiest workaround.

@dead-claudia

@jinglesthula This very issue has prompted me to start an ESDiscuss thread about what would be required to fix this.

But most certainly, the more intuitively simple something is conceptually, the more complex it really becomes behind the scenes to do correctly, ironically enough.

@ghost

ghost commented Apr 1, 2019

I know it has been a long time, but I've been trying to find a solution for this issue for a while. So, here are my 2¢:

<p>blablablabla
blablablabla</p>

Find: <p>(.*\n.+?)</p>
Replace: <p>New content:$1</p>

Result:

<p>New content:blablablabla
blablablabla</p>
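The same find and replace can be reproduced in plain JavaScript. Because the pattern contains exactly one `\n`, it covers a paragraph broken across exactly two lines; the three-line sample below is an added illustration, not part of the original report.

```javascript
const find = /<p>(.*\n.+?)<\/p>/;
const replacement = "<p>New content:$1</p>";

const twoLines = "<p>blablablabla\nblablablabla</p>";
const threeLines = "<p>one\ntwo\nthree</p>";

const twoLineResult = twoLines.replace(find, replacement);
const threeLineResult = threeLines.replace(find, replacement);

console.log(twoLineResult);
// <p>New content:blablablabla
// blablablabla</p>

// No match: neither `.*` nor `.+?` can cross the second line break,
// so the text comes back unchanged.
console.log(threeLineResult === threeLines); // true
```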

Screenshots

Before "Replace": [image]

After "Replace": [image]


Atom: 1.35.1 x64, macOS Mojave 10.14.4

Does it help?

@g3ar

g3ar commented Apr 1, 2019

No.

@ghost

ghost commented Apr 1, 2019

@g3ar Could you give more details, please?

@g3ar

g3ar commented Apr 1, 2019

I'm not using Atom right now. Your solution works for simple files. I have tried it on complicated sources and it fails. I think the problem is incorrect parsing of \n.

@ghost

ghost commented Apr 1, 2019

@g3ar I understand. I've tested it in a file (html+javascript+json) with 14,448 lines and it worked fine. However, I'm using Atom. I believe that different regex flavors require different regex structures.

I don't know if you already did it but, if not, you could try to identify which flavor/engine you're using and then try another solution.

Here's a list of them: https://en.wikipedia.org/wiki/Comparison_of_regular_expression_engines

Good luck and thank you for the details.

@DigitalLeaves

DigitalLeaves commented Apr 10, 2020

My two cents: it works for single files, but not across multiple files.

I have plenty of files with this code (sidebar, HTML static):

<li class="nav-item">
   <a class="nav-link" href="./employees.html">
      <i class="ni ni-badge text-primary"></i>
      <span class="nav-link-text" data-i18n="employees_and_salaries"></span>
   </a>
</li>

I want to add a new class (let's call it newclass) to the <li> element, but only when the link links to employees.html, so my regexp:
<li class="nav-item">([.|\n|\s|\t]*)<a class="nav-link" href="\.\/employees\.html">
And replacement:
<li class="nav-item newclass">$1<a class="nav-link" href="./employees.html">

This works in a single file (it finds the expression), but fails to find any match when I search across multiple files (Shift+Option+F).
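One possible stopgap is to apply the same replacement outside the editor. The JavaScript sketch below uses in-memory strings with made-up file names; a real script would read and write the files on disk. Inside a character class `.` and `|` are literal characters, so `[.|\n|\s|\t]*` is reduced to `\s*` here, which already covers the line break and indentation.

```javascript
// Hypothetical file contents keyed by file name.
const files = {
  "index.html": [
    '<li class="nav-item">',
    '   <a class="nav-link" href="./employees.html">',
    '      <span class="nav-link-text" data-i18n="employees_and_salaries"></span>',
    '   </a>',
    '</li>',
  ].join("\n"),
  "about.html": [
    '<li class="nav-item">',
    '   <a class="nav-link" href="./about.html">',
    '   </a>',
    '</li>',
  ].join("\n"),
};

// `(\s*)` captures the line break and indentation so `$1` can restore them.
const find = /<li class="nav-item">(\s*)<a class="nav-link" href="\.\/employees\.html">/g;
const replacement = '<li class="nav-item newclass">$1<a class="nav-link" href="./employees.html">';

const updated = {};
for (const [name, html] of Object.entries(files)) {
  updated[name] = html.replace(find, replacement);
}

console.log(updated["index.html"]);
```

Only the item linking to employees.html gains the new class; the other file comes back unchanged.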

@svennd

svennd commented Mar 13, 2022

The "find all" works fine in a single file, but multi-file search doesn't work. Is there a workaround available (other than opening hundreds of files to run this manually)?

I want to remove duplicate lines:

thumbnail:(.|\r?\n)*?thumbnail:(.*?)$

with 

thumbnail:$2
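For reference, the same replacement in plain JavaScript, as a sketch. It assumes `$` is meant as "end of line", hence the `m` flag, and the sample input is invented. Everything between the two `thumbnail:` keys is dropped, so this is only safe when the duplicate lines sit next to each other.

```javascript
const find = /thumbnail:(.|\r?\n)*?thumbnail:(.*?)$/gm;

// Invented sample: the same key appears on two consecutive lines.
const input = [
  "title: post",
  "thumbnail: old.png",
  "thumbnail: new.png",
  "date: 2022-03-13",
].join("\n");

// Keeps the value of the second `thumbnail:` line and drops the first.
const output = input.replace(find, "thumbnail:$2");

console.log(output);
// title: post
// thumbnail: new.png
// date: 2022-03-13
```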
