consider regex matching #7

Open
chrisosaurus opened this issue Jun 10, 2015 · 3 comments
Comments

@chrisosaurus
Owner

Dodo used (read: stole) the e// notation from the ed school of syntax. Most other users of this syntax support regex matching, and this was always a potential feature for dodo.

PCRE is probably too much but POSIX regex should be sufficient.

Regex search is probably too expensive, as in the failure case it would have to go through the whole file, but an anchored regex match (match from current position or die, similar to `expect`) could be quite useful.
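
A rough sketch of what "match from current position or die" might look like on top of POSIX regex. The helper name `match_here` is hypothetical (nothing in dodo is called this), and it assumes the bytes at the cursor have already been read into a NUL-terminated buffer, since `regexec` works on C strings. Anchoring is done here by wrapping the user's pattern as `^(pattern)`, which is just one way to do it.

```c
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of dodo.
 * Returns the length in bytes of the match if `pattern` matches starting
 * exactly at `text`, or -1 if it does not match there (or fails to compile). */
static long match_here(const char *pattern, const char *text)
{
    regex_t re;
    regmatch_t m[1];
    long len = -1;
    size_t n = strlen(pattern) + 4; /* "^(" + pattern + ")" + NUL */
    char *anchored = malloc(n);

    if (anchored == NULL) {
        return -1;
    }

    /* wrap in a group so that alternations like a|b are anchored as a whole */
    snprintf(anchored, n, "^(%s)", pattern);

    if (regcomp(&re, anchored, REG_EXTENDED) == 0) {
        if (regexec(&re, text, 1, m, 0) == 0) {
            len = (long)m[0].rm_eo; /* rm_so is 0 because of the leading ^ */
        }
        regfree(&re);
    }

    free(anchored);
    return len;
}
```

With this, `match_here("Jill", "Jack and Jill")` fails rather than finding the later `Jill`, which is the difference between a match and a search.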

A potential issue here is of course that any multiple-character match such as m/a.*z/ could be dangerous, especially if it is greedy by default.

We could make .* NOT match newline characters, but this only helps when files are newline delimited, and I think we need to consider the case of operating on large files that lack newlines (however, we could just push this issue onto the end user).
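
For what it is worth, POSIX regex already has a compile flag for this: with `REG_NEWLINE`, `.` and negated bracket expressions stop matching newline. A small sketch to compare the two behaviours follows; the helper name `match_len` is made up for illustration. Note that `REG_NEWLINE` also makes `^` match after every newline, so this sketch checks `rm_so == 0` instead of relying on a leading `^` to anchor.

```c
#include <regex.h>

/* Hypothetical helper, not part of dodo.
 * Compiles `pattern` as an ERE with any extra compile flags, and returns the
 * match length if the leftmost match starts at offset 0, otherwise -1. */
static long match_len(const char *pattern, const char *text, int extra_cflags)
{
    regex_t re;
    regmatch_t m[1];
    long len = -1;

    if (regcomp(&re, pattern, REG_EXTENDED | extra_cflags) != 0) {
        return -1;
    }

    if (regexec(&re, text, 1, m, 0) == 0 && m[0].rm_so == 0) {
        len = (long)m[0].rm_eo;
    }

    regfree(&re);
    return len;
}
```

On the text `"abc\nxyz"`, `match_len("a.*z", text, 0)` runs across the newline to the final `z`, while `match_len("a.*z", text, REG_NEWLINE)` fails because `.*` stops at the end of the first line. As noted above, this does nothing for a large file with no newlines in it.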

@phillid
Contributor

phillid commented Jun 16, 2015

If we use

m/Jack and.*Jill/
w/bar/

on the text "Jack and his close childhood friend, Jill", what is the expected output?

Since this is an in-place editor, I'd personally expect "bark and his close childhood friend, Jill" -- is this thinking correct?

Apart from that, matches like m/J... and J... had [0-9] pails/ would be easy/safe enough to implement.
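
To make the expected behaviour concrete, here is a minimal in-memory sketch of the two commands. Everything in it is hypothetical: `struct session`, `cmd_match` and `cmd_write` are illustrative names rather than dodo's real internals, the text lives in a plain NUL-terminated buffer instead of a file, and it assumes w/ leaves the cursor where it was.

```c
#include <regex.h>
#include <string.h>

/* Hypothetical in-memory stand-in for a dodo session. */
struct session {
    char *buf;     /* NUL-terminated text being edited */
    size_t cursor; /* current byte offset into buf */
};

/* m/pattern/ : succeed only if pattern matches at the cursor.
 * The cursor is not moved. Returns 1 on match, 0 otherwise. */
static int cmd_match(const struct session *s, const char *pattern)
{
    regex_t re;
    regmatch_t m[1];
    int ok = 0;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0) {
        return 0;
    }
    if (regexec(&re, s->buf + s->cursor, 1, m, 0) == 0 && m[0].rm_so == 0) {
        ok = 1;
    }
    regfree(&re);
    return ok;
}

/* w/text/ : overwrite bytes in place starting at the cursor.
 * Returns 1 on success, 0 if the write would run past the end of buf
 * (this sketch never grows the buffer). */
static int cmd_write(struct session *s, const char *text)
{
    size_t n = strlen(text);

    if (s->cursor + n > strlen(s->buf)) {
        return 0;
    }
    memcpy(s->buf + s->cursor, text, n);
    return 1;
}
```

Running `cmd_match` with `Jack and.*Jill` and then `cmd_write` with `bar` on that sentence overwrites the first three bytes only, giving "bark and his close childhood friend, Jill".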

@chrisosaurus
Owner Author

@phillid I think your interpretation is correct regarding "bark and his close childhood friend, Jill", thanks for the great concrete examples.

For me the important part is to never have a regex search (as I don't want to be performing a text search across a 14G file), only ever a 'match' which is anchored to the current location.


To spitball / scope creep a little:

There is another interesting case: if we have the sentence
"Jack and his close childhood friend, Jill" and we want to replace Jill ONLY when it is preceded by text matching a pattern, then

m/Jack and.*Jill/

is insufficient as it doesn't move the cursor, and there is no current way to know the byte offset from the cursor to the start of Jill (as we lack a search, on purpose).

It might be worth later considering a syntax for specifying where within the regex the cursor should be placed after the match, but it would require careful thought around the notation used (so we could strip it out before regex matching).
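
One option that needs no new notation (and nothing to strip out) might be to reuse a capture group as the cursor marker, since `regexec` already reports the byte offset of each group. This is only a sketch of that idea, not a proposal for the final syntax; the helper name `cursor_after_match` and the convention "cursor goes to the start of group 1" are both assumptions made for illustration.

```c
#include <regex.h>

/* Hypothetical helper, not part of dodo.
 * If `pattern` matches at the start of `text` and its first capture group
 * took part in the match, returns the byte offset (relative to `text`) of
 * the start of that group. Returns -1 otherwise. */
static long cursor_after_match(const char *pattern, const char *text)
{
    regex_t re;
    regmatch_t m[2]; /* m[0] is the whole match, m[1] is the first group */
    long off = -1;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0) {
        return -1;
    }

    if (regexec(&re, text, 2, m, 0) == 0
        && m[0].rm_so == 0     /* still anchored to the current position */
        && m[1].rm_so != -1) { /* group 1 exists and participated */
        off = (long)m[1].rm_so;
    }

    regfree(&re);
    return off;
}
```

With the pattern written as `Jack and.*(Jill)`, the returned offset is the distance from the cursor to the start of `Jill`, which is exactly the number that is otherwise unavailable without a search.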


But that is for later, for now the focus should be getting a concrete implementation of the basic m/pattern/ system

For now it should be sufficient to mock up a wrapper around the POSIX functions regcomp and regexec; later on we could consider migrating to re2 (https://github.com/google/re2/).

Thanks @phillid for the great work, I should have some time this weekend to start on this but you are also welcome to dive in first.

@chrisosaurus
Owner Author

'This weekend', he said 26 days ago. Sorry, I have been delayed by other things and won't be able to get around to this for a little while yet.
