Consider syntax erroring if the same character is repeated in the character class #287

Closed
matklad opened this issue Sep 27, 2016 · 3 comments

@matklad
Member

matklad commented Sep 27, 2016

Hi! Today I spent some time debugging a regex like [a:digit:], which I wrongly assumed would work like [a[:digit:]]. For me, it would be nice if the regex failed to compile, because the character : is repeated :)

@BurntSushi
Member

I'm not totally sure we want to do this. It seems like a good idea at first, but what if you used two character classes that partially overlapped? e.g., [\p{Lu}\w] or something (although I admit that's a little strange).

Also, this would be a breaking change. While 1.0 is looming, I'm not sure it's worth it.

@matklad
Member Author

matklad commented Sep 27, 2016

> but what if you used two character classes that partially overlapped?

I was thinking about a simple rule for literal characters only, and maybe for exactly duplicated character classes as well. I can't imagine a realistic situation where such an obvious repetition is intentional, but I think they can arise by accident.
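To make the "literal characters only" rule concrete, here is a minimal sketch of what such a check might look like. It is an illustration under stated assumptions, not how the regex crate parses classes: `repeated_literals` is a hypothetical helper, it takes only the text between the outer brackets, it ignores `-` rather than expanding ranges, and it skips escapes and nested `[:name:]` classes instead of interpreting them.

```rust
use std::collections::HashSet;

/// Hypothetical helper (not part of the regex crate): returns the literal
/// characters that occur more than once in the body of a bracketed class,
/// in the order their first repetition is seen.
fn repeated_literals(class_body: &str) -> Vec<char> {
    let chars: Vec<char> = class_body.chars().collect();
    let mut seen: HashSet<char> = HashSet::new();
    let mut repeated: Vec<char> = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if c == '\\' {
            // Skip the escape; for `\p{..}` / `\P{..}` also skip the braces.
            let is_prop = matches!(chars.get(i + 1), Some(&'p') | Some(&'P'))
                && chars.get(i + 2) == Some(&'{');
            i = match (is_prop, chars[i..].iter().position(|&x| x == '}')) {
                (true, Some(off)) => i + off + 1,
                _ => i + 2,
            };
            continue;
        }
        if c == '[' && chars.get(i + 1) == Some(&':') {
            // Skip a nested `[:name:]` class when its closing `:]` is present.
            if let Some(off) = chars[i + 2..].windows(2).position(|w| w == [':', ']']) {
                i += 2 + off + 2;
                continue;
            }
        }
        // Simplification: `-` is treated as a range operator and ignored,
        // and range endpoints are compared as plain literals.
        if c != '-' && !seen.insert(c) && !repeated.contains(&c) {
            repeated.push(c);
        }
        i += 1;
    }
    repeated
}
```

Under these assumptions, `repeated_literals("a:digit:")` reports both `i` and `:` as repeated, while `repeated_literals("a[:digit:]")` reports nothing, because the nested class is skipped rather than read as literals.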

I don't know what the best solution is here: I'd make it a syntax error, but I don't have much expertise :) Another option would be to add a lint for this, but it wouldn't be as effective (and wouldn't work for user-provided regular expressions).

> Also, this would be a breaking change. While 1.0 is looming, I'm not sure it's worth it.

Imo, it's better to fix rough edges before 1.0 than to leave them as they are. But again, I can't say for sure whether this is a rough enough edge.

@BurntSushi
Member

I thought about this. Here's why I think we shouldn't do it:

  1. I don't think I know of any other regex engine that reports an error for this. While that alone isn't enough of a reason to forgo reporting an error, some folks might find it surprising. I also worry that there's a corner case we aren't considering. People like to do really funny things with their regexes.
  2. I think the "only error if there's a repeated literal" is a bit strange, and the fact that it would have helped your initial problem seems incidental.

While I do kind of agree with your arguments, I just feel like there isn't enough to break rank with everyone else.
