Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ScalaJS regex and regression tests #2912

Merged
merged 1 commit into from
Jun 18, 2024
Merged

Fix ScalaJS regex and regression tests #2912

merged 1 commit into from
Jun 18, 2024

Conversation

pablf
Copy link
Member

@pablf pablf commented Jun 15, 2024

/claim #2707

@987Nabil
Copy link
Contributor

@pablf This solution changes what is checked by the regex. To avoid this we discussed in the issue to use an alternative to regex, like rich text codec from zio-http. As @jdegoes added $250 bounty I expect that is what he was aiming for as well.

@pablf
Copy link
Member Author

pablf commented Jun 15, 2024

@987Nabil Would you mind expanding what is different between those two regex? I really don't mind using the RichTextCodec approach, but before the PR I spent some time trying to understand why this approach was too naive to work but wasn't able to see it 😅.

My reasoning is that matching a regex always applies to the full string, unless explicitly unanchoring it using .unanchored. In this case, in which there is only a match (find methods could match only part of the expression but is not the case), adding ^ and $ at the start or the end of the pattern is redundant as it's already checked by the pattern match unless using unanchored on the regex. Also, the fact that s"""^(?i)([a-z0-9]+)(?:/([a-z0-9.]+))(.*)$$""" can be changed to "^(?i)([a-z0-9]+)(?:/([a-z0-9.]+))(.*)$" without changing the regex and the string interpolation is not really needed, made me think of some possible bug.

Please correct me if I'm wrong, but I don't really see what is failing here 😅. Also, if this modified regex were wrong, I don't see why a really equivalent regex should automatically be discarded. I didn't see this discussed in #2707, so maybe there is also some benefit to RichTextCodec that I don't know.

@987Nabil
Copy link
Contributor

@pablf actually I think you are right. There is nothing that makes it match on a sub string like surrounding .* . Maybe it is okay like that.
I am not 100% sure, but I could imagine that rich text codec could be more perfomant if done right. But I am not sure as well. Maybe the solution is okay.

@pablf
Copy link
Member Author

pablf commented Jun 15, 2024

@987Nabil I've tried to compare both methods: https://scastie.scala-lang.org/KKXGXYeWSTednuDTCXiCfQ. It's a bit messy, but hope it's helpful.

Not too different, although Regex seems 2x faster when matching and RichTextCodec sometimes fails faster (in JVM). The relevant comparison for this particular case would be Comparison 5. Also, there might be better designed RichTextCodec that would perform better than the ones I made...

@jdegoes jdegoes merged commit 5a978cb into zio:main Jun 18, 2024
62 checks passed
@pablf pablf deleted the js-regex branch June 18, 2024 18:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants