-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow \n literals in strings #21
Conversation
- Results: Original mechanism: Score: 100.00 Proposed in the PR: Score: 52.75, 1.90x slower than original Regex with neg.lookbehind: Score: 91.62, 1.09x slower than original
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, thank you for preparing another PR. Although an interesting and sensible proposal, I think this indeed is a wrong approach. Handling these special/escape characters should be done as early as possible (as it is done now in the StringValue::__construct
method), because - as you yourself already pointed out - there are (and probably will be more) many places one would have to track and handle separately. (Which could potentially lead to otherwise unnecessary bugs, if not used with caution.)
I ran a few benchmarks this morning and found out that using preg_replace
with negative lookbehind could to the same job as your original proposal.
php-cb benchmark here 3cd50e3 with these results:
1. Score: 100.00, 9.5176 ms
foreach ($strs as $s) {
$res = originalFn($s);
}
3. Score: 91.62, 1.09x slower, 10.3871 ms
foreach ($strs as $s) {
$res = regexverFn($s);
}
2. Score: 52.75, 1.90x slower, 18.0424 ms
foreach ($strs as $s) {
$res = proposedFn($s);
}
The original mechanism being still the fastest, mechanism proposed in this PR is almost 1.90x slower. The negative lookbehind is slower only 1.09x - while yielding the same resulting string (if I'm seeing correctly).
I'm leaning towards supporting this kind of regex solution - while keeping the expanding itself in constructor.
Totally different approach would be having some sort of raw-string literals, as eg. Python does. Quite bad I already dedicated the most suitable literal in form of `r"..."; to regex literals.
That could be changed (for example re"..."
for regexes and raw strings could be r"..."
) as Primi is still in 0.3
very-pre-major version, but such decision is not something I'm willing to make on the spot.
Thanks for the response. I agree that the regex is much preferable (I just wanted a quick PoC without fiddling with regex.) Edit: I had to add an extra replacement to make my tests pass:
When e.g. concatenating strings (
I think this is also a good solution, but maybe confusing for new users? Depending on whether you want to introduce more escape codes, it might be overkill just for having the possibility to use a newline literal. |
Oh, you're absolutely right! 😄 It is me, who missed something here... Edit: Although now I'm not sure if we're talking about the same thing. What I was talking about is that my regex-based solution changed the original behaviour from this: >>> "\\n"
"\
" to this: >>> "\\n"
"\\n" ... which is clearly wrong. Your new regex solution does this: >>> "\\n"
"\n" ... which looks ok. The |
I've given it some thought and - also inspired by how Python handles this - I realized that processing escape sequences doesn't belong even inside the
I think my hunch still stands. What's currently wrong is the point where Primi does this - it really is too late. Escape sequences should be processed before the Therefore I think the best course of action is to move string escape-sequence logic into primi/src/handlers/StringLiteral.php Lines 18 to 20 in 3b84734
The solution I have in mind would - as a result - contain some backwards incompatible stuff. For example current: >>> a = "\n\a\\n" + "\n"
"
\a\
" (string f6b2df) would in the new version ideally result in error: >>> a = "\n\a\\n" + "\n"
ERR: Unrecognized string escape sequence '\a' The implementation I'm trying also does these ( >>> a = "hello"
Literally:
~hello~
Primi representation: "hello" (string 155d72)
>>> a = "\nhello"
Literally:
~
hello~
Primi representation: "\nhello" (string 42bdf9)
>>> a = "\\nhello"
Literally:
~\nhello~
Primi representation: "\\nhello" (string f078a3)
>>> a = "\\nhello\"
ERR: Syntax error @ line 1, position 0
>>> a = "\\nhello\\"
Literally:
~\nhello\~
Primi representation: "\\nhello\\" (string 28fe52)
>>> a = "\\nhello\\\n"
Literally:
~\nhello\\n~
Primi representation: "\\nhello\\\\n" (string 541c2c)
>>> a = "\\nhello\\\"
ERR: Syntax error @ line 1, position 0
So I was trying different approaches and because of that I already may have some working version of this. But since this is your suggestion and PR, I would gladly let you to implement some similar universal mechanism, if you want. Any thoughts?
Edit: Oh, and I forgot. It also does this: >>> a = "\n" + "\n"
Literally:
~
~
Primi representation: "\n\n" (string 96d650)
>>> a = "\n" + "\\" + "\n"
Literally:
~
\
~
Primi representation: "\n\\\n" (string affe7b)
>>> a = "\n" + "\\n" + "\n"
Literally:
~
\n
~
Primi representation: "\n\\n\n" (string 892c3a) |
- Now these escape sequences are supported: \\, \", \', \n, \t - Previously "\"" would throw a syntax error. Now it is handled properly. - A literal "\n" (or others) can now be entered by double back-slashing the escape sequence like "\\n", which was previously not supported. - Unknown escape sequences, such as '\a' will now throw an error. - This is a BC break! - Solution for PR smuuf#21. - Handling escape sequences is now done outside of StringValue. It is merely a way of how a string literal is interpreted. This means that "\" + "n" will NOT result in a "\n" sequence and thus in a literal newline. This is a BC break, as far as I know. - New StringEscaping helper with methods: escapeString, unescapeString - Modified and added tests for this BC breaking behavior.
- Now these escape sequences are supported: \\, \", \', \n, \t - Previously "\"" would throw a syntax error. Now it is handled properly. - A literal "\n" (or others) can now be entered by double back-slashing the escape sequence like "\\n", which was previously not supported. - Unknown escape sequences, such as '\a' will now throw an error. - This is a BC break! - Solution for PR smuuf#21. - Handling escape sequences is now done outside of StringValue. It is merely a way of how a string literal is interpreted. This means that "\" + "n" will NOT result in a "\n" sequence and thus in a literal newline. This is a BC break, as far as I know. - New StringEscaping helper with methods: escapeString, unescapeString - Modified and added tests for this BC breaking behavior.
42c33ba
to
0664605
Compare
Care to look at this, if it correctly resolves the original idea, before I merge it? @fredsted |
Thanks for this, I'll take a look in a couple hours. :) |
For me it works nicely, not much else to say, I like your solution! |
Ok, thanks :) I'll merge it into master then. |
Was looking for a way to make a string in Primi with a literal "\n". Since it was converting \n on string creation, some data is lost, so I've tried to keep the internal state as-is, but return the right state in the getInternalValue and getStringValue methods. However, it seems a lot of places is accessing the
→value
prop of StringValue directly.Perhaps there's a better way? Mainly looking for feedback on better workarounds for this... I realize it sucks :)