Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle string literals within macro arguments #685

Merged
merged 7 commits into from
Feb 17, 2021

Conversation

Rangi42
Copy link
Contributor

@Rangi42 Rangi42 commented Jan 3, 2021

Fixes #683, fixes #691

The lexer's raw mode for reading macro args already attempted to handle semicolons inside string literals, versus outside ones which start comments. This change reuses the same function for reading string literals in normal and raw modes, also handling:

  • Commas in strings versus between macro args
  • Character escapes
  • {Interpolations} and \1-\9 args inside vs. outside strings
  • Multi-line string literals

All test case outputs remain identical except for passing """ as a macro argument, which now starts a multi-line string literal
instead of terminating the passed argument at the carriage return.

Related discussion in #668

@Rangi42
Copy link
Contributor Author

Rangi42 commented Jan 3, 2021

I'm confident that this is a coherent change to the lexer, because the only test case that changed is the one that I expected to change: passing unclosed string literals to a macro. As expected, passing a single quote gives an unclosed string literal warning, and passing three quotes begins a multi-line string literal.

However, this still needs test cases of its own to more thoroughly test the changed behavior, as well as review to verify that it's even a desirable change. (Depending on #683 discussion, some other solution might be taken.)

@Rangi42
Copy link
Contributor Author

Rangi42 commented Jan 3, 2021

It might be worth factoring this repeated pattern into a macro:

if (i < sizeof(yylval.tzString))
	yylval.tzString[i++] = c; // or sometimes a literal character

For example:

#define append_yyval_tzString(i, c) if (i < sizeof(yylval.tzString)) yylval.tzString[i++] = (c);

@Rangi42
Copy link
Contributor Author

Rangi42 commented Jan 3, 2021

This now has test cases that demonstrate more macro arg lexing.

@Rangi42
Copy link
Contributor Author

Rangi42 commented Jan 5, 2021

Alright, I think this now correctly implements a consistent model for passing macro args and expanding strings.

  • Typing S is like typing the contents of S, if EQUS get expanded in that context. So if tiles EQUS "* $10", then ld a, 5 tiles is like ld a, 5 * $10; but println "5 tiles" and mac 5 tiles don't expand tiles, since being inside string literals or macro args doesn't expand EQUS.
  • Typing {S} is like typing the contents of S, escaping them if it's inside a string literal. So if S EQUS "\"hi\"", then:
    • println {S} is like println "hi"
    • mac {S} is like mac "hi"
    • println "{S}" is like println "\"hi\""
    • mac "{S}" is like mac "\"hi\""
  • Typing \1 is like typing the value of \1, escaping it if it's inside a string literal. So inside of mac "bye", then:
    • println \1 is like println "bye"
    • println "\1" is like println "\"bye\""
    • submac \1 is like submac "bye"
    • submac "\1" is like submac "\"bye\""
  • Inside of "quoted" string """literals""", there's no EQUS expansion, ; doesn't start a comment, and , doesn't start a new macro arg; it only cares about \ to start a character escape or line continuation, and " (or """) to end the string. That's true whether or not the string literal is being passed as a macro arg.
  • Backslash escapes exist for escaping characters that would otherwise have a special meaning, whether inside macro args or string literals. Since , and ; are special inside macro args (unlike inside strings), \, and \; are supported there. Thus:
    • mac ATAN2(3.0\, 4.0) passes \1 as ATAN2(3.0, 4.0)
    • mac a\\nb\nc passes \1 as a\nb<newline>c; so inside mac:
      • println \1 is like println a\nb<newline>c, a syntax error
      • println "\1" is like println "a\\nb\nc"
  • Illegal escapes print an error and expand to their escaped character, like C and Python. So:
    • mac a\q\,b complains about \q and passes \1 as aq,b
    • println "a\q\,b" complains about \q and \, and is like println "aq,b"
    • mac "a\q\,b" complains about \q and \, and is like mac "aq,b" (although \, gives a deprecation warning, not an error)

(As a side effect, this enables the quine2.asm example that I had in mind earlier, but that wasn't the goal: it's just what I had expected from my mental model of how these rgbasm features "ought to" work. Basically this PR fixes the inconsistencies that were preventing that, like if \1 was "a\"b", then println "\1" was printing "a"b".)

@Rangi42 Rangi42 marked this pull request as ready for review January 5, 2021 16:49
@Rangi42
Copy link
Contributor Author

Rangi42 commented Jan 5, 2021

Regarding compatibility with existing code: the tests verify this against pokecrystal, pokered, and ucity, and Polished Crystal builds correctly with three deprecation warnings about \, in string literals in rawchar macro args. Edit: Prism also builds without any warnings or broken code (except a couple of bytes' difference due to corrected fixed-point accuracy in master).

Edit: I still think it might be cleaner if appendStringLiteral(keepLiteral=true/false) were refactored into two separate functions, one for each case. But it might not.

@Rangi42
Copy link
Contributor Author

Rangi42 commented Jan 6, 2021

Change summary:

  • Move the translation of escape characters from appendMacroArg to yylex_RAW (e.g. for mac a\nb)
  • Macro args allow escaping \; in addition to \,
  • String literals in macro args are fully handled, not just special-cased to handle ";" (even 0.4.2 allowed mac "; not comment" ; comment)
  • "\1" and "{S}" were supposed to expand to string representations of \1 and S; this fixes that for escaped characters
  • Illegal string escapes like "\x" expand to "x", not "\" (which they already did in macro args, e.g. mac a\xb being mac axb)

I see this as essentially a bugfix PR for other post-0.4.2 changes:

  • {interpolation} outside string literals
  • "\1" and "{S}" should expand to string representations of \1 and S (e.g. not further expanding EQUS identifiers in their contents)
  • Introduction of """multi-line strings"""

@Rangi42 Rangi42 force-pushed the macro-arg-strings branch 3 times, most recently from fbbf148 to e82670f Compare January 9, 2021 20:05
@Rangi42
Copy link
Contributor Author

Rangi42 commented Jan 9, 2021

Another demo test case:

casetest: MACRO
	println "Original: \1"
	\1
_lwr\@ EQUS STRLWR("\1")
	println "Lowercase: {_lwr\@}"
	_lwr\@
_upr\@ EQUS STRUPR("\1")
	println "Uppercase: {_upr\@}"
	_upr\@
PURGE _lwr\@, _upr\@
	println
ENDM

hi equs "Hello"
	casetest println "{hi}, \"world\"!" ; comment
	casetest println "Today is "\, {__DATE__}\, "." ; builtin acts like __DATE__ EQUS "\"09 January 2021\""
	casetest println "This is line {d:__LINE__}." \; arg comment ; line comment
Original: println "Hello, \"world\"!"
Hello, "world"!
Lowercase: println "hello, \"world\"!"
hello, "world"!
Uppercase: PRINTLN "HELLO, \"WORLD\"!"
HELLO, "WORLD"!

Original: println "Today is ", "09 January 2021", "."
Today is 09 January 2021.
Lowercase: println "today is ", "09 january 2021", "."
today is 09 january 2021.
Uppercase: PRINTLN "TODAY IS ", "09 JANUARY 2021", "."
TODAY IS 09 JANUARY 2021.

Original: println "This is line 16." ; arg comment
This is line 17.
Lowercase: println "this is line 16." ; arg comment
this is line 17.
Uppercase: PRINTLN "THIS IS LINE 16." ; ARG COMMENT
THIS IS LINE 17.

@aaaaaa123456789
Copy link
Member

Alright, I think this now correctly implements a consistent model for passing macro args and expanding strings.

This all seems very reasonable, except for one part, that is actually not related to quoting:

mac ATAN2(3.0\, 4.0) passes \1 as ATAN2(3.0, 4.0)

Considering user-defined functions are coming eventually, this is really unintuitive. Commas inside function call parentheses (or if this is too complicated, inside any parentheses) should not delimit macro arguments. (Consider how the C preprocessor does the same thing, for instance.)

@ISSOtm ISSOtm added this to the v0.4.3 milestone Jan 21, 2021
@Rangi42 Rangi42 force-pushed the macro-arg-strings branch 7 times, most recently from ce5543d to bbfa417 Compare January 23, 2021 19:25
@ISSOtm
Copy link
Member

ISSOtm commented Feb 10, 2021

Alright, I understand that this tries to make the expansion / escaping etc. model feel more consistent. (I have yet to look at the code.)

Only change I'm not sure about, is \; in macros. I don't think that a macro arg should contain a semicolon, as arguments expanding to comments would probably cause a lot of breakage.

I'll now start reviewing the actual code with what I've read in mind.

Copy link
Member

@ISSOtm ISSOtm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally, this might see the light of the day..!
"It's been 3,000 years" meme

test/asm/multi-line-strings.asm Outdated Show resolved Hide resolved
src/asm/lexer.c Show resolved Hide resolved
src/asm/lexer.c Outdated Show resolved Hide resolved
@Rangi42 Rangi42 force-pushed the macro-arg-strings branch 2 times, most recently from 69f214a to 0fad2d9 Compare February 11, 2021 15:49
@Rangi42 Rangi42 requested a review from ISSOtm February 11, 2021 15:50
@Rangi42 Rangi42 force-pushed the macro-arg-strings branch 3 times, most recently from 80afc18 to ba3f700 Compare February 12, 2021 20:17
Fixes gbdev#683

The lexer's raw mode for reading macro args already attempted
to handle semicolons inside string literals, versus outside ones
which start comments. This change reuses the same function for
reading string literals in normal and raw modes, also handling:

- Commas in strings versus between macro args
- Character escapes
- {Interpolations} and \1-\9 args inside vs. outside strings
- Multi-line string literals

All test case outputs remain identical except for passing `"""`
as a macro argument, which now starts a multi-line string literal
instead of terminating the passed argument at the carriage return.
In macro args, '\,' does not start a new argument,
but passes a literal ',' instead. This adds more escapes:

- '\\' does not start a line continuation
- '\"' does not start a string literal

This makes '\\' in a macro arg pass one backslash,
not two, which affects a test result.
within macro args, string literals, and normal context

- "{S}" should always equal the contents of S
- "\1" should always act like quoting the value of \1

Fixes gbdev#691
`appendIfLiteral` appends a character to yylval.tzString
if keepLiteral is true and it's not too long already.
Normal mode uses readString, and plain copy-til-'\0' loops
for appending interpolations or macro arg values.

Raw mode uses appendStringLiteral, and appendEscapedSubstring
for appending interpolations or macro arg values.
@ISSOtm
Copy link
Member

ISSOtm commented Feb 17, 2021

Since \\, has been deprecated, shouldn't rgbasm(5) be updated, given that it mentions this?

Otherwise, I think this is good, I'll let you handle the rebasing or squashing.

@Rangi42
Copy link
Contributor Author

Rangi42 commented Feb 17, 2021

Since \\, has been deprecated, shouldn't rgbasm(5) be updated, given that it mentions this?

Otherwise, I think this is good, I'll let you handle the rebasing or squashing.

You're right; corrected.

I'll go ahead and squash these after CI; they close a few issues + implement a few changes, but those are pretty closely tied together.

@Rangi42 Rangi42 merged commit d049ffc into gbdev:master Feb 17, 2021
@Rangi42 Rangi42 deleted the macro-arg-strings branch February 17, 2021 03:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants