code points above Latin1 are not recognized as white space #665

Closed
gibson042 opened this issue Jul 14, 2021 · 2 comments
Labels: confirmed (issue reported has been reproduced); fixed - please verify (issue has been fixed, please verify and close)


@gibson042

gibson042 commented Jul 14, 2021

Environment: XS 10.5.0

Description
In the ECMAScript lexical grammar, WhiteSpace comprises U+0009 CHARACTER TABULATION, U+000B LINE TABULATION, U+000C FORM FEED, U+FEFF ZERO WIDTH NO-BREAK SPACE (BOM), and any code point in the Unicode general category “Space_Separator” (which includes U+0020 SPACE and U+00A0 NO-BREAK SPACE). LineTerminator is any of U+000A LINE FEED, U+000D CARRIAGE RETURN, U+2028 LINE SEPARATOR, and U+2029 PARAGRAPH SEPARATOR. Both are required to be insignificant between expression tokens, but XS does not seem to recognize WhiteSpace code points above the Latin-1 Supplement block.
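
For reference, the classification described above can be sketched as a pair of predicates. This is only an illustration of the grammar, not XS's lexer: the names `SPACE_SEPARATORS`, `isWhiteSpace`, and `isLineTerminator` are hypothetical, and the Space_Separator table is assumed to match recent Unicode versions.

```javascript
// Code points with Unicode general category "Space_Separator" (Zs).
// Assumption: this list reflects recent Unicode versions.
const SPACE_SEPARATORS = new Set([
  0x0020, 0x00A0, 0x1680,
  0x2000, 0x2001, 0x2002, 0x2003, 0x2004, 0x2005,
  0x2006, 0x2007, 0x2008, 0x2009, 0x200A,
  0x202F, 0x205F, 0x3000,
]);

// WhiteSpace: tab, vertical tab, form feed, BOM, or any Space_Separator.
function isWhiteSpace(cp) {
  return cp === 0x0009 || cp === 0x000B || cp === 0x000C ||
    cp === 0xFEFF || SPACE_SEPARATORS.has(cp);
}

// LineTerminator: LF, CR, LINE SEPARATOR, or PARAGRAPH SEPARATOR.
function isLineTerminator(cp) {
  return cp === 0x000A || cp === 0x000D || cp === 0x2028 || cp === 0x2029;
}
```

A lexer following this sketch would skip U+1680 between tokens in the same way it skips U+0020.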

Steps to Reproduce

  1. Evaluate source text like 0  (i.e., a numeric literal followed by U+1680 OGHAM SPACE MARK).
  2. Check all WhiteSpace and LineTerminator characters (as identified at unicode.org):
    ["0009", "000A", "000B", "000C", "000D", "0020", "00A0",
     "1680", "2000", "2001", "2002", "2003", "2004", "2005", "2006", "2007", "2008", "2009", "200A",
     "2028", "2029", "202F", "205F", "3000", "FEFF"
    ].filter(hex => {
      const cp = String.fromCharCode(parseInt(hex, 16));
      try {
        eval("[true]" + cp + "[0]");
        // No parsing error, suppress code point.
        return false;
      } catch ( ex ) {
        // Parsing error, keep code point.
        return true;
      }
    })
    

Expected behavior

  1. No error.
  2. An empty array.

Actual behavior

  1. SyntaxError: invalid character 5760
  2. 1680,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,200A,202F,205F,3000,FEFF

Script

$ ./xs -v; cat /tmp/js; ./xs /tmp/js
XS 10.5.0
const ws = ["0009", "000A", "000B", "000C", "000D", "0020", "00A0",
  "1680", "2000", "2001", "2002", "2003", "2004", "2005", "2006", "2007", "2008", "2009", "200A",
  "2028", "2029", "202F", "205F", "3000", "FEFF"
]
const rejected = ws.filter(hex => {
  const cp = String.fromCharCode(parseInt(hex, 16));
  try {
    eval("/x/" + cp);
    // No parsing error, suppress code point.
    return false;
  } catch ( ex ) {
    // Parsing error, keep code point.
    return true;
  }
});
print("rejected: " + rejected);
rejected: 1680,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,200A,202F,205F,3000,FEFF
gibson042 changed the title from "super-Latin1 white space after regular expression literal is incorrectly rejected" to "code points above Latin1 are not recognized as white space" on Jul 14, 2021
@gibson042
Author

Updated to reflect that this issue is more general than the regular-expression-specific one in JavaScriptCore.

@phoddie
Collaborator

phoddie commented Jul 28, 2021

Thank you for the detailed report. You are absolutely correct that the problem was more general than just RegExp.

This is fixed in today's Moddable SDK update. The xst binary hasn't been updated yet, so to verify the fix you need to build from sources.
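
One possible way to verify a locally built binary is to run the reporter's check across the three token contexts mentioned in this thread. This is a sketch, not an official test: the names `CODE_POINTS`, `CONTEXTS`, `findRejected`, and `report` are hypothetical, and the fallback to `console.log` is an assumption for hosts that do not provide `print`.

```javascript
// WhiteSpace and LineTerminator code points from the original report.
const CODE_POINTS = [
  "0009", "000A", "000B", "000C", "000D", "0020", "00A0",
  "1680", "2000", "2001", "2002", "2003", "2004", "2005",
  "2006", "2007", "2008", "2009", "200A",
  "2028", "2029", "202F", "205F", "3000", "FEFF",
];

// Source-text builders for each context reported in this issue.
const CONTEXTS = {
  "after numeric literal": cp => "0" + cp,
  "between array literal and index": cp => "[true]" + cp + "[0]",
  "after regular expression literal": cp => "/x/" + cp,
};

// Returns the code points whose presence makes the source fail to evaluate.
function findRejected(makeSource) {
  return CODE_POINTS.filter(hex => {
    const cp = String.fromCharCode(parseInt(hex, 16));
    try {
      eval(makeSource(cp));
      return false; // Evaluated without error.
    } catch (ex) {
      return true; // Evaluation threw; keep the code point.
    }
  });
}

const report = {};
for (const name of Object.keys(CONTEXTS)) {
  report[name] = findRejected(CONTEXTS[name]);
}

// Assumption: use the host's print() when present, console.log otherwise.
const log = typeof print === "function" ? print : console.log;
for (const name of Object.keys(report)) {
  log(name + ": " + (report[name].join(",") || "(none)"));
}
```

On a conforming engine, every context should report no rejected code points.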

phoddie added the "confirmed" and "fixed - please verify" labels on Jul 28, 2021
phoddie closed this as completed on Aug 28, 2021