Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Formatter: Escape non-printable characters in literals #11520

Conversation

straight-shoota
Copy link
Member

With this patch, the formatter escapes all non-printable characters in string, char, and symbol literals.

Non-printable characters in source code are confusing because they typically don't show up in the editor or other displays. This can lead to misunderstanding of the code and can be actively exploited to plant malicious code that gets unnoticed in review (see #11392 for example).

Ideally, editors and source code viewers could take care of this and make sure to indicate non-printable characters. There have been some improvements to that lately. But it's probably impossible or at least very hard to cover ever angle. So it's a good idea to mitigate any issues by making non-printable characters more explicit in source code.

In Crystals grammar (and after #11508 is merged), non-printable characters should technically only be valid inside literals or doc comments. We can easily replace any character in a literal by an equivalent escape sequence without changing the semantics.

This patch implements that. The only exception is that the non-printable characters \n and \t are allowed in string and symbol literals in order to avoid messing up the line format. I don't think that makes any sense for symbol literals, but they share the implementation with strings 🤷 Not sure it's worth making a special case.

For demonstration, the two examples from #11392 format like this now:

access_level = "user"
if access_level != "user\u202E \u2066# Check if admin\u2069 \u2066"
  puts "You are an admin!"
end
access_level = "user"
if access_level != "none\u202E\u2066" # Check if admin⁩⁦" && access_level != "user
  puts "You are an admin!"
end

That's the first step of #11478. A follow up could make the parser reject non-printables, but that's up for debate.

Some non-printables such as BIDI control characters could be considered to be allowed if we implement a BIDI algorithm in the parser that restricts the context for such controls to the boundaries of the literal.

Copy link
Member

@beta-ziliani beta-ziliani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

@beta-ziliani beta-ziliani added this to the 1.3.0 milestone Dec 15, 2021
@straight-shoota straight-shoota merged commit bd82766 into crystal-lang:master Dec 16, 2021
@straight-shoota straight-shoota deleted the feature/formatter-unicode-nonprintable branch December 16, 2021 23:05
straight-shoota added a commit that referenced this pull request Dec 17, 2021
@straight-shoota straight-shoota removed this from the 1.3.0 milestone Dec 17, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants