
Unclear whether raw tab is allowed inside multi-line basic string #571

Closed
ghost opened this issue Oct 6, 2018 · 4 comments

Comments

@ghost

ghost commented Oct 6, 2018

It is unclear from the specification whether raw tab characters are allowed inside multi-line basic strings.

On the one hand, the "Multi-line basic strings" section states that "All other whitespace and newline characters remain intact." This suggests that any whitespace is allowed inside the string, and whitespace has already been defined as tab and space.

On the other hand, the same section states that "Any Unicode character may be used except those that must be escaped ... (U+0000 to U+001F, U+007F)".

So which is it? Allowed because it is whitespace, or forbidden because it is a control character?

The ABNF definition forbids raw tab characters inside multi-line basic strings, but the ABNF is currently not authoritative.
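To make the two readings concrete, here is a minimal Python sketch that scans the body of a multi-line basic string and reports which raw characters would have to be escaped. The function name `chars_needing_escape` and its `allow_tab` switch are hypothetical: `allow_tab=False` models the ABNF/control-character reading (tab is U+0009), and `allow_tab=True` models the "whitespace remains intact" reading. It assumes LF line endings and does not model CRLF.

```python
from typing import List, Tuple

# Control characters named in the spec prose: U+0000..U+001F and U+007F.
CONTROL_CHARS = {chr(c) for c in range(0x20)} | {"\x7f"}


def chars_needing_escape(body: str, allow_tab: bool) -> List[Tuple[int, str]]:
    """Return (index, 'U+XXXX') for each raw character in a multi-line basic
    string body that would have to be escaped under the chosen reading.

    Hypothetical helper; assumes LF line endings. A raw LF is always allowed
    because the string is multi-line.
    """
    permitted = {"\n"}
    if allow_tab:
        permitted.add("\t")
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(body)
        if ch in CONTROL_CHARS and ch not in permitted
    ]


body = "first line\n\tindented with a raw tab\n"
print(chars_needing_escape(body, allow_tab=False))  # [(11, 'U+0009')]
print(chars_needing_escape(body, allow_tab=True))   # []
```

Under the first reading the raw tab at index 11 is an error; under the second the same body is valid, which is exactly the ambiguity described above.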

@pradyunsg
Member

Nice catch @jorisvr! :)

I think we should clarify that not "all whitespace" is allowed unescaped.

@LongTengDao
Contributor

LongTengDao commented Nov 14, 2018

Basic strings are designed for entering characters that are hard to type, such as control characters, by using escape sequences (a convention from the early days of computing).
Literal strings are designed for entering the escape character itself, or content that has nothing to do with computer encoding.
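The division of labour between the two string kinds can be illustrated with a small Python sketch: a basic string turns escape sequences such as `\t` into the characters they stand for, while a literal string keeps the backslash exactly as written. The helper names are hypothetical, and the escape table follows the list in the TOML v1.0.0 specification (`\b \t \n \f \r \" \\ \uXXXX \UXXXXXXXX`); the sketch does not reject invalid escapes or handle the line-ending backslash of multi-line strings.

```python
import re

# Short escapes defined for TOML basic strings (TOML v1.0.0 list).
SIMPLE_ESCAPES = {
    "b": "\b", "t": "\t", "n": "\n", "f": "\f",
    "r": "\r", '"': '"', "\\": "\\",
}

# A backslash followed by a short escape letter, \uXXXX, or \UXXXXXXXX.
ESCAPE_RE = re.compile(r'\\(?:([btnfr"\\])|u([0-9A-Fa-f]{4})|U([0-9A-Fa-f]{8}))')


def decode_basic(body: str) -> str:
    """Replace escape sequences in a basic-string body with the characters they denote."""
    def replace(m):
        if m.group(1) is not None:
            return SIMPLE_ESCAPES[m.group(1)]
        # \uXXXX or \UXXXXXXXX: hexadecimal Unicode code point.
        return chr(int(m.group(2) or m.group(3), 16))
    return ESCAPE_RE.sub(replace, body)


def decode_literal(body: str) -> str:
    """Literal strings perform no escape processing: the content is taken as-is."""
    return body


print(repr(decode_basic(r"col1\tcol2")))    # 'col1\tcol2'  (a real tab)
print(repr(decode_literal(r"col1\tcol2")))  # 'col1\\tcol2' (backslash and 't' kept)
print(decode_basic(r"caf\u00e9"))           # café
```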

So there is no difference between basic strings and literal strings in how they should treat tab: tab is a human-visible character, like space or ABC. CR and LF are the same as tab in this respect; they are special only because they split a file into lines. So I think even control characters should not be treated specially, because there is no reason for it and it is not useful: Unicode defines so many control-type characters.

TOML is a config format, not a transfer format, so I think it can ignore many common restrictions and be as simple as possible, considering only practical needs.

@pradyunsg pradyunsg added the abnf label May 13, 2019
@eksortso
Contributor

Even though it's fairly easy to type a tab, it seems that these whitespace characters were overlooked in the definition of multi-line basic strings, and in fact needlessly forbidden in single-line basic strings.

Let's let tabs be free in all strings. The escape code \t is still valid in both kinds of basic strings.
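The proposed rule can be sketched as a per-string-kind table of which raw control characters are accepted. This is a minimal Python model, not a parser: the kind names, `PERMITTED_CONTROLS`, and `is_allowed_raw` are hypothetical, LF line endings are assumed (CRLF is not modelled), and the `\t` escape in basic strings is unaffected because it is an escape sequence rather than a raw character.

```python
# Raw control characters: U+0000..U+001F and U+007F.
CONTROL_CHARS = frozenset(chr(c) for c in range(0x20)) | {"\x7f"}

# Hypothetical model of the resolved rule: a raw tab is permitted in every
# string kind; the multi-line kinds additionally permit a raw LF.
PERMITTED_CONTROLS = {
    "basic": frozenset("\t"),
    "literal": frozenset("\t"),
    "ml-basic": frozenset("\t\n"),
    "ml-literal": frozenset("\t\n"),
}


def is_allowed_raw(body: str, kind: str) -> bool:
    """True if every raw control character in `body` is permitted for `kind`."""
    permitted = PERMITTED_CONTROLS[kind]
    return all(ch not in CONTROL_CHARS or ch in permitted for ch in body)


print(is_allowed_raw("name\tvalue", "basic"))            # True
print(is_allowed_raw("line 1\nline 2", "basic"))         # False
print(is_allowed_raw("line 1\n\tline 2", "ml-literal"))  # True
print(is_allowed_raw("bell\x07", "ml-basic"))            # False
```

Keeping the exceptions in one table makes the "tabs everywhere" decision visible in a single place instead of being scattered across four string rules.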

@pradyunsg
Member

> Let's let tabs be free in all strings.

Done. :)
