Less restrictive bare keys #337

mwanji · 2015-06-30T18:54:42Z

Bare keys are currently restricted to `A-Za-z0-9_-`, but I don't get the rationale. The only character that really needs to be escaped is `.`. Is anything else, including spaces, a problem for parsers or for easy comprehension?
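The restriction described here can be expressed as a single regular expression. Below is a minimal JavaScript sketch, assuming the rule is exactly "one or more characters from `A-Za-z0-9_-`". `isBareKey` is a hypothetical helper, not part of any TOML library:

```javascript
// Restricted bare-key rule: one or more ASCII letters, ASCII digits, `_` or `-`.
const BARE_KEY = /^[A-Za-z0-9_-]+$/;

// Hypothetical helper: true if `key` may be written without quotes.
function isBareKey(key) {
  return BARE_KEY.test(key);
}

console.log(isBareKey("street-address")); // true
console.log(isBareKey("a.b"));            // false: `.` is reserved as a separator
console.log(isBareKey("äbc"));            // false: non-ASCII letter
console.log(isBareKey("my key"));         // false: space
```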


BurntSushi · 2015-06-30T20:42:48Z

This is the PR that merged it with the rationale: #283

mwanji · 2015-07-12T21:15:36Z

The justifications seem to be: "easy to understand", "guides users to choose simple key names" and "eliminate any weirdness that could come from having to deal with undelimited Unicode". I may be underestimating the difficulty of dealing with undelimited Unicode, but I disagree somewhat.

However, in languages other than English, not being able to use accented characters might make things more difficult and less clear.

A technical problem that arises is that quoted keys make round-tripping from a class to TOML and back difficult in some cases. For example, a class with `äbc = 5` becomes `"äbc" = 5` in TOML. So translating it back to code requires some perhaps surprising heuristics.

BurntSushi · 2015-07-13T01:48:22Z

> So translating it back to code requires some perhaps surprising heuristics.

Can you elaborate? It seems like you'd have to scan the key name to determine whether it needs quotes or not.
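To illustrate that per-key scan, here is a minimal sketch of a serializer helper. It assumes the `A-Za-z0-9_-` rule and falls back to a basic (double-quoted) TOML string, with the required escapes, for every other key. `formatKey` and `quoteKey` are hypothetical names, not the API of any library mentioned in this thread:

```javascript
// Restricted bare-key rule: ASCII letters, digits, `_` and `-`.
const BARE_KEY = /^[A-Za-z0-9_-]+$/;

// Single-character escapes used inside TOML basic (double-quoted) strings.
const SHORT_ESCAPES = {
  '"': '\\"', "\\": "\\\\", "\b": "\\b", "\t": "\\t",
  "\n": "\\n", "\f": "\\f", "\r": "\\r",
};

// Hypothetical helper: wrap a key in double quotes, escaping characters
// that a TOML basic string cannot contain literally.
function quoteKey(key) {
  let out = '"';
  for (const ch of key) {
    const cp = ch.codePointAt(0);
    if (Object.prototype.hasOwnProperty.call(SHORT_ESCAPES, ch)) {
      out += SHORT_ESCAPES[ch];
    } else if (cp < 0x20 || cp === 0x7f) {
      // Remaining control characters get the \uXXXX form.
      out += "\\u" + cp.toString(16).toUpperCase().padStart(4, "0");
    } else {
      out += ch;
    }
  }
  return out + '"';
}

// Hypothetical helper: emit the key bare when the rule allows it, quoted otherwise.
function formatKey(key) {
  return BARE_KEY.test(key) ? key : quoteKey(key);
}

console.log(`${formatKey("city")} = 5`);     // city = 5
console.log(`${formatKey("äbc")} = 5`);      // "äbc" = 5
console.log(`${formatKey('say "hi"')} = 5`); // "say \"hi\"" = 5
```

With this approach the code-to-TOML direction is deterministic: `äbc` always becomes `"äbc"`, and a parser that treats the quotes as delimiters gives back `äbc`.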

> I may be underestimating the difficulty of dealing with undelimited Unicode, but I disagree somewhat.

Yes. Reasonable people can definitely disagree on this point. I tend to like keeping unquoted identifiers simple because it makes it easier for the human writing the config to reason about when quotes are needed.

mwanji · 2015-07-13T12:24:24Z

> Can you elaborate? It seems like you'd have to scan the key name to determine whether it needs quotes or not.

Yes, but is that what the user expects? Different libraries handle this differently. Compare JS libs toml-node and toml-j0.4:

```toml
# TOML input, referred to as `input` in the JS below
"ä" = 5
```

```javascript
toml_node.parse(input) // => { '"ä"': 5 }  (the quote marks are kept in the key)
toml_j04.parse(input)  // => { ä: 5 }
```

If I then use tomlify-j04 to convert them back to TOML:

```toml
# from toml-node output
"\"ä\"" = 5.0

# from toml-j0.4 output
"ä" = 5.0
```

The restricted expressiveness of bare keys, compared with programming-language variable names, leads to unhelpful disagreements between libraries. My own library, toml4j, does the same as toml-node, while toml-rb (from what I can make out) follows toml-j0.4. This could perhaps be resolved in the spec or in toml-test, but I think lifting the restrictions on bare keys would reduce the scope of the ambiguity.

Also, this restriction discriminates a bit against languages other than English. For example, French, Greek or Chinese users have to quote all their keys, or write them in English. That isn't necessarily simpler or easier to understand, from their point of view.

BurntSushi · 2015-07-13T13:06:28Z

It looks like toml_node gets it wrong or doesn't know about quoted keys. (Quoted identifiers are a relatively recent addition.) In other words, this isn't a disagreement between libraries; it's an issue of compliance with the spec.

> but I think lifting the restrictions on bare keys would reduce the scope of the ambiguity.

What exactly is the ambiguity? Can you point it out in the spec?

> Also, this restriction discriminates a bit against languages other than English. For example, French, Greek or Chinese users have to quote all their keys, or write them in English. That isn't necessarily simpler or easier to understand, from their point of view.

It isn't necessarily more complex either, but I could see how some might consider this a negative of restricted identifiers.

mwanji · 2015-07-13T13:25:43Z

> It looks like toml_node gets it wrong or doesn't know about quoted keys.

Are you saying that parsers should ignore the quotes when creating a data structure from a TOML input? E.g., `"ä" = 5` should produce `{ ä: 5 }`? My thinking was that the keys used to manipulate the data structure in code should be the same as the ones in the TOML input, quotes and all.

BurntSushi · 2015-07-13T13:35:20Z

> Are you saying that parsers should ignore the quotes when creating a data structure from a TOML input? E.g., `"ä" = 5` should produce `{ ä: 5 }`?

Uh, ya. I never even considered your alternative interpretation! That seems like something that could be clarified in the spec.
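The interpretation BurntSushi describes, where the quotes delimit the key but are not part of it, can be sketched as follows. This is a minimal sketch, assuming a tokenizer has already isolated the key token and that only bare keys and double-quoted (basic string) keys need handling. `parseKey` is a hypothetical name, not the API of toml-node, toml-j0.4 or any other library:

```javascript
// Values for the single-character escapes allowed in TOML basic strings.
const ESCAPES = { '"': '"', "\\": "\\", b: "\b", t: "\t", n: "\n", f: "\f", r: "\r" };

// Hypothetical helper: map a key token to the string stored in the parsed
// table. Bare keys are used verbatim; for double-quoted keys the quotes are
// delimiters, not part of the key, and escapes are decoded.
function parseKey(token) {
  if (!token.startsWith('"')) return token;
  if (token.length < 2 || !token.endsWith('"')) {
    throw new Error(`unterminated quoted key: ${token}`);
  }
  const body = token.slice(1, -1);
  let out = "";
  for (let i = 0; i < body.length; i++) {
    if (body[i] !== "\\") {
      out += body[i];
      continue;
    }
    const esc = body[++i];
    if (esc === "u" || esc === "U") {
      // \uXXXX or \UXXXXXXXX: a code point written in hex.
      const len = esc === "u" ? 4 : 8;
      const hex = body.slice(i + 1, i + 1 + len);
      if (hex.length !== len || !/^[0-9A-Fa-f]+$/.test(hex)) {
        throw new Error(`bad unicode escape in key: ${token}`);
      }
      out += String.fromCodePoint(parseInt(hex, 16));
      i += len;
    } else if (Object.prototype.hasOwnProperty.call(ESCAPES, esc)) {
      out += ESCAPES[esc];
    } else {
      throw new Error(`invalid escape in key: ${token}`);
    }
  }
  return out;
}

// Quotes stripped, as the thread concludes parsers should do:
console.log(JSON.stringify({ [parseKey('"ä"')]: 5 })); // {"ä":5}
// Keeping the token verbatim instead produces a key containing quote marks:
console.log(JSON.stringify({ ['"ä"']: 5 }));           // {"\"ä\"":5}
```

Passing that quote-retaining key to a serializer that quotes every non-bare key would explain the `"\"ä\""` line in the tomlify-j04 output earlier in the thread.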

ghost · 2015-07-13T13:36:38Z

For BinaryMuse/toml-node, an issue about quoted keys was opened months ago: BinaryMuse/toml-node#21

It seems that not many people really care about it, so I made my own library, jakwings/toml-j0.4, and learned some PEG parsing techniques for fun. Thanks for using it. :)

> Also, this restriction discriminates a bit against languages other than English. For example, French, Greek or Chinese users have to quote all their keys, or write them in English. That isn't necessarily simpler or easier to understand, from their point of view.

I'm Chinese. Even the equals sign `=`, brackets `[]` and periods `.` are not always easy to type while I am typing Chinese characters; it depends on my input method. (Furthermore, I am using a modified version of TOML to keep things simple for myself.)

But for Latin-based languages and keyboards, typing these characters is not that hard?

mwanji · 2015-07-13T13:47:44Z

> But for Latin-based languages and keyboards, typing these characters is not that hard?

It depends on which language your keyboard is in. Some are easier than others, but in general they're not more than a 2-key combo away. How do Chinese programmers type in these symbols, considering that they are very common across all programming languages?

ghost · 2015-07-13T13:57:15Z

@mwanji Oh, this is an embarrassing problem. ;-) Most of us just use ASCII characters, except in comments and string contents. And nearly all input methods provide an ASCII mode, or we can just switch off the IME.

BinaryMuse · 2015-07-13T17:55:21Z

Aside: I apologize for the delay on BinaryMuse/toml-node#21; quoted keys didn't work for the longest time, this was just a bug in the parser. Should be good to go in the latest version.

ChristianSi · 2015-07-16T17:55:39Z

I reluctantly agreed when the restriction on bare keys was introduced, but I was never happy with it. The problem is that it introduces a strong bias in favor of English-only vocabularies which TOML didn't have before.

As a totally arbitrary example, suppose my config file includes the following keys:

author
translator
street-address
city
postcode

That works fine. But suppose my app is targeted at German users and therefore uses German keys:

Autor
Übersetzer
Straße
Ort
Postleitzahl

Now I have to tell my users that they need quotes around "Übersetzer" and "Straße" while they can use the other keys unquoted. That would be annoying and confusing.

I can also tell them to use quotes around all keys. That works and is less confusing, but also makes TOML a bit less convenient to read and write. (That may be a matter of disagreement, but I certainly find it inconvenient that I have to quote all keys in JSON!)

I would therefore suggest reconsidering this restriction and allowing (more or less) arbitrary Unicode letters in bare keys. Definitions of identifiers in languages such as JavaScript, Java or XML could provide a starting point for such a generalization, as they all avoid the "English preferred" bias.
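To make the German example concrete, the sketch below checks those keys against the current rule and against one possible relaxed rule. The relaxed rule (any Unicode letter `\p{L}` or number `\p{N}`, plus `_` and `-`) is an assumption chosen for illustration, not a rule from any TOML version, and `needsQuotes` is a hypothetical helper:

```javascript
// Current rule: ASCII letters, ASCII digits, `_` and `-`.
const ASCII_BARE = /^[A-Za-z0-9_-]+$/;
// Hypothetical relaxed rule (not from any TOML version): any Unicode
// letter (\p{L}) or number (\p{N}), plus `_` and `-`.
const UNICODE_BARE = /^[\p{L}\p{N}_-]+$/u;

// True if `key` would have to be quoted under the given rule.
function needsQuotes(key, rule) {
  return !rule.test(key);
}

const germanKeys = ["Autor", "Übersetzer", "Straße", "Ort", "Postleitzahl"];
for (const key of germanKeys) {
  console.log(key, needsQuotes(key, ASCII_BARE), needsQuotes(key, UNICODE_BARE));
}
// Autor false false
// Übersetzer true false
// Straße true false
// Ort false false
// Postleitzahl false false
```

Under the current rule only `Übersetzer` and `Straße` need quotes, which is exactly the mixed situation described above.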

mojombo · 2016-01-25T21:22:06Z

Everything I said in #283 still holds. TOML 1.0 will have restricted bare keys, but if TOML adoption becomes significant and we can find a reasonable way to deal with undelimited Unicode, then I'd consider it for a future version of TOML.

Hrxn · 2016-01-26T04:02:22Z

#337 (comment)

> I reluctantly agreed when the restriction on bare keys was introduced, but I was never happy with it. The problem is that it introduces a strong bias in favor of English-only vocabularies which TOML didn't have before.

What bias are you talking about here? That every programming language under the sun is based on the English vocabulary? Well, yeah, true. But that ship sailed looong ago.

Don't bother with the past, because it can't be changed anyway...

ChristianSi · 2016-01-26T21:23:37Z

@Hrxn:

> What bias are you talking about here? That every programming language under the sun is based on the English vocabulary?

No, those are keywords, and TOML doesn't have any keywords. I'm talking about the bias regarding keys and table names, that is, identifiers. Now, practically all modern programming languages allow arbitrary Unicode letters, and digits (except as the first character), when naming identifiers.

I'd be happy if TOML said something such as "every sequence of Unicode characters that is a legal JavaScript [or Python, or whatever] identifier is also a valid (bare) key." That would remove the bias I have complained about.
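A rough sketch of that suggestion, reading it as a union: a key stays bare if it already satisfies `A-Za-z0-9_-`, or if it matches ECMAScript's IdentifierName grammar (an `ID_Start` character, `$` or `_`, followed by `ID_Continue` characters, `$`, ZWNJ or ZWJ). How the proposal would actually be worded is an assumption here; the sketch ignores `\u` escapes and reserved words, and `isBareKeyProposed` is a hypothetical name:

```javascript
// Current TOML rule.
const ASCII_BARE = /^[A-Za-z0-9_-]+$/;
// ECMAScript IdentifierName without `\u` escapes: an ID_Start character,
// `$` or `_`, followed by ID_Continue characters, `$`, ZWNJ or ZWJ.
const JS_IDENTIFIER_NAME = /^[\p{ID_Start}$_][\p{ID_Continue}$\u200C\u200D]*$/u;

// Hypothetical rule: bare if the key satisfies the current rule OR is a
// legal JavaScript identifier name.
function isBareKeyProposed(key) {
  return ASCII_BARE.test(key) || JS_IDENTIFIER_NAME.test(key);
}

console.log(isBareKeyProposed("Übersetzer"));     // true
console.log(isBareKeyProposed("名前"));           // true
console.log(isBareKeyProposed("street-address")); // true (current rule)
console.log(isBareKeyProposed("a.b"));            // false
```

One side effect of borrowing the JavaScript definition is that `$` becomes a legal bare-key character; Python's identifier definition would not admit it.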

Hrxn · 2016-01-31T05:34:46Z

> No, those are keywords, and TOML doesn't have any keywords. I'm talking about the bias regarding keys and table names, that is, identifiers.

Oh, really?
Thanks for the lecture, I guess, but this was definitely not my point.

I merely tried to say that there is no 'bias' in the first place. Nothing worth complaining about.

TheElectronWill · 2016-03-23T18:13:26Z

Here is a suggestion for bare keys:
Allow any character that comes after the space character (U+0020) in the Unicode table (so no newlines, no spaces and no weird characters like NULL), except the following: periods, square brackets (opening and closing), number signs (because they are used for comments), and equals signs.
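Read literally, this suggestion is an exclusion list rather than an allow list. A minimal sketch, assuming "after the space one" means every code point above U+0020; `isBareKeySuggested` is a hypothetical name:

```javascript
// Characters the suggestion explicitly excludes: `.`, `[`, `]`, `#`, `=`.
const EXCLUDED = new Set([".", "[", "]", "#", "="]);

// Literal reading of the suggestion: every code point must be above U+0020
// (so no control characters, newlines or ASCII spaces) and not in EXCLUDED.
function isBareKeySuggested(key) {
  if (key.length === 0) return false;
  for (const ch of key) {
    if (ch.codePointAt(0) <= 0x20 || EXCLUDED.has(ch)) return false;
  }
  return true;
}

console.log(isBareKeySuggested("Straße"));    // true
console.log(isBareKeySuggested("名前"));      // true
console.log(isBareKeySuggested("x=1"));       // false
console.log(isBareKeySuggested("has space")); // false
```

Taken literally, the rule would also admit `"` and `'` (which start quoted keys), U+007F DEL, and non-ASCII spaces such as U+3000, so a full proposal would probably need a longer exclusion list.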

This backs out the Unicode bare keys from toml-lang#891. This does *not* mean we can't include it in a future 1.2 (or 1.3, or whatever); just that right now there doesn't seem to be a clear consensus regarding normalisation and which characters to include. It's already the most discussed single issue in the history of TOML. I kind of hate doing this as it seems a step backwards; in principle I think we *should* have this, so I'm not against the idea of the feature as such, but things seem to be at a bit of a stalemate right now, and this will allow TOML to move forward on other issues. It hasn't come up *that* often; the issue (toml-lang#687) wasn't filed until 2019 and has only 11 upvotes. Other than that, the issue was raised only once before in 2015 as far as I can find (toml-lang#337). I also can't really find anyone asking for it in any of the HN threads on TOML. All of this means we can push forward with releasing TOML 1.1, giving people access to the much more frequently requested relaxing of inline tables (toml-lang#516, which has 122 upvotes and has come up on HN as well) and some other more minor things (e.g. `\e` has 12 upvotes in toml-lang#715). Basically, a lot more people are waiting for these changes, and all things considered this seems a better path forward for now, unless someone comes up with a proposal which addresses all issues (I tried and thus far failed). I proposed this over here a few months ago, and the response didn't seem too hostile to the idea: toml-lang#966 (comment)
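Normalisation matters because the same visible key can be written with different code points. A minimal JavaScript sketch of the problem, using `String.prototype.normalize`; the letters-and-digits rule and the `sameKey` helper are hypothetical, chosen only for illustration:

```javascript
// "été" written two ways that render identically.
const precomposed = "\u00E9t\u00E9";  // U+00E9 LATIN SMALL LETTER E WITH ACUTE
const decomposed = "e\u0301te\u0301"; // "e" + U+0301 COMBINING ACUTE ACCENT

console.log(precomposed === decomposed);                                   // false
console.log(precomposed.normalize("NFC") === decomposed.normalize("NFC")); // true

// Hypothetical letters-and-digits rule, for illustration only.
const LETTERS_DIGITS = /^[\p{L}\p{N}_-]+$/u;
console.log(LETTERS_DIGITS.test(precomposed)); // true
console.log(LETTERS_DIGITS.test(decomposed));  // false: U+0301 is a mark (Mn), not a letter

// Hypothetical helper: treat two keys as equal if they match after NFC.
function sameKey(a, b) {
  return a.normalize("NFC") === b.normalize("NFC");
}
```

A spec would therefore have to decide both whether keys are compared after normalisation and whether combining marks may appear in bare keys, which matches the two open questions the commit message names.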


BinaryMuse mentioned this issue Jul 13, 2015

No download count on some modules rvagg/nodei.co#28

Closed

mojombo closed this as completed Jan 25, 2016

BurntSushi mentioned this issue Mar 21, 2016

What about allowing special characters in bare-keys? #401

Closed

dtolnay mentioned this issue May 29, 2017

Readme specification of keys is ambiguous #464

Closed

marzer mentioned this issue Dec 10, 2019

Relax bare key restrictions to allow additional unicode letters and numbers #687

Closed

arp242 mentioned this issue Jun 2, 2023

Backout Unicode bare keys #979

Open
