Less restrictive bare keys #337
Comments
This is the PR that merged it, along with the rationale: #283
The justifications seem to be: "easy to understand", "guides users to choose simple key names" and "eliminate any weirdness that could come from having to deal with undelimited Unicode". I may be underestimating the difficulty of dealing with undelimited Unicode, but I disagree somewhat. However, in languages other than English, not being able to use accented characters might make things more difficult and less clear. A technical problem that arises is that quoted keys make round-tripping from a class to TOML and back difficult in some cases. For example: a class with |
Can you elaborate? It seems like you'd have to scan the key name to determine whether it needs quotes or not.
Yes. Reasonable people can definitely disagree on this point. I tend to like keeping unquoted identifiers simple because it makes it easier for the human writing the config to reason about when quotes are needed. |
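To make the "scan the key name" step concrete, here is a minimal sketch, assuming the bare-key rule is exactly `A-Za-z0-9_-` as stated in this issue. The names `needs_quotes` and `format_key` are hypothetical and not taken from any TOML library, and the escaping is deliberately simplified.

```python
import re

# Bare keys as described in this issue: ASCII letters, digits, underscore, hyphen.
BARE_KEY = re.compile(r"[A-Za-z0-9_-]+")


def needs_quotes(key: str) -> bool:
    """Return True if the key cannot be written as a bare key."""
    return BARE_KEY.fullmatch(key) is None


def format_key(key: str) -> str:
    """Write a key bare when allowed, otherwise as a quoted key.

    Simplification (assumption): only backslash and double quote are
    escaped; a real serializer would also have to escape control characters.
    """
    if not needs_quotes(key):
        return key
    escaped = key.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


for key in ["name", "my-key", "Übersetzer", "a.b"]:
    print(format_key(key))
```

The check is a single pass over the key, so the cost falls less on the serializer than on the human who has to remember which characters trigger quoting.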
Yes, but is that what the user expects? Different libraries handle this differently. Compare JS libs toml-node and toml-j0.4:
    # TOML input, referred to as input in JS
    "ä" = 5
    toml_node.parse(input) // => { "ä": 5 }
    toml_j04.parse(input) // => { ä: 5 }
If I then use tomlify-j04 to convert them back to TOML:
    # from toml-node output
    "\"ä\"" = 5.0
    # from toml-j0.4 output
    "ä" = 5.0
The restricted expressiveness of bare keys relative to programming language variable names leads to unhelpful disagreements between libraries. Mine, toml4j, does the same as toml-node, while toml-rb (from what I can make out) follows toml-j0.4. This could perhaps be resolved in the spec or in toml-test, but I think lifting the restrictions on bare keys would reduce the scope of the ambiguity. Also, this restriction discriminates a bit against languages other than English. For example, French, Greek or Chinese users have to quote all their keys, or write them in English. That isn't necessarily simpler or easier to understand, from their point of view.
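The two round-trip results above can be reproduced with a small sketch. It assumes a hypothetical serializer, `to_toml_key`, that always emits a quoted key with simplified escaping; it is not the code of toml-node, toml-j0.4 or tomlify-j04. The only thing that differs between the two runs is whether the in-memory key still contains the quote characters.

```python
def to_toml_key(key: str) -> str:
    """Hypothetical serializer: always emit a quoted key.

    Simplification (assumption): only backslash and double quote are escaped.
    """
    escaped = key.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


# Interpretation 1: the quotes are part of the key held in memory.
key_with_quotes = '"ä"'
# Interpretation 2: the quotes are only syntax; the key is the text between them.
key_without_quotes = "ä"

print(to_toml_key(key_with_quotes), "= 5.0")
print(to_toml_key(key_without_quotes), "= 5.0")
```

Under the first interpretation every round trip adds another layer of escaped quotes, which is why the choice seems worth pinning down in the spec or in toml-test.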
It looks like
What exactly is the ambiguity? Can you point it out in the spec?
It isn't necessarily more complex either, but I could see how some might consider this a negative of restricted identifiers. |
Are you saying that parsers should ignore the quotes when creating a data structure from a TOML input? Eg. "ä" = 5 should produce { ä: 5 } ? My thinking was that the keys used to manipulate the data structure in code should be the same as the ones in the TOML input, quotes and all. |
Uh, ya. I never even considered your alternative interpretation! That seems like something that could be clarified in the spec.
For BinaryMuse/toml-node, there was an issue about quoted keys months ago: BinaryMuse/toml-node#21. It seems that not many people really care about it, so I made my own library, jakwings/toml-j0.4, and learned some PEG parsing techniques for fun. Thanks for using it. :)
I'm Chinese. Even that equal-sign But for the Latin-originated languages and keyboards, typing these characters is not that hard?
It depends on which language your keyboard is in. Some are easier than others, but in general they're not more than a 2-key combo away. How do Chinese programmers type in these symbols, considering that they are very common across all programming languages? |
@mwanji Oh, this is an embarrassing problem. ;-) Most of us just use ascii characters, except for comments and string contents. And nearly all input methods provide an ascii mode, or we can just switch off the IME. |
Aside: I apologize for the delay on BinaryMuse/toml-node#21; quoted keys didn't work for the longest time, this was just a bug in the parser. Should be good to go in the latest version. |
I reluctantly agreed when the restriction on bare keys was introduced, but I was never happy with it. The problem is that it introduces a strong bias in favor of English-only vocabularies which TOML didn't have before. Considering as a totally arbitrary example that my config file includes the following keys:
That works fine, but assuming my app is targeted at German users and therefore uses German keys:
Now I have to tell my users that they need quotes around "Übersetzer" and "Straße" while they can use the other keys unquoted. That would be annoying and confusing. I can also tell them to use quotes around all keys. That works and is less confusing, but also makes TOML a bit less convenient to read and write. (That may be a matter of disagreement, but I certainly find it inconvenient that I have to quote all keys in JSON!) I would therefore suggest reconsidering this restriction and allowing (more or less) arbitrary Unicode letters in bare keys. Definitions of identifiers in languages such as JavaScript, Java or XML could provide a starting point for such a generalization, as they all avoid the "English preferred" bias.
Everything I said in #283 still holds. TOML 1.0 will have restricted bare keys, but if TOML adoption becomes significant and we can find a reasonable way to deal with undelimited Unicode, then I'd consider it for a future version of TOML. |
What bias are you talking about here? That every programming language under the sun is based on the English vocabulary? Well, yeah, true. But that ship sailed looong ago. Don't bother with the past, because it can't be changed anyway... |
No, those are keywords, and TOML doesn't have any keywords. I'm talking about the bias regarding keys and table names, that is, identifiers. Now, practically all modern programming languages allow arbitrary Unicode letters and (except for the first character) digits when naming identifiers. I'd be happy if TOML said something such as "every sequence of Unicode characters that is a legal JavaScript [or Python, or whatever] identifier is also a valid (bare) key." That would remove the bias I have complained about.
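As a rough illustration of the "legal Python identifier" variant of this suggestion, the sketch below compares the current ASCII rule with Python's built-in `str.isidentifier()`. The helper names are hypothetical, and using Python's identifier definition is only one of the options mentioned here, not something the thread settles on.

```python
import re

# Current rule as stated in this issue.
ASCII_BARE_KEY = re.compile(r"[A-Za-z0-9_-]+")


def is_bare_key_today(key: str) -> bool:
    return ASCII_BARE_KEY.fullmatch(key) is not None


def is_identifier_style_key(key: str) -> bool:
    """Hypothetical rule: accept whatever Python accepts as an identifier."""
    return key.isidentifier()


for key in ["name", "Übersetzer", "Straße", "my-key", "1st"]:
    print(f"{key!r}: today={is_bare_key_today(key)}, "
          f"identifier-style={is_identifier_style_key(key)}")
```

One thing the comparison brings out: an identifier-based rule on its own would reject keys such as `my-key` or `1st` that are bare keys today, so it would probably have to be added to the existing rule rather than replace it.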
Oh, really? I merely tried to say that there is no 'bias' in the first place. Nothing worth complaining about.. |
Here is a suggestion for bare keys: |
This backs out the unicode bare keys from toml-lang#891. This does *not* mean we can't include it in a future 1.2 (or 1.3, or whatever); just that right now there doesn't seem to be a clear consensus regarding normalisation and which characters to include. It's already the most discussed single issue in the history of TOML. I kind of hate doing this as it seems a step backwards; in principle I think we *should* have this so I'm not against the idea of the feature as such, but things seem to be at a bit of a stalemate right now, and this will allow TOML to move forward on other issues. It hasn't come up *that* often; the issue (toml-lang#687) wasn't filed until 2019, and has only 11 upvotes. Other than that, the issue was raised only once before in 2015 as far as I can find (toml-lang#337). I also can't really find anyone asking for it in any of the HN threads on TOML. All of this means we can push forward releasing TOML 1.1, giving people access to the much more frequently requested relaxing of inline tables (toml-lang#516, with 122 upvotes, and has come up on HN as well) and some other more minor things (e.g. `\e` has 12 upvotes in toml-lang#715). Basically, a lot more people are waiting for this, and all things considered this seems a better path forward for now, unless someone comes up with a proposal which addresses all issues (I tried and thus far failed). I proposed this over here a few months ago, and the response didn't seem too hostile to the idea: toml-lang#966 (comment)
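For readers unfamiliar with the normalisation concern mentioned above, a minimal sketch of what it looks like in practice: the same visible key can be encoded as different code point sequences, so two keys that look identical may or may not compare equal. The helper `same_key` and the choice of NFC are assumptions made for illustration; the thread does not settle on a normalisation form.

```python
import unicodedata

composed = "caf\u00e9"     # "é" as a single code point
decomposed = "cafe\u0301"  # "e" followed by a combining acute accent

# Without normalisation the two spellings are different strings.
print(composed == decomposed)


def same_key(a: str, b: str) -> bool:
    """Hypothetical comparison that normalises both keys to NFC first."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# With NFC normalisation they are treated as the same key.
print(same_key(composed, decomposed))
```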
Bare keys are currently restricted to `A-Za-z0-9_-` but I don't get the rationale. The only character that really needs to be escaped is `.`. Is anything else, including spaces, a problem for parsers or easy comprehension?
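To show why `.` in particular is the character that matters, here is a small sketch of splitting a dotted name into its parts while leaving dots inside quoted parts alone. The function `split_dotted_key` is hypothetical and heavily simplified: it assumes no whitespace around the dots and treats a backslash inside quotes as "take the next character literally" rather than implementing TOML's real escape sequences.

```python
def split_dotted_key(text: str) -> list[str]:
    """Split a dotted name on unquoted dots (simplified sketch)."""
    parts: list[str] = []
    current: list[str] = []
    in_quotes = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_quotes:
            if ch == "\\" and i + 1 < len(text):
                # Simplification: keep the escaped character as-is.
                current.append(text[i + 1])
                i += 2
                continue
            if ch == '"':
                in_quotes = False
            else:
                current.append(ch)
        elif ch == '"':
            in_quotes = True
        elif ch == ".":
            # An unquoted dot ends the current part.
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    parts.append("".join(current))
    return parts


print(split_dotted_key("servers.alpha"))
print(split_dotted_key('servers."alpha.beta"'))
```

In this sketch nothing except the dot and the quote needs special treatment, which is the point the question is making; whether other characters such as spaces cause problems elsewhere in a full grammar is left open here.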