Allow non-English scripts for unquoted keys #891

Merged on Sep 11, 2022 (24 commits)
Commits
cc0a59e
Allow non-English scripts for unquoted keys
abelbraaksma Mar 15, 2022
aacda84
Update changelog and toml.md with non-English bare key examples
abelbraaksma Mar 15, 2022
6c7e62d
Simplify abnf slightly
abelbraaksma Mar 16, 2022
ddf3fa5
Add full Unicode ranges to toml.md
abelbraaksma Mar 16, 2022
40e6c13
remove trailing spaces
abelbraaksma Mar 16, 2022
8e78ee9
improve wording
abelbraaksma Mar 16, 2022
0387d87
Fix formatting in Unicode summary of toml.md
abelbraaksma Mar 24, 2022
d01782f
Specifically include Enclosed Alphanumerics U+2460-24FF
abelbraaksma Mar 24, 2022
44a1706
Include currency, superscript and fractions, as they are allowed else…
abelbraaksma Mar 24, 2022
92f1e13
Remove currency signs, they shouldn't have been there.
abelbraaksma Mar 24, 2022
0216928
Reflow to fit the 80 character line length limit in toml.md
abelbraaksma Mar 27, 2022
2a3c7f5
Include U+FFFE/FFFF in the text to match the ABNF
abelbraaksma Mar 29, 2022
7052d5c
Fix PUP range in toml.md
abelbraaksma Mar 30, 2022
c086c79
Exclude U+B1, typo, shouldn't have been included
abelbraaksma Apr 2, 2022
68c0ad8
Update toml.md
abelbraaksma Apr 22, 2022
2bbe92d
Update toml.md, clearer wording
abelbraaksma Apr 22, 2022
e075327
Fix end-of-range D999 to DFFF for surrogate blocks
abelbraaksma Apr 28, 2022
4193be3
Merge branch 'main' into update-changelog
abelbraaksma Jul 30, 2022
91533ed
Merge branch 'main' into update-changelog
abelbraaksma Jul 31, 2022
2e239be
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 31, 2022
0c4f76e
Update toml.md to contained proposed text for 'bare keys' explanation
abelbraaksma Aug 15, 2022
9c3e8c9
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 15, 2022
19b3ff8
Fix use of emoji variant of info icon character
abelbraaksma Aug 16, 2022
e44e10c
Fix link to suggested format by @marzer, @pradyunsg
abelbraaksma Aug 16, 2022
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -4,6 +4,7 @@

* Clarify where and how dotted keys define tables.
* Add new `\e` shorthand for the escape character.
* Allow non-English scripts in unquoted (bare) keys

## 1.0.0 / 2021-01-11

20 changes: 16 additions & 4 deletions toml.abnf
@@ -46,19 +46,31 @@ comment = comment-start-symbol *non-eol
;; Key-Value pairs

keyval = key keyval-sep val

key = simple-key / dotted-key
val = string / boolean / array / inline-table / date-time / float / integer

simple-key = quoted-key / unquoted-key

unquoted-key = 1*( ALPHA / DIGIT / %x2D / %x5F ) ; A-Z / a-z / 0-9 / - / _
;; Unquoted key

unquoted-key = 1*unquoted-key-char
unquoted-key-char = ALPHA / DIGIT / %x2D / %x5F ; a-z A-Z 0-9 - _
unquoted-key-char =/ %xC0-D6 / %xD8-F6 / %xF8-37D ; non-symbol chars in Latin block
unquoted-key-char =/ %x37F-1FFF ; exclude GREEK QUESTION MARK, which is basically a semi-colon
unquoted-key-char =/ %x200C-200D / %x203F-2040 ; include combining chars used in some languages
unquoted-key-char =/ %x2070-218F / %x2C00-2FEF ; this excludes arrows, blocks and the like
unquoted-key-char =/ %x3001-D7FF ; skip 2FF0-3000 ideographic up/down markers and spaces
unquoted-key-char =/ %xF900-FDCF / %xFDF0-FFFD ; skip D800-DFFF surrogate block, E000-F8FF Private Use area, FDD0-FDEF intended for process-internal use (unicode)
unquoted-key-char =/ %x10000-EFFFF ; all chars outside BMP range, excluding Private Use planes

;; Quoted and dotted key

quoted-key = basic-string / literal-string
dotted-key = simple-key 1*( dot-sep simple-key )

dot-sep = ws %x2E ws ; . Period
keyval-sep = ws %x3D ws ; =

val = string / boolean / array / inline-table / date-time / float / integer

;; String

string = ml-basic-string / basic-string / ml-literal-string / literal-string
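
As an illustration only (this is not part of the change), the `unquoted-key-char` alternatives above can be transcribed into a table of code point ranges. The sketch below is Python; `BARE_KEY_RANGES`, `is_bare_key_char` and `is_bare_key` are hypothetical names, and the ranges are copied from the ABNF in this diff, so the sketch inherits any mistake in that grammar.

```python
# Code point ranges transcribed from the unquoted-key-char rules in this diff.
# Assumption: all names here are illustrative and not part of any TOML library.
BARE_KEY_RANGES = [
    (0x2D, 0x2D), (0x5F, 0x5F),                 # - and _
    (0x30, 0x39), (0x41, 0x5A), (0x61, 0x7A),   # DIGIT and ALPHA
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x37D),  # Latin block, minus U+00D7 and U+00F7
    (0x37F, 0x1FFF),                            # skips U+037E GREEK QUESTION MARK
    (0x200C, 0x200D), (0x203F, 0x2040),         # joiners and connector punctuation
    (0x2070, 0x218F), (0x2C00, 0x2FEF),         # skips arrows, box drawing, etc.
    (0x3001, 0xD7FF),                           # skips U+2FF0-3000
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFD),         # skips surrogates, Private Use, U+FDD0-FDEF
    (0x10000, 0xEFFFF),                         # supplementary planes, minus Private Use planes
]


def is_bare_key_char(ch):
    """Return True if the single character ch falls in one of the ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in BARE_KEY_RANGES)


def is_bare_key(key):
    """Return True if key is non-empty and every character is allowed."""
    return len(key) > 0 and all(is_bare_key_char(ch) for ch in key)
```

Under these ranges `is_bare_key("Fuß")` and `is_bare_key("1234")` hold, while a key containing a space, a dot, or a box drawing character is rejected and would need quoting. A linear scan is enough for short keys; a real parser might prefer a precompiled character class.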
30 changes: 23 additions & 7 deletions toml.md
@@ -104,27 +104,39 @@ Keys

A key may be either bare, quoted, or dotted.

**Bare keys** may only contain ASCII letters, ASCII digits, underscores, and
dashes (`A-Za-z0-9_-`). Note that bare keys are allowed to be composed of only
ASCII digits, e.g. `1234`, but are always interpreted as strings.
**Bare keys** may contain any letter-like Unicode character from any Unicode script,
as well as ASCII digits, dashes and underscores. Punctuation, spaces, arrows, box drawing
and private use characters are not allowed. Note that bare keys are allowed to be
composed of only ASCII digits, e.g. `1234`, but are always interpreted as strings.

* From the ASCII characters, only A-Z, a-z, 0-9, _ and - are allowed
* From the rest of the first 256 characters, only U+0080-00BF, "×" (U+00D7) and "÷" (U+00F7) are disallowed
* All of U+0100-1FFF are allowed, except ";" (U+037E)
* Characters U+200C, U+200D, U+203F, U+2040, U+2070-218F, U+2C00 to U+2FEF are allowed
* All characters from U+3001 and higher are allowed, except surrogates (U+D800 to U+DFFF), Private Use (U+E000 to U+F8FF, U+F0000 to U+10FFFF), process-internal use (U+FDD0 to U+FDEF) and the noncharacters U+FFFE and U+FFFF

```toml
key = "value"
bare_key = "value"
bare-key = "value"
1234 = "value"
Fuß = "value"
😂 = "value"
汉语大字典 = "value"
辭源 = "value"
பெண்டிரேம் = "value"
```

**Quoted keys** follow the exact same rules as either basic strings or literal
strings and allow you to use a much broader set of key names. Best practice is
to use bare keys except when absolutely necessary.
strings and allow you to use any Unicode character in a key name, including spaces.
Best practice is to use bare keys except when absolutely necessary.

```toml
"127.0.0.1" = "value"
"character encoding" = "value"
"ʎǝʞ" = "value"
'key2' = "value"
'quoted "value"' = "value"
"╠═╣" = "value"
"⋰∫∬∭⋱" = "value"
```
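
For illustration only (not part of this change), a key that is not a valid bare key can always be written as a basic string. The Python sketch below shows one way to do that; `quote_key` is a hypothetical helper, and it assumes the TOML 1.0 basic-string rules, under which backslashes, double quotes and control characters must be escaped.

```python
def quote_key(key):
    """Render key as a double-quoted TOML basic string (illustrative helper)."""
    out = []
    for ch in key:
        cp = ord(ch)
        if ch == "\\":
            out.append("\\\\")           # backslash must be escaped
        elif ch == '"':
            out.append('\\"')            # so must the double quote
        elif cp < 0x20 or cp == 0x7F:
            out.append("\\u%04X" % cp)   # control characters as \uXXXX
        else:
            out.append(ch)               # everything else is kept verbatim
    return '"' + "".join(out) + '"'
```

For example, `quote_key("character encoding")` keeps the space as-is, and `quote_key("")` yields the empty quoted key `""`. A literal string (single quotes) would avoid the escapes, but it cannot represent a key that itself contains a single quote.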

A bare key must be non-empty, but an empty quoted key is allowed (though
@@ -145,6 +157,7 @@ name = "Orange"
physical.color = "orange"
physical.shape = "round"
site."google.com" = true
பெண்.டிரேம் = "we are women"
```

In JSON land, that would give you the following structure:
@@ -158,6 +171,9 @@ In JSON land, that would give you the following structure:
},
"site": {
"google.com": true
},
"பெண்": {
"டிரேம்": "we are women"
}
}
```
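
The mapping from dotted keys to nested tables shown above can be sketched in Python as follows. This is an illustration only, not part of the change: `set_dotted` is a hypothetical helper that takes key segments already split and unquoted (a naive split on `.` would break `site."google.com"`), and it omits the checks a real parser needs for redefined keys.

```python
def set_dotted(table, parts, value):
    """Store value under the nested path given by parts, creating tables as needed."""
    for part in parts[:-1]:
        # Descend into the sub-table for this segment, creating it if missing.
        table = table.setdefault(part, {})
    table[parts[-1]] = value


doc = {}
set_dotted(doc, ["name"], "Orange")
set_dotted(doc, ["physical", "color"], "orange")
set_dotted(doc, ["physical", "shape"], "round")
set_dotted(doc, ["site", "google.com"], True)
set_dotted(doc, ["பெண்", "டிரேம்"], "we are women")
```

After these calls `doc` has the same nesting as the JSON structure above. Non-ASCII segments need no special handling: once the parser accepts them as bare keys, they behave like any other key segment.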