-
Notifications
You must be signed in to change notification settings - Fork 859
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change ABNF array definition to permit single value types only #663
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went through and manually tested all the primary permutations of possible conflicts via http://instaparse.mojombo.com/ (remember to set the grammar to :abnf
, not :ebnf
, if you do the same!) and all of them parse erred appropriately, and the arrays validate when the types match (including when the array contains two arrays, or two inline-tables, and each have different values inside). Each was done with two values, of similar or different types (string and string but also string and int or string and inline-table or integer and integer or integer and array or...).
As far as I can understand the grammar, the grammar itself within this diff seems correct.
This PR correctly implements array typing within the ABNF as far as I can see.
I don't think this is necessary, and it hinders the readability. It is customary for grammars to not include semantic restrictions like this. Typically languages will define arrays as something like |
ABNF can be used for many things, one of which could be document validation. What's needed here is a verdict on what the intention for the ABNF is, since this can be anything from basic tokenization until full document validation. |
The intention as specified in PR #236 is as a tool for parsers to reference, and this has been alluded to elsewhere, re: testing document generation or validation, as a reference implementation essentially. It is in fact intended to have a complete match. |
I see one bug in the PR: the spec clearly defines four different date/time types (Offset Date-Time, Local Date-Time, Local Date, Local Time), but the PR erroneously treats them as one, allowing them to be mixed (in contrast to the written spec). Otherwise I repeat my plea (#553 (comment)) to abolishing array typing altogether, but I admit that that's a different issue. |
I'll argue that on the contrary, all four date-time types are grouped together as a single type, "Datetime":
If you can make a strong appeal to enough of the core team, especially @mojombo, that heterogeneous arrays are the way to go, then a counter-proposal would be most welcomed. You have the arguments for it already laid out, and so I say, go for it. |
@eksortso You're right that the written spec says so but that too is clearly a bug. The word "Datetime" is mentioned just that single time in the spec, never again. Instead, four separate date and time-related types are defined farther down. The grouping exists, so far, only in your imagination -- though it might be a way to resolve the contradiction. As for making a case for heterogeneous arrays, we'll see about that. I admit that I don't even have particularly strong feelings regarding that matter, though I would certainly prefer that mixing integers and floats were allowed (to prevent confusing users with nonintuitive and illogical error messages). |
@ChristianSi I have to admit, I don't see where the spec explicitly states that each of the date/time formats constitute their own separate types. Could you provide some clarification, either here or in reference to an issue, old or new? |
This is one of the cases where the spec is vague but I think the intended takeaway is clear: there is one Datetime, and it can be said different ways, and mean different things. This is probably hell on implementation, of course, since we really have something like
so I am not at all against revision or clarification, even if it constitutes an adjustment to implementation, because I suspect that any reconciliation that has been done, on the parser level, between these entities, would have already been made, and so any sensible change we would make will merely formalize what is already out there. |
That said, this thread is about ABNF, TOML, and the handling of array types. Continuing to beg the maintainers for the abolishment of array typing is probably more on topic than discussing the nuances of Datetime's type and what that means. ^_~ |
Though the spec is somewhat vague when it comes to talking about types, I'd say that the TOC is already a strong hint:
Each of these entries defines a type, though the "Table" and "Array of Tables" types are special -- we might call them structural types instead of value types since they form the basic structure of a TOML document and cannot be used as values (in key/value pairs or arrays), while all other types can. The "Array of Tables" section explicitly starts with: "The last type that has not yet been expressed..." (emphasis added). When we look into the past, we notice that, a long time ago, TOML did indeed have just a single Datetime type. This was changed by PR #414 ("Specify OffsetDateTime, LocalDateTime, LocalDate, and LocalTime types") which eliminated that type and introduced the four different types we have today. Note that the issue title talks about "types". And the PR was a spin-off of #362 ("Treat Dates and Datetimes as separate types"). Separate types, indeed. That PR was followed a short time later by PR #442 ("Fixed datetime links in table of contents") which eliminated the now-orphaned "Datetime" link from the TOC, replacing it with links to the four different types we have now. But nobody, except you, had noticed the other orphaned "Datetime" mention which has survived, until this day, in the "Key/Value Pair" section. But that was clearly an accident of history, not a deliberate decision. I rest my case. |
Well, @ChristianSi. With that degree of prosecutorial zeal, you could have made heterogeneous arrays the standard by now. There's still time for that though. But more importantly to you, I concede. All of the date/time formats are separate types, and this will be made explicit. Since the array syntax only addresses value types, I made certain that each type indicated in By the way, let's mention #596. I already agreed with you once before, and I only recently remembered. |
@eksortso: That looks good now, but since we're now also cleaning up the written spec I'd suggest one further revision. With your latest changes, the written spec is up-to-date regarding the value types allowed in key-value pairs, but it is not very explicit about what's allowed in an array.
Supposedly, that refers to the values inside a key/value pair, but that should be made explicit. Hence I'd suggest to replace the sentence
with something like
|
Also, to remove the conceptual oddity (arrays are brackets??), I'd suggest to replace
with
|
Not going to use this. It means exactly the same thing as "square brackets with values inside", with additional aspects spelled out in the following sentences. |
I like this, and the link back to the section containing the value type list is a good addition. It's going in. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the following ABNF is equivalent and more readable:
array = array-open ws-comment-newline [ array-values ] ws-comment-newline array-close
array-values = array-strings
...
array-strings = string ws array-sep ws-comment-newline array-strings
array-strings =/ string ws [ array-sep ]
element types). | ||
are separated by commas. | ||
|
||
All values that are allowed in [key/value pairs](#user-content-keyvalue-pair) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's means... all values are allowed?
I'd prefer that we rephrase this paragraph as something along the lines of:
<prev-paragraph> All elements within an array must have the same data type.
All string values are considered the same type. All arrays (regardless of types of elements in them) are considered the same type. All inline tables (regardless of the key-value pairs in them) are considered the same type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pradyunsg "All values that are allowed in key/value pairs are also allowed in arrays..." – That (partial) sentence or something similar is essential because we have to explain which types are allowed in arrays (String, Integer, Float, Boolean, Offset Date-Time, Local Date-Time, Local Date, Local Time, Array, Inline Table) and which are not (Table, Array of Tables), and currently we don't.
It might be possible to rephrase it a bit, say: "An array collects values of any of the types that are allowed in key/value pairs..." But we must be explicit about this, and a backreference to the key/value pairs it much better than having to list all (currently ten) allowed types again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about we provide a separate section for Value Types, so that we can pinpoint those types in the spec? Then we can include a backreference to that section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optionally, we could move the statements about string, array, and inline table types up to the proposed Value Types section. We'd simplify it, though, if we did that.
All string values are considered the same type. All arrays are considered the same type, regardless of content. All inline tables are considered the same type, regardless of content.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having said all that, let me propose the following synthesis of both of your suggestions. We'd also include a backreference from "value type".
Arrays are square brackets with values inside. Whitespace is ignored. Elements
are separated by commas.
All elements within an array must have the same value type.
(Examples go here.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Arrays are square brackets with values inside. Whitespace is ignored. Elements
are separated by commas.All elements within an array must have the same value type.
LGTM.
Thanks @eksortso, @ChristianSi and @workingjubilee for the discussion on this PR. Much appreciated! :) |
I'll work up an amicus brief. 💖 |
I'll support it. But, I'm going to offer a spin that may work for both single-type and heterogeneous varieties. |
Definitely looking forward to this! |
@pradyunsg wrote:
Definitely more readable. Equivalent, too. Consider it done. |
array-values =/ array-floats | ||
array-values =/ array-integers | ||
|
||
array-strings = string ws array-sep ws-comment-newline array-strings |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
array-value-sep = ws array-sep ws-comment-newline
array-value-ending = ws [ array-sep ]
array-strings = string array-value-sep array-strings
array-strings =/ string array-value-ending
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, conditional on a suggested ABNF simplification.
If #676 is merged, the proposed changes to the abnf would be obsolete. However, the changes to the README (update the list of allowed value types which is currently outdated) still remain relevant and are worth rescuing – maybe fork them off into a different PR? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We went the other way, and relaxed this constraint.
Actually, on second thought, I'll go ahead and close this PR -- the other changes aren't exactly related to the original intent of the PR and it'll be "cleaner" to start fresh vs continue work in the this PR. |
@pradyunsg That's alright, mostly. Just as long as it moved one way or another, I'm glad, and I took up the cause to get things moving. But I sincerely ran out of time to work on this proposal. I'll make time to possibly reintroduce some of the other concepts that made their way here. So the branch won't be deleted too soon. |
No worries. I think I can pick it up. Thanks for getting things moving here! :) |
The old list only mentions the obsolete Datetime time instead of the 4 date/time types we have today. Backlinks to each type are added as well. This rescues a change from the abandoned PR toml-lang#663, authored by @eksortso.
This modifies
toml.abnf
to enforce the current specification, which only considers homogeneous arrays to be valid. Per theREADME
, strings are the same type regardless of their syntax, and arrays are the same type even if the values they contain differ in type between them. Empty arrays are still valid.