Change ABNF array definition to permit single value types only #663

Closed
wants to merge 6 commits into from

Conversation

eksortso
Contributor

@eksortso eksortso commented Sep 3, 2019

This modifies toml.abnf to enforce the current specification, which only considers homogeneous arrays to be valid. Per the README, strings are the same type regardless of their syntax, and arrays are the same type even if the values they contain differ in type between them. Empty arrays are still valid.
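To make the rule in this description concrete, here is a minimal Python sketch of the homogeneity check it describes. It is not the PR's ABNF and not part of any TOML library: `value_type` and `is_homogeneous` are hypothetical names, the check runs on already-parsed Python values rather than on TOML text, and date/time values are left out for brevity.

```python
def value_type(value):
    """Return a coarse TOML value-type name for an already-parsed value.

    Hypothetical helper for illustration; date/time values are omitted.
    """
    # bool is tested before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "Boolean"
    if isinstance(value, str):
        return "String"  # basic, literal, and multi-line strings all land here
    if isinstance(value, int):
        return "Integer"
    if isinstance(value, float):
        return "Float"
    if isinstance(value, list):
        return "Array"  # one type, whatever the nested elements are
    if isinstance(value, dict):
        return "Inline Table"
    raise TypeError(f"unsupported value: {value!r}")


def is_homogeneous(array):
    """True if every element has the same value type; empty arrays pass."""
    return len({value_type(element) for element in array}) <= 1


print(is_homogeneous([]))                 # empty array
print(is_homogeneous([[1, 2], ["a"]]))    # two arrays with different contents
print(is_homogeneous([1, "a"]))           # integer mixed with string
```

Under these assumptions the first two calls report a valid array and the third does not, which matches the three cases named in the description.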

Contributor

@workingjubilee workingjubilee left a comment

I manually tested all the primary permutations of possible conflicts via http://instaparse.mojombo.com/ (remember to set the grammar to :abnf, not :ebnf, if you do the same!). All of the mismatched cases produced parse errors as expected, and the arrays validate when the types match (including when the array contains two arrays, or two inline-tables, each with different values inside). Each test used two values of similar or different types (string and string, but also string and int, string and inline-table, integer and integer, integer and array, and so on).

As far as I can understand the grammar, the grammar itself within this diff seems correct.

This PR correctly implements array typing within the ABNF as far as I can see.

@ehuss

ehuss commented Sep 3, 2019

I don't think this is necessary, and it hinders the readability. It is customary for grammars to not include semantic restrictions like this. Typically languages will define arrays as something like `[` _expr_, ... `]` and leave the type restrictions to the language definition. The ABNF is not intended to be used without reading the spec.

@mmakaay

mmakaay commented Sep 3, 2019

ABNF can be used for many things, one of which could be document validation. What's needed here is a verdict on what the intention for the ABNF is, since this can be anything from basic tokenization to full document validation.

@workingjubilee
Contributor

The intention as specified in PR #236 is as a tool for parsers to reference, and this has been alluded to elsewhere, re: testing document generation or validation, as a reference implementation essentially. It is in fact intended to have a complete match.

@ChristianSi
Contributor

ChristianSi commented Sep 4, 2019

I see one bug in the PR: the spec clearly defines four different date/time types (Offset Date-Time, Local Date-Time, Local Date, Local Time), but the PR erroneously treats them as one, allowing them to be mixed (in contrast to the written spec).

Otherwise I repeat my plea (#553 (comment)) to abolish array typing altogether, but I admit that that's a different issue.

@eksortso
Contributor Author

eksortso commented Sep 4, 2019

I'll argue that on the contrary, all four date-time types are grouped together as a single type, "Datetime":

Values must be of the following types: String, Integer, Float, Boolean, Datetime, Array, or Inline Table. Unspecified values are invalid.

If you can make a strong appeal to enough of the core team, especially @mojombo, that heterogeneous arrays are the way to go, then a counter-proposal would be most welcomed. You have the arguments for it already laid out, and so I say, go for it.

@ChristianSi
Contributor

ChristianSi commented Sep 4, 2019

@eksortso You're right that the written spec says so but that too is clearly a bug. The word "Datetime" is mentioned just that single time in the spec, never again. Instead, four separate date and time-related types are defined farther down. The grouping exists, so far, only in your imagination -- though it might be a way to resolve the contradiction.

As for making a case for heterogeneous arrays, we'll see about that. I admit that I don't even have particularly strong feelings regarding that matter, though I would certainly prefer that mixing integers and floats were allowed (to prevent confusing users with nonintuitive and illogical error messages).

@eksortso
Contributor Author

eksortso commented Sep 4, 2019

@ChristianSi I have to admit, I don't see where the spec explicitly states that each of the date/time formats constitutes its own separate type. Could you provide some clarification, either here or in reference to an issue, old or new?

@workingjubilee
Contributor

This is one of the cases where the spec is vague but I think the intended takeaway is clear: there is one Datetime, and it can be said different ways, and mean different things. This is probably hell on implementation, of course, since we really have something like

date
datetime
datetimetz

so I am not at all against revision or clarification, even if it constitutes an adjustment to implementation, because I suspect that any reconciliation that has been done, on the parser level, between these entities, would have already been made, and so any sensible change we would make will merely formalize what is already out there.

@workingjubilee
Contributor

That said, this thread is about ABNF, TOML, and the handling of array types. Continuing to beg the maintainers for the abolishment of array typing is probably more on topic than discussing the nuances of Datetime's type and what that means. ^_~

@ChristianSi
Contributor

ChristianSi commented Sep 5, 2019

@eksortso:

I have to admit, I don't see where the spec explicitly states that each of the date/time formats constitutes its own separate type.

Though the spec is somewhat vague when it comes to talking about types, I'd say that the TOC is already a strong hint:

  • ...
  • String
  • Integer
  • Float
  • Boolean
  • Offset Date-Time
  • Local Date-Time
  • Local Date
  • Local Time
  • Array
  • Table
  • Inline Table
  • Array of Tables
  • ...

Each of these entries defines a type, though the "Table" and "Array of Tables" types are special -- we might call them structural types instead of value types since they form the basic structure of a TOML document and cannot be used as values (in key/value pairs or arrays), while all other types can.
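The distinction drawn above can be sketched in Python. This is only an illustration under stated assumptions: the names `VALUE_TYPES`, `STRUCTURAL_TYPES`, and `date_time_type` are hypothetical, and mapping the four date/time types onto Python's `datetime`, `date`, and `time` classes is an assumption of this sketch, not something the spec prescribes.

```python
from datetime import date, datetime, time, timezone

# The ten types that may appear as values (in key/value pairs or arrays).
VALUE_TYPES = (
    "String", "Integer", "Float", "Boolean",
    "Offset Date-Time", "Local Date-Time", "Local Date", "Local Time",
    "Array", "Inline Table",
)

# Types that shape the document but cannot be used as values.
STRUCTURAL_TYPES = ("Table", "Array of Tables")


def date_time_type(value):
    """Classify a parsed date/time value into one of the four types.

    Assumption: offset date-times carry tzinfo, local date-times do not.
    """
    # datetime is checked before date because datetime is a subclass of date.
    if isinstance(value, datetime):
        if value.tzinfo is not None:
            return "Offset Date-Time"
        return "Local Date-Time"
    if isinstance(value, date):
        return "Local Date"
    if isinstance(value, time):
        return "Local Time"
    raise TypeError(f"not a date/time value: {value!r}")


print(date_time_type(datetime(2019, 9, 5, 12, 0, tzinfo=timezone.utc)))
print(date_time_type(date(2019, 9, 5)))
```

If the four date/time types are treated as separate, an array holding both a Local Date and a Local Time would count as mixed under the single-type rule.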

The "Array of Tables" section explicitly starts with: "The last type that has not yet been expressed..." (emphasis added).

When we look into the past, we notice that, a long time ago, TOML did indeed have just a single Datetime type. This was changed by PR #414 ("Specify OffsetDateTime, LocalDateTime, LocalDate, and LocalTime types") which eliminated that type and introduced the four different types we have today. Note that the issue title talks about "types". And the PR was a spin-off of #362 ("Treat Dates and Datetimes as separate types"). Separate types, indeed.

That PR was followed a short time later by PR #442 ("Fixed datetime links in table of contents") which eliminated the now-orphaned "Datetime" link from the TOC, replacing it with links to the four different types we have now. But nobody, except you, had noticed the other orphaned "Datetime" mention which has survived, until this day, in the "Key/Value Pair" section. But that was clearly an accident of history, not a deliberate decision.

I rest my case.

@eksortso
Contributor Author

eksortso commented Sep 5, 2019

Well, @ChristianSi. With that degree of prosecutorial zeal, you could have made heterogeneous arrays the standard by now. There's still time for that though.

But more importantly to you, I concede. All of the date/time formats are separate types, and this will be made explicit.

Since the array syntax only addresses value types, I made certain that each type indicated in val in toml.abnf got its due. This does suggest that the best place to enumerate all of the valid types is where value types are first made explicit.

By the way, let's mention #596. I already agreed with you once before, and I only recently remembered.

@ChristianSi
Contributor

ChristianSi commented Sep 6, 2019

@eksortso: That looks good now, but since we're now also cleaning up the written spec I'd suggest one further revision. With your latest changes, the written spec is up-to-date regarding the value types allowed in key-value pairs, but it is not very explicit about what's allowed in an array.

Arrays are square brackets with values inside.

Supposedly, that refers to the values inside a key/value pair, but that should be made explicit. Hence I'd suggest replacing the sentence

Data types may not be mixed (different ways to define strings should be considered the same type, and so should arrays with different element types).

with something like

All values that are allowed in key/value pairs are also allowed in arrays, but all values within an array must have the same data type. Different ways to define strings are considered the same type, and so are arrays with different element types, and inline tables regardless of their contents.

@ChristianSi
Contributor

Also, to remove the conceptual oddity (arrays are brackets??), I'd suggest replacing

Arrays are square brackets with values inside.

with

Arrays are sequences of comma-separated values enclosed in square brackets.

README.md (outdated)
@eksortso
Contributor Author

eksortso commented Sep 6, 2019

Arrays are sequences of comma-separated values enclosed in square brackets.

Not going to use this. It means exactly the same thing as "square brackets with values inside", with additional aspects spelled out in the following sentences.

@eksortso
Contributor Author

eksortso commented Sep 6, 2019

All values that are allowed in key/value pairs are also allowed in arrays, but all values within an array must have the same data type. Different ways to define strings are considered the same type, and so are arrays with different element types, and inline tables regardless of their contents.

I like this, and the link back to the section containing the value type list is a good addition. It's going in.

@eksortso eksortso changed the title Change ABNF array definition to permit single types only Change ABNF array definition to permit single value types only Sep 6, 2019
Member

@pradyunsg pradyunsg left a comment

I think the following ABNF is equivalent and more readable:

array = array-open ws-comment-newline [ array-values ] ws-comment-newline array-close

array-values = array-strings
...

array-strings =  string ws array-sep ws-comment-newline array-strings
array-strings =/ string ws [ array-sep ]
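As a rough illustration of how the two `array-strings` alternatives behave, the following Python sketch recognizes the same shape over a simplified token list. The token names `"STR"` and `","` and the functions `match_array_strings` and `accepts` are assumptions made for this example; whitespace, comments, and newlines (`ws`, `ws-comment-newline`) are deliberately left out.

```python
def match_array_strings(tokens, pos=0):
    """Return the index just past a match of array-strings, or None.

    tokens is a hypothetical pre-tokenized list of "STR" and "," entries.
    """
    # Both alternatives start with a string.
    if pos >= len(tokens) or tokens[pos] != "STR":
        return None
    pos += 1
    if pos < len(tokens) and tokens[pos] == ",":
        # First alternative: string array-sep array-strings (recursive).
        rest = match_array_strings(tokens, pos + 1)
        if rest is not None:
            return rest
        # Second alternative with the optional trailing array-sep.
        return pos + 1
    # Second alternative without a trailing array-sep.
    return pos


def accepts(tokens):
    """True if the whole token list is matched by array-strings."""
    return match_array_strings(tokens) == len(tokens)


print(accepts(["STR", ",", "STR"]))       # two strings
print(accepts(["STR", ",", "STR", ","]))  # trailing separator
print(accepts(["STR", ",", ","]))         # doubled separator
```

The recursion mirrors the first alternative, and the fall-through covers the second, including the optional trailing separator. An empty array is not handled here because the `array` rule already makes `array-values` optional.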

README.md
element types).
are separated by commas.

All values that are allowed in [key/value pairs](#user-content-keyvalue-pair)
Member

That means... all values are allowed?

I'd prefer that we rephrase this paragraph as something along the lines of:

<prev-paragraph> All elements within an array must have the same data type.

All string values are considered the same type. All arrays (regardless of types of elements in them) are considered the same type. All inline tables (regardless of the key-value pairs in them) are considered the same type.

Contributor

@pradyunsg "All values that are allowed in key/value pairs are also allowed in arrays..." – That (partial) sentence or something similar is essential because we have to explain which types are allowed in arrays (String, Integer, Float, Boolean, Offset Date-Time, Local Date-Time, Local Date, Local Time, Array, Inline Table) and which are not (Table, Array of Tables), and currently we don't.

It might be possible to rephrase it a bit, say: "An array collects values of any of the types that are allowed in key/value pairs..." But we must be explicit about this, and a backreference to the key/value pairs is much better than having to list all (currently ten) allowed types again.

Contributor Author

How about we provide a separate section for Value Types, so that we can pinpoint those types in the spec? Then we can include a backreference to that section.

Contributor Author

Optionally, we could move the statements about string, array, and inline table types up to the proposed Value Types section. We'd simplify it, though, if we did that.


All string values are considered the same type. All arrays are considered the same type, regardless of content. All inline tables are considered the same type, regardless of content.

Contributor Author

Having said all that, let me propose the following synthesis of both of your suggestions. We'd also include a backreference from "value type".


Arrays are square brackets with values inside. Whitespace is ignored. Elements
are separated by commas.

All elements within an array must have the same value type.

(Examples go here.)

Member

Arrays are square brackets with values inside. Whitespace is ignored. Elements
are separated by commas.

All elements within an array must have the same value type.

LGTM.

@pradyunsg
Member

Thanks @eksortso, @ChristianSi and @workingjubilee for the discussion on this PR. Much appreciated! :)

@ChristianSi
Contributor

@eksortso:

With that degree of prosecutorial zeal, you could have made heterogeneous arrays the standard by now.

I don't know about that, but here is my attempt: #665. Support (or even disagreement) welcome.

@workingjubilee
Contributor

I'll work up an amicus brief. 💖

@eksortso
Contributor Author

I don't know about that, but here is my attempt: #665. Support (or even disagreement) welcome.

I'll support it. But, I'm going to offer a spin that may work for both single-type and heterogeneous varieties.

@eksortso
Contributor Author

I'll work up an amicus brief.

Definitely looking forward to this!

@eksortso
Contributor Author

@pradyunsg wrote:

I think the following ABNF is equivalent and more readable:

Definitely more readable. Equivalent, too. Consider it done.

array-values =/ array-floats
array-values =/ array-integers

array-strings = string ws array-sep ws-comment-newline array-strings
Member

array-value-sep = ws array-sep ws-comment-newline
array-value-ending = ws [ array-sep ]

array-strings =  string array-value-sep array-strings
array-strings =/ string array-value-ending

Member

@pradyunsg pradyunsg left a comment

LGTM, conditional on a suggested ABNF simplification.

@ChristianSi
Contributor

If #676 is merged, the proposed changes to the abnf would be obsolete. However, the changes to the README (update the list of allowed value types which is currently outdated) still remain relevant and are worth rescuing – maybe fork them off into a different PR?

@pradyunsg
Member

Well, I just merged in #676.

I second the suggestion that 4a72cb0 (and friends) would be good to have. I leave it to @eksortso's judgement on whether to open a new PR or repurpose this one. :)

Member

@pradyunsg pradyunsg left a comment

We went the other way, and relaxed this constraint.

@pradyunsg
Member

pradyunsg commented Nov 7, 2019

Actually, on second thought, I'll go ahead and close this PR -- the other changes aren't exactly related to the original intent of the PR and it'll be "cleaner" to start fresh vs continue work in this PR.

@pradyunsg pradyunsg closed this Nov 7, 2019
@eksortso
Contributor Author

eksortso commented Nov 8, 2019

@pradyunsg That's alright, mostly. Just as long as it moved one way or another, I'm glad, and I took up the cause to get things moving. But I sincerely ran out of time to work on this proposal.

I'll make time to possibly reintroduce some of the other concepts that made their way here. So the branch won't be deleted too soon.

@pradyunsg
Member

No worries. I think I can pick it up. Thanks for getting things moving here! :)

ChristianSi added a commit to ChristianSi/toml that referenced this pull request Nov 8, 2019
The old list only mentions the obsolete Datetime type instead of the 4
date/time types we have today. Backlinks to each type are added as well.

This rescues a change from the abandoned PR toml-lang#663, authored by @eksortso.
@eksortso eksortso deleted the abnf-single-type-arrays branch January 30, 2020 03:13

7 participants