Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deviations from the spec in the kdl4j test suite #121

Closed
Tracked by #112
larsgw opened this issue Sep 1, 2021 · 12 comments
Closed
Tracked by #112

Deviations from the spec in the kdl4j test suite #121

larsgw opened this issue Sep 1, 2021 · 12 comments
Labels
bug Something isn't working

Comments

@larsgw
Copy link
Contributor

larsgw commented Sep 1, 2021

When I implemented the kdl4j test suite in kdljs I came across a few deviations from the spec. Since some of them are sensible I wanted to discuss them here.

Multiline comments in node-space

The spec does currently not allow for the following, neither in the grammar or the spec text:

node /* comment */ "arg"

Single-line comments at EOF

The spec currently requires a single-line comment to be ended by a newline, which makes the following invalid (without a trailing newline):

// hi

Esclines between /- and props/values/children blocks

The spec currently only allows for pure whitespace between /- and the thing it is commenting out. Therefore, the following is invalid (I would definitely agree with the spec here, I just added it for completeness):

node /-    \
    "arg"

Escline outside node-space

Esclines can currently only be in node-space, so the following is invalid (again, I agree):

node1
  \// hey
   node2

Large numbers

The question also arose on how to deal with large numbers. Other than considering BigInt and BigDecimal, parsing large integers as Infinity seems relatively fine to me. However, a problem arises when Infinity should be formatted. Should Infinity, -Infinity and NaN be allowed for full float compatibility?

@zkat
Copy link
Member

zkat commented Sep 1, 2021

Multiline comments in node-space
Single-line comments at EOF

These all smell like spec bugs to me. What do you think? Should we fix this in spec, or consider them quirks?

Esclines between /- and props/values/children blocks

I'm on the fence about this one. I feel like we should be treating esclines as if they were single-line spaces? Maybe we should fix the spec here, even though this is a weird corner case.

Escline outside node-space

That doesn't look like a linespace to me. Which makes me realize that /*foo*/ should be valid because multi-line comments are allowed as part of linespace???

Large numbers

I've been thinking about this one a bit. What do you think of saying "All numbers that, when interpreted as 64-bit IEEE 754 floating point numbers MUST be losslessly accepted by KDL parsers. Larger (or "smaller", if negative) numbers MAY be supported by individual KDL implementations, but are not guaranteed to be portable."?

@zkat zkat added the bug Something isn't working label Sep 1, 2021
@hkolbeck
Copy link
Member

hkolbeck commented Sep 1, 2021

Apologies for the misaligned tests, but I'm glad they're exposing some worthwhile corner cases. It's definitely worth some close eyes on the test suite, as it's fallen out of sync with the spec due to my not having time to keep kdl4j up to date. A few folks expressed interest in helping out there and I proposed updating the test suite as the first step, but as far as I can tell nothing has been pushed up so far.

On the large number question, I think it's worth discussing the full gap between a float64 and a BigDecimal, specifically including a mention of underflow.

@zkat
Copy link
Member

zkat commented Sep 1, 2021

@hkolbeck no worries. The spec has actually changed in subtle ways that actually probably caused this, since you wrote these tests. This is to be expected, really.

As far as float64/BigDecimal goes, I'd love to hear more of your thoughts, taking into account what I suggested about large numbers in my post above?

@hkolbeck
Copy link
Member

hkolbeck commented Sep 1, 2021

I'd phrase things something like "Implementations must accept, store, and roundtrip any value representable as a 64 bit IEEE754 floating point number exactly. Any number not representable exactly may be rounded to the nearest representable value or stored and roundtripped exactly at the author's discretion."

That's admittedly a bit clumsy still, and it might be worthwhile to follow with examples of values not representable exactly.

@hkolbeck
Copy link
Member

hkolbeck commented Sep 1, 2021

One wrinkle: I'm not sure how universal language handling of float rounding is, and it may be the specifying nearest-ties-to-even may make implementations in some languages difficult.

@zkat
Copy link
Member

zkat commented Sep 1, 2021

I'd rather not require roundtripping, because a lot of KDL implementations won't necessarily have writers, but I think the direction of requiring that, if they had a writer, they could is a good one.

I'm also kinda unconcerned with "larger numbers" and I think it's 100% fine to say "here be dragons"? Is that bad?

@hkolbeck
Copy link
Member

hkolbeck commented Sep 1, 2021

Good point about not all implementations having writers, though I'd lean toward making that a requirement for those that do. I've definitely been burned by wonky float handling in libraries before, so I'm pretty strongly inclined to be as explicit as can reasonably be done in a few sentences.

zkat added a commit that referenced this issue Sep 1, 2021
@zkat
Copy link
Member

zkat commented Sep 1, 2021

I've started a PR so we can wordsmith: #122

@zkat zkat mentioned this issue Sep 2, 2021
12 tasks
@zkat
Copy link
Member

zkat commented Sep 2, 2021

Ok looking into this more:

node /* comment */ "arg"

This is legal, per the following spec grammar items:

node-space := ws* escline ws* | ws+
...
ws := bom | unicode-space | multi-line-comment
...
multi-line-comment := '/*' (commented-block | multi-line-comment) '*/'

So, that seems fine to me.

@zkat
Copy link
Member

zkat commented Sep 2, 2021

I've fixed the single-line-comment thing in #126

zkat added a commit that referenced this issue Sep 2, 2021
zkat added a commit that referenced this issue Sep 2, 2021
@zkat
Copy link
Member

zkat commented Sep 2, 2021

Alright, with #127 I think I've addressed everything here.

As per #122, we're not going to continue not specifying any representation for numbers, so they're not actually guaranteed to round-trip. That limits the usability of the test suite, but folks can still use the input part to make sure everything is parsing correctly, and just generate their own output relevant to their own parsers/writers.

So, I'm gonna close this :)

@zkat zkat closed this as completed Sep 2, 2021
@larsgw
Copy link
Contributor Author

larsgw commented Sep 2, 2021

This is legal, per the following spec grammar items:

Oh I definitely missed that, my bad. (I thought it was weird that it wasn't allowed, I guess I should have looked better)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants