Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dagcbor: coerce undef to null. #308

Merged
merged 2 commits into from
Dec 7, 2021
Merged

Conversation

warpfork
Copy link
Collaborator

@warpfork warpfork commented Dec 6, 2021

CBOR's "undefined" token -- byte 23; see https://datatracker.ietf.org/doc/html/rfc8949#section-5.7 or https://datatracker.ietf.org/doc/html/rfc8949#section-3.3 table 4 -- is not generally considered part of DAG-CBOR (which is generally a subset of CBOR).

However, it may be desirable to allow parsing data flagged as DAG-CBOR while coercing any "undefined" tokens to a "null", even if considering it noncanonical.

Because DAG-CBOR was not well specified early on, and because libraries existed for a considerable period of time which would readily emit data that we would not call DAG-CBOR, data exists in the wild which encoded these tokens, and links to them with an indicator that it's DAG-CBOR. We may wish to continue allowing these documents to be parsed. (Related: ipld/js-dag-cbor#44)

This does not mean that our DAG-CBOR codec will emit "undefined" tokens. That would not be neither canonical nor valid DAG-CBOR, so we don't want to; and it would also not be possible to describe with this library in the first place (it's not in the IPLD data model; so how would you?).

This does not mean that you can read DAG-CBOR data with this codec and see "undefined" tokens distinctly. Again, they're not in the IPLD data model, so this library API (intentionally) does not have the ability to describe that. With this change, an "undefined" token will be allowed by the parser, but will be described as "null" in terms of the IPLD data model, and not be distinguishable from any other null.

A change to the DAG-CBOR specs may be required if we decide to embrace this behavior and expansion of what we consider non-canonical-but-valid DAG-CBOR.

CBOR's "undefined" token -- byte 23;
see https://datatracker.ietf.org/doc/html/rfc8949#section-5.7
or https://datatracker.ietf.org/doc/html/rfc8949#section-3.3 table 4 --
is not generally considered part of DAG-CBOR (which is generally a
subset of CBOR).

However, it may be desirable to allow parsing data flagged as DAG-CBOR
while coercing any "undefined" tokens to a "null", even if considering it
noncanonical.

Because DAG-CBOR was not well specified early on, and because libraries
existed for a considerable period of time which would readily emit data
that we would not call DAG-CBOR, data exists in the wild which encoded
these tokens, and links to them with an indicator that it's DAG-CBOR.
We may wish to continue allowing these documents to be parsed.

This does *not* mean that our DAG-CBOR codec will emit "undefined" tokens.
That would not be neither canonical nor valid DAG-CBOR, so we don't want to;
and it would also not be possible to describe with this library in the first place
(it's not in the IPLD data model; so how would you?).

This does *not* mean that you can *read* DAG-CBOR data with this codec
and see "undefined" tokens distinctly.  Again, they're not in the IPLD data model,
so this library API (intentionally) does not have the ability to describe that.
With this change, an "undefined" token will be allowed by the parser, but will
be described as "null" in terms of the IPLD data model, and not be distinguishable
from any other null.

A change to the DAG-CBOR specs may be required if we decide to embrace
this behavior and expansion of what we consider non-canonical-but-valid DAG-CBOR.
(Made a one-off test for this.  It should also be a fixture in
the ipld/ipld metarepo, but we don't already have those wired up,
so I'm making the more incremental diff today.)
@willscott
Copy link
Member

noting that the actual logic change appears to be from 2018 in polydawn/refmt#45

@warpfork warpfork merged commit 0d1b9a3 into master Dec 7, 2021
@warpfork warpfork deleted the dagcbor-coerce-undef-to-null branch December 7, 2021 19:24
rvagg added a commit to ipld/js-dag-cbor that referenced this pull request Dec 8, 2021
rvagg added a commit to ipld/js-dag-cbor that referenced this pull request Dec 8, 2021
@aschmahmann aschmahmann mentioned this pull request Dec 8, 2021
80 tasks
rvagg added a commit to ipld/js-dag-cbor that referenced this pull request Dec 13, 2021
For legacy data since this has been allowed by dag-cbor for a long time.
New data should _not_ take advantage of this, `undefined` should not appear
in IPLD data structures as it is not part of the IPLD data model.

Fixes: #44
Ref: ipld/go-ipld-prime#308
rvagg added a commit to ipld/js-dag-cbor that referenced this pull request Dec 13, 2021
For legacy data since this has been allowed by dag-cbor for a long time.
New data should _not_ take advantage of this, `undefined` should not appear
in IPLD data structures as it is not part of the IPLD data model.

Fixes: #44
Ref: ipld/go-ipld-prime#308
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants