Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should encode error on base64pad byte strings? #106

Closed
olizilla opened this issue May 10, 2023 · 6 comments
Closed

Should encode error on base64pad byte strings? #106

olizilla opened this issue May 10, 2023 · 6 comments

Comments

@olizilla
Copy link
Member

I've found ipni advertisements with a base64pad string where a base64 string should be...

{
  "ContextID": {
    "/": {
      "bytes": "YmFndXFlZXJhNHZkNXR5Ymd4YXViNGVsd2FnNnY3eWhzd2hmbGZ5b3BvZ3I3cjMyYjdkcHQ1bXFmbW1vcQ=="

It gets dag-json encoded to produce the advertisement CID. I'm working on an encoder for that and was trying to write tests that proved it would produce the same output and encode to the same bytes as the code that generated that thing, but i don't think it's possible as I've stuck to using typedarray values rather than the exciting hand encoded form.

Given the spec:

The Base64 encoding is the one described in RFC 4648, section 4 without padding.
– https://ipld.io/specs/codecs/dag-json/spec/#bytes

Should dag-json error if given a base64pad value for a bytes key to prevent this sort of madness?

olizilla added a commit that referenced this issue May 11, 2023
I'm not wise enough to know if this change is a good idea, but base64pad strings as bytes values change the encoded form and are forbidden per the spec.

Notably it will make some exiting IPNI advertisements unencodable. But in principle, the encoder should probably warn the user when they try an encode an illegal bytes string.

> The Base64 encoding is the one described in [RFC 4648, section 4](https://tools.ietf.org/html/rfc4648#section-4) **without padding.**
> – https://ipld.io/specs/codecs/dag-json/spec/#bytes

fixes: #106

License: MIT
Signed-off-by: Oli Evans <oli@protocol.ai>
@rvagg
Copy link
Member

rvagg commented May 11, 2023

Whoa boy, let me regale you with a tale of joy and woe...

Once upon a time, there was a dag-json spec that said exactly what you've pointed out, and it was implemented strictly in JavaScript. In Go, however, dag-json was incomplete and didn't emit Bytes. But, dag-json was wanted in Go for .. I believe the early version of the indexer. So dag-json was completed in Go:

ipld/go-ipld-prime#165 -> ipld/go-ipld-prime#166

Then, shortly after, it was discovered during the effort to produce cross-language fixutres and testing, that this implementation accidentally used the padded form, rather than the unpadded form it should do according to spec. So that was quickly addressed:

ipld/go-ipld-prime#216

Unfortunately, data produced during the in-between times, was set lose, running wild and free, but could no longer be properly parsed by the newly re-strictified implementation. So this forced a sloppy edge in the implementation that we are now trapped with:

ipld/go-ipld-prime#309

This hasn't made it to the spec, I probably couldn't bring myself to it because I was grumpy about the whole incident after working so hard to nail down dag-json so precisely. But it is what it is, this library needs to accept sloppy base64, the spec needs to be updated to say that decoders should allow it, but encoders shouldn't produce it.

As for why the indexer is producing this, that's a bit of a mystery that I think I'll need to hunt down, because that's kind of odd, I guess they're not using the canonical dag-json codec to encode.

@rvagg
Copy link
Member

rvagg commented May 11, 2023

@olizilla where is the padded form coming from that you pasted above? I thought you were saying this is coming out of ipni but maybe not, I'd like to find out what's producing this and see if we can get it fixed. There was only a brief period between go-ipld-prime v0.10.0 and v0.11.0 that was producing dag-json like this and the JS stack never has.

@rvagg
Copy link
Member

rvagg commented May 11, 2023

Nevermind, I found it after hopping through your PR @ storacha/ipni#1 to https://github.com/ipni/specs/blob/main/IPNI.md#example-responses, which is wrong.
Also test fixtures contain bad dag-json too: https://github.com/ipni/go-libipni/blob/81286e4b32baed09e6151ce4f8e763f449b81331/dagsync/httpsync/sync_test.go#L34
We probably need to get these sorted out.

@rvagg
Copy link
Member

rvagg commented May 11, 2023

No, I take that back, the spec doc is all about json not dag-json; so I don't really understand where dag-json is getting involved. I think they're just accepting plain json ingests using the default Go JSON decoder: https://github.com/ipni/go-libipni/blob/81286e4b32baed09e6151ce4f8e763f449b81331/ingest/model/ingest_request.go#L51 which is kind of quirky because Go has its nonstandard opinions about bytes, which it can do because it has a built-in typed schema, that a field that should decode as bytes will be encoded as padded base64. This is incompatible with dag-json, it's custom Go, so I guess you're going to have to write a custom JSON encoder for it, or find someone who's got a nice JS adapter for JSON.stringify to do Go-specific encoding of Uint8Arrays.

Would be kind of nice if they accepted dag-json for nicer cross-language compatibility though.

@olizilla
Copy link
Member Author

Oof. That is rough. The offending article is the e-ipfs publisher that writes advertisements for multihashes DAG Haus has. It is hairy! And in trying to upgrade it I found this curious method of picking a context ID.

https://github.com/elastic-ipfs/publisher-lambda/blob/main/src/handlers/advertisement.js#L190

And a few lines below it gets dag-json encoded to create the cid for it.

It's annoying for me as I can't fully validate my replacement encoder, but we can fix it the output for new ones at least.

Thanks for the background!

@rvagg
Copy link
Member

rvagg commented May 24, 2023

I don't think this is actionable, unfortunately, unless you want to suggest a "strict" option, or some mode that catches these, although we've not jumped over that line with our other codecs yet so I'm not clear on what the API looks like, probably something like a separate export, @ipld/dag-json/strict to use in place of the normal codec. I'd quite like this for dag-cbor at some point, it's been discussed much and we started work on a strict form in Go that you can opt-in to.

@rvagg rvagg closed this as completed May 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants