-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Should encode error on base64pad byte strings? #106
Comments
I'm not wise enough to know if this change is a good idea, but base64pad strings as bytes values change the encoded form and are forbidden per the spec. Notably it will make some exiting IPNI advertisements unencodable. But in principle, the encoder should probably warn the user when they try an encode an illegal bytes string. > The Base64 encoding is the one described in [RFC 4648, section 4](https://tools.ietf.org/html/rfc4648#section-4) **without padding.** > – https://ipld.io/specs/codecs/dag-json/spec/#bytes fixes: #106 License: MIT Signed-off-by: Oli Evans <oli@protocol.ai>
Whoa boy, let me regale you with a tale of joy and woe... Once upon a time, there was a dag-json spec that said exactly what you've pointed out, and it was implemented strictly in JavaScript. In Go, however, dag-json was incomplete and didn't emit Bytes. But, dag-json was wanted in Go for .. I believe the early version of the indexer. So dag-json was completed in Go: ipld/go-ipld-prime#165 -> ipld/go-ipld-prime#166 Then, shortly after, it was discovered during the effort to produce cross-language fixutres and testing, that this implementation accidentally used the padded form, rather than the unpadded form it should do according to spec. So that was quickly addressed: Unfortunately, data produced during the in-between times, was set lose, running wild and free, but could no longer be properly parsed by the newly re-strictified implementation. So this forced a sloppy edge in the implementation that we are now trapped with: This hasn't made it to the spec, I probably couldn't bring myself to it because I was grumpy about the whole incident after working so hard to nail down dag-json so precisely. But it is what it is, this library needs to accept sloppy base64, the spec needs to be updated to say that decoders should allow it, but encoders shouldn't produce it. As for why the indexer is producing this, that's a bit of a mystery that I think I'll need to hunt down, because that's kind of odd, I guess they're not using the canonical dag-json codec to encode. |
@olizilla where is the padded form coming from that you pasted above? I thought you were saying this is coming out of ipni but maybe not, I'd like to find out what's producing this and see if we can get it fixed. There was only a brief period between go-ipld-prime v0.10.0 and v0.11.0 that was producing dag-json like this and the JS stack never has. |
Nevermind, I found it after hopping through your PR @ storacha/ipni#1 to https://github.com/ipni/specs/blob/main/IPNI.md#example-responses, which is wrong. |
No, I take that back, the spec doc is all about json not dag-json; so I don't really understand where dag-json is getting involved. I think they're just accepting plain json ingests using the default Go JSON decoder: https://github.com/ipni/go-libipni/blob/81286e4b32baed09e6151ce4f8e763f449b81331/ingest/model/ingest_request.go#L51 which is kind of quirky because Go has its nonstandard opinions about bytes, which it can do because it has a built-in typed schema, that a field that should decode as bytes will be encoded as padded base64. This is incompatible with dag-json, it's custom Go, so I guess you're going to have to write a custom JSON encoder for it, or find someone who's got a nice JS adapter for Would be kind of nice if they accepted dag-json for nicer cross-language compatibility though. |
Oof. That is rough. The offending article is the e-ipfs publisher that writes advertisements for multihashes DAG Haus has. It is hairy! And in trying to upgrade it I found this curious method of picking a context ID. https://github.com/elastic-ipfs/publisher-lambda/blob/main/src/handlers/advertisement.js#L190 And a few lines below it gets dag-json encoded to create the cid for it. It's annoying for me as I can't fully validate my replacement encoder, but we can fix it the output for new ones at least. Thanks for the background! |
I don't think this is actionable, unfortunately, unless you want to suggest a "strict" option, or some mode that catches these, although we've not jumped over that line with our other codecs yet so I'm not clear on what the API looks like, probably something like a separate export, |
I've found ipni advertisements with a base64pad string where a base64 string should be...
It gets dag-json encoded to produce the advertisement CID. I'm working on an encoder for that and was trying to write tests that proved it would produce the same output and encode to the same bytes as the code that generated that thing, but i don't think it's possible as I've stuck to using typedarray values rather than the exciting hand encoded form.
Given the spec:
Should dag-json error if given a base64pad value for a bytes key to prevent this sort of madness?
The text was updated successfully, but these errors were encountered: