feat: add code for UCAN ipld codec #264
Add a code for [UCANs](https://github.com/ucan-wg/spec/)
Drive-by (ignorant) questions:
Hey @lidel I should have included more pointers myself, glad you've asked
I'm posting links to more in-depth answers, but the short version is: the codec uses a dual representation, with CBOR as the primary and JWT bytes (base64 enveloped) as the secondary. Our library will always produce the CBOR variant but will be able to interop with any valid UCAN by representing / parsing it in the secondary representation. https://hackmd.io/@gozala/dag-ucan
@rvagg asked the same question, so I'm getting the impression that there is hesitance to add codes for formats that can be represented by existing codecs. If that is the case, it would be good to document the rationale there, as there may very well be a compelling reason not to. The primary reason for a dedicated code is that we want UCAN CIDs that can be distinguished from arbitrary CBOR without trying to decode one as such, plus the flexibility to upgrade the representation as we learn more from using them. Other than that dag-cbor library very well could be another compound
That comes up quite often in multicodec discussions. The question is what the "content identifier" should be used for. For me it should give a hint on how to decode the data. It should not be about the semantic meaning, or where the data originated from. It's basically the information "how do I get links out of this blob". There was a quite similar case where I responded in longer form: #204 (comment)
The test for a new codec vs dag-cbor should be “does the data roundtrip cleanly through the existing dag-cbor encoder/decoder?” Anything beyond that test gets into very subjective opinions about “what codecs are for.” My understanding, based on the thread so far, is that dag-ucan does NOT roundtrip cleanly through the existing dag-cbor encoder/decoder because of the dual representation it has for compatibility with JWT-based UCANs. So it should get a codec allocation (in a relatively high range).
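The roundtrip test described above can be sketched as a small predicate. This is only an illustration under stated assumptions: the Python standard library has no CBOR codec, so canonical JSON (sorted keys, no whitespace) stands in for dag-cbor, and the names `roundtrips`, `decode_json` and `encode_json` are hypothetical helpers, not part of any IPLD library.

```python
import json
from typing import Any, Callable


def roundtrips(data: bytes,
               decode: Callable[[bytes], Any],
               encode: Callable[[Any], bytes]) -> bool:
    """Return True if decode-then-encode reproduces the exact input bytes."""
    try:
        return encode(decode(data)) == data
    except ValueError:
        # Bytes the decoder rejects cannot roundtrip through it.
        return False


# Stand-in codec (assumption): canonical JSON with sorted keys and no whitespace.
def decode_json(data: bytes) -> Any:
    return json.loads(data.decode("utf-8"))


def encode_json(value: Any) -> bytes:
    return json.dumps(value, sort_keys=True, separators=(",", ":")).encode("utf-8")


# Same data model value, two different byte serialisations.
canonical = b'{"aud":"did:key:bob","iss":"did:key:alice"}'
loose = b'{ "iss": "did:key:alice", "aud": "did:key:bob" }'

canonical_ok = roundtrips(canonical, decode_json, encode_json)
loose_ok = roundtrips(loose, decode_json, encode_json)
```

Under these assumptions `canonical_ok` is true and `loose_ok` is false: both inputs decode to the same value, but only the canonical bytes survive re-encoding unchanged.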
I'm not sure that's something we've agreed to or formalised anywhere, and the main problem with this goes back to the long discussions about schemas and ADLs, where they're solving those kinds of issues at a layer (or three) above the codec. Transformations of data shouldn't really be a codec concern in general. Because we've punted on all of that in the JS stack we don't have very good tools to deal with some of these things, but they're starting to mature and be used in production in the Go stack.

I think the discussions we keep having here about codecs (particularly in the context of the multicodec in CIDs) are more about trying to push back against the use of the codec code as a signalling mechanism for anything other than what function to pass the bytes through to yield the data model. An example would be Filecoin wanting their own codec entry to say that "these bytes are dag-cbor, but they're specifically used within the Filecoin system". So, in the context of UCAN, that might apply if this code is being requested as a mechanism to signal that "these bytes are dag-cbor but will yield a UCAN shaped data structure in the data model". That's not really what CIDs (at least in the limited CIDv1 form) are supposed to do. That kind of signalling is a separate problem that should be solved by other mechanisms within a system (usually that context simply comes from where it's used, e.g. "data linked by this property is always a UCAN" - and schemas help formalise this too).

So, back to the old discussion: do we want a proliferation of codecs because everyone wants a dedicated code to signal that data belongs to their system even though it's all dag-cbor (or whatever), or are we interested in providing solutions above the codec layer? Opening the door to using the codec code in a CID to signal the specific data structure and use of the data, rather than the basic decoder to be used, is going to lead to a lot more codec requests. Perhaps that's OK, but we're going to have to be OK with solving the set of problems that comes with it, like how we get all those codecs working in our various technologies such as go-ipfs, Filecoin and friends (WASM? codec alias lookups? ...?). One of the main drivers behind Schemas (and ADLs) was to shift this problem up a layer.
If I understand correctly, what you're proposing here is that codecs in CIDv1 are basically there to signal the intermediate representation (IR) of the block. Signaling the final representation (at least in JS) will be solved by schemas someday in the future. This is a reasonable position, however as far as I can tell it does not address the case where multiple underlying IRs could be

I am also somewhat doubtful of the proposition that "context simply comes from where it's used". Our context is that we get CAR files from folks with arbitrary blocks. We could decode every single one and then try to match a known set of shapes, but it seems that a cheaper tagging mechanism would be a better option. For what it's worth, I was tempted to tag CBOR-encoded bytes with a multicodec instead to have that second layer of signalling, but that would make it non-CBOR. Maybe there could be that kind of second-layer signaling on top of the IR?
Sorry for the long post (and the related comment in #204 (comment)), I hope it's helpful/clarifying. @Gozala my comments and questions are advisory and meant to be of use to you and your project, not to block the allocation of a high range code. Just trying to help you see the potential landmines along the way and avoid them if easy enough 😄.
Correct. Quoting from the IPLD Specs https://github.com/ipld/ipld/blob/master/docs/codecs/index.md?plain=1#L10-L11 "IPLD codecs are functions that transform IPLD Data Model into serialized bytes so you can send and share data,
@Gozala perhaps a stupid question. Why not propose some codec like

Describing your setup as having two IRs where one of the "IRs" is just base64 encoded bytes feels wrong: it's not really an IR at all, but the base serialized representation. It seems like doing this and then performing validation on top (e.g. using schemas, but whatever works for you) would be straightforward.
Correct, there are other slots in the IPLD stack where such layering could be appropriate. Some examples include:
Note: If I understood how UCANs work correctly then having a
That is more or less what the implementation does: it is effectively two codecs composed. However, not every UCAN can be represented in

That is to say when you do
I am not sure what would be a more accurate way to describe this, but broadly speaking there are two representations: one that retains whitespace, key order, quote types etc., and another that does not. How those two are manifested in practice is probably less important, although I'm happy to learn better ways.
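Why the byte-preserving representation matters can be sketched with a JWT-style token: the signature covers the exact header and payload bytes, so re-serialising the parsed payload invalidates it. This is a minimal illustration, assuming an HMAC-SHA256 shared key as a stand-in for the public-key signatures UCANs actually use; `SECRET` and every function name here are hypothetical.

```python
import base64
import hashlib
import hmac
import json

# Assumption: a shared HMAC key standing in for a real UCAN signature scheme.
SECRET = b"demo-key"


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT segments do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    """Decode an unpadded base64url segment."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign(signing_input: str) -> str:
    mac = hmac.new(SECRET, signing_input.encode("ascii"), hashlib.sha256)
    return b64url(mac.digest())


def verify(token: str) -> bool:
    """Check the signature over the exact 'header.payload' characters."""
    signing_input, _, signature = token.rpartition(".")
    return hmac.compare_digest(sign(signing_input), signature)


def reserialize(token: str) -> str:
    """Parse header and payload, re-emit them as compact sorted-key JSON, keep the signature."""
    header, payload, signature = token.split(".")

    def canon(segment: str) -> str:
        value = json.loads(b64url_decode(segment))
        compact = json.dumps(value, sort_keys=True, separators=(",", ":"))
        return b64url(compact.encode("utf-8"))

    return ".".join([canon(header), canon(payload), signature])


header = b64url(b'{"alg":"HS256","typ":"JWT"}')
# Payload with extra whitespace and non-sorted keys, as an arbitrary issuer might emit.
payload = b64url(b'{ "iss": "did:key:alice", "aud": "did:key:bob" }')
unsigned = f"{header}.{payload}"
token = f"{unsigned}.{sign(unsigned)}"

original_valid = verify(token)
reserialized_valid = verify(reserialize(token))
```

In this sketch the original token verifies and the re-serialised one does not, even though both carry the same claims. That is the situation in which a representation that keeps the original bytes is needed.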
After discussing this yesterday, I went back and changed implementation to make it an ADL that:
The library still provides a codec interface, but it will encode / decode into one or the other representation. Additionally it provides a specialized CBOR codec that basically enforces the schema, and a RAW codec which mostly just provides a UCAN-specific view of the underlying byte array. Overall I think this ended up being an improvement, but here are the pros and cons as I see them
I actually went back and forth on this, so at some points in time I used the same code and at other times different ones. This is actually what swayed me to try the ADL route.
This is an interesting point. I think there is no real reason why the current implementation needs to be tied to CBOR; it could use DAG-JSON just the same.
Closing since I ended up going with the ADL route instead
@Gozala Do you have an example of what the ADL approach for this could look like? |
@oed the readme here https://github.com/ipld/js-dag-ucan attempts to describe it, although I'm not sure this is an ADL in classical terms (which I think are fairly loosely defined). This is how I've described them elsewhere
I said they're probably not ADLs in classical terms because they can't be made codec agnostic; here is some context on why: ipld/js-dag-ucan#23