Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Links without Tsize #5

Open
achingbrain opened this issue Jan 26, 2021 · 5 comments
Open

Links without Tsize #5

achingbrain opened this issue Jan 26, 2021 · 5 comments

Comments

@achingbrain
Copy link
Member

It looks like this module can generate links without a Tsize property. The protobuf schema allows for this (along with not having a Name or a Hash) but we're previously decided that implementations should generate a value for Tsize.

Would you accept a PR making this field mandatory?

It means you won't be able to do things like:

const node = prepare({
  Links: [
    new CID('Qmfoo')
  ]
})

..but IPFS will not do that for the reasons in the linked thread above.

achingbrain added a commit that referenced this issue Jan 26, 2021
* Splits build step up into build:js and build:types
* Adds ts definitions through jsdoc comments
* Adds `prepare` script to build types before publishing and after `npm i`
* Makes `Tsize` property of DAGLinks mandatory (fixes #5)
achingbrain added a commit that referenced this issue Jan 26, 2021
* Splits build step up into build:js and build:types
* Adds ts definitions through jsdoc comments
* Adds `prepare` script to build types before publishing and after `npm i`
* Makes `Tsize` property of DAGLinks mandatory (fixes #5)
@rvagg
Copy link
Member

rvagg commented Jan 27, 2021

The intention of prepare() was to ensure properly formed but minimally supported data model forms that match the DAG-PB logical format schema: https://github.com/ipld/specs/blob/master/block-layer/codecs/dag-pb.md#logical-format

That schema also matches the new Go codec, https://github.com/ipld/go-codec-dagpb, so they're doing the same thing in accepting a missing Tsize both from decode and from encode.

But we could change that. I don't have any strong opinions other than consistency. We get to choose the lens that we want to view the PB as, and we landed on that schema in the spec after iterating around the various forms of it.

Two options for dealing with it can be found in what we did with Hash and Links:

  • Hash is mandatory in both encode and decode. Even though it's possible in the PB to omit it, we reject links that don't have a Hash because that's just not a link and as far as I'm aware nothing in the wild should be producing links with no Hash field (maybe that's incorrect and we'll need to backtrack on that decision!).
  • Links must always appear in the data model, even though it's optional in PB - there's no distinction between empty array and no array in the PB so we get to choose what it should be so we've said that it must appear in the PB and we make it appear out of thin air when it's not present in the bytes, because it's just nicer to be able to have the assurance that Links is always a thing in the data model rather than having to check whether it's null or not.

So we could choose to do something similar with Tsize. Is there a chance that there's lots of blocks in the wild with a missing Tsize? Because the problem is that then there'll be round-trip consistency issues when you take a block without a Tsize, assign it one and re-encode.

A half-way hack is to make prepare() just insert one but still have encode() (edit:) NOT reject if there isn't one, that's kind of what prepare() is there for. But we lose some symmetry with Go unless we encode these things in the spec.

@rvagg
Copy link
Member

rvagg commented Jan 27, 2021

Maybe a more focused response is - what are the chances that if we remove optional from this that we'll find blocks in the wild that have no Tsize at all? Do we have any code in Go or JS currently that might be producing such blocks? You're much more familiar with DAG-PB in the wild than me.

If we're looking at a near-0% chance, then we could just remove optional and then prepare() here can default to 0 and the encode() will be consistent with what the Go codec can do (once the schema is updated there).

If there's only a slightly above 0% chance, maybe we could forgo the round-trip consistency and say that we interpret a missing one as 0 and if you're dealing with a block then you just don't get the ability to go back to those "incorrect" bytes because we're going to stick a 0 in there for you. Round-trip consistency doesn't usually matter a whole lot in practice, especially if you're dealing with edge-case data.

@rvagg
Copy link
Member

rvagg commented May 17, 2021

maybe we should export a util that can make working with Tsize easier, like what's brought up in ipfs/js-ipfs-repo-migrations#98, prepare and validate could be moved into those as well (tho that'd be a breaking change)

const { totalSize } = require('@ipld/dag-pb/util')

// totalSize(links: PBLink[]): number

const size = totalSize(node.links)

I'm not sure what else would be useful but there's probably some other patterns that could be generalised

@achingbrain
Copy link
Member Author

Is there a chance that there's lots of blocks in the wild with a missing Tsize?

AFAIK IPFS implementations have only ever created links with Tsize fields but the schema allows it so I guess it could be encountered.

@rvagg
Copy link
Member

rvagg commented May 28, 2021

I think the TODO here is to enumerate, or just PR, utility functions that make this all much easier. Such as #5 (reference)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants