Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Review & publish please #1

Closed
rvagg opened this issue Oct 20, 2020 · 7 comments
Closed

Review & publish please #1

rvagg opened this issue Oct 20, 2020 · 7 comments

Comments

@rvagg
Copy link
Member

rvagg commented Oct 20, 2020

@Gozala @mikeal would you two mind having a quick review of this and then if it's good to go, publish it @mikeal?

There's 96 different hashers included here so there's an extra build step on top of what we normally have. Two files come out of this blake2b.js and blake2s.js, which contain the actual hashers. They're each named exports on @multiformats/blake2/blake2b and @multiformats/blake2/blake2s, and they also roll up into {blake2b,blake2s} on @multiformts/blake2. Let me know what you think of that approach.

@rvagg
Copy link
Member Author

rvagg commented Oct 20, 2020

Here's consuming it https://github.com/ipld/js-datastore-car/blob/rvagg/storage-iface/example-verify-car.js - this script is a CAR file verifier using the new multiformats and the new block storage abstractions and it verifies Filecoin blocks (DAG-CBOR + BLAKE2b-256).

@Gozala
Copy link

Gozala commented Oct 20, 2020

@rvagg I don't think I have much to contribute here, but here some notes (I don't seems to have push access here so I was unable to create blank branch to create a pull against, so commenting here instead):

Neither imported module appears to have default export so I think this will either error at import tyme or result to undefined exports:

js-blake2/blake2.js

Lines 3 to 6 in 2e7f5fc

import blake2b from '@multiformats/blake2/blake2b.js'
import blake2s from '@multiformats/blake2/blake2s.js'
export { blake2b, blake2s }

Aside: Too bad https://www.sweetjs.org/ did not took off 😞

js-blake2/build.js

Lines 23 to 27 in 2e7f5fc

export const blake2${mode}${bitLength} = from({
name: 'blake2${mode}-${bitLength}',
code: 0x${code.toString(16)},
encode: (input) => bytes.coerce(blake2${mode}(input, null, ${length}))
})`)

Aside: Note on @ts-check comments (for which I bare responsibility), they don't really do anything other than tell vscode try to validate things here. I think it would be amazing if we could replicate this https://github.com/multiformats/js-multiformats/pull/41/files.

FWIW typedefs for js-ipfs was highly request feature which is about to land ipfs/js-ipfs#3281. I'm also don't mind sending a pull request to do add it afterwards.

@Gozala
Copy link

Gozala commented Oct 20, 2020

Off topic: Following lines further reinforces my believe that we do lack a named abstraction for {cid, bytes} pair otherwise those checks here would have been obsolete:

  for await (const { bytes, cid } of reader.blocks()) {
    if (cid.code !== dagCbor.code) {
      console.log('Unexpected codec: %d', cid.code)
      process.exit(1)
    }
    if (cid.multihash.code !== blake2b256.code) {
      console.log('Unexpected multihash code: %d', cid.multihash.code.code)
      process.exit(1)
    }
    // ...
    const obj = dagCbor.decode(bytes)

And the DAGDecoder / concept introduced here was intended to take care of that as well (inlining below)`:

export interface DagDecoder<Codec extends number, Algorithm extends number, T> {
  codec: Decoder<Codec, T>
  hasher: MultihashHasher<Algorithm>

  decode(block: BlockView<Codec, Algorithm, T>): Block<T>

  or<C extends number, A extends number>(other: DagDecoder<Codec | C, Algorithm | A, T>): DagDecoder<Codec | C, Algorithm | A, T>
}

export type BlockView<Codec extends number, Algorithm extends number, T> =
  // In most cases CID for the encoded blocks is known, Which is why it
  // is represented as pair of cid and bytes.
  | { cid: CID, bytes: ByteView<T> }
  // In case where CID is unknown it still can be represented using `code`
  // and `bytes`.
  | { cid?: undefined, code: Codec, alogrithm?: Algorithm, bytes: ByteView<T> }

@rvagg
Copy link
Member Author

rvagg commented Oct 21, 2020

Re @ts-check etc. I actually started this fork with all of your types stuff from js-multiformats, intending to figure out how to make it work. But I got tangled in the mess and then bailed on the whole thing. I'd love if you could help get it into this repo, doing it on this smaller scale would be a good learning experience for me--it's quite complex in js-multiformats!

I've fixed up the main blake2.js file imports (and export) and I've also added blake2s.js and blake2b.js to the repo, just with a note at the top that they are generated.

Regarding {cid, bytes}, I've been having this argument with @mikeal about wanting a concrete, shared, type that encapsulates both things. His preference being that abstractions like CarIterator should rely on convention and just return {cid, bytes} and let consumers wrap them in whatever they like. It boils down to the similar argument about undecorated {encode,decode,code} for codecs. I suppose having a BlockView to wrap the two things is fine, but maybe @mikeal would just prefer I consume js-block for this, but it feels like a very heavy-handed fit when I'm simply doing validation.

I think we're getting closer, but this series of APIs and their connectivity still doesn't feel right and I'm not comfortable with relying on convention so heavily when we have so many separate moving pieces.

@Gozala
Copy link

Gozala commented Oct 21, 2020

Regarding {cid, bytes}, I've been having this argument with @mikeal about wanting a concrete, shared, type that encapsulates both things. His preference being that abstractions like CarIterator should rely on convention and just return {cid, bytes} and let consumers wrap them in whatever they like.

I'm not necessarily disagreeing with @mikeal here. When I say type e.g. type BlockViewWithCID = { cid:CID, bytes: Uint8Array } it does not imply underlying class, but rather anything that matches that shape / convention.

What I meant specifically is that we need the layer that operates on that kind of type (or call it a convention if you will). Because currently we have codecs that do Uint8Array <-> JSValue and I think we need another layer of codecs that operate on { cid, bytes } <-> Block. Those also could be autonomous like multibase codecs are in that they have enough information to decode / encode and either use appropriate codec & hasher or fail with meaningful error on mismatch or lack of necessary codec / hasher.

@Gozala
Copy link

Gozala commented Oct 21, 2020

In other words I think following lines:

async function run () {
  const inStream = fs.createReadStream(process.argv[2])
  const reader = await CarIterator.fromIterable(inStream)
  let count = 0
  for await (const { bytes, cid } of reader.blocks()) {
    if (cid.code !== dagCbor.code) {
      console.log('Unexpected codec: %d', cid.code)
      process.exit(1)
    }
    if (cid.multihash.code !== blake2b256.code) {
      console.log('Unexpected multihash code: %d', cid.multihash.code.code)
      process.exit(1)
    }

    const obj = dagCbor.decode(bytes)
    const reenc = dagCbor.encode(obj)
    const hash = await blake2b256.digest(bytes)
    const recid = CID.create(1, dagCbor.code, hash)

    if (!recid.equals(cid)) {
      console.log(`\nMismatch: ${cid} <> ${recid}`)
      console.log(`Orig:\n${toHex(bytes)}\nRe-encode:\n${toHex(reenc)}`)
    } else {
      if (count++ % 100 === 0) {
        process.stdout.write('.')
      }
    }
  }
}

Would have turned into:

async function run (dagDecoder) {
  const inStream = fs.createReadStream(process.argv[2])
  const reader = await CarIterator.fromIterable(inStream)
  let count = 0
  for await (const blockView of reader.blocks()) {
    try {
     dagDecoder.decode(blockView)
    } catch (error) {
      // Error message already explains the missmatch
      console.error(error)
      process.exit(1)
    }

    if (count++ % 100 === 0) {
      process.stdout.write('.')
    }
  }
}

Note: dagDecoder there is composition of decoders so it can decode blocks in variable encodings

@rvagg
Copy link
Member Author

rvagg commented Oct 21, 2020

but dagDecoder isn't something that exists yet is it?

btw, the purpose of that example came from a desire to check whether blocks in the CAR could properly round-trip, not just that they could be decoded. It was originally for checking Filecoin blocks and included a failure with the genesis block not round-tripping properly, hence all the extra complicated bits (that could probably be explained in the file since it does look over-complicated!).

@rvagg rvagg closed this as completed Oct 24, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants