Skip to content
This repository has been archived by the owner on Feb 12, 2024. It is now read-only.

readableStream broken with raw leaves #1432

Closed
parkan opened this issue Jul 11, 2018 · 18 comments
Closed

readableStream broken with raw leaves #1432

parkan opened this issue Jul 11, 2018 · 18 comments
Labels
exp/expert Having worked on the specific codebase is important kind/bug A bug in existing code (including security flaws) P1 High: Likely tackled by core team if no one steps up

Comments

@parkan
Copy link
Contributor

parkan commented Jul 11, 2018

  • Version: v0.30.0
  • Platform: any
  • Subsystem: files

Type: Bug-ish

Severity: Moderate

Description

The browser-readablestream demo is a very nice way of handling large video files for seeking. Unfortunately, it appears to break when the files are added using --raw-leaves (required for filestore/urlstore) because it attempts to unmarshal the leaf nodes as unixfs protobufs.

(confirmed my suspicion about this with @achingbrain)

Steps to reproduce the error:

add video file with --raw-leaves, attempt to load with readableStream

Note

This is required for the IA-over-IPFS demo at DWeb Summit on August 1, let me know how I can help get it working by then

@alanshaw
Copy link
Member

@parkan do you have an error message or stack trace to help us track this one down?

@alanshaw alanshaw added kind/bug A bug in existing code (including security flaws) exp/expert Having worked on the specific codebase is important status/ready Ready to be worked P2 Medium: Good to have, but can wait until someone steps up labels Jul 12, 2018
@parkan
Copy link
Contributor Author

parkan commented Jul 12, 2018

error is:

decode.js:80 Uncaught TypeError: Cannot read property 'length' of undefined
    at Object.decode (decode.js:80)
    at Function.Data.unmarshal (index.js:101)
    at visitor (file.js:88)
    at eval (index.js:41)
    at drain (index.js:19)
    at eval (index.js:45)
    at dag.get (file.js:117)
    at _get (index.js:143)
    at eval (once.js:12)
    at next (waterfall.js:21)

with zdj7Wc9BBA2kar84oo8S6VotYc9PySAnmc8ji6kzKAFjqMxHS (or QmV3voixu7NXC6VChMYC2AmammEj5EBzsPhEKWmavPo47W in v0)

@daviddias
Copy link
Member

daviddias commented Jul 12, 2018

@alanshaw this happens when we try to do a unixfs.decode in unixfs-engine and the node is just a raw-leaf. //cc @achingbrain

@parkan can you write some interop tests that catch this?

@parkan
Copy link
Contributor Author

parkan commented Jul 13, 2018

@diasdavid sure, but I won't have any time before DWeb, can we merge a hotfix and get it covered after aug 3rd?

@daviddias daviddias added P1 High: Likely tackled by core team if no one steps up and removed P2 Medium: Good to have, but can wait until someone steps up labels Jul 14, 2018
achingbrain added a commit to ipfs-inactive/js-ipfs-unixfs-engine that referenced this issue Jul 17, 2018
Goes some way towards fixing ipfs/js-ipfs#1432 - will need follow up PRs for js-ipfs-mfs and js-ipfs itself (🔜).

There are three ways of importing a file we need to support and each will end up with slightly different DAG structure.

ipfs add will result in a balanced DAG with leaf nodes that are unixfs nodes of type file
ipfs files write results in a trickle DAG with leaf nodes that are unixfs nodes of type raw
ipfs add --raw-leaves and ipfs files write --raw-leaves have the balanced/trickle DAG of above, but the leaf nodes are chunks of file data not wrapped in protobufs.
In all cases above the root node is a unixfs file node with a v0 CID, unless you specify --cid-version=1.

This PR:

Changes meaning of existing rawLeaves argument. Now means the leaf node is just data - a chunk of the file, previously it was meant a unixfs node with type raw. So far the only code using this is js-ipfs-mfs so changing it shouldn't be too disruptive.
Adds a leafType option which can be file or raw - when --raw-leaves is false, this is what the unixfs leaf type will be.
Uses CIDv1 for raw leaves with the codec raw
achingbrain added a commit to ipfs-inactive/js-ipfs-unixfs-engine that referenced this issue Jul 17, 2018
Goes some way towards fixing ipfs/js-ipfs#1432 - will need follow up PRs for js-ipfs-mfs and js-ipfs itself (🔜).

There are three ways of importing a file we need to support and each will end up with slightly different DAG structure.

ipfs add will result in a balanced DAG with leaf nodes that are unixfs nodes of type file
ipfs files write results in a trickle DAG with leaf nodes that are unixfs nodes of type raw
ipfs add --raw-leaves and ipfs files write --raw-leaves have the balanced/trickle DAG of above, but the leaf nodes are chunks of file data not wrapped in protobufs.
In all cases above the root node is a unixfs file node with a v0 CID, unless you specify --cid-version=1.

This PR:

Changes meaning of existing rawLeaves argument. Now means the leaf node is just data - a chunk of the file, previously it was meant a unixfs node with type raw. So far the only code using this is js-ipfs-mfs so changing it shouldn't be too disruptive.
Adds a leafType option which can be file or raw - when --raw-leaves is false, this is what the unixfs leaf type will be.
Uses CIDv1 for raw leaves with the codec raw
achingbrain added a commit to ipfs-inactive/js-ipfs-unixfs-engine that referenced this issue Jul 18, 2018
Goes some way towards fixing ipfs/js-ipfs#1432 - will need follow up PRs for js-ipfs-mfs and js-ipfs itself (🔜).

There are three ways of importing a file we need to support and each will end up with slightly different DAG structure.

ipfs add will result in a balanced DAG with leaf nodes that are unixfs nodes of type file
ipfs files write results in a trickle DAG with leaf nodes that are unixfs nodes of type raw
ipfs add --raw-leaves and ipfs files write --raw-leaves have the balanced/trickle DAG of above, but the leaf nodes are chunks of file data not wrapped in protobufs.
In all cases above the root node is a unixfs file node with a v0 CID, unless you specify --cid-version=1.

This PR:

Changes meaning of existing rawLeaves argument. Now means the leaf node is just data - a chunk of the file, previously it was meant a unixfs node with type raw. So far the only code using this is js-ipfs-mfs so changing it shouldn't be too disruptive.
Adds a leafType option which can be file or raw - when --raw-leaves is false, this is what the unixfs leaf type will be.
Uses CIDv1 for raw leaves with the codec raw
@parkan
Copy link
Contributor Author

parkan commented Jul 23, 2018

mfs support for raw-leaves got merged: ipfs-inactive/js-ipfs-mfs#5 🎉

what remains to take this over the finish line?

@achingbrain
Copy link
Member

#1454 needs merging to pull in the new version, currently included in v0.31.0

@parkan
Copy link
Contributor Author

parkan commented Jul 24, 2018

@achingbrain ah nice didn't see that PR somehow, will +1

@alanshaw
Copy link
Member

@parkan #1454 was merged - are you able to sanity check against master to verify this is now working?

@parkan
Copy link
Contributor Author

parkan commented Jul 25, 2018

@alanshaw 👏 👏 👏 will check ASAP, probably first thing tomorrow

@alanshaw
Copy link
Member

credit where it's due - I pressed the button, @achingbrain did the work 👏 👏 👏

@parkan
Copy link
Contributor Author

parkan commented Jul 26, 2018

ok unfortunately I am super pulled in another direction today and can't test this comprehensively (with/without trickle, small/big files, across platforms, etc) but overall this seems to be working well!

only problem encountered so far is mitra reporting an "invalid node type" when using readableStream with small (single block) file @ zb2rhhEncXjn7PnqJ16mzfeug1bqWuupQ3PnkhnWLpAaDatiZ added using the default dag adder (same file added w/trickle @ zdj7Wi4b5djhAqLk5VfeFniwKMpZEFXjWwyiNCrip3keANZao works fine)

obviously trying to stream (or trickle add, for that matter) a single block file doesn't make much sense but maybe this highlights a corner case?

@alanshaw
Copy link
Member

only problem encountered so far is mitra reporting an "invalid node type" when using readableStream with small (single block) file @ zb2rhhEncXjn7PnqJ16mzfeug1bqWuupQ3PnkhnWLpAaDatiZ added using the default dag adder

@achingbrain do you have some time to take a quick look at this?

@achingbrain
Copy link
Member

achingbrain commented Jul 27, 2018

The node added via the trickle importer (zdj7Wi4b5djhAqLk5VfeFniwKMpZEFXjWwyiNCrip3keANZao) is a UnixFS node with one link to a raw leaf zb2rhhEncXjn7PnqJ16mzfeug1bqWuupQ3PnkhnWLpAaDatiZ:

$ ipfs dag get /ipfs/zdj7Wi4b5djhAqLk5VfeFniwKMpZEFXjWwyiNCrip3keANZao
{"data":"CAIY/h8g/h8=","links":[{"Cid":{"/":"zb2rhhEncXjn7PnqJ16mzfeug1bqWuupQ3PnkhnWLpAaDatiZ"},"Name":"","Size":4094}]}

If you try to play that leaf directly (as above) and as such use the ipfs.files.catReadableStream API, js-ipfs will try to use the exporter from js-unixfs-engine to get the bytes, expecting a UnixFS node and not a raw node, hence the invalid node type error.

@mitra42
Copy link

mitra42 commented Jul 27, 2018

Thanks @achingbrain - these are legacy hashes, from code that changed, where coming from a cache of the metadata somewhere completely different - and I'm now invalidating and rereading when I see them. Looks like nothing wrong with IPFS's handling presuming leaves like that shouldnt be playable.

@parkan
Copy link
Contributor Author

parkan commented Jul 27, 2018

yep can confirm that adding that file yields the bare leaf node zb2rhhEncXjn7PnqJ16mzfeug1bqWuupQ3PnkhnWLpAaDatiZ -- is that expected?

@mitra42
Copy link

mitra42 commented Jul 27, 2018

I guess so - that zb2 hash comes from the previous version of urlstore

@parkan
Copy link
Contributor Author

parkan commented Jul 27, 2018

question meant for @achingbrain mostly, adding the file by any means (with --cid-version=1) results in that

@alanshaw
Copy link
Member

Closing this now as 0.31 has been released. If there are any further issues please would you kindly open a new issue for it? Thank you ❤️

@ghost ghost removed the status/ready Ready to be worked label Jul 29, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
exp/expert Having worked on the specific codebase is important kind/bug A bug in existing code (including security flaws) P1 High: Likely tackled by core team if no one steps up
Projects
None yet
Development

No branches or pull requests

5 participants