
Fix for ipfs.files.add + added corresponding tests #104

Closed
wants to merge 3 commits into from

Conversation

@bgrieder commented Apr 1, 2016

This is a small fix to ipfs.files.add() with the corresponding tests.

I noticed that in https://github.com/ipfs/js-ipfs-api#add, the equivalent call accepts a file or an array of files. Should this module follow a similar API?
If yes, I can patch that as well; however, when an array is provided, the API is silent on what happens when one of the adds fails: should it fail fast and stop, or should it continue and report an Error in the returned array for that file, etc.?
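
For illustration, the continue-and-report option could look roughly like this (just a sketch of the idea, not actual module code; addOne stands in for the existing single-file add):

// Sketch: report a per-file Error in the result array instead of failing fast.
function addAll (files, addOne, cb) {
  if (files.length === 0) return cb(null, [])
  const results = []
  let pending = files.length
  files.forEach((file, i) => {
    addOne(file, (err, res) => {
      results[i] = err || res // an Error at index i marks that file's failure
      if (--pending === 0) cb(null, results)
    })
  })
}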

Also, I am willing to work on ipfs.files.cat() and ipfs.files.ls(), but I wonder where I should put the code. For add it resides in https://github.com/ipfs/js-ipfs-data-importing. Should this package contain the exporting code too and be renamed accordingly, or should the functionality be developed on top of the block methods in js-ipfs?

Finally, on a side note, I am pretty excited about what is going on with IPFS. I have been one of the active maintainers of JXTA, which is (was) a full-fledged P2P package in Java. My company is actually running a peer-to-peer network on it in production. I always wanted to do this in JS but never found the time. Kudos.

@jbenet added the backlog label Apr 1, 2016
@nginnever (Member)

Great ideas here @bgrieder! Thanks for putting in time on this. I am working on the same functions.

Something I did not think about was array inputs as well as buffer inputs. The data-importer can handle buffers, so this shouldn't be a problem. I noticed that the go-ipfs CLI will report an error when a bad file path is provided in the array, so the CLI does support multiple files in an array with a propagated error. I imagine the API should behave similarly to js-ipfs-api.

I am also working on the export functions for cat/add/ls. We are currently planning to add the export to the data-importing module as you stated above. My current implementation requires the export functions like this:

const exporter = require('ipfs-data-importing').export

I do think a module rename might be in order, something like 'data-import-export'.

Your test LGTM, but I would add browser tests, as well as a larger file and a nested dir.

Feel free to drop by #ipfs on freenode and ping me, voxelot, if you'd like to work on this with me. I'm hoping to have these commands working this weekend.

@bgrieder (Author) commented Apr 1, 2016

@nginnever Great. I do not want to duplicate efforts, but if I can help in any way, let me know.
We are 9 hours apart, so the best opportunity is probably for me to review/test/add things while you sleep (if you do).

I will add a couple more tests.

@nginnever (Member)

Sounds great! Would love to have an extra pair of eyes and ideas on what I'm writing and I'm sure @diasdavid would appreciate the help in CR of my stuff.

EDIT: Also feel free to improve 'files add' without me. I only did what you did, plus a check to see if it's a path being supplied, which could be done in the 'is-ipfs' module we have.

I'll be working on the exporter function in 'data-importing' for the next 8 hours, and then if you could put some ideas into my js-ipfs fork on my repo, that would be awesome.

When you go to write the browser tests for add in something like test-files-browser.js, I think just doing a buffer input will be good enough until we hear otherwise, since a browser doesn't have an FS to read a file path from. Something like the sketch below, perhaps.
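
A sketch only; it assumes a mocha/chai setup and an ipfs instance created in a before() hook, and the exact result shape is still up in the air:

// Sketch of a buffer-only browser test; assumes `ipfs` was created in a
// before() hook, since there is no FS to read files from in the browser.
const expect = require('chai').expect

describe('files.add (browser)', () => {
  it('adds a buffer', (done) => {
    const buf = new Buffer('hello browser')
    ipfs.files.add(buf, (err, res) => {
      expect(err).to.not.exist
      expect(res).to.exist
      done()
    })
  })
})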

Cheers!

@bgrieder (Author) commented Apr 2, 2016

Added nested directories + large file + buffer tests.
If we do not want to carry the "large" file (213 kB) in Git, we can change the test to download it from somewhere; however, that 'somewhere' must remain stable over time.

I cannot get a browser test going for the moment, because simply re-enabling 'ipfs-data-importing' in the webpack conf of the karma.conf ends up with an illegal attempt to call fs.readFileSync. I have not yet investigated this.

@daviddias (Member)

\o/, more people to help, awesome! 🌟

I noticed that in https://github.com/ipfs/js-ipfs-api#add, the equivalent call accepts a file or an array of files. Should this module follow a similar API ?

Yes, we definitely want js-ipfs-api to provide the same API that ipfs-core inside js-ipfs offers, so that apps can switch between both modules (using a js-ipfs node or a go-ipfs node) without having to introduce many changes. The way we want to achieve that is to have a core-api spec, which is still very much a WIP; until we have it, we should ideally stay as close as possible :)

Also, I am willing to work on ipfs.files.cat() and ipfs.files.ls(), but I wonder where I should put the code. For add it resides in https://github.com/ipfs/js-ipfs-data-importing.

data-importing's goal is to have the primitives of 'importing' and 'exporting' data (in this case, files) in a single module. ls, cat, mkdir, stat, flush and so on are Files API specific functions and should (for now) live inside js-ipfs core.

Finally, on a side note, I am pretty excited about what is going on with IPFS. I have been one of the active maintainers of JXTA, which is (was) a full-fledged P2P package in Java. My company is actually running a peer-to-peer network on it in production. I always wanted to do this in JS but never found the time. Kudos.

That is awesome! You might be interested in libp2p, the network stack that IPFS runs on top of.

As @nginnever mentioned, he has been leading the charge on the Files API and its deps for js-ipfs. @nginnever, it might be good to post an update on #60, so that @bgrieder can get on board quicker and we can parallelize the effort. Sounds good?

@bgrieder (Author) commented Apr 2, 2016

@diasdavid glad I can help. #60 is a good summary and work base.

@nginnever browser tests: the problem lies with the library js-ipfs-unixfs. This library is called by the data importer's importBuffer to create a DAGNode from a Buffer:

const raw = new UnixFS('file', chunk) // wrap the chunk as a unixfs 'file' node
const node = new mDAG.DAGNode(raw.marshal()) // serialize it into a merkledag node

Unfortunately, this library reads the protobuf schema from the file system:

const protobuf = require('protocol-buffers')
const schema = fs.readFileSync(path.resolve(__dirname, 'unixfs.proto'))

Obviously this cannot work in a browser. I opened an issue against that lib: ipfs/js-ipfs-unixfs#5

@nginnever (Member)

@bgrieder thanks for adding more tests and looking into this more! When I asked you to test larger files, it got me investigating larger files in the go-ipfs implementation, and I caught some sharding issues that may require us to rework the import and export logic to handle a merkle tree that doesn't just have an infinite number of leaves on the 2nd level.

@nginnever browser tests: the problem lies with the library js-ipfs-unixfs. This library is called by the data importer's importBuffer to create a DAGNode from a Buffer:

Good catch!

This seems odd to me. You are certainly correct that the schema should not be loadable from fs in the browser, yet when I was testing the bufferImporter here we did not run into any errors. It may be a case of our build system skipping the require if it's not being used. I will need to look into that.

In any case, we have done something as simple as this in the past to handle the protobuf; see the sketch below for one possible shape. If you want, make a PR against the unixfs module to close ipfs/js-ipfs-unixfs#5.
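
A sketch only: protocol-buffers also compiles a schema passed as a string, and the message below is paraphrased from unixfs.proto, so double-check it against the real file.

// Sketch: inline the schema so nothing is read from the file system.
const protobuf = require('protocol-buffers')

const schema = protobuf(`
  message Data {
    enum DataType {
      Raw = 0;
      Directory = 1;
      File = 2;
      Metadata = 3;
      Symlink = 4;
    }
    required DataType Type = 1;
    optional bytes Data = 2;
    optional uint64 filesize = 3;
    repeated uint64 blocksizes = 4;
  }
`)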

@diasdavid, I am going to clean up my local repo and link my fork for #60, or open a PR on js-ipfs to document where I am in the current build, so @bgrieder can take a look and figure out where he can add ideas. Look for that in about an hour or two. I currently have add, cat, and potentially get working today if I can find some time. CR is definitely needed, so I look forward to hearing your thoughts.

@bgrieder (Author) commented Apr 3, 2016

@nginnever js-ipfs-unixfs: I am having problems getting dignified.js to run browser tests. See ipfs/js-ipfs-unixfs#6. I have asked @dignifiedquire for help.

@bgrieder (Author) commented Apr 4, 2016

@nginnever as explained in ipfs/js-ipfs-unixfs#6, the issue with fs.readFileSync should resolve itself once this repo is moved to dignified.js.

@bgrieder (Author) commented Apr 4, 2016

@nginnever I had a look at your repos at https://github.com/nginnever/js-ipfs-data-importing and https://github.com/nginnever/js-ipfs, but I doubt they have your latest commits. I would be glad to test your code by incorporating it into my own project and giving you feedback.

@nginnever (Member)

@bgrieder I decided to go ahead and work on finishing ipfs files get, and now need to start thinking about making the HTTP API match js-ipfs-api, and about the directory exporter. There is also a lot of work to be done polishing what I have started on dirExporter(), and of course writing tests for all of this. Sorry for the delay; I had some travelling to do. Both my data-importing and js-ipfs repos are now updated. A PR will be made when I finish more of the tasks mentioned :D

@bgrieder (Author) commented Apr 4, 2016

@nginnever Great... and my apologies if you felt pressured; that was not my intention.
AFAIK GitHub will not let me fork your fork, so I will not be able to open PRs against it. However, since this PR is basically about tests, I can probably enrich it with tests for your code, which will have to be copy/pasted until everything is merged into the main branch.

@nginnever (Member)

No worries! I really want to work hard on this so it's nice to get some push.

Sorry the code isn't ready for you to fork yet. It's not exactly ready to be made into a PR, but you can see the direction I'm heading. Going to hack on this for the next couple of days, and then you'll be able to fork. I'll put a PR out before I write the tests for get and cat, so you can help me with those if you'd like, as well as with the new data-importer export functions.

@daviddias (Member)

@nginnever as explained in ipfs/js-ipfs-unixfs#6, the issue with fs.readFileSync should resolve itself once this repo is moved to dignified.js.

It is all green now :) The dignified pipeline isn't a 'full buy-in', just a convenience; you are even able to use js-ipfs-unixfs without any bundler at all (which is awesome!).

@daviddias removed the backlog label Apr 4, 2016
@bgrieder (Author) commented Apr 5, 2016

@nginnever I started to look at your code in js-ipfs-data-importing and have a couple of initial comments:

  • import: I am not sure the options parameter makes sense. The only option queried in the code is options.recursive, which must be true when it is a directory and is ignored in all other cases. Since directories are detected by the code, this parameter is redundant.
  • export:
    • I would expect the function signature to take a hash rather than a dagNode as a parameter. This would make it symmetrical with import and would not require the user to know about DAG Nodes. The function would therefore start by pulling the dagNode from the hash, something along the lines of

          const mh = new Buffer(bs58.decode(hash)) // base58 hash -> raw multihash buffer
          dagService.get(mh, (err, node) => {
            if (err) {
              return cb(err)
            }
            // we have the dag node -> rest of the code
          })
      
    • The UnixFS object contains the type. It should therefore not be necessary to ask the user for the type of object they want to pull (e.g. file or directory); see the small sketch after this list.

    • Same thing with the name of the resource, which is part of the DAG Node. Specifying a directory to export the resource to should be enough. I see you are returning a Readable stream. This is another option, and I have no opinion on which is best at this stage (Readable stream or export to a designated directory) (see edit in next post: I do have an opinion).
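
For instance (a sketch; UnixFS.unmarshal comes from ipfs-unixfs, and I am assuming the fetched DAG node exposes its payload as node.data):

// Sketch: recover the type from the node itself instead of asking the user.
const UnixFS = require('ipfs-unixfs')

const data = UnixFS.unmarshal(node.data)
console.log(data.type) // 'file' or 'directory'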

The export signature would thus be something like either

  • export(hash, directory, cb) if exporting to a directory, or
  • export(hash, cb) with the cb receiving an object with a Readable and a name as its second argument.

wdyt?

I have prepared a repo with a small file, a large file and a buffer, which will ease writing tests on export.
Cheers. b.

@bgrieder (Author) commented Apr 5, 2016

@nginnever On second thought, I believe the signature should be something like export(hash, cb) where

  • hash is the hash of the resource
  • cb is a two-argument function
    • err: Error
    • res:
      • stream: a Readable stream
      • size: the file's size (where appropriate)
      • name: the name of the resource (where appropriate)

because

  • this is the only way we can accommodate Buffers
  • returning a stream is the most flexible when handling large files

Converting streams to files would be a job for js-ipfs's ipfs.files.get. A rough sketch of the proposed usage follows.
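
Hypothetical only; exporter and the res shape are the proposal above, not existing code:

// Hypothetical usage of the proposed export(hash, cb) signature.
const fs = require('fs')
const path = require('path')

exporter(hash, (err, res) => {
  if (err) return console.error(err)
  // res.stream is a Readable; here we choose to write it to disk,
  // which is the kind of job ipfs.files.get would take on
  res.stream.pipe(fs.createWriteStream(path.join('/tmp', res.name)))
  console.log('exporting %s (%d bytes)', res.name, res.size)
})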

@nginnever (Member) commented Apr 5, 2016

@bgrieder Thank you for the CR!

import: I am not sure the options parameter makes sense. The only option queried in the code is options.recursive, which must be true

True for go-ipfs AFAIK; the other option there is a chunker, since in the future we will want to support different-sized nodes for different types of files for optimization. I think we should leave an options object for now in case we want to add features later.

I would expect the function signature to take a hash rather than a dagNode as a parameter. This would make it symmetrical with import

That is actually how I had it originally, but we decided at some point that we wanted to leave as much FS manipulation as possible out of the importer/exporter and make its job more concise: chunking and bundling data.

On second thought, I believe the signature should be something like export(hash, cb) where...

All of this is great! I was actually talking with @diasdavid last night about the best way to return the data in the exporter, after realizing that I couldn't do the recursive operations needed to keep track of the files in the dirExporter. I wanted to call the fileExporter every time a file was found in the directory, but this wouldn't allow the streams to be correctly concatenated. We came up with two scenarios. One is where we simply recursively create an array of streams and return that in the callback with a second path variable; this way we can keep track of the paths and know which streams belong to which files and their location in the dir structure.

The above method will buffer a lot of data into memory, so this isn't ideal. The second method will be to use an event emitter in the cli core to listen for incoming files and consume them as they are pulled out of the database; see the sketch below.
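
Roughly like this, perhaps (a sketch of the idea only; the real dirExporter would pull entries out of the dagService rather than an in-memory array):

// Sketch of the event-emitter approach: emit each file as it is pulled
// out, so streams are consumed one at a time instead of buffered in bulk.
const EventEmitter = require('events').EventEmitter
const Readable = require('stream').Readable

function bufferStream (buf) { // helper to fake a file stream for the demo
  const s = new Readable({ read: function () {} })
  s.push(buf)
  s.push(null)
  return s
}

function dirExporter (entries) {
  const emitter = new EventEmitter()
  process.nextTick(() => { // emit after the consumer has attached listeners
    entries.forEach((e) => emitter.emit('file', e.path, e.stream))
    emitter.emit('end')
  })
  return emitter
}

// Consumer side: handle each file as soon as it arrives.
dirExporter([
  { path: 'dir/a.txt', stream: bufferStream(new Buffer('aaa')) },
  { path: 'dir/b.txt', stream: bufferStream(new Buffer('bbb')) }
])
  .on('file', (filePath, stream) => stream.pipe(process.stdout))
  .on('end', () => console.log('done'))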

I have prepared a Repo with a small file, a large file and a buffer which will ease writing tests on export.

I'm going to be hacking these ideas together today and will hopefully have something working tomorrow that I can PR for you to add tests to. Also need some help with the HTTP API side of things. We really want to get these commands running in the browser this week!! :D

Cheers!

@bgrieder closed this Apr 26, 2016
vasco-santos pushed a commit that referenced this pull request Sep 21, 2021
- Adds types
- Converts to ESM
- Uses [ejs](https://www.npmjs.com/package/ejs) for html templating