This repository has been archived by the owner on Mar 10, 2020. It is now read-only.

Lack of hash -> Buffer interface #164

Closed
mitra42 opened this issue Oct 23, 2017 · 7 comments

Comments

@mitra42

mitra42 commented Oct 23, 2017

A noticeable gap in the APIs seems to be taking a hash and getting a buffer back.

The scenario: on the Archive.org gateway, we add a file to IPFS using the https..../api/v0/add interface (working with Kyle) and get back a base58 multihash that could be EITHER
a) the hash of the raw bytes (for a small file), or
b) the IPLD root (for a larger file that got sharded).

The problem is turning that hash back into bytes in the JS. We've been using block.get() to retrieve it, but of course that fails in case b.

Kyle has suggested using files.get(), but this runs into the API limitation that files.cat returns a stream, and I'm guessing that a lot of people using this end up writing code to convert that stream back into a Buffer (so it can be passed to a Blob).

If I understand the code in files.js (and it's possible that I don't), the code in cat() already has the whole file in a Buffer, which it converts into a Stream, which the caller then has to convert back into a Buffer. Wouldn't it make a lot of sense to provide an API that returns it as a Buffer in the first place?

@daviddias
Contributor

If you use the files command (ipfs add) or the files endpoint (/api/v0/add) to add a file, then when you wish to get the file back, you should always use the files commands to retrieve it; that can be either cat or get.

Kyle has suggested using files.get(), but this runs into the API limitation that files.cat returns a stream, and I'm guessing that a lot of people using this end up writing code to convert that stream back into a Buffer (so it can be passed to a Blob).

It is a very common pattern to buffer a stream with a module like http://npmjs.com/bl. Nevertheless, I'm working on exposing two interfaces so that users do not have to. Track here: https://github.com/ipfs/interface-ipfs-core/pulls
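
For anyone following along, a minimal sketch of buffering a stream with the bl module; the helper name and the surrounding usage are illustrative assumptions, not a documented js-ipfs API:

const bl = require('bl')

// Collect a readable stream into a single contiguous Buffer.
// `data` is a BufferList; data.slice() returns a plain Buffer of everything read.
function streamToBuffer (stream, cb) {
    stream.pipe(bl((err, data) => {
        if (err) return cb(err)
        cb(null, data.slice())
    }))
}

// Hypothetical usage with the stream returned by files.cat:
// streamToBuffer(stream, (err, buf) => { /* buf can be passed to a Blob */ })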

If I understand the code in files.js (and it's possible that I don't), the code in cat() already has the whole file in a Buffer, which it converts into a Stream, which the caller then has to convert back into a Buffer.

This is not the case, we stream it all the way.

Wouldn't it make a lot of sense to provide an API that returns it as a Buffer in the first place?

We will have both :) #162

@mitra42
Author

mitra42 commented Oct 30, 2017

Thanks for addressing the buffering issue (I didn't know about bl), and of course the trouble was that writing a pattern new to me (stream buffering) had its own bugs, which were compounded by the nasty bug in js-ipfs's files.get with short (single-block) files.

The other half of this problem is that if you have a CIDv0 (such as returned by /api/v0/add), for example embedded in a string like ipfs/QmTds3bVoiM9pzfNJX6vT2ohxnezKPdaGHLd4Ptc4ACMLa, and you don't know that it was created by "ipfs add", it's very hard to know HOW to fetch it. In fact, due to the bug posted above, there is actually NO way to fetch that particular string in a browser.

I've appended my current code to fetch a URL from IPFS. It is kludge upon kludge to work around that bug and the API weakness of not knowing how to fetch.

async p_rawfetch(url, verbose) {
        /*
        Fetch some bytes based on a url of the form ipfs:/ipfs/Qm..... or ipfs:/ipfs/z....
        No assumption is made about the data in terms of size or structure, nor can we know whether it was created with dag.put or files.add or the http api.

        Where required by the underlying transport it should retrieve a number of its "blocks" and concatenate them.
        Returns a new Promise that resolves currently to a string.
        There may also be a need for a streaming version of this call, at this point undefined since we haven't (currently) got a use case.

        TODO - there is still the failure case of short files like ipfs/QmTds3bVoiM9pzfNJX6vT2ohxnezKPdaGHLd4Ptc4ACMLa
        TODO - added with http /api/v0/add but unretrievable in a browser (it retrieves below in Node).

        :param string url: URL of object being retrieved
        :param boolean verbose: True for debugging output
        :resolve buffer: Return the object being fetched (may in the future return a stream and buffer externally)
        :throws:        TransportError if url invalid - note this happens immediately, not as a catch in the promise
         */
        if (verbose) console.log("IPFS p_rawfetch", url);
        if (!url) throw new Dweb.errors.CodingError("TransportIPFS.p_rawfetch: requires url");
        let cid = (url instanceof CID) ? url : TransportIPFS.url2cid(url);  // Throws TransportError if url bad

        try {
            let res = await this.ipfs.dag.get(cid);
            if (res.remainderPath.length)
                throw new Error("Not yet supporting paths in p_rawfetch"); //TODO-PATH
            let buff;
            if (res.value instanceof DAGNode) { // Case a or b: a file, or something else added with the HTTP API. TODO not yet handling multiple files.
                // We could tell case a from case b by checking (res.value._links.length > 0), but we don't need to:
                // since we don't know whether we are on Node or in a browser, the best approach is to try files.cat
                // and, if that fails, fall back to block.get for an approximate file.
                // Works on Node, but fails on Chrome; can't figure out how to get data from the DAGNode otherwise (it's the wrong size).
                buff = await p_streamToBuffer(await this.ipfs.files.cat(cid), true);
                if (buff.length === 0) {    // Hit the Chrome bug
                    // This will get a file padded with ~14 bytes - 4 at the front, 4 at the end, and can't find the other 6!
                    // But it seems to work for PDFs, which is what I'm testing on.
                    if (verbose) console.log("Kludge alert - files.cat fails in Chrome, trying block.get");
                    let blk = await this.promisified.ipfs.block.get(cid);
                    buff = blk.data;
                }
            } else { // Case c: not a file
                buff = res.value;
            }
            if (verbose) console.log("fetched ", buff.length);
            return buff;
        } catch (err) {
            console.log("Caught misc error in TransportIPFS.p_rawfetch");
            throw err;
        }
    }
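
The p_streamToBuffer helper isn't shown above; a minimal sketch of such a helper, assuming it receives a standard Node.js readable stream (the second, verbose argument is ignored here, and the real one in dweb-transport may differ), would be:

function p_streamToBuffer(stream) {
    // Sketch only - collect all chunks from a Node.js readable stream into one Buffer.
    return new Promise((resolve, reject) => {
        const chunks = [];
        stream.on('data', (chunk) => chunks.push(chunk));
        stream.on('error', reject);
        stream.on('end', () => resolve(Buffer.concat(chunks)));
    });
}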

By the way .. if anyone else is struggling with this, I'll keep a version of p_rawfetch updated to deal with current bugs in IPFS and Kludges around API issues, feel free to contact for a recent version.

@daviddias
Contributor

daviddias commented Oct 30, 2017

The other half of this problem is that if you have a CIDv0 (such as returned by /api/v0/add), for example embedded in a string like ipfs/QmTds3bVoiM9pzfNJX6vT2ohxnezKPdaGHLd4Ptc4ACMLa, and you don't know that it was created by "ipfs add", it's very hard to know HOW to fetch it. In fact, due to the bug posted above, there is actually NO way to fetch that particular string in a browser.

Using the endpoint /api/v0/add will always use ipfs add, see https://ipfs.io/docs/api/#api-v0-add

The golden rule here is: if you are manipulating files (add, cat, get, ls), always use the files APIs; there should be no reason to use the block API.
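
For illustration, a rough sketch of that round trip with the js-ipfs files API; the exact signatures varied between versions at the time, so treat the shapes below as assumptions rather than a reference:

const bl = require('bl')

// Sketch only: add a Buffer via the files API, then read it back via the files API.
async function roundTrip (ipfs) {
    const results = await ipfs.files.add(Buffer.from('hello world'))
    const hash = results[0].hash          // base58 multihash of the file's root node

    // Always read it back through the files API, never block.get.
    const stream = await ipfs.files.cat(hash)
    stream.pipe(bl((err, data) => {
        if (err) throw err
        console.log(data.toString())      // 'hello world'
    }))
}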

@mitra42
Author

mitra42 commented Oct 30, 2017

Hi David - I think I'm not communicating the issues here.

a) You can't use files.add if you aren't working with a file. I've stopped using block.put, but I use dag.put if I'm storing a JSON object (not a file).

b) Once you receive a string of the form shown, you have no way to know whether it was stored via /api/v0/add (i.e. files.add) OR dag.put, and retrieval fails if you use the wrong way to retrieve it.

c) That string was generated via /api/v0/add, so it should be retrievable with js-ipfs files.get, but it fails anyway on Chrome (it works in Node); bug posted in js-ipfs Issue #1049.

d) IF that hash was generated with block.put and you try to retrieve it with dag.get or files.get, it fails in REALLY BAD WAYS: a crash on some other IPFS thread that appears unrelated to the code. (I'm not using block.put any more, but that hash looks like a valid hash to retrieve.)

These cases (except for c, which is presumably just a fixable IPFS bug) can all occur BECAUSE there is no way to determine what kind of IPFS multihash you've got, and no IPFS way to fetch it when all you have is the hash embedded in a URL, hence the kludgy code I posted above.

@daviddias
Contributor

Got it! Thanks for expanding more on the issues that you were pointing out.

a) + b)

That is correct, and I believe it shouldn't be considered a bug; we just need better learning materials.

  • A File is an IPLD graph, and you can add and get files through the Files API.
  • An IPLD node is one single object and can be accessed via the DAG API.
  • A block is a serialized IPLD node.
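
In code, that distinction looks roughly like this; a sketch only, where the dag.put options and the files.add result shape are assumptions based on the js-ipfs API of the time:

// Sketch: a JSON object goes through the DAG API, a file goes through the Files API.
async function examples (ipfs) {
    // An IPLD node: store and fetch a plain object via the DAG API.
    const cid = await ipfs.dag.put({ name: 'example', size: 42 },
                                   { format: 'dag-cbor', hashAlg: 'sha2-256' })
    const res = await ipfs.dag.get(cid)
    console.log(res.value)                // -> { name: 'example', size: 42 }

    // A file: store and fetch through the Files API, which walks the whole
    // IPLD graph (including chunked/sharded files) for you.
    const added = await ipfs.files.add(Buffer.from('file contents'))
    const stream = await ipfs.files.cat(added[0].hash)
}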

c)

Working on it! #162

d)

Agreed, it should provide a better error to the user. Will look into it. It would be stellar if you could open a PR with tests showing those errors.

@mitra42
Author

mitra42 commented Nov 16, 2017

b: I think this should be considered a bug. IPFS talks a lot about self-describing identifiers, but at the basic level the main outward-facing identifier of IPFS does not appear to be self-describing: if you have a CIDv0 /ipfs/Q.... or a CIDv1 /ipfs/z.... then you don't know how to fetch it, because it does not tell you whether it's a File, an IPLD node/DAG, or a Block. You need out-of-band info about what to expect, which is contrary to the essence of something being self-describing.

Without knowing what you have, you have to

  • try to retrieve it with dag.get
    -- (and hope it's not a block, which would crash the IPFS thread in the background so you can't catch it - bug d),
  • then look to see if it's an IPLD node,
  • in which case re-retrieve it with files.cat,
    -- and also apply the partial workaround to the bug in "c" (the corrupt file retrieved coincidentally opens in PDF readers).

My nasty kludgy code to do this is in https://github.com/internetarchive/dweb-transport/blob/master/js/TransportIPFS.js. I'm not proud of it, but it's the only way I've figured out in the JS API to turn an IPFS identifier back into data.

c: I couldn't find any notes in issue #162 as to whether this is addressed, though #162 seems to have progressed to testing.

d: I posted tests on my repo as commented in ipfs/js-ipfs#1049; as mentioned elsewhere, I have zero confidence in finding docs for the clone/edit/test/submit-PR cycle across IPFS's multiple repos.

@daviddias
Contributor

On your comment:

because it does not tell you whether it's a File, an IPLD node/DAG, or a Block. You need out-of-band info about what to expect, which is contrary to the essence of something being self-describing.

You are mixing concepts. It is important to understand that a CID always references a node of a graph (an IPLD node); that node can also be retrieved in its serialized format (a Block). If a node is the root of a graph that represents a file, then you can fetch it as a File.
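
A sketch of that distinction in code, fetching the same CID three ways; the call shapes are assumptions based on the js-ipfs API of the time:

// Sketch: the same CID viewed as an IPLD node, as a Block, and as a File.
async function threeViews (ipfs, cid) {
    // 1. As an IPLD node (the deserialized object).
    const { value } = await ipfs.dag.get(cid)

    // 2. As a Block (the serialized bytes of that single node only).
    const block = await ipfs.block.get(cid)
    console.log(block.data.length)

    // 3. As a File - only meaningful if the node is the root of a file graph;
    //    this walks and concatenates all linked blocks.
    const stream = await ipfs.files.cat(cid)
}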

If you want to bring more of these notes up, I invite you to open an issue on https://github.com/ipfs/notes, since it is a topic that affects more of IPFS than just this interface.

On your main question on this issue:

A noticeable gap in the APIs seems to be taking a hash and getting a buffer back.

The API has been upgraded, see:

Going to close this issue for now; let's track the remaining work in ipfs/js-ipfs#1086.
