
Imports: Support imports from IPFS hashes #5290

Closed
legowerewolf opened this issue May 14, 2020 · 25 comments
Labels
cli (related to cli/ dir), suggestion (suggestions for new features, yet to be agreed)

Comments


legowerewolf commented May 14, 2020

The problem
Files at URLs are mutable - they can be changed or deleted at any time, by anyone with access - whether that access is legitimate or not. If the host gets hacked, bad code could be injected for anyone who fetches it.

The solution
Imports from IPFS. IPFS is a content-addressed globally-distributed filesystem. Files are identified by a hash of their contents, so they can never be modified without changing the file identifier. A given hash will always point to the exact same file, forever.

Additionally, as it's globally distributed, the chance of files disappearing forever when they're depended on is nearly zero. No more of this. And none of this.

Implementation options
Right now, the IPFS community seems to be standardizing on ipfs:// as a URI scheme for files on IPFS. We can use that to identify when an import is from IPFS, as opposed to an HTTP(S) import, and act accordingly.

Now that we have the file identifier, there are a few options for fetching it.

  1. Use the local IPFS node. Chances are, if a user wants to import from IPFS, they're already running a node that we can talk to in order to resolve files. IPFS nodes have an HTTP API that typically runs (for localhost) on 127.0.0.1:5001 and allows you to get files that way. An import from ipfs://{hash} can be fetched from http://127.0.0.1:5001/ipfs/{hash} - we can also optionally pin files to the user's node so they're reprovided to others on the IPFS network. Full local HTTP API docs are here.

  2. Use a public IPFS gateway. There are a fair number of them, and known gateways and their status are tracked here. File resolution is always at {gateway}/ipfs/{hash}.

  3. Run a local IPFS node. This would be the most difficult to handle, as it's not just a string manipulation to translate an ipfs:// import into an http(s):// import.

Personally? I recommend using the installed local node and falling back to a select list of public gateways.
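
For illustration, here's a minimal sketch of that resolution order in TypeScript. The port, URL shapes, and gateway list are assumptions based on the conventions above, not anything Deno actually ships:

const GATEWAYS = [
  "http://127.0.0.1:5001", // the user's local node, per the HTTP API convention
  "https://ipfs.io",       // one well-known public gateway, as a fallback
];

async function fetchFromIpfs(specifier: string): Promise<Response> {
  // Turn ipfs://{hash}/optional/path into {gateway}/ipfs/{hash}/optional/path.
  const hashAndPath = specifier.replace(/^ipfs:\/\//, "");
  for (const gateway of GATEWAYS) {
    try {
      const res = await fetch(`${gateway}/ipfs/${hashAndPath}`);
      if (res.ok) return res;
    } catch {
      // Node unreachable or gateway down; try the next one.
    }
  }
  throw new Error(`could not resolve ${specifier} from any known gateway`);
}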

@joshverd

How would you be able to identify which package you are importing if it's just a hash in the import statement?

@legowerewolf (Author)

Doc comments on your deps.ts file? Ideally, the imported module would also have a doc comment at the top identifying itself.

@carsonfarmer

The other option is sticking to a convention? You can reference linked data within an IPFS IPLD DAG structure (the stuff behind the hash) by name. This requires 'wrapping' the IPFS content in an external folder, but this isn't particularly burdensome.

import { Something } from "http://127.0.0.1:5001/ipfs/{hash}/some_package/mod.ts"

@legowerewolf (Author)

That also works, and preserves the core immutability features we're looking for. Although, I think that import string would be better as

import { Something } from "ipfs://{hash}/some_package/mod.ts"

to preserve the fallback options I proposed above.

@bartlomieju added the feat (new feature, which has been agreed to/accepted) label on May 15, 2020

olanod commented May 19, 2020

Was going to create the exact same issue 😅
Another protocol that might be suitable as well is p2p://. I think this feature would be used a lot in tandem with import maps. Later, the community could create CLI tools that make managing the JSON file easier by using some registry that maps package names to hashes, so you could run module-management-tool add library and it would automatically add library with the right hash.

{
   "imports": {
      "library/": "p2p://<hash>"
   }
}
import foo from 'library/foo.js'

And to go the extra mile, that registry could just be another JSON file on IPFS, managed by a decentralized, blockchain-based organization that is economically incentivized to keep a high-quality registry, possibly with different channels like testing and stable, where packages are reviewed carefully to make sure no funny scripts make their way into the stable registry.
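
A rough sketch of what the core of such a tool might look like (the registry layout, file names, and gateway here are all hypothetical):

const GATEWAY = "https://ipfs.io/ipfs"; // any public gateway would do
const REGISTRY_HASH = "<registry_hash>"; // hypothetical name -> hash registry

async function addLibrary(name: string): Promise<void> {
  // The registry is just a JSON file on IPFS mapping package names to hashes.
  const res = await fetch(`${GATEWAY}/${REGISTRY_HASH}/registry.json`);
  const registry: Record<string, string> = await res.json();

  // Rewrite the import map so "library/foo.js" resolves through the hash.
  const importMap = JSON.parse(await Deno.readTextFile("import_map.json"));
  importMap.imports[`${name}/`] = `p2p://${registry[name]}/`;
  await Deno.writeTextFile("import_map.json", JSON.stringify(importMap, null, 2));
}

await addLibrary("library");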


Diomas commented May 25, 2020

IPFS is built on libp2p, which has both Node.js and in-browser support.
So Deno just needs to support that lib, and then it can do a bunch of p2p interactions.


rotemdan commented May 26, 2020

From a security perspective, IPFS is not completely necessary, nor does it actually solve the issue of content-addressed library imports on its own - it provides no guarantee that the requested file internally uses a secure hash scheme to import its own dependencies, and the same goes for those dependencies' own dependencies, and so on.

I think it would also be productive to consider a URI scheme that is independent of any specific protocol (e.g. https/ipfs/p2p/magnet) and instead:

  • Standardize some way to integrate the hash into the URI, e.g.:
protocol://path/some_package@version/module.{hash}.ts
  • Have Deno try to automatically verify the hash, including the hashes of all of the file's internal imports (if provided).
  • Command-line option for a strict secure mode in which every imported file must include a hash (or, at the very least, files that are fetched from external sources).

For compatibility and future proofing: it seems reasonable to use IPFS's multiformat scheme, which allows supporting various kinds of hash algorithms.

Edit:

The filename doesn't necessarily need to embed the hash. This scheme may be as secure (not completely sure, need to consider it further..):

protocol://path/some_package@version/module.ts#cid={hash}

(Using a key=value scheme the # part can be made extensible to allow for other metadata such as digital signatures etc.)
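
For what it's worth, the standard URL API can already parse such a fragment. A small sketch (the key names are just the ones proposed above):

const url = new URL("https://example.com/some_package@1.0.0/module.ts#cid=Qmf8obm7bx...&sig=abc");
// url.hash is "#cid=...&sig=abc"; drop the leading '#' and parse as key=value pairs.
const metadata = new URLSearchParams(url.hash.slice(1));
console.log(metadata.get("cid")); // the content identifier to verify against
console.log(metadata.get("sig")); // room for extra metadata, e.g. a signature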


rotemdan commented May 27, 2020

After thinking about this further, I've found a simple alternative solution that guarantees 100% content-addressed safety and reproducibility for the entire dependency tree:

It would look like this:

protocol://domain/path/name@version/module.ts#lock_cid=QmcRD4wkPPi6dig81r5sLj9Zm1gDCL4zgpEj9CfuRrGbzF

Where lock_cid is a content identifier (basically a hash in a flexible format) for a lock file that would include the hash of the target file (here module.ts) as well as the hashes of all imported files in the entire dependency tree (possibly including imports from the standard library):

The file might look something like this (it could also use JSON etc.; plain lines are shown here only for simplicity):

https://my.website/path/name@version/module.ts Qmf8obm7bxrQS1JnjUniJdibcN2kUJy9zz732sr7o3dxtn
https://my.website/path/name@version/utils.ts Qmeg1Hqu2Dxf35TxDg18b7StQTMwjCqhWigm8ANgm8wA3p
https://my.website/path/name@version/methods.ts QmZfSNpHVzTNi9gezLcgq64Wbj1xhwi9wk4AxYyxMZgtCG

https://someother.website/path/name@version/othermodule.ts QmbKxNNCxBox7Cmv3jiUZbiG3zpzmtnYzVUuKHxfAjvpyH
https://someother.website/path/name@version/othermoduleutils.ts QmPwwoytFU3gZYk5tSppumxaGbHymMUgHsSvrBdQH69XRx

https://deno.land/std@v0.3.0/async/delay.ts QmaLRet8qeYqNaq8xJeiqwjNnukSo3uEA8oWsDLoxxBv4Q
https://deno.land/std@v0.3.0/async/deferred.ts QmWZtn3ahqqpGBBRZqPdthcWz2n1rxc1UuiDoWXrgrHKzZ

...

Since the lock file is content addressed, it can be fetched from anywhere, either from IPFS, or from the web server itself (say in https://my.website/path/name@version/module.ts.lock).

The lock file can also be effectively used as an IPFS-like merkledag (though unlike in IPFS, it doesn't represent a directory structure, but a collection of references to various sources), but since all the references use cids, they can all be potentially fetched from IPFS (and in parallel, which may also improve performance).

Technically, if some of the imports in the dependency tree already refer to their own lock file, then it may not be strictly necessary to include them in the lock file. However, since the storage requirements of a structure like this are relatively minimal by modern standards, it may be better not to rely on multiple sources and to include everything in one file (even if technically redundant).


srdjan commented May 27, 2020

ha! I had similar thoughts here: denoland/dotland#406 (comment)


rotemdan commented May 28, 2020

@srdjan
Interesting you were going the same way.

I've modified this approach through several iterations and eventually came to a solution that doesn't actually require library authors to know anything about the existence of a lock file or even the hashing scheme. I'll try to clarify some aspects of the design:

The hash part (#) is never actually sent to the server; it is only used locally:

https://domain/path/name@version/module.ts#lock_cid=QmcRD4wkPPi6dig81r5sLj9Zm1gDCL4zgpEj9CfuRrGbzF

The lock file is individual to the dependency tree of a particular ts (or js) file (there is no consideration of a directory structure here), but by convention, for the above URI, one of its search locations might be:

https://domain/path/name@version/module.ts.lock

However, it doesn't have to live there. It can be stored locally or fetched from a p2p network like IPFS.

As an importer of a third-party ts or js file, you would be able to produce, by yourself, a lock file for that particular import and put it practically wherever you want.


rotemdan commented May 28, 2020

Alright, after considering it even more, I've realized that the lock file isn't even strictly necessary to be stored anywhere. Instead, it can be regenerated purely based on the content and structure of the dependency tree for verification purposes.

It's actually pretty simple:

Say I want to import this URI, but I also want a strong proof that ensures that what I get is always the same:

import * as mod from "https://example.com/path/name@version/module.ts"

I use a tool that walks the dependency tree of module.ts in some deterministic order and records the URIs and hashes of all the files it finds and puts the result in some file called module.ts.lock. I annotate the hash of that file into the link like this:

import * as mod from "https://example.com/path/name@version/module.ts#lock_cid=QmaLRet8qeYqNaq8xJeiqwjNnukSo3uEA8oWsDLoxxBv4Q"

Now whenever someone encounters this annotated link, they have two options:

  1. Try to fetch the lock file from somewhere (server, local directory, IPFS) and use it for verification.
  2. Recreate the same process I did (regenerate the lock file) and see if the hash they get is the same.

That's pretty much it.
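
To make that concrete, here's a very rough sketch of the walk-and-record step, assuming plain SHA-256 hex digests in place of real CIDs and a naive regex import scanner (a real tool would use a proper parser and the multiformats CID encoding):

const IMPORT_RE = /from\s+["']([^"']+)["']/g;
const decoder = new TextDecoder();

async function sha256Hex(data: Uint8Array): Promise<string> {
  const digest = await crypto.subtle.digest("SHA-256", data);
  return [...new Uint8Array(digest)]
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
}

async function buildLockFile(entry: string): Promise<string> {
  const seen = new Map<string, string>(); // URI -> hash of its contents
  const queue = [entry];
  while (queue.length > 0) {
    const uri = queue.shift()!;
    if (seen.has(uri)) continue;
    const body = new Uint8Array(await (await fetch(uri)).arrayBuffer());
    seen.set(uri, await sha256Hex(body));
    // Queue every import, resolved relative to the importing file.
    for (const match of decoder.decode(body).matchAll(IMPORT_RE)) {
      queue.push(new URL(match[1], uri).href);
    }
  }
  // Sort by URI so the same tree always produces byte-identical output.
  return [...seen.entries()].sort().map(([u, h]) => `${u} ${h}`).join("\n") + "\n";
}

const lock = await buildLockFile("https://example.com/path/name@version/module.ts");
console.log(await sha256Hex(new TextEncoder().encode(lock)));
// That last digest is what would go in the import's #lock_cid=... annotation.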

@legowerewolf (Author)

As an update, this happened. Chrome extended support for custom protocol handlers to include, among others, support for ipfs:// and ipns://.

@bartlomieju added the cli (related to cli/ dir) and suggestion (suggestions for new features, yet to be agreed) labels and removed the feat (new feature, which has been agreed to/accepted) label on Nov 18, 2020

RobotSail commented Mar 14, 2022

Going off of some comments that have been posted in this thread, would it make sense for there to be a convention that utilizes IPNS as part of its base path to host modules?
Since the base ID for an IPNS directory is just the multihash of a public key, this could be used to create signatures for the modules to verify the signing authority.

So someone responsible for creating a module could host the directory at ipns/<foo_pubkey_hash> with the following structure:

pubkey.txt
modules/
    foo_module/
        index.ts
        index.ts.sig
    bar_module/
        index.ts
        index.ts.sig

Then the directory could be declared from within a package.json, and the necessary import logic could be handled by the runtime:

{
    "imports": {
        "alias_name": {
            "cid": "ipns/<pubkey_hash>",
            "modules": ["foo_module", "bar_module"] 
        }
    }
}

The files themselves could then be imported as such:

import React from "ipfs:alias_name/foo_module"

The runtime could check the signatures of the modules being requested for import and error if they don't line up, or if the pubkey doesn't match its IPNS hash.
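
A small sketch of the verification half of that, assuming the files have already been fetched and that index.ts.sig holds a raw Ed25519 signature over the module bytes (both assumptions; Deno's WebCrypto does support Ed25519):

async function verifyModule(
  moduleBytes: Uint8Array,   // contents of foo_module/index.ts
  signature: Uint8Array,     // contents of foo_module/index.ts.sig
  rawPublicKey: Uint8Array,  // decoded contents of pubkey.txt
): Promise<boolean> {
  const key = await crypto.subtle.importKey(
    "raw",
    rawPublicKey,
    { name: "Ed25519" },
    false,
    ["verify"],
  );
  // Checking that rawPublicKey actually hashes to the IPNS name is omitted here.
  return await crypto.subtle.verify({ name: "Ed25519" }, key, signature, moduleBytes);
}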


jeff-hykin commented Mar 19, 2022

For me, it seems like IPNS would kind of defeat the purpose of IPFS for Deno.

I'd like the hash to be in the import statement, and the URL to always point to the same content, instead of needing a separate package.json file to map names to hashes. At that point, it just seems like a package lock file for a regular website whose content can change.


iacore commented Mar 20, 2022

IPFS uses multiaddr, which is incompatible with URL. /ipfs/Qmf8obm7bxrQS1JnjUniJdibcN2kUJy9zz732sr7o3dxtn is an example multiaddr.

@RangerMauve

Just wanted to chime in that it would be nice if we could have a way to add new protocol schemes for imports (or maybe for fetch as well from within code).

I'm the main developer of Agregore, a p2p web browser that supports stuff like IPFS. I'd love it if we could reuse modules published for p2p web browsers like Agregore within Deno.

In Agregore, we've extended the browser's protocol handlers to support ipfs/ipns/etc not just for reading with GET, but also for writing with PUT.

On desktop we're doing this via the protocol API provided by Electron, and via some C++ changes on mobile.

Similarly, the Node.js VM API enables us to provide custom "linkers" which can enable customizing module resolution for custom protocols (something I've played around with in webrun).

Having a way to dynamically add protocol scheme handlers in Deno would make it easier to integrate stuff like IPFS in user-land without needing large dependencies.

@legowerewolf (Author)

Actually, yeah, I think that would be a good way to go about it. If you could register new protocol schemes for Deno (probably with a Deno-namespaced function), you could support any protocol you wanted. If this registration also added support for those protocols to the fetch function, it'd be even more useful.
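
To sketch the shape such an API might take (purely hypothetical; no registerProtocol function exists in Deno):

// Purely hypothetical API shape; nothing like this exists in Deno today.
type ProtocolHandler = (url: URL) => Promise<Response>;
declare function registerProtocol(scheme: string, handler: ProtocolHandler): void;

// With something like this, IPFS support could live entirely in user land,
// and both imports and fetch() of ipfs:// URLs would hit the handler:
registerProtocol("ipfs", (url) =>
  fetch(`https://ipfs.io/ipfs/${url.href.slice("ipfs://".length)}`),
);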


jeff-hykin commented Jun 28, 2022

Having a way to dynamically add protocol scheme handlers in Deno

While that would be a good start, and not something I'm opposed to, I think it would be a mistake for IPFS to not have built-in support.

For example:

  • Consider an important script that has a normal URL import to a color package
  • The important script gets used inside many apps/tools/websites
  • The color package URL goes down permanently (maintainer loses the domain)
  • I host a copy of the color package, but must do it on a different URL
  • People try to run their apps/tools
  • All of them break because they can't import the color package. A new version of the important script must be created and published, the devs/maintainers of the app must do the work to update all their apps/tools, and then all the downstream users need to get the latest version even if it is a breaking change for them.
  • If/When I lose my domain (no one lives/pays forever), then the whole process repeats. A ton of maintenance without any actual code/features being added.

IPFS solves that problem (anyone could host the color package source code and the scripts could find it).

But,

If the only IPFS support is through custom protocols, my understanding is we're going to need a non-IPFS import before we can use IPFS imports. I.e., it's still centralized with a possible single point of failure, unless we go with the primitive solution of pasting the entire custom protocol library at the top every time.

TL;DR: decentralization/stability only works if IPFS is natively supported.


legowerewolf commented Jun 29, 2022

Yeah. It'd be good if it was built-in. It might not even be that hard - has anyone tried running js-ipfs inside Deno? It might just work. Sure, implementing support for talking to a user's existing node would be good, but I'm pretty sure the JS version is stable enough for our needs here.

EDIT: Looks like the esm.sh transpilation CDN is having issues at the moment. The following should work, but it doesn't.

import * as IPFS from "https://esm.sh/ipfs-core@0.15.4";

const node = await IPFS.create();

const stream = node.cat("QmPChd2hVbrJ6bfo3WBcTW4iZnpHm8TEzWkLHmLpXhF68A");
const decoder = new TextDecoder();
let data = "";

for await (const chunk of stream) {
	// chunks of data are returned as a Uint8Array, convert it back to a string
	data += decoder.decode(chunk, { stream: true });
}

console.log(data); // should output "Hello, <YOUR NAME HERE>"


jeff-hykin commented Jun 29, 2022

but I'm pretty sure the JS version is stable enough for our needs here.

I've been getting familiar with running Deno in Rust, so I was thinking about exploring implementing it on the backend. The IPFS client for rust still needs some work though. The JavaScript client is probably the faster way. I might attempt to bundle it through parcel to esm.

As a start, Deno could just check for files under $HOME/.ipfs (or wherever the path to the store is).

@RangerMauve

Since deno hasn't got the infrastructure for custom protocols yet, I've started sketching up a tool based on Node.js in the meantime. 😁

https://github.com/AgregoreWeb/agregore-cli/blob/main/test.js#L34

With this it should be possible to run modules from IPFS and to customize the APIs available to them. (at least to start experimenting).

Since it's only providing web APIs, any scripts written for it should be portable to Deno down the line.

@rafrafek

In Python, when installing dependencies, you can specify hashes:

requests==2.31.0 ; python_full_version >= "3.12.0" and python_full_version < "3.13.0" \
    --hash=sha256:58cd2187c01e70e6e26505bca751777aa9f2ee0b7f4300988b709f44e013003f \
    --hash=sha256:942c5a758f98d790eaed1a29cb6eefc7ffb0d1cf7af05c3d2791656dbd6ad1e1

Poetry generates a lock file with hashes by default, and if you export dependencies to a requirements.txt file, it will include hashes as well.

That way you can be sure nobody tampered with the content of your dependency module.

@lucacasonato (Member)

We are not going to do this.

@lucacasonato closed this as not planned on Sep 4, 2024

mhanuszh commented Sep 4, 2024

We are not going to do this.

Sad, but understandable


jeff-hykin commented Sep 5, 2024

We are not going to do this.

I'm also sad, but I appreciate the clear communication.
