
Content Type set by HTTP Gateway #152

Open
lidel opened this issue Sep 4, 2019 · 8 comments

lidel (Member) commented Sep 4, 2019

The HTTP Gateway determines content-type by sniffing (based on golang.org/src/net/http/sniff.go) and by file extension. js-ipfs uses a similar setup.
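
A minimal Go sketch of that two-step lookup, assuming the extension table is consulted first and byte sniffing is the fallback. It uses the standard library's mime.TypeByExtension and http.DetectContentType; the helper name contentTypeFor is hypothetical, and this is not the actual go-ipfs gateway code.

```go
package main

import (
	"fmt"
	"mime"
	"net/http"
	"path"
)

// contentTypeFor is a hypothetical helper: it tries the file extension
// first and falls back to sniffing the content bytes.
func contentTypeFor(name string, data []byte) string {
	// Extension-based lookup; returns "" for unknown or missing extensions.
	if ct := mime.TypeByExtension(path.Ext(name)); ct != "" {
		return ct
	}
	// Byte sniffing as implemented in net/http (inspects at most 512 bytes).
	return http.DetectContentType(data)
}

func main() {
	svg := []byte(`<svg xmlns="http://www.w3.org/2000/svg"></svg>`)
	names := []string{
		"ipfs-logo.svg",
		"ipfs-logo.foo",
		"QmTqZhR6f7jzdhLgPArDPnsbZpvvgxzCZycXK7ywkLxSyU",
	}
	for _, name := range names {
		fmt.Printf("%s -> %s\n", name, contentTypeFor(name, svg))
	}
}
```

Because net/http's sniffer has no SVG signature, the same SVG bytes come out as image/svg+xml under a .svg name (with a typical mime table) but as text/plain; charset=utf-8 under an unknown extension or a bare CID.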

Problem: there is no mechanism for a website creator to override the returned content-type; setting a custom file extension works only for some file types.

Example

The same data produces a different content-type depending on the request path.

SVG image

https://ipfs.io/ipfs/QmVdFJJBiQkVKFcvXu4WzySbZ7KnCW6uGWLJqZz5FnRWjk/ipfs-logo.svg
→ returned as image/svg+xml

XML document

https://ipfs.io/ipfs/QmVdFJJBiQkVKFcvXu4WzySbZ7KnCW6uGWLJqZz5FnRWjk/ipfs-logo.xml
→ returned as text/xml

Unknown extension

https://ipfs.io/ipfs/QmVdFJJBiQkVKFcvXu4WzySbZ7KnCW6uGWLJqZz5FnRWjk/ipfs-logo.foo
→ returned as text/plain

Raw CID

https://ipfs.io/ipfs/QmTqZhR6f7jzdhLgPArDPnsbZpvvgxzCZycXK7ywkLxSyU
→ returned as text/plain

Raw CID + explicit filename

https://ipfs.io/ipfs/QmTqZhR6f7jzdhLgPArDPnsbZpvvgxzCZycXK7ywkLxSyU?filename=/ipfs-logo.svg
→ returned as image/svg+xml
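
The five cases above differ only in which name the gateway derives an extension from. A hedged Go sketch of that selection step, assuming the ?filename= query parameter takes precedence over the last path segment; nameForSniffing is a hypothetical helper, not a go-ipfs function.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
)

// nameForSniffing picks the name whose extension drives the content-type
// lookup: the ?filename= parameter if present, otherwise the last path segment.
func nameForSniffing(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	if fn := u.Query().Get("filename"); fn != "" {
		return path.Base(fn), nil
	}
	return path.Base(u.Path), nil
}

func main() {
	urls := []string{
		"https://ipfs.io/ipfs/QmVdFJJBiQkVKFcvXu4WzySbZ7KnCW6uGWLJqZz5FnRWjk/ipfs-logo.svg",
		"https://ipfs.io/ipfs/QmVdFJJBiQkVKFcvXu4WzySbZ7KnCW6uGWLJqZz5FnRWjk/ipfs-logo.xml",
		"https://ipfs.io/ipfs/QmVdFJJBiQkVKFcvXu4WzySbZ7KnCW6uGWLJqZz5FnRWjk/ipfs-logo.foo",
		"https://ipfs.io/ipfs/QmTqZhR6f7jzdhLgPArDPnsbZpvvgxzCZycXK7ywkLxSyU",
		"https://ipfs.io/ipfs/QmTqZhR6f7jzdhLgPArDPnsbZpvvgxzCZycXK7ywkLxSyU?filename=/ipfs-logo.svg",
	}
	for _, raw := range urls {
		name, err := nameForSniffing(raw)
		if err != nil {
			panic(err)
		}
		// An empty extension means the gateway can only fall back to sniffing.
		fmt.Printf("%s ext=%q\n", name, path.Ext(name))
	}
}
```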

Motivation

We want IPFS to become a viable solution for hosting websites.
At the HTTP level, as a bare minimum, website owners expect to be able to override:

  • content-type of specific files / file types
  • error pages (4xx, 5xx)

Ideas to explore

(A) Embedding content-type in DAG-PB (UnixFS metadata)

One way to address this is to support embedding Content-Type in UnixFS DAG metadata.
It would be opt-in (like mode and mtime).
It is TBD whether the filename should override the content type embedded in the DAG.

This is tracked in ipfs/specs#364
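
To make the open question concrete, here is a Go sketch of an opt-in content-type field next to mode and mtime. The FileMeta struct and its MimeType field are hypothetical (no such field exists in UnixFS today), and the precedence order (embedded type, then extension, then sniffing) is just one possible answer to the TBD above, not a decision.

```go
package main

import (
	"fmt"
	"mime"
	"net/http"
	"path"
	"time"
)

// FileMeta is a hypothetical stand-in for UnixFS metadata. Mode and MTime
// mirror the existing opt-in fields; MimeType is the proposed, equally
// optional addition (an empty string means "not set").
type FileMeta struct {
	Mode     uint32
	MTime    *time.Time
	MimeType string
}

// resolveContentType applies one possible precedence order:
// embedded type > file extension > byte sniffing.
func resolveContentType(meta FileMeta, name string, data []byte) string {
	if meta.MimeType != "" {
		return meta.MimeType
	}
	if ct := mime.TypeByExtension(path.Ext(name)); ct != "" {
		return ct
	}
	return http.DetectContentType(data)
}

func main() {
	sxg := FileMeta{MimeType: "application/signed-exchange;v=b3"}
	fmt.Println(resolveContentType(sxg, "page.sxg", []byte("sxg1-b3")))
	fmt.Println(resolveContentType(FileMeta{}, "notes.foo", []byte("hello")))
}
```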

(B) Drop-in config to override content-type per directory

@warpfork noted that DAG metadata may not be the best place for storing content-type:

ipfs/specs#217 (comment)
+1 towards the idea that if [Content] type is getting well-known support, it should be something we move towards the gateway knowing of it, rather than making it a feature of the filesystem.

This would be a much closer set of relationships to how the rest of the world works already (e.g. doing sysadmin today with nginx or something, I would generally configure [Content] types at the webserver area, and not in filesystem metadata) -- and thus seems much less likely to go awry.

Carefully avoiding baking in the idea of a single "mimetype string" field into our filesystem metadata also leaves much more room for issues to evolve around the things Ian mentioned:

  1. a file can have multiple mime types depending on the context
  2. some mime types can't be deduced until the entire file has been read

My take on this is:

  • mind you, we did exactly the opposite with mtime and mode: UnixFS 1.5 embeds them in dag-pb
  • we could support both ways, e.g., a website creator would add something like _headers to the directory, and the Gateway would do the right thing when resources from that directory or its subdirectories are requested
    • presence of the config file would disable content sniffing on both server and client (X-Content-Type-Options: nosniff)

See _headers in ipfs/specs#257
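
A rough Go sketch of the drop-in approach. The file format is an assumption modeled on Netlify-style _headers files (a path pattern on its own line, followed by indented Name: value lines); the authoritative syntax is whatever ipfs/specs#257 settles on. Following the proposal above, the mere presence of the file adds X-Content-Type-Options: nosniff.

```go
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"path"
	"strings"
)

// rule is one block of a hypothetical _headers file.
type rule struct {
	pattern string
	headers http.Header
}

// parseHeadersFile parses a Netlify-style file: an unindented line starts a
// new path pattern, indented "Name: value" lines add headers to it.
func parseHeadersFile(src string) ([]rule, error) {
	var rules []rule
	sc := bufio.NewScanner(strings.NewReader(src))
	for sc.Scan() {
		line := sc.Text()
		trimmed := strings.TrimSpace(line)
		if trimmed == "" || strings.HasPrefix(trimmed, "#") {
			continue // skip blank lines and comments
		}
		if line[0] != ' ' && line[0] != '\t' {
			rules = append(rules, rule{pattern: trimmed, headers: http.Header{}})
			continue
		}
		if len(rules) == 0 {
			return nil, fmt.Errorf("header line before any path: %q", trimmed)
		}
		name, value, ok := strings.Cut(trimmed, ":")
		if !ok {
			return nil, fmt.Errorf("malformed header line: %q", trimmed)
		}
		rules[len(rules)-1].headers.Add(strings.TrimSpace(name), strings.TrimSpace(value))
	}
	return rules, sc.Err()
}

// headersFor merges headers from every rule whose pattern matches reqPath
// (later rules win). path.Match's "*" does not cross "/".
func headersFor(rules []rule, reqPath string) http.Header {
	out := http.Header{}
	for _, r := range rules {
		if ok, _ := path.Match(r.pattern, reqPath); ok {
			for k, vs := range r.headers {
				out[k] = append([]string(nil), vs...)
			}
		}
	}
	// The config file exists, so content sniffing is disabled.
	out.Set("X-Content-Type-Options", "nosniff")
	return out
}

func main() {
	src := "/*.sxg\n  Content-Type: application/signed-exchange;v=b3\n"
	rules, err := parseHeadersFile(src)
	if err != nil {
		panic(err)
	}
	h := headersFor(rules, "/page.sxg")
	fmt.Println(h.Get("Content-Type"), h.Get("X-Content-Type-Options"))
}
```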

cc @olizilla @autonome

hsanjuan (Contributor) commented Sep 5, 2019

* prior art: `.htaccess`, `.gitattributes`

Wouldn't this mean that every request to the gateway becomes two requests (one for the actual content, the other to figure out whether an .htaccess clone exists)? This may be expensive.

And if using a different extension on the filename effectively sets the content type guessed for that file, isn't this precisely a way to hint at or override the content type of certain content?

lidel (Member, Author) commented Sep 5, 2019

wouldn't this mean that every request to the gateway becomes two requests (one to the actual content, the other to figure out if .htaccess-clone exists). This may be expensive.

It looks that way; however, (iiuc) if the gateway wants to resolve /ipfs/{cid}/foo/bar/cat.xyz to a CID, it already needs to fetch and cache the DAG roots of /ipfs/{cid}/, /ipfs/{cid}/foo/ and /ipfs/{cid}/foo/bar/.

This means checking whether .ipfs exists in any of them does not trigger an additional fetch: the DAG with the directory listing is already cached in the local repo, so the gateway should be able to check it cheaply.
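
The argument above, sketched in Go: resolving a path already walks every ancestor directory, so looking for a config file in each one only consults listings that are already in memory. The .ipfs filename comes from the comment; the cachedDirs map is a hypothetical stand-in for the local repo's cache, {cid} is kept as a literal placeholder, and "nearest directory wins" is an assumed precedence rule.

```go
package main

import (
	"fmt"
	"strings"
)

const configName = ".ipfs" // config file name used in the comment above

// cachedDirs is a hypothetical stand-in for directory listings that path
// resolution has already fetched: directory path -> entry names.
type cachedDirs map[string][]string

// ancestors returns /ipfs/{cid}/, /ipfs/{cid}/foo/, ... for a content path,
// i.e. the directories a resolver walks through (the file itself excluded).
func ancestors(contentPath string) []string {
	parts := strings.Split(strings.Trim(contentPath, "/"), "/")
	// parts[0] == "ipfs", parts[1] == the root CID, last part is the file.
	var dirs []string
	for i := 2; i < len(parts); i++ {
		dirs = append(dirs, "/"+strings.Join(parts[:i], "/")+"/")
	}
	return dirs
}

// nearestConfig returns the closest ancestor holding a config file, checking
// only listings already present in the cache (no extra fetch).
func nearestConfig(cache cachedDirs, contentPath string) (string, bool) {
	dirs := ancestors(contentPath)
	for i := len(dirs) - 1; i >= 0; i-- {
		for _, name := range cache[dirs[i]] {
			if name == configName {
				return dirs[i], true
			}
		}
	}
	return "", false
}

func main() {
	p := "/ipfs/{cid}/foo/bar/cat.xyz"
	cache := cachedDirs{
		"/ipfs/{cid}/":         {".ipfs", "foo"},
		"/ipfs/{cid}/foo/":     {"bar"},
		"/ipfs/{cid}/foo/bar/": {"cat.xyz"},
	}
	fmt.Println(ancestors(p))
	fmt.Println(nearestConfig(cache, p))
}
```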

if using different extensions on the filename is effectively setting the content type guessed for that file, isn't this precisely a way to hint/override the content type of certain content?

Unfortunately, extension-based sniffing relies on an arbitrary mapping hardcoded in go-ipfs and works only for popular file types such as SVG. Publishing a file with the .sxg extension did not set the correct content-type (example below).

Real life example: .sxg

Signed HTTP Exchanges (#121) are bundled as .sxg files. Chrome won't load them unless .sxg is returned with a specific content-type (at the moment it is application/signed-exchange;v=b3). Right now ipfs.io has a special Nginx rule that overrides the content-type for .sxg, but this obviously does not scale well and will break old snapshots when we globally update to a new version. On top of that, future specs will add more content types.

It is a good illustration of a use case where a person publishing a file would want to override the content-type of a specific file locally and ensure every gateway returns a valid one.

AuHau (Member) commented Nov 26, 2019

Just FYI, there is an accepted proposal, ipfs/kubo#6214, for supporting .ipfs-gateway.(json|yaml). Let's see how the implementation moves on.

holloshaw commented Jan 17, 2022

Has there been much progress on having a 404 page for IPFS-hosted websites?

lidel (Member, Author) commented Mar 30, 2022

I believe _redirects is a work in progress, and _headers will be next; see the recent status update in ipfs/specs#257 (comment).
Once we have that, we may allow customizing the Content-Type header via the _headers file (TBD, needs security analysis).

lidel (Member, Author) commented Jan 12, 2023

An alternative idea is to do what we did for opt-in mtime and mode and allow opt-in mtype as part of dag-pb.
Looking for early feedback in ipfs/specs#364 (no IPIP yet).

alexgleason commented

What if I want to serve media from an IPFS gateway for my website, but I do not want to allow application/javascript in the content-type header? We need the ability to control which types are allowed too, not just better detection.
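
One way to express that control, sketched in Go under the assumption that the operator configures an allowlist of media types: anything outside it is downgraded to a fallback that browsers will not execute. The allowlist contents and the application/octet-stream fallback are hypothetical choices, not existing gateway settings.

```go
package main

import (
	"fmt"
	"mime"
)

// allowed is a hypothetical operator-supplied allowlist of media types.
// Parameters such as charset are ignored when matching.
var allowed = map[string]bool{
	"image/png":  true,
	"image/jpeg": true,
	"video/mp4":  true,
	"text/plain": true,
}

// safeContentType returns the detected type if its media type is allowed,
// otherwise a generic binary type. Unparseable values are also downgraded.
func safeContentType(detected string) string {
	mediaType, _, err := mime.ParseMediaType(detected)
	if err != nil || !allowed[mediaType] {
		return "application/octet-stream"
	}
	return detected
}

func main() {
	for _, ct := range []string{
		"image/png",
		"application/javascript",
		"text/plain; charset=utf-8",
		"image/svg+xml",
	} {
		fmt.Printf("%s -> %s\n", ct, safeContentType(ct))
	}
}
```

Pairing this with X-Content-Type-Options: nosniff keeps browsers from second-guessing the downgraded type.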

alexgleason commented Sep 7, 2023

Good news, the default IPFS gateway returns Content-Type: text/plain; charset=utf-8 for javascript files, even if they end with a .js extension.

EDIT: The same isn't true for SVG, though: https://github.com/allanlw/svg-cheatsheet
EDIT2: SVG exploits don't work when the SVG is loaded via an img tag, so it's okay.
