Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update subdomain redirection logic to only engage on browsers #6975

Closed
momack2 opened this issue Mar 11, 2020 · 9 comments · Fixed by #6982
Closed

Update subdomain redirection logic to only engage on browsers #6975

momack2 opened this issue Mar 11, 2020 · 9 comments · Fixed by #6982
Assignees
Labels
kind/enhancement A net-new feature or improvement to an existing feature

Comments

@momack2
Copy link
Contributor

momack2 commented Mar 11, 2020

From conversation with @Stebalien and @jbenet we should only redirect to a subdomain gateway if the user is detected to be a browser node so as to avoid breaking curl and wget usage (curl doesn't auto-redirect without passing -L and wget won't span hosts on redirect). Note this is needed for 0.5 to avoid breaking existing usage.

Required for #6776

@momack2 momack2 added the kind/enhancement A net-new feature or improvement to an existing feature label Mar 11, 2020
@momack2
Copy link
Contributor Author

momack2 commented Mar 11, 2020

Ex what ipfs-search does:
curl http://localhost:8081/ipfs/QmS4ustL54uo8FzR9455qaxZwuMiUhyvMcX9Ba8nUH4uVv/readme

@Stebalien
Copy link
Member

Potential solution: Just return the page with the redirect response but set the "Clear-Site-Data" header. CURL will download the page and won't follow the redirect (and will even politely exit with a zero status by default). Browsers should follow the redirect.

@lidel
Copy link
Member

lidel commented Mar 11, 2020

@Stebalien I implemented that idea (Option A - see #6982). Let me know what you think.

Personally I don't feel too confident with it, produces valid responses, but feels like a hack that could bite us in the future.

What if we go with a much simpler hack (Option B - see #6984)

  • see if User-Agent match ^(curl|wget)/i
    • add Clear-Site-Data header
    • disable subdomain, no redirect, return regular HTTP 200 response

Would it be worse than A? I feel it is a bit safer and more explicit behavior.

@Stebalien
Copy link
Member

My concern is that there are probably other tools that don't follow redirects either (although this looks rare). Option A is the most general option available, as far as I know, and it doesn't rely on user agent sniffing.

If it does bite us, I believe we can switch to option B in the future, right?

I feel it is a bit safer and more explicit behavior.

How?


I'm fine just white listing curl if option A causes issues, it's just that user-agent sniffing makes me uncomfortable.

lidel added a commit that referenced this issue Mar 11, 2020
Context: #6975

When request is sent to http://localhost:8080/ipfs/$cid we check
User-Agent and disable subdomain redirect if it is a known cli tool that
does not follow redirects by default.

We also set Clear-Site-Data header on 'localhost' responses to ensure Origin
sandbox can't be abused.

License: MIT
Signed-off-by: Marcin Rataj <lidel@lidel.org>
@lidel
Copy link
Member

lidel commented Mar 12, 2020

I feel [user-agent sniff] is a bit safer and more explicit behavior.
How?

In my mind User-Agent sniff (B) would be bit "safer" because if it breaks something, user can change/remove User-Agent sent to go-ipfs and control gateway behavior that way:

$ curl -H "User-Agent:" 

If there is an issue with redirect+payload (A) user can't easily fix it on the client side alone, they need to either wait for new go-ipfs or put a reverse proxy in front of go-ipfs and fix responses on the fly.

To be fair both hacks feel bit bad to me and I'd rather ask people to fix their curl scripts by adding -L or simply switching from localhost to 127.0.0.1 which remains path gateway anyway (C).

👉 that being said, if I had to pick between A and B I'd probably try to make A work, just because it feels more magical ;-)

I'm fine just white listing curl if option A causes issues, it's just that user-agent sniffing makes me uncomfortable.

I managed to fix errors in PoC for A (#6982), so right now both options are available

If [A] does bite us, I believe we can switch to option B in the future, right?

Yes, we can switch from A to B at any time, code is in #6984

@Stebalien
Copy link
Member

If there is an issue with redirect+payload (A) user can't easily fix it on the client side alone, they need to either wait for new go-ipfs or put a reverse proxy in front of go-ipfs and fix responses on the fly.

Well, my thinking here is that this is less likely to go wrong because there's nothing special about it. I guess the worst-case scenario is if someone assumes that the HTTP response body for a redirect is always some kind of error message.

@lidel
Copy link
Member

lidel commented Mar 13, 2020

Sounds like A (301+payload) is the way to go then?

@Stebalien is it ok if I rebase&merge A (#6982) into subdomain PR at #6096, and close the other PR?

@Stebalien
Copy link
Member

For now, yes.

@lidel
Copy link
Member

lidel commented May 13, 2020

For the record A (#6982) was merged with #6096 and released with go-ipfs 0.5.0.


To illustrate: asking dweb.link for CID bafkreibm6jg3ux5qumhcn2b3flc3tyu6dmlb4xa7u5bf44yegnrjhc4yeq
returns HTTP 301 to subdomain, with text payload: hello:

$ curl -sD - https://dweb.link/ipfs/bafkreibm6jg3ux5qumhcn2b3flc3tyu6dmlb4xa7u5bf44yegnrjhc4yeq
HTTP/2 301                           
server: nginx
date: Wed, 13 May 2020 19:05:41 GMT
content-type: text/plain; charset=utf-8
content-length: 5
accept-ranges: bytes
access-control-allow-methods: GET
cache-control: public, max-age=29030400, immutable
clear-site-data: "cookies", "storage"
etag: "bafkreibm6jg3ux5qumhcn2b3flc3tyu6dmlb4xa7u5bf44yegnrjhc4yeq"
last-modified: Thu, 01 Jan 1970 00:00:01 GMT
location: https://bafkreibm6jg3ux5qumhcn2b3flc3tyu6dmlb4xa7u5bf44yegnrjhc4yeq.ipfs.dweb.link/
x-ipfs-gateway-host: gateway-bank1-fra2
x-ipfs-path: /ipfs/bafkreibm6jg3ux5qumhcn2b3flc3tyu6dmlb4xa7u5bf44yegnrjhc4yeq
access-control-allow-origin: *
access-control-allow-methods: GET, POST, OPTIONS
access-control-allow-headers: X-Requested-With, Range, Content-Range, X-Chunked-Output, X-Stream-Output
access-control-expose-headers: Content-Range, X-Chunked-Output, X-Stream-Output
x-ipfs-pop: gateway-bank1-fra2
strict-transport-security: max-age=31536000; includeSubDomains; preload

hello

@lidel lidel closed this as completed May 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/enhancement A net-new feature or improvement to an existing feature
Projects
None yet
3 participants