fix(backend): Sanitize slashes in URL paths #3982

Merged
nikosdouvlis merged 6 commits into clerk:main from fix/sanitize-slashes on Aug 29, 2024

Conversation

mlafeldt
Contributor

@mlafeldt mlafeldt commented Aug 19, 2024

Description

Fixes the following worker crash in Cloudflare Pages when accessing an API endpoint where the URL path starts with more than one slash:

$ wrangler pages deployment tail --project-name myproject

 ⛅️ wrangler 3.72.0
-------------------

No deployment specified. Using latest deployment for production environment.
Connected to deployment 6ea72014-b827-4b47-bd3e-d64cbf0bd743, waiting for logs...
HEAD https://myproject.pages.dev//v1.0.0/foo/bar.gz - Ok @ 8/19/2024, 4:18:08 PM
  (error) 14:18:08 [ERROR] TypeError: Invalid URL string.
    at new ClerkUrl (_astro-internal_middleware.mjs:2383:16)
    at createClerkUrl (_astro-internal_middleware.mjs:2389:10)
    at ClerkRequest.deriveUrlFromHeaders (_astro-internal_middleware.mjs:2420:12)
    at new ClerkRequest (_astro-internal_middleware.mjs:2395:26)
    at createClerkRequest (_astro-internal_middleware.mjs:2434:54)
    at astroMiddleware (_astro-internal_middleware.mjs:3657:26)
    at applyHandle (chunks/index_yfcIUlhQ.mjs:353:22)
    at chunks/index_yfcIUlhQ.mjs:372:18
    at _astro-internal_middleware.mjs:3902:26
    at applyHandle (chunks/index_yfcIUlhQ.mjs:353:22)

Triggered by:

❯ curl -I https://myproject.pages.dev//v1.0.0/foo/bar.gz
HTTP/2 500
date: Mon, 19 Aug 2024 14:18:08 GMT
report-to: {"endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report\/v4?s=rCvB4tHH34QJuo01yp5XWuiAEAOUwSFWoMP%2BEUcWmqtkiqPrJyyEEqDMFMoHZ1eUDt8qRr3iI9SjZrW3%2B2MLH1BqP53%2F8xdkYuAJQqT00H0wVoEm6pF5CrBGfHPyJQDtDPRDLfw%3D"}],"group":"cf-nel","max_age":604800}
nel: {"success_fraction":0,"report_to":"cf-nel","max_age":604800}
server: cloudflare
cf-ray: 8b5ac56bab3c62c5-HAM
alt-svc: h3=":443"; ma=86400

Checklist

  • npm test runs as expected.
  • npm run build runs as expected.
  • (If applicable) JSDoc comments have been added or updated for any package exports
  • (If applicable) Documentation has been updated

Type of change

  • 🐛 Bug fix
  • 🌟 New feature
  • 🔨 Breaking change
  • 📖 Refactoring / dependency upgrade / documentation
  • other:


changeset-bot bot commented Aug 19, 2024

🦋 Changeset detected

Latest commit: 6418a29

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 9 packages:

Name                    Type
@clerk/backend          Patch
@clerk/astro            Patch
@clerk/express          Patch
@clerk/fastify          Patch
@clerk/nextjs           Patch
@clerk/remix            Patch
@clerk/clerk-sdk-node   Patch
@clerk/tanstack-start   Patch
@clerk/testing          Patch

@mlafeldt
Contributor Author

Please note that I'm unsure whether that's the right way to fix the issue. At least, it's been working great for my Astro site. It may be better to handle the duplicate slashes at the middleware layer.
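
For illustration, handling the duplicate slashes before the URL is parsed could look like the sketch below. It is an assumption about the general idea, not the code in this PR, and `collapseLeadingSlashes` is a hypothetical helper name.

```javascript
// Hypothetical helper: collapse a run of two or more leading slashes into one.
// Slashes elsewhere in the path are left alone.
function collapseLeadingSlashes(path) {
  return path.replace(/^\/{2,}/, '/');
}

// With a single leading slash the path is resolved against the base host.
const collapsedUrl = new URL(
  collapseLeadingSlashes('//v1.0.0/foo/bar.gz'),
  'https://myproject.pages.dev',
);

console.log(collapsedUrl.href); // https://myproject.pages.dev/v1.0.0/foo/bar.gz
```

The downside is that the request path is rewritten before the framework's router sees it.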

@anagstef
Member

@mlafeldt Hello, and thanks for your contribution!
Can we have more info on this issue?
How did you end up with this double-slashed URL (https://myproject.pages.dev//v1.0.0/foo/bar.gz)? Is this a CF Pages default?

@mlafeldt
Contributor Author

> @mlafeldt Hello, and thanks for your contribution!
> Can we have more info on this issue?
> How did you end up with this double-slashed URL (https://myproject.pages.dev//v1.0.0/foo/bar.gz)? Is this a CF Pages default?

The double slash is part of the curl request I'm sending to Astro (see the HEAD example above). Anyone could be sending such a request to CF. IIRC, I first used multiple slashes by accident when I noticed the crashing middleware.

@mlafeldt
Contributor Author

mlafeldt commented Aug 19, 2024

I haven't checked CF's implementation of new URL(path, base) (the code that throws the reported exception), but I can already tell you the behavior is inconsistent between JS runtimes.

Deno:

❯ deno
Deno 1.45.5
exit using ctrl+d, ctrl+c, or close()
REPL is running with all permissions allowed.
To specify permissions, run `deno repl` with allow flags.
> new URL('https://example.com/////foo/bar')
URL {
  href: "https://example.com/////foo/bar",
  origin: "https://example.com",
  protocol: "https:",
  username: "",
  password: "",
  host: "example.com",
  hostname: "example.com",
  port: "",
  pathname: "/////foo/bar",
  hash: "",
  search: ""
}
> new URL('/////foo/bar', 'https://example.com')
Uncaught TypeError: Invalid URL: '/////foo/bar' with base 'https://example.com'
    at getSerialization (ext:deno_url/00_url.js:98:11)
    at new URL (ext:deno_url/00_url.js:405:27)
    at <anonymous>:1:22

Node:

❯ node
Welcome to Node.js v22.6.0.
Type ".help" for more information.
> new URL('https://example.com/////foo/bar')
URL {
  href: 'https://example.com/////foo/bar',
  origin: 'https://example.com',
  protocol: 'https:',
  username: '',
  password: '',
  host: 'example.com',
  hostname: 'example.com',
  port: '',
  pathname: '/////foo/bar',
  search: '',
  searchParams: URLSearchParams {},
  hash: ''
}
> new URL('/////foo/bar', 'https://example.com')
URL {
  href: 'https://foo/bar',
  origin: 'https://foo',
  protocol: 'https:',
  username: '',
  password: '',
  host: 'foo',
  hostname: 'foo',
  port: '',
  pathname: '/bar',
  search: '',
  searchParams: URLSearchParams {},
  hash: ''
}

@wobsoriano
Member

wobsoriano commented Aug 19, 2024

Trying to replicate this issue on my end. I have an app deployed to CF pages as well - https://astro-clerk-template.pages.dev

I tried to tail it and sent a request via curl:

[Screenshots: Screenshot 2024-08-19 at 11 11 52 AM, Screenshot 2024-08-19 at 11 12 10 AM]

It's not throwing any errors on my end. Might be missing something.

Deployed a fresh Astro app without @clerk/astro as well and can confirm double slash works - https://astro-test-slashes.pages.dev//api/ping.json

Astro also made an attempt to handle duplicate slashes but reverted it.

@mlafeldt
Contributor Author

@wobsoriano I created a quick example that reproduces the problem: https://github.com/mlafeldt/fumbling-fusion

This works:

❯ curl -i https://fumbling-fusion.pages.dev/v1.0.0/xxx
HTTP/2 200 
date: Mon, 19 Aug 2024 20:21:23 GMT
content-type: text/plain;charset=UTF-8
content-length: 14
x-clerk-auth-message: 
x-clerk-auth-reason: dev-browser-missing
x-clerk-auth-reason: dev-browser-missing
x-clerk-auth-status: signed-out
x-clerk-auth-status: signed-out
x-clerk-auth-token: 
report-to: {"endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report\/v4?s=9eKiN2HH08jMEjvEkZW7sr1F3immx8y9i%2Fgh5N%2BKCPx3PLKC2Gzz9unBXlmCgmSQRQPoyMjbvRR5koqdRb%2Fz3M0JtFWt669jkoYKoq%2Bd3cOAb%2BY8d3k3sc2yK7ozM4VUxQzXEWxrNV2xpEyk"}],"group":"cf-nel","max_age":604800}
nel: {"success_fraction":0,"report_to":"cf-nel","max_age":604800}
server: cloudflare
cf-ray: 8b5cd984b87962c7-HAM
alt-svc: h3=":443"; ma=86400

{"path":"xxx"}

This does not:

❯ curl -i https://fumbling-fusion.pages.dev//v1.0.0/xxx
HTTP/2 500 
date: Mon, 19 Aug 2024 20:21:19 GMT
content-length: 0
report-to: {"endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report\/v4?s=23%2FEL%2BTGCohSPRlt0%2BP6YSDsL7vjiPsp6HGG4jYKDv3CxuLaICRrQZXr339MA9eDNgQuoLRJ5RyBjyhunRs5jMPeFH6liZscCVhHwfYUVA8YB7wSg8jxLThhledfngOAZWkvPAkZlfnizFON"}],"group":"cf-nel","max_age":604800}
nel: {"success_fraction":0,"report_to":"cf-nel","max_age":604800}
server: cloudflare
cf-ray: 8b5cd96d1a9262bf-HAM
alt-svc: h3=":443"; ma=86400

And the reported backtrace again:

GET https://fumbling-fusion.pages.dev//v1.0.0/xxx - Ok @ 8/19/2024, 10:21:19 PM
  (error) 20:21:19 [ERROR] TypeError: Invalid URL string.
    at new ClerkUrl (_astro-internal_middleware.mjs:2382:16)
    at createClerkUrl (_astro-internal_middleware.mjs:2388:10)
    at ClerkRequest.deriveUrlFromHeaders (_astro-internal_middleware.mjs:2419:12)
    at new ClerkRequest (_astro-internal_middleware.mjs:2394:26)
    at createClerkRequest (_astro-internal_middleware.mjs:2433:54)
    at astroMiddleware (_astro-internal_middleware.mjs:3656:26)
    at applyHandle (chunks/index_yfcIUlhQ.mjs:353:22)
    at chunks/index_yfcIUlhQ.mjs:372:18
    at server (_astro-internal_middleware.mjs:3884:31)
    at applyHandle (chunks/index_yfcIUlhQ.mjs:353:22)

Happy to provide more info.

@anagstef
Member

Node

❯ node
Welcome to Node.js v22.6.0.
Type ".help" for more information.
> new URL('https://example.com/////foo/bar')
URL {
  href: 'https://example.com/////foo/bar',
  origin: 'https://example.com',
  protocol: 'https:',
  username: '',
  password: '',
  host: 'example.com',
  hostname: 'example.com',
  port: '',
  pathname: '/////foo/bar',
  search: '',
  searchParams: URLSearchParams {},
  hash: ''
}
> new URL('/////foo/bar', 'https://example.com')
URL {
  href: 'https://foo/bar',
  origin: 'https://foo',
  protocol: 'https:',
  username: '',
  password: '',
  host: 'foo',
  hostname: 'foo',
  port: '',
  pathname: '/bar',
  search: '',
  searchParams: URLSearchParams {},
  hash: ''
}

Two small comments on this example:

  • It looks like it's working with /////foo/bar, but if you use //v1.0.0/xxx it still throws. It has to do with the dots.
  • Even though it looks like it's working, if you look at the URL it creates, the href is https://foo/bar, which is wrong.

Relevant: nodejs/node#30776
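
A minimal sketch of the two points above, assuming a runtime with a WHATWG-style `URL` constructor; `tryParse` is a made-up helper for illustration. With two or more leading slashes, the string is treated as a scheme-relative reference, so its first non-empty segment becomes the host candidate instead of part of the path. Results differ between runtimes, as the Deno and Node sessions show, so only the single-slash result is annotated.

```javascript
// Hypothetical helper: return the parsed href, or the error name if parsing throws.
function tryParse(path, base) {
  try {
    return new URL(path, base).href;
  } catch (err) {
    return err.name;
  }
}

const base = 'https://example.com';

// One leading slash: a path-absolute reference, the host comes from the base.
const single = tryParse('/v1.0.0/xxx', base);
console.log(single); // https://example.com/v1.0.0/xxx

// Several leading slashes: "foo" is taken as the host, not as a path segment.
const multi = tryParse('/////foo/bar', base);

// Here "v1.0.0" is the host candidate. Its last label is numeric, so a
// spec-compliant parser tries to read it as an IPv4 address and rejects "v1".
const dotted = tryParse('//v1.0.0/xxx', base);

console.log({ multi, dotted });
```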

@mlafeldt mlafeldt force-pushed the fix/sanitize-slashes branch from b5fa2dc to ac30531 Compare August 20, 2024 14:16
@mlafeldt mlafeldt changed the title Fix Astro middleware by collapsing duplicate leading slashes in URL path Fix Astro middleware by accepting duplicate leading slashes in URL paths Aug 20, 2024
@mlafeldt
Contributor Author

@wobsoriano @anagstef I think I found a good solution to address the issue without messing with the URL at all. Please take a look at my latest commit: ac30531

@anagstef
Member

@mlafeldt I'm curious, does this issue affect your project? Are you using URLs with double slashes in your app?

@mlafeldt
Contributor Author

> @mlafeldt I'm curious, does this issue affect your project? Are you using URLs with double slashes in your app?

Nope, routing on double slashes is not a good practice IMO, but I really prefer my app to not crash in case someone sends such a request (which I can't control). 😄 Hence the PR.

@mlafeldt
Contributor Author

mlafeldt commented Aug 20, 2024

In general, the handling of multiple slashes in URL paths doesn't seem to be standardized. For example, github.com//////foo works fine, while x.com//foo returns a plain 400 error. This is why I like the current fix that leaves the URL as-is, allowing frameworks like Astro to do their routing as they please without Clerk interfering.
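
One way to leave the path untouched, sketched here as an assumption rather than as the code from the commit, is to join origin and path into a single absolute URL string before parsing. The path is then never interpreted as a relative reference. `buildAbsoluteUrl` is a hypothetical name.

```javascript
// Hypothetical helper: parse origin + path as one absolute URL string
// instead of calling new URL(path, base).
function buildAbsoluteUrl(origin, pathAndSearch) {
  return new URL(origin + pathAndSearch);
}

const absoluteUrl = buildAbsoluteUrl('https://fumbling-fusion.pages.dev', '//v1.0.0/xxx');

console.log(absoluteUrl.host);     // fumbling-fusion.pages.dev
console.log(absoluteUrl.pathname); // //v1.0.0/xxx
```

The duplicate slash survives in `pathname`, so the framework decides how to route it.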

@mlafeldt
Contributor Author

So there's a repro of the crash, a minimal non-invasive fix, and a test for that fix. What else do you need to move this forward? Happy to assist. 🙏

@wobsoriano wobsoriano changed the title Fix Astro middleware by accepting duplicate leading slashes in URL paths fix(backend): Sanitize slashes in URL paths Aug 28, 2024
@wobsoriano
Member

wobsoriano commented Aug 28, 2024

Hi @mlafeldt, sorry for the late reply. I was trying to replicate the issue with my Astro + Clerk app deployed to CF Pages by duplicating slashes (https://astro-clerk-template.pages.dev////api////user.json), but it seems to normalize the URL automatically based on the URL normalization settings. OK, this results in a 404.

The fix is minimal and non-invasive as you said, so I just approved the workflow to run the e2e tests and we can proceed after that.

Edit: Okay, I did some more tests. Without the Clerk integration, using multiple slashes leads to a 404 on CF. Vercel normalizes the URL.

  1. Vercel = https://test-astro-cf.vercel.app///api///user.json -> https://test-astro-cf.vercel.app/api/user.json
  2. CF pages = https://test-astro-cf-8h1.pages.dev///api///user.json -> 404
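
The normalization in the Vercel case maps every run of slashes in the path to a single slash. A rough sketch of that mapping follows; it only mimics the observed result and says nothing about how Vercel implements it. `normalizeSlashes` is a hypothetical name.

```javascript
// Hypothetical helper: collapse every run of slashes in the path of an
// absolute URL. The origin and query string are not touched.
function normalizeSlashes(href) {
  const parsed = new URL(href);
  parsed.pathname = parsed.pathname.replace(/\/{2,}/g, '/');
  return parsed.href;
}

const normalized = normalizeSlashes('https://test-astro-cf.vercel.app///api///user.json');
console.log(normalized); // https://test-astro-cf.vercel.app/api/user.json
```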

@wobsoriano
Member

@mlafeldt updated the changeset message, hope you don't mind ✌🏼

@mlafeldt
Contributor Author

@wobsoriano Thanks!

Re Cloudflare: With a custom domain, I had to enable "Normalize URLs to origin" for multiple slashes to be collapsed.

image

However, when using the pages.dev domain, you can't configure normalization AFAIK.

As for the number of slashes, it's actually a bit more complicated. 😃

image

But I still think the fix makes sense in any case.

Member

@wobsoriano wobsoriano left a comment

Thanks again @mlafeldt. Based on our convo and the info provided, I'm going to approve this one 👍🏼

Appreciate the patience!

@wobsoriano wobsoriano enabled auto-merge (squash) August 28, 2024 17:20
@mlafeldt
Contributor Author

@wobsoriano Awesome! Looking forward to the next release. 🚀

@nikosdouvlis nikosdouvlis disabled auto-merge August 29, 2024 08:21
@nikosdouvlis nikosdouvlis merged commit c9ef591 into clerk:main Aug 29, 2024
5 of 17 checks passed
@mlafeldt mlafeldt deleted the fix/sanitize-slashes branch August 29, 2024 10:22

5 participants