Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update: CSRF Cheat Sheet #1558

Open
matt-allan opened this issue Dec 7, 2024 · 5 comments
Open

Update: CSRF Cheat Sheet #1558

matt-allan opened this issue Dec 7, 2024 · 5 comments
Assignees
Labels
ACK_OBTAINED Issue acknowledged from core team so work can be done to fix it. UPDATE_CS Issue about the update/refactoring of a existing cheat sheet.

Comments

@matt-allan
Copy link

What is missing or needs to be updated?

The docs discourage the "Naive Double-Submit Cookie Pattern" because it is vulnerable to "certain attacks", linking to the "Bypassing CSRF Protections" paper.

That paper describes a class of attacks where an attacker is able to set cookies on the target site, either through a subdomain or a MITM attack. They set the CSRF cookie, then send a request with a token matching that cookie. The server verifies the attacker controlled cookie against the token in the attacker's request, and since they match the CSRF protection is bypassed.

Instead the docs recommend the "Signed Double-Submit Cookie" pattern, where the token is signed with a secret key. But signing alone does nothing to prevent this attack! All it does is add an extra step, where the attacker must first get a valid token by making a request. They then set that token in the cookie. Without signing, they could have set any random value in the cookie and it would have worked.

From what I can tell, signing only helps when it is bound to the session. Then the attacker can't inject a valid CSRF cookie without also injecting a valid session cookie, and they can't do that while keeping the user's existing session. But the docs don't state this as a requirement. They recommend that you bind it to the user's session to "further enhance security", making it sound like signing alone is sufficient.

Given that binding to the session is what actually makes this more secure, it isn't clear to me why signing is necessary at all. It seems like a hash over the session ID and some random bytes would serve the same purpose, without requiring the user to manage secret keys.

There are a lot of other things in the docs that are pretty confusing:

This ensures that an attacker cannot create and inject their own, known, CSRF token into the victim's authenticated session

As stated above, signing does nothing to prevent an attacker who can set cookies from injecting their own CSRF token.

The system's tokens should be secured by hashing or encrypting them.

I don't think encryption would be safe here unless it was authenticated encryption.

"The system's tokens" is a bit confusing too, because there are no tokens to hash or encrypt; the token is the mac (or hash) of the session ID.

The server-side session ID (e.g. PHP or ASP.NET). This value should never leave the server or be in plain text in the CSRF Token.

Don't session IDs necessarily leave the server in the cookie on every request? I think they are trying to say it shouldn't be in the non HTTP-Only cookie?

Generate a random value (preferably cryptographically random) to ensure that consecutive calls within the same second do not produce the same hash

Why are collisions a problem here? The linked Github issue did not explain it. I don't think hash is the correct term either.

secret = readEnvironmentVariable("CSRF_SECRET")

This doc links to a doc which recommends against storing keys in environment variables, so this is surprising to see.

How should this be resolved?

All of this is assuming my understanding outlined above is correct. If I'm off base please let me know.

  1. Rewrite this to emphasize binding the token to the session, not "signing" specifically. Make binding to the session a requirement, not a recommendation.
  2. If hashing is sufficient, include that, perhaps as the default recommendation.
  3. Remove references to encryption and consistently use the correct terms, i.e. message authentication and hashing (if applicable)
  4. Reword the part about session IDs needing to never leave the server
  5. Clarify why preventing collisions is necessary
  6. Remove the example of reading the secret key from an environment variable

I would be happy to take a stab at rewriting this once someone can confirm what the recommended approach is.

Thanks!

@matt-allan matt-allan added ACK_WAITING Issue waiting acknowledgement from core team before to start the work to fix it. HELP_WANTED Issue for which help is wanted to do the job. UPDATE_CS Issue about the update/refactoring of a existing cheat sheet. labels Dec 7, 2024
@jmanico
Copy link
Member

jmanico commented Dec 7, 2024

I think these are very reasonable changes. When you can, I'd prefer several smaller PR's as opposed to one big one. :) Thank you!

@mackowski
Copy link
Collaborator

@matt-allan do you want to work on that? This are great points, this CS is old and have its history ... for good and for bad

@matt-allan
Copy link
Author

Yep, I plan on making a few PRs. Thanks

@szh szh added ACK_OBTAINED Issue acknowledged from core team so work can be done to fix it. and removed ACK_WAITING Issue waiting acknowledgement from core team before to start the work to fix it. HELP_WANTED Issue for which help is wanted to do the job. labels Dec 10, 2024
@matt-allan
Copy link
Author

Answering my own question:

It seems like a hash over the session ID and some random bytes would serve the same purpose, without requiring the user to manage secret keys.

A hash is sufficient if you are generating the token from the session ID. For example:

$token = base64_encode(hash('sha512', session_id(), true));
setcookie('XSRF-TOKEN', $token, ['secure' => true, 'httponly' => false]);

If an attacker can compute the hash, that means they know the session ID, at which point they can takeover the session without needing a CSRF attack.

There is a tiny risk from exposing the hash to JS that an attacker can brute force the session ID, but that risk exists with a HMAC too.

But if you have a session ID, you should really just use the synchronizer token pattern. It will be faster to verify (not hashing every single request), safer (not exposing the hash), and simpler.

A hash is not sufficient if you are generating the token from something else, like the jti claim in a JWT. In that case the data is not considered secret and might be exposed to an attacker through some other means. Using HMAC ensures that even if the attacker knows the value, they cannot forge a valid CSRF token.

It still is not clear to me why the docs include a nonce. I cannot think of a scenario where it is problematic that "consecutive calls within the same second...produce the same hash".

Side note, since none of these approaches require a cookie, why is this even considered a double submit cookie technique? Isn't it an entirely separate technique?

@kwwall
Copy link
Collaborator

kwwall commented Dec 12, 2024

Just adding my late $.02 here based on my lessons learned via ESAPI... I think the Horton Principle is relevant here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ACK_OBTAINED Issue acknowledged from core team so work can be done to fix it. UPDATE_CS Issue about the update/refactoring of a existing cheat sheet.
Projects
None yet
Development

No branches or pull requests

5 participants