Unable to parse SAML2 response: Unsolicited response #7056
Comments
possibly we're hitting |
Is there any consistent way to reproduce this or just logging in a bunch of times? |
If it helps, there is a bunch of these daily in the Modular Synapse sentry. |
Summary: I think this might only be an issue in worker mode, due to requests coming back to a different worker (which uses in-memory storage and so has no knowledge of the original request). I see a few ways forward:
More details: I think #7530 might actually be a duplicate of this!? My thought is the following happens:
Note that the SAML response is valid, just that the worker knows nothing about it. I suspect the solution is to store this information in the database, similar to what we did for #6877.
I was curious how OIDC handled this, and that doesn't seem to persist anything in memory, so taking another look at why we have this
Note that we could actually pass the UI auth session ID in the I'd be curious if other people have ideas on how to fix this! |
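To make the hypothesis above concrete, here is a minimal sketch of it, not Synapse's actual code. The `Worker` class, its method names, and the plain dict standing in for a database table are all hypothetical. It shows a response failing when it lands on a worker that never saw the request, and succeeding once the outstanding requests live in a shared store.

```python
class Worker:
    """Hypothetical stand-in for a worker process (not a real Synapse class)."""

    def __init__(self, name, store=None):
        self.name = name
        # Without a shared store, each worker tracks outstanding requests
        # privately, in its own memory.
        self.outstanding = store if store is not None else {}

    def start_login(self, request_id):
        # Remember that this request ID was issued by us.
        self.outstanding[request_id] = self.name

    def handle_response(self, in_response_to):
        # A response is only accepted if its request ID is known.
        if in_response_to not in self.outstanding:
            raise ValueError(f"Unsolicited response: {in_response_to}")
        del self.outstanding[in_response_to]
        return "ok"


# Private in-memory storage: worker "b" has never heard of the request.
a, b = Worker("a"), Worker("b")
a.start_login("id-1")
try:
    b.handle_response("id-1")
    private_result = "ok"
except ValueError as e:
    private_result = str(e)

# Shared storage (a dict standing in for a database table): both workers agree.
shared = {}
a2, b2 = Worker("a", shared), Worker("b", shared)
a2.start_login("id-2")
shared_result = b2.handle_response("id-2")
```

The same failure would appear after a restart of a single worker, since the private dict is lost along with the process.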
I've never really understood its purpose. Possibly a defence in depth against CSRF attacks? We can probably remove it if that solves any problems. However, I'm not sure that the hypothesis fits the symptoms. It's reported against mozilla's deployment of synapse, which only has one of each type of worker (except synchrotrons). If this were a problem with requests going to different workers, I would expect it to either always work or never work. I can't see how we'd end up with an intermittent bug. |
A similar symptom would happen during a restart of services, I'm unsure how often that would happen on Modular instances. |
It might also be worth fixing the known situation and seeing if it still happens, but ideally we'd want to ensure the solution works in all cases... |
possibly. I'm hoping we'll be able to get some logs out of the modular instance to help understand what is going on. |
ok well, I got logs for two instances of this this morning on the mozilla instance. The first one is a complete mystery tbh. A client, from an IP address we've never seen before, suddenly pops up with a SAML session ID we've never heard of (or at least, I couldn't find in some brief grepping of the logs). I guess it's just an old session, and the user used an old link in their browser history or something. The main source of regret here is that the error message isn't better ("oops something went wrong" isn't terribly informative.) The second one is much clearer: the user took 6 minutes to validate their email address and come back. We expire the SAML session dict after only 5 minutes. Particularly given auth0's email validation links are valid for 15 minutes, this seems... silly. |
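The second case comes down to simple arithmetic on the session lifetime. A rough sketch, assuming outstanding requests are kept as a mapping from request ID to creation time in milliseconds (the function name and data layout here are illustrative, not taken from Synapse):

```python
FIVE_MINUTES_MS = 5 * 60 * 1000
FIFTEEN_MINUTES_MS = 15 * 60 * 1000


def expire_outstanding(outstanding, now_ms, lifetime_ms):
    """Return only the requests that are younger than lifetime_ms."""
    return {
        request_id: created_ms
        for request_id, created_ms in outstanding.items()
        if now_ms - created_ms < lifetime_ms
    }


# A user who starts logging in at t=0 and comes back 6 minutes later.
outstanding = {"id-1": 0}
six_minutes_later = 6 * 60 * 1000

with_short_lifetime = expire_outstanding(outstanding, six_minutes_later, FIVE_MINUTES_MS)
with_long_lifetime = expire_outstanding(outstanding, six_minutes_later, FIFTEEN_MINUTES_MS)
```

With the 5 minute lifetime the request has already been dropped by the time the response arrives, so it looks unsolicited; with a lifetime matching the 15 minute validity of the email link it would still be there.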
I wonder if the expiry time should be configurable? |
it is. But I think the default is probably too short. |
Looks like this is already configurable, the default is 5 minutes: synapse/synapse/config/saml2_config.py Lines 283 to 287 in b2b8699
Edit: Doh, you already said it is configurable. 😢 |
I put up #7664 to increase the timeout. Might not be an ideal solution, but should fix a concrete case we've seen. |
This is happening much less after the changes in #7664. Not sure if these are people taking greater than the 15 minutes to finish validation or not. I'm unclear what the next steps might be here: try to improve the error message maybe? |
are we still getting reports of this? I'd be inclined to close it if not. Otherwise yes, probably need to remember where the "oops something went wrong" error message is coming from and try to make it give more clues as to what went wrong. |
Yes, it looks like we're still seeing this (around 5-10x/day on Modular). |
gosh. it was only a couple a day back when I investigated a few weeks ago (mind you, there was some brokenness in logging at the time). ok then I would like to suggest a two-pronged approach:
|
Sentry seems to be bucketing some separately, in this case I looked at two separate issues that each had roughly between 2 and 5 occurrences per day, maybe that explains why you were seeing less of them? |
I spent some time with these logs and with Sentry and couldn't really figure out if there was a correlation between old requests or something else happening. I think improving the error handling might be useful, I'm guessing that the concern with that is that we're missing a "real" bug? |
Now that we have better logging I looked back over the last 7 days of this error occurring on the Mozilla instance:
Note that we remove the outstanding request once a response for it is received -- this seems correct, but I'm unsure if SAML allows for a single session to be completed multiple times (assuming that they are all within the proper timeout and such). I'm not sure what, if anything, should be done to handle these cases? Maybe we can improve the error page to say something like "Your SAML session might have timed out or already been completed. Please try again." Or something to that effect? |
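For illustration, the "remove on first response" behaviour and the friendlier message could look roughly like the sketch below. This is an assumption about the shape of a fix, not the real handler: `complete_session` and the dict layout are made up for the example, and the message text is the one suggested above.

```python
FRIENDLY_ERROR = (
    "Your SAML session might have timed out or already been completed. "
    "Please try again."
)


def complete_session(outstanding, request_id):
    """Return (ok, message). A request ID can only be completed once."""
    # pop() removes the request so a second response with the same ID
    # (e.g. a page reload or a link clicked twice) is treated as unknown.
    if outstanding.pop(request_id, None) is None:
        return False, FRIENDLY_ERROR
    return True, "Login complete."


outstanding = {"id-1": 1234}  # request ID -> creation time (hypothetical layout)
first = complete_session(outstanding, "id-1")
second = complete_session(outstanding, "id-1")
```

Here the first call succeeds and the second gets the friendlier message rather than a bare internal server error. An expired session and a re-used one are indistinguishable in this sketch, which is why the message mentions both.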
do we have any idea why people would re-use the SAML session ID? Improving the error text seems sensible either way. |
My guess is that it is due to reloading a page? Or if e-mail verification is in the workflow it could be clicking on a link twice? I should note that the "re-used" SAML session IDs were within the 15 minute timeout period (and all from 2 users). |
Since I couldn't remember the behavior the user saw here: they currently just get an internal server error sent back to them (since it is part of the redirect flow, the client isn't involved). Steps to reproduce this sanely:
You end up at a white page that says "Internal server error". When adding the OpenID code we added a template for handling some errors; I think we should re-use that here. |
Ah this isn't entirely accurate -- we do have a |
Note that this is the only way we have to process and display errors from Auth0, which is why it works like that :/ |
Sometimes, when authenticating with passwordless login on Mozilla's SSO, the user's browser gets told to POST to /authn_response with a SAML AuthN response (as expected), but that call seems to fail with the error "Unable to parse SAML2 response: Unsolicited response: id-XXXXXXXXXXXXX". I'm currently not sure why this happens.