Session requests failing after upgrade #8868
Comments
If it's helpful, I'd seen exactly this behavior while upgrading instances from 4.5 to 4.5.2.
This Haproxy log is telling.
It seems to have no connection to your CouchDB. How are you deploying this?
@dianabarsan This was done via an EKS deployment. I started with
Okay, so while testing this locally via the docker-helper, I was able to reproduce the step
I think the CI test is failing due to the error being displayed in the admin app. So really, the session/auth stuff might just be some weirdness on EKS? 🤔
I think the upgrade error in the admin app is a UI display issue. Thanks a lot for following up!
OK, I no longer believe this issue is the same as the moh-togo upgrade. I think this is isolated to 4.6.x and to how the API sends requests to haproxy.
This is caused by bumping Node to v20, and possibly by some interaction between the request-promise-native library and the Node 20 http library. This is the POST
It seems that haproxy somehow receives a truncated request ( This request shows up in haproxy logs as:
Downgrading to Node 18 fixes the issue. It seems that Node 19 made some changes to the http library: I've tried removing all Lua password obfuscation scripts from haproxy, but that has not resolved the problem. At this moment, I strongly suspect that the changes made to the http library are interacting in some unfortunate way with
Node 19 adds a keep-alive header to requests by default. This makes the This only seems to happen when accessing the API directly or through the AWS load balancer, and never when running requests through nginx. We shouldn't really be forwarding all headers from an arbitrary incoming request to another request.
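To see why a stale content-length header breaks a reused keep-alive connection, here is a toy model of HTTP/1.1 request framing: the server reads a request's head up to the blank line, then consumes exactly content-length characters as the body before it expects the next request. The `splitRequests` helper, the `Host` value and the follow-up `GET /_all_dbs` request are hypothetical, chunked encoding is ignored, and this is not how haproxy is actually implemented.

```javascript
// Toy model of HTTP/1.1 request framing on one keep-alive connection.
// Assumptions: ASCII text only, Content-Length framing only (no chunked
// encoding). Illustrative only; this is not how haproxy parses requests.
const CRLF = '\r\n';

// Hypothetical helper: split a raw stream into the requests a server would
// see, using Content-Length to decide where each request's body ends.
function splitRequests(stream) {
  const requests = [];
  let rest = stream;
  while (rest.length > 0) {
    const headerEnd = rest.indexOf(CRLF + CRLF);
    if (headerEnd === -1) {
      // No blank line yet: the server is still waiting for the rest of the head.
      requests.push({ startLine: rest.split(CRLF)[0], body: '' });
      break;
    }
    const lines = rest.slice(0, headerEnd).split(CRLF);
    let contentLength = 0;
    for (const line of lines.slice(1)) {
      const [name, ...value] = line.split(':');
      if (name.trim().toLowerCase() === 'content-length') {
        contentLength = parseInt(value.join(':').trim(), 10);
      }
    }
    // Whatever follows the blank line is the body, exactly contentLength chars,
    // even if those characters were really meant as the next request.
    const bodyStart = headerEnd + (CRLF + CRLF).length;
    requests.push({
      startLine: lines[0],
      body: rest.slice(bodyStart, bodyStart + contentLength),
    });
    rest = rest.slice(bodyStart + contentLength);
  }
  return requests;
}

// A GET /_session that wrongly carries the original POST's Content-Length,
// followed on the same connection by an ordinary (hypothetical) request.
const sessionRequest =
  'GET /_session HTTP/1.1' + CRLF +
  'Host: couchdb' + CRLF +
  'Content-Length: 10' + CRLF + CRLF; // no body is actually sent
const nextRequest =
  'GET /_all_dbs HTTP/1.1' + CRLF +
  'Host: couchdb' + CRLF + CRLF;

const seen = splitRequests(sessionRequest + nextRequest);
console.log(seen[0].body);      // prints: GET /_all_
console.log(seen[1].startLine); // prints: dbs HTTP/1.1
```

The second request's start line no longer begins with a method, so a real server would reject it with a 400. Dropping the `Content-Length` line from `sessionRequest` makes the same parser return two intact requests.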
Node 19 adds a keep-alive header to requests by default. This makes the content-length header significant, because it is how the server knows when one request has been fully received and the next one is being sent over the same connection. When authenticating a user, we take all headers from the original user's request and pass them to a GET /_session request, presumably so we don't have to filter out all the ways through which the user can authenticate (cookie, authorization, etc.). This meant that if the original request was a POST with a content-length header, that header was also forwarded. When the connection was later reused, the server (haproxy in this case) would treat the first content-length characters of the next request as the body of the previous one, believing the previous request was not finished. The resulting truncated request was invalid and produced a 400 status code. This only seems to happen when accessing the API directly or through the AWS load balancer, and never when running requests through nginx. To avoid truncating the next request, I am no longer forwarding the content-length header in the session request. #8868
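A minimal sketch of the header handling described above, assuming the incoming headers are a plain object keyed by header name (as Node's `req.headers` is). `sessionHeadersWithout` mirrors the fix: forward everything except content-length. `sessionHeadersOnly` is the stricter allowlist suggested by the earlier remark that we shouldn't forward all headers from one request to another. Both function names, the header values, and the choice of cookie and authorization as the allowlist are assumptions, not the cht-core implementation.

```javascript
// Sketch only: helper names and header lists are hypothetical.

// Approach taken by the fix: copy the user's headers, but drop content-length,
// which describes the ORIGINAL request's body; GET /_session sends no body.
const BODY_HEADERS = ['content-length'];

function sessionHeadersWithout(original, dropList = BODY_HEADERS) {
  const headers = {};
  for (const [name, value] of Object.entries(original)) {
    // Header names are case-insensitive, so compare in lowercase.
    if (!dropList.includes(name.toLowerCase())) {
      headers[name] = value;
    }
  }
  return headers;
}

// Stricter alternative: forward only headers that can carry credentials.
const AUTH_HEADERS = ['cookie', 'authorization'];

function sessionHeadersOnly(original, allowList = AUTH_HEADERS) {
  const headers = {};
  for (const [name, value] of Object.entries(original)) {
    if (allowList.includes(name.toLowerCase())) {
      headers[name] = value;
    }
  }
  return headers;
}

// Headers as they might arrive on a user's POST (hypothetical values).
const incoming = {
  host: 'example.org',
  cookie: 'AuthSession=abc123',
  'content-type': 'application/json',
  'content-length': '42',
};

// content-length is gone; host, cookie and content-type are still forwarded.
console.log(sessionHeadersWithout(incoming));
// Only the cookie survives the allowlist.
console.log(sessionHeadersOnly(incoming));
```

The denylist is the smaller change and keeps every authentication mechanism working; the allowlist avoids leaking other body- or connection-specific headers, at the cost of having to enumerate every header that can carry credentials.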
#8868 (cherry picked from commit 831bd6e)
4.6.0-beta.2
Merged to
Describe the bug
Upgrading to 4.6.0-beta.2 somehow seems to break the session/authentication handling in the API server! You can no longer log in to the webapp. Even direct REST calls to the API with basic auth fail with a 400 error (although calls to open endpoints like api/v2/monitoring succeed).
To Reproduce
4.6.x branch....
4.6.0-beta.2 via the admin app. If you Stage first and then Install, the upgrade will complete without any errors being displayed on the admin page. If you just Install directly without staging, an error will be displayed on the upgrade page, but the upgrade will actually complete successfully.
api/v2/upgrade) and see error
Expected behavior
Upgrading should not break the deployment. If it does, you are not going to space today....
Logs
Here are the logs directly after making 5 back-to-back calls to the api/v2/upgrade endpoint (and getting the above 400 error every time):
api
haproxy
couch
Environment
4.6.0-beta.2
Additional context
We first noticed this issue because the Upgrade job is failing on the CI for the beta tag.