-
-
Notifications
You must be signed in to change notification settings - Fork 300
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: abort and headers already sent errors for the rest api #5722
Conversation
To reproduce the error run following command in ./lodestar dev --reset --startValidators=0..7 --rest.port=3400 After running this command wait for few seconds and press Ctrl+C. |
Performance Report✔️ no performance regression detected Full benchmark results
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The errors mentioned in #5706 are solely on the beacon node side and are not addressed by this PR, although the unclean disconnect might be a fix in the validator client but it happens at any point when shutting down and not just at startup.
I am not sure if it is a good idea to swallow aborted errors during initialization of the validator client (or beacon node) as in that case those kinda make sense. Specifically in the case of "waiting for genesis..." it feels alright to me to print out a abort error if user aborts (CTRL+C).
@nflaig Please suggest the steps which you used to produce the issue mentioned in #5706. The only way I was able to produce was the steps mentioned above and I fixed it accordingly. Related to your other comment. When user presses Ctrl+C and we show a message
Then it does not make useful to show abort error. |
Can be reproduce by running a beacon node ./lodestar beacon \
--dataDir /home/devops/goerli/data/beacon \
--rest \
--rest.address "0.0.0.0" \
--rest.namespace '*' \
--metrics \
--execution.urls http://localhost:8551 \
--jwt-secret /home/devops/goerli/data/jwtsecret \
--logLevel info \
--network goerli and a validator client ./lodestar validator \
--dataDir /home/devops/goerli/data/validator \
--beaconNodes http://localhost:9596 \
--metrics \
--logLevel info \
--network goerli \ and then just shutting down (CTRL+C) the validator client which produces this log on the beacon node Jun-30 17:00:44.746[rest] error: Req req-1 eventstream error aborted
Error: aborted
at connResetException (node:internal/errors:720:14)
at abortIncoming (node:_http_server:771:17)
at socketOnClose (node:_http_server:765:3)
at Socket.emit (node:events:523:35)
at TCP.<anonymous> (node:net:334:12) the other issue |
@nflaig Thanks for sharing the snippet. Will check and fix the issue. |
370a791
to
ba6b116
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Error: Cannot set headers after they are sent to the client
not yet addressed, can't close the issue with this in current state
@@ -49,7 +50,12 @@ export function getRoutes(config: ChainForkConfig, api: ServerApi<Api>): ServerR | |||
// The client may disconnect and we need to clean the subscriptions. | |||
req.raw.once("close", () => resolve()); | |||
req.raw.once("end", () => resolve()); | |||
req.raw.once("error", (err) => reject(err)); | |||
req.raw.once("error", (err) => { | |||
if ("code" in err && (err as unknown as {code: string}).code === "ECONNRESET") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not really sure if this is ideal, we might swallow cases where validator client crashed or disconnected for other reasons than shutting it down.
Was really hoping that this could be somehow addressed on the validator client side, it should not cause ECONNRESET
error on the beacon node but disconnect gracefully.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are not swallowing something, trying to log more relevant information. If the client crashed internally it's still aborted from the context of AbortController on client. So we can't differentiate on the server side of both scenarios.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't looked at the code specifically but shouldn't the validator client be able to more gracefully close the connection is it receives a abort signal?
@@ -36,6 +37,9 @@ export function getRoutes(config: ChainForkConfig, api: ServerApi<Api>): ServerR | |||
await new Promise<void>((resolve, reject) => { | |||
void api.eventstream(req.query.topics, controller.signal, (event) => { | |||
try { | |||
// If the request is already aborted, we don't need to send any more events. | |||
if (req.raw.destroyed) return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check will try to avoid headers already sent scenario. Though that edge case is very tricky to reproduce. I tried a lot of ticks but was not able to produce it locally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure this handles all cases where this can happen, I was think maybe to check res.raw.headersSent
before setting hearders but would also have to look into this more if that the correct condition to check for
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
headers already sent error occurred, when you try to write to a response, which related request is completed. That completion can be because request was aborted, or server already finished writing response and closed the stream with res.end()
.
I can't see the later case happens in our code, as we write the response only from one place. So the only case left in my mind is the first one, so this condition will cover it.
If you can find a way to reproduce headers already sent
scenario manually, please share then we can test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the stack trace in the issue, the error is not caused by us setting headers but something fastify does.
Looks like here res.writeHead, might be an issue on fastify or related to us not closing correcty.
But res.raw.write
looks like a good candidate that could be causing this. If we are sure this does not cause unwanted side effects the change looks good.
If you can find a way to reproduce headers already sent scenario manually
I just saw this once or twice on my end over a period of several months and it was reported by a user once. Seems to be highly timing specific, I tried to reproduce it consistently but no luck so far
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any side effects for this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nazarhussain I was going through logs from sim tests and it looks like the error still happens there.
[node-1-cl-lodestar] [8460]: Eph 6/0 2.220[node-1-cl-lodestar] �[31merror�[39m: uncaughtException: Cannot set headers after they are sent to the client
Error: Cannot set headers after they are sent to the client
at new NodeError (node:internal/errors:405:5)
at ServerResponse.setHeader (node:_http_outgoing:648:11)
at ServerResponse.writeHead (node:_http_server:381:21)
at onSendEnd (/home/runner/actions-runner/_work/lodestar/lodestar/node_modules/fastify/lib/reply.js:597:7)
at onSendHook (/home/runner/actions-runner/_work/lodestar/lodestar/node_modules/fastify/lib/reply.js:530:5)
at fallbackErrorHandler (/home/runner/actions-runner/_work/lodestar/lodestar/node_modules/fastify/lib/error-handler.js:127:3)
at handleError (/home/runner/actions-runner/_work/lodestar/lodestar/node_modules/fastify/lib/error-handler.js:61:5)
at onErrorHook (/home/runner/actions-runner/_work/lodestar/lodestar/node_modules/fastify/lib/reply.js:743:5)
at Reply.send (/home/runner/actions-runner/_work/lodestar/lodestar/node_modules/fastify/lib/reply.js:133:5)
at defaultErrorHandler (/home/runner/actions-runner/_work/lodestar/lodestar/node_modules/fastify/lib/error-handler.js:92:9) Cannot set headers after they are sent to the client
Error: Cannot set headers after they are sent to the client
See node-1-cl-lodestar.log from this sim tests run. This error seems to happen quite frequently in sim tests.
I would suggest we revert this change here as it does not solve the issue. It might also be a bug in fastify and not directly solvable on our end.
Edit: I created an issue to keep track of this
@@ -169,10 +171,15 @@ export async function validatorHandler(args: IValidatorCliArgs & GlobalArgs): Pr | |||
distributed: args.distributed, | |||
}, | |||
metrics | |||
); | |||
).catch((err) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned in #5511 (comment) by @tuyennhv we shouldn't use await and catch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I earlier used the try/catch
pattern as suggested by @tuyennhv. You asked it does not feel good wrap it. So alternative is the following approach. As we don't prefer using let
in our code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, all of the proposed solutions look sub-optimal
@@ -36,6 +37,9 @@ export function getRoutes(config: ChainForkConfig, api: ServerApi<Api>): ServerR | |||
await new Promise<void>((resolve, reject) => { | |||
void api.eventstream(req.query.topics, controller.signal, (event) => { | |||
try { | |||
// If the request is already aborted, we don't need to send any more events. | |||
if (req.raw.destroyed) return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the stack trace in the issue, the error is not caused by us setting headers but something fastify does.
Looks like here res.writeHead, might be an issue on fastify or related to us not closing correcty.
But res.raw.write
looks like a good candidate that could be causing this. If we are sure this does not cause unwanted side effects the change looks good.
If you can find a way to reproduce headers already sent scenario manually
I just saw this once or twice on my end over a period of several months and it was reported by a user once. Seems to be highly timing specific, I tried to reproduce it consistently but no luck so far
@@ -49,7 +53,12 @@ export function getRoutes(config: ChainForkConfig, api: ServerApi<Api>): ServerR | |||
// The client may disconnect and we need to clean the subscriptions. | |||
req.raw.once("close", () => resolve()); | |||
req.raw.once("end", () => resolve()); | |||
req.raw.once("error", (err) => reject(err)); | |||
req.raw.once("error", (err) => { | |||
if ((err as unknown as {code: string}).code === "ECONNRESET") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reviewed our code and did some research on how EventSource
behaves. It looks like we are closing this correctly on the client by calling .close()
.
It is invoking both the close
and error
event on the server, this is a bit strange to me but it looks like that's how EventSource
works, it doesn't gracefully close for some reason.
Another thing is that usually this would be handled by fastify which also just ignore errors with code ECONNRESET
, see https://github.com/fastify/fastify/blob/cc347d7c0b4266097b61b126158b797878668353/fastify.js#L667C23-L667C33
But since the event API is not handled by fastify we need to do it ourselves and applying same pattern sounds good to me.
We might even consider just resolving here instead of rejecting with abort error but I guess if this error is not logged due to it being ignored upstream this is also fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel we should abort here on ECONNRESET
and let the higher level stack handle that error, as it's currently doing. Once we have more resources to knew if the reset was actually abort or gracefully closed by client then we can decide to resolve the later scenario.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, sounds good to me 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is invoking both the close and error event on the server, this is a bit strange to me but it looks like that's how EventSource works, it doesn't gracefully close for some reason.
This might be related to
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is also simliar to issue reported here
@@ -169,10 +171,15 @@ export async function validatorHandler(args: IValidatorCliArgs & GlobalArgs): Pr | |||
distributed: args.distributed, | |||
}, | |||
metrics | |||
); | |||
).catch((err) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, all of the proposed solutions look sub-optimal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me 👍
Let's also update the PR description to reflect the changes implemented
🎉 This PR is included in v1.10.0 🎉 |
Motivation
Gracefully shutdown the connection.
Description
When validator is loading up and waiting for genesis that can take a lot of time and during this time if the user press Ctrl+C then the following error is shown.
So with this change we carefully wait for the validator to load and if user abort we skip that error.
Closes #5706
Steps to test or reproduce