This repository has been archived by the owner on Feb 24, 2024. It is now read-only.

fixed server and worker to be gracefully shut down #2214

Merged
merged 8 commits into from
Mar 11, 2022

Conversation

sio4
Member

@sio4 sio4 commented Mar 4, 2022

Fixed the server and worker so that they shut down gracefully.

The current version of Buffalo does not support graceful shutdown properly. This can be a serious security (integrity) issue if the application is interrupted while a request is still being handled or a background job is still running.

It seems the graceful shutdown mechanism was broken when multiple server support was implemented (#1039).

This patch was tested with the following scenarios:

  • the server properly stops listening
  • ongoing requests are properly completed
  • an ongoing background job is properly completed: a 5-second background job running with a simple worker

fixes: #2198

@sio4 sio4 added the bug (Something isn't working) and security (security issue) labels Mar 4, 2022
@sio4 sio4 self-assigned this Mar 4, 2022
@sio4
Member Author

sio4 commented Mar 5, 2022

Findings and fixes:

  • 3fbd918 - there were two places waiting on context.Done(), which caused an immediate exit on interrupt. To fix this, a sync.WaitGroup is now used for the server and worker.
  • 002824e - server.Shutdown() was called with an already canceled context. The argument must not be a canceled context for the shutdown function to work properly.
  • 8d8f521 - reordered the shutdown process from worker-then-server to server-then-worker. It is better to block incoming requests by stopping the listener first, while keeping the worker alive for long-running tasks.
    • We need a shutdown timeout option here.
    • Fixed the location of event emitting points (especially for the meaning of "... was called")
    • Cleaned up log messages

I feel like we need to look further into the process control flow, the request flow, and a few other things related to these points.

@paganotoni
Member

This is great @sio4. Let me know when this is ready for review.

@sio4
Member Author

sio4 commented Mar 6, 2022

  • 0a640e1 - replaced @markbates's http://github.com/markbates/sigtx with the standard signal.NotifyContext(), which was introduced in Go 1.16, the oldest supported version. :-)
  • 61fb3fe - added a shutdown timeout option, TimeoutSecondShutdown, which is used as the context timeout for the Shutdown call. All pending requests should be handled within this duration (60 seconds by default); otherwise developers should configure a reasonable value for their application. I guess we will need some more timeout values in the future.

@sio4 sio4 marked this pull request as ready for review March 6, 2022 11:25
@sio4 sio4 requested a review from a team as a code owner March 6, 2022 11:25
@sio4
Member Author

sio4 commented Mar 6, 2022

This is great @sio4. Let me know when this is ready for review.

Hi! Thanks @paganotoni for your comment! The PR is now ready for review with the essential fixes. Please take a look and let me know if anything needs to be fixed or clarified.

@sio4 sio4 changed the title from "fixed server and worker to be gracefully shutting down" to "fixed server and worker to be gracefully shut down" Mar 6, 2022
Member

@paganotoni paganotoni left a comment


Looks good to me. @sio4, is there any way to add tests for these changes? It would be good to specify how this should behave through tests, to protect the functionality as the package evolves.

@sio4
Member Author

sio4 commented Mar 7, 2022

Yeah, I am thinking the same. One thing was broken from the start (calling Shutdown() with a canceled Context), and another issue (duplicated Done checking) was introduced after we started to support multiple servers. If we had had test cases, these could have been caught earlier.

However, I am not sure which kind of test case would be better. Testing this correctly requires a semi-complete application that has a long-running request handler and a long-running job handler. It could also be time-consuming since it has to cover timing behavior. I am wondering whether a unit test in this package is enough, or whether it is time to consider integration tests with separate testing scenarios.

I am considering the first approach for now and will start to write one soon. However, please let me know if you have a good idea of any kind.

@paganotoni paganotoni added this to the v0.18.5 milestone Mar 8, 2022
@paganotoni
Member

@sio4 also, one thought I had was about the event kinds (variables) being moved into the worker package. That would be a breaking change, right? Could we keep the event variables in the same place?

@sio4
Member Author

sio4 commented Mar 9, 2022

Yeah, agreed. That could be a breaking change since they are public. Actually, I am not sure why they are public, and at the same time it seems some of my changes related to them are not going in a good direction. I will check them and fix them.

By the way, I wrote test cases for this issue, but since it is highly timing-related, making a perfect test case is not easy. I wrote unit tests for the server.go file and also created a separate test app for this issue. I will test and investigate some more and then post an update.

@sio4 sio4 force-pushed the graceful-shutdown branch from ee0dba9 to 910b65e Compare March 10, 2022 17:57
@sio4 sio4 force-pushed the graceful-shutdown branch from 910b65e to 8a401ea Compare March 10, 2022 17:59
@sio4
Member Author

sio4 commented Mar 10, 2022

Please take a look at the changes and the test cases.

Member

@paganotoni paganotoni left a comment


Thanks @sio4. This is looking good.

@paganotoni
Member

We just need to resolve the conflicts and we should be good to go.

@sio4 sio4 merged commit 6ab7f0f into gobuffalo:development Mar 11, 2022
@sio4
Member Author

sio4 commented Mar 11, 2022

Thanks!

@sio4 sio4 linked an issue Mar 11, 2022 that may be closed by this pull request
@sio4 sio4 deleted the graceful-shutdown branch May 14, 2022 16:18
Labels
bug Something isn't working security security issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bug: Buffalo app is never gracefully shut down
2 participants