-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor HTTP Server and add ServerGroup to handle graceful shutdown of multiple servers #1047
Conversation
// Start starts the HTTP and HTTPS server if applicable. | ||
// It will block until the context is cancelled. | ||
// If any errors occur, only the first error will be returned. | ||
func (s *server) Start(ctx context.Context) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Method server.Start
has 5 return statements (exceeds 4 allowed).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is nonsense, the function has 1 return statement, the anonymous functions within have 4, false positive
// startServer creates and starts a new server with the given listener. | ||
// When the given context is cancelled the server will be shutdown. | ||
// If any errors occur, only the first error will be returned. | ||
func (s *server) startServer(ctx context.Context, listener net.Listener) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Method server.startServer
has 5 return statements (exceeds 4 allowed).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is nonsense, the function has 1 return statement, the anonymous functions within have 4, false positive
// Server represents an HTTP or HTTPS server. | ||
type Server interface { | ||
// Start blocks and runs the server. | ||
Start(ctx context.Context) error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to expose Shutdown
(or Close
) on this interface?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea I'm running with on this one is that the context being cancelled causes the shutdown, so to use it you do
ctx, cancel := context.WithCancel(context.Background())
go srv.Start(ctx)
// wait for signal
cancel()
If you think having a Close would help then I can look into adding that as well, would probably mean just moving the context inside the server itself implementation wise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I gotcha -- so this ctx
in the Server Start
is more for controlling the concurrency with the underlying uses of the errgroup
. (rather than in something like http.Server
which doesn't take it).
I think I had Serve
+ Shutdown
/Close
on my mind when I saw an interface named Server
(and mistook Start
for Serve
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep that's it, this is more of a Start with a stop channel style thing, except the stop channel is the context which can be cancelled
switch scheme { | ||
case "", "http": | ||
return "tcp" | ||
default: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like a common misconfiguration would be https
somehow here -- Is that handled somewhere else since that wouldn't align to the net.Listener
legal networks?
The network must be "tcp", "tcp4", "tcp6", "unix" or "unixpacket".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh this is coming straight over from our existing codebase - carry on 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You raise an interesting point, we should probably investigate, but yeah, this is lift and shift so not sure 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't use this on the https listener, that is forced to tcp as that's the only one that makes sense, so we don't need to handle this case separately I don't think
}) | ||
} | ||
|
||
return g.Wait() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So what happens if only 1 set of servers in the errgroup fails? This waits for all Goroutines to finish right?
I would think if either main or prometheus (or there HTTP vs HTTPS variant if both are stood up) fail, we want the whole server to fail so we aren't in a partially failed state? (then let whatever platform hosting restart mechanisms are in place to start the whole suite up again with everything?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As soon as the first non-nil error is returned, the groupCtx
is cancelled which causes the shut down of all of the servers we started. Checking the godoc for errgroup.WithContext
as this is what I think it is supposed to do at least:
WithContext returns a new Group and an associated Context derived from ctx.
The derived Context is canceled the first time a function passed to Go returns a non-nil error or the first time Wait returns, whichever occurs first.
|
||
// NewServerGroup creates a new Server to start and gracefully stop a collection | ||
// of Servers. | ||
func NewServerGroup(servers ...Server) Server { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this concept 👍
On HTTP/2, gRPC - we've had a few questions on that front. Would this design open up us potentially supporting a /oauth2/auth
handler variant for gRPC based subrequest architecture?
(I'll punt on the complexity of HTTP/2 to an upstream for now 😅 )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think so, as long as we can create a gRPC server that we can start/stop using the Server interface defined in server.go, I wanted to make sure the start/stop was expandable in case we need further servers in the future
bedde6c
to
1815523
Compare
1815523
to
e78e535
Compare
// setupTLSListener sets the server TLS listener if the HTTPS server is enabled. | ||
// The HTTPS server can be disabled by setting the SecureBindAddress to "-" or by | ||
// leaving it empty. | ||
func (s *server) setupTLSListener(opts Opts) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Method server.setupTLSListener
has 5 return statements (exceeds 4 allowed).
pkg/apis/options/alpha_options.go
Outdated
// To use the secure server you must configure a TLS certificate and key. | ||
Server Server `json:"server,omitempty"` | ||
|
||
// Server is used to configure the HTTP(S) server for metrics. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Server is used to configure the HTTP(S) server for metrics. | |
// MetricsServer is used to configure the HTTP(S) server for metrics. |
func legacyServerFlagset() *pflag.FlagSet { | ||
flagSet := pflag.NewFlagSet("server", pflag.ExitOnError) | ||
|
||
flagSet.String("metrics-address", "", "the address /metrics will be served on (e.g. \":9100\")") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to add support for HTTPS metrics with the legacy flags?
I think both this PR & the initial metrics PR will go out together in the same release for the first time (v7.1.0 I imagine). So there's no backwards compatibility we need to preserve with the existing metrics flags.
TLS support seems handy if it isn't too burdensome.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess distinct metrics tls file flags for solely the metrics endpoint won't help our config flag explosion problem 😨
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally happy to add them, should be relatively easy to add, what shall we call them 🤔
metrics-address, metrics-secure-address, metrics-tls-cert-file, metrics-tls-key-file?
It's not too bad in terms of code management now these are in the legacy sets (as they are grouped nicely)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Thanks for adding these!
} | ||
|
||
var _ = BeforeSuite(func() { | ||
By("Generating a self-signed cert for TLS tests", func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This very thorough TLS setup is very nice!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect we will need this in other places in the code as well, I think we should probably make a test utility package that we can store helpers to do this kind of set up, that can be a follow up though as we need it
if s.listener != nil { | ||
g.Go(func() error { | ||
if err := s.startServer(groupCtx, s.listener); err != nil { | ||
return fmt.Errorf("error starting insecure server: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this reminds me -- I'm tempted to remove support for our force-https
HTTP->HTTPS redirect middleware for security reasons (since we don't have any HSTS headers set from our end to give that process the appropriate protection to prevent future MITM attacks).
But from a security profile in our current state, where do we end up if a user sets both these servers, but doesn't set the force-https
flag? Will the whole server be served on HTTP -- and what negative consequences does that have?
Maybe I need to understand the userbase more -- but I don't see OAuth2 Proxy being the first proxy facing the internet. In my mind that should be something like nginx, which should be dealing with the HTTP->HTTPS redirects, HSTS settings (or the upstream apps do this...?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But from a security profile in our current state, where do we end up if a user sets both these servers, but doesn't set the force-https flag? Will the whole server be served on HTTP -- and what negative consequences does that have?
To do this, they would have to use the alpha config for the moment. I think it's still useful to have the redirect even though we don't necessarily have the most secure HTTPS settings in that case, it's an easy way for users to enforce HTTPS without having to worry about how that's configured in another proxy.
Will the whole server be served on HTTP -- and what negative consequences does that have?
I don't think there's any more danger than someone just running it on HTTP anyway, I think it's up to the users to make sure it's secure
// tcpKeepAliveListener sets TCP keep-alive timeouts on accepted | ||
// connections. It's used by so that dead TCP connections (e.g. closing laptop | ||
// mid-download) eventually go away. | ||
type tcpKeepAliveListener struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have any historical context as to why we only wrap the TLS listener in this tcpKeepAliveListener
implementation of net.Listener
and not the standard listener?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, I had a dig and there's no conversation about this when it was implemented. I have no idea why it is only used for one, perhaps keep alive is more common for TLS to prevent handshake renegotiation or something?
pkg/http/server.go
Outdated
} | ||
err = tc.SetKeepAlive(true) | ||
if err != nil { | ||
logger.Printf("Error setting Keep-Alive: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Errorf? What do you think for this type of error where we don't stop processing flow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, will fix.
Erm, I think this error is extremely rare, so we probably never really hit this case, I think this ok for now.
I have just copied this from the original location though so could be nonsense.
e78e535
to
18fe8b4
Compare
Updated the broken bits and added some flags for the metrics server for TLS. Also realised I don't have tests for the legacy servers conversion, so will need to add those |
1243547
to
7f6b698
Compare
expectHTTPListener: false, | ||
expectTLSListener: true, | ||
}), | ||
Entry("with an invalid bind address port", &newServerTableInput{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These test cases are very comprehensive - Nice!
The only addition I could think of adding is the case where a bind port is already taken.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great PR! Really good stuff - I feel like I get a little better at Go every time I review your code 😄
@NickMeves I think this is ready to go now. I've rebased on latest master (no conflicts) and added tests for the legacy server conversion, that should be the only new part since your last review, PTAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGMT! I'm excited for these changes + new metrics support in v7.1. I will definitely be rolling it out everywhere throughout Greenhouse and plugging in Prometheus ASAP 👍
Description
This moves the http server implementation to its own package under
pkg/http
(not sure about the name).It also moves the configuration to a structured config type which means we can reuse the config for the main app server and the metrics server (which will give the metrics server the ability to serve over TLS).
One change from before is that the listeners are established before
Start
is called, so errors are returned sooner in case of listener conflicts.I've also introduced a ServerGroup concept to start and run/manage the lifecycle of multiple servers at once, so we can make sure everything starts and stops gracefully.
Motivation and Context
Wanted to simplify the usage of http server implementations and make sure we use a common implementation for both metrics and the main application.
How Has This Been Tested?
Local testing environment and unit tests
Checklist: