Refactor HTTP Server and add ServerGroup to handle graceful shutdown of multiple servers #1047

JoelSpeed · 2021-02-15T15:24:01Z

Description

This moves the http server implementation to its own package under pkg/http (not sure about the name).
It also moves the configuration to a structured config type which means we can reuse the config for the main app server and the metrics server (which will give the metrics server the ability to serve over TLS).

One change from before is that the listeners are established before Start is called, so errors are returned sooner in case of listener conflicts.
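To illustrate the point about listener conflicts, here is a minimal sketch of binding at construction time. The names `boundServer` and `newBoundServer` are hypothetical and are not taken from this PR; the sketch only assumes that the constructor calls `net.Listen` itself, so a conflict surfaces before any `Start` call.

```go
package main

import (
	"fmt"
	"net"
)

// boundServer is a hypothetical stand-in for the PR's server type.
type boundServer struct {
	listener net.Listener
}

// newBoundServer binds the listener immediately, so address conflicts
// surface at construction time rather than when the server is started.
func newBoundServer(addr string) (*boundServer, error) {
	l, err := net.Listen("tcp", addr)
	if err != nil {
		return nil, fmt.Errorf("could not listen on %q: %w", addr, err)
	}
	return &boundServer{listener: l}, nil
}

func main() {
	// Port 0 asks the OS for any free port.
	first, err := newBoundServer("127.0.0.1:0")
	if err != nil {
		fmt.Println("unexpected error:", err)
		return
	}
	defer first.listener.Close()

	// Reusing the exact address the first server holds should fail here,
	// before any Start method is involved.
	_, err = newBoundServer(first.listener.Addr().String())
	fmt.Println("second server failed at construction:", err != nil)
}
```

With this shape the caller can report a bad bind address during setup instead of from a goroutine that is already running.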

I've also introduced a ServerGroup concept to start and run/manage the lifecycle of multiple servers at once, so we can make sure everything starts and stops gracefully.

Motivation and Context

Wanted to simplify the usage of http server implementations and make sure we use a common implementation for both metrics and the main application.

How Has This Been Tested?

Local testing environment and unit tests

Checklist:

My change requires a change to the documentation or CHANGELOG.
I have updated the documentation/CHANGELOG accordingly.
I have created a feature (non-master) branch for my PR.
I have added appropriate comments throughout my code changes.
I have added appropriate tests for my code changes.

codeclimate · 2021-02-15T15:24:28Z

pkg/http/server.go

+// Start starts the HTTP and HTTPS server if applicable.
+// It will block until the context is cancelled.
+// If any errors occur, only the first error will be returned.
+func (s *server) Start(ctx context.Context) error {


Method server.Start has 5 return statements (exceeds 4 allowed).

This is a false positive: the function itself has 1 return statement, and the other 4 belong to the anonymous functions within it.

codeclimate · 2021-02-15T15:24:29Z

pkg/http/server.go

+// startServer creates and starts a new server with the given listener.
+// When the given context is cancelled the server will be shutdown.
+// If any errors occur, only the first error will be returned.
+func (s *server) startServer(ctx context.Context, listener net.Listener) error {


Method server.startServer has 5 return statements (exceeds 4 allowed).

This is a false positive: the function itself has 1 return statement, and the other 4 belong to the anonymous functions within it.

NickMeves · 2021-02-15T16:46:28Z

pkg/http/server.go

+// Server represents an HTTP or HTTPS server.
+type Server interface {
+	// Start blocks and runs the server.
+	Start(ctx context.Context) error


Do we want to expose Shutdown (or Close) on this interface?

The idea I'm running with on this one is that the context being cancelled causes the shutdown, so to use it you do

ctx, cancel := context.WithCancel(context.Background())
go srv.Start(ctx)
// wait for signal
cancel()

If you think having a Close would help then I can look into adding that as well. Implementation-wise, it would probably just mean moving the context inside the server itself.

I gotcha -- so this ctx in the Server Start is more for controlling the concurrency with the underlying uses of the errgroup. (rather than in something like http.Server which doesn't take it).

I think I had Serve + Shutdown/Close on my mind when I saw an interface named Server (and mistook Start for Serve).

Yep that's it, this is more of a Start with a stop channel style thing, except the stop channel is the context which can be cancelled

NickMeves · 2021-02-15T16:49:28Z

pkg/http/server.go

+	switch scheme {
+	case "", "http":
+		return "tcp"
+	default:


I feel like a common misconfiguration would be https somehow here -- Is that handled somewhere else since that wouldn't align to the net.Listener legal networks?

The network must be "tcp", "tcp4", "tcp6", "unix" or "unixpacket".

Oh this is coming straight over from our existing codebase - carry on 😄

You raise an interesting point, we should probably investigate, but yeah, this is lift and shift so not sure 🤔

We don't use this on the https listener. That one is forced to tcp, as that's the only network that makes sense, so I don't think we need to handle this case separately.
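For reference, one way the misconfiguration raised above could be caught early is to validate the scheme against the networks `net.Listen` accepts. This is only a sketch: `splitBindAddress` is a hypothetical helper, and rejecting unknown schemes is an assumption on my part, not something this PR does (the existing code passes any other scheme through unchanged).

```go
package main

import (
	"fmt"
	"strings"
)

// splitBindAddress is a hypothetical helper. It maps an optional scheme
// prefix to a net.Listen network and rejects networks that net.Listen
// would not accept for a stream listener.
func splitBindAddress(bind string) (network, addr string, err error) {
	scheme, rest, found := strings.Cut(bind, "://")
	if !found {
		// No scheme at all, e.g. ":4180".
		return "tcp", bind, nil
	}
	switch scheme {
	case "", "http":
		return "tcp", rest, nil
	case "tcp", "tcp4", "tcp6", "unix", "unixpacket":
		return scheme, rest, nil
	default:
		return "", "", fmt.Errorf("unsupported listener network %q in %q", scheme, bind)
	}
}

func main() {
	binds := []string{
		":4180",
		"http://127.0.0.1:4180",
		"unix:///tmp/proxy.sock",
		"https://127.0.0.1:443",
	}
	for _, bind := range binds {
		network, addr, err := splitBindAddress(bind)
		fmt.Printf("%q: network=%q addr=%q err=%v\n", bind, network, addr, err)
	}
}
```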

NickMeves · 2021-02-15T17:00:17Z

pkg/http/server_group.go

+		})
+	}
+
+	return g.Wait()


So what happens if only 1 set of servers in the errgroup fails? This waits for all goroutines to finish, right?

I would think that if either main or prometheus (or their HTTP vs HTTPS variants, if both are stood up) fails, we want the whole server to fail so we aren't left in a partially failed state. We can then let whatever restart mechanisms the hosting platform has in place start the whole suite up again.

As soon as the first non-nil error is returned, the groupCtx is cancelled, which causes the shutdown of all of the servers we started. At least, that is what I think it is supposed to do, going by the godoc for errgroup.WithContext:

WithContext returns a new Group and an associated Context derived from ctx. The derived Context is canceled the first time a function passed to Go returns a non-nil error or the first time Wait returns, whichever occurs first.

NickMeves · 2021-02-15T17:03:12Z

pkg/http/server_group.go

+
+// NewServerGroup creates a new Server to start and gracefully stop a collection
+// of Servers.
+func NewServerGroup(servers ...Server) Server {


I like this concept 👍

On HTTP/2, gRPC - we've had a few questions on that front. Would this design open us up to potentially supporting a /oauth2/auth handler variant for a gRPC-based subrequest architecture?

(I'll punt on the complexity of HTTP/2 to an upstream for now 😅 )

Yeah I think so, as long as we can create a gRPC server that we can start/stop using the Server interface defined in server.go. I wanted to make sure the start/stop was expandable in case we need further servers in the future.

codeclimate · 2021-02-16T23:01:01Z

pkg/http/server.go

+// setupTLSListener sets the server TLS listener if the HTTPS server is enabled.
+// The HTTPS server can be disabled by setting the SecureBindAddress to "-" or by
+// leaving it empty.
+func (s *server) setupTLSListener(opts Opts) error {


Method server.setupTLSListener has 5 return statements (exceeds 4 allowed).

NickMeves · 2021-02-18T04:46:45Z

pkg/apis/options/alpha_options.go

+	// To use the secure server you must configure a TLS certificate and key.
+	Server Server `json:"server,omitempty"`
+
+	// Server is used to configure the HTTP(S) server for metrics.


Suggested change

-	// Server is used to configure the HTTP(S) server for metrics.
+	// MetricsServer is used to configure the HTTP(S) server for metrics.

NickMeves · 2021-02-18T04:50:15Z

pkg/apis/options/legacy_options.go

+func legacyServerFlagset() *pflag.FlagSet {
+	flagSet := pflag.NewFlagSet("server", pflag.ExitOnError)
+
+	flagSet.String("metrics-address", "", "the address /metrics will be served on (e.g. \":9100\")")


Do we want to add support for HTTPS metrics with the legacy flags?

I think both this PR & the initial metrics PR will go out together in the same release for the first time (v7.1.0 I imagine). So there's no backwards compatibility we need to preserve with the existing metrics flags.

TLS support seems handy if it isn't too burdensome.

I guess distinct metrics tls file flags for solely the metrics endpoint won't help our config flag explosion problem 😨

Totally happy to add them, should be relatively easy to add, what shall we call them 🤔
metrics-address, metrics-secure-address, metrics-tls-cert-file, metrics-tls-key-file?

It's not too bad in terms of code management now these are in the legacy sets (as they are grouped nicely)

Nice! Thanks for adding these!

NickMeves · 2021-02-18T04:58:06Z

pkg/http/http_suite_test.go

+}
+
+var _ = BeforeSuite(func() {
+	By("Generating a self-signed cert for TLS tests", func() {


This very thorough TLS setup is very nice!

I suspect we will need this in other places in the code as well. We should probably make a test utility package to hold helpers for this kind of setup, but that can be a follow-up once we need it.

NickMeves · 2021-02-18T05:42:34Z

pkg/http/server.go

+	if s.listener != nil {
+		g.Go(func() error {
+			if err := s.startServer(groupCtx, s.listener); err != nil {
+				return fmt.Errorf("error starting insecure server: %v", err)


So this reminds me -- I'm tempted to remove support for our force-https HTTP->HTTPS redirect middleware for security reasons (since we don't have any HSTS headers set from our end to give that process the appropriate protection to prevent future MITM attacks).

But from a security profile in our current state, where do we end up if a user sets both these servers, but doesn't set the force-https flag? Will the whole server be served on HTTP -- and what negative consequences does that have?

Maybe I need to understand the userbase more -- but I don't see OAuth2 Proxy being the first proxy facing the internet. In my mind that should be something like nginx, which should be dealing with the HTTP->HTTPS redirects, HSTS settings (or the upstream apps do this...?).

But from a security profile in our current state, where do we end up if a user sets both these servers, but doesn't set the force-https flag? Will the whole server be served on HTTP -- and what negative consequences does that have?

To do this, they would have to use the alpha config for the moment. I think it's still useful to have the redirect even though we don't necessarily have the most secure HTTPS settings in that case, it's an easy way for users to enforce HTTPS without having to worry about how that's configured in another proxy.

Will the whole server be served on HTTP -- and what negative consequences does that have?

I don't think there's any more danger than someone just running it on HTTP anyway, I think it's up to the users to make sure it's secure

NickMeves · 2021-02-18T06:09:16Z

pkg/http/server.go

+// tcpKeepAliveListener sets TCP keep-alive timeouts on accepted
+// connections. It's used by so that dead TCP connections (e.g. closing laptop
+// mid-download) eventually go away.
+type tcpKeepAliveListener struct {


Do you have any historical context as to why we only wrap the TLS listener in this tcpKeepAliveListener implementation of net.Listener and not the standard listener?

Nope, I had a dig and there's no conversation about this when it was implemented. I have no idea why it is only used for one, perhaps keep alive is more common for TLS to prevent handshake renegotiation or something?

NickMeves · 2021-02-18T06:10:08Z

pkg/http/server.go

+	}
+	err = tc.SetKeepAlive(true)
+	if err != nil {
+		logger.Printf("Error setting Keep-Alive: %v", err)


Errorf? What do you think for this type of error where we don't stop processing flow.

Good point, will fix.

Erm, I think this error is extremely rare, so we probably never really hit this case. I think this is ok for now.
I have just copied this from the original location though, so it could be nonsense.

JoelSpeed · 2021-02-19T11:58:27Z

Updated the broken bits and added some flags for the metrics server for TLS. Also realised I don't have tests for the legacy servers conversion, so will need to add those

NickMeves · 2021-02-21T00:51:08Z

pkg/http/server_test.go

+				expectHTTPListener: false,
+				expectTLSListener:  true,
+			}),
+			Entry("with an invalid bind address port", &newServerTableInput{


These test cases are very comprehensive - Nice!

The only addition I could think of adding is the case where a bind port is already taken.

NickMeves

Great PR! Really good stuff - I feel like I get a little better at Go every time I review your code 😄

JoelSpeed · 2021-03-07T11:51:35Z

@NickMeves I think this is ready to go now. I've rebased on latest master (no conflicts) and added tests for the legacy server conversion, that should be the only new part since your last review, PTAL

NickMeves

LGTM! I'm excited for these changes + new metrics support in v7.1. I will definitely be rolling it out everywhere throughout Greenhouse and plugging in Prometheus ASAP 👍

JoelSpeed added enhancement refactor needs tests labels Feb 15, 2021

JoelSpeed added this to In Progress (Pull Requests) in Release v7.1.0 via automation Feb 15, 2021

codeclimate bot reviewed Feb 15, 2021

NickMeves reviewed Feb 15, 2021

JoelSpeed force-pushed the http-server branch 2 times, most recently from bedde6c to 1815523 Compare February 16, 2021 22:52

JoelSpeed removed the needs tests label Feb 16, 2021

JoelSpeed marked this pull request as ready for review February 16, 2021 22:53

JoelSpeed requested a review from a team as a code owner February 16, 2021 22:53

JoelSpeed changed the title ~~[Very WIP] Refactor HTTP Server and add ServerGroup to handle graceful shutdown of multiple servers~~ Refactor HTTP Server and add ServerGroup to handle graceful shutdown of multiple servers Feb 16, 2021

JoelSpeed requested a review from NickMeves February 16, 2021 22:56

JoelSpeed force-pushed the http-server branch from 1815523 to e78e535 Compare February 16, 2021 22:57

codeclimate bot reviewed Feb 16, 2021

NickMeves reviewed Feb 18, 2021

JoelSpeed mentioned this pull request Feb 18, 2021

Listen both http & https to ease force_https redirection #842

Closed

JoelSpeed force-pushed the http-server branch from e78e535 to 18fe8b4 Compare February 19, 2021 11:57

JoelSpeed force-pushed the http-server branch 2 times, most recently from 1243547 to 7f6b698 Compare February 19, 2021 12:08

NickMeves reviewed Feb 21, 2021

NickMeves previously approved these changes Feb 21, 2021

JoelSpeed mentioned this pull request Feb 23, 2021

Add support for the envoy ext_authz gRPC protocol #958

Closed

JoelSpeed added 7 commits March 7, 2021 11:49

Create server options struct and move legacy server options to legacy…

3d457a8

…Options

Add new http server implementation

d8aca8a

Add server group implementation for running multiple servers at once

2c54ee7

Integrate new server implementation into main OAuth2 Proxy

8d2fc40

Update generated docs reference

86e672e

Add complete metrics server configuration

f4b3115

Add changelog entry for HTTP server refactor

36887a0

JoelSpeed dismissed NickMeves’s stale review via 36887a0 March 7, 2021 11:50

JoelSpeed force-pushed the http-server branch from 7f6b698 to 36887a0 Compare March 7, 2021 11:50

NickMeves approved these changes Mar 7, 2021

JoelSpeed merged commit 6894738 into master Mar 7, 2021

Release v7.1.0 automation moved this from In Progress (Pull Requests) to Done Mar 7, 2021

JoelSpeed deleted the http-server branch April 1, 2021 11:51
