Replace HTTP compression with a more scoped impl, only use on responses > 128KB #77449

Merged: 2 commits merged into kubernetes:master on Jul 9, 2019

Conversation

@smarterclayton (Contributor, Author) commented May 4, 2019

The previous HTTP compression implementation functioned as an HTTP filter, which required it to deal with a number of special cases that complicated the implementation and prevented it from ever being turned on by default.

Instead, when we write an API object to a response, handle only the single case of a valid Kube object being encoded to the output. This allows a more limited implementation that does not impact other code flows, is easier to reason about, and can be promoted to beta.

Because Golang clients request gzipping by default, and gzip has a significant CPU cost on small requests, ignore requests to compress objects that are smaller than 128KB in size. The goal of this feature is to reduce bandwidth and latency requirements on large lists, even with chunking, and 128KB is smaller than a 500 pod page but larger than almost any single object request.
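As a rough illustration of the approach (a minimal sketch, not the actual apiserver code; the function and constant names below are made up for this example), the write path only compresses when the client advertised gzip support and the encoded payload crosses the size threshold:

package example

import (
	"bytes"
	"compress/gzip"
	"net/http"
	"strings"
)

// gzipThresholdBytes is a hypothetical stand-in for the 128KB cutoff.
const gzipThresholdBytes = 128 * 1024

func writeMaybeCompressed(w http.ResponseWriter, req *http.Request, encoded []byte) error {
	acceptsGzip := strings.Contains(req.Header.Get("Accept-Encoding"), "gzip")
	if !acceptsGzip || len(encoded) < gzipThresholdBytes {
		// Small responses skip gzip: the CPU cost would outweigh the bandwidth savings.
		_, err := w.Write(encoded)
		return err
	}
	var buf bytes.Buffer
	gw := gzip.NewWriter(&buf)
	if _, err := gw.Write(encoded); err != nil {
		return err
	}
	if err := gw.Close(); err != nil {
		return err
	}
	w.Header().Set("Content-Encoding", "gzip")
	w.Header().Set("Vary", "Accept-Encoding")
	_, err := w.Write(buf.Bytes())
	return err
}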

Also fixes a bug introduced in #50342 because httpResponseWriterWithInit.Write wasn't a pointer receiver - the init code was called repeatedly:

2019/05/04 19:15:31 http: superfluous response.WriteHeader call from k8s.io/apiserver/pkg/endpoints/handlers/responsewriters.httpResponseWriterWithInit.Write (writers.go:56)
2019/05/04 19:15:31 http: superfluous response.WriteHeader call from k8s.io/apiserver/pkg/endpoints/handlers/responsewriters.httpResponseWriterWithInit.Write (writers.go:56)
2019/05/04 19:15:31 http: superfluous response.WriteHeader call from k8s.io/apiserver/pkg/endpoints/handlers/responsewriters.httpResponseWriterWithInit.Write (writers.go:56)
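For context, this is roughly what the value-receiver bug looks like (illustrative types only, not the real writers.go code): Write mutates a copy of the struct, so the one-time init never sticks and WriteHeader fires on every call.

package example

import "net/http"

type responseWriterWithInit struct {
	hasWritten bool
	statusCode int
	inner      http.ResponseWriter
}

// Buggy: value receiver. hasWritten is set on a copy, so every Write call
// re-runs the init and triggers "superfluous response.WriteHeader".
func (w responseWriterWithInit) Write(b []byte) (int, error) {
	if !w.hasWritten {
		w.inner.WriteHeader(w.statusCode)
		w.hasWritten = true // lost when this copy goes away
	}
	return w.inner.Write(b)
}

// Fixed: pointer receiver, so the flag survives across calls and the
// header is written exactly once.
func (w *responseWriterWithInit) WriteFixed(b []byte) (int, error) {
	if !w.hasWritten {
		w.inner.WriteHeader(w.statusCode)
		w.hasWritten = true
	}
	return w.inner.Write(b)
}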

/kind bug

KEP kubernetes/enhancements#1115

Kubernetes now supports transparent compression of API responses. Clients that send `Accept-Encoding: gzip` will now receive a GZIP-compressed response body if the API response was larger than 128KB. Go clients request gzip encoding by default and should see reduced transfer times for very large API responses. Clients in other languages may need to make changes to benefit from compression.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/apiserver sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels May 4, 2019
@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 4, 2019
@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label May 4, 2019
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: smarterclayton

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@smarterclayton (Contributor, Author)

@liggitt since you reviewed #50342

@fejta-bot

This PR may require API review.

If so, when the changes are ready, complete the pre-review checklist and request an API review.

Status of requested reviews is tracked in the API Review project.

@smarterclayton (Contributor, Author)

/retest

@smarterclayton (Contributor, Author)

# 99th for this PR on cluster lists
$ api-100 https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/77449/pull-kubernetes-e2e-gce-100-performance/1124864914138075139/artifacts/APIResponsiveness_density_2019-05-05T03:01:32Z.json
LIST  cronjobs                cluster  5.486   46
LIST  namespaces              cluster  13.926  16
LIST  jobs                    cluster  14.989  46
LIST  persistentvolumes       cluster  26.484  16
LIST  pods                    cluster  31.564  4
LIST  replicationcontrollers  cluster  33.16   6
LIST  services                cluster  42.151  6
LIST  nodes                   cluster  74.265  117

# 99th for master(ish) on cluster lists
$ api-100 https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/77341/pull-kubernetes-e2e-gce-100-performance/1124998416011628545/artifacts/APIResponsiveness_density_2019-05-05T11:51:20Z.json
LIST  services                cluster  1.796    4
LIST  pods                    cluster  5.508    5
LIST  jobs                    cluster  9.609    36
LIST  replicationcontrollers  cluster  15.28    6
LIST  cronjobs                cluster  33.418   36
LIST  namespaces              cluster  86.998   12
LIST  persistentvolumes       cluster  89.04    12
LIST  nodes                   cluster  194.802  90

Same but namespace scoped

$ api-100 https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/77449/pull-kubernetes-e2e-gce-100-performance/1124864914138075139/artifacts/APIResponsiveness_density_2019-05-05T03:01:32Z.json
LIST  limitranges             namespace  3.034    16
LIST  secrets                 namespace  4.852    159
LIST  replicasets             namespace  10.3     16
LIST  ingresses               namespace  17.373   16
LIST  endpoints               namespace  19.825   16
LIST  statefulsets            namespace  22.191   16
LIST  cronjobs                namespace  25.027   16
LIST  persistentvolumeclaims  namespace  42.206   16
LIST  daemonsets              namespace  46.348   16
LIST  pods                    namespace  48.785   1019
LIST  configmaps              namespace  51.115   119
LIST  resourcequotas          namespace  68.432   17
LIST  jobs                    namespace  82.747   16
LIST  replicationcontrollers  namespace  88.394   16
LIST  deployments             namespace  91.022   16
LIST  services                namespace  128.396  16
$ api-100 https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/77341/pull-kubernetes-e2e-gce-100-performance/1124998416011628545/artifacts/APIResponsiveness_density_2019-05-05T11:51:20Z.json
LIST  persistentvolumeclaims  namespace  6.919   12
LIST  secrets                 namespace  7.289   136
LIST  endpoints               namespace  19.084  12
LIST  configmaps              namespace  22.612  79
LIST  cronjobs                namespace  23.684  12
LIST  limitranges             namespace  24.617  14
LIST  daemonsets              namespace  27.742  12
LIST  replicasets             namespace  29.692  12
LIST  statefulsets            namespace  29.859  12
LIST  pods                    namespace  31.246  1015
LIST  replicationcontrollers  namespace  32.53   12
LIST  deployments             namespace  37.013  12
LIST  resourcequotas          namespace  38.098  14
LIST  jobs                    namespace  38.902  12
LIST  services                namespace  43.577  12
LIST  ingresses               namespace  68.877  12

It looks like this reins in the 99th-percentile tail on large lists significantly, at the tradeoff of slightly higher latency on small lists. We could potentially tune the threshold higher than 16KB - for instance 32KB or 128KB - which would potentially reduce tail latency on small lists further.

@kubernetes/sig-scalability-pr-reviews

@k8s-ci-robot k8s-ci-robot added the sig/scalability Categorizes an issue or PR as relevant to SIG Scalability. label May 5, 2019
@smarterclayton smarterclayton changed the title from "Replace HTTP compression with an inline handler" to "Replace HTTP compression with an inline handler, only use on responses > 16KB" May 5, 2019
@smarterclayton (Contributor, Author)

/retest

@jennybuckley

/cc @wojtek-t

@smarterclayton (Contributor, Author)

/retest

@smarterclayton (Contributor, Author)

/retest

@smarterclayton (Contributor, Author) commented Jun 26, 2019

https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/77449/pull-kubernetes-e2e-gce-100-performance/1143560782466781184/artifacts/APIResponsiveness_density_2019-06-25T17:22:09Z.json

LIST  configmaps              cluster  0.143  0.143   0.143    1
LIST  pods                    cluster  3.036  3.468   3.468    7
LIST  services                cluster  1.373  3.677   3.677    4
LIST  persistentvolumes       cluster  0.973  2.805   4.373    12
LIST  cronjobs                cluster  0.654  3.368   5.06     36
LIST  jobs                    cluster  0.866  3.346   5.763    36
LIST  namespaces              cluster  0.981  3.143   9.611    12
LIST  replicationcontrollers  cluster  0.959  21.873  21.873   6
LIST  nodes                   cluster  1.801  17.66   109.361  90

Nodes in particular is 30% below the p99 of perf-dash right now.

https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/77449/pull-kubernetes-e2e-gce-100-performance/1144003791448707072/artifacts/APIResponsiveness_density_2019-06-26T22:33:50Z.json

LIST  services                cluster  1.627  1.714   1.714    4
LIST  pods                    cluster  3.244  5.241   5.241    6
LIST  jobs                    cluster  1.111  5.045   8.967    36
LIST  namespaces              cluster  1.049  5.849   9.068    12
LIST  cronjobs                cluster  0.868  3.973   9.692    36
LIST  replicationcontrollers  cluster  1.39   27.937  27.937   6
LIST  persistentvolumes       cluster  1.263  7.879   49.757   12
LIST  nodes                   cluster  1.97   20.387  205.261  90

Normal

https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/77449/pull-kubernetes-e2e-gce-100-performance/1144281746527752192/artifacts/APIResponsiveness_density_2019-06-27T16:57:09Z.json

LIST  configmaps              cluster  0.204  0.204   0.204    1
LIST  namespaces              cluster  1.033  1.391   1.399    12
LIST  services                cluster  1.265  1.444   1.444    4
LIST  persistentvolumes       cluster  1.146  2.591   4.382    12
LIST  pods                    cluster  3.539  5.169   5.169    5
LIST  replicationcontrollers  cluster  1.778  43.263  43.263   6
LIST  cronjobs                cluster  0.818  13.377  53.772   37
LIST  jobs                    cluster  1.531  9.668   117.324  37
LIST  nodes                   cluster  1.892  21.964  119.049  95

Lower

@wojtek-t (Member)

> Nodes in particular is 30% below the p99 of perf-dash right now.

There is significant variance there, but I agree the results look very promising.

I would like to also run a test on larger scale, once we get out of the last regressions we have: #79096

@smarterclayton (Contributor, Author)

The variance before was all measured while we had the bucketing problem, I believe.

I would trade some P99 on large requests in cluster for a dramatically reduced P99 outside the cluster. We don't have an easy way to measure that unless we simulate constrained bandwidth for a client.

@wojtek-t (Member)

> I would trade some P99 on large requests in cluster for a dramatically reduced P99 outside the cluster. We don't have an easy way to measure that unless we simulate constrained bandwidth for a client.

That sounds reasonable to me - I would just like to know how much we are trading (if it is really visible); I'm definitely NOT saying "we can't do this if it grows at all".

@smarterclayton (Contributor, Author)

/test pull-kubernetes-e2e-gce-100-performance

@smarterclayton (Contributor, Author)

Fortunately this is easy to test - we can just gate it on or off. All clients request compression automatically. We can also tune the threshold up.

@smarterclayton (Contributor, Author)

Are we ready to try the larger run test now that the other blocker was resolved?

@wojtek-t (Member) commented Jul 3, 2019

> Are we ready to try the larger run test now that the other blocker was resolved?

I asked @krzysied to patch this to the experiments he is running offline, the plan was to run something over night - will get back to you when I know if that happened or not.

@wojtek-t wojtek-t self-assigned this Jul 3, 2019
@wojtek-t (Member) commented Jul 3, 2019

> I asked @krzysied to patch this to the experiments he is running offline, the plan was to run something over night - will get back to you when I know if that happened or not.

We don't have full results because we were running some other experiments during that test so it didn't finish. But what we've seen looked good enough so that I'm fine with this PR from scalability POV.

@smarterclayton - do you want me to review the code too? (I can do that in the next 2 days.)

@smarterclayton (Contributor, Author)

Yes please. Jordan signed off on the KEP, and seeing the variance over time is the biggest factor. This is ready to review.

@wojtek-t (Member) left a comment

Just a couple nits - other than that lgtm.

}

// make a best effort to write the object if a failure is detected
utilruntime.HandleError(fmt.Errorf("apiserver was unable to write a JSON response: %v", err))
Member: How do we know that it was JSON?

Member: I see it's moved from a different place, but if I'm not missing something it would make sense to fix this comment if you're touching this code.

Contributor Author: If encode fails, the response is always JSON. The assumption being that encode will not fail (and thus exit early) above. If encode fails, we just spit out what we have.

Member: ok - that makes sense now

if len(encoding) == 0 {
return ""
}
if !utilfeature.DefaultFeatureGate.Enabled(features.APIResponseCompression) {
Member: We probably want to check it as the first thing in this method (to avoid unnecessary work otherwise).

Contributor Author: I had it this way because the feature gate check is probably more expensive than the map lookup. But it could be the other way around if you think that's easier to read (I think it would perform slightly worse).

Member: I don't have a strong opinion - we can change that later too if needed.
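For readers following along, the two orderings under discussion look roughly like this (a sketch only; the surrounding negotiation helper and its other checks are elided, and the final return is simplified to returning the requested encoding):

package example

import (
	"k8s.io/apiserver/pkg/features"
	utilfeature "k8s.io/apiserver/pkg/util/feature"
)

// Current ordering in the PR: the cheap header check short-circuits first.
func negotiateGzipHeaderFirst(encoding string) string {
	if len(encoding) == 0 {
		return ""
	}
	if !utilfeature.DefaultFeatureGate.Enabled(features.APIResponseCompression) {
		return ""
	}
	return encoding
}

// Suggested alternative: consult the feature gate before doing any other work.
func negotiateGzipGateFirst(encoding string) string {
	if !utilfeature.DefaultFeatureGate.Enabled(features.APIResponseCompression) {
		return ""
	}
	if len(encoding) == 0 {
		return ""
	}
	return encoding
}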

@@ -0,0 +1,303 @@
/*
Copyright 2016 The Kubernetes Authors.
Member: nit: 2019

req: &http.Request{Header: http.Header{}},
wantCode: http.StatusBadRequest,
wantHeaders: http.Header{"Content-Type": []string{"application/json"}},
wantBody: smallPayload,
Member: Why do we return an object for BadRequest?

Member: I guess I'm missing something, because you do this in a couple of other tests below too.

Contributor Author: In this case it's emulating a custom registry endpoint (aggregated API) that returns a valid object but with a bad request code, which is technically possible. I wanted a test that captured behavior that is infrequently used (an error code with a valid object) but which can show up in some cases (if you had a custom response type that implemented APIStatus, you could return an object plus an error code today).
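As a sketch of what such a response type could look like (the type below is hypothetical, but the APIStatus interface - Status() metav1.Status - in k8s.io/apimachinery/pkg/api/errors is real), an error that also carries a decodable Status object lets a handler return both a valid body and a 400:

package example

import (
	"net/http"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// badRequestWithStatus implements error and the APIStatus interface, so the
// apiserver can encode a valid Status object while still answering with an
// error status code.
type badRequestWithStatus struct {
	msg string
}

func (e *badRequestWithStatus) Error() string { return e.msg }

func (e *badRequestWithStatus) Status() metav1.Status {
	return metav1.Status{
		Status:  metav1.StatusFailure,
		Code:    http.StatusBadRequest,
		Reason:  metav1.StatusReasonBadRequest,
		Message: e.msg,
	}
}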

Member: Thanks for the explanation - that makes a lot of sense to me now.

statusCode: http.StatusOK,
wantCode: http.StatusNotFound,
wantHeaders: http.Header{"Content-Type": []string{"text/plain"}},
wantBody: []byte("NotFound: \"test\" not found"),
Member: Feel free to ignore, but it seems we have some additional whitespace somewhere in the stack...

Contributor Author: Yeah, I was actually testing the exact output. The two spaces separate an empty resource string (same message as before this PR).

},

{
name: "errors are compressed",
Member: nice test

@wojtek-t (Member) left a comment

There is only one nit (s/2014/2019) and I don't want to block this PR on it.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 9, 2019
@k8s-ci-robot k8s-ci-robot merged commit 7c7d70b into kubernetes:master Jul 9, 2019
@smarterclayton (Contributor, Author)

Encouraging:

[image omitted]

Significant reduction in p99 on cluster nodes LIST

Looks like CPU is variable, which is interesting:

[image omitted]

@smarterclayton (Contributor, Author)

Scratch that, it looks like there's overlap with the replica set / deployment change from yesterday, and the runs are newer.

@smarterclayton (Contributor, Author)

Even taking the RS/DS change into account, I'd say we're within bounds on p99 (maybe a bit more variability across runs). Will continue to monitor today and tomorrow, but looks like a small win in some flows and no huge impact elsewhere.

New: func() interface{} {
gw, err := gzip.NewWriterLevel(nil, defaultGzipContentEncodingLevel)
if err != nil {
panic(err)
@tedyu (Contributor) commented Jul 9, 2019:

Should an error be returned here?

In deferredResponseWriter#Write, the error can be returned to the caller.

See #79943
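For context on the panic question, here is a minimal sketch of the pool pattern in the quoted hunk (the value of the level constant is assumed): sync.Pool's New field has the signature func() interface{}, so there is no error return path, and the panic can only fire if the hard-coded compression level is itself invalid.

package example

import (
	"compress/gzip"
	"sync"
)

// Assumed value; the real constant lives in the apiserver code.
const defaultGzipContentEncodingLevel = gzip.DefaultCompression

var gzipWriterPool = &sync.Pool{
	New: func() interface{} {
		// NewWriterLevel only errors on an out-of-range level, which would be
		// a programmer error for a fixed constant, hence the panic.
		gw, err := gzip.NewWriterLevel(nil, defaultGzipContentEncodingLevel)
		if err != nil {
			panic(err)
		}
		return gw
	},
}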

@wojtek-t (Member)

> Even taking the RS/DS change into account, I'd say we're within bounds on p99 (maybe a bit more variability across runs). Will continue to monitor today and tomorrow, but looks like a small win in some flows and no huge impact elsewhere.

I think that on the scale of 100-nodes, the change is small enough that it's not easy to say what happened.

When looking at 5k-node scale, things like node listing regressed visibly (because those calls come from within the cluster, actually even via localhost, where network throughput and latency are not a problem):
http://perf-dash.k8s.io/#/?jobname=gce-5000Nodes&metriccategoryname=APIServer&metricname=LoadResponsiveness&Resource=nodes&Scope=cluster&Subresource=&Verb=LIST
but on deployments (which are listed externally from the test framework) a significant drop is visible:
http://perf-dash.k8s.io/#/?jobname=gce-5000Nodes&metriccategoryname=APIServer&metricname=LoadResponsiveness&Resource=deployments&Scope=cluster&Subresource=&Verb=LIST
We should also read that together with this graph, due to the change from RCs to Deployments:
http://perf-dash.k8s.io/#/?jobname=gce-5000Nodes&metriccategoryname=APIServer&metricname=LoadResponsiveness&Resource=replicationcontrollers&Scope=cluster&Subresource=&Verb=LIST

So I would say that it's a reasonable tradeoff to take.

@smarterclayton (Contributor, Author)

I'm wondering whether there is an additional heuristic we could add on the client side to suppress gzip encoding when making requests to localhost. It's kind of a weak heuristic though.

@smarterclayton (Contributor, Author)

We could also have a way to have clients bypass compression when run in certain modes (i.e. kcm and scheduler disabling it)

@wojtek-t (Member)

> We could also have a way to have clients bypass compression when run in certain modes (i.e. kcm and scheduler disabling it)

yeah - that is exactly what I was thinking about - add an option to client config to disable compression, default it to false (I mean enable compression by default) and disable it only in scheduler and kcm (which I believe would be enough).
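A minimal sketch of that client-side opt-out, using only the standard library (the eventual client-go config field and its name are not part of this PR, so they are omitted here): Go's transport only sends Accept-Encoding: gzip when DisableCompression is false, so an in-cluster component such as kcm or the scheduler could simply flip it.

package example

import "net/http"

// newUncompressedClient returns an HTTP client that never advertises gzip
// support, so the apiserver sends uncompressed responses regardless of size.
func newUncompressedClient() *http.Client {
	return &http.Client{
		Transport: &http.Transport{
			DisableCompression: true,
		},
	}
}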

@sftim (Contributor) commented Nov 25, 2021

Also see kubernetes/website#30639

Labels

approved - Indicates a PR has been approved by an approver from all required OWNERS files.
area/apiserver
cncf-cla: yes - Indicates the PR's author has signed the CNCF CLA.
kind/api-change - Categorizes issue or PR as related to adding, removing, or otherwise changing an API.
kind/bug - Categorizes issue or PR as related to a bug.
lgtm - "Looks good to me", indicates that a PR is ready to be merged.
priority/important-longterm - Important over the long term, but may not be staffed and/or may need multiple releases to complete.
release-note - Denotes a PR that will be considered when it comes time to generate release notes.
sig/api-machinery - Categorizes an issue or PR as relevant to SIG API Machinery.
sig/scalability - Categorizes an issue or PR as relevant to SIG Scalability.
size/XL - Denotes a PR that changes 500-999 lines, ignoring generated files.