
Measure scheduler throughput in density test #64266

Conversation

shyamjvs (Member)

This is a step towards exposing scheduler-related metrics on perf-dash.
This particular PR adds scheduler throughput computation and makes the results available in our test artifacts.
So if you do some experiments, you'll have some historical baseline data to compare against.

xref #63493

fyi - @wojtek-t @davidopp @bsalamat @misterikkit
cc @kubernetes/sig-scheduling-misc @kubernetes/sig-scalability-misc

Release note: NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. sig/scalability Categorizes an issue or PR as relevant to SIG Scalability. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 24, 2018
@shyamjvs shyamjvs requested a review from wojtek-t May 24, 2018 14:58
@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 24, 2018
framework.Logf(startupStatus.String("Density"))
// Compute scheduling throughput for the latest time period.
scheduleThroughput := float64(startupStatus.Scheduled - lastScheduledCount) / float64(period/time.Second)
*maxScheduleThroughput = math.Max(*maxScheduleThroughput, scheduleThroughput)
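For context, here is a self-contained sketch of how such a per-period measurement could be driven. It is an illustration only, not the actual density-test code: the StartupStatus type, the getStatus callback, and the stop channel are simplified stand-ins for the test's real helpers.

```go
package throughput

import (
	"math"
	"time"
)

// StartupStatus is a simplified stand-in for the density test's pod startup status.
type StartupStatus struct {
	Scheduled int // number of pods scheduled so far
}

// measureMaxSchedulingThroughput polls getStatus once per period, computes the
// scheduling throughput (pods/s) over the latest period, and returns the
// maximum value observed once stopCh is closed.
func measureMaxSchedulingThroughput(period time.Duration, getStatus func() StartupStatus, stopCh <-chan struct{}) float64 {
	var maxThroughput float64
	lastScheduledCount := 0
	ticker := time.NewTicker(period)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			status := getStatus()
			// Pods scheduled during the last period, divided by the period length in seconds.
			throughput := float64(status.Scheduled-lastScheduledCount) / period.Seconds()
			maxThroughput = math.Max(maxThroughput, throughput)
			lastScheduledCount = status.Scheduled
		case <-stopCh:
			return maxThroughput
		}
	}
}
```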
Member:

I'm wondering about this.
In large clusters, throughput drops significantly as the number of pods in the cluster grows. So this will effectively always be a number from one of the first couple of phases, and by itself it isn't that useful.

I think that minThroughput would be more interesting (though we would need to exclude some phases, e.g. the last one, where throughput is obviously expected to be smaller).

Member Author:

Good point about the high initial throughputs. Ideally, I'm not really interested in either the max or the min throughput (both can be skewed by extremes). What I'm most keen on is something like a "steady state throughput" (which, e.g. in our 2k-node test, stays at around ~80 pods/s for a long time), or maybe the "average throughput" as a proxy for it. WDYT?

Member:

What I'm trying to say is that, especially in 5k-node clusters, there isn't really a "steady state throughput" - it depends significantly on cluster saturation.

So when I think about it now, maybe what we should do is to gather two numbers:

  • average throughput from first few (3? 5?) steps
  • average throughput from last few (3? 5?) steps (maybe excluding the last one, which is kind of special)

That would give us some approximation of the throughput in an empty and in a full cluster.
I know it's imperfect, but I don't have any better idea for now.

Member Author:

Yeah, I agree with that reasoning. Thinking about it further, I propose something simpler instead: just record the throughputs seen over time in an array.
We can later try to compute some aggregates based on it, but for now that information seems enough. Wdyt?
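Sketched as a hypothetical helper (not the PR's actual code), recording every sample rather than only the maximum could look roughly like this; aggregates can then be computed from the returned slice afterwards:

```go
package throughput

import "time"

// recordThroughputSample appends the scheduling throughput (pods/s) observed
// over the last period to samples and returns the updated slice together with
// the new baseline scheduled count. Illustrative only.
func recordThroughputSample(samples []float64, scheduled, lastScheduled int, period time.Duration) ([]float64, int) {
	samples = append(samples, float64(scheduled-lastScheduled)/period.Seconds())
	return samples, scheduled
}
```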

Member:

Yeah - I think you should keep all values and at the end print some percentiles (say 5th and 95th)?
Does that make sense?

Member Author:

Yeah, that sgtm. So for now I've made the result an array. I'll observe how those values look for a run or so, and based on that decide what aggregates to compute and how. Sounds fine?

Member:

Yeah - that makes sense.

@shyamjvs shyamjvs force-pushed the measure-max-scheduler-throughput-metric branch from 82a3e2c to 01acf34 Compare May 24, 2018 15:43
    Binding LatencyMetric `json:"binding"`
    Total   LatencyMetric `json:"total"`

type SchedulingMetrics struct {
    SchedulingLatency LatencyMetric `json:"schedulingLatency"`
Contributor:

schedulingLatency -> scheduling_latency would be better as a JSON key, and similarly for the other tags.

Member:

I don't agree. Across the whole Kubernetes API we use camelCase for json tags - let's not come up with a different convention here.
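For illustration (a hypothetical struct, not code from this PR), Go's encoding/json emits exactly what the camelCase tags specify:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Example type using Kubernetes-style camelCase JSON tags.
type exampleMetrics struct {
	SchedulingLatencyMs float64 `json:"schedulingLatencyMs"`
	MaxThroughput       float64 `json:"maxThroughput"`
}

func main() {
	b, _ := json.Marshal(exampleMetrics{SchedulingLatencyMs: 12.5, MaxThroughput: 80})
	fmt.Println(string(b)) // {"schedulingLatencyMs":12.5,"maxThroughput":80}
}
```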

Contributor:

OK, makes sense.

@shyamjvs shyamjvs force-pushed the measure-max-scheduler-throughput-metric branch from 01acf34 to f363f54 Compare May 25, 2018 12:49
@k8s-ci-robot commented May 25, 2018

@shyamjvs: The following test failed, say /retest to rerun them all:

Test name: pull-kubernetes-kubemark-e2e-gce-big
Commit: f363f54
Rerun command: /test pull-kubernetes-kubemark-e2e-gce-big

Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@wojtek-t

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 25, 2018
@k8s-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: shyamjvs, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-github-robot

Automatic merge from submit-queue (batch tested with PRs 63232, 64257, 64183, 64266, 64134). If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot k8s-github-robot merged commit b8db949 into kubernetes:master May 25, 2018
@shyamjvs shyamjvs deleted the measure-max-scheduler-throughput-metric branch May 25, 2018 15:25
@shyamjvs

@wojtek-t Here's how the throughput values look over time in our 2k-node density test - https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-large-performance/101/artifacts/SchedulingMetrics_density_2018-05-29T14:09:29Z.json

And since they're pretty constant over time (except for the last element), my suggestion above to compute the average throughput seems reasonable.

@wojtek-t

I believed it would be constant in a 2k-node cluster, but I don't think it's constant in 5k-node clusters.
If it isn't, the average isn't a good statistic.
Do you have data from a 5k-node cluster?

@shyamjvs commented May 30, 2018

I was also looking for that data, but we don't have a run with that change yet.
That said, I'm fairly confident from our past 5k-node runs that this rate is super stable for almost the whole time. For example, see:

I0528 20:19:07.274] May 28 20:19:07.274: INFO: Density Pods: 1095 out of 150000 created, 236 running, 54 pending, 805 waiting, 0 inactive, 0 terminating, 0 unknown, 0 runningButNotReady 
I0528 20:19:17.275] May 28 20:19:17.274: INFO: Density Pods: 2095 out of 150000 created, 567 running, 56 pending, 1472 waiting, 0 inactive, 0 terminating, 0 unknown, 0 runningButNotReady 
I0528 20:19:27.276] May 28 20:19:27.275: INFO: Density Pods: 3082 out of 150000 created, 888 running, 63 pending, 2131 waiting, 0 inactive, 0 terminating, 0 unknown, 0 runningButNotReady 
I0528 20:19:37.276] May 28 20:19:37.276: INFO: Density Pods: 4080 out of 150000 created, 1208 running, 53 pending, 2819 waiting, 0 inactive, 0 terminating, 0 unknown, 0 runningButNotReady 
I0528 20:19:47.276] May 28 20:19:47.276: INFO: Density Pods: 5070 out of 150000 created, 1514 running, 60 pending, 3496 waiting, 0 inactive, 0 terminating, 0 unknown, 0 runningButNotReady 
I0528 20:19:57.278] May 28 20:19:57.277: INFO: Density Pods: 6063 out of 150000 created, 1848 running, 52 pending, 4163 waiting, 0 inactive, 0 terminating, 0 unknown, 0 runningButNotReady 
I0528 20:20:07.278] May 28 20:20:07.278: INFO: Density Pods: 7063 out of 150000 created, 2153 running, 60 pending, 4850 waiting, 0 inactive, 0 terminating, 0 unknown, 0 runningButNotReady 
I0528 20:20:17.280] May 28 20:20:17.280: INFO: Density Pods: 8048 out of 150000 created, 2475 running, 64 pending, 5509 waiting, 0 inactive, 0 terminating, 0 unknown, 0 runningButNotReady 
I0528 20:20:27.279] May 28 20:20:27.278: INFO: Density Pods: 9050 out of 150000 created, 2793 running, 51 pending, 6206 waiting, 0 inactive, 0 terminating, 0 unknown, 0 runningButNotReady 
I0528 20:20:37.281] May 28 20:20:37.281: INFO: Density Pods: 10040 out of 150000 created, 3116 running, 63 pending, 6861 waiting, 0 inactive, 0 terminating, 0 unknown, 0 runningButNotReady 
I0528 20:20:47.283] May 28 20:20:47.283: INFO: Density Pods: 11036 out of 150000 created, 3440 running, 66 pending, 7530 waiting, 0 inactive, 0 terminating, 0 unknown, 0 runningButNotReady 
I0528 20:20:57.286] May 28 20:20:57.285: INFO: Density Pods: 12032 out of 150000 created, 3766 running, 54 pending, 8212 waiting, 0 inactive, 0 terminating, 0 unknown, 0 runningButNotReady 
I0528 20:21:07.287] May 28 20:21:07.286: INFO: Density Pods: 13021 out of 150000 created, 4103 running, 44 pending, 8874 waiting, 0 inactive, 0 terminating, 0 unknown, 0 runningButNotReady 
I0528 20:21:17.284] May 28 20:21:17.283: INFO: Density Pods: 14020 out of 150000 created, 4419 running, 52 pending, 9549 waiting, 0 inactive, 0 terminating, 0 unknown, 0 runningButNotReady 
I0528 20:21:27.284] May 28 20:21:27.284: INFO: Density Pods: 15011 out of 150000 created, 4714 running, 59 pending, 10238 waiting, 0 inactive, 0 terminating, 0 unknown, 0 runningButNotReady 
I0528 20:21:37.282] May 28 20:21:37.282: INFO: Density Pods: 16005 out of 150000 created, 5029 running, 60 pending, 10916 waiting, 0 inactive, 0 terminating, 0 unknown, 0 runningButNotReady 
I0528 20:21:47.285] May 28 20:21:47.284: INFO: Density Pods: 17003 out of 150000 created, 5355 running, 56 pending, 11592 waiting, 0 inactive, 0 terminating, 0 unknown, 0 runningButNotReady 
I0528 20:21:57.287] May 28 20:21:57.287: INFO: Density Pods: 17991 out of 150000 created, 5667 running, 59 pending, 12265 waiting, 0 inactive, 0 terminating, 0 unknown, 0 runningButNotReady 
...
...
and after all pods are created:
...
I0528 21:29:17.432] May 28 21:29:17.432: INFO: Density Pods: 150000 out of 150000 created, 124352 running, 60 pending, 25588 waiting, 0 inactive, 0 terminating, 0 unknown, 0 runningButNotReady 
I0528 21:29:27.461] May 28 21:29:27.460: INFO: Density Pods: 150000 out of 150000 created, 124642 running, 50 pending, 25308 waiting, 0 inactive, 0 terminating, 0 unknown, 0 runningButNotReady 
I0528 21:29:37.476] May 28 21:29:37.476: INFO: Density Pods: 150000 out of 150000 created, 124922 running, 60 pending, 25018 waiting, 0 inactive, 0 terminating, 0 unknown, 0 runningButNotReady 
I0528 21:29:47.429] May 28 21:29:47.429: INFO: Density Pods: 150000 out of 150000 created, 125230 running, 62 pending, 24708 waiting, 0 inactive, 0 terminating, 0 unknown, 0 runningButNotReady 
I0528 21:29:57.484] May 28 21:29:57.483: INFO: Density Pods: 150000 out of 150000 created, 125516 running, 63 pending, 24421 waiting, 0 inactive, 0 terminating, 0 unknown, 0 runningButNotReady 
I0528 21:30:07.448] May 28 21:30:07.448: INFO: Density Pods: 150000 out of 150000 created, 125825 running, 56 pending, 24119 waiting, 0 inactive, 0 terminating, 0 unknown, 0 runningButNotReady 
I0528 21:30:17.488] May 28 21:30:17.488: INFO: Density Pods: 150000 out of 150000 created, 126116 running, 54 pending, 23830 waiting, 0 inactive, 0 terminating, 0 unknown, 0 runningButNotReady 
I0528 21:30:27.496] May 28 21:30:27.496: INFO: Density Pods: 150000 out of 150000 created, 126398 running, 59 pending, 23543 waiting, 0 inactive, 0 terminating, 0 unknown, 0 runningButNotReady 
I0528 21:30:37.482] May 28 21:30:37.481: INFO: Density Pods: 150000 out of 150000 created, 126693 running, 64 pending, 23243 waiting, 0 inactive, 0 terminating, 0 unknown, 0 runningButNotReady 

Note that in the initial part, while pods are still being created, each 10s interval adds roughly +1000 created pods and ~300 newly scheduled pods, so the waiting count grows by roughly +700 pods per interval - i.e. a scheduling throughput of about 30 pods/s.
Later, once all pods are created, only the ~300 pods per 10s scheduling part remains, and the waiting count shrinks at that rate.

Source - https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-scale-performance/159/build-log.txt
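To make that arithmetic concrete, here is a tiny illustrative program that derives the per-interval scheduling throughput from the "running" counts in the log excerpt above (the counts are copied from the log and are 10s apart):

```go
package main

import "fmt"

func main() {
	// "running" counts from the log excerpt above, sampled every 10 seconds.
	running := []int{
		236, 567, 888, 1208, 1514, 1848, 2153, 2475, 2793,
		3116, 3440, 3766, 4103, 4419, 4714, 5029, 5355, 5667,
	}
	const periodSeconds = 10.0

	for i := 1; i < len(running); i++ {
		throughput := float64(running[i]-running[i-1]) / periodSeconds
		fmt.Printf("interval %2d: %.1f pods/s\n", i, throughput)
	}
	// The values hover around 30-34 pods/s, i.e. the ~300 pods scheduled
	// per 10s mentioned above.
}
```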

@shyamjvs

That said, I agree that in general the average might not be a good aggregate (e.g. when using features like pod anti-affinity, which can introduce a significant dependence on the number of pods already in the system). Actually, should we be expecting such a dependence even now?

@wojtek-t

@shyamjvs - thanks for pointing to that data - it indeed looks pretty stable.
That said, I'm still not convinced the average is a good metric. Instead, I would prefer to do the same thing we do everywhere else - expose percentiles. What about exposing the 50th, 90th and 99th percentiles of throughput?

@shyamjvs commented Jun 1, 2018

As we discussed offline, I'm going to add percentiles for now (besides the average). However, as I was pointing out, it's still not clear that a percentile is the right metric for throughput, because here it's a quantity that changes over time and can depend on prior values (e.g. pods with anti-affinity). Latencies measured elsewhere (e.g. API calls) are usually independently distributed, so percentiles are a good fit for them.
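A minimal sketch of such an aggregation, assuming the samples are kept in a []float64; the 50th/90th/99th percentiles follow the suggestion above, and the nearest-rank method is just one possible choice (this is not the follow-up PR's actual code):

```go
package throughput

import (
	"math"
	"sort"
)

// ThroughputSummary holds simple aggregates over the recorded samples.
// Names here are illustrative, not the PR's actual ones.
type ThroughputSummary struct {
	Average float64
	Perc50  float64
	Perc90  float64
	Perc99  float64
}

// summarize computes the average and the 50th/90th/99th percentiles of the
// recorded scheduling-throughput samples (pods/s).
func summarize(samples []float64) ThroughputSummary {
	if len(samples) == 0 {
		return ThroughputSummary{}
	}
	sorted := append([]float64(nil), samples...)
	sort.Float64s(sorted)

	sum := 0.0
	for _, s := range sorted {
		sum += s
	}

	// Nearest-rank percentile over the sorted samples.
	perc := func(p float64) float64 {
		idx := int(math.Ceil(p/100*float64(len(sorted)))) - 1
		if idx < 0 {
			idx = 0
		}
		return sorted[idx]
	}

	return ThroughputSummary{
		Average: sum / float64(len(sorted)),
		Perc50:  perc(50),
		Perc90:  perc(90),
		Perc99:  perc(99),
	}
}
```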

k8s-github-robot pushed a commit that referenced this pull request Jun 13, 2018
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

Compute avg and quantiles of scheduler throughput in density test

Based on my comment here - #64266 (comment)

/sig scheduling
/kind cleanup
/priority important-soon
/milestone v1.11
/cc @wojtek-t 

```release-note
NONE
```