Measure scheduler throughput in density test #64266
Conversation
test/e2e/scalability/density.go (outdated diff)
framework.Logf(startupStatus.String("Density"))
// Compute scheduling throughput for the latest time period.
scheduleThroughput := float32(startupStatus.Scheduled - lastScheduledCount) / float32(period/time.Second)
*maxScheduleThroughput = math.Max(*maxScheduleThroughput, scheduleThroughput)
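For context, here is a minimal, self-contained sketch (not the actual density-test code; the counter values are made up) of how a per-period throughput sample can be derived from successive scheduled-pod counts, which is what the line under discussion computes before taking the max:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	period := 10 * time.Second
	lastScheduledCount := 0
	// Hypothetical snapshots of the "pods scheduled so far" counter, one per period.
	observations := []int{310, 615, 905, 1190}

	for _, scheduled := range observations {
		// Pods scheduled during this period, divided by the period length in seconds.
		throughput := float32(scheduled-lastScheduledCount) / float32(period/time.Second)
		lastScheduledCount = scheduled
		fmt.Printf("scheduling throughput: %.1f pods/s\n", throughput)
	}
}
```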
I'm wondering about this.
In large clusters, throughput drops significantly as the number of pods in the cluster grows. So this will effectively always be a number from one of the first couple of phases, and by itself it isn't that useful.
I think minThroughput would be more interesting (though we would need to exclude some phases, e.g. the last one, where throughput may obviously be smaller).
You have a good point there about high initial throughputs. Ideally, it's neither the max throughput nor the min throughput that I'm really interested in (as both may be subject to extremes). What I'm most keen on is something like a "steady state throughput" (which we see, e.g., in our 2k-node test staying at around ~80 pods/s for a long time), or maybe the "average throughput" as a good proxy for it. WDYT?
What I'm trying to say is that, especially in 5k-node clusters, there isn't really a "steady state throughput" - it depends significantly on cluster saturation.
So when I think about it now, maybe what we should do is gather two numbers:
- average throughput from the first few (3? 5?) steps
- average throughput from the last few (3? 5?) steps (maybe excluding the last one, which is kind of special)
That would give us some approximation of throughput in an empty and in a full cluster.
I know it's imperfect, but I don't have a better idea for now.
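For illustration only, a rough sketch of this first/last-few-steps idea (the sample values and the averaging helper are made up, and this is not what was ultimately merged):

```go
package main

import "fmt"

// avg returns the arithmetic mean of xs (assumes len(xs) > 0).
func avg(xs []float64) float64 {
	sum := 0.0
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

func main() {
	// Hypothetical per-period throughput samples (pods/s).
	samples := []float64{95, 90, 88, 84, 80, 76, 73, 70, 30}
	n := 3
	trimmed := samples[:len(samples)-1] // exclude the last, "special" step
	fmt.Printf("approx. empty-cluster throughput: %.1f pods/s\n", avg(trimmed[:n]))
	fmt.Printf("approx. full-cluster throughput:  %.1f pods/s\n", avg(trimmed[len(trimmed)-n:]))
}
```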
Yeah, I agree with that reasoning. Thinking about it further, I propose something simpler instead - just record the throughputs seen over time in an array.
We can later try to compute some aggregates based on it, but for now that information seems enough. Wdyt?
Yeah - I think you should keep all values and at the end print some percentiles (say 5th and 95th)?
Does that make sense?
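A minimal sketch of that aggregation, assuming the samples have already been recorded in a slice (the values and the nearest-rank percentile helper are illustrative, not the PR's code):

```go
package main

import (
	"fmt"
	"sort"
)

// percentile returns the p-th percentile (0-100) of sorted, using the nearest-rank method.
func percentile(sorted []float64, p float64) float64 {
	if len(sorted) == 0 {
		return 0
	}
	idx := int(p/100*float64(len(sorted))+0.5) - 1
	if idx < 0 {
		idx = 0
	}
	if idx >= len(sorted) {
		idx = len(sorted) - 1
	}
	return sorted[idx]
}

func main() {
	// Hypothetical per-period scheduling throughputs (pods/s) recorded during the test.
	samples := []float64{92, 85, 81, 80, 79, 80, 78, 77, 35}
	sorted := append([]float64(nil), samples...)
	sort.Float64s(sorted)

	sum := 0.0
	for _, s := range samples {
		sum += s
	}
	fmt.Printf("avg=%.1f p05=%.1f p95=%.1f\n",
		sum/float64(len(samples)), percentile(sorted, 5), percentile(sorted, 95))
}
```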
Yeah, that sgtm. So for now, I made the result an array. I'll observe how those values look for a run or so and decide based on that what aggregates we want to compute and how. Sounds fine?
Yeah - that makes sense.
Force-pushed from 82a3e2c to 01acf34 (Compare)
Binding LatencyMetric `json:"binding"`
Total LatencyMetric `json:"total"`
type SchedulingMetrics struct {
SchedulingLatency LatencyMetric `json:"schedulingLatency"`
schedulingLatency -> scheduling_latency would be better in JSON, and likewise for the others.
I don't agree. Throughout the Kubernetes API we use camelCase for JSON tags - let's not come up with a different convention here.
OK, makes sense.
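To make the convention concrete, here is a tiny illustration with simplified stand-in types (not the actual test/e2e structs) showing how camelCase JSON tags serialize:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Simplified stand-ins for the e2e metric types; the tags follow the
// camelCase convention used across the Kubernetes API.
type latencyMetric struct {
	Perc50 string `json:"perc50"`
	Perc99 string `json:"perc99"`
}

type schedulingMetrics struct {
	SchedulingLatency latencyMetric `json:"schedulingLatency"`
	BindingLatency    latencyMetric `json:"bindingLatency"`
}

func main() {
	m := schedulingMetrics{
		SchedulingLatency: latencyMetric{Perc50: "12ms", Perc99: "85ms"},
		BindingLatency:    latencyMetric{Perc50: "3ms", Perc99: "20ms"},
	}
	out, _ := json.MarshalIndent(m, "", "  ")
	// Keys come out as "schedulingLatency", "bindingLatency", "perc50", ...
	fmt.Println(string(out))
}
```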
Force-pushed from 01acf34 to f363f54 (Compare)
@shyamjvs: The following test failed, say /retest to rerun it:
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: shyamjvs, wojtek-t
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Automatic merge from submit-queue (batch tested with PRs 63232, 64257, 64183, 64266, 64134). If you want to cherry-pick this change to another branch, please follow the instructions here.
@wojtek-t So here is how the throughputs look in our 2k-node density test - https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-large-performance/101/artifacts/SchedulingMetrics_density_2018-05-29T14:09:29Z.json And since they're pretty constant over time (except the last element), my suggestion above to compute the average throughput seems like a reasonable one.
I believed it would be constant in a 2k-node cluster. But I think it's not constant in 5k-node clusters.
I was also looking for that data, but we don't have a run with that change yet.
Note that for the initial part, while we're still creating pods, every 10s it's around +1000 pods (created) and -300 pods (scheduled), so +700 waiting pods.
That said, I agree that in general the average might not be a good aggregate (e.g. when using features like pod anti-affinity, which might introduce a significant dependence on the number of pods already in the system). Actually, should we be expecting such a dependence even now?
@shyamjvs - thanks for pointing to that data - it indeed looks pretty stable.
As we discussed offline, I'm going to add percentiles for now (besides the average). However, as I was pointing out, it's still not clear that a percentile is the right metric for throughput, because here it's a variable that can change over time and vary as a function of prior values (e.g. pods with anti-affinity). Latencies (as measured elsewhere, e.g. for API calls) are usually independently distributed, so percentiles are a good choice for them.
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

Compute avg and quantiles of scheduler throughput in density test

Based on my comment here - #64266 (comment)

/sig scheduling
/kind cleanup
/priority important-soon
/milestone v1.11
/cc @wojtek-t

```release-note
NONE
```
This is a step towards exposing scheduler-related metrics on perf-dash.
This particular PR adds scheduler throughput computation and makes the results available in our test artifacts.
So if you do some experiments, you'll have some historical baseline data to compare against.
xref #63493
fyi - @wojtek-t @davidopp @bsalamat @misterikkit
cc @kubernetes/sig-scheduling-misc @kubernetes/sig-scalability-misc