Improve CPU performance measurement precision #132

Open
wants to merge 4 commits into
base: main

@mergenci mergenci commented Sep 4, 2023

Description of your changes

This PR addresses the issues raised in [1] and in the comments that follow. Results are reported in [2].

  1. Switch from node_exporter's per-node CPU metrics to cadvisor's per-container metrics.
  2. Query CPU metrics only over the experiment's duration.
  3. Very short experiments may not yield enough measurements for performance to be reported. Handle such cases by also querying outside the experiment's duration.
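
Items 2 and 3 above can be sketched as a small helper that picks the query range. This is only an illustration, not the code in this PR: `QueryWindow`, `windowFor`, and the `minSamples` parameter are hypothetical names, and widening the window backwards from the experiment's end is an assumption about how querying outside the experiment duration might be done.

```go
package main

import (
	"fmt"
	"time"
)

// QueryWindow is a hypothetical type describing the range sent to Prometheus.
type QueryWindow struct {
	Start, End time.Time
	Step       time.Duration
}

// windowFor returns the range to query for an experiment that ran from start
// to end. Normally the window is exactly the experiment's duration. If the
// experiment is shorter than minSamples*step, the start is moved earlier so
// the window can hold at least minSamples measurements (an assumed policy).
func windowFor(start, end time.Time, step time.Duration, minSamples int) QueryWindow {
	need := time.Duration(minSamples) * step
	if end.Sub(start) < need {
		start = end.Add(-need)
	}
	return QueryWindow{Start: start, End: end, Step: step}
}

func main() {
	end := time.Date(2023, 9, 4, 12, 0, 0, 0, time.UTC)
	start := end.Add(-2 * time.Second) // a very short experiment

	w := windowFor(start, end, time.Second, 5)
	fmt.Println("queried duration:", w.End.Sub(w.Start))
}
```

Widening the window trades isolation for robustness: measurements from outside the experiment dilute the result, so a helper like this would only widen when the experiment is too short to report anything at all.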

How has this code been tested

I ran the following command against my cluster, which has the necessary software (UXP, provider-aws, Prometheus) installed.

cd cmd/perf
go run main.go --mrs manifests/ecr.yaml=5 --provider-pods "$(kubectl get pods -o name | grep provider-aws | cut -d/ -f2)" --provider-namespace upbound-system --node "$(kubectl get nodes -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}'):9100" --step-duration 1s

Signed-off-by: Cem Mergenci <cmergenci@gmail.com>
Until now, we used node_exporter's metrics, which are reported per
node. We switch to cadvisor's per-container metrics, which measure
provider performance in isolation.
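
To make the difference concrete, here is a rough sketch of what the two kinds of PromQL queries could look like. The metric names `node_cpu_seconds_total` (node_exporter) and `container_cpu_usage_seconds_total` (cadvisor) are the standard ones, but the helper functions, the label selectors, and the rate window are assumptions for illustration and are not taken from this PR's code.

```go
package main

import "fmt"

// nodeCPUQuery builds a per-node query: the share of non-idle CPU time
// across all cores of one node_exporter instance, as a percentage.
// Everything running on the node contributes to this number.
func nodeCPUQuery(instance, rateWindow string) string {
	return fmt.Sprintf(
		`100 * (1 - avg(rate(node_cpu_seconds_total{mode="idle", instance=%q}[%s])))`,
		instance, rateWindow)
}

// containerCPUQuery builds a per-container query: CPU-seconds consumed per
// second by the containers of a single pod, which isolates the provider.
func containerCPUQuery(namespace, pod, rateWindow string) string {
	return fmt.Sprintf(
		`sum(rate(container_cpu_usage_seconds_total{namespace=%q, pod=%q}[%s]))`,
		namespace, pod, rateWindow)
}

func main() {
	// Example values only; the pod name and address are placeholders.
	fmt.Println(nodeCPUQuery("10.0.0.1:9100", "1m"))
	fmt.Println(containerCPUQuery("upbound-system", "provider-aws-0", "1m"))
}
```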

Until now, we reported CPU utilization as a percentage of total CPU
resources: 100% CPU utilization on an 8-CPU machine meant that all
cores were fully utilized. We switch to reporting CPU utilization as a
percentage of one CPU, so full utilization of an 8-CPU machine is now
reported as 800%.
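
The relationship between the two conventions is a single multiplication. The sketch below is illustrative only; `percentOfOneCPU` and `coresToPercent` are hypothetical helper names and do not appear in this PR.

```go
package main

import "fmt"

// percentOfOneCPU converts utilization expressed as a percentage of all CPUs
// on the machine (old convention) into a percentage of one CPU (new one).
func percentOfOneCPU(percentOfTotal float64, numCPU int) float64 {
	return percentOfTotal * float64(numCPU)
}

// coresToPercent converts a usage rate in CPU-seconds per second, i.e. the
// number of cores kept busy, into a percentage of one CPU.
func coresToPercent(cores float64) float64 {
	return cores * 100
}

func main() {
	// Full utilization of an 8-CPU machine under both conventions.
	fmt.Printf("old: %.0f%%  new: %.0f%%\n", 100.0, percentOfOneCPU(100, 8))
	// A container keeping one and a half cores busy.
	fmt.Printf("1.5 cores = %.0f%% of one CPU\n", coresToPercent(1.5))
}
```

A practical benefit of the per-CPU convention is that results from machines with different core counts can be compared directly.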

Signed-off-by: Cem Mergenci <cmergenci@gmail.com>