Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automatically detect number of cpus from cgroup in Scylla Operator binaries #1612

Merged
merged 2 commits into from
Dec 4, 2023

Conversation

zimnx
Copy link
Collaborator

@zimnx zimnx commented Nov 30, 2023

Description of your changes:

Golang runtime doesn't understand cgroups hence it assumes all cpus are
available for running goroutines.
Since applications on Kubernetes are running under restricted resources
we should detect cgroups settings and align Go runtime to discovered settings.
This should lower resource consumption, and allow us to run on even more limited resources.

…naries

Golang runtime doesn't understand cgroups hence it assumes all cpus are
available for running goroutines.
Since applications on Kubernetes are running under restricted resources
we should detect cgroups settings and align Go runtime to discovered settings.
This should lower resource consumption, and allow us to run on even more limited resources.
@zimnx zimnx added kind/feature Categorizes issue or PR as related to a new feature. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Nov 30, 2023
@scylla-operator-bot scylla-operator-bot bot added area/dependency Issues or PRs related to dependency changes approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Nov 30, 2023
@rzetelskik
Copy link
Member

rzetelskik commented Nov 30, 2023

Did we actually hit any throttling-related issues in the past?

@tnozicka
Copy link
Member

tnozicka commented Dec 1, 2023

Did we actually hit any throttling-related issues in the past?

I don't recall hitting latency issues with the operator itself. As this is well know and we haven't seen an impact, so far we have been waiting for a golang to fix golang/go#33803. I suppose we could as well use the uber lib in the meantime although the impact may be blurry.

@zimnx
Copy link
Collaborator Author

zimnx commented Dec 1, 2023

Did we actually hit any throttling-related issues in the past?

No, we haven't, as we don't have any observability, nor any container running on low resources having a liveness/readiness probe so any slowness is not visible.

As part of other PR I will add such low resource container with probes. I hit multiple times issue with such container being started but not even printing starting log message. After I configured Go runtime properly I haven't noticed it again. Can't tell whether I was lucky or it indeed fixed what I saw.

This is a known issue, fixing it wouldn't hurt.

Copy link
Member

@tnozicka tnozicka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it helps you I am fine going ahead with it and reverting this when golang/go#33803 gets fixed

/approve
/lgtm

@scylla-operator-bot scylla-operator-bot bot added the lgtm Indicates that a PR is ready to be merged. label Dec 1, 2023
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: tnozicka, zimnx

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@rzetelskik
Copy link
Member

This is a known issue, fixing it wouldn't hurt.

Of course, I just wanted to know why do it now, since I remember we discussed in the past and never followed through. Thanks.

@zimnx
Copy link
Collaborator Author

zimnx commented Dec 4, 2023

Flake: #1028 (comment)
/test e2e-gke-parallel

@scylla-operator-bot scylla-operator-bot bot merged commit 4c26152 into scylladb:master Dec 4, 2023
11 checks passed
@zimnx zimnx deleted the mz/automaxprocs branch March 27, 2024 17:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/dependency Issues or PRs related to dependency changes kind/feature Categorizes issue or PR as related to a new feature. lgtm Indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants