fix(metrics-collector): allow user to nuke ephemeral-storage requests #1312

Merged
merged 3 commits into from
Sep 17, 2020

Conversation

khersey
Contributor

@khersey khersey commented Aug 21, 2020

What this PR does / why we need it:

  • Allows users to nuke the ephemeral-storage request and limit by setting these values to a negative quantity in the katib-config ConfigMap.
  • This fixes an issue where GKE node pools will not scale up because they are incompatible with any pod that requests ephemeral-storage (see the linked issue for more details).
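
For readers skimming the thread, here is a minimal sketch of the behavior described above. It is not the code from this PR: `ResourceList`, `isNegative`, and `mergeResources` are hypothetical names, the default values are illustrative, and quantities are kept as plain strings instead of being parsed as Kubernetes quantities. The idea is that user-supplied values override defaults, and a negative ephemeral-storage value removes the entry altogether instead of being passed on to the pod spec.

```go
package main

import (
	"fmt"
	"strings"
)

// ephemeralStorage is the resource name this PR is concerned with.
const ephemeralStorage = "ephemeral-storage"

// ResourceList is a simplified, hypothetical stand-in for a Kubernetes
// resource list: resource name -> quantity string.
type ResourceList map[string]string

// isNegative reports whether a quantity string such as "-1" is negative.
// Assumption: a leading minus sign is enough for this sketch; real code
// would parse the value as a Kubernetes quantity first.
func isNegative(quantity string) bool {
	return strings.HasPrefix(strings.TrimSpace(quantity), "-")
}

// mergeResources copies the defaults, applies user overrides on top, and
// drops ephemeral-storage entirely when the resulting value is negative.
func mergeResources(defaults, user ResourceList) ResourceList {
	out := ResourceList{}
	for name, q := range defaults {
		out[name] = q
	}
	for name, q := range user {
		out[name] = q
	}
	if q, ok := out[ephemeralStorage]; ok && isNegative(q) {
		delete(out, ephemeralStorage)
	}
	return out
}

func main() {
	// Illustrative defaults, not necessarily the values Katib ships with.
	defaults := ResourceList{"cpu": "50m", "memory": "10Mi", ephemeralStorage: "500Mi"}
	user := ResourceList{ephemeralStorage: "-1"}

	merged := mergeResources(defaults, user)
	_, present := merged[ephemeralStorage]
	fmt.Println(len(merged), present) // prints "2 false"
}
```

With the entry removed, the metrics collector container no longer requests ephemeral-storage, which is the condition the linked issue identifies as blocking GKE node pool scale-up.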

Which issue(s) this PR fixes:
Fixes #1289

Special notes for your reviewer:

  1. First contribution and first time dabbling in Go so let me know if I am not conforming to best practices.
  2. Please let me know if there is a better way/place to solve this problem within the codebase.


@k8s-ci-robot

Hi @khersey. Thanks for your PR.

I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@khersey khersey force-pushed the gke-nodepool-scaling-fix branch 2 times, most recently from 8ba028e to 7061e43, on August 21, 2020 21:11
Member

@andreyvelich andreyvelich left a comment


Thank you for your contribution @khersey!
/ok-to-test

pkg/util/v1beta1/katibconfig/config.go (outdated review thread, resolved)
@andreyvelich
Member

/cc @gaocegege @johnugeorge

@gaocegege
Member

/retest

@andreyvelich
Member

@gaocegege The tests will not work until we fix this problem: kubeflow/testing#749 (comment).
We should submit a new project for test infra.
See this: kubeflow/community-infra#10

@andreyvelich
Member

@khersey Can you run gofmt, please?

@andreyvelich
Member

/retest

@andreyvelich
Member

I believe you also need to rebase to fix e2e tests.

@andreyvelich
Member

@khersey Thanks for the rebase.

Can you revert the gofmt changes to the api.pb.go files, please?
These files are generated automatically from the gRPC APIs, and we don't verify them in the script.

I will update the update-gofmt script to not format these files in a separate PR.

@khersey
Contributor Author

khersey commented Sep 16, 2020

@andreyvelich yeah sorry for the delay, will do. Looks like the frontend tests are failing, not sure that's related to my changes

@andreyvelich
Member

> @andreyvelich yeah sorry for the delay, will do. Looks like the frontend tests are failing, not sure that's related to my changes

I believe we need to modify the nvm version in Travis for the frontend tests.
I fixed it in this PR: #1340.

@khersey
Contributor Author

khersey commented Sep 16, 2020

@andreyvelich ok api.pb formatting changes have been nuked 💣

@andreyvelich
Member

/retest

Member

@andreyvelich andreyvelich left a comment


Thanks @khersey!
/lgtm
/approve

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andreyvelich

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Development

Successfully merging this pull request may close these issues.

Trial metrics-logger-and-collector breaks GKE Node-Pool Autoscaling
6 participants