Update job name and service name as configurable for cert generator #1889

Merged
merged 4 commits into from
Jun 14, 2022

shaowei-su
Contributor

What this PR does / why we need it:
This PR adds two command-line arguments, serviceName and jobName, to the cert-generator job so that both names are configurable when the Katib controller is deployed to multiple environments, e.g. as katib-controller-production and katib-controller-staging.

By default, both values fall back to the hardcoded constants defined here: https://github.com/kubeflow/katib/blob/master/pkg/cert-generator/v1beta1/consts/const.go#L21

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #
Allows cert-generator usage like:

./katib-cert-generator generate --namespace={{ Namespace }} --jobName={{ JobName }} --serviceName={{ ServiceName }}
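
For illustration, the fallback behavior described above can be sketched roughly as follows. This is a minimal sketch, not the code from this PR: it assumes Go's standard flag package (the actual cert-generator may use a different CLI library), and the default values and the parseOptions helper are hypothetical placeholders standing in for the constants in const.go.

```go
package main

import (
	"flag"
	"fmt"
)

// Assumed placeholder defaults; the real values live in consts/const.go.
const (
	defaultJobName     = "katib-cert-generator"
	defaultServiceName = "katib-controller"
)

// options holds the values accepted by the hypothetical generate command.
type options struct {
	namespace   string
	jobName     string
	serviceName string
}

// parseOptions parses the arguments and falls back to the defaults
// for any name flag that the caller does not provide.
func parseOptions(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("generate", flag.ContinueOnError)
	fs.StringVar(&o.namespace, "namespace", "", "namespace to generate the certificate in")
	fs.StringVar(&o.jobName, "jobName", defaultJobName, "name of the cert-generator Job")
	fs.StringVar(&o.serviceName, "serviceName", defaultServiceName, "name of the controller Service")
	if err := fs.Parse(args); err != nil {
		return options{}, err
	}
	return o, nil
}

func main() {
	// Only serviceName is overridden here; jobName keeps its default.
	o, err := parseOptions([]string{"--namespace=kubeflow", "--serviceName=katib-controller-staging"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", o)
}
```

With this shape, existing deployments that pass only --namespace keep working unchanged, which is the backward-compatibility property the description relies on.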

Checklist:

  • Docs included if any changes are user facing

@shaowei-su
Contributor Author

Hi @andreyvelich, could you take a look at this PR? Thanks!

@coveralls

coveralls commented Jun 7, 2022

Coverage Status

Coverage decreased (-0.3%) to 73.697% when pulling 748fb80 on shaowei-su:shaowei--update-cert-gen into e2378c3 on kubeflow:master.

@johnugeorge
Member

/cc @tenzen-y

@google-oss-prow google-oss-prow bot requested a review from tenzen-y June 8, 2022 04:19
Member

@tenzen-y tenzen-y left a comment

Thanks for your contribution! @shaowei-su

I do not know of any use cases where we want to change only jobName or serviceName.
Should we change the implementation so that all names can have a prefix or suffix?

What do you think? @kubeflow/wg-automl-leads @shaowei-su

@johnugeorge
Member

Isn't this harmless anyways?

@tenzen-y
Member

tenzen-y commented Jun 8, 2022

Yes, this feature is harmless. It makes sense. @johnugeorge
I will consider prefix or suffix at another time.

@shaowei-su
If the specified Service resource does not exist in the specified namespace, katib-cert-generator cannot generate an appropriate certificate. So, can you add a step that verifies the Service name?

@google-oss-prow google-oss-prow bot added size/M and removed size/S labels Jun 8, 2022
@shaowei-su
Contributor Author

Good idea, @tenzen-y. The PR is updated with Service resource validation. PTAL!

Member

@tenzen-y tenzen-y left a comment

Thanks for these updates! @shaowei-su
/lgtm

@kubeflow/wg-automl-leads Can you start gh-actions?

@shaowei-su
Contributor Author

It seems that all the e2e tests failed due to MySQL connection errors:

mysqladmin: connect to server at 'localhost' failed
error: 'Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)'
Check that mysqld is running and that the socket: '/var/run/mysqld/mysqld.sock' exists!

cc + @johnugeorge

@johnugeorge
Member

@shaowei-su The test error is happening because katib-cert-generator is in an error state.

@johnugeorge
Member

johnugeorge commented Jun 9, 2022

@shaowei-su
Contributor Author

shaowei-su commented Jun 9, 2022

Thanks, @johnugeorge! Is there any chance we can view the error logs or reproduce the failure locally? Could it be the ordering of deployments, e.g. cert-generator being deployed before the controller Service is created, and thus failing the validation added in this PR?

@tenzen-y
Member

@shaowei-su
You can reproduce it locally in the following steps:

  1. Create KinD Cluster with kind create cluster.
  2. Build cert-generator image with docker build -t shaowei-su/cert-generator:test -f cmd/cert-generator/v1beta1/Dockerfile ..
  3. Upload your cert-generator image to KinD Cluster with kind load docker-image shaowei-su/cert-generator:test.
  4. Modify cert-generator manifest with yq eval -i '.images[3].newTag|="test",.images[3].newName|="shaowei-su/cert-generator"' manifests/v1beta1/installs/katib-standalone/kustomization.yaml.
  5. Deploy Katib with ./scripts/v1beta1/deploy.sh

@johnugeorge
Member

@shaowei-su Can you update this as soon as possible? We are nearing the release deadline.

@google-oss-prow google-oss-prow bot removed the lgtm label Jun 13, 2022
@shaowei-su
Contributor Author

Thanks @tenzen-y @johnugeorge for the detailed instructions!
It turned out that the RBAC permission for Service resources was missing; I added it in the latest commit. PTAL.

Member

@tenzen-y tenzen-y left a comment

Thanks for updating the PR! @shaowei-su
/lgtm

@kubeflow/wg-automl-leads Can you restart Go Test?

@google-oss-prow google-oss-prow bot added the lgtm label Jun 14, 2022
@tenzen-y
Member

@johnugeorge
I think that this PR is ready to merge.
Can you restart Go Test / Unit Test?

@shaowei-su
Contributor Author

The Go tests have been restarted a few times (https://github.com/kubeflow/katib/runs/6874004788?check_suite_focus=true), but they always fail on Kubernetes version 1.21.4 while succeeding on 1.22.1 and 1.23.5.

@tenzen-y
Member

We have flaky unit tests...

ref: #1649

@johnugeorge
Member

/approve

@google-oss-prow

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: johnugeorge, shaowei-su

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow google-oss-prow bot merged commit 170647d into kubeflow:master Jun 14, 2022