Support argo Workflow CRD as new trial kind #1081
Comments
Issue-Label Bot is automatically applying the labels:
Please mark this comment with 👍 or 👎 to give our bot feedback! |
@terrykong It is possible; Katib is extensible. We have a doc for it, https://github.com/kubeflow/katib/blob/master/docs/new-trial-kind.md, although it may be outdated. The open question is whether there is an actual use case. |
@gaocegege Glad to hear it's possible. I don't know about others, but there are certainly convenient use cases covered by workflows. One example is optimizing a training -> benchmarking pipeline where the benchmarking step doesn't need GPUs, so the GPUs used by the training pod can be freed up while benchmarking runs. To my knowledge this is not possible with Jobs, and I'm guessing not with TFJobs or PyTorchJobs either. |
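To make the training -> benchmarking case concrete, here is a minimal sketch of such a workflow, written as a Python function that builds an Argo Workflow manifest as a plain dict. The image names, the `--learning-rate` flag, and the function name are hypothetical placeholders, not anything from Katib or from this thread. The sketch relies only on the fact that each container template in an Argo `steps` workflow runs in its own pod, so the GPU limit applies to the training pod alone.

```python
import json


def build_train_then_benchmark_workflow(learning_rate: float) -> dict:
    """Build an Argo Workflow manifest with a GPU training step followed by
    a CPU-only benchmarking step. All images and flags are placeholders."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        "metadata": {"generateName": "train-benchmark-"},
        "spec": {
            "entrypoint": "main",
            "templates": [
                {
                    "name": "main",
                    # Each inner list is one sequential stage; the second
                    # stage starts only after the first has finished.
                    "steps": [
                        [{"name": "train", "template": "train"}],
                        [{"name": "benchmark", "template": "benchmark"}],
                    ],
                },
                {
                    "name": "train",
                    "container": {
                        "image": "example.com/train:latest",  # hypothetical
                        "args": [f"--learning-rate={learning_rate}"],
                        # Only this pod requests a GPU.
                        "resources": {"limits": {"nvidia.com/gpu": 1}},
                    },
                },
                {
                    "name": "benchmark",
                    "container": {
                        # No GPU request: this pod can be scheduled on a
                        # CPU-only node once training has completed.
                        "image": "example.com/benchmark:latest",  # hypothetical
                    },
                },
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_train_then_benchmark_workflow(0.01), indent=2))
```

In a tuning setup, the value passed as `learning_rate` would be the hyperparameter substituted per trial; how that substitution would be wired into Katib is exactly what this issue leaves open.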
@gaocegege for us this is a use case as well. We would like to be able to tune parameters of arbitrary chains of Docker containers, e.g. executed as an Argo workflow. In such a workflow we could easily mix various languages to achieve our needs, instead of relying on a single container or a specific language. I will try to come up with an implementation, and try to commit back / propose a design if it works out. Any pointers on which parts of the doc are outdated? |
@nielsmeima Thanks for your interest! Now we have a new abstraction |
@gaocegege Thanks, I will give it a go. |
@nielsmeima We are also working on the new Trial Template implementation for the new version: #906 (comment). |
@andreyvelich Thanks for letting me know, that looks like a much better approach. just FYI: implementing the provider interface for Argo is very straightforward, however when using the The only short-term solution for this would be to create a fork of |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
/lifecycle frozen |
We should investigate this comment: argoproj/argo-workflows#4545 (comment) to try to support Argo workflows. |
/area release |
/kind feature
Describe the solution you'd like
For more complicated jobs where we may execute things in separate containers, I have noticed that the only job types we support in a trial are Jobs, TFJobs and PyTorchJobs. Would it be possible to also support Argo workflows?
Anything else you would like to add:
I see that https://github.com/kubeflow/katib/blob/master/docs/new-trial-kind.md is outdated, but it sounds like for a vanilla workflow it should be possible to inject a metrics collector sidecar into each workflow container, right?
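As a rough illustration of the sidecar idea, the sketch below patches a workflow manifest (a plain dict) so that every container template gets an extra entry in its `sidecars` list, which is a standard field on Argo templates. The function name, the sidecar name, and the image are hypothetical, and whether Katib could actually collect metrics from a workflow patched this way is an assumption this thread does not settle.

```python
import copy


def inject_metrics_sidecar(workflow: dict, sidecar: dict) -> dict:
    """Return a copy of `workflow` with `sidecar` added to every template
    that runs a container. Templates without a `container` key (for example
    `steps` or `dag` templates) are left untouched."""
    patched = copy.deepcopy(workflow)  # never mutate the caller's manifest
    for template in patched["spec"]["templates"]:
        if "container" not in template:
            continue
        existing = {s["name"] for s in template.get("sidecars", [])}
        # Skip templates that already carry a sidecar with this name, so
        # applying the patch twice does not duplicate it.
        if sidecar["name"] not in existing:
            template.setdefault("sidecars", []).append(copy.deepcopy(sidecar))
    return patched


if __name__ == "__main__":
    # Hypothetical sidecar definition; name and image are placeholders.
    collector = {
        "name": "metrics-collector",
        "image": "example.com/metrics-collector:latest",
    }
    workflow = {
        "spec": {
            "templates": [
                {"name": "main", "steps": [[{"name": "train", "template": "train"}]]},
                {"name": "train", "container": {"image": "example.com/train:latest"}},
            ]
        }
    }
    patched = inject_metrics_sidecar(workflow, collector)
    for template in patched["spec"]["templates"]:
        print(template["name"], [s["name"] for s in template.get("sidecars", [])])
```

Working on a deep copy keeps the original manifest reusable across trials, and the name check makes the patch safe to apply more than once.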