-
Notifications
You must be signed in to change notification settings - Fork 443
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CPU example for NAS RL cifar10 training container #999
CPU example for NAS RL cifar10 training container #999
Conversation
Send trial info from CRD to GRPC
@@ -67,7 +67,7 @@ spec: | |||
- "RunTrial.py" | |||
{{- with .HyperParameters}} | |||
{{- range .}} | |||
- "--{{.Name}}={{.Value}}" | |||
- "--{{.Name}}=\"{{.Value}}\"" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this required?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here https://github.com/andreyvelich/katib/blob/cpu-example-rl-cifar10/examples/v1alpha3/NAS-training-containers/RL-cifar10/RunTrial.py#L13 I parse architecture
and nn_config
as a string. The format of architecture something like this [[15], [86, 1], [7, 1, 0], [57, 1, 0, 0], [3, 0, 1, 1, 0], [42, 1, 1, 1, 1, 0], [17, 0, 0, 0, 1, 0, 0], [49, 0, 1, 1, 0, 0, 0, 1]]
and we need to send it as string. If we pass it without double quotes it fails in parsing.
t []trialsv1alpha3.Trial) []*suggestionapi.Trial { | ||
res := make([]*suggestionapi.Trial, 0) | ||
return res | ||
func (g *General) ConvertTrials(ts []trialsv1alpha3.Trial) []*suggestionapi.Trial { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strange. Without trial information being sent to suggestion algorithm services, how did it work for other algorithms?
@gaocegege
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have unit test with trial info. But if the trial info is missing, the algorithm also works. I think it is the reason why we did not find the problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
trials = Trial.convert(request.trials) |
We are using trials from the request. How can bayesianoptimization like algo work without the previous trial information?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bayesianoptimization can work since the prior knowledge is always nil. skopt can handle it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is strange why we are trying to convert it from request.trials
. I faced with the problem when request.trials
is nil, so it is nothing to convert.
/retest |
1 similar comment
/retest |
grid search has consistently failed. This has to be looked at |
The hyperparameter is created, but the trials is not finished. |
We should try with increased timeout |
/retest |
I think I found the problem. Here https://chocolate.readthedocs.io/api/sample.html#chocolate.Grid is says that search space to explore can be only discrete dimensions, but we have in our example https://github.com/kubeflow/katib/blob/master/examples/v1alpha3/grid-example.yaml#L33 categorical type also. |
Hold for now |
Check if trial is SUCCEEDED in grid
Tests passed. /hold cancel |
/lgtm |
@andreyvelich Thanks for the fix. It is my fault. I meet the problem before, but forget that we do not run the unit test for grid search. https://github.com/kubeflow/katib/blob/master/test/suggestion/v1alpha3/test_chocolate_service.py.failed |
@gaocegege but Is this a acceptable restriction for not allowing to having categorical parameters for grid algorithm? |
@johnugeorge It is not. I think we should investigate if we could solve it. |
Yeah. I think so. This restriction looks unacceptable to me as this is one of the common algorithms. Should we merge it anyways or wait ? |
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: johnugeorge The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
I made few improvements in this PR:
ConvertTrials
function return empty Trial set).Because of that, I added
ConvertTrials
function to Suggestion clientgetSuggestions
call.Please, take a look.
/cc @gaocegege @johnugeorge @hougangliu
This change is