Katib example in docs is not working #1425
Comments
Hello, azarezade! Have you solved this problem? I have the same error. 😢 |
Hi @Gorosia, no success yet. Do you have Kubernetes on an on-premise cluster or on a single-node machine? I suspect the issue may be related to the connection between pods, since I have a two-node cluster and the pods that run my experiments are on a different node from the one where the katib-controller pod runs. |
@azarezade |
@azarezade I am trying to run the official Katib documentation example using Kubeflow deployed through MicroK8s, and I am getting this error:
|
@Josepholaidepetro I think you should try |
@azarezade That's what I did, I still got the error. |
I think you may need to open a new issue, unless you get the same results from a debugging command like |
@azarezade The experiment is created but nothing is running |
Thank you for creating this @azarezade. It seems that you are creating Experiment in |
Thanks @andreyvelich for the reply. I also tried to create the experiment with the name that I set when logging in to the Kubeflow dashboard for the first time, but it returned an error:
|
@azarezade |
Thanks @Gorosia. So I'll close this issue. |
/kind bug
What steps did you take and what happened:
I have a running Kubernetes cluster (two nodes, on-prem) and installed Kubeflow using the kfctl_k8s_istio config. Following Getting Started with Katib, I created a TensorFlow example and went through all 3 steps. This is my tfjob-example.yaml file:
What did you expect to happen:
I expected to see the graphs and results of the experiments in Katib, but all experiments remained in the Running status, although the logs of the experiment containers show that they are Completed.
Anything else you would like to add:
It seems that observation_logs is empty:
But I don't know why it happened or how to trace it. Everything else seems to be all right.
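One possible way to narrow this down is to look at the Trial resources directly and list the ones that never reached a final condition and never recorded a metric. The following is only a rough sketch, not a Katib tool: it assumes the JSON shape of a Trial list as printed by `kubectl get trials -o json`, with `status.conditions` and `status.observation.metrics` fields. Those field names, and the helper names `summarize_trial` and `find_stuck_trials`, are assumptions that may not match every Katib version.

```python
def summarize_trial(trial):
    """Return (name, latest_condition_type, has_metrics) for one Trial dict.

    Assumes the Trial stores its conditions in status.conditions (oldest
    first) and collected metrics in status.observation.metrics.
    """
    status = trial.get("status") or {}
    conditions = status.get("conditions") or []
    latest = conditions[-1].get("type", "Unknown") if conditions else "Unknown"
    observation = status.get("observation") or {}
    metrics = observation.get("metrics") or []
    return trial["metadata"]["name"], latest, bool(metrics)


def find_stuck_trials(trial_list):
    """Names of trials that have no final condition and no observed metrics."""
    stuck = []
    for trial in trial_list.get("items", []):
        name, latest, has_metrics = summarize_trial(trial)
        if latest not in ("Succeeded", "Failed") and not has_metrics:
            stuck.append(name)
    return stuck
```

To use it, the output of `kubectl get trials -n <namespace> -o json` could be loaded with `json.load` and passed to `find_stuck_trials`. A trial whose worker pod is Completed but which still shows up in this list would suggest that the metrics never reached Katib, rather than that the training itself failed.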
Some other logs and debugging that I tried:
Environment:
kfctl v1.2.0-0-gbc038f9
Ubuntu 20.04.1 LTS