HAMi installation fails #377

Open
9 tasks
bx0216 opened this issue Jul 8, 2024 · 0 comments

The template below is mostly useful for bug reports and support questions. Feel free to remove anything which doesn't apply to you and add more information where it makes sense.

1. Issue or feature description

  1. Set the proxy:
    export http_proxy=xxxxx:xxx
    export https_proxy=xxxxx:xxxx
  2. wget www.google.com shows that the connection through the proxy succeeds.
  3. Labeled the node that has the NVIDIA GPU with gpu=on.
  4. Ran helm install hami hami-charts/hami --set scheduler.kubeScheduler.imageTag=v1.16.8 -n kube-system
    It failed with the following output:
    install.go:222: [debug] Original chart version: ""
    install.go:239: [debug] CHART PATH: /home/vagrant/.cache/helm/repository/hami-2.0.0.tgz

client.go:486: [debug] Starting delete for "hami-admission" ServiceAccount
client.go:490: [debug] Ignoring delete failure for "hami-admission" /v1, Kind=ServiceAccount: serviceaccounts "hami-admission" not found
wait.go:104: [debug] beginning wait for 1 resources to be deleted with timeout of 5m0s
client.go:142: [debug] creating 1 resource(s)
client.go:486: [debug] Starting delete for "hami-admission" ClusterRole
wait.go:104: [debug] beginning wait for 1 resources to be deleted with timeout of 5m0s
client.go:142: [debug] creating 1 resource(s)
client.go:486: [debug] Starting delete for "hami-admission" ClusterRoleBinding
wait.go:104: [debug] beginning wait for 1 resources to be deleted with timeout of 5m0s
client.go:142: [debug] creating 1 resource(s)
client.go:486: [debug] Starting delete for "hami-admission" Role
client.go:490: [debug] Ignoring delete failure for "hami-admission" rbac.authorization.k8s.io/v1, Kind=Role: roles.rbac.authorization.k8s.io "hami-admission" not found
wait.go:104: [debug] beginning wait for 1 resources to be deleted with timeout of 5m0s
client.go:142: [debug] creating 1 resource(s)
client.go:486: [debug] Starting delete for "hami-admission" RoleBinding
client.go:490: [debug] Ignoring delete failure for "hami-admission" rbac.authorization.k8s.io/v1, Kind=RoleBinding: rolebindings.rbac.authorization.k8s.io "hami-admission" not found
wait.go:104: [debug] beginning wait for 1 resources to be deleted with timeout of 5m0s
client.go:142: [debug] creating 1 resource(s)
client.go:486: [debug] Starting delete for "hami-admission-create" Job
client.go:490: [debug] Ignoring delete failure for "hami-admission-create" batch/v1, Kind=Job: jobs.batch "hami-admission-create" not found
wait.go:104: [debug] beginning wait for 1 resources to be deleted with timeout of 5m0s
client.go:142: [debug] creating 1 resource(s)
client.go:712: [debug] Watching for changes to Job hami-admission-create with timeout of 5m0s
client.go:740: [debug] Add/Modify event for hami-admission-create: ADDED
client.go:779: [debug] hami-admission-create: Jobs active: 0, jobs failed: 0, jobs succeeded: 0
client.go:740: [debug] Add/Modify event for hami-admission-create: MODIFIED
client.go:779: [debug] hami-admission-create: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
client.go:740: [debug] Add/Modify event for hami-admission-create: MODIFIED
client.go:779: [debug] hami-admission-create: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
client.go:740: [debug] Add/Modify event for hami-admission-create: MODIFIED
client.go:779: [debug] hami-admission-create: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
client.go:740: [debug] Add/Modify event for hami-admission-create: MODIFIED
client.go:779: [debug] hami-admission-create: Jobs active: 0, jobs failed: 0, jobs succeeded: 0
client.go:740: [debug] Add/Modify event for hami-admission-create: MODIFIED
client.go:486: [debug] Starting delete for "hami-admission" ServiceAccount
wait.go:104: [debug] beginning wait for 1 resources to be deleted with timeout of 5m0s
client.go:486: [debug] Starting delete for "hami-admission" ClusterRole
wait.go:104: [debug] beginning wait for 1 resources to be deleted with timeout of 5m0s
client.go:486: [debug] Starting delete for "hami-admission" ClusterRoleBinding
wait.go:104: [debug] beginning wait for 1 resources to be deleted with timeout of 5m0s
client.go:486: [debug] Starting delete for "hami-admission" Role
wait.go:104: [debug] beginning wait for 1 resources to be deleted with timeout of 5m0s
client.go:486: [debug] Starting delete for "hami-admission" RoleBinding
wait.go:104: [debug] beginning wait for 1 resources to be deleted with timeout of 5m0s
client.go:486: [debug] Starting delete for "hami-admission-create" Job
wait.go:104: [debug] beginning wait for 1 resources to be deleted with timeout of 5m0s
client.go:142: [debug] creating 13 resource(s)
client.go:486: [debug] Starting delete for "hami-admission" ServiceAccount
client.go:490: [debug] Ignoring delete failure for "hami-admission" /v1, Kind=ServiceAccount: serviceaccounts "hami-admission" not found
wait.go:104: [debug] beginning wait for 1 resources to be deleted with timeout of 5m0s
client.go:142: [debug] creating 1 resource(s)
client.go:486: [debug] Starting delete for "hami-admission" ClusterRole
client.go:490: [debug] Ignoring delete failure for "hami-admission" rbac.authorization.k8s.io/v1, Kind=ClusterRole: clusterroles.rbac.authorization.k8s.io "hami-admission" not found
wait.go:104: [debug] beginning wait for 1 resources to be deleted with timeout of 5m0s
client.go:142: [debug] creating 1 resource(s)
client.go:486: [debug] Starting delete for "hami-admission" ClusterRoleBinding
client.go:490: [debug] Ignoring delete failure for "hami-admission" rbac.authorization.k8s.io/v1, Kind=ClusterRoleBinding: clusterrolebindings.rbac.authorization.k8s.io "hami-admission" not found
wait.go:104: [debug] beginning wait for 1 resources to be deleted with timeout of 5m0s
client.go:142: [debug] creating 1 resource(s)
client.go:486: [debug] Starting delete for "hami-admission" Role
client.go:490: [debug] Ignoring delete failure for "hami-admission" rbac.authorization.k8s.io/v1, Kind=Role: roles.rbac.authorization.k8s.io "hami-admission" not found
wait.go:104: [debug] beginning wait for 1 resources to be deleted with timeout of 5m0s
client.go:142: [debug] creating 1 resource(s)
client.go:486: [debug] Starting delete for "hami-admission" RoleBinding
client.go:490: [debug] Ignoring delete failure for "hami-admission" rbac.authorization.k8s.io/v1, Kind=RoleBinding: rolebindings.rbac.authorization.k8s.io "hami-admission" not found
wait.go:104: [debug] beginning wait for 1 resources to be deleted with timeout of 5m0s
client.go:142: [debug] creating 1 resource(s)
client.go:486: [debug] Starting delete for "hami-admission-patch" Job
client.go:490: [debug] Ignoring delete failure for "hami-admission-patch" batch/v1, Kind=Job: jobs.batch "hami-admission-patch" not found
wait.go:104: [debug] beginning wait for 1 resources to be deleted with timeout of 5m0s
client.go:142: [debug] creating 1 resource(s)
client.go:712: [debug] Watching for changes to Job hami-admission-patch with timeout of 5m0s
client.go:740: [debug] Add/Modify event for hami-admission-patch: ADDED
client.go:779: [debug] hami-admission-patch: Jobs active: 0, jobs failed: 0, jobs succeeded: 0
client.go:740: [debug] Add/Modify event for hami-admission-patch: MODIFIED
client.go:779: [debug] hami-admission-patch: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
Error: INSTALLATION FAILED: failed post-install: 1 error occurred:
* timed out waiting for the condition

helm.go:84: [debug] failed post-install: 1 error occurred:
* timed out waiting for the condition

INSTALLATION FAILED
main.newInstallCmd.func2
helm.sh/helm/v3/cmd/helm/install.go:158
github.com/spf13/cobra.(*Command).execute
github.com/spf13/cobra@v1.8.0/command.go:983
github.com/spf13/cobra.(*Command).ExecuteC
github.com/spf13/cobra@v1.8.0/command.go:1115
github.com/spf13/cobra.(*Command).Execute
github.com/spf13/cobra@v1.8.0/command.go:1039
main.main
helm.sh/helm/v3/cmd/helm/helm.go:83
runtime.main
runtime/proc.go:271
runtime.goexit
runtime/asm_amd64.s:1695
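
The log above ends with hami-admission-patch still reporting one active Job when the 5m0s post-install wait times out, so a possible next step is to look at that Job and its pod. The following is a minimal sketch, not a confirmed fix. The function names and the helm-debug.log file name are hypothetical, and it assumes kubectl can reach the same cluster and that the Job still exists in kube-system.

```shell
#!/bin/sh
# Print the name of the last hook Job that Helm was watching, given
# Helm debug output (in the format shown above) on stdin.
last_hook_job() {
    sed -n 's/.*Watching for changes to Job \([^ ]*\) with timeout.*/\1/p' | tail -n 1
}

# Show the state of a hook Job and its pods in the given namespace.
# Needs kubectl access to the cluster; it is only defined here, not called.
inspect_hook_job() {
    ns="$1"
    job="$2"
    # Job status and conditions
    kubectl -n "$ns" describe job "$job"
    # Pods created by the Job (the Job controller sets the job-name label)
    kubectl -n "$ns" get pods -l "job-name=$job" -o wide
    # Logs from the Job's pod, if a container has started
    kubectl -n "$ns" logs "job/$job" --all-containers
    # Recent events, e.g. image pull or scheduling problems
    kubectl -n "$ns" get events --sort-by=.lastTimestamp | tail -n 20
}

# Example, assuming the Helm output was saved to helm-debug.log:
#   inspect_hook_job kube-system "$(last_hook_job < helm-debug.log)"
```

If the pod never leaves a pending or image-pull state, the events listed by the last command are usually the place where the reason shows up.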

2. Steps to reproduce the issue

3. Information to attach (optional if deemed irrelevant)

Common error checking:

  • The output of nvidia-smi -a on your host
  • Your docker or containerd configuration file (e.g: /etc/docker/daemon.json)
  • The vgpu-device-plugin container logs
  • The vgpu-scheduler container logs
  • The kubelet logs on the node (e.g: sudo journalctl -r -u kubelet)

Additional information that might help better understand your environment and reproduce the bug:

  • Docker version from docker version
  • Docker command, image and tag used
  • Kernel version from uname -a
  • Any relevant kernel output lines from dmesg