[BUG] PyTorchJob create pod failure due to pod webhook #1863

Closed
zmberg opened this issue Dec 18, 2024 · 0 comments

zmberg commented Dec 18, 2024

What happened:

  1. Create a PyTorchJob:
apiVersion: kubeflow.org/v1
kind: PyTorchJob
metadata:
  name: pytorch-tcp-dist-mnist
spec:
  pytorchReplicaSpecs:
    Master:
      replicas: 1
      restartPolicy: OnFailure
      template:
        metadata:
          labels:
            app: tfjob
        spec:
          terminationGracePeriodSeconds: 0
          containers:
            - command: [ "/bin/sleep", "infinity" ]
              image: busybox:latest
              name: pytorch
  2. Describe the PyTorchJob and check its events:
  Warning  FailedCreatePod  108s (x17 over 7m15s)  pytorchjob-controller  Error creating: admission webhook "mpod.kb.io" denied the request: Internal error occurred: the spec replicas field ".spec.pytorchReplicaSpecs.Worker.replicas" does not exist
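
The event suggests the webhook tried to resolve a replicas field path for a Worker replica spec, while the job above defines only Master. A minimal sketch of that kind of failure follows. It is a hypothetical illustration, not Kruise's webhook code: the function `lookup_replicas` and the helper `try_lookup` are made-up names, and only the two field paths and the error wording come from this report.

```python
# Hypothetical illustration only; this is NOT Kruise's webhook implementation.
# It shows how resolving a fixed dotted field path fails when the job
# defines only a Master replica spec.


def lookup_replicas(obj: dict, path: str) -> int:
    """Walk a dotted field path such as '.spec.a.b' through nested dicts."""
    current = obj
    for key in path.strip(".").split("."):
        if not isinstance(current, dict) or key not in current:
            # Error wording copied from the event in this report.
            raise KeyError(f'the spec replicas field "{path}" does not exist')
        current = current[key]
    return int(current)


# The PyTorchJob from step 1, reduced to the fields that matter here.
job = {
    "spec": {
        "pytorchReplicaSpecs": {
            "Master": {"replicas": 1},
        }
    }
}

MASTER_PATH = ".spec.pytorchReplicaSpecs.Master.replicas"
WORKER_PATH = ".spec.pytorchReplicaSpecs.Worker.replicas"


def try_lookup(path: str):
    """Return the replica count, or the error message if the path is missing."""
    try:
        return lookup_replicas(job, path)
    except KeyError as err:
        return err.args[0]


if __name__ == "__main__":
    print(try_lookup(MASTER_PATH))
    print(try_lookup(WORKER_PATH))
```

In this sketch the Master path resolves, while the Worker path raises because the key is absent. That matches the shape of the denial message above, though the actual lookup logic in the webhook may differ.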

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • Kruise version: v1.6.3
  • Kubernetes version (use kubectl version):
  • Install details (e.g. helm install args):
  • Others:
@zmberg zmberg added the kind/bug Something isn't working label Dec 18, 2024
@zmberg zmberg assigned FillZpp and zmberg and unassigned FillZpp Dec 18, 2024
@zmberg zmberg added this to the 1.8 milestone Dec 18, 2024
@zmberg zmberg changed the title [BUG] PyTorchJob create pod failed with kruise webhook [BUG] PyTorchJob create pod failure due to pod webhook Dec 18, 2024
@zmberg zmberg closed this as completed Dec 20, 2024