Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regression on Optional GCS Artifact #6576

Closed
AntoineDao opened this issue Aug 20, 2021 · 0 comments · Fixed by #6577
Closed

Regression on Optional GCS Artifact #6576

AntoineDao opened this issue Aug 20, 2021 · 0 comments · Fixed by #6577
Labels

Comments

@AntoineDao
Copy link
Contributor

AntoineDao commented Aug 20, 2021

Summary

What happened/what you expected to happen?

Optional GCS artifacts had been fixed and tested manually in #6393. It seems that some changes in #6302 has caused a regression on this behaviour.

This bug is reproducible by running the following two workflows:

Failing Workflow

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  namespace: argo
  generateName: gcs-test-
  labels:
    workflows.argoproj.io/container-runtime-executor: emissary
    disposable: 'true'
spec:
  entrypoint: gcs-test
  templates:
    - name: gcs-test
      inputs:
        artifacts:
          - name: my-art
            path: /my-artifact
            optional: true
            gcs:
              bucket: pollination-public
              key: blobs/argo-test/not-exists.txt
              serviceAccountKeySecret:
                name: gcs-creds
                key: serviceAccountKey
      container:
        image: ubuntu:latest
        command: [sh, -c]
        args: ["echo 'no artifact :('"]

Working Workflow

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  namespace: argo
  generateName: gcs-test-
  labels:
    workflows.argoproj.io/container-runtime-executor: emissary
    disposable: 'true'
spec:
  entrypoint: gcs-test
  templates:
    - name: gcs-test
      inputs:
        artifacts:
          - name: my-art
            path: /my-artifact
            optional: true
            http:
              url: ''
      container:
        image: ubuntu:latest
        command: [sh, -c]
        args: ["echo 'no artifact :('"]

Solution/Cause

I'm pretty sure the issue is caused by a misunderstanding on the retry API and the bool value to provide to it. The isTransientGCSErr function works correctly and the unit tests make sense, however the Load and Save functions should return !isTransientGCSErr(err), err instead of isTransientGCSErr(err), err.

// Load function downloads objects from GCS
func (g *ArtifactDriver) Load(inputArtifact *wfv1.Artifact, path string) error {
err := waitutil.Backoff(defaultRetry,
func() (bool, error) {
log.Infof("GCS Load path: %s, key: %s", path, inputArtifact.GCS.Key)
gcsClient, err := g.newGCSClient()
if err != nil {
log.Warnf("Failed to create new GCS client: %v", err)
return isTransientGCSErr(err), err
}
defer gcsClient.Close()
err = downloadObjects(gcsClient, inputArtifact.GCS.Bucket, inputArtifact.GCS.Key, path)
if err != nil {
log.Warnf("Failed to download objects from GCS: %v", err)
return isTransientGCSErr(err), err
}
return true, nil
})
return err
}

// Save an artifact to GCS compliant storage, e.g., uploading a local file to GCS bucket
func (g *ArtifactDriver) Save(path string, outputArtifact *wfv1.Artifact) error {
err := waitutil.Backoff(defaultRetry,
func() (bool, error) {
log.Infof("GCS Save path: %s, key: %s", path, outputArtifact.GCS.Key)
client, err := g.newGCSClient()
if err != nil {
return isTransientGCSErr(err), err
}
defer client.Close()
err = uploadObjects(client, outputArtifact.GCS.Bucket, outputArtifact.GCS.Key, path)
if err != nil {
return isTransientGCSErr(err), err
}
return true, nil
})
return err
}

This is because the Backoff function wrapping the Load and Save calls expects the bool to indicate whether to retry or not where: true = don't retry and false = retry.

// the underlying ExponentialBackoff does not retain the underlying error
// so this addresses this
func Backoff(b wait.Backoff, f func() (bool, error)) error {
var err error
waitErr := wait.ExponentialBackoff(b, func() (bool, error) {
var done bool
done, err = f()
return done, nil
})
if waitErr != nil {
if err != nil {
return fmt.Errorf("%v: %v", waitErr, err)
} else {
return waitErr
}
}
return err
}

I'll propose a PR in the next hour or so with some proposed changes.

Diagnostics

👀 Yes! We need all of your diagnostics, please make sure you add it all, otherwise we'll go around in circles asking you for it:

What Kubernetes provider are you using?

  • GKE

What version of Argo Workflows are you running?

  • custom fork from argo master

What executor are you running? Docker/K8SAPI/Kubelet/PNS/Emissary

  • Emissary

Did this work in a previous version? I.e. is it a regression?

Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant