fix: keep temporary files so they are pushed to the artifact bucket #66

Merged
merged 1 commit into master from fix/tde-421-keep-temporary-files
Jul 25, 2022

Conversation

paulfouquet
Collaborator

@paulfouquet paulfouquet commented Jul 25, 2022

Description

The temporary files generated with gdal should not be deleted, because we need to push them to the Argo artifact bucket at the end of the workflow.

Change

  • Stop using a TempDir, which is emptied at the end of the process. Instead, write the files to the /tmp/ folder, which is passed to Argo as the artifact output.
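
The difference in behaviour can be sketched as follows. This is a minimal illustration, not the code from this repository: it assumes the previous implementation relied on Python's `tempfile.TemporaryDirectory` (the "TempDir" mentioned above), and the helper names `write_in_temp_dir` and `write_in_output_dir` are hypothetical.

```python
import os
import tempfile


def write_in_temp_dir(file_name: str) -> str:
    """Hypothetical 'before' case: write inside a TemporaryDirectory.

    The directory and everything in it are removed when the `with` block
    exits, so the returned path no longer exists afterwards.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, file_name)
        with open(path, "w") as handle:
            handle.write("placeholder")
    return path


def write_in_output_dir(file_name: str, output_dir: str) -> str:
    """Hypothetical 'after' case: write into a directory that is not cleaned up.

    In the workflow this directory would be /tmp/, which is declared as the
    artifact output path, so the file is still there when the step finishes.
    """
    path = os.path.join(output_dir, file_name)
    with open(path, "w") as handle:
        handle.write("placeholder")
    return path
```

With the first helper, nothing is left for Argo to collect once the script exits. With the second, the file remains in the directory that the template lists under `outputs.artifacts`.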

Test

FYI, I've submitted the following workflow in order to test the change:

---
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: imagery-standardise-
spec:
  serviceAccountName: workflow-runner-sa
  podGC:
    strategy: OnPodCompletion # Delete the pod once it's finished
  nodeSelector:
    karpenter.sh/capacity-type: "spot"
  entrypoint: main
  arguments:
    parameters:
      - name: uri
        value: "s3://linz-imagery-staging/RGB/nelson/2022/06/nelson-urban/"
      - name: filter
        value: "2022_BQ27_1000_2001.tif"
  templates:
    - name: main
      dag:
        tasks:
          - name: aws-list
            template: aws-list
            arguments:
              parameters:
                - name: uri
                  value: "{{workflow.parameters.uri}}"
                - name: filter
                  value: "{{workflow.parameters.filter}}"
          - name: standardise
            template: standardise
            arguments:
              parameters:
                - name: file
                  value: "{{item}}"
            depends: "aws-list"
            withParam: "{{tasks.aws-list.outputs.parameters.files}}"
    - name: aws-list
      inputs:
        parameters:
          - name: uri
          - name: filter
      container:
        resources:
          requests:
            memory: 2Gi
            cpu: 2000m
        image: ghcr.io/linz/basemaps/cli:v6.29.0-3-gb4dec98c
        command: [node, index.cjs]
        args:
          [
            "-V",
            "list",
            "--filter",
            "{{inputs.parameters.filter}}",
            "--output",
            "/tmp/file_list.json",
            "--config",
            "linz-bucket-config",
            "{{inputs.parameters.uri}}",
          ]
      outputs:
        parameters:
          - name: files
            valueFrom:
              path: /tmp/file_list.json
    - name: standardise
      inputs:
        parameters:
          - name: file
      script:
        image: ghcr.io/linz/topo-imagery:v0.2.0-17-gba69fea
        command: [python]
        source: |
          import sys

          # Because Argo Workflow executes the script under "/argo/staging/script"
          sys.path.append("/app/")

          # Put your code below
          import argparse
          import os

          from aws_helper import parse_path
          from file_helper import get_file_name_from_path
          from format_source import format_source
          from gdal_helper import run_gdal
          from linz_logger import get_log

          source = ["{{inputs.parameters.file}}"]

          get_log().info("standardising", source=source)
          gdal_env = os.environ.copy()

          for file in source:
              src_bucket_name, src_file_path = parse_path(file)
              standardized_file_name = f"standardized_{get_file_name_from_path(src_file_path)}"
              tmp_file_path = os.path.join("/tmp/", standardized_file_name)

              command = [
                  "gdal_translate",
                  "-q",
                  "-scale",
                  "0",
                  "255",
                  "0",
                  "254",
                  "-a_srs",
                  "EPSG:2193",
                  "-a_nodata",
                  "255",
                  "-b",
                  "1",
                  "-b",
                  "2",
                  "-b",
                  "3",
                  "-of",
                  "COG",
                  "-co",
                  "compress=lzw",
                  "-co",
                  "num_threads=all_cpus",
                  "-co",
                  "predictor=2",
                  "-co",
                  "overview_compress=webp",
                  "-co",
                  "bigtiff=yes",
                  "-co",
                  "overview_resampling=lanczos",
                  "-co",
                  "blocksize=512",
                  "-co",
                  "overview_quality=90",
                  "-co",
                  "sparse_ok=true",
              ]
              run_gdal(command, input_file=file, output_file=tmp_file_path)

      outputs:
        artifacts:
          - name: standardised_tiffs
            path: /tmp/
            archive:
              none: {}

@paulfouquet paulfouquet requested a review from a team as a code owner July 25, 2022 03:23
@kodiakhq kodiakhq bot merged commit 86b8c3a into master Jul 25, 2022
@kodiakhq kodiakhq bot deleted the fix/tde-421-keep-temporary-files branch July 25, 2022 20:14
@github-actions github-actions bot mentioned this pull request Aug 22, 2022