fix: use multipart upload method to put files larger than 5Gi to OSS. Fixes #12877 #12897

Merged (2 commits) on Apr 6, 2024

AlbeeSo (Contributor) commented Apr 5, 2024

Fixes #12877

Motivation

return bucket.PutObjectFromFile(objectName, path)

The OSS SDK client offers several functions for uploading local files to the server. PutObjectFromFile uses Simple Upload, which usually (in my experience, always) fails with an InvalidArgument or EntityTooLarge error when uploading files larger than 5Gi.
For more information, see: https://www.alibabacloud.com/help/en/oss/you-cannot-upload-large-objects-by-using-simple-upload?spm=a3c0i.23458820.2359477120.55.7ef56e9boHbYeC
Multipart Upload is the recommended method for files of this size.

Modifications

When saving files to OSS through the Save func of the ArtifactDriver interface:

  • Still use the Simple Upload method if the file to be saved is not larger than 5Gi.
  • Otherwise, split it into (file size / 5Gi) + 1 parts (integer division) and upload them with the Multipart Upload method.
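The size check and split rule above can be sketched in plain Go. This is a minimal sketch, not the PR's code: `maxSimpleUploadSize`, `chunk`, and `planParts` are hypothetical names, and adding the leftover bytes to the last part is an assumption of this sketch (the OSS Go SDK also offers a split helper, `oss.SplitFileByPartNum`, which is not used here).

```go
package main

// maxSimpleUploadSize is the 5Gi ceiling used to pick the upload method.
const maxSimpleUploadSize int64 = 5 * 1024 * 1024 * 1024

// chunk is one part of a multipart upload (hypothetical type).
type chunk struct {
	Number int   // 1-based part number
	Offset int64 // byte offset of the part in the file
	Size   int64 // part length in bytes
}

// planParts returns nil when a file of the given size can use Simple Upload
// (size <= 5Gi). Otherwise it splits the file into size/5Gi + 1 parts of
// equal length, with any leftover bytes added to the last part.
func planParts(size int64) []chunk {
	if size <= maxSimpleUploadSize {
		return nil
	}
	n := size/maxSimpleUploadSize + 1
	base := size / n
	parts := make([]chunk, 0, n)
	for i := int64(0); i < n; i++ {
		c := chunk{Number: int(i + 1), Offset: i * base, Size: base}
		if i == n-1 {
			c.Size += size % n // remainder goes to the last part
		}
		parts = append(parts, c)
	}
	return parts
}
```

Usage: `planParts(20 << 30)` yields five 4Gi parts, while `planParts(5 << 30)` returns nil, because a file of exactly 5Gi is not larger than the threshold and stays on the Simple Upload path.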

Verification

Please test with a simple workflow like:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: artifact-
spec:
  entrypoint: main
  templates:
    - name: main
      container:
        image: alpine:latest
        command:
          - sh
          - -c
        args:
          - |
            mkdir -p /out
            dd if=/dev/random of=/out/testfile.txt bs=20M count=1024
            echo "created files!"
      outputs:
        artifacts:
          - name: out
            path: /out/testfile.txt
            oss:
              endpoint: http://oss-cn-zhangjiakou-internal.aliyuncs.com
              bucket: bucket
              key: argo-workflows/test/bigfile
              accessKeySecret:
                name: my-argo-workflow-credentials
                key: accessKey
              secretKeySecret:
                name: my-argo-workflow-credentials
                key: secretKey
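As a rough check on which path this workflow exercises: assuming `dd` reads `M` as 1024 × 1024 bytes (as both GNU and BusyBox `dd` do), `bs=20M count=1024` writes a 20 GiB file, so under the split rule in Modifications it should be uploaded as five 4 GiB parts:

$$
\begin{aligned}
\text{file size} &= 20\,\text{MiB} \times 1024 = 20\,\text{GiB} > 5\,\text{GiB},\\
n &= \left\lfloor \frac{20\,\text{GiB}}{5\,\text{GiB}} \right\rfloor + 1 = 5,\\
\text{part size} &= \frac{20\,\text{GiB}}{n} = 4\,\text{GiB}.
\end{aligned}
$$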

shuangkun self-assigned this Apr 5, 2024
shuangkun added the area/artifacts (S3/GCP/OSS/Git/HDFS etc.) label Apr 5, 2024
shuangkun (Member) left a comment

Some comments here.

Co-authored-by: AlbeeSo <suyashi1321@163.com>
Co-authored-by: shuangkun <tsk2013uestc@163.com>
Signed-off-by: AlbeeSo <suyashi1321@163.com>
AlbeeSo changed the title "fix: use multipart upload method to put files larger than 5G to OSS. Fixes #12877" to "fix: use multipart upload method to put files larger than 5Gi to OSS. Fixes #12877" Apr 5, 2024
AlbeeSo (Contributor, Author) commented Apr 5, 2024

> Some comments here.

Done, PTAL. :)

agilgur5 requested a review from juliev0 April 5, 2024 18:38
terrytangyuan (Member) left a comment

Thanks!

terrytangyuan merged commit d2369c9 into argoproj:main Apr 6, 2024
29 checks passed
Comment on lines +348 to +349
// OSS multipart upload code reference: https://www.alibabacloud.com/help/en/oss/user-guide/multipart-upload?spm=a2c63.p38356.0.0.4ebe423fzsaPiN#section-trz-mpy-tes
func multipartUpload(bucket *oss.Bucket, objectName, path string, objectSize int64) error {
agilgur5 (Member) commented Apr 19, 2024

I know this is mostly taken from the Alibaba docs, but I wonder if the chunking / UploadPart section could be parallelized?

If so, we could potentially contribute that back upstream, if Alibaba's docs are open source (idk?)

agilgur5 added this to the v3.5.x patches milestone Apr 19, 2024
agilgur5 pushed a commit that referenced this pull request Apr 19, 2024
…ixes #12877 (#12897)

Signed-off-by: AlbeeSo <suyashi1321@163.com>
Co-authored-by: shuangkun <tsk2013uestc@163.com>
(cherry picked from commit d2369c9)
agilgur5 (Member) commented

Backported cleanly into release-3.5 as 1c1f433

isubasinghe pushed a commit to isubasinghe/argo-workflows that referenced this pull request May 6, 2024
…ixes argoproj#12877 (argoproj#12897)

Signed-off-by: AlbeeSo <suyashi1321@163.com>
Co-authored-by: shuangkun <tsk2013uestc@163.com>
isubasinghe pushed a commit to isubasinghe/argo-workflows that referenced this pull request May 7, 2024
…ixes argoproj#12877 (argoproj#12897)

Signed-off-by: AlbeeSo <suyashi1321@163.com>
Co-authored-by: shuangkun <tsk2013uestc@163.com>
Labels: area/artifacts (S3/GCP/OSS/Git/HDFS etc.)
Development
Successfully merging this pull request may close these issues:
  • OSS client PutObject not support file larger than 5Gi
4 participants