Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use MultipartUpload for uploading chunks to s3 #27034

Merged
merged 4 commits into from
Mar 9, 2023

Conversation

juliusknorr
Copy link
Member

@juliusknorr juliusknorr commented May 19, 2021

Implements #19414

This allows chunked upload streaming the chunks directly to S3 so that the final MOVE request no longer needs to assemble the final file on the Nextcloud server.

Ref: https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpuoverview.html

Current API changes to trigger the new upload mechanism (required changes on the client side)

The DAV capabilities expose a feature flag s3-multipart if available.

Additional requirements on top of the current chunking API in order to trigger it:

  • New additional header is required on all chunked upload requests (MKCOL/PUT/MOVE)
    • Destination: http://nextcloud.dev.local/remote.php/dav/files/admin/myfiletarget
  • Limitations
    • Chunk filename must be numberic between 1 and 10000
    • Chunks must be at least 5MB of size (except for the last one)
    • Chunks cannot be downloaded from the upload directory

Summary of how S3 handles MultipartUpload

  • CreateMultipartUpload is called with the destination bucket, object key and will return an uploadId
  • UploadPart gets called for each part with the bucket, objectKey, uploadId, contentLength and the content, it returns a set of an ETag and the UploadId which need to be stored for the CompleteMultipartUpload
  • CompleteMultipartUpload Receives the list of parts and merges them directly on S3 to the final object

How it currently works

  • MKCOL Create the UploadFolder, initiate the MultipartUpload through the IChunkedFileWrite storage and persist the returned chunkToken (uploadId) in the dav properties of the folder
    • If the destination file does not exist we create a .target file in the upload folder to have a filecache entry
    • The destination is required upfront to allow writing directly to the target file
  • PUT Writes the stream data directly to the S3 object through the IChunkedFileWrite storage
    • Validates the DAV chunk filename name to be numeric between 1 and 10000
  • MOVE the MultipartUpload will be completed
    • if the file exists we are done
    • if the file didn't exist, call rename to move the .target file from the upload folder to the destination
  • If a storage doesn't exist the new API calls will just fallback to the previous chunking mechanism

Test script

Bash script to test the upload
#!/bin/bash
# Usage: ./upload.sh targetname1
# Uploads a file consisting of 3 chunks and downloads it a again to verify the upload

if [ -z "$1" ]
  then
    echo "Usage: ./s3-chunk filepath"
	exit 1
fi

CHUNKSIZE=10
CHUNKS=2
> /tmp/one.testbin
for i in $(seq 1 $CHUNKS)
do
	CHUNKSIZE=$(( $RANDOM % 10 + 5 ))
	dd if=/dev/random of=/tmp/$i.testbin bs=${CHUNKSIZE}M count=1 2>/dev/null
	cat /tmp/$i.testbin >> /tmp/one.testbin
done

ls -l /tmp/*.testbin

SERVER=http://nextcloud.dev.local
USER=admin
PASS=admin
FILENAME=$1
timestamp=$(date +%s)

#echo "=> Cleanup existing file"
#curl -X DELETE -u $USER:$PASS "$SERVER/remote.php/dav/files/$USER/$FILENAME"



DESTINATION="$SERVER/remote.php/dav/files/admin/$FILENAME"

echo "Uploading to $USER:$PASS $SERVER/remote.php/dav/uploads/$USER/$timestamp"
echo "=> MKCOL"
curl -X MKCOL -u $USER:$PASS $SERVER/remote.php/dav/uploads/$USER/$timestamp \
	-H "Destination: $DESTINATION"

echo "=> PUT"
for i in $(seq 1 $CHUNKS | shuf)
do
	echo "- PART $i"
	(curl -X PUT -u $USER:$PASS $SERVER/remote.php/dav/uploads/$USER/$timestamp/$i -T /tmp/$i.testbin -H "X-S3-Multipart-Destination: files/admin/$FILENAME" \
		-H "Destination: $DESTINATION"; echo " - DONE $i") &
done

wait

# exit 1;

echo "=> MOVE"
curl -X MOVE -u $USER:$PASS --header "Destination:$SERVER/remote.php/dav/files/$USER/$FILENAME" $SERVER/remote.php/dav/uploads/$USER/$timestamp/.file \
	-H "Destination: $DESTINATION"

curl -u $USER:$PASS $SERVER/remote.php/dav/files/$USER/$FILENAME > /tmp/compare.testbin

md5sum /tmp/one.testbin
md5sum /tmp/compare.testbin

ls -l /tmp/*.testbin

rm /tmp/*.testbin

@juliusknorr juliusknorr force-pushed the enh/s3-multipart-upload-api branch 4 times, most recently from 108d4e6 to b1690b2 Compare May 26, 2021 15:38
@juliusknorr juliusknorr force-pushed the enh/s3-multipart-upload-api branch from b1690b2 to 6cf9a74 Compare July 7, 2021 11:22
@juliusknorr juliusknorr force-pushed the enh/s3-multipart-upload-api branch from 6cf9a74 to ea62004 Compare July 28, 2021 13:30
@juliusknorr juliusknorr force-pushed the enh/s3-multipart-upload-api branch from ea62004 to e69b0b9 Compare August 4, 2021 13:34
@juliusknorr juliusknorr self-assigned this Aug 12, 2021
@skjnldsv skjnldsv added this to the Nextcloud 23 milestone Aug 30, 2021
tsdicloud added a commit to nextmcloud/server that referenced this pull request Sep 13, 2021
The change was needed to avoid doubling of upload time as the original solution does a complete re-read of the S3 object on final MOVE of NextCloud chunked upload. See nextcloud#27034 for details.

Signed-off-by: Bernd.Rederlechner@t-systems.com <bernd.rederlechner@t-systems.com>
@juliusknorr juliusknorr force-pushed the enh/s3-multipart-upload-api branch 2 times, most recently from 89e2e83 to 31a3a8e Compare September 14, 2021 18:31
@juliusknorr
Copy link
Member Author

@nextcloud/server-backend I didn't get that far in finishing this as expected, but some intermediate feedback on the approach with the storage abstraction and the chunking plugin would already be much appreciated.

@skjnldsv skjnldsv mentioned this pull request Oct 13, 2021
@juliusknorr juliusknorr force-pushed the enh/s3-multipart-upload-api branch from 31a3a8e to 68e2a44 Compare October 14, 2021 15:20
dokshukin pushed a commit to nextmcloud/server that referenced this pull request Oct 18, 2021
The change was needed to avoid doubling of upload time as the original solution does a complete re-read of the S3 object on final MOVE of NextCloud chunked upload. See nextcloud#27034 for details.

Signed-off-by: Bernd.Rederlechner@t-systems.com <bernd.rederlechner@t-systems.com>
dokshukin pushed a commit to nextmcloud/server that referenced this pull request Oct 18, 2021
The change was needed to avoid doubling of upload time as the original solution does a complete re-read of the S3 object on final MOVE of NextCloud chunked upload. See nextcloud#27034 for details.

Signed-off-by: Bernd.Rederlechner@t-systems.com <bernd.rederlechner@t-systems.com>
sgyuris pushed a commit to nextmcloud/server that referenced this pull request Oct 20, 2021
The change was needed to avoid doubling of upload time as the original solution does a complete re-read of the S3 object on final MOVE of NextCloud chunked upload. See nextcloud#27034 for details.

Signed-off-by: Bernd.Rederlechner@t-systems.com <bernd.rederlechner@t-systems.com>
@skjnldsv skjnldsv mentioned this pull request Oct 25, 2021
25 tasks
@juliusknorr juliusknorr force-pushed the enh/s3-multipart-upload-api branch 3 times, most recently from 0610055 to 6a1abcd Compare October 29, 2021 15:56
@blizzz
Copy link
Member

blizzz commented Mar 9, 2023

/backport to stable26

@pierreozoux
Copy link
Member

🚀

This is amazing :) Thanks!

(hope it is ok to "pollute" github PR with praising :) it is easy to complain about bugs, but we should also learn to celebrate :) )

@backportbot-nextcloud
Copy link

The backport to stable-23 failed. Please do this backport manually.

@backportbot-nextcloud
Copy link

The backport to stable26 failed. Please do this backport manually.

@ir0nhide
Copy link
Contributor

ir0nhide commented Mar 9, 2023

@juliushaertl great work, thank you very much!
Looking at the newly added code in https://github.com/nextcloud/server/blob/master/lib/private/Files/ObjectStore/S3.php, i am wondering if it is also required to add SSE-C headers here like done in e.g. https://github.com/nextcloud/server/blob/master/lib/private/Files/ObjectStore/S3ObjectTrait.php#L62 to support #32798 with multipart upload?

@backportbot-nextcloud
Copy link

The backport to stable-22 failed. Please do this backport manually.

@backportbot-nextcloud
Copy link

The backport to stable22 failed. Please do this backport manually.

@backportbot-nextcloud
Copy link

The backport to stable23 failed. Please do this backport manually.

@backportbot-nextcloud
Copy link

The backport to stable24 failed. Please do this backport manually.

@juliusknorr
Copy link
Member Author

@juliushaertl great work, thank you very much!
Looking at the newly added code in master/lib/private/Files/ObjectStore/S3.php, i am wondering if it is also required to add SSE-C headers here like done in e.g. master/lib/private/Files/ObjectStore/S3ObjectTrait.php#L62 to support #32798 with multipart upload?

Thanks, good catch indeed. I'll take care of that in the follow up.

@unteem
Copy link

unteem commented Jul 6, 2023

Awesome feature :) ! Thanks a lot

Are there any reasons for it not being backported to v25 ?

@AndyScherzinger
Copy link
Member

We usually only backport security fixes or critical bugfixes. Any kind of performance improvement as well as features gets pinned to the upcoming major release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3. to review Waiting for reviews enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.