spec: big file chunking

⚠ Deprecated in favor of https://dragotin.wordpress.com/2015/06/22/owncloud-chunking-ng/ which was added with https://github.com/owncloud/core/pull/20118 ⚠

This page describes the custom ownCloud file chunking algorithm that is used to upload big files via WebDAV. It is implemented in ownCloud 4.5 and higher:

Someone wants to upload a big file, for example big.mpg.
The client splits the file into several chunks.
The size of the chunks is flexible and can be optimized for best performance in the future.
The individual chunks are uploaded via WebDAV to the server.
The order of the uploads is not important.
The upload can take a long time and can be interrupted without problems.
The client sends a custom HTTP header OC-Chunked: 1 to enable chunked uploading mode on the server, so that the protocol always stays backward compatible.
The files are uploaded to the final location with a special name: <path/filename>-chunking-<transferid>-<chunkcount>-<index>
Example: big.mpg-chunking-4711-20-3
The transferid is a random id that, together with the filename, tells the server which chunks belong together.
The index starts at 0 and counts to chunkcount-1.
The server detects the files during upload as chunk files and keeps them in a temp folder.
The server moves the file to the final place with the final name when all parts are uploaded.
The last WebDAV request ends only when the final file is in place (synchronous transfer).
The temp folder is cleaned once a day to remove old failed uploads.
Other headers may be added to allow optimisations: OC-Total-Length is the size of the full file; OC-Chunk-Size is the size of each chunk except the last one.
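
The naming scheme and headers above can be sketched as follows. This is a minimal illustration, not the ownCloud client or server code: the function names (`chunk_name`, `parse_chunk_name`, `plan_upload`, `upload_complete`) are hypothetical, the transfer id is assumed to be numeric as in the example, and the header values are assumed to be decimal strings. The actual WebDAV PUT requests are left out.

```python
import re
from typing import Dict, Iterable, Iterator, NamedTuple, Optional, Tuple


class ChunkName(NamedTuple):
    path: str
    transfer_id: str
    chunk_count: int
    index: int


def chunk_name(path: str, transfer_id: str, chunk_count: int, index: int) -> str:
    """Build <path/filename>-chunking-<transferid>-<chunkcount>-<index>."""
    return f"{path}-chunking-{transfer_id}-{chunk_count}-{index}"


# Assumption: transfer ids are numeric, as in big.mpg-chunking-4711-20-3.
_CHUNK_RE = re.compile(
    r"^(?P<path>.+)-chunking-(?P<tid>\d+)-(?P<count>\d+)-(?P<index>\d+)$"
)


def parse_chunk_name(name: str) -> Optional[ChunkName]:
    """Server side: recognise a chunk file name, or return None for a normal file."""
    m = _CHUNK_RE.match(name)
    if m is None:
        return None
    return ChunkName(m["path"], m["tid"], int(m["count"]), int(m["index"]))


def plan_upload(
    path: str, data: bytes, chunk_size: int, transfer_id: str
) -> Iterator[Tuple[str, Dict[str, str], bytes]]:
    """Client side: yield (target name, headers, body) for every chunk."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Ceiling division; an empty file still yields one (empty) chunk.
    chunk_count = max(1, -(-len(data) // chunk_size))
    headers = {
        "OC-Chunked": "1",
        "OC-Total-Length": str(len(data)),  # optional optimisation header
        "OC-Chunk-Size": str(chunk_size),   # optional optimisation header
    }
    for index in range(chunk_count):
        body = data[index * chunk_size:(index + 1) * chunk_size]
        yield chunk_name(path, transfer_id, chunk_count, index), dict(headers), body


def upload_complete(received_indexes: Iterable[int], chunk_count: int) -> bool:
    """Server side: all indexes 0..chunkcount-1 have arrived, in any order."""
    return set(received_indexes) == set(range(chunk_count))
```

Because every chunk name carries the chunk count and its own index, the server can decide from the names alone when a transfer is complete, which is why the upload order does not matter.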

In a second step this can be extended to support partial updating of files.

The client generates a parts list with MD5 hashes of the individual chunks.
The server provides an additional REST API to check the hashes.
The API is called with the PUT method on http://.../remote.php/filesync/oc_chunked/path/to/file
The PUT data stream starts with a 2-byte chunk size,
followed by the binary MD5 hashes of the chunks.
Everything is big-endian.
The API returns the following information in JSON-encoded format:
transferid: to use for the missing chunks
needed: list of chunk numbers
count: of provided hashes
The client sends the chunksize and list of hashes to the server via this API
The server responds with the list of chunks that are needed and prepares the upload directory.
The client sends only the changed chunks.
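
The request body and response of the hash check could look like this in code. This is a sketch under assumptions: the spec does not state the unit of the 2-byte chunk size, so the example treats it as a byte count and rejects values that do not fit; the JSON response is assumed to be a flat object with the keys transferid, needed and count; the names `build_hash_payload`, `HashCheckResult` and `parse_hash_response` are hypothetical.

```python
import hashlib
import json
import struct
from typing import List, NamedTuple


def build_hash_payload(data: bytes, chunk_size: int) -> bytes:
    """2-byte big-endian chunk size, then the raw 16-byte MD5 of each chunk."""
    # Assumption: the chunk size is expressed in bytes and must fit in 2 bytes.
    if not 0 < chunk_size <= 0xFFFF:
        raise ValueError("chunk size must fit in an unsigned 16-bit field")
    payload = struct.pack(">H", chunk_size)
    for offset in range(0, len(data), chunk_size):
        payload += hashlib.md5(data[offset:offset + chunk_size]).digest()
    return payload


class HashCheckResult(NamedTuple):
    transfer_id: str   # id to use when uploading the missing chunks
    needed: List[int]  # chunk numbers the server does not have yet
    count: int         # number of hashes the server received


def parse_hash_response(body: str) -> HashCheckResult:
    """Decode the JSON answer (assumed to be a flat object)."""
    doc = json.loads(body)
    return HashCheckResult(
        str(doc["transferid"]),
        [int(i) for i in doc["needed"]],
        int(doc["count"]),
    )
```

The client would then upload only the chunks listed in `needed`, using the returned transfer id in the chunk file names.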

Pitfalls:

Clients need to be aware that the server may not receive the OC-Chunked header, for example because a filtering proxy strips it.

In that case: if there is more than one chunk and the server returns an ETag after the first chunk, that can mean two things:

a) The server did not see the chunk header and created a file with the chunk name, which is the error case.

b) The server already knew all other parts and the one transmitted was the last missing one. That is fine, and no file with the chunk filename is created.

If that happens, the client sends a DELETE request for the chunk file name. If it succeeds, we have the error condition and cannot do chunking. If the delete fails, we have case b) and all is fine.
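
The check described above can be written as a small decision function. This is only a sketch: `server_ignored_chunk_header` is a hypothetical name, and `delete_chunk_file` stands for whatever the client uses to send a DELETE for the chunk file name and report whether it succeeded.

```python
from typing import Callable, Optional


def server_ignored_chunk_header(
    chunk_count: int,
    etag_after_first_chunk: Optional[str],
    delete_chunk_file: Callable[[], bool],
) -> bool:
    """Return True for case a): the OC-Chunked header never reached the server.

    delete_chunk_file is a hypothetical callable that sends a DELETE for the
    chunk file name and returns True if the server removed the file.
    """
    if chunk_count <= 1 or etag_after_first_chunk is None:
        # A single chunk, or no ETag after the first chunk: nothing suspicious.
        return False
    # The ETag is ambiguous, so probe: only in case a) does a file with the
    # chunk name exist, and only then can the DELETE succeed.
    return delete_chunk_file()
```

When the function returns True, the client should fall back to a plain, unchunked upload.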
