Chunk file disappears from cache during sync #24653

Marginal · 2016-05-16T23:35:53Z

Syncing large files with the 2.1.1 (build 3107) client on OSX against ownCloud 9.0.2 on a slowish 384KB/s link I'm getting an "expected filesize 5242880 got 0" exception on about 5% of the chunk uploads. I've added some additional logging to OC\Files\View::file_put_contents() (see owncloud.log below) and it appears that apache is receiving the file chunk contents OK, ownCloud is writing the contents to the filesystem cache OK, but the chunk files are occasionally disappearing from the filesystem before OC_FileChunking::store() returns.

Any pointers on how to debug this gratefully received. On the assumption that the underlying filesystem isn't randomly losing files, my working hypothesis is that the cache is occasionally clearing up chunk files too aggressively while they're still being written.

Also, I'm aware of #9832 but that issue seems a mess of various problems, and of #11778 but this seems to be a different problem. Could be related to owncloud/client#4421.

Steps to reproduce

Upload 6 100MB files with the ownCloud sync client over a slowish link.

Expected behaviour

Files sync OK.

Actual behaviour

Client reports "Error download https://xxx/xxx-chunking-xxx-y-z - server replied: Bad Request (expected filesize 5242880 got 0)"

Server configuration

Operating system: Linux

Web server: Apache 2.4.16

Database: MariaDB 10.1.14

PHP version: 5.5.32

ownCloud version: 9.0.2.2

Updated from an older ownCloud or fresh install: Updated incrementally from 5.0.7

Where did you install ownCloud from: https://download.owncloud.org/community/owncloud-9.0.2.zip

Signing status (ownCloud 9.0 and above):

No errors have been found.

List of activated apps:

Files
Gallery
Contacts
Calendar

The content of config/config.php:

<?php
$CONFIG = array (
  'debug' => false,
  'instanceid' => '51b4613a9b05a',
  'passwordsalt' => 'Redacted',
  'datadirectory' => '/home/marginal/owncloud-data',
  'dbtype' => 'mysql',
  'dbdriveroptions' => 
  array (
    1002 => 'SET wait_timeout = 3600',
  ),
  'version' => '9.0.2.2',
  'dbname' => 'marginal_owncloud',
  'dbhost' => 'localhost',
  'dbtableprefix' => 'oc_',
  'dbuser' => 'Redacted',
  'dbpassword' => 'Redacted',
  'installed' => true,
  'forcessl' => true,
  'logfile' => '/home/marginal/owncloud.log',
  'loglevel' => 0,
  'log_rotate_size' => 67108864,
  'maintenance' => false,
  'theme' => '',
  'trusted_domains' => 
  array (
    0 => 'Redacted',
  ),
  'secret' => 'Redacted',
  'share_folder' => '/Shared',
  'forceSSLforSubdomains' => true,
  'trashbin_retention_obligation' => 'auto',
);

Are you using external storage, if yes which one: Local

Are you using encryption: No

Are you using an external user-backend, if yes which one: None

Client configuration

Browser: Firefox 46.0.1

Operating system: OSX 10.10.5

Logs

Web server access log

x.x.x.x - - [17/May/2016:00:30:19 +0200] "PUT /remote.php/webdav/Test/100M-2-chunking-1949736465-20-2 HTTP/1.1" 201 - "-" "Mozilla/5.0 (Macintosh) mirall/2.1.1"
x.x.x.x - - [17/May/2016:00:31:08 +0200] "GET /ocs/v1.php/cloud/activity?page=0&pagesize=100&format=json HTTP/1.1" 200 242 "-" "Mozilla/5.0 (Macintosh) mirall/2.1.1"
x.x.x.x - - [17/May/2016:00:30:19 +0200] "PUT /remote.php/webdav/Test/100M-2-chunking-1949736465-20-0 HTTP/1.1" 201 - "-" "Mozilla/5.0 (Macintosh) mirall/2.1.1"
x.x.x.x - - [17/May/2016:00:30:19 +0200] "PUT /remote.php/webdav/Test/100M-2-chunking-1949736465-20-1 HTTP/1.1" 201 - "-" "Mozilla/5.0 (Macintosh) mirall/2.1.1"
x.x.x.x - - [17/May/2016:00:31:20 +0200] "PROPFIND /remote.php/webdav/ HTTP/1.1" 207 374 "-" "Mozilla/5.0 (Macintosh) mirall/2.1.1"
x.x.x.x - - [17/May/2016:00:31:29 +0200] "PROPFIND /remote.php/webdav/ HTTP/1.1" 207 416 "-" "Mozilla/5.0 (Macintosh) mirall/2.1.1"
x.x.x.x - - [17/May/2016:00:31:08 +0200] "PUT /remote.php/webdav/Test/100M-2-chunking-1949736465-20-3 HTTP/1.1" 201 - "-" "Mozilla/5.0 (Macintosh) mirall/2.1.1"
x.x.x.x - - [17/May/2016:00:31:52 +0200] "PROPFIND /remote.php/webdav/ HTTP/1.1" 207 374 "-" "Mozilla/5.0 (Macintosh) mirall/2.1.1"
x.x.x.x - - [17/May/2016:00:31:09 +0200] "PUT /remote.php/webdav/Test/100M-2-chunking-1949736465-20-4 HTTP/1.1" 201 - "-" "Mozilla/5.0 (Macintosh) mirall/2.1.1"
x.x.x.x - - [17/May/2016:00:32:02 +0200] "PROPFIND /remote.php/webdav/ HTTP/1.1" 207 416 "-" "Mozilla/5.0 (Macintosh) mirall/2.1.1"
x.x.x.x - - [17/May/2016:00:31:39 +0200] "PUT /remote.php/webdav/Test/100M-2-chunking-1949736465-20-6 HTTP/1.1" 201 - "-" "Mozilla/5.0 (Macintosh) mirall/2.1.1"
x.x.x.x - - [17/May/2016:00:32:27 +0200] "PROPFIND /remote.php/webdav/ HTTP/1.1" 207 374 "-" "Mozilla/5.0 (Macintosh) mirall/2.1.1"
x.x.x.x - - [17/May/2016:00:32:04 +0200] "PUT /remote.php/webdav/Test/100M-2-chunking-1949736465-20-7 HTTP/1.1" 201 - "-" "Mozilla/5.0 (Macintosh) mirall/2.1.1"
x.x.x.x - - [17/May/2016:00:32:34 +0200] "PROPFIND /remote.php/webdav/ HTTP/1.1" 207 416 "-" "Mozilla/5.0 (Macintosh) mirall/2.1.1"
x.x.x.x - - [17/May/2016:00:31:09 +0200] "PUT /remote.php/webdav/Test/100M-2-chunking-1949736465-20-5 HTTP/1.1" 201 - "-" "Mozilla/5.0 (Macintosh) mirall/2.1.1"
x.x.x.x - - [17/May/2016:00:32:06 +0200] "PUT /remote.php/webdav/Test/100M-2-chunking-1949736465-20-8 HTTP/1.1" 400 225 "-" "Mozilla/5.0 (Macintosh) mirall/2.1.1"

ownCloud log (data/owncloud.log)

{"reqId":"VzpJ@23JklEAADcNxJsAAAAG","remoteAddr":"x.x.x.x","app":"file_put_contents","message":"Wrote 5242880 bytes from Resource id #320 to cache\/100M-2-chunking-1949736465-2.YQIY9G9r9RWnP7YS.part: exists: 1, readable: 1, size: 5242880","level":1,"time":"2016-05-16T22:31:06+00:00","method":"PUT","url":"\/remote.php\/webdav\/Test\/100M-2-chunking-1949736465-20-2","user":"xxx"}
{"reqId":"VzpJ@23JklEAADVGWzcAAAAM","remoteAddr":"x.x.x.x","app":"file_put_contents","message":"Wrote 5242880 bytes from Resource id #320 to cache\/100M-2-chunking-1949736465-0.5MiXMy0jYPhWsfQG.part: exists: 1, readable: 1, size: 5242880","level":1,"time":"2016-05-16T22:31:07+00:00","method":"PUT","url":"\/remote.php\/webdav\/Test\/100M-2-chunking-1949736465-20-0","user":"xxx"}
{"reqId":"VzpJ@23JklEAADRc0bsAAAAD","remoteAddr":"x.x.x.x","app":"file_put_contents","message":"Wrote 5242880 bytes from Resource id #320 to cache\/100M-2-chunking-1949736465-1.JxhlUUjsdiqlImLV.part: exists: 1, readable: 1, size: 5242880","level":1,"time":"2016-05-16T22:31:07+00:00","method":"PUT","url":"\/remote.php\/webdav\/Test\/100M-2-chunking-1949736465-20-1","user":"xxx"}
{"reqId":"VzpKLG3JklEAADpXFl4AAAAI","remoteAddr":"x.x.x.x","app":"file_put_contents","message":"Wrote 5242880 bytes from Resource id #320 to cache\/100M-2-chunking-1949736465-3.9caCvTHfJ9mRqU9Z.part: exists: 1, readable: 1, size: 5242880","level":1,"time":"2016-05-16T22:31:37+00:00","method":"PUT","url":"\/remote.php\/webdav\/Test\/100M-2-chunking-1949736465-20-3","user":"xxx"}
{"reqId":"VzpKLW3JklEAADbZQCoAAAAC","remoteAddr":"x.x.x.x","app":"file_put_contents","message":"Wrote 5242880 bytes from Resource id #320 to cache\/100M-2-chunking-1949736465-4.g8dA06k5JAF1iUzp.part: exists: 1, readable: 1, size: 5242880","level":1,"time":"2016-05-16T22:32:00+00:00","method":"PUT","url":"\/remote.php\/webdav\/Test\/100M-2-chunking-1949736465-20-4","user":"xxx"}
{"reqId":"VzpKS23JklEAADhbAwwAAAAJ","remoteAddr":"x.x.x.x","app":"file_put_contents","message":"Wrote 5242880 bytes from Resource id #320 to cache\/100M-2-chunking-1949736465-6.3YH71eRiLUiVjh9C.part: exists: 1, readable: 1, size: 5242880","level":1,"time":"2016-05-16T22:32:04+00:00","method":"PUT","url":"\/remote.php\/webdav\/Test\/100M-2-chunking-1949736465-20-6","user":"xxx"}
{"reqId":"VzpKZG3JklEAADnhLgIAAAAB","remoteAddr":"x.x.x.x","app":"file_put_contents","message":"Wrote 5242880 bytes from Resource id #320 to cache\/100M-2-chunking-1949736465-7.bmRfooGP900enSbt.part: exists: 1, readable: 1, size: 5242880","level":1,"time":"2016-05-16T22:32:28+00:00","method":"PUT","url":"\/remote.php\/webdav\/Test\/100M-2-chunking-1949736465-20-7","user":"xxx"}
{"reqId":"VzpKLW3JklEAADcNxJ4AAAAG","remoteAddr":"x.x.x.x","app":"file_put_contents","message":"Wrote 5242880 bytes from Resource id #320 to cache\/100M-2-chunking-1949736465-5.2a6iZC56JUVnROmJ.part: exists: 1, readable: 1, size: 5242880","level":1,"time":"2016-05-16T22:32:37+00:00","method":"PUT","url":"\/remote.php\/webdav\/Test\/100M-2-chunking-1949736465-20-5","user":"xxx"}
{"reqId":"VzpKZm3JklEAACfcAz8AAAAF","remoteAddr":"x.x.x.x","app":"PHP","message":"filesize(): stat failed for \/home\/marginal\/owncloud-data\/xxx\/cache\/100M-2-chunking-1949736465-8.adjGuN5SCj567wO6.part at \/home\/marginal\/public_html\/lib\/private\/files\/storage\/local.php#142","level":3,"time":"2016-05-16T22:32:53+00:00","method":"PUT","url":"\/remote.php\/webdav\/Test\/100M-2-chunking-1949736465-20-8","user":"xxx"}
{"reqId":"VzpKZm3JklEAACfcAz8AAAAF","remoteAddr":"x.x.x.x","app":"file_put_contents","message":"Wrote 5242880 bytes from Resource id #320 to cache\/100M-2-chunking-1949736465-8.adjGuN5SCj567wO6.part: exists: 0, readable: 0, size: 0","level":1,"time":"2016-05-16T22:32:53+00:00","method":"PUT","url":"\/remote.php\/webdav\/Test\/100M-2-chunking-1949736465-20-8","user":"xxx"}
{"reqId":"VzpKZm3JklEAACfcAz8AAAAF","remoteAddr":"x.x.x.x","app":"webdav","message":"Exception: {\"Message\":\"HTTP\\\/1.1 400 expected filesize 5242880 got 0\",\"Exception\":\"Sabre\\\\DAV\\\\Exception\\\\BadRequest\",\"Code\":0,\"Trace\":\"#0 \\\/home\\\/marginal\\\/public_html\\\/apps\\\/dav\\\/lib\\\/connector\\\/sabre\\\/file.php(103): OCA\\\\DAV\\\\Connector\\\\Sabre\\\\File->createFileChunked(Resource id #320)\\n#1 \\\/home\\\/marginal\\\/public_html\\\/apps\\\/dav\\\/lib\\\/connector\\\/sabre\\\/directory.php(134): OCA\\\\DAV\\\\Connector\\\\Sabre\\\\File->put(Resource id #320)\\n#2 \\\/home\\\/marginal\\\/public_html\\\/3rdparty\\\/sabre\\\/dav\\\/lib\\\/DAV\\\/Server.php(1036): OCA\\\\DAV\\\\Connector\\\\Sabre\\\\Directory->createFile('100M-2-chunking...', Resource id #320)\\n#3 \\\/home\\\/marginal\\\/public_html\\\/3rdparty\\\/sabre\\\/dav\\\/lib\\\/DAV\\\/CorePlugin.php(523): Sabre\\\\DAV\\\\Server->createFile('Test\\\/100M-2-chu...', Resource id #320, NULL)\\n#4 [internal function]: Sabre\\\\DAV\\\\CorePlugin->httpPut(Object(Sabre\\\\HTTP\\\\Request), Object(Sabre\\\\HTTP\\\\Response))\\n#5 \\\/home\\\/marginal\\\/public_html\\\/3rdparty\\\/sabre\\\/event\\\/lib\\\/EventEmitterTrait.php(105): call_user_func_array(Array, Array)\\n#6 \\\/home\\\/marginal\\\/public_html\\\/3rdparty\\\/sabre\\\/dav\\\/lib\\\/DAV\\\/Server.php(459): Sabre\\\\Event\\\\EventEmitter->emit('method:PUT', Array)\\n#7 \\\/home\\\/marginal\\\/public_html\\\/3rdparty\\\/sabre\\\/dav\\\/lib\\\/DAV\\\/Server.php(248): Sabre\\\\DAV\\\\Server->invokeMethod(Object(Sabre\\\\HTTP\\\\Request), Object(Sabre\\\\HTTP\\\\Response))\\n#8 \\\/home\\\/marginal\\\/public_html\\\/apps\\\/dav\\\/appinfo\\\/v1\\\/webdav.php(55): Sabre\\\\DAV\\\\Server->exec()\\n#9 \\\/home\\\/marginal\\\/public_html\\\/remote.php(138): require_once('\\\/home\\\/marginal\\\/...')\\n#10 {main}\",\"File\":\"\\\/home\\\/marginal\\\/public_html\\\/apps\\\/dav\\\/lib\\\/connector\\\/sabre\\\/file.php\",\"Line\":406,\"User\":\"xxx\"}","level":4,"time":"2016-05-16T22:32:54+00:00","method":"PUT","url":"\/remote.php\/webdav\/Test\/100M-2-chunking-1949736465-20-8","user":"xxx"}

Browser log

n/a

The text was updated successfully, but these errors were encountered:

PVince81 · 2016-05-17T13:06:55Z

clearing up chunk files too aggressively while they're still being written

I'd also suspect that.
Are you able to do a ls -l of the chunk file before it disappears ?
The way how cleanup works is that it sets the mtime of the chunk 24 hours in the future.

Then cleanup job is triggered at login time (login hook) and would discard all chunks which mtime is < now. Maybe the timestamp / timezone of the server is causing the mtime to be wrong ?

Marginal · 2016-05-17T13:11:16Z

Are you able to do a ls -l of the chunk file before it disappears ?

Yes - the offending chunk files are definitely created on the filesystem, and are visible to a sftp client (I'm running on shared hosting).

Then cleanup job is triggered at login time (login hook) and would discard all chunks which mtime is < now.

I'm looking into this now. While the chunk file is being written to the mtime ~= now. The mtime is only set 24hrs into the future after the file handle has been closed, in OC\Cache\File::set.

PVince81 · 2016-05-17T13:18:49Z

The mtime is only set 24hrs into the future after the file handle has been closed, in OC\Cache\File::set.

Hmmmm, indeed. But then there shouldn't be a huge time window during the time the file is written and the touch operation is done.
Unless file_put_contents is slow and already creates the file entry while it's writing it ?
Chunks are usually 20 mb big, and if the transfer is slow...

Also the code uses a .part file while it's being written. But the gc code doesn't care about extension. This is correct because we might want to clear stray part chunks too.

Were you able to observe the time between the moment where the part file is being written, does it exist ?

You might be onto something, keep posting your findings 😄
Will do some local testing with a debugger, this looks fishy.

PVince81 · 2016-05-17T13:35:31Z

I did some debugging and found out that this file_put_contents from chunking is actually calling streamCopy in a loop, because the chunk contents is a file handle, not data.
When checking the timestamp of the file on-disk I noticed that for every fwrite the mtime of the file was changed. So it's a game of cat and mouse until the GC comes to eat the file if its mtime happens to be older than now.

CC @RealRancor @guruz

PVince81 · 2016-05-17T13:36:15Z

So basically there's a loop that copies the bytes 8k by 8k, and for every fwrite the timestamp gets adjusted. If one of them takes longer and at the same time a login hook is triggered, it will kill the chunk too early.

PVince81 · 2016-05-17T13:37:14Z

@Marginal great debugging, thank you!

PVince81 · 2016-05-17T13:38:44Z

A possible fix would be to not use touch but do actual time calculation in the gc method.
Basically, instead of setting the chunk's mtime to the future we leave it be and do the gc on files where mtime < now - 24hours

Marginal · 2016-05-17T13:40:53Z

Yep - I added some logging to OC\Cache\File::gc and I can see this happening:

{"reqId":"VzsdGG3JklEAAA0Z70YAAAAB","remoteAddr":"x.x.x.x","app":"OC\\Files\\View::file_put_contents","message":"Wrote 5242880 bytes from Resource id #320 to cache\/100M-2-chunking-49666361-2.flUvxizYuEcTCGAx.part: exists: 1, readable: 1, size: 5242880","level":1,"time":"2016-05-17T13:31:42+00:00","method":"PUT","url":"\/remote.php\/webdav\/Test\/100M-2-chunking-49666361-20-2","user":"xxx"}
{"reqId":"VzsdWm3JklEAAA5505IAAAAN","remoteAddr":"x.x.x.x","app":"OC\\Cache\\File::gc","message":"Discarding: 100M-2-chunking-49666361-4.ZzxwB0pcWur6mTEJ.part","level":1,"time":"2016-05-17T13:32:11+00:00","method":"PROPPATCH","url":"\/remote.php\/caldav\/calendars\/xxx\/","user":"xxx"}
{"reqId":"VzsdWm3JklEAAA5505IAAAAN","remoteAddr":"x.x.x.x","app":"OC\\Cache\\File::gc","message":"Discarding: 100M-3-chunking-1403625106-1.G5YnsFJG5LIfaI49.part","level":1,"time":"2016-05-17T13:32:11+00:00","method":"PROPPATCH","url":"\/remote.php\/caldav\/calendars\/xxx\/","user":"xxx"}
{"reqId":"VzsdWm3JklEAAA5505IAAAAN","remoteAddr":"x.x.x.x","app":"OC\\Cache\\File::gc","message":"Discarding: 100M-2-chunking-49666361-3.nZoppMlNyBs6FbOa.part","level":1,"time":"2016-05-17T13:32:11+00:00","method":"PROPPATCH","url":"\/remote.php\/caldav\/calendars\/xxx\/","user":"xxx"}
{"reqId":"VzsdNG3JklEAABPYdxEAAAAD","remoteAddr":"x.x.x.x","app":"PHP","message":"filesize(): stat failed for \/home\/marginal\/owncloud-data\/xxx\/cache\/100M-2-chunking-49666361-3.nZoppMlNyBs6FbOa.part at \/home\/marginal\/public_html\/lib\/private\/files\/storage\/local.php#142","level":3,"time":"2016-05-17T13:32:13+00:00","method":"PUT","url":"\/remote.php\/webdav\/Test\/100M-2-chunking-49666361-20-3","user":"xxx"}
{"reqId":"VzsdNG3JklEAABPYdxEAAAAD","remoteAddr":"x.x.x.x,"app":"OC\\Files\\View::file_put_contents","message":"Wrote 5242880 bytes from Resource id #320 to cache\/100M-2-chunking-49666361-3.nZoppMlNyBs6FbOa.part: exists: 0, readable: 0, size: 0","level":1,"time":"2016-05-17T13:32:13+00:00","method":"PUT","url":"\/remote.php\/webdav\/Test\/100M-2-chunking-49666361-20-3","user":"xxx"}
{"reqId":"VzsdNG3JklEAABPYdxEAAAAD","remoteAddr":"x.x.x.x","app":"OC\\Files\\Cache\\Cache::remove","message":"Removing: cache\/100M-2-chunking-49666361-3, id: 66491","level":1,"time":"2016-05-17T13:32:14+00:00","method":"PUT","url":"\/remote.php\/webdav\/Test\/100M-2-chunking-49666361-20-3","user":"xxx"}
{"reqId":"VzsdNG3JklEAABPYdxEAAAAD","remoteAddr":"x.x.x.x","app":"webdav","message":"Exception: {\"Message\":\"HTTP\\\/1.1 400 expected filesize 5242880 got 0\",\"Exception\":\"Sabre\\\\DAV\\\\Exception\\\\BadRequest\",\"Code\":0,\"Trace\":\"#0 \\\/home\\\/marginal\\\/public_html\\\/apps\\\/dav\\\/lib\\\/connector\\\/sabre\\\/file.php(103): OCA\\\\DAV\\\\Connector\\\\Sabre\\\\File->createFileChunked(Resource id #320)\\n#1 \\\/home\\\/marginal\\\/public_html\\\/apps\\\/dav\\\/lib\\\/connector\\\/sabre\\\/directory.php(134): OCA\\\\DAV\\\\Connector\\\\Sabre\\\\File->put(Resource id #320)\\n#2 \\\/home\\\/marginal\\\/public_html\\\/3rdparty\\\/sabre\\\/dav\\\/lib\\\/DAV\\\/Server.php(1036): OCA\\\\DAV\\\\Connector\\\\Sabre\\\\Directory->createFile('100M-2-chunking...', Resource id #320)\\n#3 \\\/home\\\/marginal\\\/public_html\\\/3rdparty\\\/sabre\\\/dav\\\/lib\\\/DAV\\\/CorePlugin.php(523): Sabre\\\\DAV\\\\Server->createFile('Test\\\/100M-2-chu...', Resource id #320, NULL)\\n#4 [internal function]: Sabre\\\\DAV\\\\CorePlugin->httpPut(Object(Sabre\\\\HTTP\\\\Request), Object(Sabre\\\\HTTP\\\\Response))\\n#5 \\\/home\\\/marginal\\\/public_html\\\/3rdparty\\\/sabre\\\/event\\\/lib\\\/EventEmitterTrait.php(105): call_user_func_array(Array, Array)\\n#6 \\\/home\\\/marginal\\\/public_html\\\/3rdparty\\\/sabre\\\/dav\\\/lib\\\/DAV\\\/Server.php(459): Sabre\\\\Event\\\\EventEmitter->emit('method:PUT', Array)\\n#7 \\\/home\\\/marginal\\\/public_html\\\/3rdparty\\\/sabre\\\/dav\\\/lib\\\/DAV\\\/Server.php(248): Sabre\\\\DAV\\\\Server->invokeMethod(Object(Sabre\\\\HTTP\\\\Request), Object(Sabre\\\\HTTP\\\\Response))\\n#8 \\\/home\\\/marginal\\\/public_html\\\/apps\\\/dav\\\/appinfo\\\/v1\\\/webdav.php(55): Sabre\\\\DAV\\\\Server->exec()\\n#9 \\\/home\\\/marginal\\\/public_html\\\/remote.php(138): require_once('\\\/home\\\/marginal\\\/...')\\n#10 {main}\",\"File\":\"\\\/home\\\/marginal\\\/public_html\\\/apps\\\/dav\\\/lib\\\/connector\\\/sabre\\\/file.php\",\"Line\":406,\"User\":\"xxx\"}","level":4,"time":"2016-05-17T13:32:14+00:00","method":"PUT","url":"\/remote.php\/webdav\/Test\/100M-2-chunking-49666361-20-3","user":"xxx"}

I guess the chunk file should be locked while being written?

PVince81 · 2016-05-17T13:44:19Z

I just realized that the set function is expecting a ttl here: https://github.com/owncloud/core/blob/v9.0.2/lib/private/cache/file.php#L100
This must be why it was implemented this way. So the approach of moving the ttl away won't work.
However I'm not aware of any code path that uses that $ttl argument.

@Marginal let's examine your idea about locking.
It looks like the code that calls file_put_contents already acquires locks.
But the gc one doesn't, so let's see if we can add lock acquiring there as well.

PVince81 · 2016-05-17T13:48:35Z

Hmmm, I went through with a debugger and found out that both file_put_contents from OC_FileChunking and also unlink from gc will acquire locks using the locking provider.

However from what I remember the locking provider was limited to work only on the "files/" entries, not the "cache/" entries. If we make it work on the latter as well, it should prevent such situations.

Marginal · 2016-05-17T13:57:29Z

I can see that OC\Files\View::file_put_contents is creating locks, but I'm not seeing any "Could not cleanup locked chunk" debug messages from OC\Cache\File::gc so they're not being applied.

PVince81 · 2016-05-17T14:00:38Z

@Marginal this is why: #24672
You might want try with this patch, but I'm not 100% convinced because locking will cause some performance overhead, especially with the database locking provider.

PVince81 · 2016-05-17T14:01:32Z

I'm not so happy about using locking here because in some environments the DB locking provider is likely to be much slower when uploading many chunks. For every chunks it needs to allocate locks and change them during the course of the file operation.

Also the locking approach isn't backportable to OC < 8.1 where locking wasn't available yet.

Let's list the possible approaches:

Get rid of the $ttl argument in OC\Cache\File::set() and have the ttl value in the gc() method directly.
- the good: no locking overhead, single ttl value for every cached file
- the bad: this is private api, so breaking it might be fine, but still... backporting an API change isn't nice
Enable locking for the "cache" folder (POC here [POC] [stable9] Also lock chunks to prevent concurrent cleanup #24672)
- the good: locking makes sure that there are no concurrency issues
- the bad: more performance overhead especially with DB locking, not backportable to OC < 8.1
Add an extra 24 hours to the ttl in the gc function's checks
- the good: solves the issue with API change, it's backportable to OC < 8.1
- the bad: the ttl contract isn't respected since a file might stay 24 hours longer than expected

@icewind1991 @schiesbn @nickvergessen @DeepDiver1975 any insights ?

Marginal · 2016-05-17T14:03:18Z

Thanks, I'll try #24672.

IMHO your solution 1 seems cleanest - it's weird having mtimes in the future, and even weirder relying on this.

PVince81 · 2016-05-17T14:04:12Z

@Marginal yep, I agree on the weirdness. This is very very old code so there might have been an reason in the past. I'll make a POC for solution 1) too so you can try it too on your magic env 😄

PVince81 · 2016-05-17T14:10:03Z

Oh noes 😦 :

It looks like this is part of the public API, which means there could be apps using this and expecting the $ttl value to work.

Another idea I got was to add the expected mtime value into the file name, but then it makes it difficult to find that file name again when only the $key is known.

PVince81 · 2016-05-17T14:22:53Z

Okay, yet another idea: if we assume that the mtime of a finished chunk is always correct but the one of a part chunk might be wrong, then add an extra 24 hours tolerance for part chunks in the gc() method. This is especially relevant considering that part chunks cannot be found concurrently by `get()' as they aren't finished and have a unique id.

Marginal · 2016-05-17T14:27:39Z

It looks like this is part of the public API, which means there could be apps using this and expecting the $ttl value to work.

I guess that explains the "mtimes in the future" implementation.

if we assume that the mtime of a finished chunk is always correct but the one of a part chunk might be wrong, then add an extra 24 hours tolerance for part chunks in the gc() method.

Kinda ugly to special-case .part files in gc, but if you don't want the overhead of locking the chunk files this might be the only practical solution.

PVince81 · 2016-05-17T14:28:55Z

Steps to reproduce the original issue

Edit https://github.com/owncloud/core/blob/v9.0.2/lib/private/cache/file.php#L116 and add sleep(5); before the touch() operation
Open cadaver with the current user on the console (every file operation triggers a login due to basic auth)
Setup desktop client
Upload a big file (~200 MB) with the desktop client
While the upload is running, run ls several times in cadaver to trigger the login hook which will clean up the part file
💣 desktop client says upload failed, log says "Expected filesize X go 0"

PVince81 · 2016-05-17T14:30:28Z

I'd like @icewind1991's wisdom regarding locking overhead to make a decision between solution 2 (enable locking for "cache" folder) and solution 4 (extra expire time in GC for part files). For the backport to 8.0 which doesn't have the locking code, I'd go for 4 anyway.

Marginal · 2016-05-17T14:34:10Z

Testing your solution 2 #24672:

{"reqId":"Vzsp-m3JklEAAAbfqkUAAAAB","remoteAddr":"x.x.x.x","app":"OC\\Cache\\File::gc","message":"Discarding: 100M-2-chunking-453979753-18.DsYu8Yn3IYL09Jmp.part","level":1,"time":"2016-05-17T14:26:07+00:00","method":"PROPFIND","url":"\/remote.php\/caldav\/calendars\/xxx\/","user":"xxx"}
{"reqId":"Vzsp-m3JklEAAAbfqkUAAAAB","remoteAddr":"x.x.x.x","app":"core","message":"Exception when running cache gc: $absolutePath must be relative to \"files\"","level":2,"time":"2016-05-17T14:26:07+00:00","method":"PROPFIND","url":"\/remote.php\/caldav\/calendars\/xxx\/","user":"xxx"}

The sync is still proceeding happily 'though!

PVince81 · 2016-05-17T14:36:27Z

Exception when running cache gc: $absolutePath must be relative to "files"

Okay, looks like I might have missed something.
Thanks for testing!

Marginal · 2016-05-17T14:38:40Z

Looks like there are assumptions that locked files only reside under /files in OC\Files\View::getPathRelativeToFiles and ::getUidAndFilename . Perhaps also elsewhere?

PVince81 · 2016-05-17T14:56:55Z

@icewind1991 popped up on IRC and suggested we just add - 1min in the gc code instead.

But this would assume that an fwrite of 8k would never take more than a minute, which is still a risky assumption.

@Marginal you're right, might need special handling for the cache folder 😦

PVince81 · 2016-05-17T15:09:18Z

Simplified fix with extra time tolerance is here: #24676

PVince81 · 2016-05-17T15:14:08Z

@Marginal if you say several minutes per chunk: one chunk is usually 20 MB I believe (or was it reduced to 10 ?). Several minutes, ok let's say 8 minutes which means 2.5 MB a minute, ~41 KB a second. So within that one second the fwrite method will be called more than 5 times, which means the mtime of the file will get updated 5 times within a second. Should be enough 😄

Marginal · 2016-05-17T15:22:21Z

The default chunk size is now 5MB for the 2.1 client.

You're correct that the mtime is updated every 8KB, so 1hr leeway should be fine.

PVince81 · 2016-05-17T15:29:45Z

🍸

Let me know if the new PR works for you and we can move it forward to the fixing pipeline / backports, etc

Marginal · 2016-05-17T15:55:27Z

#24676 works for me - I uploaded 6 x 100MB files with no "expected filesize 5242880 got 0" errors.

PVince81 · 2016-05-18T11:42:08Z

@Marginal thanks for testing!

master PR is here: #24691 (for 9.1)
I'll list the backport PRs in that one.

guruz · 2016-05-18T12:22:09Z

Awersome to see this fixed!

PVince81 · 2016-05-18T12:27:11Z

Yep. The reason why we didn't find it earlier is because it's hidden behind a $storage->file_put_contents() call which, at first appearance, should be atomic. However, due to the many layers of OC, that call is transformed to a loop that copies blocks which isn't atomic.

PVince81 · 2016-05-18T12:58:05Z

I'm going to link all other related tickets to this one for the solution.

For those of you arriving here from other tickets, please try this PR #24691 or one of the related backports #24691 (comment). Alternatively you can just wait for the next maintenance releases 😄

PVince81 · 2016-05-18T13:01:01Z

@RealRancor if you see people on the forums with the error "Expected filesize X got 0", you might want to direct them to this solution. Thanks 😄

enoch85 · 2016-05-18T13:16:55Z

Finally! Thanks @Marginal and @PVince81 🎆 👍

lock · 2019-08-04T15:00:57Z

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

PVince81 added the needs info label May 17, 2016

PVince81 added the Type:Bug label May 17, 2016

PVince81 added the sev1-critical label May 17, 2016

PVince81 added this to the 9.0.3-current-maintenance milestone May 17, 2016

PVince81 mentioned this issue May 17, 2016

Add config value for TTL of files in cache folder before removed by garbage collection #23486

Closed

PVince81 mentioned this issue May 17, 2016

[POC] [stable9] Also lock chunks to prevent concurrent cleanup #24672

Closed

guruz mentioned this issue May 18, 2016

[Ubuntu] [2.1.0] Bandwidth control options disabled? owncloud/client#4334

Closed

PVince81 mentioned this issue May 18, 2016

Allow chunk GC mtime tolerance for unfinished part chunks #24691

Merged

This was referenced May 18, 2016

[stable8] Allow chunk GC mtime tolerance for unfinished part chunks #24694

Merged

[stable8.1] Allow chunk GC mtime tolerance for unfinished part chunks #24693

Merged

[stable8.2] Allow chunk GC mtime tolerance for unfinished part chunks #24692

Merged

This was referenced May 18, 2016

ownCloud sees size of a multiple-file upload as max single filesize #10489

Closed

Sabre_DAV_Exception_BadRequest: expected filesize mismatch #9832

Closed

PVince81 mentioned this issue May 18, 2016

Fatal webdav error #20423

Closed

PVince81 closed this as completed in #24691 May 18, 2016

MorrisJobke added 9.0.3-current labels May 18, 2016

MorrisJobke modified the milestones: 9.1-current, 9.0.3-current-maintenance May 18, 2016

MorrisJobke mentioned this issue May 18, 2016

File uploads larger than 1MB fail owncloud/android#402

Closed

AndrewSav mentioned this issue Jul 19, 2016

Cannot sync long files Sabre_DAV_Exception_BadRequest expected filesize #25524

Closed

lock bot locked as resolved and limited conversation to collaborators Aug 4, 2019

Chunk file disappears from cache during sync #24653

Chunk file disappears from cache during sync #24653

Comments

Marginal commented May 16, 2016 • edited Loading

Steps to reproduce

Expected behaviour

Actual behaviour

Server configuration

Client configuration

Logs

Web server access log

ownCloud log (data/owncloud.log)

Browser log

PVince81 commented May 17, 2016

Marginal commented May 17, 2016 • edited Loading

PVince81 commented May 17, 2016

PVince81 commented May 17, 2016

PVince81 commented May 17, 2016

PVince81 commented May 17, 2016

PVince81 commented May 17, 2016

Marginal commented May 17, 2016 • edited Loading

PVince81 commented May 17, 2016

PVince81 commented May 17, 2016

Marginal commented May 17, 2016

PVince81 commented May 17, 2016

PVince81 commented May 17, 2016 • edited Loading

Marginal commented May 17, 2016

PVince81 commented May 17, 2016

PVince81 commented May 17, 2016 • edited Loading

PVince81 commented May 17, 2016

Marginal commented May 17, 2016

PVince81 commented May 17, 2016

Steps to reproduce the original issue

PVince81 commented May 17, 2016

Marginal commented May 17, 2016 • edited Loading

PVince81 commented May 17, 2016

Marginal commented May 17, 2016

PVince81 commented May 17, 2016

PVince81 commented May 17, 2016

PVince81 commented May 17, 2016

Marginal commented May 17, 2016 • edited Loading

PVince81 commented May 17, 2016

Marginal commented May 17, 2016

PVince81 commented May 18, 2016

guruz commented May 18, 2016

PVince81 commented May 18, 2016

PVince81 commented May 18, 2016

PVince81 commented May 18, 2016

enoch85 commented May 18, 2016

lock bot commented Aug 4, 2019

Marginal commented May 16, 2016 •

edited

Loading

Marginal commented May 17, 2016 •

edited

Loading

Marginal commented May 17, 2016 •

edited

Loading

PVince81 commented May 17, 2016 •

edited

Loading

PVince81 commented May 17, 2016 •

edited

Loading

Marginal commented May 17, 2016 •

edited

Loading

Marginal commented May 17, 2016 •

edited

Loading