writing large streams to bucket fails on OS X #196

Closed · chrishiestand opened this issue Oct 15, 2016 · 12 comments

chrishiestand commented Oct 15, 2016

gcsfuse my-bucket ./my-bucket

output:

Using mount point: /Users/user/gcp-mount/my-bucket
Opening GCS connection...
Opening bucket...
Mounting file system...
File system has been successfully mounted.

cd my-bucket
cat /dev/zero > test.zeros
# wait 2-3 seconds and hit ^c
# sometimes this operation hangs for a bit

ls -la

expected output:

total 943280
drwxrwxr-x 3 user staff 102 Oct 15 04:05 .
drwxrwxr-x 4 user staff 136 Oct 15 04:05 ..
-rw-r--r-- 1 user staff 48295930 Oct 15 04:05 test.zeros

actual output:

ls: .: Device not configured

At this point the mount is stuck and must be force unmounted in order to try again:

cd ..
umount -f ./my-bucket

OS X 10.11 (El Capitan)

gcsfuse --version

gcsfuse version 0.19.0 (Go version go1.7.1)

jacobsa commented Oct 19, 2016

It's probably a bug that this leaves the file system in an unusable state; I will take a look.

But it's not surprising that your write doesn't work -- you are buffering 2-3 seconds' worth of data at /dev/zero's (very high) throughput in memory, then attempting to write all of it to GCS when the file descriptor is closed. I assume your uplink is nowhere near as fast as /dev/zero.

jacobsa commented Oct 19, 2016

On OS X 10.11.6 with osxfuse 3.5.2 I'm able to reproduce a hang before returning to the shell after hitting ctrl-C on cat. While it hangs, ls mount_point also hangs. However, ls mount_point/other_file doesn't, unless I've run ls mount_point first. Once the cat process actually goes away, everything works fine for me.

--foreground --debug_fuse shows me that:

  • when cat receives the signal gcsfuse gets a FlushFile op, which takes forever because it's writing a ton of data to GCS.
  • ls mount_point/other_file results in a LookUpInode for other_file, which works fine.
  • ls mount_point results in OpenDir then ReadDir, which are quick, followed by LookUpInode for the file being written, which hangs.
  • Once the LookUpInode on behalf of ls mount_point is hung, ls mount_point/foo doesn't cause any ops to be received at all.
  • eventually the FlushFile op also receives an interrupt and is cancelled correctly. I assume this is due to some timeout in osxfuse for the max flush time.

This all seems as expected (or at least the best we can do), with the sole possible exception of a hung inode lookup blocking further ops from being received for other inodes. But it appears that's happening in the kernel, presumably due to some kernel-side lock being held on the directory while listing it, so there's not much that can be done.

@chrishiestand: can you confirm what exact OS X and osxfuse versions you are running, and that the file system truly stops working even after cat has gone away?

chrishiestand commented Feb 4, 2017

The version of gcsfuse in the first report was 0.19.0. I've since upgraded to gcsfuse version 0.20.1 (Go version go1.7.5). This version seems to recover gracefully after the error, and at least the mount point is still usable; I'm no longer able to reproduce the file system hang. OS X is now 10.11.6, and FUSE for macOS is 3.5.4.

Sorry for the delayed response. Perhaps a pipe from /dev/zero was a bad example, though I'm not sure why it would be substantively different from copying a large file. I believe a pipe should block when the write buffer is full, so gcsfuse shouldn't have to worry about that OS-level task? I was just using it as a reproducible case where large piped uploads fail (though the source of the stream where I hit this error was a process reading from a remote source, not /dev/zero).

In any case, surely this utility should be able to handle uploading a 1 GB file.

$ dd if=/dev/zero of=bigfile bs=1m count=1000
1000+0 records in
1000+0 records out
1048576000 bytes transferred in 1.150960 secs (911044529 bytes/sec)

$ cp bigfile my-bucket/
cp: my-bucket/bigfile: Input/output error

This seems to be the crux of the problem:

fuse_debug: Op 0x00000004        connection.go:395] <- unknown (inode 2, opcode 23)
fuse_debug: Op 0x00000004        connection.go:476] -> Error: "function not implemented"

With --foreground --debug_fuse --debug_gcs I see this on copy:

...
fuse_debug: Op 0x00000002        connection.go:395] <- WriteFile (inode 2, handle 1, offset 1043333120, 1048576 bytes)
fuse_debug: Op 0x00000002        connection.go:474] -> OK ()
fuse_debug: Op 0x00000003        connection.go:395] <- WriteFile (inode 2, handle 1, offset 1044381696, 1048576 bytes)
fuse_debug: Op 0x00000003        connection.go:474] -> OK ()
fuse_debug: Op 0x00000004        connection.go:395] <- WriteFile (inode 2, handle 1, offset 1045430272, 1048576 bytes)
fuse_debug: Op 0x00000004        connection.go:474] -> OK ()
fuse_debug: Op 0x00000002        connection.go:395] <- WriteFile (inode 2, handle 1, offset 1046478848, 1048576 bytes)
fuse_debug: Op 0x00000002        connection.go:474] -> OK ()
fuse_debug: Op 0x00000003        connection.go:395] <- WriteFile (inode 2, handle 1, offset 1047527424, 1048576 bytes)
fuse_debug: Op 0x00000003        connection.go:474] -> OK ()
fuse_debug: Op 0x00000004        connection.go:395] <- GetInodeAttributes (inode 2)
fuse_debug: Op 0x00000004        connection.go:474] -> OK ()
fuse_debug: Op 0x00000002        connection.go:395] <- SetInodeAttributes (inode 2, mode -rw-r--r--)
fuse_debug: Op 0x00000002        connection.go:474] -> OK ()
fuse_debug: Op 0x00000003        connection.go:395] <- SetInodeAttributes (inode 2, mode -rw-r--r--)
fuse_debug: Op 0x00000003        connection.go:474] -> OK ()
fuse_debug: Op 0x00000004        connection.go:395] <- unknown (inode 2, opcode 23)
fuse_debug: Op 0x00000004        connection.go:476] -> Error: "function not implemented"
fuse_debug: Op 0x00000002        connection.go:395] <- SetInodeAttributes (inode 2, mode -rw-r--r--)
fuse_debug: Op 0x00000002        connection.go:474] -> OK ()
fuse_debug: Op 0x00000003        connection.go:395] <- FlushFile (inode 2)
gcs: Req              0x6: <- CreateObject("bigfile")
fuse_debug: Op 0x00000004        connection.go:395] <- LookUpInode (parent 1, name "bigfile")
gcs: Req              0x7: <- StatObject("bigfile")
gcs: Req              0x7: -> StatObject("bigfile") (253.815422ms): OK
fuse_debug: Op 0x00000002        connection.go:395] <- interrupt (fuseid 0x00000003)
gcs: Req              0x6: -> CreateObject("bigfile") (59.993571695s): Put https://www.googleapis.com/upload/storage/v1/b/my-bucket/o?ifGenerationMatch=1486246191346000&ifMetagenerationMatch=1&projection=full&uploadType=resumable&upload_id=AEnB2Up0uaot6F-8GOdunDGnVn35Ek_zhFpSIkRtKv3fkx1eN1oHIX0o5g-vJGNDFAWGhqs07sEG5-F5rdH9h9H--RGmim2Vhj7w2G5RRlMUB0CzQi24rz8: net/http: request canceled
fuse_debug: Op 0x00000003        connection.go:476] -> Error: "FileInode.Sync: SyncObject: Create: CreateObject: Put https://www.googleapis.com/upload/storage/v1/b/my-bucket/o?ifGenerationMatch=1486246191346000&ifMetagenerationMatch=1&projection=full&uploadType=resumable&upload_id=AEnB2Up0uaot6F-8GOdunDGnVn35Ek_zhFpSIkRtKv3fkx1eN1oHIX0o5g-vJGNDFAWGhqs07sEG5-F5rdH9h9H--RGmim2Vhj7w2G5RRlMUB0CzQi24rz8: net/http: request canceled"
fuse_debug: Op 0x00000004        connection.go:474] -> OK (inode 2)
fuse_debug: Op 0x00000004        connection.go:395] <- GetInodeAttributes (inode 1)
fuse: 2017/02/04 14:17:05.759336 *fuseops.FlushFileOp error: FileInode.Sync: SyncObject: Create: CreateObject: Put https://www.googleapis.com/upload/storage/v1/b/my-bucket/o?ifGenerationMatch=1486246191346000&ifMetagenerationMatch=1&projection=full&uploadType=resumable&upload_id=AEnB2Up0uaot6F-8GOdunDGnVn35Ek_zhFpSIkRtKv3fkx1eN1oHIX0o5g-vJGNDFAWGhqs07sEG5-F5rdH9h9H--RGmim2Vhj7w2G5RRlMUB0CzQi24rz8: net/http: request canceled
fuse_debug: Op 0x00000005        connection.go:395] <- ReleaseFileHandle
fuse_debug: Op 0x00000004        connection.go:474] -> OK ()
fuse_debug: Op 0x00000005        connection.go:474] -> OK ()
fuse_debug: Op 0x00000004        connection.go:395] <- GetInodeAttributes (inode 2)
fuse_debug: Op 0x00000004        connection.go:474] -> OK ()
fuse_debug: Op 0x00000005        connection.go:395] <- LookUpInode (parent 1, name "bigfile")
fuse_debug: Op 0x00000005        connection.go:474] -> OK (inode 2)
fuse_debug: Op 0x00000004        connection.go:395] <- LookUpInode (parent 1, name "bigfile")
fuse_debug: Op 0x00000004        connection.go:474] -> OK (inode 2)
fuse_debug: Op 0x00000005        connection.go:395] <- LookUpInode (parent 1, name "bigfile")
gcs: Req              0x8: <- StatObject("bigfile")
gcs: Req              0x8: -> StatObject("bigfile") (150.683627ms): OK
fuse_debug: Op 0x00000005        connection.go:474] -> OK (inode 3)
fuse_debug: Op 0x00000004        connection.go:395] <- LookUpInode (parent 1, name "bigfile")
fuse_debug: Op 0x00000004        connection.go:474] -> OK (inode 3)

jacobsa commented Feb 6, 2017

It can and does handle 1 GB file writes. How fast is your upstream connection? You should see a transfer of ~1 GB, then everything works again. As I said above, this appears to be a lock in the kernel held for the duration of the file flush, so there's not really anything I can do about it as far as I can tell.

chrishiestand commented:

The speed of my connection shouldn't be a significant factor as long as I'm on something better than dialup, right? In any case I've got about 15 Mbit/s up / 3 Mbit/s down here (I nearly cried when the Google Fiber expansion paused); even so, I'd imagine that shouldn't be the cause of this issue.

The copy seems to go quite well with these lines scrolling quickly:

...
fuse_debug: Op 0x00000007        connection.go:395] <- WriteFile (inode 7, handle 6, offset 1065353216, 1048576 bytes)
fuse_debug: Op 0x00000007        connection.go:474] -> OK ()
fuse_debug: Op 0x00000004        connection.go:395] <- WriteFile (inode 7, handle 6, offset 1066401792, 1048576 bytes)
fuse_debug: Op 0x00000004        connection.go:474] -> OK ()

And then it pauses after these:

fuse_debug: Op 0x00000006        connection.go:395] <- GetInodeAttributes (inode 7)
fuse_debug: Op 0x00000006        connection.go:474] -> OK ()
fuse_debug: Op 0x00000003        connection.go:395] <- SetInodeAttributes (inode 7, mode -rw-r--r--)
fuse_debug: Op 0x00000003        connection.go:474] -> OK ()
fuse_debug: Op 0x00000002        connection.go:395] <- SetInodeAttributes (inode 7, mode -rw-r--r--)
fuse_debug: Op 0x00000002        connection.go:474] -> OK ()
fuse_debug: Op 0x00000005        connection.go:395] <- unknown (inode 7, opcode 23)
fuse_debug: Op 0x00000005        connection.go:476] -> Error: "function not implemented"
fuse_debug: Op 0x00000007        connection.go:395] <- SetInodeAttributes (inode 7, mode -rw-r--r--)
fuse_debug: Op 0x00000007        connection.go:474] -> OK ()
fuse_debug: Op 0x00000004        connection.go:395] <- FlushFile (inode 7)
gcs: Req             0x19: <- CreateObject("bigfile2")
fuse_debug: Op 0x00000006        connection.go:395] <- LookUpInode (parent 1, name "bigfile2")
gcs: Req             0x1a: <- StatObject("bigfile2")
gcs: Req             0x1a: -> StatObject("bigfile2") (519.995545ms): OK
fuse_debug: Op 0x00000003        connection.go:395] <- StatFS
fuse_debug: Op 0x00000003        connection.go:474] -> OK ()

I don't understand why the operations change from writing the file to GetInodeAttributes() "stat'ing" the file. So would you say this is a kernel bug? A FUSE for macOS bug?

jacobsa commented Feb 6, 2017

All of the WriteFile operations are quickly writing to the local buffer. The problem is FlushFile, which starts uploading the object to GCS, and can't tell the kernel it has succeeded until the data is successfully uploaded. On your connection, that will take at least 1 GB / (15 Mbit/s) = 8 minutes 53 seconds.
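
Spelling the estimate out (a rough lower bound that ignores request and protocol overhead):

1 GB ≈ 8,000 Mbit
8,000 Mbit / 15 Mbit/s ≈ 533 seconds ≈ 8 minutes 53 seconds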

Can you help me understand from your point of view what you think is going wrong here? I'm a bit lost.

chrishiestand commented:

Ah, yes, that makes sense. I didn't look too closely and thought that the writes were being immediately flushed. I'm unfamiliar with the codebase and what's going on under the hood, so I haven't a clue what's going wrong; I was just trying to understand what you thought was going wrong :-).

So it looks like we are trying to upload the entire file in one request? How come we aren't flushing smaller chunks of the file at a time?

jacobsa commented Feb 6, 2017

One request or many doesn't really make a difference -- it's going to take ~9 minutes either way. Multiple requests actually make this slightly slower.

chrishiestand commented:

What I wrote wasn't very clear, sorry. The problem seems to be that something is triggering a 60-second timeout. What is it?

fuse_debug: Op 0x00000005        connection.go:395] <- interrupt (fuseid 0x00000004)
gcs: Req              0x9: -> CreateObject("bigfile") (59.999527448s): Put https://www.googleapis.com/upload/storage/v1/b/my-bucket/o?ifGenerationMatch=1486424608948000&ifMetagenerationMatch=1&projection=full&uploadType=resumable&upload_id=AEnB2UpXdbvECFEKfxB0qlWiHmmWQO3Xxws-8Fjzg-WHCIpdrBaikm_WbcCUyq5sooV2Nz15acZ6-6cM8K0hK2DNWw5Tz9Je3RU0B174VwUefJhQBe-LfPo: net/http: request canceled
fuse_debug: Op 0x00000004        connection.go:476] -> Error: "FileInode.Sync: SyncObject: Create: CreateObject: Put https://www.googleapis.com/upload/storage/v1/b/my-bucket/o?ifGenerationMatch=1486424608948000&ifMetagenerationMatch=1&projection=full&uploadType=resumable&upload_id=AEnB2UpXdbvECFEKfxB0qlWiHmmWQO3Xxws-8Fjzg-WHCIpdrBaikm_WbcCUyq5sooV2Nz15acZ6-6cM8K0hK2DNWw5Tz9Je3RU0B174VwUefJhQBe-LfPo: net/http: request canceled"
fuse_debug: Op 0x00000003        connection.go:474] -> OK (inode 2)

I mean, why aren't we flushing in smaller increments so that we don't hit the timeout?

As a counter-example, if I'm cp'ing a file over NFS to a busy file server, it might take 10 minutes to copy, but nothing errors as long as the writes keep slowly succeeding. I haven't tested, but I imagine other FUSE file systems (e.g. sshfs) behave similarly. Is this a FUSE limitation, or what is really to blame for this behavior? Do you even agree that this is broken behavior?

jacobsa commented Feb 7, 2017

I assume this is osxfuse's FUSE_DEFAULT_DAEMON_TIMEOUT, which is 60 seconds and is documented here, along with a user-configurable mount option. I was unaware that this was set so tightly; sigh.

The reason that the GCS object is only created when the file is flushed is that GCS objects can't be randomly written to, but files can. So gcsfuse may see you write 100 MB to an object, and then go back and change the first byte. If that were to happen, it would have to write an extra 100 MB just to overwrite the first byte.

In theory, if gcsfuse could tell that you had opened the file in an append-only mode, it could write to GCS incrementally as it received writes from the kernel. But there are other issues with this:

  • The application is allowed to go an arbitrarily long time between writes. Now you have to worry about GCS timeouts—presumably GCS won't let you keep an idle HTTP connection open for 30 minutes.
  • This still doesn't solve the underlying problem. What if you have a very slow connection where each incremental write still takes 60 seconds?

chrishiestand commented:

That's the timeout!

Isn't it possible to measure upload speed and calculate chunk sizes to optimize for bandwidth without hitting the timeout? There could be a CLI flag to disable chunking for those on a very fast network (e.g. GCE) or with small files, or conversely to enable chunking for large uploads/slow connections.

Since I won't be submitting a PR for this anytime soon I'll understand if this is a "won't add" feature. I guess the next best thing we can do is document the timeout problem for large files.

I can confirm that using the mount option -o daemon_timeout=9999999 mitigates that timeout problem. But now I'm hitting a timeout at 10 minutes, which is FUSE_MAX_DAEMON_TIMEOUT.
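
For reference, the full invocation would look something like this (just a sketch, using the bucket and mount point from earlier in the thread; osxfuse still caps the effective value at FUSE_MAX_DAEMON_TIMEOUT):

gcsfuse -o daemon_timeout=9999999 my-bucket ./my-bucket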

Thanks for all your time @jacobsa, I appreciate it.

jacobsa commented Feb 8, 2017

I don't want to get into the business of heuristics to guess how long things will take, since that is error-prone and makes the system's behavior harder to reason about.

In this case the software using the file system can implement that behavior on top, by regularly flushing the file after the desired amount of data is written. I understand that's not a great answer for command-line use though.
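
As a rough command-line illustration of that idea (a sketch only; the chunk size and paths are hypothetical, and it relies on the fact that each dd invocation closes the destination file, which triggers a flush in gcsfuse):

# copy bigfile into the bucket 100 MB at a time; closing the file after
# each chunk makes gcsfuse flush everything written so far
for i in $(seq 0 9); do
  dd if=bigfile of=my-bucket/bigfile bs=1m count=100 \
     skip=$((i * 100)) seek=$((i * 100)) conv=notrunc
done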

I'll call this issue out in the documentation; thanks for that idea.

jacobsa added commits that referenced this issue Feb 8, 2017
jacobsa closed this as completed Feb 8, 2017