
[WIP] Implement VirtioFS device backend #1351

Closed
wants to merge 1 commit

Conversation

@zmlcc zmlcc commented Oct 24, 2019

Reason for This PR

Use virtio-fs as backend device to implement Host Filesystem Sharing #1180

Description of Changes

  • implement virtio-fs device in devices and vmm
  • add /vtfs request in api_server
  • add fuse_gen package to support fuse protocol
  • add low level filesystem functions in sys_util

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license.

PR Checklist

  • All commits in this PR are signed (git commit -s).
  • Either this PR is linked to an issue, or, the reason for this PR is
    clearly provided.
  • The description of changes is clear and encompassing.
  • Either no docs need to be updated as part of this PR, or, the required
    doc changes are included in this PR. Docs in scope are all *.md files
    located either in the repository root, or in the docs/ directory.
  • Either no code has been touched, or, code-level documentation for touched
    code is included in this PR.
  • Either no API changes are included in this PR, or, the API changes are
    reflected in firecracker/swagger.yaml.
  • Either the changes in this PR have no user impact, or, the changes in
    this PR have user impact and have been added to the CHANGELOG.md file.

Status Description

This work is still in progress. I would like to give the community a technical preview and open a discussion. Is the virtio-fs backend a viable and safe way to implement host filesystem sharing? Is work in this direction worth continuing?

Up to now, the basic framework has been established and some of the FUSE features have been implemented. With kernel v5.4-rc3, it has passed the mkdir tests of pjdfstest. The next steps may include:

  • Implement all the FUSE features
  • Pass all tests of pjdfstest
  • Pass Blogbench tests
  • Add metrics
  • Implement rate limiting
  • Add unit tests and CI tests
  • Add documentation
  • Address security issues

Comment on lines +99 to +116
match self.in_header.opcode {
    fuse_opcode_FUSE_INIT => fs.do_init(self),
    fuse_opcode_FUSE_GETATTR => fs.do_getattr(self),
    fuse_opcode_FUSE_LOOKUP => fs.do_lookup(self),
    fuse_opcode_FUSE_OPENDIR => fs.do_opendir(self),
    fuse_opcode_FUSE_READDIR => fs.do_readdir(self),
    fuse_opcode_FUSE_ACCESS => fs.do_access(self),
    fuse_opcode_FUSE_FORGET => fs.do_forget(self),
    fuse_opcode_FUSE_RELEASEDIR => fs.do_releasedir(self),
    fuse_opcode_FUSE_STATFS => fs.do_statfs(self),
    fuse_opcode_FUSE_MKNOD => fs.do_mknod(self),
    fuse_opcode_FUSE_MKDIR => fs.do_mkdir(self),
    fuse_opcode_FUSE_RMDIR => fs.do_rmdir(self),
    fuse_opcode_FUSE_SETATTR => fs.do_setattr(self),
    fuse_opcode_FUSE_UNLINK => fs.do_unlink(self),
    fuse_opcode_FUSE_SYMLINK => fs.do_symlink(self),
    fuse_opcode_FUSE_READLINK => fs.do_readlink(self),
    _ => Err(ExecuteError::InvalidMethod),
}
Contributor
Considering the number of functions, you should consider adding a HashMap between fuse_opcode_* values and Fn<&mut FuseBackend> handlers. I would also consider moving execute(op) inside FuseBackend, as some backends may not support all of the operations.
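The table-driven dispatch the reviewer suggests could look roughly like the following sketch. All names here (Handler, FuseBackend, do_init, the opcode constants) are illustrative stand-ins for the PR's actual types, not its real API; the opcode values are the ones from the kernel's FUSE protocol header.

```rust
use std::collections::HashMap;

// Illustrative opcode constants (values per <linux/fuse.h>).
const FUSE_GETATTR: u32 = 3;
const FUSE_INIT: u32 = 26;

#[derive(Debug, PartialEq)]
enum ExecuteError {
    InvalidMethod,
}

// A toy backend; the real one would hold filesystem state.
struct FuseBackend {
    init_called: bool,
}

// Each opcode maps to a plain function pointer over the backend.
type Handler = fn(&mut FuseBackend) -> Result<(), ExecuteError>;

fn do_init(fs: &mut FuseBackend) -> Result<(), ExecuteError> {
    fs.init_called = true;
    Ok(())
}

fn do_getattr(_fs: &mut FuseBackend) -> Result<(), ExecuteError> {
    Ok(())
}

// Each backend registers only the operations it supports, which addresses
// the reviewer's point about backends with partial operation coverage.
fn dispatch_table() -> HashMap<u32, Handler> {
    let mut table: HashMap<u32, Handler> = HashMap::new();
    table.insert(FUSE_INIT, do_init);
    table.insert(FUSE_GETATTR, do_getattr);
    table
}

fn execute(
    table: &HashMap<u32, Handler>,
    opcode: u32,
    fs: &mut FuseBackend,
) -> Result<(), ExecuteError> {
    // Unregistered opcodes fall through to InvalidMethod, replacing the
    // catch-all `_` arm of the original match.
    match table.get(&opcode) {
        Some(handler) => handler(fs),
        None => Err(ExecuteError::InvalidMethod),
    }
}

fn main() {
    let table = dispatch_table();
    let mut fs = FuseBackend { init_called: false };
    assert!(execute(&table, FUSE_INIT, &mut fs).is_ok());
    assert!(fs.init_called);
    assert_eq!(
        execute(&table, 9999, &mut fs),
        Err(ExecuteError::InvalidMethod)
    );
    println!("ok");
}
```

Compared with a long match, the map makes "supported operations" data rather than code, at the cost of one hash lookup per request.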

@paulbaumgart

paulbaumgart commented Nov 12, 2019

I'm very excited to see the progress on this, because this is the major blocker for adopting Firecracker at RunKit.

In the process of trying to get it to run, I noticed that #1319 was merged around the time this PR was submitted, and as of that merge, this PR no longer applies cleanly to master.

Any chance you have time to rebase this branch and make it compatible with the new API server code?

@raduweiss
Contributor

To put things in context: this is the kind of thing we need to really reflect on before merging, since it's both a large piece of functionality and a large new attack surface. We haven't gotten around to doing this yet.

@zmlcc
Author

zmlcc commented Nov 18, 2019

I'm very excited to see the progress on this, because this is the major blocker for adopting Firecracker at RunKit.

In the process of trying to get it to run, I noticed that #1319 was merged around the time this PR was submitted, and as of that merge, this PR no longer applies cleanly to master.

Any chance you have time to rebase this branch and make it compatible with the new API server code? If I have time this week, I'll take a stab at it, but I suspect you'll be much faster at it. 😁

I'm working hard to implement all the necessary FUSE functions now. After that, I will rebase the code and update this PR. It won't take long.

@zmlcc
Author

zmlcc commented Nov 18, 2019

To put things in context: this is the kind of thing we need to really reflect on before merging, since it's both a large piece of functionality and a large new attack surface. We haven't gotten around to doing this yet.

I cannot agree more. FUSE requires all filesystem-related syscalls, and it could be manipulated in very dangerous ways to attack the host.

Besides, I am thinking about a safer and more controllable way to implement Host Filesystem Sharing: what about Host --> FUSE --> VSOCK (maybe) --> GUEST?
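The appeal of that pipeline is that the in-between component never interprets FUSE traffic, it only moves bytes between the two endpoints. A rough sketch of that "dumb relay" idea follows; std has no vsock type, so in-memory Read/Write stand-ins are used here, and the shape of the real transport is an assumption.

```rust
use std::io::{copy, Cursor, Read, Result, Write};

// A pass-through relay: shuttle raw FUSE bytes from one endpoint to the
// other without parsing them. Keeping all FUSE parsing out of this hop is
// what would keep its attack surface small, which is the motivation for
// the Host --> FUSE --> VSOCK --> GUEST layout floated above.
fn relay<R: Read, W: Write>(from: &mut R, to: &mut W) -> Result<u64> {
    // std::io::copy loops read/write until EOF and returns the byte count.
    copy(from, to)
}

fn main() {
    // Stand-ins for the FUSE endpoint and the vsock-like transport.
    let mut fuse_side = Cursor::new(b"fuse-bytes".to_vec());
    let mut vsock_side: Vec<u8> = Vec::new();

    let n = relay(&mut fuse_side, &mut vsock_side).unwrap();
    assert_eq!(n, 10);
    assert_eq!(vsock_side, b"fuse-bytes");
    println!("relayed {} bytes", n);
}
```

A real implementation would of course need framing, flow control, and error handling in both directions; the sketch only shows why the relay itself can stay tiny.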

@paulbaumgart

paulbaumgart commented Nov 20, 2019

Thinking a little more about this, I wonder if read-only access to the host filesystem would be a good (initial?) compromise in terms of functionality and security. It would satisfy our use case at RunKit, and, from my reading of kata-containers/runtime#1071, it seems like it'd support a number of other use cases as well.

@dianpopa dianpopa added the Status: Awaiting review Indicates that a pull request is ready to be reviewed label Dec 2, 2019
@kainosnoema

kainosnoema commented Apr 3, 2020

Thinking a little more about this, I wonder if read-only access to the host filesystem would be a good (initial?) compromise in terms of functionality and security.

One big drawback of Firecracker today is I/O performance, even write I/O. virtio-fs has the potential to dramatically improve that, which would make a big difference for some use cases. Hopefully this can be implemented in a secure way?

@KawaiDesu

Hi! How is the progress? This feature is very desirable. Right now it blocks me from fully using firecracker :)

@raduweiss
Contributor

Hi @zmlcc , @paulbaumgart , @kainosnoema @KawaiDesu . I embarrassingly failed to follow up here all this time. Apologies for that.

We did look at virtio-fs again over the past months, and our findings were these:

  • We still think the attack surface implications are large. Even in read-only mode, it's still a lot of code and a lot of new bits the guest can throw at the Firecracker emulation layer.
  • For most use cases we've seen so far, there's been a not-too-bad workaround via other channels (e.g., for configuration, using MMDS can be a workable alternative to a read-only file system).
  • Exposing a file system will likely make (incremental) snapshot restore more complex, including in use cases where the same snapshot is resumed in multiple microVMs. If we build live migration, the complexity is even greater.

Mapping these findings onto Firecracker's core mission and tenets led us to not prioritize this feature this year, and frankly I don't think the above will change enough for us to build it outright in Firecracker proper. There are two other future-looking paths:

  1. rust-vmm, which is a place to build virtualization building blocks in Rust. These components can then be used in Rust-based VMMs. The rust-vmm community is also building towards a way to create full-blown VMMs out of those crates, which would be a super-nice way to have a VMM with just what you need and no more, picking from a variety of capabilities. Also, since Firecracker uses (or plans to use) all the crates it can from rust-vmm, including all the virtio machinery, this will make any potential future addition of virtio-fs to Firecracker much easier.

  2. As more users find use cases for Firecracker, we may reach a point where it makes sense to support more things. To avoid trading off the simplifying assumptions of our core use case, we'll need something like a way to build different Firecracker "specialized variants", or alternatively a plugin system. This is something worth discussing for 2021. Again, this is not a guarantee, but virtio-fs could fit in such a scheme, especially if it's a rust-vmm crate we can easily import.

Both of these are long-term solutions; we don't think there's a 2020 solution for virtio-fs.


That being said ...

@zmlcc I'm curious about the Host --> FUSE --> VSOCK (maybe) --> GUEST proposal. Does this avoid the need for a new device?

@kainosnoema , you're right, I/O performance is currently a gap in Firecracker. We've done some prototyping with io_uring [1] and we think that's a very good next step to take on the I/O front: it will greatly improve the efficiency of how we use our single emulation thread. We'll probably add that to our roadmap soon, along with the rest of the findings from our I/O experiments.

[1] (chart of io_uring prototype results; image not reproduced here)

@raduweiss
Contributor

Closing this for now with the above reasoning. Please re-open to continue the conversation here! 😄

@twitchyliquid64

Given the user demand for this feature, was the primary reason for non-acceptance mostly security and complexity?

It's been a few years, and now there's a reasonably stable, proven virtiofsd that does all the scary bits (parsing FUSE requests and actually servicing the I/O requests). If Firecracker were to delegate all of this to a separate daemon, the implementation and attack surface within Firecracker itself would be extremely small. Alternatively, virtiofsd is also a Rust crate, so you could bring this into Firecracker.

virtio-fs is just a series of virtqueues that shuttle FUSE headers and payloads back and forth, so all the device driver would need to do is set up those queues and copy their contents down a socket to the daemon.
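The framing that makes this pass-through shape possible is that every FUSE request begins with a fuse_in_header whose len field covers the header plus payload, so a forwarding device can size each message without understanding its opcode. A minimal sketch, with the struct layout following the kernel's <linux/fuse.h> (the surrounding code is illustrative, not Firecracker's):

```rust
// fuse_in_header, as defined in <linux/fuse.h>: 40 bytes, little-endian
// on the wire on common platforms.
#[repr(C)]
#[allow(dead_code)]
struct FuseInHeader {
    len: u32,     // total message length, header included
    opcode: u32,  // which FUSE operation this is
    unique: u64,  // request id, echoed back in the reply header
    nodeid: u64,
    uid: u32,
    gid: u32,
    pid: u32,
    padding: u32,
}

// Given the raw bytes of an incoming request, return how many bytes make
// up the whole message, so a relay can forward exactly that much to the
// daemon without parsing the opcode at all.
fn message_len(bytes: &[u8]) -> Option<usize> {
    if bytes.len() < std::mem::size_of::<FuseInHeader>() {
        return None; // not even a full header yet
    }
    // `len` is the first little-endian u32 of the header.
    Some(u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]) as usize)
}

fn main() {
    // A message with a 40-byte header plus an 8-byte payload: len = 48.
    let mut msg = vec![0u8; 48];
    msg[0..4].copy_from_slice(&48u32.to_le_bytes());
    assert_eq!(message_len(&msg), Some(48));

    // Fewer bytes than a header is not yet a message.
    assert_eq!(message_len(&msg[..16]), None);
    println!("ok");
}
```

This is why the in-VMM portion of a delegating design can stay so small: it only needs queue setup plus this length-prefixed copy loop.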

@raduweiss thoughts on this architecture? Would this be palatable as a PR?

8 participants