Add new memory management options for FUSE driver #15

Closed
mhx opened this issue Dec 2, 2020 · 5 comments

@mhx
Owner

mhx commented Dec 2, 2020

Granted you might be right. Next time that I try DwarFS, I'll issue a sysctl -q vm.drop_caches=3 which if I'm not mistaken should drop the kernel file-system caches.


(In what follows I refer to the dwarfs image as just image, and to the uncompressed files exposed through the mount point as files.)

However, on the same topic, wouldn't it be useful to have the following complementary options:

  • whether to let the kernel cache the files (not the image), like all normal file-systems do; (I think this is the default;)
  • whether dwarfs daemon accesses the image without using the kernel cache (either via O_DIRECT or by using madvise with MADV_DONTNEED in case of mmap access after a block was used);
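To make the second option more concrete, here is a minimal sketch of the mmap-plus-madvise variant, written in Python rather than in the C++ the driver itself uses. The function name `read_block_then_release` is hypothetical and is not part of dwarfs. Note that `MADV_DONTNEED` is only a hint on the process's mapping; whether the kernel actually evicts the underlying page-cache pages is an assumption that depends on the kernel and the file system.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE


def read_block_then_release(path, offset, length):
    """Read `length` bytes at `offset` through a read-only mapping, then
    tell the kernel the touched pages are no longer needed.

    Hypothetical helper for illustration; not the dwarfs implementation.
    """
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        data = bytes(m[offset:offset + length])
        # madvise requires a page-aligned start, so round the offset down.
        start = (offset // PAGE) * PAGE
        span = min(offset + length, len(m)) - start
        # mmap.madvise needs Python 3.8+; MADV_DONTNEED is platform-dependent.
        if span > 0 and hasattr(m, "madvise") and hasattr(mmap, "MADV_DONTNEED"):
            m.madvise(mmap.MADV_DONTNEED, start, span)
        return data
```

A caller would use this once per block, for example `read_block_then_release("image.dwarfs", 0, 65536)`, where the file name and block size are placeholders.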

At the moment I think that both the files and the image are eventually cached by the kernel, thus increasing the memory pressure of the system.

However by using the two proposed options, one could fine tune the CPU / memory usage to fit one's particular use-case:

  • disable the kernel cache for the files, but enable the kernel cache for the image: one trades some CPU for some memory savings; (useful for example when the application reading the files already has its own caches;)
  • (my proposed default) enable the kernel cache for files, but disable the kernel cache for the image, one saves on memory for the image but trades some CPU (less than in the previous case); (I think this would be the closest thing to how a normal file-system works, only actual files are cached, but not the block device data;)
  • disable the kernel cache for both files and image, one would heavily trade CPU for minimal memory usage; (this would be useful for example when one needs only a single pass of the stored files;)
  • (the current default?) enable the kernel cache for both files and image, one would trade memory for minimal CPU usage;

Originally posted by @cipriancraciun in #9 (comment)

@mhx mhx self-assigned this Dec 2, 2020
@mhx mhx added the enhancement New feature or request label Dec 2, 2020
@mhx
Owner Author

mhx commented Dec 2, 2020

@cipriancraciun feel free to check out https://github.com/mhx/dwarfs/tree/next-release which implements both features (and some other stuff to go into the next release). New options are documented here.

@cipriancraciun

@mhx I'll try to experiment with that branch at the end of this week.

However, until then I've looked at the linked documentation, and I would suggest changing the names a little to better reflect what is happening behind the scenes:

  • -o no_image_madvise: as I understand it, you've taken my proposed default of issuing a madvise with MADV_DONTNEED by default; so I would suggest renaming this one to -o image_cache to better reflect that with this option the image would also be cached; furthermore madvise is an implementation detail, and perhaps at some point you'll switch to a different implementation (like O_DIRECT), at which point the flag name would be inaccurate;
  • -o direct_io is misleading because one could mistakenly understand that the image is read with O_DIRECT; perhaps rename it to -o no_fuse_cache that reflects exactly what happens; (-o no_kernel_cache would be an alternative but it's still unclear which files are cached by the kernel;)

In fact I would also suggest adding the complement of each, i.e. -o image_cache vs -o no_image_cache and -o fuse_cache vs -o no_fuse_cache. This way one can always pass the explicit flag to make it clear what is used (perhaps for documentation purposes in a script), which would also allow you to change the defaults without affecting advanced users.

@mhx
Owner Author

mhx commented Dec 3, 2020

Thanks for the suggestions, they're indeed much better. I wasn't really happy with my choices either. The direct_io just happened to be what FUSE itself called this option in the past. And the other one is just bad.

I'm going to almost go with your suggestions, but I think that -o (no_)cache_image and -o (no_)cache_files are probably even more intuitive.
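As a rough illustration of how such paired flags could behave, here is a small Python sketch. Everything in it is an assumption for illustration: the helper names `resolve_cache_options` and `mount_args` are hypothetical, the defaults simply mirror the default proposed earlier in this thread (cache files, do not cache the image), and the "last option wins" rule is one possible design rather than what the driver necessarily does.

```python
# Assumed defaults, taken from the proposal in this thread; not confirmed
# against the actual driver.
DEFAULTS = {"cache_image": False, "cache_files": True}


def resolve_cache_options(opts, defaults=DEFAULTS):
    """Return the effective settings; later options override earlier ones."""
    settings = dict(defaults)
    for opt in opts:
        negated = opt.startswith("no_")
        name = opt[3:] if negated else opt
        if name in settings:  # ignore options unrelated to caching
            settings[name] = not negated
    return settings


def mount_args(image, mountpoint, settings):
    """Build an argv list that spells out both settings explicitly."""
    args = ["dwarfs", image, mountpoint]
    for name in ("cache_image", "cache_files"):
        args += ["-o", name if settings[name] else "no_" + name]
    return args
```

Spelling out both flags in a script, as `mount_args` does, keeps the behaviour stable even if the defaults change in a later release.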

@mhx
Owner Author

mhx commented Dec 3, 2020

Naming updated in 0ca5a29.

@cipriancraciun

> I'm going to almost go with your suggestions, but I think that -o (no_)cache_image and -o (no_)cache_files are probably even more intuitive.

Yup, makes more sense than my initial proposal.
