Option to cache read files, but not whole archive #31

Open
arthurmelton opened this issue Jan 14, 2025 · 5 comments
@arthurmelton

Hello, I was wondering whether the program could cache only the files that have been read, or recently read. For example, say I have an archive of a folder with 5 files. I mount this archive, and no files are cached immediately. The moment I read one of the files, that specific file is cached in /tmp. That way, repeated and sequential reads of a file are fast, while files that are never used are ignored. An upgraded version of this would keep each file cached for a specific amount of time after its last read, and then remove it from the cache once that time is up.
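The behaviour requested above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how fuse-archive works: the `LazyFileCache` class and its method names are hypothetical, it handles ZIP only (via the standard `zipfile` module), and expiry is triggered by an explicit `evict_expired` call rather than a background timer.

```python
import os
import tempfile
import time
import zipfile


class LazyFileCache:
    """Hypothetical sketch: cache an archive member on first read only."""

    def __init__(self, archive_path, cache_dir, ttl_seconds=None):
        self.archive = zipfile.ZipFile(archive_path)
        self.cache_dir = cache_dir
        self.ttl_seconds = ttl_seconds  # None means "never expire"
        self.entries = {}  # member name -> (cached path, time of last read)

    def read(self, name, now=None):
        now = time.monotonic() if now is None else now
        entry = self.entries.get(name)
        if entry is None:
            # First read: decompress only this member into the cache dir.
            path = self.archive.extract(name, self.cache_dir)
        else:
            # Later reads are served from the already extracted copy.
            path = entry[0]
        self.entries[name] = (path, now)
        with open(path, "rb") as f:
            return f.read()

    def evict_expired(self, now=None):
        """Delete cached files not read within the last ttl_seconds."""
        if self.ttl_seconds is None:
            return []
        now = time.monotonic() if now is None else now
        expired = [name for name, (_, last) in self.entries.items()
                   if now - last > self.ttl_seconds]
        for name in expired:
            path, _ = self.entries.pop(name)
            os.remove(path)
        return expired
```

Nothing is extracted when the cache object is created; a member only lands in the cache directory the first time `read` is called for it, and each later read resets its expiry clock.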

@fdegros
Collaborator

fdegros commented Jan 15, 2025

This idea mostly makes sense for archive formats that are designed to be accessed in random order without having to decompress the whole archive (ZIP, 7Z without solid compression, RAR, ISO, and maybe uncompressed TAR).

However, for all the solidly compressed archive formats (7Z with solid compression, TAR.GZ, TAR.XZ, TAR.BZ2...), it makes more sense to cache everything, since the whole archive needs to be decompressed in order to get the file list when mounting it.

Do you need to deal with a specific archive format?
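The difference between the two families of formats can be illustrated with Python's standard `zipfile` and `tarfile` modules. This is a minimal sketch, not fuse-archive code: the helper names `read_from_zip` and `read_from_tar_gz` are hypothetical, and 7Z, RAR and ISO are left out because the standard library does not read them.

```python
import io
import tarfile
import zipfile


def read_from_zip(data, name):
    """Read one member of a ZIP archive held in memory.

    ZIP stores a central directory and compresses each member separately,
    so the requested member can be located and decompressed on its own.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return zf.read(name)


def read_from_tar_gz(data, name):
    """Read one member of a .tar.gz archive held in memory.

    The whole tarball sits inside a single gzip stream, so every member
    stored before the target has to be decompressed and skipped first.
    Returns the content and the number of members scanned to reach it.
    """
    scanned = 0
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as tf:
        for member in tf:
            scanned += 1
            if member.name == name:
                return tf.extractfile(member).read(), scanned
    raise KeyError(name)
```

For the last member of a `.tar.gz`, the scan count equals the number of members in the archive, which is why caching everything at mount time is the natural choice for such formats.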

fdegros added the question label ("Further information is requested") on Jan 15, 2025
@arthurmelton
Author

Specifically, the archives I work with are the kind you mentioned first: archives from which files can be extracted individually. I could look into making a PR, but I could only start on it in a week.

@fdegros
Collaborator

fdegros commented Jan 16, 2025

If you deal with ZIP archives, I suggest you try the mount-zip program. It is quite similar to fuse-archive, but it also features a lazy caching mechanism, which might be what you want.

If you deal with non-solidly compressed archives other than ZIP, I suggest you try fuse-archive with the -o nocache option. Mounting such archives will be instantaneous, but you will pay the decompression cost when reading the individual files from the mounted archive.

fdegros removed the question label ("Further information is requested") on Jan 16, 2025
@arthurmelton
Author

More specifically, I am working with ZIP, RAR, and 7Z, so sadly I cannot use mount-zip uniformly, as it supports only a third of the formats I need. I don't expect the simple caching I suggested to be great, but I would expect it to be faster than -o nocache. To state the actual problem and avoid the XY problem: my archives are too big for me to store a fully uncompressed copy, but I would like faster reads than nocache gives. Fortunately, individual files can be decompressed on their own, and I do a lot of redundant reads on those files, hence the wish to cache the full file after it is read.
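The constraint described above (archives too large to cache whole, with many repeated reads of a few files) points towards a cache with a size budget. A minimal sketch, assuming least-recently-used eviction is acceptable: the `BoundedFileCache` class and its `load` callback are hypothetical and not part of fuse-archive, and file contents are kept in memory here for brevity rather than in /tmp.

```python
from collections import OrderedDict


class BoundedFileCache:
    """Hypothetical sketch: whole-file cache with a byte budget (LRU)."""

    def __init__(self, max_bytes, load):
        self.max_bytes = max_bytes
        self.load = load  # callable: name -> bytes (decompress one member)
        self.entries = OrderedDict()  # name -> bytes, oldest first
        self.used = 0  # total bytes currently cached

    def read(self, name):
        if name in self.entries:
            # Cache hit: mark as most recently used, skip decompression.
            self.entries.move_to_end(name)
            return self.entries[name]
        data = self.load(name)
        if len(data) <= self.max_bytes:
            # Evict least recently used files until the new one fits.
            while self.used + len(data) > self.max_bytes:
                _, old = self.entries.popitem(last=False)
                self.used -= len(old)
            self.entries[name] = data
            self.used += len(data)
        # Files larger than the whole budget are returned uncached.
        return data
```

The `load` callback is where the per-member decompression would happen, so a repeated read of a cached file costs no decompression at all, while total cache usage never exceeds `max_bytes`.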

@fdegros
Collaborator

fdegros commented Jan 17, 2025

Ok. You described some very specific constraints for which an incremental or "lazy" caching mechanism makes sense.

It is certainly possible to implement it. But I can't promise that it will be a high-priority item for me.


2 participants