Option to cache read files, but not whole archive #31

arthurmelton · 2025-01-14T23:34:46Z

Hello, I was wondering whether the program could cache only the files that have been read, or recently read. For example, say I have an archive with 5 files. I mount this archive and no files are cached immediately. The moment I read one of the files, that specific file is cached in /tmp. That way you get fast reads and sequential reads of a file that is in use, while files that are never used are ignored. An upgraded version of this would be the ability to keep a file cached for a specific amount of time after its last read, and then remove it from the cache once the time is up.
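A minimal Python sketch of the behavior requested above, to make the idea concrete. It is not fuse-archive code: `LazyFileCache`, `fake_extract`, and the manual clock are hypothetical names introduced only for illustration, and the `extract` callable stands in for whatever decompresses a single archive entry.

```python
import os
import tempfile
import time


class LazyFileCache:
    """Caches an entry on first read and drops it after a TTL (hypothetical helper)."""

    def __init__(self, extract, cache_dir, ttl_seconds, clock=time.monotonic):
        self._extract = extract      # callable: entry name -> bytes (stand-in for decompression)
        self._cache_dir = cache_dir  # e.g. a directory under /tmp
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}           # entry name -> (cache file path, time of last read)

    def read(self, name):
        self.evict_expired()
        entry = self._entries.get(name)
        if entry is None:
            # First read: decompress once and store the result on disk.
            fd, path = tempfile.mkstemp(dir=self._cache_dir)
            with os.fdopen(fd, "wb") as f:
                f.write(self._extract(name))
        else:
            path = entry[0]
        # Every read refreshes the "last read" timestamp.
        self._entries[name] = (path, self._clock())
        with open(path, "rb") as f:
            return f.read()

    def evict_expired(self):
        # Remove cache files that have not been read within the TTL.
        now = self._clock()
        for name, (path, last_read) in list(self._entries.items()):
            if now - last_read > self._ttl:
                os.remove(path)
                del self._entries[name]

    def cached_names(self):
        return sorted(self._entries)


# Usage with a fake extractor and a manual clock (both hypothetical stand-ins).
extract_calls = []


def fake_extract(name):
    extract_calls.append(name)
    return b"data for " + name.encode()


fake_now = [0.0]
cache_dir = tempfile.mkdtemp()
cache = LazyFileCache(fake_extract, cache_dir, ttl_seconds=60, clock=lambda: fake_now[0])
first = cache.read("a.txt")   # decompresses a.txt and caches it
second = cache.read("a.txt")  # served from the cache, no second extraction
fake_now[0] = 120.0           # two minutes later
cache.evict_expired()         # a.txt is past its TTL and is removed
```

Files that are never read never touch the cache directory, so the disk cost is bounded by what is actually in use within the TTL window.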

fdegros · 2025-01-15T00:59:44Z

This idea mostly makes sense for archive formats that are designed to be accessed in random order without having to decompress the whole archive (ZIP, 7Z without solid compression, RAR, ISO, and maybe uncompressed TAR).

However, for all the solidly compressed archive formats (7Z with solid compression, TAR.GZ, TAR.XZ, TAR.BZ2...), it makes more sense to cache everything, since the whole archive needs to be decompressed in order to get the file list when mounting it.
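The difference between the two families can be illustrated with Python's standard `zipfile` and `tarfile` modules. This is only a sketch of the general point, not fuse-archive's own code: `CountingBytesIO` is a hypothetical helper that counts how many bytes are read from the archive while listing its entries.

```python
import io
import os
import tarfile
import zipfile


class CountingBytesIO(io.BytesIO):
    """In-memory file that counts the bytes read from it (hypothetical helper)."""

    def __init__(self, data):
        super().__init__(data)
        self.bytes_read = 0

    def read(self, size=-1):
        data = super().read(size)
        self.bytes_read += len(data)
        return data


# Two incompressible 1 MiB members, so each archive stays close to 2 MiB.
members = {"a.bin": os.urandom(1 << 20), "b.bin": os.urandom(1 << 20)}

zip_buf = io.BytesIO()
with zipfile.ZipFile(zip_buf, "w", zipfile.ZIP_DEFLATED) as zf:
    for name, data in members.items():
        zf.writestr(name, data)

tgz_buf = io.BytesIO()
with tarfile.open(fileobj=tgz_buf, mode="w:gz") as tf:
    for name, data in members.items():
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tf.addfile(info, io.BytesIO(data))

# ZIP: the entry list comes from the central directory at the end of the file.
zip_src = CountingBytesIO(zip_buf.getvalue())
with zipfile.ZipFile(zip_src) as zf:
    zip_names = zf.namelist()

# TAR.GZ: headers are interleaved with data inside one compressed stream,
# so listing the entries means decompressing through the stream.
tgz_src = CountingBytesIO(tgz_buf.getvalue())
with tarfile.open(fileobj=tgz_src, mode="r:gz") as tf:
    tgz_names = tf.getnames()

print(f"ZIP:    read {zip_src.bytes_read} of {len(zip_buf.getvalue())} bytes to list {zip_names}")
print(f"TAR.GZ: read {tgz_src.bytes_read} of {len(tgz_buf.getvalue())} bytes to list {tgz_names}")
```

Listing the ZIP touches only a small tail of the file, while listing the TAR.GZ reads nearly all of it, which is why caching everything at mount time costs little extra for the second family.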

Do you need to deal with a specific archive format?

arthurmelton · 2025-01-16T14:44:56Z

Specifically, the archives I work with are the kind you mentioned first, where files can be extracted individually. I could look into making a PR, but I would not be able to start on it for another week.

fdegros · 2025-01-16T23:34:44Z

If you deal with ZIP archives, I suggest you try the mount-zip program. It is quite similar to fuse-archive, but it also features a lazy caching mechanism, which might be what you want.

If you deal with non-solidly compressed archives other than ZIP, I suggest you try fuse-archive with the -o nocache option. Mounting such archives will be instantaneous, but you will pay the decompression price every time you read an individual file from the mounted archive.

arthurmelton · 2025-01-17T02:05:05Z

More specifically, I am working with zip, rar, and 7z, so sadly I am not able to use mount-zip uniformly, as it only supports a third of what I need. I don't expect the simple caching I suggested to be great, but I would expect it to be faster than -o nocache. To state the actual problem and avoid the XY problem: my archives are too big for me to store a fully uncompressed copy, but I would like faster reads than nocache provides. Fortunately, individual files can be decompressed on their own, and I do a lot of repeated reads on those files, hence the wish to cache a full file after it is first read.

fdegros · 2025-01-17T03:55:00Z

Ok. You described some very specific constraints for which an incremental or "lazy" caching mechanism makes sense.

It is certainly possible to implement, but I can't promise that it will be a high-priority item for me.

fdegros assigned arthurmelton Jan 15, 2025

fdegros added the "question" label (Further information is requested) Jan 15, 2025

fdegros removed the "question" label (Further information is requested) Jan 16, 2025
