
A better cache clean-up and expiration policy #4980

Open
terrajobst opened this issue Apr 5, 2017 · 42 comments
Labels
Area:HttpCaching http caching by all tools Category:Quality Week Issues that should be considered for quality week help wanted Considered good issues for community contributions. Priority:2 Issues for the current backlog. Type:Feature

Comments

@terrajobst

terrajobst commented Apr 5, 2017

edit by @zivkan: a first step has been implemented, allowing package "last used in restore" date to be updated: NuGet/NuGet.Client#4222

edit by NuGet Client Triage:

  • Restore driven - A minimally impactful approach might be to run a "gc" periodically, say once a week or once a month.
  • Additional command - Community projects like NuGet Cleaner can be combined with updatePackageLastAccessTime
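For illustration, the updatePackageLastAccessTime setting mentioned above could be enabled in a repo-local nuget.config like this (a sketch; the key name is taken from the triage note, but exact placement and availability depend on the NuGet version):

```shell
# Sketch: enable the restore-time "last used" stamp via a repo-local
# nuget.config (key name from the triage note above; hypothetical layout).
dir=$(mktemp -d) && cd "$dir"
cat > nuget.config <<'EOF'
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <config>
    <add key="updatePackageLastAccessTime" value="true" />
  </config>
</configuration>
EOF
grep -c updatePackageLastAccessTime nuget.config
```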

original post below:


The user-local NuGet cache has a tendency to grow without bound, which results in funny tweets like this from @davidfowl:

David Fowler@davidfowl 9m
How big is your nuget cache? Mine is 14 gigs #nuget

Can we track when an item was last hit by package restore and apply some expiration policy? I don't have good data, but I bet a good chunk of the bloat isn't individual packages with a single version but a small number of packages with many versions. If we were to just delete older versions that weren't needed for restore for a while, we'd probably get to something saner.

Thoughts?

@emgarten
Member

emgarten commented Apr 5, 2017

Related: #2308

@terrajobst
Author

terrajobst commented Apr 5, 2017

Yeah, I'm not a fan of an explicit clean as the sole mechanism. We have other discussions on how to hide package restore even more by making dotnet build behave like VS, i.e. running restore if necessary.

I think having a nuget gc command as @davidfowl suggested would be OK, but I really really want nuget restore to apply some cache expiration policy automatically. It doesn't have to be perfect, but something that doesn't fill up my drive indefinitely unless I invoke some special command.

@emgarten emgarten added this to the Backlog milestone Oct 17, 2017
@emgarten emgarten added Priority:3 Issues under consideration. With enough upvotes, will be reconsidered to be added to the backlog. Area:HttpCaching http caching by all tools labels Oct 17, 2017
@anangaur
Member

@terrajobst You still would need some process to clean the cache up? I think an explicit command to purge is the best way. Some of the requirements:

  • Should be able to clean unused packages older than a date (yyyymmdd) or a time span (1m, 3m, 6m, etc.), with a default (maybe 3m?)
  • Should be able to clean up everything except packages used in a project/solution/repo/dir
  • [P2] Should be able to clean up packages from a source.
  • [P2] Should be able to clean up all unsigned packages. /cc: @rido-min

Today we do not keep any metadata around source, signature, or lastUsed information. Maybe we should start persisting this info.

@anangaur anangaur self-assigned this Apr 30, 2018
@terrajobst
Author

The way I'd design the cache is like this:

  1. Every time package restore resolves to a cache location, bump one of the dates (e.g. touch the .nuspec file)
  2. Every N days (default: 7), delete folders from the package cache that weren't used during the last M months (default: 2 months)

Not sure (1) is viable for perf reasons. If it's not, only bump the dates every so often.

The meta point is that a cache without expiration control isn't a cache -- it's a leak.
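Steps (1) and (2) above could be sketched roughly like this in shell (paths, the marker mechanism, and the thresholds are assumptions for illustration, not NuGet's actual implementation):

```shell
# Sketch of the proposed policy: (1) restore bumps a timestamp when it
# resolves a cached package; (2) a periodic sweep lists version folders
# untouched for longer than the cutoff. Thresholds are placeholders.
CACHE="${NUGET_PACKAGES:-$HOME/.nuget/packages}"
MAX_AGE_DAYS=60   # the "M months" from step (2), approximated in days

# Step (1): called whenever restore resolves a cached package version.
bump() {
  touch "$1"   # real NuGet would touch e.g. the package's .nuspec
}

# Step (2): run every ~7 days; prints stale <id>/<version> folders.
sweep() {
  find "$CACHE" -mindepth 2 -maxdepth 2 -type d -mtime +"$MAX_AGE_DAYS" -print
  # add: -exec rm -rf {} +  to actually delete
}
```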

@anangaur
Member

Perf is one consideration. Restore could just touch the file (like the Linux touch command), which should be more performant (or so I believe :)). But I am still not sure which command/process does the clean-up, i.e. the delete of expired or old packages from the cache. Again, restore cannot be the one to do this (performance criterion).

@terrajobst
Author

terrajobst commented Jun 13, 2018

It has to be automatic which means it has to be part of restore.

@anangaur
Member

Ideally yes but without impacting restore perf :)

@terrajobst
Author

Agreed, but it seems there must be a compromise between spamming my drive and impacting my package restore times.

Maybe we can address 90% of this if .NET Core wouldn’t be modeled as a large number of packages though.

@anangaur
Member

Maybe we can address 90% of this if .NET Core wouldn’t be modeled as a large number of packages though.

@terrajobst - what's the current thinking here? Is there anything beyond early discussion in this direction? All ears.

@terrajobst
Author

It’s all early discussions for .NET Core 3.0

@dotMorten

dotMorten commented Oct 24, 2018

FWIW I put this little console app in a scheduled task. It deletes any package not accessed in the past 30 days:

using System;
using System.IO;
using System.Linq;

namespace NugetCacheCleaner
{
    class Program
    {
        static void Main(string[] args)
        {
            int minDays = 30;
            // Note: a failed TryParse would zero out minDays, so reset the default.
            if (args.Length > 0 && !int.TryParse(args[0], out minDays))
                minDays = 30;

            // Respect NUGET_PACKAGES if set; otherwise use ~/.nuget/packages (cross-platform).
            string nugetCache = Environment.GetEnvironmentVariable("NUGET_PACKAGES")
                ?? Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.UserProfile), ".nuget", "packages");

            var info = new DirectoryInfo(nugetCache);
            long totalDeleted = 0;
            foreach (var folder in info.GetDirectories())
            {
                foreach (var versionFolder in folder.GetDirectories())
                {
                    var files = versionFolder.GetFiles("*", SearchOption.AllDirectories);
                    long size = files.Sum(f => f.Length);
                    // Guard against empty version folders, which would otherwise throw.
                    DateTime lastAccessTime = files.Length > 0
                        ? files.Max(f => f.LastAccessTime)
                        : versionFolder.LastAccessTime;
                    var lastAccessed = DateTime.Now - lastAccessTime;
                    if (lastAccessed > TimeSpan.FromDays(minDays))
                    {
                        Console.WriteLine($"{versionFolder.FullName} last accessed {Math.Floor(lastAccessed.TotalDays)} days ago");
                        try
                        {
                            // Rename first so a failed delete doesn't leave a half-removed
                            // folder under the original name, which restore would treat as valid.
                            versionFolder.MoveTo(Path.Combine(versionFolder.Parent.FullName, "_" + versionFolder.Name));
                            versionFolder.Delete(true);
                            totalDeleted += size;
                        }
                        catch
                        {
                            // Folder is likely in use by another process; skip and retry next run.
                        }
                    }
                }
                if (folder.GetDirectories().Length == 0)
                    folder.Delete(true);
            }
            Console.WriteLine($"Done! Deleted {totalDeleted / 1024d / 1024d:0} MB");
        }
    }
}

@terrajobst
Author

If you're lazy like me, you might want to use @dotMorten's code through this dotnet global tool: https://github.com/terrajobst/dotnet-nuget-gc

@Liversage

Liversage commented Oct 25, 2018

A bit tangential to this issue but I would like to make a comment on the placement of the NuGet cache on the file system.

The cache is inside the ~/.nuget folder and the developers making this decision must have been using a Unix-like operating system to pick this name. However, I would like to direct your attention to another operating system popular among .NET developers: Microsoft Windows. 😉

On Windows the tilde ~ doesn't have a special meaning in the "Windows shell". However, Windows has the concept of known folders and ~ on Unix is the same as the known folder FOLDERID_Profile or as an environment variable %USERPROFILE% which typically is C:\Users\UserName.

While Microsoft is providing less and less specific guidelines on how to use these known folders I would be so bold as to say that applications should not create their own folder in the user profile folder. This is the user's space and dumping application data here is not a good user experience. Also, adding a dot in front of a folder does not make any sense on Windows. These folders are by default hidden on Unix-like operating systems but not so on Windows. In fact, you are unable to use Windows Explorer to create or rename a folder when the name of the folder starts with a dot.

The correct placement of a cache on Windows is in the subfolder vendor\product in FOLDERID_LocalAppData which typically is the folder %USERPROFILE%\AppData\Local so the resulting folder would be something like %USERPROFILE%\AppData\Local\Microsoft\NuGet. While I'm sure most experienced Windows developers will agree with me on this it is surprisingly hard to find specific and concrete guidelines on this. The best I have found is section 10 in Certification requirements for Windows Desktop Apps which unfortunately has some typographical errors.

More generally and on Windows, please move all dot-prefixed application data folders and files away from the user profile folder and into the local application data folder where they belong. 🙂

@terrajobst
Author

I would be so bold as to say that applications should not create their own folder in the user profile folder.

Totally agree. %LocalAppData%\NuGet would be the correct location IMHO.

More generally and on Windows, please move all dot-prefixed application data folders and files away from the user profile folder and into the local application data folder where they belong.

Agree as well.

@anangaur
Member

@Liversage %LocalAppData%\NuGet aka %USERPROFILE%\AppData\Local\NuGet seems like the right location. We already use this location for some caching purposes.

@alejandro5042

alejandro5042 commented Nov 12, 2019

A related solution for Windows is a Disk Cleanup Handler, as proposed in #8076 by @miloush.

@alejandro5042

alejandro5042 commented Nov 12, 2019

Insight: We only need to clear the cache when a new nuget is added to it

If we are not adding to the cache, it's certainly not getting bigger, so what's the point in checking it?

Also, when we are adding to the cache, the cost of deleting old nugets may be eclipsed by downloading new nugets (unpredictable over the Internet), unzipping, rewriting package-lock.json files, etc. This is an opportune time to manage our cache.

This strikes the balance @terrajobst mentioned:

[...] there must be a compromise between spamming my drive and impacting my package restore times.

Option: Run cache cleanup when threshold is reached

When unpacking a new nuget:

  • Size up how big the new nuget is (this data may already exist in the unzipping step)
  • Store the size and date in a file somewhere (e.g. MyNuget.cache-info.json)
  • Add up all the nuget sizes
  • If size >= N GB, delete nugets older than 90 days

Where the size threshold N can be determined by: max(disk size * 10%, 2 GB)
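That max(disk size * 10%, 2 GB) could be computed as (a small sketch; function name is made up for illustration):

```shell
# Hypothetical threshold check, all sizes in bytes:
# N = max(10% of the disk, 2 GB).
threshold_bytes() {
  local disk_bytes=$1
  local ten_pct=$(( disk_bytes / 10 ))
  local floor=$(( 2 * 1024 * 1024 * 1024 ))
  echo $(( ten_pct > floor ? ten_pct : floor ))
}
# usage: threshold_bytes "$(df --output=size -B1 "$HOME" | tail -1)"
```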

Yes, this means the cache can grow past the threshold, but we also don't want to delete nugets that might reasonably still be in use. This is a heuristic meant to help the common case, after all. Users can still clear the cache with:

dotnet nuget locals all --clear
nuget locals all -clear
(etc)

Several options on when to add sizes and delete:

  • On the first nuget that finishes downloading
    • There is a chance other nugets will still need to download, but we don't have to wait until that's done to start deleting nugets. Yes, we won't be counting those new nugets in the size estimation, but maybe that's OK
  • When any nuget finishes downloading
    • Keep a running total and if it ever goes over the threshold, start deleting
  • When the last nuget finishes downloading
    • This is the most accurate but probably unnecessary to wait until this time

Option: Delete in background thread

If performance is still a factor, we can additionally delete in a background thread:

  • Thread could set low CPU and disk priority
  • Kill the delete thread when we're done restoring so that the build can complete

Some care will be needed to avoid killing an in-progress delete. Here are two options:

  1. Kill the thread once it has finished the current delete operation
  2. Move nugets we want to delete to a NuGet-defined TEMP folder (moving is atomic), then delete files from that folder until it is empty
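Option 2 could be sketched in shell (the trash folder name and layout are assumptions): a same-volume rename is atomic, so a crashed or killed delete never leaves a half-removed package under its real name.

```shell
# Sketch of option 2: rename doomed package folders into a NuGet-owned
# trash dir (atomic on the same filesystem), then drain the trash at leisure.
TRASH="${NUGET_PACKAGES:-$HOME/.nuget/packages}/.trash"   # hypothetical name

schedule_delete() {
  mkdir -p "$TRASH"
  mv "$1" "$TRASH/$(basename "$1").$$"   # atomic rename, $$ avoids collisions
}

drain_trash() {
  # Safe to interrupt at any point; leftovers are retried next run.
  rm -rf -- "$TRASH"/* 2>/dev/null || true
}
```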

@anangaur anangaur changed the title A better cache expiration policy A better cache clean-up and expiration policy Jun 16, 2020
@anangaur
Member

From an email discussion regarding this topic:

"...Everyone’s nuget cache and .nuget\packages folder is filling up as we upgrade packages. The only solution we have currently is to delete the whole packages folder and re-download everything, which can be a huge hit on DevOps, especially while working remotely. It would be great if we had a way to say “delete all the nuget packages except for these packages @ the specified version”.

@zivkan
Member

zivkan commented Oct 29, 2020

Interesting: the restore times when restore did not need to modify the last write time are about the same as when it does update it. This doesn't line up with my microbenchmarks, where checking the last write time without updating it was only very slightly slower than File.Exists, while updating the last write time was much slower (in percentage terms).

All your tests with the modification are consistently slower than the test without your changes, which gives strong evidence that this code change does impact performance, but the results don't have the same trend as the microbenchmarks, so I don't feel like I understand what's really going on here.

I think we need more information. Maybe microbenchmarks of LocalPackageFileCache.Sha512Exists will help, or maybe we need PerfView traces before and after to compare. While I personally think that 300ms on a 6.5s restore (so, just under 5%) seems ok, I'm not involved in the perf meetings with partners and executives, so I don't know if this will pass.

@mfkl

mfkl commented Nov 6, 2020

While I personally think that 300ms on a 6.5s restore (so, just under 5%) seems ok,

Running these benchmarks on both dev and my branch produced quite different results across many successive runs. My opinion is that if you are going to be looking at fraction-of-a-second diffs, then the current perf script (e.g. starting an exe manually) won't provide reliable perf results.

Maybe microbenchmarks of LocalPackageFileCache.Sha512Exists will help

Yes, I'll try to put that together and let you know.

@zivkan
Member

zivkan commented Nov 6, 2020

Running these benchmarks on both dev and my branch produced quite different results across many successive runs. My opinion is that if you are going to be looking at fraction-of-a-second diffs, then the current perf script (e.g. starting an exe manually) won't provide reliable perf results.

Perhaps running the script more times, or changing the script to execute more runs, will increase our confidence. This is exactly what BenchmarkDotNet does when it detects that what it's running has a high standard deviation: low-standard-deviation benchmarks get 15 runs, high-standard-deviation benchmarks get up to 100 runs.
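For illustration, a crude repeat-and-average harness along those lines could look like this in shell (the restore command in the usage comment is a placeholder; BenchmarkDotNet does this adaptively and far more rigorously):

```shell
# Sketch: run a command N times and print the mean wall-clock time in ms,
# so run-to-run noise from process startup averages out.
mean_ms() {
  local runs=$1; shift
  local total=0 start end
  for _ in $(seq "$runs"); do
    start=$(date +%s%N)                 # nanoseconds since epoch (GNU date)
    "$@" >/dev/null 2>&1
    end=$(date +%s%N)
    total=$(( total + (end - start) / 1000000 ))
  done
  echo $(( total / runs ))
}
# usage (placeholder command): mean_ms 15 nuget.exe restore MySolution.sln
```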

The randomness/variability of starting an exe is the same between the dev and feature branch builds of nuget.exe, so a larger number of runs will smooth it out. I didn't do a statistical analysis to check whether the means of the before-vs-after runs are within the "margin of error", but it certainly looks like your feature branch is systematically slower than the dev branch. Unless someone can point out a reason why this method of testing is systematically unfair to one nuget.exe and not the other, I believe that a larger number of tests is sufficient. Next time I can also calculate the standard deviation to check whether the mean values of the two nuget.exes are within the "margin of error". That's the limit of my statistical skills, however.

In another issue you mentioned that your feature branch is changing the timestamp of files not only in the global packages folder, but also read-only fallback folders. This is not only a bug, but also means that your feature branch is doing more work than is necessary, and therefore we could improve performance by fixing the bug. An alternative is to ensure that benchmarks are using a nuget.config that clears fallback folders (the example doesn't show how to clear, but it's the same as package sources, use <clear/>), so only the global packages folder is used. Once we determine that the feature has acceptable perf, we can spend time on fixing the bug.

Yes, I'll try to put that together and let you know.

For what it's worth, I believe that in December I'll have capacity to work on this for a few days. I don't want to "steal credit" for your work, so I can find other work to do if you'd like to drive this to completion yourself and have a PR/commit with only your name. Otherwise I can copy your branch, so your name is in the commits and try to complete it myself. Having said that, December is a month away, so if we don't encounter too many more problems you might be able to complete it before then anyway.

@mfkl

mfkl commented Nov 14, 2020

Perhaps running the script more times

I have run the script dozens of times on both branches and get wildly different results. If my CPU is already at 20% usage from other apps just before running the script, perf numbers can easily double.

In another issue you mentioned that your feature branch is changing the timestamp of files not only in the global packages folder, but also read-only fallback folders.

Right, I've made a bad hack to fix this temporarily: mfkl/NuGet.Client@5d2c27a. This change won't fix the result deviations though; only using BenchmarkDotNet would, I'm afraid.

I don't want to "steal credit" for your work

Thanks, I appreciate your concern, but I don't really care about credit for this. As long as the software gets better, I'm happy :-)

@Danielku15

I stumbled over this issue in relation to another topic, and I would like to give a small input for consideration on clean-up policies. I wasn't able to quickly grasp all the details of this long-running discussion, so I'll just drop my input in the hope that it fits into the current plans.

NuGet supports floating versions, which can be quite useful if you have continuous deployment of your packages (in our case we still have old-fashioned nightlies for some package deploys). If you go for a semver like 1.0.0-alpha.### and consume the package via 1.0.0-alpha.*, you will continuously get new packages as they become available. This also means that after a while you end up with a lot of pre-release versions which are no longer relevant.

It would be great if the clean-up policy defaulted to a shorter window for such pre-release packages. I would expect that in most cases you only ever need the latest one or two pre-release versions of a given tag for a particular library version in the local cache.
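A "keep only the newest few pre-releases" pass could be sketched like this (the cache layout of $CACHE/&lt;id&gt;/&lt;version&gt;/ is assumed, pre-release versions are identified by the '-' in their folder name, and recency is judged by folder mtime):

```shell
# Sketch: list the pre-release version folders of one package, newest
# first by mtime, and report everything past the first $keep as prune
# candidates (dry run; pipe into rm -rf to actually delete).
prune_prerelease() {
  local pkg_dir=$1 keep=${2:-2}
  ls -1t "$pkg_dir" | grep -- '-' | tail -n +"$((keep + 1))" |
    while IFS= read -r v; do
      echo "would delete: $pkg_dir/$v"
    done
}
```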

@eatdrinksleepcode

Needed to manually delete some large older packages from my cache today to make room on my disk, and found this issue. As Immo said, this is a leak, and as such I am disappointed that it hasn't been prioritized more highly.

There are two potential problems I can see with the spec as proposed:

  • Problem: When the user first updates to a nuget version that has this feature, their entire cache will be wiped out, since the files have not been touched in previous versions.
    • Possible Solution: before running the cleanup process the first time, touch all packages one time. This would cause unused packages to be retained even longer, but would put them on the path to being purged eventually. This solution would require the ability to keep track of whether the cleanup process had ever run before.
  • Problem: If option 1 from the spec is too slow, and so option 2 is implemented instead, how would nuget know whether an "old" package was actually unused or just being used in a project whose dependencies are not changing, and therefore is only performing no-op restores?
    • Possible solution: Touch packages even during a no-op restore, but only if they are going to cross the cleanup threshold soon. For example, if the cleanup threshold is set at 3 months and a no-op restore uses a package that has not been touched in two months, it touches the package to update its last-accessed time before the cleanup process runs.
    • A potential issue with this solution would be if the cleanup threshold is set much more aggressively (say, 1 week), then the touch threshold would have to be even shorter (say, a few days), which could cause no-op restores to slow down more frequently than desired. But it seems unlikely that the cleanup threshold would need to be that short, and if for some reason it is, that likely means disk space is at a premium, in which case slower no-op restores do not seem an unreasonable price to pay.

/cc @zivkan
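The touch-only-when-stale idea above could be sketched as follows (threshold values and the marker-file convention are assumptions; a real implementation would live inside restore, not in shell):

```shell
# Sketch: during a no-op restore, refresh a package's timestamp only if it
# is already older than TOUCH_AFTER_DAYS, so most restores skip the write.
TOUCH_AFTER_DAYS=60   # e.g. 2 months, against a 3-month cleanup threshold

maybe_touch() {
  local marker=$1
  # -mtime +N matches entries last modified more than N days ago
  if [ -n "$(find "$marker" -maxdepth 0 -mtime +"$TOUCH_AFTER_DAYS" -print)" ]; then
    touch "$marker"
  fi
}
```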

@zivkan
Member

zivkan commented Mar 16, 2022

Thanks to @mfkl for implementing the "core" part that will update the last access time of one file in each package on restore: NuGet/NuGet.Client#4222

I'm guessing it's going to ship in .NET SDK 6.0.300 and VS 17.2.

Once it's shipped, and we start using it for all our solutions, then we'll be able to use tools like https://github.com/chgill-MSFT/NuGetCleaner

@nkolev92 nkolev92 added Category:Quality Week Issues that should be considered for quality week and removed Category:Customer Sprint labels Aug 29, 2022
@C-Wal

C-Wal commented Oct 20, 2022

Personally I'd prefer to have an option that kept the last n number of versions downloaded of each package instead of simple date-based expiry.

@nkolev92 nkolev92 added Priority:2 Issues for the current backlog. and removed Priority:3 Issues under consideration. With enough upvotes, will be reconsidered to be added to the backlog. labels Dec 7, 2022
@jaredthirsk

If you're lazy like me, you might want to use @dotMorten's code through this dotnet global tool: https://github.com/terrajobst/dotnet-nuget-gc

Done! Deleted 62,359 MB.
Wow, that's a lot.
