CacheBeta: the nuget cache keeps growing #11770

Closed
stan-sz opened this issue Nov 14, 2019 · 12 comments

@stan-sz

stan-sz commented Nov 14, 2019

Question, Bug, or Feature?
Type: Bug

Enter Task Name: CacheBeta@0

list here (V# not needed): 0

Environment

  • Server - Azure Pipelines or TFS on-premises?
    Azure Pipelines

  • Agent - Hosted or Private:
    Private

    • If using Hosted agent, provide agent queue name:

    • If using private agent, provide the OS of the machine running the agent and the agent version: OS: linux, agent: 2.160.0

Issue Description

The global NuGet cache grows with every pipeline run, and the cache task keeps restoring the whole cache, even if it contains long-unused packages.

According to https://docs.microsoft.com/en-us/azure/devops/pipelines/caching/?view=azure-devops#netnuget, setting NUGET_PACKAGES sets the global NuGet cache folder. After the pipeline has run for multiple builds (including PRs), the global cache accumulates all previously used package versions.

Consider clearing the cache on a key miss (https://docs.microsoft.com/en-us/nuget/consume-packages/managing-the-global-packages-and-cache-folders), or keeping track of the packages read during the pipeline and storing only those when the Post-job task saves the cache.
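
As a rough illustration of the second idea (storing only the packages a build actually used), here is a minimal Python sketch. It is not part of the Cache task or of NuGet. It assumes that restore has already written `obj/project.assets.json` files under the source tree and that the packages folder uses the usual `<id>/<version>` layout. The function names `referenced_packages` and `prune_packages` are hypothetical.

```python
from __future__ import annotations

import json
import shutil
from pathlib import Path


def referenced_packages(source_root: Path) -> set[str]:
    """Collect lower-cased 'id/version' paths named in project.assets.json files."""
    used: set[str] = set()
    for assets in source_root.rglob("project.assets.json"):
        data = json.loads(assets.read_text(encoding="utf-8"))
        for name, info in data.get("libraries", {}).items():
            # Project-to-project references also appear here; skip them.
            if info.get("type") == "package":
                # Fall back to the library key if "path" is missing.
                used.add(info.get("path", name).lower())
    return used


def prune_packages(packages_root: Path, used: set[str], dry_run: bool = True) -> list[str]:
    """Return (and optionally delete) <id>/<version> folders that nothing references."""
    removed: list[str] = []
    for version_dir in sorted(packages_root.glob("*/*")):
        if not version_dir.is_dir():
            continue
        rel = version_dir.relative_to(packages_root).as_posix().lower()
        if rel not in used:
            removed.append(rel)
            if not dry_run:
                shutil.rmtree(version_dir)
    return removed
```

A step that runs after the build and before the Post-job cache step could call `prune_packages(Path(os.environ["NUGET_PACKAGES"]), referenced_packages(Path(".")))` and inspect the dry-run list first. Packages that reach the folder by other routes (tool or SDK packages, for example) may not be listed in the assets files, so treat the result as a candidate list rather than a safe delete list.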

@johnterickson
Contributor

Thanks @stan-sz !

One thing we could take a look at now that we are preserving timestamps is https://www.nuget.org/packages/dotnet-nuget-gc

Unfortunately that uses LastAccessTime which is spotty: https://unix.stackexchange.com/questions/8840/last-time-file-opened
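
To see whether access times move at all on a given agent, a small probe like the following can help. This is a sketch under assumptions, not something dotnet-nuget-gc does: the function name `atime_is_tracked` is hypothetical, and a `True` result only shows that the filesystem updates the access time in the most favorable case (a stale timestamp followed by a read). On Linux mounts using `relatime`, later reads within a day typically do not update it again, so a positive result still does not make the timestamp a reliable "last used" signal.

```python
import os
import tempfile
import time


def atime_is_tracked(directory: str) -> bool:
    """Return True if reading a file in `directory` advances its access time."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(b"probe")
        # Backdate atime and mtime by two days so that even relatime
        # mounts have a reason to update the access time on the next read.
        stale = time.time() - 2 * 24 * 3600
        os.utime(path, (stale, stale))
        before = os.stat(path).st_atime
        with open(path, "rb") as handle:
            handle.read()
        return os.stat(path).st_atime > before
    finally:
        os.remove(path)
```

Running `atime_is_tracked(os.environ["NUGET_PACKAGES"])` on the agent gives a quick yes/no before investing in any cleanup based on access times.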

@terrajobst Any suggestions?

@johnterickson
Contributor

@stan-sz : Are you using a lockfile?

One ugly way is to reset on a regular basis: e.g.

# %V is the ISO week number, so the key (and the cache) rolls over once a week
- script: echo week_$(date +%V) > week_number.txt
- task: Cache@2
  inputs:
    key: '... | ./week_number.txt | ...'

@stan-sz
Author

stan-sz commented Nov 15, 2019

We are not using a lockfile. We rely purely on ProjectReferences. However, the dotnet-nuget-gc option won't work with the example, because dotnet-nuget-gc assumes the dotnet cache is under ~/.nuget/packages, while the example explicitly sets NUGET_PACKAGES='$(Pipeline.Workspace)/.nuget/packages'. Making dotnet-nuget-gc accept the nuget cache path as an optional param would be a simple fix, otherwise it does not work for Azure Pipelines (@jeffkl asked for this in terrajobst/dotnet-nuget-gc#7).

As for your second suggestion: the problem is that even if the pipeline cache is invalidated, the NuGet packages folder still persists on the agent, so the unused packages in it still end up being cached.
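
One way to address the leftover folder is to wipe it whenever the cache step did not report an exact hit. The sketch below assumes the Cache@2 step sets its `cacheHitVar` input to a variable (here hypothetically named `CACHE_RESTORED`), which the task documents as `true` for an exact hit, `inexact` for a restore-key hit, and `false` otherwise. The helper name `reset_packages_on_miss` is made up for illustration.

```python
import os
import shutil


def reset_packages_on_miss(packages_root: str, cache_hit: str) -> bool:
    """Empty the packages folder unless the cache step reported an exact hit.

    Returns True if the folder was reset.
    """
    if cache_hit.strip().lower() == "true":
        return False
    # Anything else ("inexact", "false", or unset) is treated as a miss.
    shutil.rmtree(packages_root, ignore_errors=True)
    os.makedirs(packages_root, exist_ok=True)
    return True
```

A script step placed after the cache restore could call `reset_packages_on_miss(os.environ["NUGET_PACKAGES"], os.environ.get("CACHE_RESTORED", "false"))`. The trade-off is that every inexact hit becomes a full re-download, in exchange for the saved cache containing only what the current build restored.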

@terrajobst

terrajobst commented Nov 15, 2019

I wrote dotnet-nuget-gc as a workaround for NuGet/Home#4980. At the time, I didn't know about LastAccessTime being meaningless by default :-(

I thought pipelines will throw the machine's state away when the build is complete. Is that not the case?

@rrelyea what do you think CI servers should do to avoid unbounded growth?

@fadnavistanmay
Contributor

Hi @terrajobst, "I thought pipelines will throw the machine's state away when the build is complete. Is that not the case?" -> In this case, it's a self-hosted agent, so the machine will be reused.

@terrajobst

Gotcha, I missed that. Yeah, it seems you'll want a similar way to nuke build state, unless you have a lot of trust in your ability to not accidentally depend on machine state between runs 😄

@johnterickson johnterickson removed the bug label Nov 18, 2019
@johnterickson
Contributor

Removing the bug label as the task is performing as expected. Leaving open to discuss tooling for NuGet.

@johnterickson
Contributor

Given that this is primarily an issue with the NuGet client (NuGet/Home#4980), I will close this.

There is still the (ugly) option to reset it on a regular basis: #11770 (comment)

@Bouke

Bouke commented Jan 20, 2021

May I suggest reopening this ticket? We're running into the same problem with the cache accumulating package versions and causing unbounded growth. I could imagine an option to not restore the cache for master builds when there's not an exact cache hit.

@carlin-q-scott

@johnterickson This isn't an issue with the NuGet client. The recommended cache fallback key matches all caches for the build definition, so it will always restore and persist the same cache. I removed the fallback key from my build definition and it had a cache miss, but the Post-job cache step matched the generic cache fingerprint and decided not to do anything, since it thought the generic cache had been used.

Should I file that bug separately?

@johnterickson
Contributor

This is written by one of the NuGet devs and looks promising: https://github.com/chrisraygill/NuGetCleaner

@carlin-q-scott

@johnterickson That says we need to modify the FS behavior with a PowerShell command in order to enable "last accessed" tracking. Will that even work with a NuGet cache? I suspect that even if "last accessed" is being tracked by the Windows VM, it won't be persisted to the cache in blob storage.

Since you didn't respond to my bug report here, I will open a new one.


6 participants