Allow Configurable Allocation GC Threshold #13632

Open
tyler-domitrovich opened this issue Jul 6, 2022 · 1 comment
Labels
stage/accepted (Confirmed, and intend to work on. No timeline commitment though.), theme/config, type/enhancement

Comments

@tyler-domitrovich

Currently the Nomad config offers several *_gc_threshold options, which let users keep various items in the Raft state from being garbage collected for a limited duration after they enter a terminal state. This feature has been valuable when debugging failed jobs or deployments, since they remain visible even if a GC is run.

A similar config option for allocations would be useful for debugging failed allocations. Currently it appears that allocations are marked eligible for GC as soon as they enter a terminal state.

Proposal

I propose the addition of an alloc_gc_threshold config option that allows allocations to be immune from garbage collection for the specified duration after they enter a terminal state.

Attempted Solutions

I am currently unaware of a workaround, but would be interested in any recommendations!

@tgross
Member

tgross commented Jul 7, 2022

Hi @tyler-domitrovich!

> A similar config option for allocations would be useful for debugging failed allocations. Currently it appears that allocations are marked eligible for GC as soon as they enter a terminal state.

That's not quite the case, but this definitely could be better documented! Allocations are GC'd when we GC Evaluations (see core_sched.go#L284-L359). This is for data consistency: we don't want Allocations floating around orphaned from the Evaluation that created them. So we check both that the Allocations are terminal and that they're older than the eval_gc_threshold.

But as you've discovered, that means that if an allocation lives for more than an hour (by default), it's eligible for GC as soon as it's terminal. It'd be nice to "pad out" that window by some time for debugging, for sure. I'll mark this issue as a feature request for the roadmap. (And if you feel like giving it a go yourself I'd be happy to help shepherd that along!)

A note for anyone hitting this issue in search: this is not the same thing as client-side allocation GC, which is already configurable via client.gc_interval (and a handful of other knobs on that page). That GC process is unrelated to what happens on the server and is driven by disk space pressure.

Projects
Status: Needs Roadmapping