Issue
All tasks on one particular node are being killed shortly after they start. Anything that has to run on this node ends up in a permanent pending state.
Restarting Nomad and purging the task allows it to be rescheduled, but it quickly fails again. The logs suggest the garbage collector is destroying everything on the node the moment it's catalogued?
Hi @thetooth! The "forced collection" message you're seeing appears because the client is full enough that it needs to GC allocations that have failed. ~4 minutes passed between start and finish, and the GC can't reap a running task. The RPC error comes from the stats collector not having anything to collect stats from. It shouldn't throw an error on a task stop, and that's probably a bug, but it's unrelated to why the task stopped in the first place.
A few things we could use here:
Adjust your client GC config to delay the GC of allocations: gc_max_allocs, gc_disk_usage_threshold, etc.
Once you've done that, you should be able to run nomad alloc logs :alloc_id for the allocation (don't forget to check with the -stderr flag too).
Turn up the client's logs to debug so that we can see if the client is giving us any more clues as to what's going on.
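As a rough sketch of the first and third suggestions, a client agent configuration along these lines would loosen the GC limits and raise log verbosity. The specific values are illustrative assumptions, not recommendations for this cluster; choose them based on how much disk and how many terminal allocations the node can tolerate.

```hcl
# Sketch of a Nomad client agent config; all values are assumed for illustration.

# Emit debug-level agent logs so the client's GC decisions are visible.
log_level = "DEBUG"

client {
  enabled = true

  # How often the client runs its garbage collector.
  gc_interval = "10m"

  # Number of allocations the client tracks before forcing GC of
  # terminal ones (documented default: 50).
  gc_max_allocs = 100

  # Disk usage percentage that triggers GC (documented default: 80).
  gc_disk_usage_threshold = 90

  # Inode usage percentage that triggers GC (documented default: 70).
  gc_inode_usage_threshold = 90
}
```

With limits like these, failed allocations should stay on disk long enough for nomad alloc logs :alloc_id and nomad alloc logs -stderr :alloc_id to return output before the client collects them.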
Without additional information there's not much more we can do with this issue. Going to close this one out. Please feel free to re-open or open a new issue if you have additional information.
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
Nomad version
Nomad v1.0.1 (c9c68aa)
Operating system and Environment details
Linux magic 5.5.13-arch2-1 #1 SMP PREEMPT Mon, 30 Mar 2020 20:42:41 +0000 x86_64 GNU/Linux