Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: detailed allocation GC docs and logging of allocation state before GC #3421

Open
margueritepd opened this issue Oct 19, 2017 · 6 comments
Assignees
Labels
theme/client theme/docs Documentation issues and enhancements type/enhancement

Comments

@margueritepd
Copy link
Contributor

I recently had some allocations fail and later be replaced with new ones. Judging by the client logs below, allocations were GC'd less than 2h after allocation failure. I can't tell by the logs when the Nomad servers garbage collected the allocations.

2017/10/18 21:50:00.152653 [INFO] client: garbage collecting allocation 44ff048f-cb00-3c0b-528b-9aa76f1867cc due to forced collection
2017/10/18 21:50:00.152515 [INFO] client: marking allocation 44ff048f-cb00-3c0b-528b-9aa76f1867cc for GC
2017/10/18 20:42:25.759585 [INFO] client: marking allocation 44ff048f-cb00-3c0b-528b-9aa76f1867cc for GC
2017/10/18 20:42:25.759441 [INFO] client: marking allocation 44ff048f-cb00-3c0b-528b-9aa76f1867cc for GC

By the time I noticed that the allocation had failed, any debugging information was long gone. When googling for documentation about Nomad allocation GC, I have only been able to find the GC API call documentation and various github comments, so I've had to piece together an understanding of how alloc GC works.

Two things I would like to see:

  • logging of reason for allocation failure before the allocation is GC'd
  • more explicit documentation about GC behaviour & timelines on both the client and server

Let me know what you think!

@schmichael
Copy link
Member

Good timing! I'm working on the GC as we speak and the logging will be improved.

forced collection is the reason. Did you force a GC via the /v1/system/gc HTTP endpoint. Otherwise it was probably the periodic server initiated GC which unfortunately are only logged at DEBUG level.

@margueritepd
Copy link
Contributor Author

Great to hear! Better logging and more explicit documentation on how GC works will be very helpful.

No manual call to /v1/system/gc was made. I was under the impression (from gleaning github comments) that the clients kept allocations around until forced to clean up from lack of disk space, but if the server forces clean up on its own schedule that makes sense. Clearer documentation would definitely help reason about these things!

@schmichael
Copy link
Member

#3445 includes some logging improvements to the GC but does not directly address the items you list.

@holtwilkins
Copy link
Contributor

holtwilkins commented Jul 11, 2018

Hey @schmichael I'm pretty curious about this too. I'm trying to make Nomad stop being so aggressive about cleaning up old allocations, but not having much luck.

I've tried setting gc_max_allocs to 600, and it seems to have helped a little, but Nomad is still cleaning-up allocations well before I've collected 600. It shouldn't be getting anywhere near 80% disk utilisation either (or 70% inode utilisation), so I'm not sure what else could be going on here? Similar to @margueritepd , this is severely limiting our ability to troubleshoot failing jobs, so I'd like to know how I can tune this to keep troubleshooting material around.

root@ip-10-2-233-187:~# df -h
Filesystem      Size  Used Avail Use% Mounted on
udev             16G     0   16G   0% /dev
tmpfs           3.1G  308M  2.8G  10% /run
/dev/nvme0n1p1   97G  3.6G   94G   4% /
tmpfs            16G     0   16G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs            16G     0   16G   0% /sys/fs/cgroup
/dev/nvme1n1    246G   83M  234G   1% /opt/hashistack
/dev/nvme2n1    246G  7.3G  227G   4% /var/lib/docker
tmpfs           3.1G     0  3.1G   0% /run/user/1000
tmpfs           1.0M     0  1.0M   0% /opt/hashistack/nomad/alloc/ec0bfd07-d229-6713-d282-5bc20c364892/fabio/secrets
overlay         246G  7.3G  227G   4% /var/lib/docker/overlay2/2a0cc2f3ad6228c3b67dc3db31fa96a53d6647df799a6020981f681727218378/merged
shm              64M     0   64M   0% /var/lib/docker/containers/d9ca31436d1d5fa64123a92ebacfb027c7d5b6dd7ef46f4b06819495203e570c/mounts/shm
tmpfs           1.0M     0  1.0M   0% /opt/hashistack/nomad/alloc/29fd1646-3cd1-6b89-2e55-f770841c0d4d/redirect/secrets
overlay         246G  7.3G  227G   4% /var/lib/docker/overlay2/9e5254b71933c04196e2703b03fb235582f7abf3b663274e572a38fe974a5c77/merged
shm              64M     0   64M   0% /var/lib/docker/containers/fa46dc637041402c2da3250f6c6e85ef4fd3403a561254cd5625f32167f366d7/mounts/shm
root@ip-10-2-233-187:~# ls -1 /opt/hashistack/nomad/alloc/ | wc -l
42
root@ip-10-2-233-187:~# grep gc /opt/hashistack/nomad/conf.d/*
/opt/hashistack/nomad/conf.d/client.hcl:    gc_interval = "5m"
/opt/hashistack/nomad/conf.d/client.hcl:    gc_max_allocs = 600

As for the debug logs, I'm not sure what I'm looking for, but I don't see anything that tells me why it decided to force this collection.

Jul 11 04:31:13 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:13.003680 [DEBUG] http: Request GET /v1/allocations?prefix=e7a6f05c (925.312µs)
Jul 11 04:31:13 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:13.005695 [DEBUG] http: Request GET /v1/allocation/e7a6f05c-c770-1626-20d0-0db1d2791bed (1.098235ms)
Jul 11 04:31:13 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:13.034602 [DEBUG] http: Request GET /v1/allocations?prefix=e7a6f05c (1.224081ms)
Jul 11 04:31:13 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:13.037078 [DEBUG] http: Request GET /v1/allocation/e7a6f05c-c770-1626-20d0-0db1d2791bed (1.606743ms)
Jul 11 04:31:13 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:13.069325 [DEBUG] http: Request GET /v1/allocations?prefix=e7a6f05c-c770-1626-20d0-0db1d2791bed (1.189478ms)
Jul 11 04:31:13 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:13.071825 [DEBUG] http: Request GET /v1/allocation/e7a6f05c-c770-1626-20d0-0db1d2791bed (1.558742ms)
Jul 11 04:31:13 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:13.103712 [DEBUG] http: Request GET /v1/allocations?prefix=e7a6f05c-c770-1626-20d0-0db1d2791bed (1.958686ms)
Jul 11 04:31:13 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:13.105823 [DEBUG] http: Request GET /v1/allocation/e7a6f05c-c770-1626-20d0-0db1d2791bed (1.117806ms)
Jul 11 04:31:13 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:13.137913 [DEBUG] http: Request GET /v1/allocations?prefix=e7a6f05c-c770-1626-20d0-0db1d2791bed (926.151µs)
Jul 11 04:31:13 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:13.140360 [DEBUG] http: Request GET /v1/allocation/e7a6f05c-c770-1626-20d0-0db1d2791bed (1.498045ms)
Jul 11 04:31:13 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:13.172380 [DEBUG] http: Request GET /v1/allocations?prefix=e7a6f05c-c770-1626-20d0-0db1d2791bed (1.226399ms)
Jul 11 04:31:13 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:13.174837 [DEBUG] http: Request GET /v1/allocation/e7a6f05c-c770-1626-20d0-0db1d2791bed (1.49045ms)
Jul 11 04:31:13 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:13.205521 [DEBUG] http: Request GET /v1/allocations?prefix=ec0bfd07 (1.018545ms)
Jul 11 04:31:13 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:13.207414 [DEBUG] http: Request GET /v1/allocation/ec0bfd07-d229-6713-d282-5bc20c364892 (1.071525ms)
Jul 11 04:31:13 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:13.236774 [DEBUG] http: Request GET /v1/allocations?prefix=ec0bfd07 (1.11573ms)
Jul 11 04:31:13 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:13.238764 [DEBUG] http: Request GET /v1/allocation/ec0bfd07-d229-6713-d282-5bc20c364892 (1.210661ms)
Jul 11 04:31:13 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:13.268730 [DEBUG] http: Request GET /v1/allocations?prefix=ec0bfd07 (987.695µs)
Jul 11 04:31:13 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:13.272626 [DEBUG] http: Request GET /v1/allocation/ec0bfd07-d229-6713-d282-5bc20c364892 (2.998744ms)
Jul 11 04:31:13 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:13.312814 [DEBUG] http: Request GET /v1/allocations?prefix=ec0bfd07-d229-6713-d282-5bc20c364892 (10.08835ms)
Jul 11 04:31:13 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:13.315343 [DEBUG] http: Request GET /v1/allocation/ec0bfd07-d229-6713-d282-5bc20c364892 (1.712561ms)
Jul 11 04:31:13 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:13.343898 [DEBUG] http: Request GET /v1/allocations?prefix=ec0bfd07-d229-6713-d282-5bc20c364892 (859.027µs)
Jul 11 04:31:13 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:13.346204 [DEBUG] http: Request GET /v1/allocation/ec0bfd07-d229-6713-d282-5bc20c364892 (1.441897ms)
Jul 11 04:31:13 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:13.374149 [DEBUG] http: Request GET /v1/allocations?prefix=ec0bfd07-d229-6713-d282-5bc20c364892 (1.034511ms)
Jul 11 04:31:13 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:13.376433 [DEBUG] http: Request GET /v1/allocation/ec0bfd07-d229-6713-d282-5bc20c364892 (1.484758ms)
Jul 11 04:31:13 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:13.406492 [DEBUG] http: Request GET /v1/allocations?prefix=ec0bfd07-d229-6713-d282-5bc20c364892 (1.058281ms)
Jul 11 04:31:13 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:13.408615 [DEBUG] http: Request GET /v1/allocation/ec0bfd07-d229-6713-d282-5bc20c364892 (1.336813ms)
Jul 11 04:31:13 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:13.438323 [DEBUG] http: Request GET /v1/allocations?prefix=f2937291 (933.751µs)
Jul 11 04:31:13 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:13.440300 [DEBUG] http: Request GET /v1/allocation/f2937291-f9a0-abea-c368-dacfdbd1453d (1.129102ms)
Jul 11 04:31:13 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:13.468672 [DEBUG] http: Request GET /v1/allocations?prefix=f2937291 (1.197681ms)
Jul 11 04:31:13 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:13.470736 [DEBUG] http: Request GET /v1/allocation/f2937291-f9a0-abea-c368-dacfdbd1453d (1.139369ms)
Jul 11 04:31:13 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:13.502658 [DEBUG] http: Request GET /v1/allocations?prefix=f2937291 (2.846407ms)
Jul 11 04:31:13 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:13.512273 [DEBUG] http: Request GET /v1/allocation/f2937291-f9a0-abea-c368-dacfdbd1453d (8.654215ms)
Jul 11 04:31:13 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:13.542595 [DEBUG] http: Request GET /v1/allocations?prefix=f2937291-f9a0-abea-c368-dacfdbd1453d (914.572µs)
Jul 11 04:31:13 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:13.544621 [DEBUG] http: Request GET /v1/allocation/f2937291-f9a0-abea-c368-dacfdbd1453d (1.105978ms)
Jul 11 04:31:13 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:13.574435 [DEBUG] http: Request GET /v1/allocations?prefix=f2937291-f9a0-abea-c368-dacfdbd1453d (956.429µs)
Jul 11 04:31:13 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:13.577111 [DEBUG] http: Request GET /v1/allocation/f2937291-f9a0-abea-c368-dacfdbd1453d (1.845837ms)
Jul 11 04:31:13 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:13.608363 [DEBUG] http: Request GET /v1/allocations?prefix=f2937291-f9a0-abea-c368-dacfdbd1453d (1.014081ms)
Jul 11 04:31:13 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:13.610456 [DEBUG] http: Request GET /v1/allocation/f2937291-f9a0-abea-c368-dacfdbd1453d (1.139116ms)
Jul 11 04:31:13 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:13.641008 [DEBUG] http: Request GET /v1/allocations?prefix=f2937291-f9a0-abea-c368-dacfdbd1453d (1.278214ms)
Jul 11 04:31:13 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:13.643608 [DEBUG] http: Request GET /v1/allocation/f2937291-f9a0-abea-c368-dacfdbd1453d (1.706629ms)
Jul 11 04:31:13 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:13.672222 [DEBUG] http: Request GET /v1/allocations?prefix=fbfdd042 (1.128754ms)
Jul 11 04:31:13 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:13.674995 [DEBUG] http: Request GET /v1/allocation/fbfdd042-e250-3448-c078-06f59340d9b3 (1.958545ms)
Jul 11 04:31:13 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:13.703533 [DEBUG] http: Request GET /v1/allocations?prefix=fbfdd042 (1.147387ms)
Jul 11 04:31:13 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:13.706227 [DEBUG] http: Request GET /v1/allocation/fbfdd042-e250-3448-c078-06f59340d9b3 (1.863357ms)
Jul 11 04:31:13 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:13.737875 [DEBUG] http: Request GET /v1/allocations?prefix=fbfdd042 (1.182825ms)
Jul 11 04:31:13 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:13.740536 [DEBUG] http: Request GET /v1/allocation/fbfdd042-e250-3448-c078-06f59340d9b3 (1.881201ms)
Jul 11 04:31:13 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:13.772625 [DEBUG] http: Request GET /v1/allocations?prefix=fbfdd042-e250-3448-c078-06f59340d9b3 (927.286µs)
Jul 11 04:31:13 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:13.775515 [DEBUG] http: Request GET /v1/allocation/fbfdd042-e250-3448-c078-06f59340d9b3 (2.002904ms)
Jul 11 04:31:13 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:13.804413 [DEBUG] http: Request GET /v1/allocations?prefix=fbfdd042-e250-3448-c078-06f59340d9b3 (1.185495ms)
Jul 11 04:31:13 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:13.806603 [DEBUG] http: Request GET /v1/allocation/fbfdd042-e250-3448-c078-06f59340d9b3 (1.387149ms)
Jul 11 04:31:13 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:13.836496 [DEBUG] http: Request GET /v1/allocations?prefix=fbfdd042-e250-3448-c078-06f59340d9b3 (854.769µs)
Jul 11 04:31:13 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:13.839310 [DEBUG] http: Request GET /v1/allocation/fbfdd042-e250-3448-c078-06f59340d9b3 (1.913087ms)
Jul 11 04:31:13 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:13.870275 [DEBUG] http: Request GET /v1/allocations?prefix=fbfdd042-e250-3448-c078-06f59340d9b3 (1.111855ms)
Jul 11 04:31:13 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:13.872544 [DEBUG] http: Request GET /v1/allocation/fbfdd042-e250-3448-c078-06f59340d9b3 (1.436841ms)
Jul 11 04:31:13 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:13.903173 [DEBUG] http: Request GET /v1/allocations?prefix=fe2645ab (1.198268ms)
Jul 11 04:31:13 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:13.905163 [DEBUG] http: Request GET /v1/allocation/fe2645ab-f783-1cb8-5152-1ab41a15f15a (1.112841ms)
Jul 11 04:31:13 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:13.934233 [DEBUG] http: Request GET /v1/allocations?prefix=fe2645ab (1.07565ms)
Jul 11 04:31:13 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:13.936604 [DEBUG] http: Request GET /v1/allocation/fe2645ab-f783-1cb8-5152-1ab41a15f15a (1.439917ms)
Jul 11 04:31:13 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:13.966999 [DEBUG] http: Request GET /v1/allocations?prefix=fe2645ab (1.141737ms)
Jul 11 04:31:13 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:13.969382 [DEBUG] http: Request GET /v1/allocation/fe2645ab-f783-1cb8-5152-1ab41a15f15a (1.514451ms)
Jul 11 04:31:13 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:13.999045 [DEBUG] http: Request GET /v1/allocations?prefix=fe2645ab-f783-1cb8-5152-1ab41a15f15a (1.088026ms)
Jul 11 04:31:14 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:14.001469 [DEBUG] http: Request GET /v1/allocation/fe2645ab-f783-1cb8-5152-1ab41a15f15a (1.542787ms)
Jul 11 04:31:14 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:14.031683 [DEBUG] http: Request GET /v1/allocations?prefix=fe2645ab-f783-1cb8-5152-1ab41a15f15a (929.361µs)
Jul 11 04:31:14 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:14.033685 [DEBUG] http: Request GET /v1/allocation/fe2645ab-f783-1cb8-5152-1ab41a15f15a (1.08413ms)
Jul 11 04:31:14 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:14.064591 [DEBUG] http: Request GET /v1/allocations?prefix=fe2645ab-f783-1cb8-5152-1ab41a15f15a (1.229859ms)
Jul 11 04:31:14 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:14.067155 [DEBUG] http: Request GET /v1/allocation/fe2645ab-f783-1cb8-5152-1ab41a15f15a (1.589927ms)
Jul 11 04:31:14 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:14.097635 [DEBUG] http: Request GET /v1/allocations?prefix=fe2645ab-f783-1cb8-5152-1ab41a15f15a (1.188303ms)
Jul 11 04:31:14 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:14.099967 [DEBUG] http: Request GET /v1/allocation/fe2645ab-f783-1cb8-5152-1ab41a15f15a (1.464539ms)
Jul 11 04:31:14 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:14.240560 [DEBUG] client: updated allocations at index 5172654 (total 42) (pulled 14) (filtered 28)
Jul 11 04:31:14 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:14.240681 [DEBUG] client: allocs: (added 0) (removed 12) (updated 14) (ignore 28)
Jul 11 04:31:14 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:14.240841 [INFO] client.gc: garbage collecting allocation fbfdd042-e250-3448-c078-06f59340d9b3 due to forced collection
Jul 11 04:31:14 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:14.240847 [INFO] client.gc: garbage collecting allocation 7b6a3448-544f-7a2b-0a6c-9b1f21256351 due to forced collection
Jul 11 04:31:14 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:14.240864 [INFO] client.gc: garbage collecting allocation c34340bd-9dbc-ecdc-55b1-4e160eaf0022 due to forced collection
Jul 11 04:31:14 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:14.240869 [INFO] client.gc: garbage collecting allocation 1ccf20d2-53e8-6b25-d869-cc3b84bf746c due to forced collection
Jul 11 04:31:14 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:14.240913 [INFO] client.gc: garbage collecting allocation 6be5c421-90d9-6be0-ea92-654a46ce2b98 due to forced collection
Jul 11 04:31:14 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:14.240940 [INFO] client.gc: garbage collecting allocation 38fe940b-f8fd-35aa-48a6-c784bb485134 due to forced collection
Jul 11 04:31:14 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:14.240953 [INFO] client.gc: garbage collecting allocation 2cc0e2f5-8385-49d1-2f93-2d69ac9531e3 due to forced collection
Jul 11 04:31:14 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:14.240965 [DEBUG] client: dropping update to terminal alloc 'f2937291-f9a0-abea-c368-dacfdbd1453d'
Jul 11 04:31:14 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:14.240967 [INFO] client.gc: garbage collecting allocation a3efda3b-f57c-2b1f-2b87-0497ce3a27b6 due to forced collection
Jul 11 04:31:14 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:14.240975 [INFO] client.gc: garbage collecting allocation a9a144c0-1d30-30c0-0e71-a0d72f3d8e04 due to forced collection
Jul 11 04:31:14 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:14.240980 [DEBUG] client: dropping update to terminal alloc 'a2c04f69-1824-98f1-6b81-67d1f5b3b96b'
Jul 11 04:31:14 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:14.240997 [INFO] client.gc: garbage collecting allocation 9c522d98-eca1-d450-d002-d74e33dace1b due to forced collection
Jul 11 04:31:14 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:14.241009 [DEBUG] client: dropping update to terminal alloc 'ba14d8c9-1c8e-b122-e2fb-76b320ef40da'
Jul 11 04:31:14 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:14.241007 [INFO] client.gc: garbage collecting allocation 07af1a30-ac9f-0b89-2342-ef50b5093f73 due to forced collection
Jul 11 04:31:14 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:14.241019 [DEBUG] client: dropping update to terminal alloc 'e59bdf2f-7961-3f82-5f59-8bc1fbcee7ed'
Jul 11 04:31:14 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:14.241026 [DEBUG] client: dropping update to terminal alloc 'e7a6f05c-c770-1626-20d0-0db1d2791bed'
Jul 11 04:31:14 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:14.241032 [DEBUG] client: dropping update to terminal alloc '8987e2ef-8654-c928-a482-e2892b2065f3'
Jul 11 04:31:14 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:14.241038 [DEBUG] client: dropping update to terminal alloc '69e3c84b-2821-78bd-0aea-3a391adf585a'
Jul 11 04:31:14 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:14.241013 [INFO] client.gc: garbage collecting allocation 36c5e823-7fca-98db-a116-e0c674ff43b0 due to forced collection
Jul 11 04:31:14 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:14.241018 [DEBUG] client: dropping update to terminal alloc 'bfbbb486-513f-f242-288b-25601770c20b'
Jul 11 04:31:14 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:14.241015 [DEBUG] client: dropping update to terminal alloc '6823974e-8fb3-593c-e7d4-a5a8901e9479'
Jul 11 04:31:14 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:14.241023 [DEBUG] client: dropping update to terminal alloc '9d9fb35d-a163-0a3d-780a-bb5cf78afbb6'
Jul 11 04:31:14 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:14.241025 [DEBUG] client: dropping update to terminal alloc 'b41dc687-655c-1fbc-80f3-bbdff8cc6f76'
Jul 11 04:31:14 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:14.241031 [DEBUG] client: dropping update to terminal alloc 'ae6c1153-6991-d3f7-5c5c-d1fb1efefc9b'
Jul 11 04:31:14 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:14.241026 [DEBUG] client: dropping update to terminal alloc '83dcc035-ac77-8079-939f-6618cb038805'
Jul 11 04:31:14 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:14.241035 [DEBUG] client: dropping update to terminal alloc 'bc9e71bd-0b16-fd50-4e22-dd47c15ffdfb'
Jul 11 04:31:14 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:14.242392 [DEBUG] client: terminating runner for alloc 'fbfdd042-e250-3448-c078-06f59340d9b3'
Jul 11 04:31:14 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:14.242417 [DEBUG] client.gc: garbage collected "fbfdd042-e250-3448-c078-06f59340d9b3"
Jul 11 04:31:14 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:14.243504 [DEBUG] client: terminating runner for alloc '7b6a3448-544f-7a2b-0a6c-9b1f21256351'
Jul 11 04:31:14 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:14.243535 [DEBUG] client.gc: garbage collected "7b6a3448-544f-7a2b-0a6c-9b1f21256351"
Jul 11 04:31:14 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:14.244859 [DEBUG] client: terminating runner for alloc 'c34340bd-9dbc-ecdc-55b1-4e160eaf0022'
Jul 11 04:31:14 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:14.244894 [DEBUG] client.gc: garbage collected "c34340bd-9dbc-ecdc-55b1-4e160eaf0022"
Jul 11 04:31:14 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:14.247478 [DEBUG] client: terminating runner for alloc '6be5c421-90d9-6be0-ea92-654a46ce2b98'
Jul 11 04:31:14 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:14.247512 [DEBUG] client.gc: garbage collected "6be5c421-90d9-6be0-ea92-654a46ce2b98"
Jul 11 04:31:14 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:14.248741 [DEBUG] client: terminating runner for alloc '1ccf20d2-53e8-6b25-d869-cc3b84bf746c'
Jul 11 04:31:14 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:14.248766 [DEBUG] client.gc: garbage collected "1ccf20d2-53e8-6b25-d869-cc3b84bf746c"
Jul 11 04:31:14 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:14.249788 [DEBUG] client: terminating runner for alloc '38fe940b-f8fd-35aa-48a6-c784bb485134'
Jul 11 04:31:14 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:14.249818 [DEBUG] client.gc: garbage collected "38fe940b-f8fd-35aa-48a6-c784bb485134"
Jul 11 04:31:14 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:14.250917 [DEBUG] client: terminating runner for alloc '2cc0e2f5-8385-49d1-2f93-2d69ac9531e3'
Jul 11 04:31:14 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:14.250943 [DEBUG] client.gc: garbage collected "2cc0e2f5-8385-49d1-2f93-2d69ac9531e3"
Jul 11 04:31:14 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:14.251897 [DEBUG] client: terminating runner for alloc 'a3efda3b-f57c-2b1f-2b87-0497ce3a27b6'
Jul 11 04:31:14 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:14.251923 [DEBUG] client.gc: garbage collected "a3efda3b-f57c-2b1f-2b87-0497ce3a27b6"
Jul 11 04:31:14 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:14.253027 [DEBUG] client: terminating runner for alloc 'a9a144c0-1d30-30c0-0e71-a0d72f3d8e04'
Jul 11 04:31:14 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:14.253051 [DEBUG] client.gc: garbage collected "a9a144c0-1d30-30c0-0e71-a0d72f3d8e04"
Jul 11 04:31:14 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:14.254056 [DEBUG] client: terminating runner for alloc '9c522d98-eca1-d450-d002-d74e33dace1b'
Jul 11 04:31:14 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:14.254075 [DEBUG] client.gc: garbage collected "9c522d98-eca1-d450-d002-d74e33dace1b"
Jul 11 04:31:14 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:14.255124 [DEBUG] client: terminating runner for alloc '07af1a30-ac9f-0b89-2342-ef50b5093f73'
Jul 11 04:31:14 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:14.255167 [DEBUG] client.gc: garbage collected "07af1a30-ac9f-0b89-2342-ef50b5093f73"
Jul 11 04:31:14 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:14.258133 [DEBUG] client: terminating runner for alloc '36c5e823-7fca-98db-a116-e0c674ff43b0'
Jul 11 04:31:14 ip-10-2-233-187 nomad[15096]:     2018/07/11 04:31:14.258167 [DEBUG] client.gc: garbage collected "36c5e823-7fca-98db-a116-e0c674ff43b0"

@schmichael
Copy link
Member

Hi @holtwilkins,

Nomad servers periodically GC dead jobs and force GC their allocations on their client nodes. You can configure that interval on servers with the job_gc_threshold parameter. It defaults to 4 hours.

@holtwilkins
Copy link
Contributor

Cool, I'll try bumping that one too. I guess even more reason it would be nice to have better GC logging in Nomad :).

@tgross tgross added this to Needs Roadmapping in Nomad - Community Issues Triage Feb 12, 2021
@tgross tgross removed this from Needs Roadmapping in Nomad - Community Issues Triage Mar 4, 2021
@tgross tgross added the theme/docs Documentation issues and enhancements label Oct 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
theme/client theme/docs Documentation issues and enhancements type/enhancement
Projects
None yet
Development

No branches or pull requests

4 participants