Backport of snapshot restore-from-archive streaming and filtering into release/1.3.x #14243

hc-github-team-nomad-core · 2022-08-23T18:30:17Z

Backport

This PR is auto-generated from #13658 to be assessed for backporting due to the inclusion of the label backport/1.3.x.

The below text is copied from the body of the original PR.

This changeset implements two improvements to restoring FSM snapshots from archives:

- The existing implementation decompresses the archive to a temporary file before reading it into the FSM. For large snapshots this performs a lot of disk IO. Stream-decompress the snapshot as we read it, without first writing to a temporary file. This also moves some of the work to a second core.
- Add bexpr filters to the RestoreFromArchive helper. The operator can pass these as -filter arguments to nomad operator snapshot state (and other commands in the future) to include only desired data when reading the snapshot.
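
The streaming idea in the first item can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: it assumes a plain gzip-compressed stream, and `streamDecompress` and `roundTrip` are hypothetical names. Decompression runs in its own goroutine and feeds an in-memory pipe, so no temporary file is written and the consumer can do its work on another core.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"io"
)

// streamDecompress (hypothetical name) wraps a gzip-compressed source in a
// reader that yields decompressed bytes on demand. The decompression runs in
// a separate goroutine and writes into an in-memory pipe, so nothing touches
// a temporary file.
func streamDecompress(src io.Reader) io.ReadCloser {
	pr, pw := io.Pipe()
	go func() {
		gz, err := gzip.NewReader(src)
		if err != nil {
			pw.CloseWithError(err)
			return
		}
		defer gz.Close()
		_, err = io.Copy(pw, gz)
		// A nil error here makes the read side see io.EOF.
		pw.CloseWithError(err)
	}()
	return pr
}

// roundTrip (hypothetical helper, for demonstration only) gzips s in memory
// and reads it back through streamDecompress.
func roundTrip(s string) (string, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write([]byte(s)); err != nil {
		return "", err
	}
	if err := zw.Close(); err != nil {
		return "", err
	}
	r := streamDecompress(&buf)
	defer r.Close()
	out, err := io.ReadAll(r)
	return string(out), err
}
```

In a real restore path the returned reader would be handed straight to whatever decodes the snapshot, instead of being drained with `io.ReadAll` as the demonstration helper does. Closing the reader early unblocks the decompression goroutine, which then exits on its next write.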

Deferred for this PR: the nomad operator snapshot state command still has to load everything that's been filtered into the FSM before writing it out to a large JSON blob. We should provide a tool that streams the decoded objects directly to an encoder without loading into the FSM, so that we can emit NDJSON, write out to a sqlite DB, etc.
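
A rough sketch of what such a streaming tool could look like, under loud assumptions: the input here is a stream of JSON objects rather than the real snapshot encoding, a plain Go predicate stands in for a bexpr filter, and `streamFilter` and `demo` are hypothetical names. The point is only the shape: decode one object, test it, encode it as one NDJSON line, and never hold the full state in memory.

```go
package main

import (
	"encoding/json"
	"io"
	"strings"
)

// streamFilter (hypothetical name) decodes concatenated JSON objects from r
// one at a time, keeps those for which keep returns true, and writes each
// kept object to w as a single NDJSON line. It returns the number of objects
// written.
func streamFilter(r io.Reader, w io.Writer, keep func(map[string]interface{}) bool) (int, error) {
	dec := json.NewDecoder(r)
	enc := json.NewEncoder(w) // Encode appends a newline after each value.
	n := 0
	for {
		var obj map[string]interface{}
		err := dec.Decode(&obj)
		if err == io.EOF {
			return n, nil
		}
		if err != nil {
			return n, err
		}
		if !keep(obj) {
			continue
		}
		if err := enc.Encode(obj); err != nil {
			return n, err
		}
		n++
	}
}

// demo (hypothetical helper) runs streamFilter over three made-up objects and
// keeps those that mention "job1" in either the JobID or ID field.
func demo() (string, error) {
	in := strings.NewReader(`{"ID":"a1","JobID":"job1"} {"ID":"a2","JobID":"job9"} {"ID":"job1"}`)
	var out strings.Builder
	_, err := streamFilter(in, &out, func(o map[string]interface{}) bool {
		return o["JobID"] == "job1" || o["ID"] == "job1"
	})
	return out.String(), err
}
```

Because each object is written as soon as it passes the predicate, memory use stays proportional to the largest single object rather than to the filtered result set. Swapping the encoder for a database writer would follow the same loop.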

Example:

Starting with a 439MB snapshot (~13GiB uncompressed), I want to filter for all objects associated with 3 different jobs and 3 different nodes:

time nomad operator snapshot state -filter '
    JobID == "job1" or
    JobID == "job2" or
    JobID == "job3" or
    NodeID == "3b3471d7-c519-8e3c-d7fd-dc692ca44744" or
    NodeID == "455775de-b4b4-0cb6-75eb-6c534618a005" or
    NodeID == "0d8e2a62-2712-cb4c-fb15-9831fdac57fe" or
    ID == "job1" or
    ID == "job2" or
    ID == "job3" or
    ID == "3b3471d7-c519-8e3c-d7fd-dc692ca44744" or
    ID == "455775de-b4b4-0cb6-75eb-6c534618a005" or
    ID == "0d8e2a62-2712-cb4c-fb15-9831fdac57fe"
' \
      ./nomad_operator_snapshot_save_2022_05_12_1543-0700.snap \
      > filtered-state.json

real    24m15.805s
user    29m55.036s
sys     7m50.618s

$ cat filtered-state.json| jq '.Allocs | length'
5490
$ cat filtered-state.json| jq '.Evals | length'
667

Previously this would write ~13GiB to disk, read 14GiB from disk, and saturate 1 core for over an hour before running out of memory on my machine (16GiB) and crashing.

With this change, the command reads ~450MiB from disk, only writes the 197MiB JSON blob to disk, and uses about 150% CPU, maxing out memory usage around 330MB.

github-actions · 2022-12-24T02:11:50Z

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

tgross added 4 commits July 8, 2022 20:16

backport of commit f7d7a13

ddc1a3f

backport of commit bc12749

557a432

backport of commit 5730498

e2031d0

backport of commit c34cb8e

22a8f7e

hc-github-team-nomad-core force-pushed the backport/snapshot-restore-filter/indirectly-oriented-hamster branch from e853704 to 22a8f7e August 23, 2022 18:30

hc-github-team-nomad-core merged commit 52879c4 into release/1.3.x Aug 23, 2022

hc-github-team-nomad-core deleted the backport/snapshot-restore-filter/indirectly-oriented-hamster branch August 23, 2022 18:30

vercel bot deployed to Preview – nomad-storybook-and-ui August 23, 2022 18:42

github-actions bot locked as resolved and limited conversation to collaborators Dec 24, 2022
