39 - implement ancestry chunking memory optimization #93

Merged
merged 30 commits into main from ancestry-chunking-memory-optimization
Jan 16, 2024

Conversation

@alexdunnjpl (Contributor) commented Jan 12, 2024

🗒️ Summary

Implements a new chunked-processing paradigm for the ancestry sweeper to reduce the provisioned memory requirement, at the cost of some runtime speed and the need to provision enough disk space to perform swapping. The disk dump avoids the need to hold the full ancestry history in memory to generate AncestryRecords.

Performance tuning is available via the env vars ANCESTRY_NONAGGREGATE_QUERY_PAGE_SIZE (default 2000) and ANCESTRY_DISK_DUMP_MEMORY_THRESHOLD (expressed as a percentage, default 80). Further details are available in ancestry.runtimeconstants.py.
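
For orientation, a minimal sketch of how these env vars might be read as runtime constants (the actual definitions live in ancestry.runtimeconstants.py and may be structured differently):

```python
# Hypothetical sketch only - the real constants are defined in
# ancestry.runtimeconstants.py and may differ in structure and naming.
import os

# Page size used when querying non-aggregate products from the registry.
NONAGGREGATE_QUERY_PAGE_SIZE = int(
    os.environ.get("ANCESTRY_NONAGGREGATE_QUERY_PAGE_SIZE", "2000")
)

# Percentage of memory use at which accumulated ancestry state is dumped to disk.
DISK_DUMP_MEMORY_THRESHOLD_PERCENT = int(
    os.environ.get("ANCESTRY_DISK_DUMP_MEMORY_THRESHOLD", "80")
)
```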

Performance testing on en-prod showed that the task could drop from 1 vCPU and 4GB memory to 0.5 vCPU and 1GB memory (a cost saving of 42%), at the cost of a 31% increase in runtime. The proportional benefit is expected to be significantly higher for nodes with greater quantities of data, since higher memory allocations require more vCPUs, all but one of which sit idle, though this will require confirmation. Critically, this also removes the previous inability to scale to large quantities of products, which was imposed by the finite ability to increase provisioned RAM.

⚙️ Test Data and/or Report

Regression tests augmented, updated, and passing. Functional tests implemented for the new dump/merge behaviour to ensure production of correct update content. Manually tested against en-prod with a dry-run and shown to produce an update count equal to the pre-PR version. Performance-tested against en-prod.

N.B. Docs were failing due to a sphinx issue (locally and in GitHub Actions), but this is no longer the case.

♻️ Related Issues

fixes #39

Ops notes

I switched the ECS task definitions to use the tag stable rather than latest, thinking that latest would automatically reference the last-pushed image (and we don't want a dev pushing a development-related tag and causing all tasks to use it by default), but this appears not to be the case. Is latest a manual tag rather than something auto-generated? If so, the task definitions should be switched back to using latest. @sjoshi-jpl @jordanpadams ?

@sjoshi-jpl once this is deployed to ECR, I'd recommend the following approach to tuning provisioned memory/vCPUs:

  • provision 200GB of ephemeral ECS storage
  • cut vCPUs to 1, and memory to the lesser of current-value/2 and 8GB (the max permitted for 1 vCPU). For very small nodes, 0.25-0.5 vCPUs may be warranted
  • run the sweeper once - if it crashes, set env var ANCESTRY_DISK_DUMP_MEMORY_THRESHOLD=60 and try again (you should not have to cut any further)
  • when complete, look for a log line like 22:43:38,074::pds.registrysweepers.ancestry.generation::INFO::On-disk swap comprised of 3 files totalling 0.3GB
  • choose the provisioned ephemeral ECS storage value according to the total size logged (maybe current*2?)

If you're able to benchmark provisioned memory/vCPUs before/after and runtime before/after, awesome, but I understand that's annoying work.

  • now covers case where a collection with no shared member refs exists
  • only applied to some queries, thus far
  • initial test suggests it may take up to 20% longer, but this may not be the case where pagination is deep
  • missing stubs failure persists despite installation of psutil library stubs
  • this may be causing OOM condition where it should not occur
  • this is due to OOM exit on en-prod - either the estimation is off or the task containers are subject to a <100% kill threshold
@nutjob4life (Member) left a comment


Hi folks. This looks all right and I've approved this PR.

However, I'm always a little suspicious when a process is asking about its own virtual memory usage and is writing things to disk itself. This comes from the whole Squid vs Varnish days, when memory-mapping a giant file into memory turned out to be the smarter solution (the Varnish solution), because the operating system itself will always know how to manage a process' memory better than the process ever could (since its own logic would either be waking up pages that were already swapped out and/or would duplicate OS logic).

However, I do recognize that AWS changes constraints a bit. I sort of wish I understood the problem a bit better.

In any case, you got my 👍

Resolved (outdated) review comments on: src/pds/registrysweepers/provenance/__init__.py, src/pds/registrysweepers/repairkit/__init__.py, src/pds/registrysweepers/repairkit/allarrays.py
@alexdunnjpl (Contributor, Author)

@nutjob4life firm agreement in principle.

I sort of wish I understood the problem a bit better.

If you're happy to sanity-check: the initial problem is that we need to take a large set of collection-level documents, each containing references to all non-aggregate products present in that collection, and create non-aggregate-level records containing

  • all parent collections of that non-agg
  • all parent bundles of those collections

This (naively) requires keeping the set of all non-agg records in memory from the start of the collection-level document iteration, to ensure that a non-agg's history is complete before its update is sent off to the db (since reading, appending, then updating would require an infeasible quantity of db calls).
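
Concretely, each non-aggregate product ends up with a record shaped roughly like the following (a hypothetical illustration; the sweeper's actual AncestryRecord class may differ):

```python
# Hypothetical illustration of the record shape described above; not the
# sweeper's actual AncestryRecord implementation.
from dataclasses import dataclass, field
from typing import Set


@dataclass
class NonAggregateAncestry:
    lidvid: str  # the non-aggregate product this record describes
    parent_collection_lidvids: Set[str] = field(default_factory=set)  # all parent collections
    parent_bundle_lidvids: Set[str] = field(default_factory=set)  # all parent bundles of those collections
```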

So my solution was to say "okay, let's write the current page of non-agg history to disk when it gets too large, then merge the histories together later in a way which doesn't require holding it all in memory." There is now additional motivation to make full use of the provisioned memory, since it's a bottleneck for execution time, in addition to costing more to provision if memory use is unnecessarily peaky.
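
A minimal sketch of that threshold-triggered dump, assuming psutil (mentioned elsewhere in this PR) is used for the memory check; the file format and layout here are illustrative only:

```python
import json
import os
import tempfile

import psutil  # assumed here for the memory check; not necessarily the sweeper's exact approach

DISK_DUMP_MEMORY_THRESHOLD_PERCENT = int(
    os.environ.get("ANCESTRY_DISK_DUMP_MEMORY_THRESHOLD", "80")
)


def maybe_dump_to_disk(history: dict, dump_dir: str) -> dict:
    """If memory use exceeds the configured threshold, write the accumulated
    non-agg ancestry history to a file and return a fresh, empty dict."""
    if psutil.virtual_memory().percent < DISK_DUMP_MEMORY_THRESHOLD_PERCENT:
        return history

    fd, path = tempfile.mkstemp(suffix=".json", dir=dump_dir)
    with os.fdopen(fd, "w") as f:
        json.dump({lidvid: sorted(parents) for lidvid, parents in history.items()}, f)
    return {}  # start accumulating a fresh chunk in memory
```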

The problem, then, is that a large proportion of the memory demand (let's say 100%, for simplicity's sake) is due to the need to take dict-like chunks of size S and then:

  1. designate a chunk "active"
  2. for each other chunk, remove its values for all keys which are present in the active chunk and merge them into the active chunk
  3. now that the active chunk is known to contain complete values for all of its keys, send it off to the db as part of a _bulk write
  4. rinse and repeat for the remaining chunks

Ideally, the amount of memory required should be ~2S, since two pages are loaded simultaneously.

What ends up happening is that additional data is accumulated into the active chunk, resulting in (2S + merged_data) memory use... my solution was to split the chunks in half when they got larger than the largest non-merged chunk. Kinda brainless, but it works.

Finally, the manual garbage collection. At this point, the memory usage spikes to ~3S for a split second right as a new inactive chunk is loaded from disk for merging. I think what is happening is that the active and previous-inactive chunks are loaded (2S), then the third chunk starts loading before the GC releases the now-unneeded previous-inactive chunk. Manually calling del then gc.collect() successfully prevents this by triggering the release as a blocking call prior to loading the next chunk from disk.
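
Putting the pieces together, a rough sketch of the merge pass with the explicit release (hypothetical names and serialization; not the sweeper's actual code, and the chunk-splitting step is omitted):

```python
import gc
import json
from typing import Callable, Dict, List, Set


def load_chunk(path: str) -> Dict[str, Set[str]]:
    with open(path) as f:
        return {key: set(values) for key, values in json.load(f).items()}


def merge_and_flush(chunk_paths: List[str], bulk_write: Callable[[Dict[str, Set[str]]], None]) -> None:
    for i, active_path in enumerate(chunk_paths):
        active = load_chunk(active_path)  # the "active" chunk
        for other_path in chunk_paths[i + 1:]:
            other = load_chunk(other_path)
            # Pull values for keys already present in the active chunk, so the
            # active chunk ends up holding the complete history for its keys.
            for key in active.keys() & other.keys():
                active[key] |= other.pop(key)
            # Persist the shrunken remainder back to disk for later passes.
            with open(other_path, "w") as f:
                json.dump({k: sorted(v) for k, v in other.items()}, f)
            # Explicitly release the inactive chunk before loading the next one,
            # keeping peak usage near two chunks rather than briefly three.
            del other
            gc.collect()
        bulk_write(active)  # active chunk is now complete - send as part of a _bulk update
        del active
        gc.collect()
```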

Possibly my understanding of what's going on is flawed. Almost certainly there's a better solution for this; I'm just not aware of one.

@alexdunnjpl alexdunnjpl merged commit f8ba507 into main Jan 16, 2024
2 checks passed
@alexdunnjpl alexdunnjpl deleted the ancestry-chunking-memory-optimization branch January 16, 2024 23:28