-
Notifications
You must be signed in to change notification settings - Fork 108
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Jk/cumulus 3940 18.5.x #3877
Jk/cumulus 3940 18.5.x #3877
Conversation
This commit updates the following: - esClient is properly passed through api lambda/lib methods such that write granules calls from process-s3-dead-letter-archive can pass in an instance of EsClient rather than relying on default per-granule object/client behavior - The API endpoint and related code are updated such that maxDbPool, concurrency and batchSize are exposed as endpoint options, allowing user customization of tuning behavior for the DLA recovery tool - Minor typing/call fixes
This commit updates: - archive/cumulus/example to pass through memory configuration options to the fargate task definition
Updating to needs review - tests have passed manually, CI is having unrelated issues. |
} | ||
)); | ||
}), | ||
{ concurrency }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def appreciate a clear limit on concurrency
packages/api/performance_tests.sh
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not that this can't be run as a shell script, but why can't this performance test be done purely in js, for consistency and understandability
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for thinking about that - this part of the PR is entirely intended to be a starting point.
A couple of reasons, although the approach is certainly something we can dive into, I'm not particularly advocating for this setup:
- It works (a super weak reason, to be sure!)
- Most of our test scaffolding in CI and package.json is written using bash/shell
- As part of my attempts to profile the tests via AVA it became clear AVA itself isn't a fan of being included/directly invoked (they have an error throw in a couple of inconvenient spots if you try to hack around that), making that sort of scripting wind up being invoking node to... invoke node (via ava) which without wrap logic winds up being inefficient and YAGNI. I'm entirely open to suggestions, particularly with respect to future think on what we want to do to start a better pattern here
- It involves the least amount of backtracking if we change how we're doing it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking through this in more detail: we're adding some more intelligent parameter typing, we're bounding the processing to a limited concurrency (and associated resource limit in the DB) and testing that fact. all of this looks good and well though through w/r to the encountered issue
Co-authored-by: jennyhliu <34660846+jennyhliu@users.noreply.github.com>
Co-authored-by: jennyhliu <34660846+jennyhliu@users.noreply.github.com>
@jennyhliu thanks very much for the review 🙇🏻 - I believe I addressed the outstanding concerns/comments and adjusted the docs as well. @etcart / @jennyhliu Take a look and let me know what you think. |
@jennyhliu I believe 33be27a and 7412bee should address the issue you were seeing in test. Thanks for taking a look, let me know what you think! 🙇🏻 |
Updated code in this module doesn't significantly impact test coverage, other than increasing the denominator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good work👍 thank you!
* Update message recovery/granule write logic to properly use esClient This commit updates the following: - esClient is properly passed through api lambda/lib methods such that write granules calls from process-s3-dead-letter-archive can pass in an instance of EsClient rather than relying on default per-granule object/client behavior - The API endpoint and related code are updated such that maxDbPool, concurrency and batchSize are exposed as endpoint options, allowing user customization of tuning behavior for the DLA recovery tool - Minor typing/call fixes * Update Core to allow DLA recovery configuration This commit updates: - archive/cumulus/example to pass through memory configuration options to the fargate task definition * Add api performance test * Update docs/changelog * Update CHANGELOG and documentation * Update CHANGELOG * Fix linting * Fix units * Update dead letter archive feature doc * Update test spec * Update logging, make perf test script executable * Fix broken package.json ava exclusion configuration * Add zod parsing to dead letter endpoint * Update tf-modules/archive/async_operation.tf Co-authored-by: jennyhliu <34660846+jennyhliu@users.noreply.github.com> * Update tf-modules/archive/async_operation.tf Co-authored-by: jennyhliu <34660846+jennyhliu@users.noreply.github.com> * Address db pool configuration concern in PR * Update env config passthroughs/make log/docs consistent * Update tf-modules/archive/async_operation.tf Co-authored-by: jennyhliu <34660846+jennyhliu@users.noreply.github.com> * Update tf-modules/archive/async_operation.tf Co-authored-by: jennyhliu <34660846+jennyhliu@users.noreply.github.com> * Update per PR suggestion * Update concurrency defaults for consistency * Update startAsyncOperations to allow for optional container names * Update dead letter archive endpoint to specify new container name * Update API defaults/units to 30 to match system defaults * Fix defaults for endpoint tests * Add changed params to demonstrate payload handling * Updarte coverage metric Updated code in this module doesn't significantly impact test coverage, other than increasing the denominator. * fixup * Update performance tests to match documented defaults --------- Co-authored-by: jennyhliu <34660846+jennyhliu@users.noreply.github.com>
* Jk/cumulus 3940 18.5.x (#3877) * Update message recovery/granule write logic to properly use esClient This commit updates the following: - esClient is properly passed through api lambda/lib methods such that write granules calls from process-s3-dead-letter-archive can pass in an instance of EsClient rather than relying on default per-granule object/client behavior - The API endpoint and related code are updated such that maxDbPool, concurrency and batchSize are exposed as endpoint options, allowing user customization of tuning behavior for the DLA recovery tool - Minor typing/call fixes * Update Core to allow DLA recovery configuration This commit updates: - archive/cumulus/example to pass through memory configuration options to the fargate task definition * Add api performance test * Update docs/changelog * Update CHANGELOG and documentation * Update CHANGELOG * Fix linting * Fix units * Update dead letter archive feature doc * Update test spec * Update logging, make perf test script executable * Fix broken package.json ava exclusion configuration * Add zod parsing to dead letter endpoint * Update tf-modules/archive/async_operation.tf Co-authored-by: jennyhliu <34660846+jennyhliu@users.noreply.github.com> * Update tf-modules/archive/async_operation.tf Co-authored-by: jennyhliu <34660846+jennyhliu@users.noreply.github.com> * Address db pool configuration concern in PR * Update env config passthroughs/make log/docs consistent * Update tf-modules/archive/async_operation.tf Co-authored-by: jennyhliu <34660846+jennyhliu@users.noreply.github.com> * Update tf-modules/archive/async_operation.tf Co-authored-by: jennyhliu <34660846+jennyhliu@users.noreply.github.com> * Update per PR suggestion * Update concurrency defaults for consistency * Update startAsyncOperations to allow for optional container names * Update dead letter archive endpoint to specify new container name * Update API defaults/units to 30 to match system defaults * Fix defaults for endpoint tests * Add changed params to demonstrate payload handling * Updarte coverage metric Updated code in this module doesn't significantly impact test coverage, other than increasing the denominator. * fixup * Update performance tests to match documented defaults --------- Co-authored-by: jennyhliu <34660846+jennyhliu@users.noreply.github.com> * Update docs to add variables.tf link to default values for new config options * Minor/formating edit * Fix bad merge/remove invalid jsdoc param * Minor edit/add space to variables file --------- Co-authored-by: jennyhliu <34660846+jennyhliu@users.noreply.github.com>
* slightly misnamed * adding api-client tests + removing extraneous changes * naming fix * update to id name * small indexer change * pmap concurrency limited * reverting some changes * reduce memory size to see if it needs to scale down to fit * redefinition of workflow with a usable payload in * json typo * see if this is a helpful debug output * WIP * Jk/cumulus 3940 to main from #3877 (#3877) (#3890) * Jk/cumulus 3940 18.5.x (#3877) * Update message recovery/granule write logic to properly use esClient This commit updates the following: - esClient is properly passed through api lambda/lib methods such that write granules calls from process-s3-dead-letter-archive can pass in an instance of EsClient rather than relying on default per-granule object/client behavior - The API endpoint and related code are updated such that maxDbPool, concurrency and batchSize are exposed as endpoint options, allowing user customization of tuning behavior for the DLA recovery tool - Minor typing/call fixes * Update Core to allow DLA recovery configuration This commit updates: - archive/cumulus/example to pass through memory configuration options to the fargate task definition * Add api performance test * Update docs/changelog * Update CHANGELOG and documentation * Update CHANGELOG * Fix linting * Fix units * Update dead letter archive feature doc * Update test spec * Update logging, make perf test script executable * Fix broken package.json ava exclusion configuration * Add zod parsing to dead letter endpoint * Update tf-modules/archive/async_operation.tf Co-authored-by: jennyhliu <34660846+jennyhliu@users.noreply.github.com> * Update tf-modules/archive/async_operation.tf Co-authored-by: jennyhliu <34660846+jennyhliu@users.noreply.github.com> * Address db pool configuration concern in PR * Update env config passthroughs/make log/docs consistent * Update tf-modules/archive/async_operation.tf Co-authored-by: jennyhliu <34660846+jennyhliu@users.noreply.github.com> * Update tf-modules/archive/async_operation.tf Co-authored-by: jennyhliu <34660846+jennyhliu@users.noreply.github.com> * Update per PR suggestion * Update concurrency defaults for consistency * Update startAsyncOperations to allow for optional container names * Update dead letter archive endpoint to specify new container name * Update API defaults/units to 30 to match system defaults * Fix defaults for endpoint tests * Add changed params to demonstrate payload handling * Updarte coverage metric Updated code in this module doesn't significantly impact test coverage, other than increasing the denominator. * fixup * Update performance tests to match documented defaults --------- Co-authored-by: jennyhliu <34660846+jennyhliu@users.noreply.github.com> * Update docs to add variables.tf link to default values for new config options * Minor/formating edit * Fix bad merge/remove invalid jsdoc param * Minor edit/add space to variables file --------- Co-authored-by: jennyhliu <34660846+jennyhliu@users.noreply.github.com> * adding changes from code review * removing parent from esUpdate * commenting out some es temp. + some PR feedback for formatting * commenting out more ES * adding back in ES update after some changes * changes to get ES working + linting * chaging back from 2 to 50 in tests * PR feedback (zod) + fixing concurrency * decreasing num granules 50 -> 25 for test * removing res.send for batchPatch * adding back in res.send + lowering numbers * adding console.log for test * added more error logging * removing all console.log * bumping concurrency * lowering nums * WIP moving to some standard data for ease of maintenance * refactor of workflow spec to use common functions * PR feedback (1/6/25) * bumping up numGranules 5 -> 25 * bumping numGranules 25 -> 50 * correcting names * small formatting fix * fixing linting * CUMULUS-3965:Update launch templates to require IMDSv2 (#3894) * integration finally works * try these memory reservation * reduce memory footprints * corrected versioning * missed changes meant to rbing in * PR feedback minus (adding tests to db/lib + tests/endpoints) * missing funcs * get webpack in * import fix for test-granules endpoints test * PR feedback * PR feedback part-1 1/9/2025 * add doc inline * maybe too much mocking * de-nesting try's * much better mocking * PR feedback - remaining * fixing some jsdocs + PR feedback * WIP dependency cycles and cleanup in tests * fixing lint errors * changelog * adding CHANGELOG linking * removing extraneous string * fix to error checking from aws latest * typo fix in error check * reverting AWS changes in anticipation for cherry-pick and fixing a couple docstrings * correct new name for endpoints and functions * cleanup leftover api endpoitn from previous format * some typing cleanup * correct api return values and needed data file for int tests * api-client should expect 200 from test-granules.js * fix changelog and tasks.md extra entries * testing for ummg and iso * Update CHANGELOG.md Co-authored-by: jennyhliu <34660846+jennyhliu@users.noreply.github.com> * WIP still figuring out the right way to update the cmr file * move to using api to get granule data from ID * shift tests to using granuleId as input * test files have granule id payload instead of granule payload * sourceCollection in code and tests to allow update with details before collection * small format fix * config and linter errors * more linting * fix input configuration * double check my inputs * let me make sure the config is right * CUMULUS-3967 -- Forward port of release fix -- Pin @aws-sdk/client-s3 to 3.726 due to unit failure in > 3.729.0 (#3905) * Pin @aws-sdk/client-s3 to 3.726 due to unit failure in > 3.729.0 * Fixup * Fix dependency concern * Update CHANGELOG * think this needs to be corected * remove unnecessary eslints * moving to non-hardcoded buckets * CL for memory reservation * remove gitignore from task * appears to be workign bu tstill need to test cmr udates * remove unneeded import o CMRFile * accidentally commited mangled ts output * move in s3 functionality * name scheme change over to s3 specific * removing iso references along with linting and more naming * typo in copypasta * testing and fixing some issues encountered * fxed lambda integration. fixed naming in workflow * linter fixes * name cleanup and point to module.ingest * typo in config names * fixed naming in workflow * should get taks from module.cumulus * right name * small cleanyups for review * corrected configuration * getting cmrURLType and distribution url configured * cleaning up collections to name/version collection basebones * linting * add in tests for url replacement behavior * fix pathing in package * remove es from testing * how are there still more of these typos to find? * full resources by default * didn't mean to commit htat here * Update CHANGELOG.md Co-authored-by: Jonathan Kovarik <Jkovarik@users.noreply.github.com> * distribution_endpoint is not a config * fixing changelog entries * type isntead of interface * Update packages/aws-client/src/S3.ts Co-authored-by: jennyhliu <34660846+jennyhliu@users.noreply.github.com> * use copyObject low case * getting config values right * but *what* is undefined * shift to correctly using "granuleIds" input * update workflow definition and call * remove unnecessary etag return * migration of some code to cmr-utils * adding and updating tests * typo in asl * Move 'move_granule_collections_workflow' from example to /tf-modules/workflow (#3913) * move workflow over * fix s3 spec * WIP updating to use ingested granule * two typo fixes * 0one int test using ingest Granule to stage * let me check better what the outputs are * get this better * move integration tests to ingested "true" granule * I slightly hate this solution but * linting and naming * cleanup and cmr-utils nyc fix * fix isObject * expanded tests and nycconfig * linter fixes * fixes to cmr collection parsing and testing * debug log for weird collection version * Revert "debug log for weird collection version" This reverts commit 4bcd9f4. * corrected naming in worklfow spec * fixed int test cleanup * linting and int test cleanup * output config * cmrjs test fixes * smol test data cleanup * remove some unnedded tf changes * one more memory reversion * docstrings and ts-check * you can't fix all the typing in one pr * simplify the s3CredsEndpoint weirdness * pared down dependencies * more fix to dependencies * cleanup type hints in ts file * further type cleanup * cleanup some typing in S3.ts * mistake in typing changes * removing recursive cmr data search relying instead on spec * linting fixes * move from alter in place to copy and alter behavior * fix some of the Object typings * typing distributionBucketMap * some linting * descriptive docstrings * fixing some things in cmr-utils * coverage lowered with the removal of covered code * Update tasks/change-granule-collection-s3/src/index.ts Co-authored-by: Jonathan Kovarik <Jkovarik@users.noreply.github.com> * Update tasks/change-granule-collection-s3/src/index.ts Co-authored-by: Jonathan Kovarik <Jkovarik@users.noreply.github.com> * validating file collisions or not using checksums * retry behavior * remove unnecessary async/await * improved import structure/blocking * checking cmr data for belonging in correct collection in ase of a collision * shift to { [granuleId: Object] } clarity * DRY up the omit validApiFIle * move type declarations to separate file * linting * ditch proxyquire in test * Update tasks/change-granule-collection-s3/src/index.ts Co-authored-by: Jonathan Kovarik <Jkovarik@users.noreply.github.com> * more informative error message * invalidBehavior to invalidGranuleBehavior * simplify config values setup * concurrency back on * pr feedback * some pr updates * make sure metadata still gets updated even if its the same file * some formatting cleanup * added tests and refactor metadata vs regular file move needed * more PR feedback file validation, expanded testing, and linting * small doc change * cleaning up some tests * check testing copy vs move object in s3 * say "set" instead of "update" * tests cleanup and remove sns topic * linting fixes * Update ChangeGranuleCollectionS3 integration test to not require ingest workflow (#3918) * linting fixes * lowered coverage threshold * camelcase * handle undefined files * fix reduce in tests * aand linting * update internal workflow deployment scheme * update readme with correct configuration and output * fix Granule nesting questions in cmr xml and copy cmr metadata unless collision * pretries at bottom level * better return typing * improved debug outputs for bad file (no more parfing entire granules out) * check this out later * test for metadata integrity * fixes to undefined stuff * ok I may have been a little overzealous in cutting docstring params * test that files are not mixed up as they move * small typo * mistaken naming in tf variables * lets use a different id here just in case * tryna figure out how I get to fake processing with no granules * needs to calc filename not throw * ok, this needed to be folowed up on * fix fileName massage * linting fixes * clear synchronous callable gathering for pmap * lint fixes * Update tasks/change-granule-collection-s3/src/index.ts Co-authored-by: Jonathan Kovarik <Jkovarik@users.noreply.github.com> * cmr file update broken into two functions (#3919) * fix pipes in readme * cmr edge cases and tests * Add pRetry logging to all pRetry calls (#3920) * Add pRetry logging to all pRetry calls * Add optional chaining to logstrings * improved coverage * maybe 200 granules is unnecessarily slow * Fix broken integration test * Fix integration cleanup --------- Co-authored-by: Naga Nages <nnagavenkata@yahoo.com> Co-authored-by: Jonathan Kovarik <Jkovarik@users.noreply.github.com> Co-authored-by: jennyhliu <34660846+jennyhliu@users.noreply.github.com>
Summary: Summary of changes
Addresses CUMULUS-9340
Reviewers - please take a look at report linked in JIRA issue for more context.
Changes
Changed
cumulus
andarchive
module configuration to allow configuration of the dead_letter_recovery_operation task definition to better allow configuration of the tool's operating environment.batchSize
,concurrency
andmaxDbPool
options to /endpoints/recoverCumulusMessage (note these values are correct at time of this release only):batchSize
- specifies how many DLA objects to read from S3 and hold in memory. Defaults to 1000.concurrency
- specifies how many messages to process at the same time. Defaults to 30.maxDbPool
- specifies how many database connections to allow the process to utilize. Defaults to 50, value should target (concurrency
* expected granules per message)Fixed
process-s3-dead-letter-archive
and downstream calls to pass in a esClient towriteRecordsFunction
and update downstream calls to utilize the client.PR Checklist