Ecarton/cumulus 3751 s3 task #3910
base: feature/CUMULUS-3751
Conversation
* Jk/cumulus 3940 18.5.x (#3877)
  * Update message recovery/granule write logic to properly use esClient. This commit updates the following:
    - esClient is properly passed through api lambda/lib methods such that write-granules calls from process-s3-dead-letter-archive can pass in an instance of EsClient rather than relying on default per-granule object/client behavior
    - The API endpoint and related code are updated such that maxDbPool, concurrency, and batchSize are exposed as endpoint options, allowing user customization of tuning behavior for the DLA recovery tool
    - Minor typing/call fixes
  * Update Core to allow DLA recovery configuration. This commit updates archive/cumulus/example to pass through memory configuration options to the Fargate task definition
  * Add api performance test
  * Update docs/changelog
  * Update CHANGELOG and documentation
  * Update CHANGELOG
  * Fix linting
  * Fix units
  * Update dead letter archive feature doc
  * Update test spec
  * Update logging, make perf test script executable
  * Fix broken package.json ava exclusion configuration
  * Add zod parsing to dead letter endpoint
  * Update tf-modules/archive/async_operation.tf (Co-authored-by: jennyhliu <34660846+jennyhliu@users.noreply.github.com>)
  * Update tf-modules/archive/async_operation.tf (Co-authored-by: jennyhliu <34660846+jennyhliu@users.noreply.github.com>)
  * Address db pool configuration concern in PR
  * Update env config passthroughs/make log/docs consistent
  * Update tf-modules/archive/async_operation.tf (Co-authored-by: jennyhliu <34660846+jennyhliu@users.noreply.github.com>)
  * Update tf-modules/archive/async_operation.tf (Co-authored-by: jennyhliu <34660846+jennyhliu@users.noreply.github.com>)
  * Update per PR suggestion
  * Update concurrency defaults for consistency
  * Update startAsyncOperations to allow for optional container names
  * Update dead letter archive endpoint to specify new container name
  * Update API defaults/units to 30 to match system defaults
  * Fix defaults for endpoint tests
  * Add changed params to demonstrate payload handling
  * Update coverage metric. Updated code in this module doesn't significantly impact test coverage, other than increasing the denominator.
  * fixup
  * Update performance tests to match documented defaults
  (Co-authored-by: jennyhliu <34660846+jennyhliu@users.noreply.github.com>)
* Update docs to add variables.tf link to default values for new config options
* Minor/formatting edit
* Fix bad merge/remove invalid jsdoc param
* Minor edit/add space to variables file

Co-authored-by: jennyhliu <34660846+jennyhliu@users.noreply.github.com>
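The commit log above mentions exposing maxDbPool, concurrency, and batchSize as endpoint options with defaults of 30, parsed via zod. A dependency-free sketch of that validation, under the assumption that all three fields are optional positive integers defaulting to 30 (the exact zod schema and default for batchSize are not shown in this PR excerpt):

```typescript
// Hypothetical shape of the DLA recovery tuning options described in the
// commit log; the real endpoint parses these with zod.
interface RecoveryOptions {
  maxDbPool: number;
  concurrency: number;
  batchSize: number;
}

function parseRecoveryOptions(payload: Record<string, unknown>): RecoveryOptions {
  const readPositiveInt = (name: string, fallback: number): number => {
    const value = payload[name];
    if (value === undefined) return fallback;
    if (typeof value !== 'number' || !Number.isInteger(value) || value <= 0) {
      throw new TypeError(`${name} must be a positive integer`);
    }
    return value;
  };
  return {
    // Defaults of 30 per the "Update API defaults/units to 30" commit;
    // treat the batchSize default as an illustrative assumption.
    maxDbPool: readPositiveInt('maxDbPool', 30),
    concurrency: readPositiveInt('concurrency', 30),
    batchSize: readPositiveInt('batchSize', 30),
  };
}
```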
tasks/change-granule-collection-s3/src/update_cmr_file_collection.ts
  cmrObjects: { [granuleId: string]: Object },
  s3MultipartChunksizeMb?: number,
}): Promise<void> {
  const sourceGranulesById = keyBy(sourceGranules, 'granuleId');
One future-proofing thought. The duplicate granule work will very likely result in granules that are not unique by granuleId. Our datastore already doesn't enforce that uniqueness, just our API and ingest code. This task in context is fine, but we should be careful in the rest of the PR that we're not burying a related concern.
How else might a granule be uniquely identified in order to sync them? It will certainly be necessary in the duplicate_granule work to be able to do that. granuleId + collection is the obvious key, but it won't work here.
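The thread above is about what `keyBy(sourceGranules, 'granuleId')` does when granuleIds are not unique. A minimal sketch, using a simplified `Granule` shape (the real type lives in Cumulus's API types) and a hand-rolled equivalent of lodash's `keyBy`:

```typescript
// Simplified granule shape, an assumption for illustration.
interface Granule {
  granuleId: string;
  collectionId: string;
}

// Same semantics as lodash keyBy(granules, 'granuleId'): duplicate
// granuleIds silently collapse to the last entry seen.
function keyByGranuleId(granules: Granule[]): Record<string, Granule> {
  const byId: Record<string, Granule> = {};
  for (const g of granules) {
    byId[g.granuleId] = g;
  }
  return byId;
}

// Hypothetical composite key from the discussion: unique across duplicate
// granules, but unusable in this task because the collection itself is
// what's changing during the move.
function compositeKey(g: Granule): string {
  return `${g.granuleId}___${g.collectionId}`;
}
```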
  t.is(updated.Granule.Collection.VersionId, 'b');
});

test('updateCmrFileCollections updates Echo10Files when missing', (t) => {
Change test title
Also, if there's a granule flag but no collection, it's still writing that, right? Add test coverage if that's accurate.
});
});

test('updateCmrFileCollections updates umm meta file', (t) => {
It's not really updating a meta file, it's updating the passed in CMR meta object - update test title.
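As the comment notes, the function mutates the passed-in CMR metadata object rather than writing a file. A sketch of that behavior for the ECHO10 case, using a simplified metadata shape (the real parsed-XML object in Cumulus has more fields; names here mirror the `updated.Granule.Collection.VersionId` assertion in the test above):

```typescript
// Simplified ECHO10 metadata object, an assumption for illustration.
interface Echo10Object {
  Granule: {
    Collection?: { ShortName?: string; VersionId?: string };
    [key: string]: unknown;
  };
}

// Replaces the Collection block on the passed-in object in place and
// returns it; no file I/O happens here, which is why the review asks
// for the test title to say "object" rather than "file".
function updateEcho10Collection(
  cmrObject: Echo10Object,
  shortName: string,
  versionId: string
): Echo10Object {
  cmrObject.Granule.Collection = { ShortName: shortName, VersionId: versionId };
  return cmrObject;
}
```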
      }
    }
  },
  "oldGranules": {
Task: double check this matches before merging
} from '@cumulus/cmrjs';
import { runCumulusTask } from '@cumulus/cumulus-message-adapter-js';
import { s3 } from '@cumulus/aws-client/services';
import { BucketsConfig } from '@cumulus/common';
Fix @cumulus/common multi-import
done
18176c7
    Bucket: targetFile.bucket,
    Key: targetFile.key,
  }),
  { retries: 5, minTimeout: 2000, maxTimeout: 2000 }
I think we need to add logging to these retries, e.g. https://github.com/sindresorhus/p-retry
Nit: consider making the retry count and timeouts deploy-time configuration.
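With p-retry, the `onFailedAttempt` callback is the idiomatic hook for the logging requested above. A dependency-free sketch of the same behavior: a retry loop that logs each failed attempt with exponential backoff clamped between the min and max timeouts from the diff (option names here are illustrative, not Cumulus's actual helper):

```typescript
// Hypothetical option names mirroring the p-retry options in the diff above.
interface RetryOptions {
  retries: number;       // retries after the initial attempt, as in p-retry
  minTimeoutMs: number;
  maxTimeoutMs: number;
}

const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

async function retryWithLogging<T>(
  fn: () => Promise<T>,
  { retries, minTimeoutMs, maxTimeoutMs }: RetryOptions
): Promise<T> {
  for (let attempt = 1; ; attempt += 1) {
    try {
      return await fn();
    } catch (error) {
      if (attempt > retries) throw error;
      // The logging the review asks for: which attempt failed, why,
      // and how many retries remain.
      console.log(
        `Attempt ${attempt} failed (${(error as Error).message}); ` +
        `${retries - attempt + 1} retries left`
      );
      // Exponential backoff clamped to maxTimeoutMs, roughly what
      // p-retry does between minTimeout and maxTimeout.
      await sleep(Math.min(minTimeoutMs * 2 ** (attempt - 1), maxTimeoutMs));
    }
  }
}
```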
Co-authored-by: Jonathan Kovarik <Jkovarik@users.noreply.github.com>
* Add pRetry logging to all pRetry calls
* Add optional chaining to logstrings
Summary: CUMULUS-3751, just the S3 copy part
Addresses CUMULUS-3751: Move granules across collections
Changes
PR Checklist