
Support S3 for cache files #849

Merged

Conversation


@Mr0grog Mr0grog commented Feb 16, 2023

To support running this job on an actual scheduled job runner that doesn't have persistent storage (see #757), we need to be able to store the unplaybackable cache in S3. You can now use `s3://` paths in the `--unplaybackable` option:

    wm import ia 'https://somewhere.com/' --unplaybackable 's3://bucket/unplaybackable.json'
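The PR's actual implementation isn't shown here, but the idea can be sketched as a small dispatch on the cache path: treat `s3://` paths as bucket/key pairs and everything else as local files. This is a minimal sketch with hypothetical helper names, assuming `boto3` for S3 access; the real code in #849 may differ.

```python
import json
from urllib.parse import urlparse


def parse_cache_location(path):
    """Split a cache path into (bucket, key) for s3:// paths,
    or (None, path) for local filesystem paths.
    Hypothetical helper, not the PR's actual API."""
    parsed = urlparse(path)
    if parsed.scheme == 's3':
        return parsed.netloc, parsed.path.lstrip('/')
    return None, path


def load_unplaybackable_cache(path):
    """Read the JSON cache from S3 or the local filesystem."""
    bucket, key = parse_cache_location(path)
    if bucket:
        import boto3  # only needed when an s3:// path is used
        body = boto3.client('s3').get_object(Bucket=bucket, Key=key)['Body'].read()
        return json.loads(body)
    with open(key) as f:
        return json.load(f)


def save_unplaybackable_cache(cache, path):
    """Write the JSON cache back to S3 or the local filesystem."""
    bucket, key = parse_cache_location(path)
    data = json.dumps(cache)
    if bucket:
        import boto3
        boto3.client('s3').put_object(Bucket=bucket, Key=key, Body=data.encode())
    else:
        with open(key, 'w') as f:
            f.write(data)
```

With this shape, `--unplaybackable 's3://bucket/unplaybackable.json'` and `--unplaybackable 'unplaybackable.json'` go through the same load/save calls, which is what lets the job run on a host with no persistent storage.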
@Mr0grog Mr0grog merged commit 250c554 into main Feb 16, 2023
@Mr0grog Mr0grog deleted the sometimes-the-local-filesystem-is-not-a-very-good-cache branch February 16, 2023 17:54
Mr0grog added a commit that referenced this pull request Feb 16, 2023
Mr0grog added a commit to edgi-govdata-archiving/web-monitoring-ops that referenced this pull request Feb 16, 2023
Instead of running the import job as a cron script on a random EC2 VM, run it as an actual CronJob in Kubernetes with everything else. This also cleans up the docs around jobs.

Work not visible here: created a new IAM account for jobs that can write to relevant S3 buckets, added ability to store cache files in S3 (edgi-govdata-archiving/web-monitoring-processing#849) since we have no persistent storage in Kubernetes.

Why do this now? See:
- edgi-govdata-archiving/web-monitoring#168
- edgi-govdata-archiving/web-monitoring-processing#757
Mr0grog added a commit to edgi-govdata-archiving/web-monitoring-ops that referenced this pull request Feb 16, 2023
Mr0grog added a commit to edgi-govdata-archiving/web-monitoring-ops that referenced this pull request Feb 17, 2023
Instead of running the import job as a cron script on a random EC2 VM, run it as an actual CronJob in Kubernetes with everything else. This also cleans up the docs around jobs.

Why do this now? See:
- edgi-govdata-archiving/web-monitoring#168
- edgi-govdata-archiving/web-monitoring-processing#757

Work not visible here:
- Created a new IAM account for jobs that can write to relevant S3 buckets.
- Added ability to store cache files in S3 (edgi-govdata-archiving/web-monitoring-processing#849) since we have no persistent storage in Kubernetes.
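For context, running the import as "an actual CronJob in Kubernetes" means a manifest roughly like the following. This is a hypothetical sketch only: the name, image, schedule, and argument values are assumptions, not the actual web-monitoring-ops configuration, and the real job also needs the new IAM credentials wired in.

```yaml
# Hypothetical CronJob manifest; actual names, image, schedule,
# and credential wiring in web-monitoring-ops differ.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: import-ia          # assumed name
spec:
  schedule: "0 * * * *"    # assumed schedule
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: import-ia
              image: example/web-monitoring-processing:latest  # assumed image
              args:
                - wm
                - import
                - ia
                - "https://somewhere.com/"
                - --unplaybackable
                - "s3://bucket/unplaybackable.json"
```

Because pods here have no persistent volume, the `s3://` cache path from #849 is what makes the cache survive between scheduled runs.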