Rewrite Dockerfile to run imports and healthcheck and run it on ECS #757

Closed
Mr0grog opened this issue Dec 17, 2021 · 2 comments
Comments

@Mr0grog
Member

Mr0grog commented Dec 17, 2021

There’s a Dockerfile in this repo, but it’s currently set up to run the diffing server, which we split off into a separate project a while back. The Dockerfile here is now defunct.

We should update this Dockerfile to run our other scripts: `wm import ia` and `ia_healthcheck`. (The image should be set up so you can provide a command to run either one.) We currently run these as cron scripts on a manually managed server, but it would probably be better to package them as a Docker image that runs on a schedule in ECS. That would mean less manual fiddling with the server (at least in theory).

Mr0grog added a commit to edgi-govdata-archiving/web-monitoring-ops that referenced this issue Feb 16, 2023
We used to run this in an EC2 instance with cron, but it's probably better managed along with everything else in k8s.

See also edgi-govdata-archiving/web-monitoring-processing#757.
@Mr0grog
Member Author

Mr0grog commented Feb 16, 2023

The healthcheck is now deployed to Kubernetes in edgi-govdata-archiving/web-monitoring-ops@624716a

Will configure the import job similarly tomorrow.

Mr0grog added a commit that referenced this issue Feb 16, 2023
To support running this job on an actual scheduled job runner that doesn't have persistent storage (see #757), we need to be able to store the unplaybackable cache in S3. You can now use 's3://' paths in the `--unplaybackable` option:

    wm import ia 'https://somewhere.com/' --unplaybackable 's3://bucket/unplaybackable.json'
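
For illustration only, here is a minimal sketch of how a cache file could be read from and written to either a local path or an `s3://` path. It is not the code from the referenced commit. The function names (`split_s3_path`, `load_cache`, `save_cache`) are hypothetical, and the sketch assumes the cache is a JSON document, that a missing cache should be treated as empty, and that `boto3` is available for the S3 case.

```python
import json
from urllib.parse import urlparse


def split_s3_path(path):
    """Return (bucket, key) for an 's3://bucket/key' path, else None."""
    parsed = urlparse(path)
    if parsed.scheme != 's3':
        return None
    return parsed.netloc, parsed.path.lstrip('/')


def load_cache(path):
    """Load a JSON cache from S3 or local disk.

    Returns an empty dict when the cache does not exist yet (an assumption
    made for this sketch, e.g. for the first run of a scheduled job).
    """
    location = split_s3_path(path)
    if location is None:
        try:
            with open(path, encoding='utf-8') as file:
                return json.load(file)
        except FileNotFoundError:
            return {}

    # Assumed dependency; imported lazily so local paths work without it.
    import boto3
    client = boto3.client('s3')
    bucket, key = location
    try:
        body = client.get_object(Bucket=bucket, Key=key)['Body'].read()
    except client.exceptions.NoSuchKey:
        return {}
    return json.loads(body)


def save_cache(path, data):
    """Write the cache as JSON to S3 or local disk, depending on the path."""
    serialized = json.dumps(data)
    location = split_s3_path(path)
    if location is None:
        with open(path, 'w', encoding='utf-8') as file:
            file.write(serialized)
        return

    import boto3
    bucket, key = location
    boto3.client('s3').put_object(
        Bucket=bucket, Key=key, Body=serialized.encode('utf-8'))
```

Branching on the URL scheme keeps local file paths working unchanged, so the same option can serve both the old cron setup and a job runner without persistent storage.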
Mr0grog added a commit to edgi-govdata-archiving/web-monitoring-ops that referenced this issue Feb 16, 2023
Instead of running the import job as a cron script on a random EC2 VM, run it as an actual CronJob in Kubernetes with everything else. This also cleans up the docs around jobs.

Work not visible here: created a new IAM account for jobs that can write to relevant S3 buckets, added ability to store cache files in S3 (edgi-govdata-archiving/web-monitoring-processing#849) since we have no persistent storage in Kubernetes.

Why do this now? See:
- edgi-govdata-archiving/web-monitoring#168
- edgi-govdata-archiving/web-monitoring-processing#757
Mr0grog added a commit to edgi-govdata-archiving/web-monitoring-ops that referenced this issue Feb 17, 2023
Instead of running the import job as a cron script on a random EC2 VM, run it as an actual CronJob in Kubernetes with everything else. This also cleans up the docs around jobs.

Why do this now? See:
- edgi-govdata-archiving/web-monitoring#168
- edgi-govdata-archiving/web-monitoring-processing#757

Work not visible here:
- Created a new IAM account for jobs that can write to relevant S3 buckets.
- Added ability to store cache files in S3 (edgi-govdata-archiving/web-monitoring-processing#849) since we have no persistent storage in Kubernetes.
@Mr0grog
Member Author

Mr0grog commented Feb 17, 2023

Well, Kubernetes configuration was kind of a mess, but it’s done. This is good to go. edgi-govdata-archiving/web-monitoring-ops#44

@Mr0grog Mr0grog closed this as completed Feb 17, 2023