Flask/DevOps: Intermittent outages #950

Open
cklamann opened this issue Nov 24, 2021 · 2 comments

@cklamann
Contributor

Users have observed intermittent outages of the STAGER backend, with the server returning a 504. Users report that these outages started a few weeks ago and weren't observed previously. Anecdotally, it seems that concurrency might be a factor, but we'll probably need additional logging to know more.
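As one possible starting point for the additional logging mentioned above, here is a minimal sketch of a WSGI middleware that records how long each request takes and how many requests were in flight when it started. It uses only the standard library. The class name `RequestTimingMiddleware`, the logger name, and the one-second threshold are hypothetical and not taken from the STAGER codebase.

```python
import logging
import threading
import time

logger = logging.getLogger("request_timing")  # hypothetical logger name


class RequestTimingMiddleware:
    """Log the duration of each request and the in-flight count at its start."""

    def __init__(self, app, slow_threshold_seconds=1.0):
        self.app = app
        self.slow_threshold_seconds = slow_threshold_seconds  # assumed default
        self._in_flight = 0
        self._lock = threading.Lock()

    def __call__(self, environ, start_response):
        # The counter is per process: each Gunicorn worker keeps its own count.
        with self._lock:
            self._in_flight += 1
            in_flight = self._in_flight
        started = time.perf_counter()
        try:
            # This times the call into the wrapped app. A streamed response
            # body that is consumed later is not included in the measurement.
            return self.app(environ, start_response)
        finally:
            elapsed = time.perf_counter() - started
            with self._lock:
                self._in_flight -= 1
            slow = elapsed >= self.slow_threshold_seconds
            logger.log(
                logging.WARNING if slow else logging.INFO,
                "%s %s took %.3fs (%d in flight at start)",
                environ.get("REQUEST_METHOD", "?"),
                environ.get("PATH_INFO", "?"),
                elapsed,
                in_flight,
            )
```

With Flask, a middleware like this is usually attached with `app.wsgi_app = RequestTimingMiddleware(app.wsgi_app)`, where `app` is the Flask application object. If slow requests cluster at high in-flight counts, that would support the concurrency theory. If they appear regardless of load, query efficiency becomes the more likely suspect.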

@kevinlul
Collaborator

#955

@kevinlul
Collaborator

Also somewhat related is #754, if query efficiency turns out to be a major contributor.

@kevinlul kevinlul self-assigned this Nov 29, 2021
kevinlul added a commit that referenced this issue Dec 10, 2021
This is to help gain insights for #950.
Leverage [prometheus-flask-exporter](https://github.com/rycus86/prometheus_flask_exporter), a high-level client atop Prometheus' Python client designed for Flask and Gunicorn.
Use `/tmp` as the multiprocess directory where metrics are temporarily written for the manager process. `/tmp` is always writable and is reset when the container is recreated, as required.
Add alternate gunicorn configuration for testing this metrics setup without going straight into production.

Integrate entrypoints into Dockerfile, now that stager-dev is no longer needed.
Disable pip cache when building images to reduce image size.
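For readers unfamiliar with this setup, the following is a rough sketch of what a Gunicorn configuration for prometheus-flask-exporter in multiprocess mode can look like, following the pattern in that library's README. It is not the configuration from the referenced commit. The metrics port `9200` and the file name are assumptions, and the multiprocess directory is expected to be set in the container environment before Gunicorn starts.

```python
# gunicorn_metrics.conf.py (hypothetical file name)
# Assumes the environment sets PROMETHEUS_MULTIPROC_DIR=/tmp before startup.
# Older prometheus_client releases read the lowercase name
# prometheus_multiproc_dir instead; check the installed version.
from prometheus_flask_exporter.multiprocess import GunicornPrometheusMetrics

METRICS_PORT = 9200  # assumed port, not taken from the commit


def when_ready(server):
    # Runs once in the manager process: serve metrics aggregated from all workers.
    GunicornPrometheusMetrics.start_http_server_when_ready(METRICS_PORT)


def child_exit(server, worker):
    # Clean up the metric files of a worker that has exited.
    GunicornPrometheusMetrics.mark_process_dead_on_child_exit(worker.pid)
```

The Flask application itself would then wrap its app object with `GunicornPrometheusMetrics(app)`. Keeping this in a separate Gunicorn config file matches the idea of testing the metrics setup without changing the production configuration.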
2 participants