proxy: increase timeout since raven can take lots of time to process requests #122
Conversation
Fix this timeout error in some Raven notebooks:

```
HTTPError: 504 Server Error: Gateway Time-out for url: https://pavics.ouranos.ca/twitcher/ows/proxy/raven/wps
```

This is just a work-around, since something is very wrong on our production host Boreas (a physical host with 128G ram and 48 logical cpu). During the test, Boreas was showing "load average: 6.35, 5.90, 4.33". For its hardware specs, it is basically idle.

The increased timeout was not needed on my test VM (10G ram, 2 cpu), medus.ouranos.ca (physical host with 16G ram, 16 logical cpu) or hirondelle.crim.ca (VM with 32G ram, 8 cpu).

Should fix Ouranosinc/raven#362 and Ouranosinc/raven#357.

Ping @moulab88 to take a look at Boreas. Just a wild guess: is it due for a reboot?

Ping @richardarsenault: can you retry the 2 broken Raven notebooks? I've already deployed this to prod.
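The work-around amounts to raising nginx's `proxy_read_timeout` (the directive named by the follow-up PR below) above its 60s default. A minimal sketch of the idea, assuming a typical reverse-proxy block in front of Twitcher; the location path, upstream name, and the 240s value are illustrative, not the literal diff:

```
# Illustrative nginx sketch only, not the actual config from this PR.
# proxy_read_timeout defaults to 60s; a Raven request that computes longer
# than that makes nginx abandon the upstream and return the 504 seen above.
location /twitcher/ {
    proxy_pass http://twitcher:8000;
    proxy_read_timeout 240s;  # raised well above the 60s default
}
```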
I think we should document the fact that any process running longer than this limit will fail.
Also, what happens on the PyWPS side: is the process killed, or does it become a zombie?
It just continues and returns later ... when it's too late. So no zombie and no accumulated queue.
@tlvu I confirm that the 2 notebooks (Ouranosinc/raven#362 and Ouranosinc/raven#357) now work without timing out!
The machine has been up for 13 days; it does not need a reboot. I will analyze incoming traffic in more detail.
@moulab88 Just to be clear, the traffic is not blocked. There is simply something that appears to slow down Raven responses from the client's point of view. There are quite a few possible reasons here.
Ping me if you need anything.
The OS firewall is disabled. I will just check whether there are dropped packets on the connections.
Sensible suggestion. Migration will take time; we have lots of images and data, and there might not be enough space. Given how complicated the migration is, we should probably find a way to test the idea without doing the full migration.
Forgot to mention /var (265 GB free); this space was reserved for this need.
We should also be using … Currently, it is not the case.
There is cleanup to do, and we'll need an even bigger partition!
Oh! It would be a good candidate for a new SSD/NVMe disk/partition.
With Raven, is there an issue?
This is an excellent question. There should be Raven notebooks using async mode, since issue Ouranosinc/raven#353 exists, but I have not seen the same "queue not cleared" problem with Raven! This is very odd: why does Finch have the problem and not Raven?
I don't know if we should read too much into that. Maybe Raven has not been exercised as intensively as Finch.
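For background on the async mode mentioned above: in WPS 1.0.0, an async Execute asks the server to store the response and lets the client poll a status document, so a slow process never holds the HTTP connection open long enough to hit the proxy's read timeout. A hedged sketch of such a request; the process identifier and inputs are placeholders, not an actual Raven process:

```
# Illustrative WPS 1.0.0 async Execute via KVP (placeholders, not a real run).
# storeExecuteResponse=true + status=true tell the server to reply immediately
# with a status-document URL that the client then polls for progress/results.
curl "https://pavics.ouranos.ca/twitcher/ows/proxy/raven/wps?service=WPS&version=1.0.0&request=Execute&identifier=SOME_PROCESS&storeExecuteResponse=true&status=true&DataInputs=SOME_INPUTS"
```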
proxy: proxy_read_timeout config should be configurable

We have a performance problem with the production deployment at Ouranos, so we need a longer timeout. Being an Ouranos-specific need, it should not be hardcoded as in the previous PR #122. The previous increase was sometimes not enough! The value is now configurable via `env.local`, like most other customizations. Documentation updated.

Timeout in Prod:
```
WPS_URL=https://pavics.ouranos.ca/twitcher/ows/proxy/raven/wps FINCH_WPS_URL=https://pavics.ouranos.ca/twitcher/ows/proxy/finch/wps FLYINGPIGEON_WPS_URL=https://pavics.ouranos.ca/twitcher/ows/proxy/flyingpigeon/wps pytest --nbval-lax --verbose docs/source/notebooks/Running_HMETS_with_CANOPEX_dataset.ipynb --sanitize-with docs/source/output-sanitize.cfg --ignore docs/source/notebooks/.ipynb_checkpoints

HTTPError: 504 Server Error: Gateway Time-out for url: https://pavics.ouranos.ca/twitcher/ows/proxy/raven/wps

=========== 11 failed, 4 passed, 1 warning in 249.80s (0:04:09) ===========
```

Pass easily on my test VM with very modest hardware (10G ram, 2 cpu):
```
WPS_URL=https://lvupavicsmaster.ouranos.ca/twitcher/ows/proxy/raven/wps FINCH_WPS_URL=https://lvupavicsmaster.ouranos.ca/twitcher/ows/proxy/finch/wps FLYINGPIGEON_WPS_URL=https://lvupavicsmaster.ouranos.ca/twitcher/ows/proxy/flyingpigeon/wps pytest --nbval-lax --verbose docs/source/notebooks/Running_HMETS_with_CANOPEX_dataset.ipynb --sanitize-with docs/source/output-sanitize.cfg --ignore docs/source/notebooks/.ipynb_checkpoints

=========== 15 passed, 1 warning in 33.84s ===========
```

Pass against Medus:
```
WPS_URL=https://medus.ouranos.ca/twitcher/ows/proxy/raven/wps FINCH_WPS_URL=https://medus.ouranos.ca/twitcher/ows/proxy/finch/wps FLYINGPIGEON_WPS_URL=https://medus.ouranos.ca/twitcher/ows/proxy/flyingpigeon/wps pytest --nbval-lax --verbose docs/source/notebooks/Running_HMETS_with_CANOPEX_dataset.ipynb --sanitize-with docs/source/output-sanitize.cfg --ignore docs/source/notebooks/.ipynb_checkpoints

=========== 15 passed, 1 warning in 42.44s ===========
```

Pass against `hirondelle.crim.ca`:
```
WPS_URL=https://hirondelle.crim.ca/twitcher/ows/proxy/raven/wps FINCH_WPS_URL=https://hirondelle.crim.ca/twitcher/ows/proxy/finch/wps FLYINGPIGEON_WPS_URL=https://hirondelle.crim.ca/twitcher/ows/proxy/flyingpigeon/wps pytest --nbval-lax --verbose docs/source/notebooks/Running_HMETS_with_CANOPEX_dataset.ipynb --sanitize-with docs/source/output-sanitize.cfg --ignore docs/source/notebooks/.ipynb_checkpoints

=========== 15 passed, 1 warning in 35.61s ===========
```

For comparison, a run on Prod without Twitcher (PR bird-house/birdhouse-deploy-ouranos#5):
```
WPS_URL=https://pavics.ouranos.ca/raven/wps FINCH_WPS_URL=https://pavics.ouranos.ca/twitcher/ows/proxy/finch/wps FLYINGPIGEON_WPS_URL=https://pavics.ouranos.ca/twitcher/ows/proxy/flyingpigeon/wps pytest --nbval-lax --verbose docs/source/notebooks/Running_HMETS_with_CANOPEX_dataset.ipynb --sanitize-with docs/source/output-sanitize.cfg --ignore docs/source/notebooks/.ipynb_checkpoints

HTTPError: 504 Server Error: Gateway Time-out for url: https://pavics.ouranos.ca/raven/wps

=========== 11 failed, 4 passed, 1 warning in 248.99s (0:04:08) ===========
```

A run on Prod without Twitcher and Nginx (direct hit on Raven):
```
WPS_URL=http://pavics.ouranos.ca:8096/ FINCH_WPS_URL=https://pavics.ouranos.ca/twitcher/ows/proxy/finch/wps FLYINGPIGEON_WPS_URL=https://pavics.ouranos.ca/twitcher/ows/proxy/flyingpigeon/wps pytest --nbval-lax --verbose docs/source/notebooks/Running_HMETS_with_CANOPEX_dataset.ipynb --sanitize-with docs/source/output-sanitize.cfg --ignore docs/source/notebooks/.ipynb_checkpoints

=========== 15 passed, 1 warning in 218.46s (0:03:38) ===========
```
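Presumably the override then lives in `env.local` like any other customization. A minimal sketch, assuming the variable is named `PROXY_READ_TIMEOUT_VALUE`; check `env.local.example` in the repo for the actual name and default:

```
# In env.local (sketch; variable name is an assumption based on this PR's
# intent, verify against env.local.example): raise nginx's proxy_read_timeout
# for slow WPS requests on this deployment only, instead of hardcoding it.
export PROXY_READ_TIMEOUT_VALUE="240s"
```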
@moulab88 Performance seems to have improved on prod Boreas; did you change something? See test results in Ouranosinc/PAVICS-e2e-workflow-tests#61
I did nothing on my side.