The webserver might stay running even though it cannot connect to RabbitMQ. Errors such as:
log_level=ERROR | log_timestamp=2023-05-30 05:49:29,365 | log_source=servicelib.rabbitmq:_connection_close_callback(26) | log_uid=None | log_msg=Rabbit connection closed with exception from amqp://******:******@staging_rabbit:PORT/?name=webserver_osparc-dalco-08-staging-simcore_staging_webserver-3_743:<Future finished exception=<ChannelNotFoundEntity: The client attempted to work with a server entity that does not exist: "NOT_FOUND - no queue 'amq_0x4b9e264ac393f09832b9b39b4addc5f5' in vhost '/'">>
Expected Behavior
The webserver retries establishing the connection to RabbitMQ a few times
If the webserver ultimately cannot connect to RabbitMQ, the webserver container dies gracefully with an error exit code and a clear log message. The webserver container will then be restarted by the container orchestrator
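A rough sketch of what this expected behavior could look like, in case it helps the discussion. It is not the webserver's actual startup code: `connect_with_retries`, `startup_exit_code`, and the `connect` callable are hypothetical names, and the attempt count and delays are arbitrary. It only uses the standard library, so `connect` stands in for whatever coroutine really opens the RabbitMQ connection.

```python
import asyncio
import logging
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")
log = logging.getLogger(__name__)


async def connect_with_retries(
    connect: Callable[[], Awaitable[T]],  # hypothetical: opens the RabbitMQ connection
    attempts: int = 5,                    # assumed value, not taken from the issue
    initial_delay: float = 1.0,
    backoff: float = 2.0,
) -> T:
    """Await `connect` until it succeeds or `attempts` tries have failed."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return await connect()
        except Exception as exc:  # a real service would catch only connection errors
            if attempt == attempts:
                raise ConnectionError(
                    f"RabbitMQ unreachable after {attempts} attempts"
                ) from exc
            log.warning(
                "RabbitMQ connection attempt %d/%d failed: %s", attempt, attempts, exc
            )
            await asyncio.sleep(delay)
            delay *= backoff  # exponential backoff between attempts
    raise AssertionError("unreachable")


async def startup_exit_code(
    connect: Callable[[], Awaitable[T]],
    attempts: int = 5,
    initial_delay: float = 1.0,
) -> int:
    """Return 0 if the connection was established, 1 otherwise."""
    try:
        await connect_with_retries(connect, attempts, initial_delay)
    except ConnectionError:
        log.critical("Giving up on RabbitMQ, exiting so the orchestrator can restart us")
        return 1
    return 0
```

The entrypoint would then do something like `sys.exit(asyncio.run(startup_exit_code(my_connect)))`, so that a non-zero exit code reaches the container orchestrator instead of the process staying up without a broker connection.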
Steps To Reproduce
Unclear, this happened as a result of a reboot of machines by the Z43 IT department. To the best of our knowledge oSparc is already robust to the restart of individual machines if they are restarted gracefully.
The error we see is actually the webserver re-connecting to RabbitMQ. The webserver is expected to use a so-called exclusive queue, which is set to be durable. This means that when RabbitMQ restarts it re-creates the exchanges and the queues. It seems this was not the case.
We need to go over the deployment policies of RabbitMQ to make it restartable, which is not the case now.
sanderegg
changed the title
Missing retries in the webserver: Connection to RabbitMQ
Missing docker volume for RabbitMQ: Connection to RabbitMQ
May 30, 2023
Is there an existing issue for this?
Which deploy/s?
No response
Anything else?
@matusdrobuliak66 is the main guy involved in tracing this issue