Issues starting after 3.5.1 upgrade #18496
Comments
This is also happening on Docker.
What can we do to work around this problem?
Yep - looks like the snap package updated to 3.5.1, and after a restart rocketchat-server is getting stuck at that same point. It isn't opening the listening TCP socket, so my reverse proxy gateway is responding with a 502 Bad Gateway error. Edit: Reverting the snap to 3.4.2 doesn't seem to help either. The server starts and then exits immediately without any indication of error. Further Edit: stracing the node process shows it is definitely talking to mongo at least - it looks vaguely like it is looping over rocket_notification_queue - just nothing else going on.
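Not from the original comment, but for anyone wanting to repeat those two checks on a snap install, a minimal sketch of the usual commands follows; how the node process shows up on your host (and thus how you find its PID) is an assumption here:

```bash
# Revert the snap to the previously installed revision (3.4.2 in this case).
sudo snap revert rocketchat-server

# Attach strace to the running node process (find the PID with e.g. `pgrep node`);
# network syscalls show whether it is still talking to MongoDB.
sudo strace -f -e trace=network -p <node-pid>
```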
I have exactly the same issue after upgrading to 3.5.1. The first restart of the docker containers after upgrading worked, but it gets stuck at the same point after the second restart.
Same as @cjpluss2k18, I restarted the second time to enable push notifications, but my rocketchat instance does not start after 7 minutes (... or half an hour for that matter) :/ EDIT: Now it progresses to this point and gets stuck here.
@sampaiodiego could we please get this triaged? Seems like a breaking issue :)
Same here.
Well, this is quite nasty, since the system worked for a while after upgrading. Now I'm forced to revert to an older backup from 2 days ago... which means data loss for everyone.. \o/
Same problem here since our snap installation automatically upgraded to 3.5.1 during the night. I've tried to revert to a previous snap but the same issue persists. Still stuck at the SAUMonitor line.
For those curious - I popped into the community Rocket.Chat server snap channel, and based on the conversation going on there it appears this problem is caused by the Rocket.Chat server attempting to reach out to cloud.rocket.chat, which currently appears to be non-responsive in some fashion. Adding a firewall rule to block the outbound traffic - `iptables -I OUTPUT -d 51.81.11.138/24 -j REJECT` - appears to allow my instance to start normally, but presumably things like push notifications won't work. Obviously, I don't recommend using that as anything other than a work-around; see the commands below.
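A minimal sketch of that work-around, assuming the blocked address from the comment above is still the one in use (verify it before applying); the second command removes the rule once the cloud services respond again:

```bash
# Work-around: block outbound traffic to the unresponsive cloud endpoint.
iptables -I OUTPUT -d 51.81.11.138/24 -j REJECT

# Once cloud.rocket.chat is responsive again, delete the rule so push
# notifications and other cloud features work:
iptables -D OUTPUT -d 51.81.11.138/24 -j REJECT
```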
This work-around does work!
A proper fix will land in release 3.5.2. A Docker image will be available in a few minutes. A snap release will be available in a few hours.
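Once 3.5.2 is published, picking up the fix is the usual update path. The image name and tag below are the standard Docker Hub and snap names, shown as an illustration rather than taken from this thread:

```bash
# Docker: pull the fixed image, then recreate the container from it.
docker pull rocketchat/rocket.chat:3.5.2

# Snap: refresh to the latest published revision.
sudo snap refresh rocketchat-server
```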
I wondered why my server started right up on my test instance, which is not joined to the rocket.chat cloud or connected to its services. Makes perfect sense. @sampaiodiego, I noticed the PR you merged only defers the startup checks, but it doesn't seem to address the issue of trying to access cloud.rocket.chat while it is unresponsive.
My RocketChat server (running on a virtual machine with CentOS Linux release 7.8.2003 (Core) and the latest updates) just crashed today. I ran all OS updates (and updated MongoDB, NodeJS and RocketChat to the latest versions), but without success.
@bkraul it's not a case of the service being unavailable, but of the service not replying and keeping the connection waiting, which stalls the startup. We didn't realize those requests were making the startup wait, so the first solution was to move them to async. This PR changes the HTTP calls to always have a forced timeout of 20s and prevents those connections from hanging there forever. We are fixing our cloud services to enforce a timeout as well, to prevent this from happening again on old versions of Rocket.Chat or anywhere else using the cloud APIs.

Unfortunately, this was the first time we had this kind of problem; it seems to be related to our services and database connection making the requests wait forever. We are implementing all the necessary changes to prevent this from happening again.

Our cloud services have been back to their normal response time for a few minutes now, so if you restart your servers they should be back to work again. Thanks
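For admins who want to confirm that the cloud services are responding before restarting, a bounded curl from the server host is a quick check; the bare cloud.rocket.chat base URL below is an assumption and not necessarily the exact endpoint the server calls:

```bash
# Fail after 20 seconds (mirroring the forced timeout in the fix) instead of
# hanging forever; exit code 28 means the request timed out.
curl -sS -o /dev/null -w 'HTTP %{http_code} in %{time_total}s\n' \
  --max-time 20 https://cloud.rocket.chat/
```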
@rodrigok Thanks for the update. Sounds painful. :(
Description:
We successfully upgraded to 3.5.1 but when we later tried to restart Rocketchat it hung at
SAUMonitor ➔ info [aggregate] - Start Cron.
and progressed no further. Since this was in Kubernetes, the liveness probe eventually killed it, and each restart got stuck at the same point until, after 7 attempts, it eventually managed to progress and start correctly. It's possible that it would have succeeded with a longer liveness-check wait, but there's no way to verify that now as it seems to be fine.
See additional context for some background on why we restarted.
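Not part of the original report, but if you hit the same hang on Kubernetes, the repeated probe kills described above are visible with standard kubectl commands (pod and namespace names are placeholders):

```bash
# The RESTARTS column climbs each time the liveness probe kills the container.
kubectl get pods -n <namespace>

# The Events section shows "Liveness probe failed" / "Readiness probe failed"
# entries while the server is stuck at the SAUMonitor line.
kubectl describe pod <rocketchat-pod> -n <namespace>
```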
Steps to reproduce:
Not able to reproduce now, but it occurred upon starting a new Kubernetes pod.
Expected behavior:
Rocketchat starts up in a timely fashion.
Actual behavior:
Rocketchat container starts, gets to a certain point in the logs (see logs) and hangs before eventually being killed by Kubernetes due to failing liveness/readiness probes.
Server Setup Information:
Client Setup Information
Additional context
One point to note: the reason we restarted was to enable push notifications, and I'm not sure if there is any possible interaction here. We received the notice to enable them, actioned it accordingly, and then encountered this.
These are the liveness/readiness checks in Kubernetes which have served us well for a few years:
Relevant logs: