Bug: walreceiver did not restart after erroring out #8172
Comments
Did it, but the problem did not reproduce: walreceiver was restarted.
which should cause termination of the whole VM (not sure if k8s will restart).
I wonder if there is any proof that walreceiver actually died and was not restarted? I looked through the postmaster code but didn't find any obvious explanation for what could prevent a crashed walreceiver from being restarted.
I manually checked that there was no walreceiver running on the replica; here is the ps output: https://neondb.slack.com/archives/C04DGM6SMTM/p1719401779142989?thread_ts=1719394592.373479&cid=C04DGM6SMTM
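A manual check like the one above can also be scripted. The sketch below is not from the thread: it assumes the compute exposes `ps -eo pid,args` and that the walreceiver's process title starts with `postgres:` and contains the string `walreceiver`, as in recent PostgreSQL versions. The helper names `find_walreceiver` and `walreceiver_pids` are hypothetical.

```python
import subprocess
from typing import List


def find_walreceiver(ps_output: str) -> List[int]:
    """Return PIDs from `ps -eo pid,args` output whose command looks like a walreceiver."""
    pids = []
    for line in ps_output.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip the header line and anything malformed
        pid, args = parts
        # Assumption: the process title is "postgres: walreceiver ...".
        # Requiring the "postgres:" prefix avoids matching e.g. "grep walreceiver".
        if args.startswith("postgres:") and "walreceiver" in args:
            pids.append(int(pid))
    return pids


def walreceiver_pids() -> List[int]:
    """Run ps on the local machine and return the PIDs of any walreceiver processes."""
    result = subprocess.run(
        ["ps", "-eo", "pid,args"], capture_output=True, text=True, check=True
    )
    return find_walreceiver(result.stdout)
```

An empty list from `walreceiver_pids()` on a replica that should be streaming would match the situation reported here. It only shows the state at one moment, so it cannot tell a walreceiver that was never restarted from one that is restarting repeatedly.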
I failed to reproduce the problem by throwing a FATAL exception in walreceiver (I tried different places and frequencies).
We also haven't seen this in prod for a long time, but keeping it open for now.
Got an interesting case with one of the production read-only endpoints. Walreceiver errored out and died:
but then it did not start again.
https://neondb.slack.com/archives/C04DGM6SMTM/p1719394592373479
https://console.neon.tech/admin/regions/aws-eu-central-1/computes/compute-lingering-forest-a2yogi5o
Heikki suggested trying to reproduce it manually by adding elog(FATAL, "crashme") in walreceiver.