If the supervisor process ends for any reason, Positron doesn't respond gracefully. All the sessions stop responding without any notifications or user feedback.
It isn't likely that the process will end on its own, but it could exit abruptly in rare circumstances:
- Something closes all terminals (since the supervisor runs in a terminal)
- An OOM killer or other system-level reaper forces the supervisor to terminate
- The supervisor itself panics and crashes due to an unexpected runtime condition
Steps to reproduce the issue:
Ensure the supervisor is enabled, then start a Python session. Get the supervisor's process ID by using Python's `os` module to read the parent process ID of the Python session:

```python
>>> import os
>>> os.getppid()
84461
```

Then, kill that process (e.g. `kill 84461` in a terminal).
Nothing appears to happen in the UI, but you can no longer execute code, the language servers stop working, you don't get diagnostics, etc.
Expected or desired behavior:
At the very least:
- A notification should be displayed to the user indicating that a crash has occurred, along with any output emitted by the supervisor that might be relevant (e.g. a backtrace)
- All affected sessions should enter the Exited state
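To have relevant supervisor output (such as a backtrace) on hand when the crash notification is shown, one option is to keep a bounded buffer of the most recent lines the supervisor printed. This is a minimal sketch that assumes the supervisor's output can be read line by line; the `SupervisorOutputLog` class, its `max_lines` parameter, and the message wording are hypothetical, not Positron's API.

```python
from __future__ import annotations

from collections import deque


class SupervisorOutputLog:
    """Keeps only the most recent lines of supervisor output (hypothetical helper)."""

    def __init__(self, max_lines: int = 50) -> None:
        # A deque with maxlen discards the oldest line once it is full.
        self._lines: deque[str] = deque(maxlen=max_lines)

    def append(self, line: str) -> None:
        self._lines.append(line.rstrip("\n"))

    def crash_message(self, exit_code: int | None) -> str:
        """Builds the text for a user-facing crash notification.

        `exit_code` is None when the exit status couldn't be determined.
        """
        status = "an unknown status" if exit_code is None else f"exit code {exit_code}"
        header = f"The supervisor exited unexpectedly with {status}."
        if not self._lines:
            return header
        return header + "\nLast supervisor output:\n" + "\n".join(self._lines)
```

Bounding the buffer keeps memory flat for a long-running supervisor while still capturing the tail of the output, which is where a crash backtrace would appear.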
More ambitiously, we could do better at crash recovery:
- The supervisor itself could run sessions in such a way that it could reattach after a crash
- We could attempt to restart all the sessions that were running before the crash
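Restarting the sessions that were running before the crash requires remembering enough about each one to start it again. A minimal sketch, assuming a hypothetical `SessionSpec` record captured while the sessions were alive and a hypothetical `start_session` callback (neither is Positron's actual API):

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SessionSpec:
    """What we'd need to recreate a session (hypothetical fields)."""
    session_id: str
    language: str
    working_dir: str


def restart_sessions(
    specs: list[SessionSpec],
    start_session: Callable[[SessionSpec], None],
) -> list[tuple[SessionSpec, Exception]]:
    """Tries to restart every session and returns the ones that failed."""
    failures: list[tuple[SessionSpec, Exception]] = []
    for spec in specs:
        try:
            start_session(spec)
        except Exception as err:
            # Keep going so one bad session doesn't block the others.
            failures.append((spec, err))
    return failures
```

Collecting failures instead of stopping at the first error means the user can get most sessions back and be told specifically which ones need attention.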
Implementation-wise, if our websocket disconnects or an API call unexpectedly fails with a connection-refused error, we could check whether the supervisor's process ID is still running. Alternatively, we could add some sort of heartbeat for the supervisor itself.
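A local liveness check by process ID could look like the following minimal sketch. It is POSIX-only: on Windows, `os.kill` terminates the target process for any signal other than `CTRL_C_EVENT` and `CTRL_BREAK_EVENT`, so it can't be used as a probe there. `is_process_alive` is a hypothetical helper name.

```python
import os


def is_process_alive(pid: int) -> bool:
    """POSIX-only check: signal 0 runs the kernel's error checks but sends nothing.

    Caveat: an exited process that its parent hasn't reaped yet (a zombie)
    still passes this check.
    """
    if pid <= 0:
        # 0 and negative values address process groups, not a single process.
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # ESRCH: no such process
    except PermissionError:
        return True  # EPERM: the process exists but belongs to another user
    return True
```

Usage: after a websocket drop or a `ConnectionRefusedError`, call `is_process_alive(supervisor_pid)`; a `False` result means the supervisor is gone rather than temporarily unreachable.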
This change causes Positron to handle an abrupt or unexpected exit of
the supervisor process more gracefully. Before the change, the IDE
basically became unresponsive. After the change, the situation is still
Not Great, but it's recoverable; it's about the same as having all your
interpreters crash at once. You'll see some error popups, but you can
get back to a working state.
Because Positron has open websockets for all the active sessions, the
best clue that the supervisor process has been terminated is that these
sockets mysteriously all disconnect. So the main approach here is to
fire disconnect events from the sessions; if we get one of these for a
session that hasn't exited, we check the process table to see what's
going on.
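The disconnect-driven check can be sketched as follows. This is a minimal illustration, not Positron's implementation: `SupervisorWatcher`, `SessionState`, and the injected `is_supervisor_alive` and `notify` callbacks are hypothetical. The liveness check is passed in so the logic stays independent of how the process table is queried.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class SessionState(Enum):
    READY = "ready"
    BUSY = "busy"
    EXITED = "exited"


@dataclass
class SupervisorWatcher:
    """Reacts to session websocket disconnects (hypothetical names throughout)."""

    is_supervisor_alive: Callable[[], bool]  # e.g. a process-table lookup by PID
    notify: Callable[[str], None]  # shows a message to the user
    sessions: dict[str, SessionState] = field(default_factory=dict)

    def on_disconnect(self, session_id: str) -> None:
        # A socket closing for an unknown or already-exited session is expected.
        if self.sessions.get(session_id) in (None, SessionState.EXITED):
            return
        # Otherwise, consult the process table before assuming anything.
        if self.is_supervisor_alive():
            return
        # The supervisor is gone, so every session it hosted is gone too.
        for sid in self.sessions:
            self.sessions[sid] = SessionState.EXITED
        self.notify("The supervisor process exited; all sessions have stopped.")
```

Because every session is marked exited on the first detection, the disconnects that follow for the remaining sessions return early, so the user sees one notification rather than one per session.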
There are a couple of significant caveats:
- This approach only works when the supervisor process is local. We will
probably want to implement a mechanism that works for remote supervisors
too, if we add support for those.
- This approach won't help if the supervisor process is alive but
unresponsive. Though that state hasn't occurred in the wild so far, we
will eventually want to guard against it.
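A heartbeat is one way to catch the alive-but-unresponsive case that a process-table check misses. The sketch below assumes the supervisor can be pinged periodically (the ping itself isn't shown); `HeartbeatMonitor` and its `timeout_s` parameter are hypothetical.

```python
from __future__ import annotations

import time
from typing import Callable


class HeartbeatMonitor:
    """Flags a supervisor that is alive but has stopped answering (hypothetical)."""

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._timeout_s = timeout_s
        self._clock = clock
        self._last_beat = clock()

    def record_heartbeat(self) -> None:
        """Call whenever the supervisor answers a ping or sends a heartbeat."""
        self._last_beat = self._clock()

    def is_unresponsive(self) -> bool:
        """True once no heartbeat has arrived for longer than the timeout."""
        return self._clock() - self._last_beat > self._timeout_s
```

Taking the clock as a parameter keeps the monitor testable, and defaulting to `time.monotonic` avoids false alarms when the wall clock jumps.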
Addresses #5037.
### QA Notes
We don't expect supervisor exits to happen to anyone during the normal
course of events, and this change does not attempt to make a supervisor
exit a pleasant experience.
This change also includes a drive-by fix for an issue with running
notebook kernels under the supervisor.
---------
Signed-off-by: Jonathan <jonathan@posit.co>
Co-authored-by: sharon <sharon-wang@users.noreply.github.com>
System details:
Positron and OS details:
Any version with supervisor enabled.
Interpreter details:
Any.