Desktop process stuck in state where all client requests return "connection refused" #1999

Open
RebeccaMahany opened this issue Dec 13, 2024 · 4 comments
Labels
bug-fixes Bug Fixes

Comments

@RebeccaMahany
Contributor

RebeccaMahany commented Dec 13, 2024

On Dec 11, a device began repeatedly reporting "connection refused" when talking to the /refresh endpoint of the desktop process for user 502. It stayed in this state through Dec 12, when launcher reported "failing" to send notifications to user 502.

Notifications did go through for user 501. @James-Pickett suggests that launcher didn't handle user switching appropriately. When looking through similar logs in Cloud Log, we noticed that this "connection refused" error appears pretty regularly when a user exists with a UID greater than 501 -- i.e. on devices with multiple user accounts.

We should investigate how devices are getting stuck in this state, and figure out an appropriate way to remediate the issue.

Example log:

{
    "time":"2024-12-11T18:52:00.121731Z",
    "level":"ERROR",
    "source":{
        "function":"github.com/kolide/launcher/ee/desktop/runner.(*DesktopUsersProcessesRunner).refreshMenu",
        "file":"/Users/runner/work/launcher/launcher/ee/desktop/runner/runner.go",
        "line":553
    },
    "msg":"sending refresh command to user desktop process",
    "component":"desktop_runner",
    "uid":"502",
    "pid":28623,
    "path":"/usr/bin/sudo",
    "err":"Get \"http://unix/refresh\": dial unix /var/kolide-k2/k2device.kolide.com/desktop_502/desktop.sock_4944: connect: connection refused"
}
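
For reference, the `err` value above has the shape Go's `net/http` produces when a client dials a Unix socket whose file still exists but has no listener behind it. The sketch below reproduces that condition in isolation. It is not launcher's code: `refresh`, `isRefused`, and `makeStaleSocket` are hypothetical names, and the temporary socket path is made up for the demo.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"net/http"
	"os"
	"path/filepath"
	"syscall"
	"time"
)

// refresh sends GET /refresh over the Unix socket at socketPath.
// The "unix" host in the URL is a placeholder; the dialer ignores it.
func refresh(socketPath string) error {
	client := &http.Client{
		Timeout: 2 * time.Second,
		Transport: &http.Transport{
			DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
				var d net.Dialer
				return d.DialContext(ctx, "unix", socketPath)
			},
		},
	}
	resp, err := client.Get("http://unix/refresh")
	if err != nil {
		return err
	}
	return resp.Body.Close()
}

// isRefused reports whether err wraps ECONNREFUSED, i.e. the socket file
// exists but nothing is accepting connections on it.
func isRefused(err error) bool {
	return errors.Is(err, syscall.ECONNREFUSED)
}

// makeStaleSocket (hypothetical helper) creates a socket file and then closes
// its listener without unlinking the file, leaving a stale socket behind.
func makeStaleSocket() (string, func(), error) {
	dir, err := os.MkdirTemp("", "desktop")
	if err != nil {
		return "", nil, err
	}
	cleanup := func() { os.RemoveAll(dir) }
	path := filepath.Join(dir, "desktop.sock")
	l, err := net.Listen("unix", path)
	if err != nil {
		cleanup()
		return "", nil, err
	}
	l.(*net.UnixListener).SetUnlinkOnClose(false)
	if err := l.Close(); err != nil {
		cleanup()
		return "", nil, err
	}
	return path, cleanup, nil
}

func main() {
	path, cleanup, err := makeStaleSocket()
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	defer cleanup()

	err = refresh(path)
	fmt.Println("err:", err)
	fmt.Println("connection refused:", isRefused(err))
}
```

If the socket file were missing entirely, the dial would fail with a "no such file or directory" error instead, so "connection refused" suggests the file outlived whatever was listening on it.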

@RebeccaMahany added the bug-fixes label on Dec 13, 2024
@directionless
Contributor

My hunch is that this is slightly misreported. I can't imagine that "connection refused" is itself a misreported error, but I could believe that, in addition to the 40+ connection refused errors, there are 40+ connections that should have been flagged as successes.

@James-Pickett
Contributor

James-Pickett commented Dec 13, 2024

I noticed that the UID was 502, which is typically the second user created on macOS. I wonder if some user switching is at play here and caused things to go haywire.

@RebeccaMahany
Contributor Author

We've solved at least some of the mystery:

  • launcher attempts to send the notification to all user processes -- here, we had a process for the 501 user and the 502 user
  • sending the notification to the 501 user succeeded
  • sending the notification to the 502 user failed
  • launcher counts that as a failure to notify overall -- and so launcher retried in one minute

We will update launcher to count the above scenario as successful rather than failed, which will fix the behavior that prompted filing this issue.
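
A minimal sketch of that change, assuming that "successful" means at least one user process received the notification. The names `sendFunc`, `notifyAll`, and `refusedFor` are hypothetical and do not reflect launcher's actual API. Treating an empty list of processes as an error is also an assumption of this sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// sendFunc is a hypothetical stand-in for delivering a notification to one
// user's desktop process.
type sendFunc func(uid string) error

// notifyAll attempts delivery to every uid. It returns nil if at least one
// delivery succeeded, and otherwise returns the joined per-user errors.
func notifyAll(uids []string, send sendFunc) error {
	var errs []error
	delivered := 0
	for _, uid := range uids {
		if err := send(uid); err != nil {
			errs = append(errs, fmt.Errorf("uid %s: %w", uid, err))
			continue
		}
		delivered++
	}
	if delivered > 0 {
		// Partial success: a real implementation would likely still log errs.
		return nil
	}
	if len(errs) == 0 {
		return errors.New("no desktop processes to notify")
	}
	return errors.Join(errs...)
}

// refusedFor returns a sendFunc that fails for the listed uids and succeeds
// for everyone else, simulating the 501/502 situation described above.
func refusedFor(failing ...string) sendFunc {
	return func(uid string) error {
		for _, f := range failing {
			if uid == f {
				return errors.New("connection refused")
			}
		}
		return nil
	}
}

func main() {
	send := refusedFor("502")
	fmt.Println("501 and 502:", notifyAll([]string{"501", "502"}, send))
	fmt.Println("502 only:", notifyAll([]string{"502"}, send))
}
```

With this rule, the 501/502 case no longer triggers the one-minute retry, while a device where every process refuses connections is still reported as a failure.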

I'm leaving the issue open because I think it's still useful to track down why devices get stuck in a state with all desktop process requests for a particular user being "connection refused" -- like James mentioned, maybe something went wrong with user switching, and we had a desktop process still extant that should've been cleaned up. I'll edit the title and description accordingly.

@RebeccaMahany changed the title from 'Desktop runner reports failure talking to desktop process despite requests making it through' to 'Desktop process stuck in state where all client requests return "connection refused"' on Dec 13, 2024
@directionless
Contributor

nice sleuthing
