[Bug] Indexer holding many SSH sessions open #142
Comments
We might be able to switch from SSH to HTTPS, by storing the token into
Or possibly:
Possibly, but I'm not entirely sure it'd fix the problem. It's probably just a side effect of the actual problem. GitPython calls git directly, so all of the git interactions, for better or worse, happen within git itself and are separate from Python.
Can we figure out which operation those processes were created to perform? Since we have so few new mods indexed per day, I think we can rule out
NetKAN-Infra/netkan/netkan/indexer.py Lines 212 to 213 in 614db33
OK, reading up on context managers, it sounds like the calling of
Not quite the case; we do perform one NetKAN-Infra/netkan/netkan/indexer.py Lines 247 to 250 in 614db33
In total, each batch of 10 messages looks like it would do:
... regardless of whether any of the modules were changed. We can probably eliminate the
Yeah, context managers are pretty neat. If we were leaning on threads I could see scenarios where we might trip ourselves up, but that isn't the case here. I have wondered whether we are tripping the abuse mechanisms and stalling the ssh connections: previously there were significant pauses between batches of 10, but with the Inflator improvements we can really rip through the indexing run.
Well. It would appear the issue is essentially identical after the last run. Which is interesting.
So whatever the problem is, it's going to be very obvious when it's no longer a problem!
Apparently related (I didn't find this, @techman83 did): https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/ Apparently git doesn't wait for ssh to finish, which leaves the ssh process as a zombie, and containers don't have a mechanism for cleaning zombies up.
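For anyone unfamiliar with the mechanism, here is a minimal sketch of how a child that nobody waits on becomes a zombie. It is not the indexer's code: it is Linux-only (it reads `/proc`), and the helper names `proc_state` and `demo_zombie` are made up for illustration. A short-lived Python child stands in for ssh.

```python
import os
import subprocess
import sys
import time


def proc_state(pid):
    """Return the one-letter process state from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The state field comes right after the parenthesised command name.
    return data.rsplit(")", 1)[1].split()[0]


def demo_zombie(timeout=5.0):
    """Start a short-lived child, observe it as a zombie, then reap it.

    Returns (state_before_reaping, proc_entry_exists_after_reaping).
    """
    # Hypothetical stand-in for the ssh child: exits almost immediately.
    child = subprocess.Popen([sys.executable, "-c", "pass"])
    deadline = time.monotonic() + timeout
    state = proc_state(child.pid)
    # Deliberately avoid child.poll()/child.wait() here: either would reap it.
    while state != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
        state = proc_state(child.pid)
    # Reaping: the parent collects the exit status, which frees the entry.
    child.wait()
    return state, os.path.exists(f"/proc/{child.pid}")


if __name__ == "__main__":
    before, still_listed = demo_zombie()
    print("state before wait():", before)
    print("still listed after wait():", still_listed)
```

The child stays in state `Z` until its parent collects the exit status. If the parent never does, and nothing running as PID 1 in the container reaps orphans, those entries accumulate.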
Also apparently related: aws/amazon-ecs-agent#852
Problem
Each run leaves many defunct ssh processes behind, which suggests that, however GitPython is being used, it isn't letting go of the process properly.
A single run
< 24 hours of uptime
The result is that over time the service starts thrashing the disk and eventually crashes.
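To track whether a fix actually helps, the defunct count could be measured after each run. This is a rough sketch, assuming a procps-style `ps` is available in the container; `count_defunct` and `current_defunct` are hypothetical helpers, not part of the indexer, and the sample output is made up for illustration.

```python
import subprocess


def count_defunct(ps_output, name="ssh"):
    """Count zombie processes called `name` in `ps -eo stat,comm` output."""
    count = 0
    for line in ps_output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        stat, command = parts
        # A STAT value starting with "Z" marks a zombie (defunct) process.
        if stat.startswith("Z") and command.split()[0] == name:
            count += 1
    return count


def current_defunct(name="ssh"):
    """Run ps and count defunct processes (assumes a procps-style ps)."""
    result = subprocess.run(
        ["ps", "-eo", "stat,comm"],
        capture_output=True, text=True, check=True,
    )
    return count_defunct(result.stdout, name)


# Illustrative sample only, not captured from the real service.
SAMPLE = """STAT COMMAND
Ss   python
Z    ssh <defunct>
Z    ssh <defunct>
S    ssh
Z    git <defunct>
"""

if __name__ == "__main__":
    print(count_defunct(SAMPLE))
```

Logging this number at the end of each indexing run would make it obvious whether the count grows run over run or stays flat.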