Detect thrashing using PSI #28

Open
hakavlad opened this issue Aug 4, 2019 · 1 comment

hakavlad commented Aug 4, 2019

IMHO PSI is probably the best metric for detecting thrashing.
https://lwn.net/Articles/759658/
https://facebookmicrosites.github.io/psi/

You could try using it instead of vmstat to detect thrashing.

PSI file example (/proc/pressure/memory):

some avg10=70.24 avg60=68.52 avg300=69.91 total=3559632828
full avg10=57.59 avg60=58.06 avg300=60.38 total=3300487258

Use the total metrics (cumulative stall time in microseconds) rather than the averages.
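As a rough illustration of working with the total fields, the sketch below parses the PSI text format shown above and turns two successive total readings into a stall fraction for the interval between them. The function names `parse_psi` and `stall_fraction` are hypothetical and not part of thrash-protect; the sample text is the example from this issue.

```python
def parse_psi(text):
    """Parse PSI file content into {"some": {...}, "full": {...}}."""
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        values = {}
        for field in fields:
            key, _, raw = field.partition("=")
            # "total" is an integer count of microseconds; avg* are percentages.
            values[key] = int(raw) if key == "total" else float(raw)
        result[kind] = values
    return result


def stall_fraction(prev_total_us, curr_total_us, interval_s):
    """Fraction of the interval spent stalled, from two 'total' readings."""
    return (curr_total_us - prev_total_us) / (interval_s * 1_000_000)


# Sample content taken from the example in this issue.
sample = (
    "some avg10=70.24 avg60=68.52 avg300=69.91 total=3559632828\n"
    "full avg10=57.59 avg60=58.06 avg300=60.38 total=3300487258\n"
)
parsed = parse_psi(sample)
```

In a real monitor one would read `/proc/pressure/memory` twice, some interval apart, and pass the two `total` values of the `full` (or `some`) line to `stall_fraction`. Any threshold for acting on the result would need tuning and is not suggested here.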

tobixen (Owner) commented Nov 27, 2019

This is interesting.

This sounds a bit like thrash-protect to me:

We wrote a <100 line POC python script to monitor memory
pressure and kill stuff way before such pathological thrashing leads
to full system losses that require forcible hard resets.

At some point I should look more into this and replace the current algorithm (based on swapin*swapout) with the new /proc/pressure/memory statistics. However, /proc/pressure is missing on most of the production servers I'm responsible for, so it will take a long time before the backward-compatible algorithm can be retired.
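One way to keep both approaches side by side is to probe for PSI at startup and fall back to swap counters when it is unavailable. The following is only a sketch of that idea, not thrash-protect's actual code: the names `psi_available`, `choose_backend` and `read_swap_counters` are hypothetical, and it assumes the swap counters are the `pswpin` and `pswpout` fields of `/proc/vmstat`.

```python
PSI_MEMORY_PATH = "/proc/pressure/memory"


def psi_available(path=PSI_MEMORY_PATH):
    """Return True if the PSI file can actually be read.

    Reading is attempted rather than only checking for existence,
    since a missing or unreadable file both raise OSError.
    """
    try:
        with open(path) as f:
            f.read()
        return True
    except OSError:
        return False


def choose_backend(path=PSI_MEMORY_PATH):
    """Pick "psi" when available, else the swap-counter fallback."""
    return "psi" if psi_available(path) else "swap_counters"


def read_swap_counters(vmstat_text):
    """Extract pswpin/pswpout from /proc/vmstat-style content."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters
```

With this shape, older servers without /proc/pressure keep using the existing algorithm unchanged, while newer kernels can be switched to PSI without a separate code path for detection.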

tobixen added a commit that referenced this issue Nov 29, 2019