Replies: 6 comments
-
Hi! Since I am working on this topic specifically because of #273, I can think more clearly about this. I think there are several reasons for this:
When #274 is merged, the calculation will include idle time (based on sysinfo instead of procfs) and, hopefully, memory's share. You will still have a discrepancy, but it should match idle time's share. I am thinking of adding a metric for idle CPU time (which, if accounted for "as a process", could fill part of the gap). Together with metrics on RAM usage per process, this should help in understanding how power is allocated across processes. Does that make sense to you?
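To make the "idle time as a process" idea concrete, here is a minimal sketch. It assumes host power is split across processes in proportion to CPU time, which is a simplification and not necessarily how scaphandre computes it. The function name, process names and numbers are all hypothetical.

```python
def apportion_power(host_power_uw, cpu_seconds_by_process, idle_seconds):
    """Split host power (microwatts) in proportion to CPU time.

    Idle time is treated as a pseudo-process named "<idle>", so the
    shares always add up to the host power. Assumption: power scales
    linearly with CPU time, which is only an approximation.
    """
    total = sum(cpu_seconds_by_process.values()) + idle_seconds
    if total <= 0:
        raise ValueError("no CPU time recorded")
    shares = {
        name: host_power_uw * seconds / total
        for name, seconds in cpu_seconds_by_process.items()
    }
    shares["<idle>"] = host_power_uw * idle_seconds / total
    return shares


# Hypothetical 10 s window: 2 s of busy CPU time, 8 s idle, 20 W host power.
shares = apportion_power(
    20_000_000, {"firefox": 1.5, "postgres": 0.5}, idle_seconds=8.0
)
```

With these made-up numbers, the two real processes account for only a fifth of host power, and the "<idle>" entry holds the rest. That remainder is the gap described above.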
-
Hello,
-
Hi, I think idle time might be a more important reason than RAM, but I can't confirm it is the only reason for what you see. We have also seen different levels of discrepancy depending on hardware, so we could also simply have an issue in the implementation. I do hope we get more consistent results with the new implementation based on sysinfo. Would you like to help us test this new version once it's there and compare?
-
Hi,
-
Could you give the dev branch a try? You should see a greater difference than before between the sum of process power and host power when the machine is mostly idle. Using the --resources flag on the json exporter, you should be able to track the per-process CPU % as well. It seems to show that the gap is pretty consistent with the % of CPU time spent idle (the docker-compose at the root of the project includes a Grafana dashboard that helps see this correlation if needed). The scaph_domain_power_microwatts metric shows the power per RAPL domain (including the dram domain). As expected (at least on the machines tested so far), the power used by this domain is too low to explain the gap. I think idle time was really the metric we didn't account for properly before. I'd be glad to discuss this once you've tested it in your context.
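For anyone who wants to check this on their own output, here is a rough sketch of the comparison. The field names ("host", "consumers", "consumption") are assumptions about the shape of one report from the json exporter and should be checked against an actual result file. The sample values are made up.

```python
import json


def power_gap(report):
    """Return (gap, gap_share) for one report.

    gap is host power minus the sum of per-process power. A negative
    gap means the processes add up to more than the host.
    Assumed layout: {"host": {"consumption": ...},
                     "consumers": [{"consumption": ...}, ...]}
    """
    host = report["host"]["consumption"]
    attributed = sum(c["consumption"] for c in report["consumers"])
    gap = host - attributed
    gap_share = gap / host if host else 0.0
    return gap, gap_share


# Hypothetical report, values in microwatts.
sample = json.loads(
    '{"host": {"consumption": 20000000.0},'
    ' "consumers": [{"pid": 1, "consumption": 3000000.0},'
    '               {"pid": 2, "consumption": 1000000.0}]}'
)
gap, gap_share = power_gap(sample)
```

If the gap really comes from idle time, gap_share should stay close to the fraction of CPU time spent idle over the same interval.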
-
Hi! @hrexha28, if by any chance you could test the dev branch and give feedback, that would be great :) Closing for now and moving to discussion. Several changes impacting what you describe are coming in 1.0, along with documentation, so we could also continue the discussion then.
-
Bug description
Hi, if I use the tool to measure the power distribution of several processes with the json exporter, the sum of all process power is far greater than the host power. Sometimes even the power of a single process exceeds the host power.
To Reproduce
./scaphandre -t 10 -s 0 -n 100000000 -f result.json
Expected behavior
sum(all process power) <= host power
Screenshots
Environment