-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable/disable functionality for monitoring components #118
Conversation
Note to self - |
932214f
to
d68f450
Compare
Just rebased this PR to go cleanly on top of master |
d68f450
to
ce22bd7
Compare
I think this guy is ready to go now. After that we can finally make a proper PR for the NVIDIA GPU monitoring. The yak is shaved. |
In our case the class name can always be used Change indenting and lower case method names
New command line option "-m" than enables on/off (~mon = off) Can be given multiple times or comma separated
Remove debug printouts from option parsing
The "+MON" option was never implemented and doesn't really seem to be necessary (avoid having two ways to do things, it's just confusing)
c087e17
to
ac35d0c
Compare
Probably the user got something wrong, so we should print this warning (but we do not abort the job)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for this @graeme-a-stewart. I love the idea but I'm a bit conflicted w/ the set of rules we have. Please see below.
<< "[--monitor, -m mon] Enable or disable monitors:\n" | ||
<< " ~NAME disables that monitor\n" | ||
<< " NAME enables that monitor\n" | ||
<< " comma separation supported or specifing\n" | ||
<< " multiple times\n" | ||
<< " Special name '[~]all' sets default state\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love the idea but I'm a bit conflicted w/ the set of rules we have.
For example, if as a user I'd like to only monitor cpumon
, my initial reaction would be to execute prmon -m cpumon -- [command]
. However, doing so I end up getting everything (w/o any message) because I should've used a rather awkward prmon -m ~all,cpumon -- [command]
.
Can we do something like this instead:
- If no
-m
is provided collect everything (default as is now) - Don't allow mixing
~
and non-~
- If the default is
all
then one can turn a subset off things w/ a simple-m ~a,~b
- If the user provides
-m a,b
then only enable those monitors
- If the default is
In all cases, we better print exactly what's going to be monitored. Currently we only print a message saying wallmon
cannot be turned-off if the user tries to do so.
Then there are edge cases, which can be blamed on the user, but -m ~all,all
ends up collecting only wallmon
but -m all,~all
ends up collecting all (i.e. the options are not commutative). This can be eliminated if we don't allow mixing but at the very least perhaps one can print a warning in such cases. If a simple -m ~all
is provided (one can argue that's silly but) we can simply bail out printing a message that the user asked for monitoring nothing.
Also another issue is that, w/ this approach we're altering the orders of the columns in the text file. Now this is not really a problem on our side per se (for example the plotting script uses column names). However, ADC is making plots using these files and if they have hardcoded column numbers we should make sure they're aware. Technically it's someone else's problem but would be nice to give a heads up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like your proposal for the syntax - I was probably thinking too generally and then it does become a bit confusing (especially with all
being the implicit default).
The issue with ~all,all != all,~all
is just impossible to solve satisfactorily, but is avoided if we disallow mixing on/off arguments.
Re. the columns, we should check that they use column names as we can't really guaranttee any particular ordering.
I actually just thought of something that wouldn't fit in this scheme. If we were to develop any components where we did not want to have them enabled by default (it could happen, if something was expensive to monitor or required a special setup) then that wouldn't fit into this on/off scheme. |
Just thinking about this again, maybe we could add two switches, enable and monitor.
Maybe I am overthinking it (!), but I do now think that a better state for device level monitoring components would be disabled by default, because they only give sensible values on a machine used exclusively by the monitored job. e.g., I would consider adding device level GPU stats like i/o, but I would disable that by default. |
To be honest this is one of those things where I'd prefer a pragmatic approach. So far we've been monitoring everything and this hasn't been a problem really. From your preliminary results, the overhead that comes from the new GPU monitoring is practically negligible, too. Therefore, my inclination is to enable everything by default (device-level or otherwise - in the current implementation we don't treat these differently anyhow). I'd keep a single argument and allow the user to either specify what should be exclusively monitored ( |
I went back to this again, in the light of the discussion with Vince and this is now implemented in a much simpler way in #178 |
Closes #107
This PR builds off #115, so it's WIP until that is accepted.However, it's essentially ready to go in its own right.A new switch,
-m/--monitor
, is added that allows monitoring components to be enabled or disabled. The switch can be given multiple times and allows for a comma separated list of monitors to be given. If the name of the monitor is given the component is enabled. If the name is prefixed with~
then it is disabled. The special nameall
sets the default status (useful when given as~all
that switches all components off by default).The
wallmon
component cannot be disabled as it's needed by the main monitor loop.-m ~netmon
-m ~netmon,~iomon
-m ~netmon -m ~iomon
-m ~all,memmon,cpumon
-m ~all -m memmon,cpumon
As part of this PR
pidutils.cpp
is renamed toprmonutils.cpp
as it now contains generally utilities for the main prmon code (cf.utils.cpp
which is for the monitors).