add power_limit to nvidia_smi #15142

Closed
jpvlerbe opened this issue Apr 11, 2024 · 6 comments · Fixed by #15144
Labels
feature request (Requests for new plugin and for new features to existing plugins), waiting for response (waiting for response from contributor)

Comments

@jpvlerbe

Use Case

To gauge GPU usage, I would like to use the percentage of power used, that is, power_draw/power_limit from the output of nvidia-smi.

This seems to be a better metric for showing usage than either RAM usage or GPU usage.
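
As a rough illustration of the ratio described above, here is a minimal Python sketch. The function name and the sample wattages are hypothetical and only show the arithmetic; this is not something Telegraf provides.

```python
def power_usage_percent(power_draw_w: float, power_limit_w: float) -> float:
    """Return power draw as a percentage of the power limit (both in watts)."""
    if power_limit_w <= 0:
        raise ValueError("power_limit must be positive")
    return 100.0 * power_draw_w / power_limit_w


# Hypothetical values: a GPU drawing 175 W against a 350 W limit.
print(power_usage_percent(175.0, 350.0))  # 50.0
```

With both fields collected, the same division could presumably be done at query or dashboard time instead.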

Expected behavior

Include a power_limit field next to power_draw.

Actual behavior

Only power_draw is included.

Additional info

Or can this somehow be added through the configuration?

@jpvlerbe jpvlerbe added the feature request Requests for new plugin and for new features to existing plugins label Apr 11, 2024
@powersj
Contributor

powersj commented Apr 11, 2024

Hi,

Please provide the output of /usr/bin/nvidia-smi -q -x so we can see an example.

In v12 I see numerous power limit fields.

Thanks

@powersj powersj added the waiting for response waiting for response from contributor label Apr 11, 2024
@jpvlerbe
Author

I am referring to this metric
image

which I am assuming corresponds to current_power_limit

I tried fieldinclude=["current_power_limit"], but I am guessing that does not work because the field is not in the list in the README or in parser.go (it probably needs a line added after line 80?)

@telegraf-tiger telegraf-tiger bot removed the waiting for response waiting for response from contributor label Apr 11, 2024
@powersj
Contributor

powersj commented Apr 11, 2024

@jpvlerbe I need you to run the nvidia-smi command above and post the output. Telegraf does not parse that table output; it parses the output of that command. I need to see what is available in the data that Telegraf can see.

FWIW that 350W is the total cap value. Not something I would expect to see change.

@powersj powersj added the waiting for response waiting for response from contributor label Apr 11, 2024
@telegraf-tiger telegraf-tiger bot removed the waiting for response waiting for response from contributor label Apr 11, 2024
@jpvlerbe
Author

jpvlerbe commented Apr 11, 2024

```
$ nvidia-smi -q -x | grep power
<clocks_throttle_reason_sw_power_cap>Not Active</clocks_throttle_reason_sw_power_cap>
                    <clocks_throttle_reason_hw_power_brake_slowdown>Not Active</clocks_throttle_reason_hw_power_brake_slowdown>
            <power_readings>
                    <power_state>P8</power_state>
                    <power_management>Supported</power_management>
                    <power_draw>35.32 W</power_draw>
                    <power_limit>420.00 W</power_limit>
                    <default_power_limit>420.00 W</default_power_limit>
                    <enforced_power_limit>420.00 W</enforced_power_limit>
                    <min_power_limit>100.00 W</min_power_limit>
                    <max_power_limit>450.00 W</max_power_limit>
            </power_readings>
                    <clocks_throttle_reason_sw_power_cap>Active</clocks_throttle_reason_sw_power_cap>
                    <clocks_throttle_reason_hw_power_brake_slowdown>Not Active</clocks_throttle_reason_hw_power_brake_slowdown>
            <power_readings>
                    <power_state>P8</power_state>
                    <power_management>Supported</power_management>
                    <power_draw>28.49 W</power_draw>
                    <power_limit>350.00 W</power_limit>
                    <default_power_limit>350.00 W</default_power_limit>
                    <enforced_power_limit>350.00 W</enforced_power_limit>
                    <min_power_limit>100.00 W</min_power_limit>
                    <max_power_limit>365.00 W</max_power_limit>
            </power_readings>
```

I guess this is the v11 scheme, and the limit I am after is "power_limit"

I have 2 GPUs, by the way.
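
For reference, a minimal Python sketch of pulling power_draw and power_limit out of XML shaped like the output above and computing the ratio per GPU. The `<nvidia_smi_log>` and `<gpu>` wrapper elements and the `id` values in the sample are assumptions, since the grep dropped the surrounding structure; only the `power_readings` values come from the paste. The helper names are hypothetical, and this is not how Telegraf's parser.go is implemented.

```python
import xml.etree.ElementTree as ET

# Trimmed sample; wrapper elements and gpu ids are assumed, values are from the paste.
SAMPLE = """<nvidia_smi_log>
  <gpu id="gpu0">
    <power_readings>
      <power_draw>35.32 W</power_draw>
      <power_limit>420.00 W</power_limit>
    </power_readings>
  </gpu>
  <gpu id="gpu1">
    <power_readings>
      <power_draw>28.49 W</power_draw>
      <power_limit>350.00 W</power_limit>
    </power_readings>
  </gpu>
</nvidia_smi_log>"""


def watts(text):
    """Convert a reading such as '35.32 W' to a float; return None if unusable."""
    if not text:
        return None
    try:
        return float(text.split()[0])
    except ValueError:  # e.g. 'N/A'
        return None


def power_usage(xml_text):
    """Return a list of (draw_w, limit_w, percent) tuples, one per power_readings block."""
    results = []
    for readings in ET.fromstring(xml_text).iter("power_readings"):
        draw = watts(readings.findtext("power_draw"))
        limit = watts(readings.findtext("power_limit"))
        if draw is None or not limit:
            continue  # skip GPUs that do not report both values
        results.append((draw, limit, 100.0 * draw / limit))
    return results


for draw, limit, pct in power_usage(SAMPLE):
    print(f"{draw:.2f} W of {limit:.2f} W = {pct:.1f}%")
```

For the two GPUs above this works out to roughly 8.4% and 8.1% of the power limit.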

@powersj
Contributor

powersj commented Apr 11, 2024

> I guess this is the v11 scheme, and the limit I am after is "power_limit"

You are making this harder than it needs to be with all these edits ;) Simply copying and pasting the output without grep would have answered everything I need.

In about 30 minutes from this post, artifacts will be attached as a comment in #15144. Please download one of them and confirm that the new power_limit field shows up.

@powersj
Contributor

powersj commented Apr 11, 2024

Artifacts are now posted.
