Telegraf - NVidia SMI plugin not working on Windows client #6573

Closed
dbu748 opened this issue Oct 24, 2019 · 6 comments · Fixed by #6639
Labels: area/nvidia, bug (unexpected problem or unintended behavior)

dbu748 commented Oct 24, 2019

Hi,

I'm trying to monitor NVidia GPUs on Windows client machines, but enabling the NVidia SMI plugin prevents Telegraf from running.

When I run nvidia-smi manually using the plugin's troubleshooting command (adapted for PowerShell):

.\nvidia-smi.exe --format="noheader,nounits,csv" --query-gpu="fan.speed,memory.total,memory.used,memory.free,pstate,temperature.gpu,name,uuid,compute_mode,utilization.gpu,utilization.memory,index,power.draw,pcie.link.gen.current,pcie.link.width.current,encoder.stats.sessionCount,encoder.stats.averageFps,encoder.stats.averageLatency,clocks.current.graphics,clocks.current.sm,clocks.current.memory,clocks.current.video"

I get the error

Field “encoder.stats.sessionCount” is not a valid field to query.

All three encoder stats (sessionCount, averageFps, averageLatency) are in fact "not valid". Once they are removed, the command outputs a result.
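
As a workaround sketch (not the plugin's actual behavior), a wrapper could read the rejected field name out of that error message and retry the query without it. The helper names below are hypothetical, and the message format is assumed to match the error quoted above, with either straight or curly quotes.

```python
import re

# Assumed message format, taken from the error quoted in this issue.
# Accepts straight or curly quotes around the field name.
_INVALID_FIELD = re.compile(
    r'Field\s+["“](?P<field>[^"”]+)["”]\s+is not a valid field to query'
)


def rejected_field(error_text):
    """Return the field name nvidia-smi rejected, or None if nothing matches."""
    match = _INVALID_FIELD.search(error_text)
    return match.group("field") if match else None


def drop_field(fields, bad):
    """Return a copy of fields without the rejected one, preserving order."""
    return [f for f in fields if f != bad]


def build_args(fields):
    """Build the argument list for a selective GPU query (hypothetical helper)."""
    return ["--format=noheader,nounits,csv", "--query-gpu=" + ",".join(fields)]


fields = ["fan.speed", "encoder.stats.sessionCount", "temperature.gpu"]
error = "Field “encoder.stats.sessionCount” is not a valid field to query."
fields = drop_field(fields, rejected_field(error))
print(build_args(fields))
```

Because nvidia-smi reports one invalid field at a time here, a real wrapper would have to repeat this until the query succeeds or no fields remain.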

OS: Windows 7
Telegraf: 1.12.3
nvidia-smi: 376.33
GPU: Quadro 4000

danielnelson added the bug (unexpected problem or unintended behavior) label Oct 24, 2019
@danielnelson
Contributor

Can you run these commands and attach the output:

nvidia-smi.exe --format="csv"
nvidia-smi.exe -q -u -x

Author

dbu748 commented Oct 25, 2019

Yes of course...

PS C:\Program Files\NVIDIA Corporation\NVSMI> .\nvidia-smi.exe --format="csv"
Invalid combination of input arguments. Please run 'nvidia-smi -h' for help.

According to the help, selective queries (using --format) must include one of the --query-xyz= options.
In case you were looking for the full output, here it is:

PS C:\Program Files\NVIDIA Corporation\NVSMI> .\nvidia-smi -q

==============NVSMI LOG==============

Timestamp                           : Fri Oct 25 09:22:24 2019
Driver Version                      : 376.33

Attached GPUs                       : 1
GPU 0000:05:00.0
    Product Name                    : Quadro 4000
    Product Brand                   : Quadro
    Display Mode                    : Enabled
    Display Active                  : Enabled
    Persistence Mode                : N/A
    Accounting Mode                 : N/A
    Accounting Mode Buffer Size     : N/A
    Driver Model
        Current                     : WDDM
        Pending                     : WDDM
    Serial Number                   : N/A
    GPU UUID                        : GPU-795b0a66-e968-2870-0c63-69189e2df04e
    Minor Number                    : N/A
    VBIOS Version                   : 70.00.67.00.02
    MultiGPU Board                  : No
    Board ID                        : 0x500
    GPU Part Number                 : N/A
    Inforom Version
        Image Version               : N/A
        OEM Object                  : N/A
        ECC Object                  : N/A
        Power Management Object     : N/A
    GPU Operation Mode
        Current                     : N/A
        Pending                     : N/A
    GPU Virtualization Mode
        Virtualization mode         : None
    PCI
        Bus                         : 0x05
        Device                      : 0x00
        Domain                      : 0x0000
        Device Id                   : 0x06DD10DE
        Bus Id                      : 0000:05:00.0
        Sub System Id               : 0x078010DE
        GPU Link Info
            PCIe Generation
                Max                 : 2
                Current             : 1
            Link Width
                Max                 : 16x
                Current             : 16x
        Bridge Chip
            Type                    : N/A
            Firmware                : N/A
        Replays since reset         : N/A
        Tx Throughput               : N/A
        Rx Throughput               : N/A
    Fan Speed                       : 37 %
    Performance State               : P1
    Clocks Throttle Reasons         : N/A
    FB Memory Usage
        Total                       : 2048 MiB
        Used                        : 228 MiB
        Free                        : 1820 MiB
    BAR1 Memory Usage
        Total                       : N/A
        Used                        : N/A
        Free                        : N/A
    Compute Mode                    : Default
    Utilization
        Gpu                         : 0 %
        Memory                      : 1 %
        Encoder                     : N/A
        Decoder                     : N/A
    Ecc Mode
        Current                     : N/A
        Pending                     : N/A
    ECC Errors
        Volatile
            Single Bit
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Texture Shared      : N/A
                Total               : N/A
            Double Bit
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Texture Shared      : N/A
                Total               : N/A
        Aggregate
            Single Bit
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Texture Shared      : N/A
                Total               : N/A
            Double Bit
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Texture Shared      : N/A
                Total               : N/A
    Retired Pages
        Single Bit ECC              : N/A
        Double Bit ECC              : N/A
        Pending                     : N/A
    Temperature
        GPU Current Temp            : 71 C
        GPU Shutdown Temp           : N/A
        GPU Slowdown Temp           : N/A
    Power Readings
        Power Management            : N/A
        Power Draw                  : N/A
        Power Limit                 : N/A
        Default Power Limit         : N/A
        Enforced Power Limit        : N/A
        Min Power Limit             : N/A
        Max Power Limit             : N/A
    Clocks
        Graphics                    : 405 MHz
        SM                          : 810 MHz
        Memory                      : 1404 MHz
        Video                       : 405 MHz
    Applications Clocks
        Graphics                    : N/A
        Memory                      : N/A
    Default Applications Clocks
        Graphics                    : N/A
        Memory                      : N/A
    Max Clocks
        Graphics                    : 475 MHz
        SM                          : 950 MHz
        Memory                      : 1404 MHz
        Video                       : 540 MHz
    Clock Policy
        Auto Boost                  : N/A
        Auto Boost Default          : N/A
    Processes
        Process ID                  : 5378
            Type                    : G
            Name                    : C:\Program Files (x86)\Google\Chrome\Application\chrome.exe
            Used GPU Memory         : Not available in WDDM driver model
        Process ID                  : 6332
            Type                    : G
            Name                    : C:\Windows\system32\Dwm.exe
            Used GPU Memory         : Not available in WDDM driver model

Finally, the unit query:

PS C:\Program Files\NVIDIA Corporation\NVSMI> .\nvidia-smi -q -u -x
<?xml version="1.0" ?>
<!DOCTYPE nvidia_smi_log SYSTEM "nvsmi_unit_v8.dtd">
<nvidia_smi_log>
        <timestamp>Fri Oct 25 09:12:34 2019</timestamp>
        <driver_version>376.33</driver_version>
        <hic_info>
        </hic_info>
        <attached_units>0</attached_units>
</nvidia_smi_log>

I hope it helps.

sjwang90 added the ready label Nov 4, 2019
Contributor

glinton commented Nov 6, 2019

It seems like one solution could be to refactor the nvidia-smi plugin to query nvidia-smi -q -x and parse the XML output.
@dbu748 I imagine it's not in the list, but can you check the output of nvidia-smi.exe --help-query-gpu? As I understand it, it prints the list of GPU properties available to query.
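
A minimal sketch of the XML approach, written in Python for brevity and not taken from the plugin itself: parse the document and treat N/A as a missing value instead of failing on it. The nvidia_smi_log, timestamp, and driver_version elements appear in the output posted above; the gpu element and everything under it in the sample are assumptions for illustration, not confirmed by this thread.

```python
import xml.etree.ElementTree as ET

# Sample document. Only the root, timestamp, and driver_version elements come
# from the thread; the <gpu> subtree is an ASSUMED shape for illustration.
SAMPLE = """<?xml version="1.0" ?>
<nvidia_smi_log>
    <timestamp>Fri Oct 25 09:12:34 2019</timestamp>
    <driver_version>376.33</driver_version>
    <gpu id="0000:05:00.0">
        <product_name>Quadro 4000</product_name>
        <fan_speed>37 %</fan_speed>
        <power_readings>
            <power_draw>N/A</power_draw>
        </power_readings>
        <temperature>
            <gpu_temp>71 C</gpu_temp>
        </temperature>
    </gpu>
</nvidia_smi_log>
"""


def leaf_values(element, prefix=""):
    """Flatten leaf elements into {dotted.path: text}, skipping N/A and empty values."""
    values = {}
    for child in element:
        path = prefix + child.tag
        if len(child):  # element has children, so descend into it
            values.update(leaf_values(child, path + "."))
            continue
        text = (child.text or "").strip()
        if text and text != "N/A":
            values[path] = text
    return values


root = ET.fromstring(SAMPLE)
print(root.findtext("driver_version"))
for gpu in root.iter("gpu"):
    print(gpu.get("id"), leaf_values(gpu))
```

With this shape, a metric the driver does not report is simply absent from the result, so an old driver yields fewer values rather than an error for the whole query.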

Author

dbu748 commented Nov 7, 2019

No, you're right, it's not in the list generated by nvidia-smi.exe --help-query-gpu. Unless you need it, I won't copy the whole output. That list really seems to be an explanation of all the metrics that nvidia-smi.exe -q outputs (shown in my last post).

At one point I imagined that a whitelist/blacklist of metrics in the plugin config could do it, but your solution will probably be more flexible and more broadly supported!
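
For comparison, the allow/deny list idea could look roughly like the sketch below. The function and parameter names are hypothetical and do not correspond to any existing plugin option.

```python
def select_fields(available, include=None, exclude=None):
    """Filter query fields: keep those in include (if given), then drop exclude."""
    exclude = set(exclude or [])
    selected = []
    for field in available:
        if include is not None and field not in include:
            continue
        if field in exclude:
            continue
        selected.append(field)
    return selected


all_fields = [
    "fan.speed",
    "temperature.gpu",
    "encoder.stats.sessionCount",
    "encoder.stats.averageFps",
    "encoder.stats.averageLatency",
]
# The three encoder stats are the fields this driver rejected.
unsupported = [f for f in all_fields if f.startswith("encoder.stats.")]
print(select_fields(all_fields, exclude=unsupported))
```

The drawback is that every machine with an older driver would need its own list, which is why parsing the full output is the more flexible option.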

Contributor

glinton commented Nov 8, 2019

@dbu748 the following build should work for you. You can try it out and verify that it collects all the data you expect.

Author

dbu748 commented Nov 11, 2019

@glinton: Thank you Greg, it seems to work great!
