Unable to start NVML integration #2382

Open
Julia-elsammak opened this issue May 15, 2024 · 3 comments

@Julia-elsammak
Julia-elsammak commented May 15, 2024

Output of the info page

When installing the NVML integration, I get the following error:

Loading Errors

nvml
----
  Core Check Loader:
    Check nvml not found in Catalog

  JMX Check Loader:
    check is not a jmx check, or unable to determine if it's so

  Python Check Loader:
    unable to import module 'nvml': No module named 'nvml'

Looking at the debug logs

2024-05-11 18:18:54 CST | CORE | DEBUG | (pkg/collector/python/loader.go:158 in Load) | Unable to load python module - datadog_checks.nvml: unable to import module 'datadog_checks.nvml': Traceback (most recent call last):
  File "/opt/datadog-agent/embedded/lib/python3.11/site-packages/datadog_checks/nvml/__init__.py", line 5, in <module>
    from .nvml import NvmlCheck
  File "/opt/datadog-agent/embedded/lib/python3.11/site-packages/datadog_checks/nvml/nvml.py", line 16, in <module>
    from .api_pb2 import ListPodResourcesRequest
  File "/opt/datadog-agent/embedded/lib/python3.11/site-packages/datadog_checks/nvml/api_pb2.py", line 25, in <module>
    _LISTPODRESOURCESREQUEST = _descriptor.Descriptor(
                               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/datadog-agent/embedded/lib/python3.11/site-packages/google/protobuf/descriptor.py", line 296, in __new__
    _message.Message._CheckCalledFromGeneratedFile()
TypeError: Descriptors cannot be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
 1. Downgrade the protobuf package to 3.20.x or lower.
 2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).

More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates

To fix this issue:

  • Use the NVIDIA DCGM Exporter:
    This is the recommended approach, as the integration is owned and supported by Datadog. Its documentation includes an example configuration that covers the same functionality as the NVML integration.
    NVIDIA DCGM Exporter: https://docs.datadoghq.com/integrations/dcgm/?tab=hostdocker#overview
  • Update the nvml check. The google/protobuf library isn't installed by the nvml check itself; it is packaged with the Datadog Agent, so the nvml check will need to be updated to resolve this issue. See the nvml manifest.json on GitHub.
  • Downgrade the Agent to v7.50.3. This may have started now because v7.51.0 of the Agent upgraded its bundled Python from 3.9 to 3.11, which would also have updated the included libraries such as google/protobuf.
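
To confirm that the bundled protobuf is what triggers the error, a quick check like the one below may help. It is only a sketch: the function names are made up for illustration, and the 3.20 cutoff is taken solely from the workaround text in the traceback ("Downgrade the protobuf package to 3.20.x or lower"), not from the nvml check or the Agent source.

```python
from importlib import metadata
from typing import Optional


def rejects_old_generated_code(protobuf_version: str) -> bool:
    """Return True if the version is newer than the 3.20.x line.

    Assumption: the traceback's hint means releases above 3.20.x refuse
    descriptors built by outdated _pb2.py files.
    """
    # Pad so that a bare "4" is treated as "4.0".
    parts = (protobuf_version.split(".") + ["0"])[:2]
    major, minor = int(parts[0]), int(parts[1])
    return (major, minor) > (3, 20)


def installed_protobuf_version() -> Optional[str]:
    """Return the installed protobuf version, or None if it is not installed."""
    try:
        return metadata.version("protobuf")
    except metadata.PackageNotFoundError:
        return None


def main() -> None:
    version = installed_protobuf_version()
    if version is None:
        print("protobuf is not installed in this interpreter")
    elif rejects_old_generated_code(version):
        print(f"protobuf {version}: old generated _pb2.py files will likely fail")
    else:
        print(f"protobuf {version}: old generated _pb2.py files should still load")


if __name__ == "__main__":
    main()
```

For the result to be meaningful, this needs to run with the Agent's embedded Python interpreter (the traceback shows its site-packages under /opt/datadog-agent/embedded/), not the system Python.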
@basilnsage

Tagging @cep21 and @cswatt, who have worked on this before. If you'd be so kind as to have a look, please.

@cep21
Contributor

cep21 commented May 18, 2024

All of those fixes seem reasonable. As Datadog officially supports the NVIDIA DCGM Exporter now, I've deprecated the nvml plugin internally. It may be best to mark it as deprecated here as well. Someone could also modify the plugin to refuse to install on newer Datadog Agent versions, but I won't have time to contribute this.

@tmart-ops

datadog-agent updates have broken this integration for me as well. I've been able to use the DCGM exporter, but it requires running the DCGM exporter container, which is less than ideal on a machine that doesn't run Docker.
