
kernel: nv_mem nv_get_p2p_free_callback:155 nv_get_p2p_free_callback -- invalid dma_mapping #111

Open
susol-hjkim opened this issue Dec 22, 2022 · 2 comments

Comments

@susol-hjkim

Hello,

This system rebooted unexpectedly.
I found the following log messages in /var/log/syslog from just before the reboot.

Dec 20 18:48:09 A100-42 kernel: nv_mem nv_get_p2p_free_callback:155 nv_get_p2p_free_callback -- invalid dma_mapping
Dec 20 18:48:09 A100-42 kernel: nv_mem nv_get_p2p_free_callback:155 nv_get_p2p_free_callback -- invalid dma_mapping

What do these log messages mean?
Are they related to the unexpected reboot?
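
One way to check how closely these messages precede the reboot is to pull their timestamps out of syslog. The following is only a minimal sketch, assuming the syslog line format shown in the two lines above; the names `find_p2p_free_errors` and `scan_file` are hypothetical helpers, not part of any NVIDIA or Mellanox tool.

```python
import re

# Matches syslog lines of the form shown in this issue, e.g.
# "Dec 20 18:48:09 A100-42 kernel: nv_mem nv_get_p2p_free_callback:155 ... -- invalid dma_mapping"
PATTERN = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+kernel:.*"
    r"nv_get_p2p_free_callback.*invalid dma_mapping"
)


def find_p2p_free_errors(lines):
    """Return a (timestamp, host) tuple for every matching line."""
    hits = []
    for line in lines:
        match = PATTERN.match(line)
        if match:
            hits.append((match.group("ts"), match.group("host")))
    return hits


def scan_file(path="/var/log/syslog"):
    """Hypothetical convenience wrapper: scan a syslog file on disk."""
    with open(path, errors="replace") as handle:
        return find_p2p_free_errors(handle)
```

Calling `scan_file()` returns the timestamps of every occurrence, which can then be compared with the reboot time reported by `last -x reboot`.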

[ENV]
OS: ubuntu 20.04
Kernel : 5.4.0-42-generic
H/W : Supermicro AS-4124GO-NART (like DGX A100)

[GPU : 8 units]
NVIDIA A100-SXM4-80GB
Driver Version : 470.103.01
CUDA Version : 11.4

[IB : 8 units]
OFED version : OFED-5.6.0.1.6.1
nv_peer_mem : v1.0
CA 'mlx5_0'
    CA type: MT4123
    Number of ports: 1
    Firmware version: 20.32.1010
    Hardware version: 0
    Node GUID: 0x08c0eb0300c8ff40
    System image GUID: 0x08c0eb0300c8ff40
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 200
        Base lid: 173
        LMC: 0
        SM lid: 233
        Capability mask: 0x2651e848
        Port GUID: 0x08c0eb0300c8ff40
        Link layer: InfiniBand

Thanks.

@drossetti

drossetti commented Mar 31, 2023

Is this with github/nv_peer_mem or R470/nvidia-peermem?
OFED-5.6.0.1.6.1 is no longer even available for download. Shouldn't you move to a 5.x LTS release?
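
To tell which of the two modules is actually loaded, one option is to look at `/proc/modules`. This is a rough sketch only; it assumes the kernel module names are `nv_peer_mem` (GitHub version) and `nvidia_peermem` (the driver-bundled one, with the hyphen shown as an underscore), and the function names are hypothetical.

```python
# Assumed kernel module names for the two GPUDirect RDMA peer-memory clients.
PEER_MODULES = ("nv_peer_mem", "nvidia_peermem")


def loaded_peer_modules(proc_modules_text):
    """Return the peer-memory module names found in /proc/modules-style text."""
    loaded = {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }
    return [name for name in PEER_MODULES if name in loaded]


def read_proc_modules(path="/proc/modules"):
    """Hypothetical helper: read the live module list on a Linux host."""
    with open(path) as handle:
        return handle.read()
```

On the affected host, `loaded_peer_modules(read_proc_modules())` should list whichever module is in use; `modinfo` on that name then reports its version.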

@drossetti

This may be due to #53
