
modprobe: ERROR: could not insert 'nv_peer_mem': Unknown symbol in module #84

Open
kramanella opened this issue Feb 12, 2021 · 5 comments

Comments

@kramanella

I'm trying to install nvidia_peer_memory-1.1-0.x86_64 on a RHEL 7.8 node with OFED 5.0-2.1.8.0 and am hitting a modprobe error:

[root@n120 ~]# modprobe -v nv_peer_mem
insmod /lib/modules/3.10.0-1127.19.1.el7.x86_64/extra/nv_peer_mem.ko
modprobe: ERROR: could not insert 'nv_peer_mem': Unknown symbol in module, or unknown parameter (see dmesg)

From dmesg:
...
[Fri Feb 12 11:47:20 2021] nv_peer_mem: Unknown symbol ib_register_peer_memory_client (err 0)
[Fri Feb 12 11:47:20 2021] nv_peer_mem: Unknown symbol ib_unregister_peer_memory_client (err 0)
[Fri Feb 12 11:49:38 2021] nv_peer_mem: Unknown symbol ib_register_peer_memory_client (err 0)
[Fri Feb 12 11:49:38 2021] nv_peer_mem: Unknown symbol ib_unregister_peer_memory_client (err 0)
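The dmesg log repeats the same two symbols once per load attempt. Below is a minimal Python sketch that collapses such a log into the unique set of unresolved symbols per module. It assumes the `<module>: Unknown symbol <name> (err N)` line format shown above, and the function name is illustrative only:

```python
import re
from collections import defaultdict

# Matches the "<module>: Unknown symbol <name>" part of a dmesg line.
# Any leading "[timestamp]" prefix is skipped because re.search scans the line.
UNKNOWN_SYMBOL = re.compile(r"(?P<module>[\w-]+): Unknown symbol (?P<symbol>\w+)")


def unknown_symbols(dmesg_text: str) -> dict:
    """Map each module name to the set of symbols it failed to resolve.

    Repeated load attempts produce duplicate lines; the sets collapse them.
    """
    result = defaultdict(set)
    for line in dmesg_text.splitlines():
        match = UNKNOWN_SYMBOL.search(line)
        if match:
            result[match.group("module")].add(match.group("symbol"))
    return dict(result)
```

Feeding it the output of `dmesg` should reduce the six lines above to one entry for nv_peer_mem with the two peer-memory symbols.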

Thanks in advance!
nv_peer_mem-modprobe.txt

Attached is the output requested in a similar issue.

@ferasd
Contributor

ferasd commented Feb 14, 2021

Did you upgrade OFED after installing nv_peer_mem?
Try removing nv_peer_mem and installing it again; this should fix the issue.

@kramanella
Author

kramanella commented Feb 16, 2021

I reinstalled OFED50 after installing nv_peer_mem.
It builds successfully but emits these warnings:
depmod: WARNING: /lib/modules/3.10.0-1127.19.1.el7.x86_64/extra/nv_peer_mem.ko needs unknown symbol ib_register_peer_memory_client
depmod: WARNING: /lib/modules/3.10.0-1127.19.1.el7.x86_64/extra/nv_peer_mem.ko needs unknown symbol ib_unregister_peer_memory_client

Removing nv_peer_mem and installing again doesn't fix it either.

Looking around, the symbol references exist in nv_peer_mem.ko, but the symbols themselves are undefined there:
[root@n120 modules]# nm -a /lib/modules/3.10.0-1127.19.1.el7.x86_64/extra/nv_peer_mem.ko | grep -E 'ib_register_peer_memory_client|ib_unregister_peer_memory_client'
U ib_register_peer_memory_client
U ib_unregister_peer_memory_client
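In `nm` output, type `U` marks a symbol that the object references but does not define. For a loadable module this is expected: the kernel resolves such symbols at load time against symbols exported by the kernel and already-loaded modules. So the `U` entries above list what nv_peer_mem needs rather than pointing to a defect in the .ko by itself. Here is a small Python sketch that extracts those undefined symbols. It assumes the usual `nm` line layout (optional address, one-letter type, name), and the helper names are illustrative:

```python
def parse_nm(nm_text: str) -> list:
    """Return (type, name) pairs from `nm` output.

    Defined symbols look like "0000000000011450 T name"; undefined ones
    have no address column, e.g. "U name".
    """
    entries = []
    for line in nm_text.splitlines():
        fields = line.split()
        if len(fields) == 3:        # address, type, name
            _, sym_type, name = fields
        elif len(fields) == 2:      # type, name (undefined symbols)
            sym_type, name = fields
        else:
            continue                # blank or unexpected line
        entries.append((sym_type, name))
    return entries


def undefined_symbols(nm_text: str) -> set:
    """Symbols the object file expects some other module to provide."""
    return {name for sym_type, name in parse_nm(nm_text) if sym_type == "U"}
```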

On my system, OFED50 installs its modules under /lib/modules/3.10.0-1127.el7.x86_64, whose kernel version lacks the .19.1 suffix of the running kernel (I'm not sure whether this is normal behavior):
uname -a
Linux n120 3.10.0-1127.19.1.el7.x86_64 #1 SMP Tue Aug 11 19:12:04 EDT 2020 x86_64 x86_64 x86_64 GNU/Linux
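On RHEL, kABI-tracking kmod packages are typically installed under the GA kernel of the minor release (here 3.10.0-1127.el7) and made available to later z-stream kernels of the same release through symlinks in /lib/modules/<release>/weak-updates. For that reason, the shorter directory name is not necessarily an error by itself. Here is a rough Python sketch that checks whether two release strings share the same base build. It assumes the `<upstream>-<build>[.<z-stream>].el<N>.<arch>` format, and the function names are illustrative, not part of any RHEL or OFED tool:

```python
import re

# Assumed RHEL-style release format, e.g. "3.10.0-1127.19.1.el7.x86_64":
# upstream="3.10.0", build="1127", zstream="19.1", dist="el7", arch="x86_64".
RELEASE = re.compile(
    r"^(?P<upstream>[\d.]+)-(?P<build>\d+)(?:\.(?P<zstream>[\d.]+))?"
    r"\.(?P<dist>el\d+)\.(?P<arch>\w+)$"
)


def parse_release(release: str) -> dict:
    """Split a kernel release string; zstream is None for a GA kernel."""
    match = RELEASE.match(release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return match.groupdict()


def same_base_build(a: str, b: str) -> bool:
    """True if both releases differ at most in their z-stream suffix."""
    pa, pb = parse_release(a), parse_release(b)
    return all(pa[k] == pb[k] for k in ("upstream", "build", "dist", "arch"))
```

For example, `same_base_build(platform.release(), "3.10.0-1127.el7.x86_64")` compares the running kernel against the OFED module directory. If that returns False, the OFED modules were built for a different base kernel.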

The symbol entries and their valid definitions are found in ib_core.ko:
[root@n120 modules]# nm -a /lib/modules/3.10.0-1127.el7.x86_64/extra/mlnx-ofa_kernel/drivers/infiniband/core/ib_core.ko | grep -E 'ib_register_peer_memory_client|ib_unregister_peer_memory_client'
00000000fec50ade A __crc_ib_register_peer_memory_client
00000000bde5c050 A __crc_ib_unregister_peer_memory_client
0000000000011450 T ib_register_peer_memory_client
00000000000113c0 T ib_unregister_peer_memory_client
00000000000003e0 r __kcrctab_ib_register_peer_memory_client
0000000000000538 r __kcrctab_ib_unregister_peer_memory_client
0000000000000cc3 r __kstrtab_ib_register_peer_memory_client
0000000000000ca2 r __kstrtab_ib_unregister_peer_memory_client
00000000000007c0 r __ksymtab_ib_register_peer_memory_client
0000000000000a70 r __ksymtab_ib_unregister_peer_memory_client
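The `__ksymtab_<name>` entries above are what `EXPORT_SYMBOL()` emits, and the `__crc_<name>` entries carry the symbol version CRC that a `CONFIG_MODVERSIONS` kernel checks at load time. Below is a minimal Python sketch that reads exports from `nm` output like the listing above and reports which needed symbols a given provider does not export. The helper names are illustrative:

```python
def exported_symbols(nm_text: str) -> dict:
    """Map each exported symbol to whether it carries a MODVERSIONS CRC."""
    names = set()
    for line in nm_text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[-1])  # the symbol name is always the last column
    prefix = "__ksymtab_"
    exports = {n[len(prefix):] for n in names if n.startswith(prefix)}
    return {sym: ("__crc_" + sym) in names for sym in exports}


def unresolved(needed: set, exports: dict) -> set:
    """Symbols a module needs that the given provider does not export."""
    return needed - set(exports)
```

If this shows both symbols exported but modprobe still reports them as unknown, the ib_core that is actually loaded may not be the file inspected here. `modinfo -n ib_core` prints the path modprobe would pick.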

I'm getting lost here; more guidance would be appreciated!

@yug0slav

yug0slav commented Oct 1, 2021

  • Similar issue on the 3.10.0-1160.42.2.el7.x86_64 kernel

    • MLNX_OFED_LINUX-5.0-2.1.8.0
  • rpm install output

Running transaction
  Installing : nvidia_peer_memory-1.1-0.x86_64                                                                                                                                                             1/1
modprobe: ERROR: could not insert 'nv_peer_mem': Unknown symbol in module, or unknown parameter (see dmesg)
Uploading Package Profile
Loaded plugins: fastestmirror, langpacks, nvidia, product-id, subscription-
              : manager
Loaded plugins: fastestmirror, langpacks, nvidia, product-id, subscription-
              : manager
  Verifying  : nvidia_peer_memory-1.1-0.x86_64                                                                                                                                                             1/1

Installed:
  nvidia_peer_memory.x86_64 0:1.1-0

Complete!
Uploading Enabled Repositories Report
Loaded plugins: fastestmirror, langpacks, nvidia, product-id, subscription-
              : manager
  • dmesg errors
[ 4828.021813] nv_peer_mem: Unknown symbol ib_register_peer_memory_client (err 0)
[ 4828.021883] nv_peer_mem: Unknown symbol ib_unregister_peer_memory_client (err 0)
[ 4850.225402] nv_peer_mem: Unknown symbol ib_register_peer_memory_client (err 0)
[ 4850.225460] nv_peer_mem: Unknown symbol ib_unregister_peer_memory_client (err 0)
[ 5164.062703] nv_peer_mem: Unknown symbol ib_register_peer_memory_client (err 0)
[ 5164.062748] nv_peer_mem: Unknown symbol ib_unregister_peer_memory_client (err 0)

@tzafrir-mellanox
Contributor

Some Ubuntu kernels seem not to have CONFIG_MODVERSIONS set (while others do), and this breaks the nv_peer_mem build.

I'm not sure it would be OK to build on a kernel without MODVERSIONS. But if anybody wants to tackle this, I guess the fix is to remove the KBUILD_EXTRA_SYMBOLS= parameter from the build command.

Assuming that this actually helps, you can start by adding something along these lines to the Makefile:

-include $(KDIR)/.config
ifneq (y,$(CONFIG_MODVERSIONS))

...
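The Makefile fragment above is left unfinished. To illustrate the same decision outside make, here is a Python sketch that reads a kernel `.config` and passes `KBUILD_EXTRA_SYMBOLS=` on the kbuild command line only when `CONFIG_MODVERSIONS=y`, as suggested in the comment. The `build_command` helper, its arguments, and the Module.symvers path are hypothetical, and this sketch does not establish that a module built this way will load:

```python
def kconfig_value(config_text: str, option: str):
    """Return the value of `option` in a kernel .config, or None if unset.

    Both "# CONFIG_FOO is not set" lines and missing options yield None.
    """
    prefix = option + "="
    for line in config_text.splitlines():
        if line.startswith(prefix):
            return line[len(prefix):].strip()
    return None


def build_command(kdir: str, module_dir: str, extra_symbols: str, config_text: str) -> list:
    """Hypothetical kbuild invocation for an out-of-tree module."""
    cmd = ["make", "-C", kdir, f"M={module_dir}"]
    # Mirrors `ifneq (y,$(CONFIG_MODVERSIONS))`: only pass the other tree's
    # Module.symvers when the target kernel uses MODVERSIONS.
    if kconfig_value(config_text, "CONFIG_MODVERSIONS") == "y":
        cmd.append(f"KBUILD_EXTRA_SYMBOLS={extra_symbols}")
    cmd.append("modules")
    return cmd
```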

@Micket

Micket commented Nov 30, 2021

I see the same error with ib_register_peer_memory_client and ib_unregister_peer_memory_client on Rocky Linux 8.4 with OFED 5.4-1.0.3.0, so the issue may not be related to CONFIG_MODVERSIONS after all.

After finding the 1.2 release (which doesn't seem to be mentioned on the Mellanox homepage) and newer OFED drivers (which, for some reason, weren't listed in the Mellanox repo https://linux.mellanox.com/public/repo/mlnx_ofed/ 👎), I managed to build this successfully.

5 participants