libvma uses huge amount of memory (~4x8G) with max RLIMIT_NOFILE #1063

Open
1 task done
champtar opened this issue Jan 5, 2024 · 2 comments

champtar commented Jan 5, 2024

Subject

libvma uses huge amount of memory (~4x8G) with max RLIMIT_NOFILE
Going from EL8 to EL9, the default Max open files limit goes from 1048576 to 1073741816. This is true on any host using systemd 240+ unless the limit is overridden (systemd/systemd@a8b627a / https://access.redhat.com/solutions/1479623).
It might not be true for user sessions, but it is true for containers.

In libvma there is this code:

fd_collection::fd_collection() :
    lock_mutex_recursive("fd_collection"),
    m_timer_handle(0),
    m_b_sysvar_offloaded_sockets(safe_mce_sys().offloaded_sockets)
{
    fdcoll_logfunc("");
    m_pendig_to_remove_lst.set_id("fd_collection (%p) : m_pendig_to_remove_lst", this);
    m_n_fd_map_size = 1024;
    struct rlimit rlim;
    if ((getrlimit(RLIMIT_NOFILE, &rlim) == 0) && ((int)rlim.rlim_max > m_n_fd_map_size))
        m_n_fd_map_size = rlim.rlim_max;
    fdcoll_logdbg("using open files max limit of %d file descriptors", m_n_fd_map_size);
    m_p_sockfd_map = new socket_fd_api*[m_n_fd_map_size];
    memset(m_p_sockfd_map, 0, m_n_fd_map_size * sizeof(socket_fd_api*));
    m_p_epfd_map = new epfd_info*[m_n_fd_map_size];
    memset(m_p_epfd_map, 0, m_n_fd_map_size * sizeof(epfd_info*));
    m_p_cq_channel_map = new cq_channel_info*[m_n_fd_map_size];
    memset(m_p_cq_channel_map, 0, m_n_fd_map_size * sizeof(cq_channel_info*));
    m_p_tap_map = new ring_tap*[m_n_fd_map_size];
    memset(m_p_tap_map, 0, m_n_fd_map_size * sizeof(ring_tap*));
}

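Running it under strace shows the limit being read, followed by four 8589934592-byte (8 GiB) anonymous mappings, one for each of the four maps: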

prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1073741816, rlim_max=1073741816}) = 0
mmap(NULL, 8589934592, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f130c000000
mmap(NULL, 8589934592, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f110c000000
mmap(NULL, 8589934592, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f0f0c000000
mmap(NULL, 8589934592, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f0d0c000000
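To make the numbers in the trace concrete, here is a small arithmetic sketch. It assumes 8-byte pointers (x86_64) and a 4096-byte page size; the helper names `table_bytes` and `round_up` are made up for illustration and are not part of libvma. The 64-byte gap between the raw table size and the observed mmap size is presumably allocator overhead plus page rounding, which is why the sketch rounds up.

```cpp
#include <cstdint>

// Bytes needed for one table holding one pointer per possible fd.
// ptr_size is passed explicitly; 8 is an assumption for x86_64.
constexpr std::uint64_t table_bytes(std::uint64_t entries, std::uint64_t ptr_size) {
    return entries * ptr_size;
}

// Round n up to the next multiple of align (mmap sizes are page-granular).
constexpr std::uint64_t round_up(std::uint64_t n, std::uint64_t align) {
    return (n + align - 1) / align * align;
}

// EL9 default from the report: 1073741816 entries of 8 bytes each.
constexpr std::uint64_t kEl9Table = table_bytes(1073741816ULL, 8);   // 8589934528
constexpr std::uint64_t kEl9Mapped = round_up(kEl9Table, 4096);      // 8589934592
constexpr std::uint64_t kEl9Total = 4 * kEl9Mapped;                  // 34359738368

// EL8 default from the report: 1048576 entries of 8 bytes each.
constexpr std::uint64_t kEl8Total = 4 * table_bytes(1048576ULL, 8);  // 33554432

static_assert(kEl9Mapped == 8589934592ULL, "matches the mmap size in the trace");
static_assert(kEl9Total == (32ULL << 30), "four tables add up to 32 GiB");
static_assert(kEl8Total == (32ULL << 20), "same tables cost 32 MiB with the EL8 limit");
```

With the EL8 limit the same four tables come to 32 MiB, so the preallocation only becomes a problem once the limit grows by three orders of magnitude.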

Issue type

  • Bug report

Configuration:

  • Product version: libvma-9.7.2-1.x86_64
  • OS: Alma 9

Actual behavior:

libvma allocates 32G of RAM for bookkeeping

Expected behavior:

Either:

  • warn when RLIMIT_NOFILE is too high
  • error out when RLIMIT_NOFILE is too high
  • set RLIMIT_NOFILE to a lower value if it's too high
  • rewrite the function to not preallocate the memory
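As a rough illustration of the "warn" and "set to a lower value" options, the sketch below clamps the map size to a cap and lowers the soft RLIMIT_NOFILE to match. This is not libvma code: the cap `kMaxFdMapSize`, the floor `kMinFdMapSize`, and the helpers `choose_fd_map_size` and `lower_soft_nofile_limit` are hypothetical names, and the cap value is an arbitrary assumption (the EL8 default). Note that it treats `RLIM_INFINITY` as "too high" and uses the cap, which may differ from what the current constructor does.

```cpp
#include <sys/resource.h>
#include <cstdio>

// Hypothetical bounds, chosen for illustration only.
constexpr rlim_t kMinFdMapSize = 1024;
constexpr rlim_t kMaxFdMapSize = 1048576;

// Pick a table size from the hard limit, warning when it gets clamped.
rlim_t choose_fd_map_size(rlim_t rlim_max, rlim_t cap = kMaxFdMapSize) {
    if (rlim_max == RLIM_INFINITY || rlim_max > cap) {
        std::fprintf(stderr,
                     "warning: RLIMIT_NOFILE too high, using %llu map entries\n",
                     static_cast<unsigned long long>(cap));
        return cap;
    }
    return rlim_max < kMinFdMapSize ? kMinFdMapSize : rlim_max;
}

// Lower the soft limit to cap so no fd can exceed the table size.
// The hard limit is left untouched; lowering the soft limit needs no privilege.
bool lower_soft_nofile_limit(rlim_t cap) {
    struct rlimit rlim;
    if (getrlimit(RLIMIT_NOFILE, &rlim) != 0) {
        return false;
    }
    if (rlim.rlim_cur != RLIM_INFINITY && rlim.rlim_cur <= cap) {
        return true;  // already low enough
    }
    rlim.rlim_cur = cap;
    return setrlimit(RLIMIT_NOFILE, &rlim) == 0;
}
```

A caller would use something like `choose_fd_map_size(rlim.rlim_max)` in place of the raw `rlim.rlim_max`, and call `lower_soft_nofile_limit` with the same value so that descriptors never index past the end of the tables. Avoiding the preallocation altogether (the last option) would need a different data structure and is not shown here.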

Steps to reproduce:

ulimit -n 1073741816
# run libvma

champtar (Author) commented:

(nvidia support case number 00656662)

igor-ivanov (Collaborator) commented:

@AlexanderGrissik please assist
