Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

issue: 2336523 Return RX pbufs to cq_mgr instead of global pool #917

Open
wants to merge 16 commits into
base: vNext
Choose a base branch
from

Conversation

pasis
Copy link
Member

@pasis pasis commented Oct 23, 2020

Global buffer_pool lock leads to a lock contention in multi-threaded applications. We can mitigate it by using ring per core and returning pbufs to respective cq_mgr cache.

temp->rx.timestamps.hw.tv_nsec = 0;
temp->rx.timestamps.hw.tv_sec = 0;
temp->rx.hw_raw_timestamp = 0;
memset(&temp->rx, 0, sizeof(temp->rx));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please do the same change in all identical places (for example: ring_tap) or avoid change

void ring_simple::mem_buf_rx_release(mem_buf_desc_t* p_mem_buf_desc)
{
/* Call this method under m_lock_ring_rx lock */
m_p_cq_mgr_rx->reclaim_recv_buffers(p_mem_buf_desc);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please consider using reclaim_recv_buffers_no_lock() call

@@ -143,6 +143,8 @@ inline void sockinfo_tcp::init_pbuf_custom(mem_buf_desc_t *p_desc)
p_desc->lwip_pbuf.pbuf.type = PBUF_REF;
p_desc->lwip_pbuf.pbuf.next = NULL;
p_desc->lwip_pbuf.pbuf.payload = (u8_t *)p_desc->p_buffer + p_desc->rx.tcp.n_transport_header_len;
/* Override default free function to return rx pbuf to the CQ cache */
p_desc->lwip_pbuf.custom_free_function = sockinfo_tcp::tcp_rx_pbuf_free;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to explicitly change pbuf_free() with 'tcp_rx_pbuf_free()' and use the same change as was done for tcp_rx_pbuf_free(). It is more save for future.
The reason: custom_free_function callback is common for tx and rx flow and it is easy to do mistake in future change

Copy link
Collaborator

@igor-ivanov igor-ivanov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see my comments

@pasis pasis changed the base branch from master to vNext December 16, 2020 16:20
@pasis
Copy link
Member Author

pasis commented Dec 16, 2020

Rebased on top of vNext and updated with the latest version.

Signed-off-by: Igor Ivanov <igor.ivanov.va@gmail.com>
Signed-off-by: Igor Ivanov <igor.ivanov.va@gmail.com>
m_recv_rings is introduced to control rx flow.
m_recv_rings can have the same size as m_bond_rings in case
every simple ring should process income traffic or
less if RoCE LAG device is used (for RoCE LAG device
single simple ring should handle rx).

Signed-off-by: Igor Ivanov <igor.ivanov.va@gmail.com>
1. The register storage class specifier is deprecated in C++11
2. Disable -Wno-free-nonheap-object diagnostic that looks as
   incorrect for gcc-11

Signed-off-by: Igor Ivanov <igor.ivanov.va@gmail.com>
drain_and_proccess() is mainly called in following cases as
Internal thread:
   Frequency of real polling can be controlled by
   VMA_PROGRESS_ENGINE_INTERVAL and VMA_PROGRESS_ENGINE_WCE_MAX.
socketxtreme:
   User does socketxtreme_poll()
Cleanup:
   QP down logic to release rx buffers should force polling to do this.
   Not null argument indicates one.

Signed-off-by: Igor Ivanov <igor.ivanov.va@gmail.com>
@igor-ivanov
Copy link
Collaborator

pasis and others added 10 commits February 20, 2021 17:06
ring_bond passes lkey query to one of its child rings. However,
get_tx_lkey() uses m_bond_rings and send_lwip_buffer() uses m_xmit_rings
to choose the child ring. This inconsistency can lead to a situation when
lkey is set from a wrong ring.

Use m_xmit_rings in get_tx_lkey().

Signed-off-by: Dmytro Podgornyi <dmytrop@mellanox.com>
Both team and bonding Master interface has IFF_MASTER flag bit set.

However, linux sys files, such as BONDING_MODE_PARAM_FILE,  are used
to check bonding interface. The linux 'team' kernel module does not
create those files as 'bonding' module.

In other words, 'bonding' interface verify does not work for 'team'
interface, so skip 'team' interface when check 'bonding'.

Otherwise, we will have significant performance issues for UDP traffic
over 'team' interface.

Fixes: ee0a1b9 ("issue: 2233904 Check bonding device using netlink")

Signed-off-by: Honggang Li <honli@redhat.com>
Reviewed-by: Igor Ivanov <igori@nvidia.com>
Suppress an implicit-fallthrough warning
Suppress stringop-truncation warning
Fix 'memset' usage

Signed-off-by: Honggang Li <honli@redhat.com>
Reviewed-by: Igor Ivanov <igori@nvidia.com>
TIMESTAMP option must be included in keepalives and zero window probes.
However, VMA ignores the option for these segments. Add TIMESTAMP
support in the same way as during sending an empty ACK.

TODO: Reduce copy-paste in the functions that send single pbufs.

Signed-off-by: Dmytro Podgornyi <dmytrop@mellanox.com>
When we add an item to a non-empty m_reg_action_q, we don't have to
wakeup the internal thread, because it will handle the item. As result,
we reduce number of epoll_ctl() calls.

Signed-off-by: Dmytro Podgornyi <dmytrop@mellanox.com>
Access to m_is_sleeping must be serialized. Otherwise, this leads to
a race and respectful entity may not be awaken when requested.

Signed-off-by: Dmytro Podgornyi <dmytrop@mellanox.com>
The buildroot should not be deleted at start of %install
https://docs.fedoraproject.org/en-US/packaging-guidelines/#_tags_and_sections

Signed-off-by: Igor Ivanov <igor.ivanov.va@gmail.com>
Remove %{use_extralib} macro

%make_build is a macro that is available starting rpm-4.12.
It is equivalent to `make %{?_smp_mflags}`.

Signed-off-by: Igor Ivanov <igor.ivanov.va@gmail.com>
RDMA communications require that physical memory in the computer
be pinned. Pinning memory is normally a very privileged operation.
In order to allow users other than root to run large RDMA applications,
it will likely be necessary to increase the amount of memory
that non-root users are allowed to pin in the system.
This is done by adding a file in the /etc/security/limits.d/ directory.
It can be done by an administrator of system for user/group that
use libvma.

Signed-off-by: Igor Ivanov <igor.ivanov.va@gmail.com>
This changes are done to be aligned with last Fedora requirements.

Signed-off-by: Igor Ivanov <igor.ivanov.va@gmail.com>
Global buffer_pool lock leads to a lock contention in multi-threaded
applications. We can mitigate it by using ring per core and returning
pbufs to respective cq_mgr cache.

Signed-off-by: Dmytro Podgornyi <dmytrop@mellanox.com>
@swx-jenkins5
Copy link

Can one of the admins verify this patch?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants