Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Netlink conn retry logic v2 #194

Merged
merged 1 commit into from
Oct 2, 2022
Merged

Conversation

turekt
Copy link
Contributor

@turekt turekt commented Sep 29, 2022

Hi,

I am opening a version 2 PR for review and discussion, related to PR #191. I have further investigated the error from issue #175 and I understand why the issue is happening for the nftables lasting connection. The root cause for this issue is that it seems that the acknowledgments for messages are being sent under specific conditions.

By observing the following debug output:

// Batch message init - add table filter, request ack
nl: send msgs: {Header:{Length:20 Type:unknown(16) Flags:request Sequence:1655472395 PID:5277} Data:[0 0 0 10]}
nl: send msgs: {Header:{Length:40 Type:unknown(2560) Flags:request|acknowledge|0x400 Sequence:1655472396 PID:5277} Data:[2 0 0 0 11 0 1 0 102 105 108 116 101 114 0 0 8 0 2 0 0 0 0 0]}
nl: send msgs: {Header:{Length:20 Type:unknown(17) Flags:request Sequence:1655472397 PID:5277} Data:[0 0 0 10]}
// Acknowledge sequence [12 129 172 98] = 1655472396, add table message
nl: recv: {Header:{Length:36 Type:error Flags:0x100 Sequence:1655472396 PID:5277} Data:[0 0 0 0 40 0 0 0 0 10 5 4 12 129 172 98 157 20 0 0]}

// Batch message init - add set test_set, request ack
nl: send msgs: {Header:{Length:20 Type:unknown(16) Flags:request Sequence:1655472398 PID:5277} Data:[0 0 0 10]}
nl: send msgs: {Header:{Length:92 Type:unknown(2569) Flags:request|acknowledge|0x400 Sequence:1655472399 PID:5277} Data:[2 0 0 0 11 0 1 0 102 105 108 116 101 114 0 0 13 0 2 0 116 101 115 116 95 115 101 116 0 0 0 0 8 0 3 0 0 0 0 0 8 0 4 0 0 0 0 41 8 0 5 0 0 0 0 16 8 0 10 0 0 0 0 1 10 0 13 0 0 4 1 0 0 0 0 0]}
nl: send msgs: {Header:{Length:20 Type:unknown(17) Flags:request Sequence:1655472400 PID:5277} Data:[0 0 0 10]}
// Acknowledge sequence [15 129 162 98] = 1655472399, add set message
nl: recv: {Header:{Length:36 Type:error Flags:0x100 Sequence:1655472399 PID:5277} Data:[0 0 0 0 92 0 0 0 9 10 5 4 15 129 172 98 157 20 0 0]}

// Get set by name test_set, not batch
nl: send msgs: {Header:{Length:48 Type:unknown(2570) Flags:request|acknowledge Sequence:1655472401 PID:5277} Data:[2 0 0 0 11 0 1 0 102 105 108 116 101 114 0 0 13 0 2 0 116 101 115 116 95 115 101 116 0 0 0 0]}
// Send test_set data
nl: recv: {Header:{Length:92 Type:unknown(2569) Flags:0 Sequence:1655472401 PID:5277} Data:[2 0 0 4 11 0 1 0 102 105 108 116 101 114 0 0 13 0 2 0 116 101 115 116 95 115 101 116 0 0 0 0 12 0 16 0 0 0 0 0 0 0 0 1 8 0 4 0 0 0 0 41 8 0 5 0 0 0 0 16 10 0 13 0 0 4 1 0 0 0 0 0 4 0 9 0]}

// Get set elements test_set, not batch
nl: send msgs: {Header:{Length:48 Type:unknown(2573) Flags:request|acknowledge|0x300 Sequence:1655472402 PID:5277} Data:[2 0 0 0 11 0 1 0 102 105 108 116 101 114 0 0 13 0 2 0 116 101 115 116 95 115 101 116 0 0 0 0]}
// Acknowledge sequence [17 129 172 97] = 1655472401, get set by name message ! out-of-sync !
nl: recv: {Header:{Length:36 Type:error Flags:0x100 Sequence:1655472401 PID:5277} Data:[0 0 0 0 48 0 0 0 10 10 5 0 17 129 172 98 157 20 0 0]}
// Send test_set elements data, as multi, no ack returned
nl: recv: {Header:{Length:80 Type:unknown(2572) Flags:multi Sequence:1655472402 PID:5277} Data:[2 0 0 4 11 0 1 0 102 105 108 116 101 114 0 0 13 0 2 0 116 101 115 116 95 115 101 116 0 0 0 0 32 0 3 0 28 0 1 0 24 0 1 0 20 0 1 0 119 103 49 0 0 0 0 0 0 0 0 0 0 0 0 0]}
nl: recv: {Header:{Length:52 Type:unknown(2572) Flags:multi Sequence:1655472402 PID:5277} Data:[2 0 0 4 11 0 1 0 102 105 108 116 101 114 0 0 13 0 2 0 116 101 115 116 95 115 101 116 0 0 0 0 4 0 3 0]}
nl: recv: {Header:{Length:20 Type:done Flags:multi Sequence:1655472402 PID:5277} Data:[0 0 0 0]}

// Batch message init - add set element wg1
nl: send msgs: {Header:{Length:20 Type:unknown(16) Flags:request Sequence:1655472403 PID:5277} Data:[0 0 0 10]}
nl: send msgs: {Header:{Length:88 Type:unknown(2572) Flags:request|acknowledge|0x400 Sequence:1655472404 PID:5277} Data:[2 0 0 0 13 0 2 0 116 101 115 116 95 115 101 116 0 0 0 0 8 0 4 0 0 0 0 1 11 0 1 0 102 105 108 116 101 114 0 0 32 0 3 128 28 0 1 128 24 0 1 128 20 0 1 0 119 103 49 0 0 0 0 0 0 0 0 0 0 0 0 0]}
nl: send msgs: {Header:{Length:20 Type:unknown(17) Flags:request Sequence:1655472405 PID:5277} Data:[0 0 0 10]}
// Acknowledge sequence [20 129 172 98] = 1655472404, add set element message
nl: recv: {Header:{Length:36 Type:error Flags:0x100 Sequence:1655472404 PID:5277} Data:[0 0 0 0 88 0 0 0 12 10 5 4 20 129 172 98 157 20 0 0]}

Similar behaviour is observed with GetObj - no acks are given in case NLM_F_MULTI is set in response. I have confirmed this behaviour by checking the kernel code: https://github.com/torvalds/linux/blob/7e062cda7d90543ac8c7700fc7c5527d0c0f22ad/net/netlink/af_netlink.c#L2387-L2390, if NLM_F_DUMP is set, kernel runs the dump function which will always skip sending acks.

This PR introduces a different fix by checking if the connection is lasting and verifying if the send message is flagged as NLM_F_DUMP to determine if we can expect an ack message or not.

Let me know what you think about this approach.

Copy link
Collaborator

@stapelberg stapelberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for revising your approach, this looks much better! :)

conn.go Outdated Show resolved Hide resolved
conn.go Show resolved Hide resolved
conn.go Outdated Show resolved Hide resolved
conn.go Show resolved Hide resolved
Fixes google#175 | Adds receiveAckAware in conn.go | Adds test for "no error" message use case
@stapelberg stapelberg merged commit 535f5eb into google:main Oct 2, 2022
@stapelberg
Copy link
Collaborator

Thanks very much!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants