add timeout handler and max rtt option #49

zey1996 · 2023-06-21T08:35:58Z

I believe that developers should have a way to know when a data packet is lost or does not return within a specified time.

you can set MaxRtt and OnTimeout func

Ontime func while be call when a request was not answered within a specified time

you only need add:

  pinger.MaxRtt = time.Second * 3

  pinger.OnTimeOut = func(packet *probing.Packet) {
    fmt.Println("timeout", packet.Addr, packet.Rtt, packet.TTL)
  }

It has good compatibility, as if MaxRtt is not set, it will have the same behavior as before.

However, there are still some issues remaining.

The timing is not precise; for example, if a maximum round-trip time (maxrtt) of 100ms is set, but the request time exceeds 101ms, it is possible that the OnRecv function will still be used.
If a request is received after the timeout, the OnDuplicateRecv function will be called.

But I believe that this is acceptable.

ping.go

zey1996 · 2023-06-21T09:27:14Z

I noticed have some memory problem. I think this pr can fix it. the user should set timeout to resolve memory problem.

zey1996 · 2023-06-21T09:30:33Z

https://github.com/prometheus-community/pro-bing/blob/2dcd76c30c10636979f44f178ed9538e5a6e7043/ping.go#LL714C17-L714C17
why not return a error when uuid is not exist?

you can set MaxRtt and OnTimeout func Ontime func while be call when a request was not answered within a specified time Signed-off-by: zhangenyao <zey563416123@gmail.com>

wz2b · 2024-01-25T22:14:10Z

I wrote this comment in a related issue but I think I should repeat it here. I think I get the issue here, but I'm wondering if the solution is to go farther, and not just have the 'native' pinger try to replicate /usr/bin/ping.

If I interpret the man page correctly, when you request a certain number of packets, it sends -c, -w, and -W. If you don't specify a long enough -w (which the plugin calls deadline) then you might have gotten all the responses back, you just didn't wait long enough. The other thing that is confusing is -W (timeout) is only used if there are no responses. if there are responses, then that is ignored and it actually waits two times some RTT that it calculates - presumably from earlier pings, maybe from only the last one.

I think the fundamental problem is how dynamic the 'ping' command line tool is, then 'native' tries to mimic that and brings in the same problems.

I repeatedly get way worse results from this plugin than I get running ping from the command line on the same host. I think the reason for this is ping doesn't behave very well with -c, -w, -W when you have high variability in response time. I'm not sure how I would fix this. I think a total rewrite of 'native' could fix it, but it would want to take on a different strategy, such as launching -c n pings -W t seconds apart, then wait until either all of them come back or -w has passed, and count.

I think I'd make a few changes. One idea is to space pings out at a fixed interval, regardless of RTT. The second would be to put serial numbers of some sort in every ping, then add a few fields to detect if you got any responses out of order and if you got any duplicates. Then, end the entire thing on either deadline or you get count responses. The down side to this is you might have multiple pings in flight at the same time. I think that's rarely a problem, but if we wanted to not break backward compatibility you could signal you want this new behavior by specifying some kind of backward compatibility flag, or replicate 'native' and make a new choice with a new name.

Thoughts ?

matthiasr · 2024-03-27T08:33:08Z

ping.go

@@ -183,6 +184,8 @@ type Pinger struct {
 	// OnRecvError is called when an error occurs while Pinger attempts to receive a packet
 	OnRecvError func(error)

+	// OnTimeOut is called when a packet don't have received after MaxRtt.
+	OnTimeOut func(*Packet)


I find this naming confusing: There's a Timeout field, but OnTimeOut is for MaxRtt.

The spelling is also inconsistent with the comment on MaxRtt – I believe Timeout is more correct.

Agreed. There's a total runtime timeout, and this introduces a packet timeout.

Perhaps OnPacketTimeout and PacketTimeout instead of MaxRtt.

matthiasr · 2024-03-27T08:33:44Z

ping.go

@@ -129,6 +127,9 @@ type Pinger struct {
 	// Timeout specifies a timeout before ping exits, regardless of how many
 	// packets have been received.
 	Timeout time.Duration
+	// MaxRtt If no response is received after this time, OnTimeout is called
+	// important! This option is not guaranteed. and if we receive the packet that was timeout, the function OnDuplicateRecv will be called
+	MaxRtt time.Duration


Not sure either way: Should we capitalize RTT as in common style guidelines, or keep it as Rtt for consistency with RecordRtts?

matthiasr · 2024-03-27T08:35:12Z

ping.go

@@ -520,20 +520,48 @@ func (p *Pinger) runLoop(

 	timeout := time.NewTicker(p.Timeout)
 	interval := time.NewTicker(p.Interval)
+	packetTimeout := time.NewTimer(time.Duration(math.MaxInt64))


I associate timeouts with contexts, would those be suitable here? Would it make a difference to how we would handle them?

We're trying to be very careful about how we use stdlib functions here. We want to avoid any cases where managing the timeout would spawn a goroutine. If you had an interval of 100ms, and a timeout of 15s, that would be creating and destroying a lot of goroutines in the background.

Maybe a context call would be efficient enough? It would be useful to benchmark.

matthiasr · 2024-03-27T08:37:38Z

ping.go

 			p.PacketsRecvDuplicates++
 			if p.OnDuplicateRecv != nil {
 				p.OnDuplicateRecv(inPkt)
 			}
 			return nil
 		}
 		// remove it from the list of sequences we're waiting for so we don't get duplicates.
-		delete(p.awaitingSequences[*pktUUID], pkt.Seq)
+		p.awaitingSequences.removeElem(e)


What protects awaitingSequences from concurrent access?

matthiasr · 2024-03-27T08:46:06Z

seq_map_test.go

+		}
+	})
+
+	t.Run("putElem", func(t *testing.T) {


These subtests mutate a shared map, and rely on it being in a specific state. Please make each test self-contained so it can be run in isolation. Also, I would recommend making the tests run in parallel (t.Parallell() both for the main and the subtests), and try running the tests with -race -count 100 to make sure they behave well. This makes the tests much less brittle and tends to reveal unexpected race conditions.

Also, please either document that seqMap must not be used concurrently, or add a test that actually uses it concurrently.

zey1996 mentioned this pull request Jun 21, 2023

add timeout handler #48

Closed

zey1996 force-pushed the add_maxrtt branch from 2407aba to c225ef2 Compare June 21, 2023 08:42

SuperQ reviewed Jun 21, 2023

View reviewed changes

ping.go Outdated Show resolved Hide resolved

zey1996 force-pushed the add_maxrtt branch from c225ef2 to 2dcd76c Compare June 21, 2023 09:19

zey1996 force-pushed the add_maxrtt branch from 2dcd76c to 488c235 Compare June 21, 2023 09:28

add timeout handler and max rtt option

58f802b

you can set MaxRtt and OnTimeout func Ontime func while be call when a request was not answered within a specified time Signed-off-by: zhangenyao <zey563416123@gmail.com>

zey1996 force-pushed the add_maxrtt branch from 488c235 to 58f802b Compare June 21, 2023 09:38

SuperQ requested a review from metalmatze August 26, 2023 18:35

mpenning mentioned this pull request Nov 8, 2023

Packet Loss is (arguably) calculated incorrectly #13

Open

matthiasr reviewed Mar 27, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add timeout handler and max rtt option #49

add timeout handler and max rtt option #49

zey1996 commented Jun 21, 2023

zey1996 commented Jun 21, 2023

zey1996 commented Jun 21, 2023 •

edited

Loading

wz2b commented Jan 25, 2024

matthiasr Mar 27, 2024

SuperQ Mar 27, 2024

matthiasr Mar 27, 2024

matthiasr Mar 27, 2024

SuperQ Mar 29, 2024

matthiasr Mar 27, 2024

matthiasr Mar 27, 2024

matthiasr Mar 27, 2024

add timeout handler and max rtt option #49

Are you sure you want to change the base?

add timeout handler and max rtt option #49

Conversation

zey1996 commented Jun 21, 2023

zey1996 commented Jun 21, 2023

zey1996 commented Jun 21, 2023 • edited Loading

wz2b commented Jan 25, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

zey1996 commented Jun 21, 2023 •

edited

Loading