Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add timeout handler and max rtt option #49

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

zey1996
Copy link

@zey1996 zey1996 commented Jun 21, 2023

I believe that developers should have a way to know when a data packet is lost or does not return within a specified time.

you can set MaxRtt and OnTimeout func

Ontime func while be call when a request was not answered within a specified time

you only need add:

  pinger.MaxRtt = time.Second * 3

  pinger.OnTimeOut = func(packet *probing.Packet) {
    fmt.Println("timeout", packet.Addr, packet.Rtt, packet.TTL)
  }

It has good compatibility, as if MaxRtt is not set, it will have the same behavior as before.

However, there are still some issues remaining.

  • The timing is not precise; for example, if a maximum round-trip time (maxrtt) of 100ms is set, but the request time exceeds 101ms, it is possible that the OnRecv function will still be used.
  • If a request is received after the timeout, the OnDuplicateRecv function will be called.

But I believe that this is acceptable.

ping.go Outdated Show resolved Hide resolved
@zey1996
Copy link
Author

zey1996 commented Jun 21, 2023

I noticed have some memory problem. I think this pr can fix it. the user should set timeout to resolve memory problem.

@zey1996
Copy link
Author

zey1996 commented Jun 21, 2023

you can set MaxRtt and OnTimeout func

Ontime func while be call when a request was not answered within a specified time

Signed-off-by: zhangenyao <zey563416123@gmail.com>
@wz2b
Copy link

wz2b commented Jan 25, 2024

I wrote this comment in a related issue but I think I should repeat it here. I think I get the issue here, but I'm wondering if the solution is to go farther, and not just have the 'native' pinger try to replicate /usr/bin/ping.

If I interpret the man page correctly, when you request a certain number of packets, it sends -c, -w, and -W. If you don't specify a long enough -w (which the plugin calls deadline) then you might have gotten all the responses back, you just didn't wait long enough. The other thing that is confusing is -W (timeout) is only used if there are no responses. if there are responses, then that is ignored and it actually waits two times some RTT that it calculates - presumably from earlier pings, maybe from only the last one.

I think the fundamental problem is how dynamic the 'ping' command line tool is, then 'native' tries to mimic that and brings in the same problems.

I repeatedly get way worse results from this plugin than I get running ping from the command line on the same host. I think the reason for this is ping doesn't behave very well with -c, -w, -W when you have high variability in response time. I'm not sure how I would fix this. I think a total rewrite of 'native' could fix it, but it would want to take on a different strategy, such as launching -c n pings -W t seconds apart, then wait until either all of them come back or -w has passed, and count.

I think I'd make a few changes. One idea is to space pings out at a fixed interval, regardless of RTT. The second would be to put serial numbers of some sort in every ping, then add a few fields to detect if you got any responses out of order and if you got any duplicates. Then, end the entire thing on either deadline or you get count responses. The down side to this is you might have multiple pings in flight at the same time. I think that's rarely a problem, but if we wanted to not break backward compatibility you could signal you want this new behavior by specifying some kind of backward compatibility flag, or replicate 'native' and make a new choice with a new name.

Thoughts ?

@@ -183,6 +184,8 @@ type Pinger struct {
// OnRecvError is called when an error occurs while Pinger attempts to receive a packet
OnRecvError func(error)

// OnTimeOut is called when a packet don't have received after MaxRtt.
OnTimeOut func(*Packet)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this naming confusing: There's a Timeout field, but OnTimeOut is for MaxRtt.

The spelling is also inconsistent with the comment on MaxRtt – I believe Timeout is more correct.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. There's a total runtime timeout, and this introduces a packet timeout.

Perhaps OnPacketTimeout and PacketTimeout instead of MaxRtt.

@@ -129,6 +127,9 @@ type Pinger struct {
// Timeout specifies a timeout before ping exits, regardless of how many
// packets have been received.
Timeout time.Duration
// MaxRtt If no response is received after this time, OnTimeout is called
// important! This option is not guaranteed. and if we receive the packet that was timeout, the function OnDuplicateRecv will be called
MaxRtt time.Duration

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure either way: Should we capitalize RTT as in common style guidelines, or keep it as Rtt for consistency with RecordRtts?

@@ -520,20 +520,48 @@ func (p *Pinger) runLoop(

timeout := time.NewTicker(p.Timeout)
interval := time.NewTicker(p.Interval)
packetTimeout := time.NewTimer(time.Duration(math.MaxInt64))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I associate timeouts with contexts, would those be suitable here? Would it make a difference to how we would handle them?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're trying to be very careful about how we use stdlib functions here. We want to avoid any cases where managing the timeout would spawn a goroutine. If you had an interval of 100ms, and a timeout of 15s, that would be creating and destroying a lot of goroutines in the background.

Maybe a context call would be efficient enough? It would be useful to benchmark.

p.PacketsRecvDuplicates++
if p.OnDuplicateRecv != nil {
p.OnDuplicateRecv(inPkt)
}
return nil
}
// remove it from the list of sequences we're waiting for so we don't get duplicates.
delete(p.awaitingSequences[*pktUUID], pkt.Seq)
p.awaitingSequences.removeElem(e)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What protects awaitingSequences from concurrent access?

}
})

t.Run("putElem", func(t *testing.T) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These subtests mutate a shared map, and rely on it being in a specific state. Please make each test self-contained so it can be run in isolation. Also, I would recommend making the tests run in parallel (t.Parallell() both for the main and the subtests), and try running the tests with -race -count 100 to make sure they behave well. This makes the tests much less brittle and tends to reveal unexpected race conditions.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, please either document that seqMap must not be used concurrently, or add a test that actually uses it concurrently.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants