rafthttp: probe connection for Raft message transport #10022

Closed
wants to merge 5 commits into from

Commits on Aug 29, 2018

  1. etcdserver/api/rafthttp: rename to "pipelineProber"

    Preliminary work to add prober to "streamRt"
    
    Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
    gyuho committed Aug 29, 2018
    8446d14
  2. etcdserver/api/rafthttp: display roundtripper name in warnings

    Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
    gyuho committed Aug 29, 2018
    f0d7043
  3. etcdserver/api/rafthttp: add Raft message stream prober

    In our production cluster, one TCP connection to a remote peer had
    >8-sec latencies, but the "etcd_network_peer_round_trip_time_seconds"
    metric showed a <1-sec latency distribution. Either we weren't
    sampling enough, or all the latency spikes happened outside of
    the snapshot pipeline connection. The latter is most likely
    the case, since the cluster had leader elections caused by missed
    heartbeats.
    
    This PR adds another probing routine to monitor the connection
    for Raft message transports.
    
    Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
    gyuho committed Aug 29, 2018
    6c94deb
  4. etcdserver/api/rafthttp: add 'ConnectionType' label to round-trip metric

    We need to track which connection had high latency spikes.
    
    ```
    etcd_network_peer_round_trip_time_seconds_bucket{ConnectionType="ROUND_TRIPPER_RAFT_MESSAGE",To="729934363faa4a24",le="0.0001"} 0
    etcd_network_peer_round_trip_time_seconds_bucket{ConnectionType="ROUND_TRIPPER_RAFT_MESSAGE",To="729934363faa4a24",le="0.0002"} 1
    etcd_network_peer_round_trip_time_seconds_bucket{ConnectionType="ROUND_TRIPPER_SNAPSHOT",To="729934363faa4a24",le="0.0001"} 0
    etcd_network_peer_round_trip_time_seconds_bucket{ConnectionType="ROUND_TRIPPER_SNAPSHOT",To="729934363faa4a24",le="0.0002"} 1
    ```
    
    Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
    gyuho committed Aug 29, 2018
    b796e43
  5. etcdserver/api/rafthttp: fix prober test failures

    Fix the following panic in the prober tests:
    
    ```
    panic: runtime error: invalid memory address or nil pointer dereference [recovered]
    	panic: runtime error: invalid memory address or nil pointer dereference
    ```
    
    Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
    gyuho committed Aug 29, 2018
    37cf84c