
balancer: create many tcp connections use the same endpoint #11371

Closed
cfc4n opened this issue Nov 19, 2019 · 8 comments

@cfc4n
Contributor

cfc4n commented Nov 19, 2019

What version of etcd are you using?

3.3.17

What version of Go are you using (go version)?

go version go1.13.1 linux/amd64

What operating system (Linux, Windows, …) and version?

CentOS release 6.5 (both machines)

What did you do?

Client: 192.168.1.199
Etcd node A: 192.168.1.101

Step 1: run the code

package main

import (
	"context"
	"fmt"
	"github.com/coreos/etcd/clientv3"
	"google.golang.org/grpc/grpclog"
	"log"
	"os"
	"os/signal"
	"syscall"
	"time"
)

const ETCD_CONNECT_TIMEOUT = 5 * time.Second

func main() {
	log.Println("try to connect etcd cluster :%s", time.Now())

	Etcd_dsn := []string{
		"http://192.168.1.101:2379",
		"http://192.168.1.101:2379",
		"http://192.168.1.101:2379",
	}

	logger := grpclog.NewLoggerV2WithVerbosity(os.Stderr, os.Stderr, os.Stderr, 1)
	clientv3.SetLogger(logger)


	cfg := clientv3.Config{
		Endpoints:   Etcd_dsn,
		DialTimeout: ETCD_CONNECT_TIMEOUT,
	}

	client, err := clientv3.New(cfg)
	if err != nil {
		panic(err)
	}

	log.Println("connected etcd cluster")

	log.Println("get etcd key /sec/hids/")
	ctx, ctxCancelFun := context.WithTimeout(context.Background(), time.Second*5)
	defer ctxCancelFun()
	_, err = client.Get(ctx, "/sec/hids/", clientv3.WithCountOnly(), clientv3.WithPrefix())
	if err != nil {
		panic(err)
	}
	log.Println("foreach result")
	log.Println("start goroutine")
	// set iptables on etcd node A...
	// iptables -A INPUT -p tcp -s 192.168.1.199 -j DROP
	go func() {
		<-time.After(time.Second * 10)
		log.Println("start to get key ...")
		ctx, ctxCancelFun := context.WithTimeout(context.Background(), time.Second*5)
		defer ctxCancelFun()
		_, err := client.Get(ctx, "/cc/etcd-dns")

		if err != nil {
			log.Println(err)
		}
		log.Println("start to get key ... end")
	}()

	signaler()
	fmt.Println("exit")
}


func signaler() {
	// Block until a termination signal arrives.
	ch := make(chan os.Signal, 2)
	signal.Notify(ch, syscall.SIGHUP, syscall.SIGTERM, syscall.SIGINT, syscall.SIGPROF)
	for {
		switch <-ch {
		case syscall.SIGHUP:
			os.Exit(0)
		case syscall.SIGTERM:
			os.Exit(0)
		case syscall.SIGINT:
			os.Exit(0)
		}
	}

}

Step 2: set the iptables rule

Run the iptables command below on etcd node A after the "start goroutine" message is printed, to simulate a node whose CPU is too busy to accept new TCP connections.

iptables -A INPUT -p tcp -s 192.168.1.199 -j DROP

Step 3: check the number of TCP connections

netstat -antp|grep 2379
tcp        0      0 192.168.1.199:54323       192.168.1.101:2379      SYN_SENT
tcp        0      0 192.168.1.199:54324       192.168.1.101:2379      SYN_SENT
tcp        0      0 192.168.1.199:54325       192.168.1.101:2379      SYN_SENT
tcp        0      0 192.168.1.199:54326       192.168.1.101:2379      SYN_SENT
tcp        0      0 192.168.1.199:54327       192.168.1.101:2379      SYN_SENT
tcp        0      0 192.168.1.199:54328       192.168.1.101:2379      SYN_SENT
tcp        0      0 192.168.1.199:54329       192.168.1.101:2379      SYN_SENT

What did you see instead?

There are 6-7 TCP connections to the same endpoint, all in SYN_SENT. In fact, the client creates a TCP connection to every configured endpoint.

The etcd client creates many TCP connections to the same endpoint when the client balancer kicks in.

What did you expect to see?

The client should create only one TCP connection, and open another TCP socket only after the current connection fails.

More detail

Case study

In my case, the result looks like a SYN-flood attack.
etcd cluster: 7 nodes
etcd clients: 200K+
DB size: 6-8 GB

  1. Each client creates a new TCP connection to the next endpoint when the network fluctuates.
  2. The client's existing connection is disconnected.
  3. The etcd process crashes when it sends packets to a disconnected client, triggering "3.3.7 panic: send on closed channel" #9956.
  4. The supervisor restarts the etcd process.
  5. The etcd node rejoins the cluster, the leader sends a snapshot to the new node, and heartbeats from all nodes are blocked. (fixed by the learner feature)
  6. The other etcd nodes elect a new leader.
  7. The next endpoint receives a burst of TCP requests, its CPU load spikes, and it can no longer accept TCP SYNs.
  8. The clients' connections sit in SYN_SENT.
  9. The gRPC balancer runs its tryAllAddrs function, retrying every endpoint with a backoff mechanism (a simplified sketch of this pattern follows the list).
  10. The etcd client signals the gRPC balancer every hb.healthCheckTimeout (minimum 3s, maximum set by the client connect timeout) in the healthBalancer.updateUnhealthy function, triggering tryAllAddrs again and reconnecting to all endpoints.
  11. Together, all clients effectively launch a DDoS attack on the etcd cluster.
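
To make steps 9-10 concrete, here is a minimal, self-contained sketch of the "retry every endpoint with backoff" pattern described above. It is not the actual gRPC or etcd balancer code: dialAllWithBackoff and its parameters are hypothetical, and plain net.DialTimeout stands in for the gRPC transport.

package main

import (
	"log"
	"net"
	"time"
)

// dialAllWithBackoff keeps trying every endpoint until one accepts a TCP
// connection, doubling the wait between rounds. While an endpoint is too
// overloaded to answer SYNs, every round leaves another socket behind in
// SYN_SENT until the dial timeout fires, which is what the netstat output
// above shows.
func dialAllWithBackoff(endpoints []string, dialTimeout, maxBackoff time.Duration) net.Conn {
	backoff := time.Second
	for {
		for _, ep := range endpoints {
			conn, err := net.DialTimeout("tcp", ep, dialTimeout)
			if err == nil {
				return conn
			}
			log.Printf("dial %s failed: %v", ep, err)
		}
		time.Sleep(backoff)
		if backoff *= 2; backoff > maxBackoff {
			backoff = maxBackoff
		}
	}
}

func main() {
	conn := dialAllWithBackoff([]string{"192.168.1.101:2379"}, 5*time.Second, 30*time.Second)
	defer conn.Close()
	log.Printf("connected to %s", conn.RemoteAddr())
}

Every client running a loop like this against an overloaded endpoint keeps opening fresh sockets, which is how the SYN_SENT pile-up turns into the flood described in step 11.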

bugs

@cfc4n
Contributor Author

cfc4n commented Nov 26, 2019

I received a reply from the grpc-go community:

It's a feature for the users to create multiple TCP connections to same endpoint, and the users have the full control. I don't think this is a problem in gRPC.

Closing. Please reply if you have more updates.

And I agree that the etcd client needs full control over how it creates TCP connections.

What do you think?
/cc @xiang90 @gyuho
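
If the application is supposed to have that control, one small measure on the etcd-client side is to avoid handing the balancer the same address more than once. A minimal sketch, assuming duplicate entries in the endpoint list are unintended; dedupe and newClient are hypothetical helpers, not part of the clientv3 API.

package main

import (
	"fmt"
	"time"

	"github.com/coreos/etcd/clientv3"
)

// dedupe returns the endpoint list with duplicates removed, preserving order,
// so the balancer is not asked to connect to the same address more than once.
func dedupe(endpoints []string) []string {
	seen := make(map[string]struct{}, len(endpoints))
	out := make([]string, 0, len(endpoints))
	for _, ep := range endpoints {
		if _, ok := seen[ep]; !ok {
			seen[ep] = struct{}{}
			out = append(out, ep)
		}
	}
	return out
}

func newClient(endpoints []string) (*clientv3.Client, error) {
	return clientv3.New(clientv3.Config{
		Endpoints:   dedupe(endpoints),
		DialTimeout: 5 * time.Second,
	})
}

func main() {
	// Prints a single entry for the triplicated endpoint used in the repro above.
	fmt.Println(dedupe([]string{
		"http://192.168.1.101:2379",
		"http://192.168.1.101:2379",
		"http://192.168.1.101:2379",
	}))
}

This does not change how the balancer retries a failing endpoint, but it keeps the per-endpoint connection count from being multiplied by configuration mistakes.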

@xiang90
Contributor

xiang90 commented Nov 28, 2019

The client should create only one TCP connection, and open another TCP socket only after the current connection fails.

This is the behavior of the old etcd gRPC client. I think the team made the decision to pre-create TCP connections to every given endpoint, and we feel it is fine since the number of etcd servers should be small.

What is your concern with creating a TCP connection to every given etcd server endpoint?

/cc @gyuho @jpbetz

@stale

stale bot commented Apr 6, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Apr 6, 2020
@stale stale bot closed this as completed Apr 27, 2020
@cfc4n
Contributor Author

cfc4n commented Jun 12, 2020

reopen, assign @cfc4n .

ref. #9949

@xiang90 xiang90 reopened this Jun 12, 2020
@stale stale bot removed the stale label Jun 12, 2020
@gyuho
Contributor

gyuho commented Jun 12, 2020

This is the behavior of the old etcd gRPC client. I think the team made the decision to pre-create TCP connections to every given endpoint, and we feel it is fine since the number of etcd servers should be small.

@xiang90 is right. The main motivation was to "simplify" the previous implementation. The old balancer used to keep only one connection, but the code became too complicated and error-prone.
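
For contrast, a minimal sketch of the single-connection idea described above (not the old balancer's actual code): keep one TCP connection and move to the next endpoint only after the current one fails. dialFirstHealthy is a hypothetical helper using plain net.DialTimeout, and the second address is only an example. The bookkeeping a real client needs on top of this (tracking the current endpoint, retrying the others later, handling in-flight RPCs) is where the old implementation grew complicated.

package main

import (
	"errors"
	"log"
	"net"
	"time"
)

// dialFirstHealthy walks the endpoint list once and returns the first
// connection that succeeds, so at most one TCP connection exists at a time.
func dialFirstHealthy(endpoints []string, timeout time.Duration) (net.Conn, error) {
	for _, ep := range endpoints {
		conn, err := net.DialTimeout("tcp", ep, timeout)
		if err == nil {
			return conn, nil
		}
		log.Printf("dial %s failed: %v, trying next endpoint", ep, err)
	}
	return nil, errors.New("all endpoints failed")
}

func main() {
	conn, err := dialFirstHealthy([]string{"192.168.1.101:2379", "192.168.1.102:2379"}, 5*time.Second)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	log.Printf("using endpoint %s", conn.RemoteAddr())
}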

@cfc4n
Contributor Author

cfc4n commented Aug 21, 2020

ping...ping...

@cfc4n
Contributor Author

cfc4n commented Sep 24, 2020

I found a similar bug, grpc/grpc-go#3667, fixed by grpc/grpc-go#2985.
I'll continue to follow up on this issue.

@stale

stale bot commented Dec 23, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Dec 23, 2020
@stale stale bot closed this as completed Jan 13, 2021