functional-tester: enable TLS, phase 1 #9534

Merged 14 commits on Apr 6, 2018
2 changes: 1 addition & 1 deletion CHANGELOG-3.4.md
@@ -30,7 +30,7 @@ See [code changes](https://github.com/coreos/etcd/compare/v3.3.0...v3.4.0) and [
- Furthermore, when `--auto-compaction-mode=periodic --auto-compaction-retention=30m` and writes per minute are about 1000, `v3.3.0`, `v3.3.1`, and `v3.3.2` compact revision 30000, 33000, and 36000, every 3 minutes, while `v3.3.3` *or later* compacts revision 30000, 60000, and 90000, every 30 minutes.
- Improve [lease expire/revoke operation performance](https://github.com/coreos/etcd/pull/9418), address [lease scalability issue](https://github.com/coreos/etcd/issues/9496).
- Make [Lease `Lookup` non-blocking with concurrent `Grant`/`Revoke`](https://github.com/coreos/etcd/pull/9229).
- Improve functional tester coverage: use [proxy layer to run network fault tests in CIs](https://github.com/coreos/etcd/pull/9081), enable [TLS](https://github.com/coreos/etcd/issues/8943), add [liveness mode](https://github.com/coreos/etcd/issues/9230), [shuffle test sequence](https://github.com/coreos/etcd/issues/9381).
- Improve [functional tester](https://github.com/coreos/etcd/tree/master/tools/functional-tester) coverage: use [proxy layer to run network fault tests in CI](https://github.com/coreos/etcd/pull/9081), enable [TLS both for server and client](https://github.com/coreos/etcd/pull/9534), add [liveness mode](https://github.com/coreos/etcd/issues/9230), and [shuffle test sequence](https://github.com/coreos/etcd/issues/9381).

### Breaking Changes

14 changes: 1 addition & 13 deletions tools/functional-tester/README.md
@@ -2,19 +2,7 @@

etcd functional test suite tests the functionality of an etcd cluster with a focus on failure resistance under high pressure. It sets up an etcd cluster and injects failures into the cluster by killing the process or isolating the network of the process. It expects the etcd cluster to recover within a short amount of time after the fault is fixed.

etcd functional test suite has two components: etcd-agent and etcd-tester. etcd-agent runs on every test machine and etcd-tester is a single controller of the test. etcd-tester controls all the etcd-agents to start etcd clusters and simulate various failure cases.

## Requirements

The environment of the cluster must be stable enough, so etcd test suite can assume that most of the failures are generated by itself.

## etcd agent

etcd agent is a daemon on each machine. It can start, stop, restart, isolate and terminate an etcd process. The agent exposes this functionality via HTTP RPC.

## etcd tester

etcd functional tester controls the progress of the functional tests. It calls the RPC of the etcd agent to simulate various test cases. For example, it can start a three-member cluster by sending three start RPC calls to three different etcd agents. It can make one of the members fail by sending a stop RPC call to one etcd agent.
etcd functional test suite has two components: etcd-agent and etcd-tester. etcd-agent runs on every test machine, and etcd-tester is a single controller of the test. The tester controls the agents: it tells them to start, stop, or terminate the etcd process, inject failures, and so on.

### Run locally

161 changes: 150 additions & 11 deletions tools/functional-tester/agent/handler.go
@@ -17,9 +17,11 @@ package agent
import (
"errors"
"fmt"
"io/ioutil"
"net/url"
"os"
"os/exec"
"path/filepath"
"syscall"
"time"

@@ -72,6 +74,7 @@ func (srv *Server) handleInitialStartEtcd(req *rpcpb.Request) (*rpcpb.Response,
return &rpcpb.Response{
Success: false,
Status: fmt.Sprintf("%q is not valid; last server operation was %q", rpcpb.Operation_InitialStartEtcd.String(), srv.last.String()),
Member: req.Member,
}, nil
}

@@ -84,16 +87,22 @@ func (srv *Server) handleInitialStartEtcd(req *rpcpb.Request) (*rpcpb.Response,
}
srv.lg.Info("created base directory", zap.String("path", srv.Member.BaseDir))

if err = srv.createEtcdFile(); err != nil {
if err = srv.saveEtcdLogFile(); err != nil {
return nil, err
}

srv.creatEtcdCmd()

err = srv.startEtcdCmd()
if err != nil {
if err = srv.saveTLSAssets(); err != nil {
return nil, err
}
if err = srv.startEtcdCmd(); err != nil {
return nil, err
}
srv.lg.Info("started etcd", zap.String("command-path", srv.etcdCmd.Path))
if err = srv.loadAutoTLSAssets(); err != nil {
return nil, err
}

// wait some time for etcd listener start
// before setting up proxy
@@ -104,10 +113,12 @@ func (srv *Server) handleInitialStartEtcd(req *rpcpb.Request) (*rpcpb.Response,

return &rpcpb.Response{
Success: true,
Status: "successfully started etcd!",
Status: "start etcd PASS",
Member: srv.Member,
}, nil
}

// TODO: support TLS
func (srv *Server) startProxy() error {
if srv.Member.EtcdClientProxy {
advertiseClientURL, advertiseClientURLPort, err := getURLAndPort(srv.Member.Etcd.AdvertiseClientURLs[0])
@@ -133,7 +144,7 @@ func (srv *Server) startProxy() error {
}

if srv.Member.EtcdPeerProxy {
advertisePeerURL, advertisePeerURLPort, err := getURLAndPort(srv.Member.Etcd.InitialAdvertisePeerURLs[0])
advertisePeerURL, advertisePeerURLPort, err := getURLAndPort(srv.Member.Etcd.AdvertisePeerURLs[0])
if err != nil {
return err
}
@@ -200,7 +211,7 @@ func (srv *Server) stopProxy() {
}
}

func (srv *Server) createEtcdFile() error {
func (srv *Server) saveEtcdLogFile() error {
var err error
srv.etcdLogFile, err = os.Create(srv.Member.EtcdLogPath)
if err != nil {
@@ -225,6 +236,128 @@ func (srv *Server) creatEtcdCmd() {
srv.etcdCmd.Stderr = srv.etcdLogFile
}

func (srv *Server) saveTLSAssets() error {
// if started with manual TLS, stores TLS assets
// from tester/client to disk before starting etcd process
// TODO: not implemented yet
if !srv.Member.Etcd.ClientAutoTLS {
if srv.Member.Etcd.ClientCertAuth {
return fmt.Errorf("manual TLS setup is not implemented yet, but Member.Etcd.ClientCertAuth is %v", srv.Member.Etcd.ClientCertAuth)
}
if srv.Member.Etcd.ClientCertFile != "" {
return fmt.Errorf("manual TLS setup is not implemented yet, but Member.Etcd.ClientCertFile is %q", srv.Member.Etcd.ClientCertFile)
}
if srv.Member.Etcd.ClientKeyFile != "" {
return fmt.Errorf("manual TLS setup is not implemented yet, but Member.Etcd.ClientKeyFile is %q", srv.Member.Etcd.ClientKeyFile)
}
if srv.Member.Etcd.ClientTrustedCAFile != "" {
return fmt.Errorf("manual TLS setup is not implemented yet, but Member.Etcd.ClientTrustedCAFile is %q", srv.Member.Etcd.ClientTrustedCAFile)
}
}
if !srv.Member.Etcd.PeerAutoTLS {
if srv.Member.Etcd.PeerClientCertAuth {
return fmt.Errorf("manual TLS setup is not implemented yet, but Member.Etcd.PeerClientCertAuth is %v", srv.Member.Etcd.PeerClientCertAuth)
}
if srv.Member.Etcd.PeerCertFile != "" {
return fmt.Errorf("manual TLS setup is not implemented yet, but Member.Etcd.PeerCertFile is %q", srv.Member.Etcd.PeerCertFile)
}
if srv.Member.Etcd.PeerKeyFile != "" {
return fmt.Errorf("manual TLS setup is not implemented yet, but Member.Etcd.PeerKeyFile is %q", srv.Member.Etcd.PeerKeyFile)
}
if srv.Member.Etcd.PeerTrustedCAFile != "" {
return fmt.Errorf("manual TLS setup is not implemented yet, but Member.Etcd.PeerTrustedCAFile is %q", srv.Member.Etcd.PeerTrustedCAFile)
}
}

// TODO
return nil
}

func (srv *Server) loadAutoTLSAssets() error {
// if started with auto TLS, sends back TLS assets to tester/client
if srv.Member.Etcd.ClientAutoTLS {
// in case of slow disk
time.Sleep(time.Second)

fdir := filepath.Join(srv.Member.Etcd.DataDir, "fixtures", "client")

srv.lg.Info(
"loading client TLS assets",
zap.String("dir", fdir),
zap.String("endpoint", srv.EtcdClientEndpoint),
)

certPath := filepath.Join(fdir, "cert.pem")
if !fileutil.Exist(certPath) {
return fmt.Errorf("cannot find %q", certPath)
}
certData, err := ioutil.ReadFile(certPath)
if err != nil {
return fmt.Errorf("cannot read %q (%v)", certPath, err)
}
srv.Member.ClientCertData = string(certData)

keyPath := filepath.Join(fdir, "key.pem")
if !fileutil.Exist(keyPath) {
return fmt.Errorf("cannot find %q", keyPath)
}
keyData, err := ioutil.ReadFile(keyPath)
if err != nil {
return fmt.Errorf("cannot read %q (%v)", keyPath, err)
}
srv.Member.ClientKeyData = string(keyData)

srv.lg.Info(
"loaded client TLS assets",
zap.String("client-cert-path", certPath),
zap.Int("client-cert-length", len(certData)),
zap.String("client-key-path", keyPath),
zap.Int("client-key-length", len(keyData)),
)
}
// peer assets are only generated when peer auto TLS is enabled
if srv.Member.Etcd.PeerAutoTLS {
// in case of slow disk
time.Sleep(time.Second)

fdir := filepath.Join(srv.Member.Etcd.DataDir, "fixtures", "peer")

srv.lg.Info(
"loading peer TLS assets",
zap.String("dir", fdir),
zap.String("endpoint", srv.EtcdClientEndpoint),
)

certPath := filepath.Join(fdir, "cert.pem")
if !fileutil.Exist(certPath) {
return fmt.Errorf("cannot find %q", certPath)
}
certData, err := ioutil.ReadFile(certPath)
if err != nil {
return fmt.Errorf("cannot read %q (%v)", certPath, err)
}
srv.Member.PeerCertData = string(certData)

keyPath := filepath.Join(fdir, "key.pem")
if !fileutil.Exist(keyPath) {
return fmt.Errorf("cannot find %q", keyPath)
}
keyData, err := ioutil.ReadFile(keyPath)
if err != nil {
return fmt.Errorf("cannot read %q (%v)", keyPath, err)
}
srv.Member.PeerKeyData = string(keyData)

srv.lg.Info(
"loaded peer TLS assets",
zap.String("peer-cert-path", certPath),
zap.Int("peer-cert-length", len(certData)),
zap.String("peer-key-path", keyPath),
zap.Int("peer-key-length", len(keyData)),
)
}
return nil
}

// start but do not wait for it to complete
func (srv *Server) startEtcdCmd() error {
return srv.etcdCmd.Start()
@@ -233,12 +366,17 @@ func (srv *Server) startEtcdCmd() error {
func (srv *Server) handleRestartEtcd() (*rpcpb.Response, error) {
srv.creatEtcdCmd()

srv.lg.Info("restarting etcd")
err := srv.startEtcdCmd()
if err != nil {
var err error
if err = srv.saveTLSAssets(); err != nil {
return nil, err
}
if err = srv.startEtcdCmd(); err != nil {
return nil, err
}
srv.lg.Info("restarted etcd", zap.String("command-path", srv.etcdCmd.Path))
if err = srv.loadAutoTLSAssets(); err != nil {
return nil, err
}

// wait some time for etcd listener start
// before setting up proxy
@@ -251,7 +389,8 @@ func (srv *Server) handleRestartEtcd() (*rpcpb.Response, error) {

return &rpcpb.Response{
Success: true,
Status: "successfully restarted etcd!",
Status: "restart etcd PASS",
Member: srv.Member,
}, nil
}

@@ -293,7 +432,7 @@ func (srv *Server) handleFailArchive() (*rpcpb.Response, error) {
}
srv.lg.Info("archived data", zap.String("base-dir", srv.Member.BaseDir))

if err = srv.createEtcdFile(); err != nil {
if err = srv.saveEtcdLogFile(); err != nil {
return nil, err
}
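
The client and peer branches of `loadAutoTLSAssets` above repeat the same read-cert, read-key sequence. As a rough sketch only (not part of this PR), the shared part could be factored into a helper. `loadCertKey` and `makeExampleDir` are hypothetical names introduced for illustration, and `os.ReadFile` / `os.MkdirTemp` assume Go 1.16 or later, whereas the PR itself uses `ioutil.ReadFile`. The separate `fileutil.Exist` check is dropped here on the assumption that the read error already reports a missing file.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// loadCertKey reads cert.pem and key.pem from dir and returns their
// contents as strings. It is a hypothetical helper, not the agent's API.
func loadCertKey(dir string) (string, string, error) {
	certPath := filepath.Join(dir, "cert.pem")
	certData, err := os.ReadFile(certPath)
	if err != nil {
		return "", "", fmt.Errorf("cannot read %q (%v)", certPath, err)
	}

	keyPath := filepath.Join(dir, "key.pem")
	keyData, err := os.ReadFile(keyPath)
	if err != nil {
		return "", "", fmt.Errorf("cannot read %q (%v)", keyPath, err)
	}
	return string(certData), string(keyData), nil
}

// makeExampleDir creates a temporary directory holding placeholder
// cert.pem and key.pem files, and returns it with a cleanup function.
// The file contents are dummy text, not real TLS material.
func makeExampleDir() (string, func(), error) {
	dir, err := os.MkdirTemp("", "auto-tls-example")
	if err != nil {
		return "", nil, err
	}
	cleanup := func() { os.RemoveAll(dir) }

	files := map[string]string{
		"cert.pem": "placeholder cert",
		"key.pem":  "placeholder key",
	}
	for name, data := range files {
		if err := os.WriteFile(filepath.Join(dir, name), []byte(data), 0600); err != nil {
			cleanup()
			return "", nil, err
		}
	}
	return dir, cleanup, nil
}

func main() {
	dir, cleanup, err := makeExampleDir()
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	defer cleanup()

	cert, key, err := loadCertKey(dir)
	if err != nil {
		fmt.Println("load failed:", err)
		return
	}
	fmt.Println("cert bytes:", len(cert), "key bytes:", len(key))
}
```

With such a helper, each branch would only need to pick its directory (`fixtures/client` or `fixtures/peer` under the data directory) and assign the two returned strings to the matching `Member` fields.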

20 changes: 18 additions & 2 deletions tools/functional-tester/rpcpb/etcd_config.go
@@ -30,8 +30,19 @@ var etcdFields = []string{

"ListenClientURLs",
"AdvertiseClientURLs",
"ClientAutoTLS",
"ClientCertAuth",
"ClientCertFile",
"ClientKeyFile",
"ClientTrustedCAFile",

"ListenPeerURLs",
"InitialAdvertisePeerURLs",
"AdvertisePeerURLs",
"PeerAutoTLS",
"PeerClientCertAuth",
"PeerCertFile",
"PeerKeyFile",
"PeerTrustedCAFile",

"InitialCluster",
"InitialClusterState",
@@ -72,12 +83,17 @@ func (cfg *Etcd) Flags() (fs []string) {
default:
panic(fmt.Errorf("field %q (%v) cannot be parsed", name, fv.Type().Kind()))
}

fname := field.Tag.Get("yaml")

// TODO: remove this
if fname == "initial-corrupt-check" {
fname = "experimental-" + fname
}
fs = append(fs, fmt.Sprintf("--%s=%s", fname, sv))

if sv != "" {
fs = append(fs, fmt.Sprintf("--%s=%s", fname, sv))
}
}
return fs
}
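
The `Flags` change above skips fields whose string form is empty, so unset cert paths do not produce flags like `--cert-file=`. The following is a minimal sketch of that idea, assuming a trimmed-down struct. `exampleConfig`, `exampleFields`, `buildFlags`, and the `yaml` tag values are illustrative assumptions, not the real `rpcpb.Etcd` definition.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// exampleConfig is a hypothetical stand-in for a small part of rpcpb.Etcd.
type exampleConfig struct {
	Name           string   `yaml:"name"`
	ListenPeerURLs []string `yaml:"listen-peer-urls"`
	PeerAutoTLS    bool     `yaml:"peer-auto-tls"`
	PeerCertFile   string   `yaml:"peer-cert-file"`
	SnapshotCount  int64    `yaml:"snapshot-count"`
}

// exampleFields lists the fields to export as flags, in output order.
var exampleFields = []string{
	"Name",
	"ListenPeerURLs",
	"PeerAutoTLS",
	"PeerCertFile",
	"SnapshotCount",
}

// buildFlags converts the listed fields into "--name=value" flags,
// omitting any field whose string form is empty.
func buildFlags(cfg *exampleConfig) (fs []string) {
	tp := reflect.TypeOf(*cfg)
	vl := reflect.ValueOf(*cfg)
	for _, name := range exampleFields {
		field, ok := tp.FieldByName(name)
		if !ok {
			panic(fmt.Errorf("field %q not found", name))
		}
		fv := vl.FieldByName(name)

		var sv string
		switch fv.Kind() {
		case reflect.String:
			sv = fv.String()
		case reflect.Slice:
			sv = strings.Join(fv.Interface().([]string), ",")
		case reflect.Int64:
			sv = fmt.Sprintf("%d", fv.Int())
		case reflect.Bool:
			sv = fmt.Sprintf("%v", fv.Bool())
		default:
			panic(fmt.Errorf("field %q (%v) cannot be parsed", name, fv.Kind()))
		}

		// Empty strings (for example an unset cert path) produce no flag.
		if sv == "" {
			continue
		}
		fs = append(fs, fmt.Sprintf("--%s=%s", field.Tag.Get("yaml"), sv))
	}
	return fs
}

func main() {
	cfg := &exampleConfig{
		Name:           "s1",
		ListenPeerURLs: []string{"https://127.0.0.1:1380"},
		PeerAutoTLS:    true,
		PeerCertFile:   "", // unset, so no flag is emitted for it
		SnapshotCount:  10000,
	}
	for _, f := range buildFlags(cfg) {
		fmt.Println(f)
	}
}
```

Note that a `false` boolean still formats to the non-empty string `false`, so it is emitted; this is consistent with the updated test expecting `--client-cert-auth=false` while the empty cert, key, and CA paths yield no flags.
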
63 changes: 42 additions & 21 deletions tools/functional-tester/rpcpb/etcd_config_test.go
@@ -21,34 +21,55 @@ import (

func TestEtcdFlags(t *testing.T) {
cfg := &Etcd{
Name: "s1",
DataDir: "/tmp/etcd-agent-data-1/etcd.data",
WALDir: "/tmp/etcd-agent-data-1/etcd.data/member/wal",
HeartbeatIntervalMs: 100,
ElectionTimeoutMs: 1000,
ListenClientURLs: []string{"127.0.0.1:1379"},
AdvertiseClientURLs: []string{"127.0.0.1:13790"},
ListenPeerURLs: []string{"127.0.0.1:1380"},
InitialAdvertisePeerURLs: []string{"127.0.0.1:13800"},
InitialCluster: "s1=127.0.0.1:13800,s2=127.0.0.1:23800,s3=127.0.0.1:33800",
InitialClusterState: "new",
InitialClusterToken: "tkn",
SnapshotCount: 10000,
QuotaBackendBytes: 10740000000,
PreVote: true,
InitialCorruptCheck: true,
Name: "s1",
DataDir: "/tmp/etcd-agent-data-1/etcd.data",
WALDir: "/tmp/etcd-agent-data-1/etcd.data/member/wal",

HeartbeatIntervalMs: 100,
ElectionTimeoutMs: 1000,

ListenClientURLs: []string{"https://127.0.0.1:1379"},
AdvertiseClientURLs: []string{"https://127.0.0.1:13790"},
ClientAutoTLS: true,
ClientCertAuth: false,
ClientCertFile: "",
ClientKeyFile: "",
ClientTrustedCAFile: "",

ListenPeerURLs: []string{"https://127.0.0.1:1380"},
AdvertisePeerURLs: []string{"https://127.0.0.1:13800"},
PeerAutoTLS: true,
PeerClientCertAuth: false,
PeerCertFile: "",
PeerKeyFile: "",
PeerTrustedCAFile: "",

InitialCluster: "s1=https://127.0.0.1:13800,s2=https://127.0.0.1:23800,s3=https://127.0.0.1:33800",
InitialClusterState: "new",
InitialClusterToken: "tkn",

SnapshotCount: 10000,
QuotaBackendBytes: 10740000000,

PreVote: true,
InitialCorruptCheck: true,
}

exp := []string{
"--name=s1",
"--data-dir=/tmp/etcd-agent-data-1/etcd.data",
"--wal-dir=/tmp/etcd-agent-data-1/etcd.data/member/wal",
"--heartbeat-interval=100",
"--election-timeout=1000",
"--listen-client-urls=127.0.0.1:1379",
"--advertise-client-urls=127.0.0.1:13790",
"--listen-peer-urls=127.0.0.1:1380",
"--initial-advertise-peer-urls=127.0.0.1:13800",
"--initial-cluster=s1=127.0.0.1:13800,s2=127.0.0.1:23800,s3=127.0.0.1:33800",
"--listen-client-urls=https://127.0.0.1:1379",
"--advertise-client-urls=https://127.0.0.1:13790",
"--auto-tls=true",
"--client-cert-auth=false",
"--listen-peer-urls=https://127.0.0.1:1380",
"--initial-advertise-peer-urls=https://127.0.0.1:13800",
"--peer-auto-tls=true",
"--peer-client-cert-auth=false",
"--initial-cluster=s1=https://127.0.0.1:13800,s2=https://127.0.0.1:23800,s3=https://127.0.0.1:33800",
"--initial-cluster-state=new",
"--initial-cluster-token=tkn",
"--snapshot-count=10000",