diff --git a/.drone.yml b/.drone.yml index c282c2c4793a..c8d318fb3e91 100644 --- a/.drone.yml +++ b/.drone.yml @@ -767,4 +767,4 @@ volumes: host: path: /var/run/libvirt/ - name: cache - temp: {} \ No newline at end of file + temp: {} diff --git a/docs/adrs/integrate-vpns.md b/docs/adrs/integrate-vpns.md new file mode 100644 index 000000000000..e0f955747e40 --- /dev/null +++ b/docs/adrs/integrate-vpns.md @@ -0,0 +1,76 @@ +# Integrate vpn in k3s + +Date: 2023-04-26 + +## Status + +Under review + +## Context + +There are Kubernetes use cases which require a Kubernetes cluster to be deployed on a set of heterogeneous nodes, e.g. baremetal nodes, AWS VMs, Azure VMs, etc. Some of these use cases: + * Edge apps which are divided into two parts: a small-footprint part deployed at the edge and the "non-edge" part deployed in the DC. These two parts must always be connected. + * A baremetal cluster that, only during certain periods, must be extended with hyperscaler VMs to cope with demand + * Clusters that must include nodes from different hyperscalers for resiliency reasons or legal requirements, e.g. GDPR + +Today, k3s can deploy a cluster on a set of heterogeneous nodes with a simple and robust solution. This is achieved by using the [websocket proxy](https://github.com/k3s-io/k3s/blob/master/pkg/agent/run.go#L277) to connect the control-plane of the cluster, i.e. kube-api <==> kubelet, and a vpn-type flannel backend, e.g. wireguard, to connect the data-plane, i.e. pod <==> pod/node. + +The current solution works well but has a few limitations: + * It requires the server to have a public IP + * It requires the server to open ports on that external IP (e.g. 6443) + * Projects like prometheus or metrics-server that attempt to scrape nodes directly will not work, as pod --> host traffic does not work out of the box + * There is no central management point for your vpn. Therefore, it is impossible to: + 1. Have a vpn topology view + 2.
Monitor node status, performance, etc. + 3. Configure ACLs or other policies + 4. Other features + +There are well-known projects which can be used as an alternative to our solution. In general, these projects set up a vpn mesh that includes all nodes, so we could deploy k3s as if all nodes belonged to the same network. Besides, these projects include a central management point that offers extra features, and they do not require a public IP. Some of these projects are: tailscale, netmaker, nebula or zerotier. + +We already have users operating k3s on top of one of these vpn solutions. However, this is sometimes painful for them because they are not necessarily network experts and run into integration problems such as: performance issues due to double encapsulation, [mtu issues due to vpn tunneling encapsulation](https://github.com/k3s-io/k3s/issues/4743), strange errors due to wrong vpn configuration, etc. Moreover, they need to first deploy the vpn, collect important information and then deploy k3s using that information. These three steps are not always easy to automate, and automation is paramount in edge use cases. + +My proposal is to integrate the best one or two of these projects into k3s. Integrating here means setting up the vpn and configuring k3s accordingly, so that the user ends up with a working heterogeneous cluster in a matter of seconds by just deploying k3s. At this stage, the proposal is not to incorporate the vpn binaries or daemons inside the k3s binary; instead, we require the user to have the vpn binary installed. + +Therefore, the user would have 2 or 3 alternatives to deploy k3s in a heterogeneous cluster: +1 - Our simple and robust solution +2 - vpn solution (e.g. tailscale) +3 - (Optional) vpn solution 2 (e.g. netmaker) + +In terms of support, we could always offer support for alternative 1 and best-effort support for alternative 2 (or 3).
We don't control those projects, and some of them have proprietary parts or use a freemium business model (e.g. a limited number of nodes). + +In the first round, only tailscale will be integrated. + +### Architecture + +Going a bit deeper into the code, this is a high-level summary of the changes applied to k3s: + * A new flag (`--vpn-auth`, or `--vpn-auth-file` to read the value from a file) is passed to both server and agent. It provides the provider name and the auth key required for the node to be accepted in the vpn and to set node configs (e.g. allow routing of podCIDR via the VPN) + * Functions in "package vpn" (pkg/vpn/vpn.go) that start the vpn and provide information about its status + * VPNInfo struct in "package vpn" that includes important vpn endpoint information such as the vpn endpoint IP and the nodeID in the vpn context (for config purposes) + * Both collecting the vpn endpoint information and starting the vpn are implemented by calling the vpn binary. Tailscale has a "not-feature-complete" go client but netmaker does not, so calling the binary is the common denominator + * In the agents, if a vpn config flag is detected, the vpn is started before the websocket proxy is created, so the agent can contact the server + * In the servers, if a vpn config flag is detected, the vpn is started before the apiserver is started, so that agents can contact the server. AdvertiseIP is changed to use the VPN IP + * If a vpn config flag is detected, the vpn info is collected and the nodeIP replaced before the kubelet certificate is created (due to SAN). This happens in func get(...) of pkg/agent/config/config.go + * A new flannel backend is defined: tailscale. It uses the general-purpose "extension" backend, which executes shell commands when certain events happen (e.g. a new node is added) + * When a new node is added, flannel queries the subnet podCIDR for that node.
The new backend, by executing the vpn binary with certain flags, allows traffic to/from that subnet podCIDR to flow via the VPN + * In HA scenarios, etcd will not use the VPN IP: its IP (serverConfig.ControlConfig.PrivateIP) remains the IP of the main interface. Running etcd traffic over the internet does not make sense. Therefore, k3s-HA over VPN is not supported + + +## Decision + +??? + +## Consequences + +Good +==== +* Users can automatically deploy vpn+k3s in seconds and get a setup that seamlessly connects heterogeneous nodes +* An exciting new feature for the community +* We offer not only our simple solution but also some extra ones for heterogeneous clusters +* Fills a gap for useful edge use cases + +Bad +=== +* Integration with 3rd party projects which we do not control, so complete support is not possible (similar to CNI plugins) +* Some of these projects are not 100% open source (e.g. tailscale) and some are still in their infancy (i.e. buggy), e.g. netmaker. +* Not possible to configure a set of heterogeneous nodes in Rancher Management.
Therefore, it is currently impossible to deploy through it but could be deployed standalone + diff --git a/pkg/agent/config/config.go b/pkg/agent/config/config.go index 345fec6d1d28..a35410014a17 100644 --- a/pkg/agent/config/config.go +++ b/pkg/agent/config/config.go @@ -28,6 +28,7 @@ import ( "github.com/k3s-io/k3s/pkg/daemons/control/deps" "github.com/k3s-io/k3s/pkg/util" "github.com/k3s-io/k3s/pkg/version" + "github.com/k3s-io/k3s/pkg/vpn" "github.com/pkg/errors" "github.com/rancher/wrangler/pkg/slice" "github.com/sirupsen/logrus" @@ -382,6 +383,29 @@ func get(ctx context.Context, envInfo *cmds.Agent, proxy proxy.Proxy) (*config.N return nil, err } + // If there is a VPN, we must overwrite NodeIP and flannel interface + var vpnInfo vpn.VPNInfo + if envInfo.VPNAuth != "" { + vpnInfo, err = vpn.GetVPNInfo(envInfo.VPNAuth) + if err != nil { + return nil, err + } + if len(vpnInfo.IPs) != 0 { + logrus.Infof("Node-ip changed to %v due to VPN", vpnInfo.IPs) + if len(envInfo.NodeIP) != 0 { + logrus.Warn("VPN provider overrides configured node-ip parameter") + } + if len(envInfo.NodeExternalIP) != 0 { + logrus.Warn("VPN provider overrides node-external-ip parameter") + } + nodeIPs = vpnInfo.IPs + flannelIface, err = net.InterfaceByName(vpnInfo.VPNInterface) + if err != nil { + return nil, errors.Wrapf(err, "unable to find vpn interface: %s", vpnInfo.VPNInterface) + } + } + } + nodeExternalIPs, err := util.ParseStringSliceToIPs(envInfo.NodeExternalIP) if err != nil { return nil, fmt.Errorf("invalid node-external-ip: %w", err) @@ -532,6 +556,11 @@ func get(ctx context.Context, envInfo *cmds.Agent, proxy proxy.Proxy) (*config.N nodeConfig.AgentConfig.CNIBinDir = filepath.Dir(hostLocal) nodeConfig.AgentConfig.CNIConfDir = filepath.Join(envInfo.DataDir, "agent", "etc", "cni", "net.d") nodeConfig.AgentConfig.FlannelCniConfFile = envInfo.FlannelCniConfFile + + // It does not make sense to use VPN without its flannel backend + if envInfo.VPNAuth != "" { + 
nodeConfig.FlannelBackend = vpnInfo.ProviderName + } } if nodeConfig.Docker { diff --git a/pkg/agent/flannel/setup.go b/pkg/agent/flannel/setup.go index 17ee6999da03..c7b1a5f4fcf2 100644 --- a/pkg/agent/flannel/setup.go +++ b/pkg/agent/flannel/setup.go @@ -75,6 +75,12 @@ const ( "PSK": "%psk%" }` + tailscaledBackend = `{ + "Type": "extension", + "PostStartupCommand": "tailscale up --accept-routes --advertise-routes=%Routes%", + "ShutdownCommand": "tailscale down" +}` + wireguardNativeBackend = `{ "Type": "wireguard", "PersistentKeepaliveInterval": %PersistentKeepaliveInterval%, @@ -232,6 +238,19 @@ func createFlannelConf(nodeConfig *config.Node) error { logrus.Warnf("The ipsec backend is deprecated and will be removed in k3s v1.27; please switch to wireguard-native. Check our docs for information on how to migrate.") case config.FlannelBackendWireguard: logrus.Fatalf("The wireguard backend was deprecated in K3s v1.26, please switch to wireguard-native. Check our docs at docs.k3s.io/installation/network-options for information about how to migrate.") + case config.FlannelBackendTailscale: + var routes string + switch netMode { + case ipv4: + routes = "$SUBNET" + case (ipv4 + ipv6): + routes = "$SUBNET,$IPV6SUBNET" + case ipv6: + routes = "$IPV6SUBNET" + default: + return fmt.Errorf("incorrect netMode for flannel tailscale backend") + } + backendConf = strings.ReplaceAll(tailscaledBackend, "%Routes%", routes) case config.FlannelBackendWireguardNative: mode, ok := backendOptions["Mode"] if !ok { diff --git a/pkg/cli/agent/agent.go b/pkg/cli/agent/agent.go index 501ac1bc0c1b..79067ad158a6 100644 --- a/pkg/cli/agent/agent.go +++ b/pkg/cli/agent/agent.go @@ -11,9 +11,9 @@ import ( "github.com/k3s-io/k3s/pkg/agent" "github.com/k3s-io/k3s/pkg/cli/cmds" "github.com/k3s-io/k3s/pkg/datadir" - "github.com/k3s-io/k3s/pkg/token" "github.com/k3s-io/k3s/pkg/util" "github.com/k3s-io/k3s/pkg/version" + "github.com/k3s-io/k3s/pkg/vpn" "github.com/rancher/wrangler/pkg/signals" 
"github.com/sirupsen/logrus" "github.com/urfave/cli" @@ -40,7 +40,7 @@ func Run(ctx *cli.Context) error { } if cmds.AgentConfig.TokenFile != "" { - token, err := token.ReadFile(cmds.AgentConfig.TokenFile) + token, err := util.ReadFile(cmds.AgentConfig.TokenFile) if err != nil { return err } @@ -76,5 +76,20 @@ func Run(ctx *cli.Context) error { contextCtx := signals.SetupSignalContext() + if cmds.AgentConfig.VPNAuthFile != "" { + cmds.AgentConfig.VPNAuth, err = util.ReadFile(cmds.AgentConfig.VPNAuthFile) + if err != nil { + return err + } + } + + // Starts the VPN in the agent if config was set up + if cmds.AgentConfig.VPNAuth != "" { + err := vpn.StartVPN(cmds.AgentConfig.VPNAuth) + if err != nil { + return err + } + } + return agent.Run(contextCtx, cfg) } diff --git a/pkg/cli/cmds/agent.go b/pkg/cli/cmds/agent.go index 336906eea353..543b8c8c4273 100644 --- a/pkg/cli/cmds/agent.go +++ b/pkg/cli/cmds/agent.go @@ -30,6 +30,8 @@ type Agent struct { FlannelIface string FlannelConf string FlannelCniConfFile string + VPNAuth string + VPNAuthFile string Debug bool Rootless bool RootlessAlreadyUnshared bool @@ -151,6 +153,18 @@ var ( Usage: "(agent/networking) Override default flannel cni config file", Destination: &AgentConfig.FlannelCniConfFile, } + VPNAuth = &cli.StringFlag{ + Name: "vpn-auth", + Usage: "(agent/networking) (experimental) Credentials for the VPN provider. It must include the provider name and join key in the format name=,joinKey=", + EnvVar: version.ProgramUpper + "_VPN_AUTH", + Destination: &AgentConfig.VPNAuth, + } + VPNAuthFile = &cli.StringFlag{ + Name: "vpn-auth-file", + Usage: "(agent/networking) (experimental) File containing credentials for the VPN provider. 
It must include the provider name and join key in the format name=,joinKey=", + EnvVar: version.ProgramUpper + "_VPN_AUTH_FILE", + Destination: &AgentConfig.VPNAuthFile, + } ResolvConfFlag = &cli.StringFlag{ Name: "resolv-conf", Usage: "(agent/networking) Kubelet resolv.conf file", @@ -254,6 +268,8 @@ func NewAgentCommand(action func(ctx *cli.Context) error) cli.Command { PreferBundledBin, // Deprecated/hidden below DockerFlag, + VPNAuth, + VPNAuthFile, }, } } diff --git a/pkg/cli/cmds/server.go b/pkg/cli/cmds/server.go index e4b4aa0a4a34..05a8b60b9edc 100644 --- a/pkg/cli/cmds/server.go +++ b/pkg/cli/cmds/server.go @@ -501,6 +501,8 @@ var ServerFlags = []cli.Flag{ FlannelIfaceFlag, FlannelConfFlag, FlannelCniConfFileFlag, + VPNAuth, + VPNAuthFile, ExtraKubeletArgs, ExtraKubeProxyArgs, ProtectKernelDefaultsFlag, diff --git a/pkg/cli/server/server.go b/pkg/cli/server/server.go index 0797ab0bc10c..cab4ee17e1da 100644 --- a/pkg/cli/server/server.go +++ b/pkg/cli/server/server.go @@ -20,9 +20,9 @@ import ( "github.com/k3s-io/k3s/pkg/etcd" "github.com/k3s-io/k3s/pkg/rootless" "github.com/k3s-io/k3s/pkg/server" - "github.com/k3s-io/k3s/pkg/token" "github.com/k3s-io/k3s/pkg/util" "github.com/k3s-io/k3s/pkg/version" + "github.com/k3s-io/k3s/pkg/vpn" "github.com/pkg/errors" "github.com/rancher/wrangler/pkg/signals" "github.com/sirupsen/logrus" @@ -90,6 +90,21 @@ func run(app *cli.Context, cfg *cmds.Server, leaderControllers server.CustomCont } } + if cmds.AgentConfig.VPNAuthFile != "" { + cmds.AgentConfig.VPNAuth, err = util.ReadFile(cmds.AgentConfig.VPNAuthFile) + if err != nil { + return err + } + } + + // Starts the VPN in the server if config was set up + if cmds.AgentConfig.VPNAuth != "" { + err := vpn.StartVPN(cmds.AgentConfig.VPNAuth) + if err != nil { + return err + } + } + agentReady := make(chan struct{}) serverConfig := server.Config{} @@ -99,13 +114,13 @@ func run(app *cli.Context, cfg *cmds.Server, leaderControllers server.CustomCont 
serverConfig.ControlConfig.AgentToken = cfg.AgentToken serverConfig.ControlConfig.JoinURL = cfg.ServerURL if cfg.AgentTokenFile != "" { - serverConfig.ControlConfig.AgentToken, err = token.ReadFile(cfg.AgentTokenFile) + serverConfig.ControlConfig.AgentToken, err = util.ReadFile(cfg.AgentTokenFile) if err != nil { return err } } if cfg.TokenFile != "" { - serverConfig.ControlConfig.Token, err = token.ReadFile(cfg.TokenFile) + serverConfig.ControlConfig.Token, err = util.ReadFile(cfg.TokenFile) if err != nil { return err } @@ -207,14 +222,31 @@ func run(app *cli.Context, cfg *cmds.Server, leaderControllers server.CustomCont serverConfig.ControlConfig.PrivateIP = util.GetFirstValidIPString(cmds.AgentConfig.NodeIP) } - // if not set, try setting advertise-ip from agent node-external-ip - if serverConfig.ControlConfig.AdvertiseIP == "" && len(cmds.AgentConfig.NodeExternalIP) != 0 { - serverConfig.ControlConfig.AdvertiseIP = util.GetFirstValidIPString(cmds.AgentConfig.NodeExternalIP) - } + // if not set, try setting advertise-ip from agent VPN + if cmds.AgentConfig.VPNAuth != "" { + vpnInfo, err := vpn.GetVPNInfo(cmds.AgentConfig.VPNAuth) + if err != nil { + return err + } + if len(vpnInfo.IPs) != 0 { + logrus.Infof("Advertise-address changed to %v due to VPN", vpnInfo.IPs) + if serverConfig.ControlConfig.AdvertiseIP != "" { + logrus.Warn("Conflict in the config detected. VPN integration overwrites advertise-address but the config is setting the advertise-address parameter") + } + serverConfig.ControlConfig.AdvertiseIP = vpnInfo.IPs[0].String() + } + logrus.Warn("Etcd IP (PrivateIP) remains the local IP. 
Running etcd traffic over VPN is not recommended due to performance issues") + } else { - // if not set, try setting advertise-ip from agent node-ip - if serverConfig.ControlConfig.AdvertiseIP == "" && len(cmds.AgentConfig.NodeIP) != 0 { - serverConfig.ControlConfig.AdvertiseIP = util.GetFirstValidIPString(cmds.AgentConfig.NodeIP) + // if not set, try setting advertise-ip from agent node-external-ip + if serverConfig.ControlConfig.AdvertiseIP == "" && len(cmds.AgentConfig.NodeExternalIP) != 0 { + serverConfig.ControlConfig.AdvertiseIP = util.GetFirstValidIPString(cmds.AgentConfig.NodeExternalIP) + } + + // if not set, try setting advertise-ip from agent node-ip + if serverConfig.ControlConfig.AdvertiseIP == "" && len(cmds.AgentConfig.NodeIP) != 0 { + serverConfig.ControlConfig.AdvertiseIP = util.GetFirstValidIPString(cmds.AgentConfig.NodeIP) + } } // if we ended up with any advertise-ips, ensure they're added to the SAN list; diff --git a/pkg/cluster/encrypt.go b/pkg/cluster/encrypt.go index d82a75e46d10..1046d61e1a8b 100644 --- a/pkg/cluster/encrypt.go +++ b/pkg/cluster/encrypt.go @@ -12,7 +12,7 @@ import ( "io" "strings" - "github.com/k3s-io/k3s/pkg/token" + "github.com/k3s-io/k3s/pkg/util" "golang.org/x/crypto/pbkdf2" ) @@ -32,7 +32,7 @@ func keyHash(passphrase string) string { // encrypt encrypts a byte slice using aes+gcm with a pbkdf2 key derived from the passphrase and a random salt. // It returns a byte slice containing the salt and base64-encoded ciphertext. 
func encrypt(passphrase string, plaintext []byte) ([]byte, error) { - salt, err := token.Random(8) + salt, err := util.Random(8) if err != nil { return nil, err } diff --git a/pkg/daemons/config/types.go b/pkg/daemons/config/types.go index 43176b6e4116..37292abaa145 100644 --- a/pkg/daemons/config/types.go +++ b/pkg/daemons/config/types.go @@ -26,6 +26,7 @@ const ( FlannelBackendIPSEC = "ipsec" FlannelBackendWireguard = "wireguard" FlannelBackendWireguardNative = "wireguard-native" + FlannelBackendTailscale = "tailscale" EgressSelectorModeAgent = "agent" EgressSelectorModeCluster = "cluster" EgressSelectorModeDisabled = "disabled" diff --git a/pkg/daemons/control/deps/deps.go b/pkg/daemons/control/deps/deps.go index 6753e6c792ed..6afae2335898 100644 --- a/pkg/daemons/control/deps/deps.go +++ b/pkg/daemons/control/deps/deps.go @@ -23,7 +23,6 @@ import ( "github.com/k3s-io/k3s/pkg/cloudprovider" "github.com/k3s-io/k3s/pkg/daemons/config" "github.com/k3s-io/k3s/pkg/passwd" - "github.com/k3s-io/k3s/pkg/token" "github.com/k3s-io/k3s/pkg/util" "github.com/k3s-io/k3s/pkg/version" certutil "github.com/rancher/dynamiclistener/cert" @@ -269,7 +268,7 @@ func genEncryptedNetworkInfo(controlConfig *config.Control) error { return nil } - psk, err := token.Random(ipsecTokenSize) + psk, err := util.Random(ipsecTokenSize) if err != nil { return err } @@ -288,7 +287,7 @@ func getServerPass(passwd *passwd.Passwd, config *config.Control) (string, error serverPass, _ = passwd.Pass("server") } if serverPass == "" { - serverPass, err = token.Random(16) + serverPass, err = util.Random(16) if err != nil { return "", err } diff --git a/pkg/nodeconfig/nodeconfig.go b/pkg/nodeconfig/nodeconfig.go index b7cda4a0ffe6..d54ad5e432e7 100644 --- a/pkg/nodeconfig/nodeconfig.go +++ b/pkg/nodeconfig/nodeconfig.go @@ -133,6 +133,7 @@ func isSecret(key string) bool { version.ProgramUpper + "_DATASTORE_ENDPOINT", version.ProgramUpper + "_AGENT_TOKEN", version.ProgramUpper + "_CLUSTER_SECRET", + 
version.ProgramUpper + "_VPN_AUTH", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "--token", @@ -141,6 +142,7 @@ func isSecret(key string) bool { "--datastore-endpoint", "--etcd-s3-access-key", "--etcd-s3-secret-key", + "--vpn-auth", } for _, secret := range secretData { if key == secret { diff --git a/pkg/passwd/passwd.go b/pkg/passwd/passwd.go index fe83c9c25134..49c46298f9fb 100644 --- a/pkg/passwd/passwd.go +++ b/pkg/passwd/passwd.go @@ -7,7 +7,6 @@ import ( "os" "strings" - "github.com/k3s-io/k3s/pkg/token" "github.com/k3s-io/k3s/pkg/util" ) @@ -83,7 +82,7 @@ func (p *Passwd) EnsureUser(name, role, passwd string) error { } if passwd == "" { - token, err := token.Random(16) + token, err := util.Random(16) if err != nil { return err } diff --git a/pkg/token/read.go b/pkg/token/read.go deleted file mode 100644 index 101046e39a3d..000000000000 --- a/pkg/token/read.go +++ /dev/null @@ -1,38 +0,0 @@ -package token - -import ( - cryptorand "crypto/rand" - "encoding/hex" - "os" - "strings" - "time" - - "github.com/sirupsen/logrus" -) - -func Random(size int) (string, error) { - token := make([]byte, size, size) - _, err := cryptorand.Read(token) - if err != nil { - return "", err - } - return hex.EncodeToString(token), err -} - -func ReadFile(path string) (string, error) { - if path == "" { - return "", nil - } - - for { - tokenBytes, err := os.ReadFile(path) - if err == nil { - return strings.TrimSpace(string(tokenBytes)), nil - } else if os.IsNotExist(err) { - logrus.Infof("Waiting for %s to be available\n", path) - time.Sleep(2 * time.Second) - } else { - return "", err - } - } -} diff --git a/pkg/util/command.go b/pkg/util/command.go new file mode 100644 index 000000000000..5965229e46e2 --- /dev/null +++ b/pkg/util/command.go @@ -0,0 +1,21 @@ +package util + +import ( + "bytes" + "os/exec" +) + +// ExecCommand executes the given command (e.g. the VPN binary) and returns its stdout +// If err != nil, the returned string contains the command's stderr instead +func ExecCommand(command string,
args []string) (string, error) { + var out, errOut bytes.Buffer + + cmd := exec.Command(command, args...) + cmd.Stdout = &out + cmd.Stderr = &errOut + err := cmd.Run() + if err != nil { + return errOut.String(), err + } + return out.String(), nil +} diff --git a/pkg/util/file.go b/pkg/util/file.go index 3ac0f2081333..2d44ee94e893 100644 --- a/pkg/util/file.go +++ b/pkg/util/file.go @@ -4,6 +4,11 @@ package util import ( "os" + "strings" + "time" + + "github.com/pkg/errors" + "github.com/sirupsen/logrus" ) func SetFileModeForPath(name string, mode os.FileMode) error { @@ -13,3 +18,24 @@ func SetFileModeForPath(name string, mode os.FileMode) error { func SetFileModeForFile(file *os.File, mode os.FileMode) error { return file.Chmod(mode) } + +// ReadFile reads from a file +func ReadFile(path string) (string, error) { + if path == "" { + return "", nil + } + + for start := time.Now(); time.Since(start) < 4*time.Minute; { + vpnBytes, err := os.ReadFile(path) + if err == nil { + return strings.TrimSpace(string(vpnBytes)), nil + } else if os.IsNotExist(err) { + logrus.Infof("Waiting for %s to be available\n", path) + time.Sleep(2 * time.Second) + } else { + return "", err + } + } + + return "", errors.New("Timeout while trying to read the file") +} diff --git a/pkg/util/file_windows.go b/pkg/util/file_windows.go index 2dbc4c98ee26..c6f7601393b7 100644 --- a/pkg/util/file_windows.go +++ b/pkg/util/file_windows.go @@ -2,6 +2,11 @@ package util import ( "os" + "strings" + "time" + + "github.com/pkg/errors" + "github.com/sirupsen/logrus" ) func SetFileModeForPath(name string, mode os.FileMode) error { @@ -11,3 +16,24 @@ func SetFileModeForPath(name string, mode os.FileMode) error { func SetFileModeForFile(file *os.File, mode os.FileMode) error { return nil } + +// ReadFile reads from a file +func ReadFile(path string) (string, error) { + if path == "" { + return "", nil + } + + for start := time.Now(); time.Since(start) < 4*time.Minute; { + vpnBytes, err := os.ReadFile(path) 
+ if err == nil { + return strings.TrimSpace(string(vpnBytes)), nil + } else if os.IsNotExist(err) { + logrus.Infof("Waiting for %s to be available\n", path) + time.Sleep(2 * time.Second) + } else { + return "", err + } + } + + return "", errors.New("Timeout while trying to read the file") +} \ No newline at end of file diff --git a/pkg/util/token.go b/pkg/util/token.go new file mode 100644 index 000000000000..a47a4eefd99d --- /dev/null +++ b/pkg/util/token.go @@ -0,0 +1,15 @@ +package util + +import ( + cryptorand "crypto/rand" + "encoding/hex" +) + +func Random(size int) (string, error) { + token := make([]byte, size, size) + _, err := cryptorand.Read(token) + if err != nil { + return "", err + } + return hex.EncodeToString(token), err +} diff --git a/pkg/vpn/vpn.go b/pkg/vpn/vpn.go new file mode 100644 index 000000000000..d3ae35475af7 --- /dev/null +++ b/pkg/vpn/vpn.go @@ -0,0 +1,126 @@ +package vpn + +import ( + "encoding/json" + "errors" + "fmt" + "net" + "strings" + + "github.com/k3s-io/k3s/pkg/util" + + "github.com/sirupsen/logrus" +) + +const ( + tailscaleIf = "tailscale0" +) + +type TailscaleOutput struct { + TailscaleIPs []string `json:"TailscaleIPs"` +} + +// VPNInfo includes node information of the VPN. It is a general struct in case we want to add more vpn integrations +type VPNInfo struct { + IPs []net.IP + NodeID string + ProviderName string + VPNInterface string +} + +// vpnCliAuthInfo includes auth information of the VPN. It is a general struct in case we want to add more vpn integrations +type vpnCliAuthInfo struct { + Name string + JoinKey string +} + +// StartVPN starts the VPN interface. 
General function in case we want to add more vpn integrations +func StartVPN(vpnAuth string) error { + authInfo, err := getVPNAuthInfo(vpnAuth) + if err != nil { + return err + } + + logrus.Infof("Starting VPN: %s", authInfo.Name) + switch authInfo.Name { + case "tailscale": + outpt, err := util.ExecCommand("tailscale", []string{"up", "--authkey", authInfo.JoinKey, "--reset"}) + if err != nil { + return err + } + logrus.Debugf("Output from tailscale up: %v", outpt) + return nil + default: + return fmt.Errorf("Requested VPN: %s is not supported. We currently only support tailscale", authInfo.Name) + } +} + +// GetVPNInfo returns a VPNInfo object with details about the VPN. General function in case we want to add more vpn integrations +func GetVPNInfo(vpnAuth string) (VPNInfo, error) { + authInfo, err := getVPNAuthInfo(vpnAuth) + if err != nil { + return VPNInfo{}, err + } + + if authInfo.Name == "tailscale" { + return getTailscaleInfo() + } + return VPNInfo{}, nil +} + +// getVPNAuthInfo returns the required authInfo object +func getVPNAuthInfo(vpnAuth string) (vpnCliAuthInfo, error) { + var authInfo vpnCliAuthInfo + vpnParameters := strings.Split(vpnAuth, ",") + for _, vpnKeyValues := range vpnParameters { + key, value, _ := strings.Cut(vpnKeyValues, "=") + switch key { + case "name": + authInfo.Name = value + case "joinKey": + authInfo.JoinKey = value + default: + return vpnCliAuthInfo{}, fmt.Errorf("VPN Error. The passed VPN auth info includes an unknown parameter: %v", key) + } + } + + if err := isVPNConfigOK(authInfo); err != nil { + return authInfo, err + } + return authInfo, nil +} + +// isVPNConfigOK checks that the config is complete +func isVPNConfigOK(authInfo vpnCliAuthInfo) error { + if authInfo.Name == "tailscale" { + if authInfo.JoinKey == "" { + return errors.New("VPN Error.
Tailscale requires a JoinKey") + } + return nil + } + + return errors.New("Requested VPN: " + authInfo.Name + " is not supported. We currently only support tailscale") +} + +// getTailscaleInfo returns the VPNInfo (tailscale IPs and interface) of this node +func getTailscaleInfo() (VPNInfo, error) { + output, err := util.ExecCommand("tailscale", []string{"status", "--json"}) + if err != nil { + return VPNInfo{}, fmt.Errorf("failed to run tailscale status --json: %v", err) + } + + logrus.Debugf("Output from tailscale status --json: %v", output) + + var tailscaleOutput TailscaleOutput + var internalIPs []net.IP + err = json.Unmarshal([]byte(output), &tailscaleOutput) + if err != nil { + return VPNInfo{}, fmt.Errorf("failed to unmarshal tailscale output: %v", err) + } + + for _, address := range tailscaleOutput.TailscaleIPs { + internalIPs = append(internalIPs, net.ParseIP(address)) + } + + return VPNInfo{IPs: internalIPs, NodeID: "", ProviderName: "tailscale", VPNInterface: tailscaleIf}, nil +} diff --git a/scripts/build b/scripts/build index 8dab7e48ae0c..9322f280b420 100755 --- a/scripts/build +++ b/scripts/build @@ -86,6 +86,8 @@ mkdir -p bin if [ ${ARCH} = armv7l ] || [ ${ARCH} = arm ]; then export GOARCH="arm" export GOARM="7" + # Context: https://github.com/golang/go/issues/58425#issuecomment-1426415912 + export GOEXPERIMENT=nounified fi if [ ${ARCH} = s390x ]; then diff --git a/tests/e2e/tailscale/README.md b/tests/e2e/tailscale/README.md new file mode 100644 index 000000000000..3cf0b28380ce --- /dev/null +++ b/tests/e2e/tailscale/README.md @@ -0,0 +1,18 @@ +# How to run tailscale (E2E) Tests + +Three tailscale setup steps are required before running the test: + +1 - Log in to tailscale or create an account "https://login.tailscale.com/" + +2 - In the `Access controls` section, add the cluster routes in the autoApprovers section.
For example: + +``` + "autoApprovers": { + "routes": { + "10.42.0.0/16": ["testing@xyz.com"], + "2001:cafe:42:0::/56": ["testing@xyz.com"], + }, + }, +``` + +3 - In `Settings` > `Keys`, generate an auth key which is Reusable and Ephemeral, and set it as the value of the environment variable `E2E_TAILSCALE_KEY`. diff --git a/tests/e2e/tailscale/Vagrantfile b/tests/e2e/tailscale/Vagrantfile new file mode 100644 index 000000000000..bced13d8067e --- /dev/null +++ b/tests/e2e/tailscale/Vagrantfile @@ -0,0 +1,83 @@ +ENV['VAGRANT_NO_PARALLEL'] = 'no' +NODE_ROLES = (ENV['E2E_NODE_ROLES'] || + ["server-0", "agent-0" ]) +NODE_BOXES = (ENV['E2E_NODE_BOXES'] || + ['generic/ubuntu2004', 'generic/ubuntu2004']) +GITHUB_BRANCH = (ENV['E2E_GITHUB_BRANCH'] || "master") +RELEASE_VERSION = (ENV['E2E_RELEASE_VERSION'] || "") +NODE_CPUS = (ENV['E2E_NODE_CPUS'] || 2).to_i +NODE_MEMORY = (ENV['E2E_NODE_MEMORY'] || 2048).to_i +# This key must be created in the tailscale web admin console (see README.md) +TAILSCALE_KEY = (ENV['E2E_TAILSCALE_KEY'] || "") +NETWORK4_PREFIX = "10.10.10" +install_type = "" + +def provision(node, roles, role_num, node_num) + node.vm.box = NODE_BOXES[node_num] + node.vm.hostname = "#{roles[0]}-#{role_num}" + node_ip4 = "#{NETWORK4_PREFIX}.#{100+node_num}" + node.vm.network "private_network", ip: node_ip4, netmask: "255.255.255.0" + + scripts_location = Dir.exist?("./scripts") ? "./scripts" : "../scripts" + vagrant_defaults = File.exist?("./vagrantdefaults.rb") ?
"./vagrantdefaults.rb" : "../vagrantdefaults.rb" + load vagrant_defaults + + defaultOSConfigure(node.vm) + + install_type = getInstallType(node.vm, RELEASE_VERSION, GITHUB_BRANCH) + + node.vm.provision "Ping Check", type: "shell", inline: "ping -c 2 k3s.io" + node.vm.provision "Install tailscale", type: "shell", inline: "curl -fsSL https://tailscale.com/install.sh | sh" + + if roles.include?("server") && role_num == 0 + server_IP = nil + node.vm.provision :k3s, run: 'once' do |k3s| + k3s.config_mode = '0644' # side-step https://github.com/k3s-io/k3s/issues/4321 + k3s.args = "server " + k3s.config = <<~YAML + cluster-init: true + token: vagrant + vpn-auth: "name=tailscale,joinKey=#{TAILSCALE_KEY}" + YAML + k3s.env = ["K3S_KUBECONFIG_MODE=0644", install_type] + end + end + if roles.include?("agent") + node.vm.provision :k3s, run: 'once' do |k3s| + k3s.config_mode = '0644' # side-step https://github.com/k3s-io/k3s/issues/4321 + k3s.args = "agent " + k3s.config = <<~YAML + server: https://TAILSCALEIP:6443 + token: vagrant + vpn-auth: "name=tailscale,joinKey=#{TAILSCALE_KEY}" + YAML + k3s.env = ["K3S_KUBECONFIG_MODE=0644", "INSTALL_K3S_SKIP_START=true", install_type] + end + end +end + +Vagrant.configure("2") do |config| + config.vagrant.plugins = ["vagrant-k3s", "vagrant-reload", "vagrant-libvirt", "vagrant-scp"] + config.vm.provider "libvirt" do |v| + v.cpus = NODE_CPUS + v.memory = NODE_MEMORY + end + + if NODE_ROLES.kind_of?(String) + NODE_ROLES = NODE_ROLES.split(" ", -1) + end + if NODE_BOXES.kind_of?(String) + NODE_BOXES = NODE_BOXES.split(" ", -1) + end + + # Must iterate on the index, vagrant does not understand iterating + # over the node roles themselves + NODE_ROLES.length.times do |i| + name = NODE_ROLES[i] + config.vm.define name do |node| + roles = name.split("-", -1) + role_num = roles.pop.to_i + provision(node, roles, role_num, i) + end + end +end diff --git a/tests/e2e/tailscale/tailscale_test.go b/tests/e2e/tailscale/tailscale_test.go new file mode 
100644 index 000000000000..e61244635d48 --- /dev/null +++ b/tests/e2e/tailscale/tailscale_test.go @@ -0,0 +1,126 @@ +package tailscale + +import ( + "flag" + "fmt" + "os" + "testing" + + "github.com/k3s-io/k3s/tests/e2e" + . "github.com/onsi/ginkgo/v2" + . "github.com/onsi/gomega" +) + +// Valid nodeOS: generic/ubuntu2004, opensuse/Leap-15.3.x86_64 +var nodeOS = flag.String("nodeOS", "generic/ubuntu2004", "VM operating system") +var serverCount = flag.Int("serverCount", 1, "number of server nodes") +var agentCount = flag.Int("agentCount", 1, "number of agent nodes") +var ci = flag.Bool("ci", false, "running on CI") +var local = flag.Bool("local", false, "deploy a locally built K3s binary") + +func Test_E2ETailscale(t *testing.T) { + flag.Parse() + RegisterFailHandler(Fail) + suiteConfig, reporterConfig := GinkgoConfiguration() + RunSpecs(t, "Tailscale Test Suite", suiteConfig, reporterConfig) +} + +var ( + kubeConfigFile string + serverNodeNames []string + agentNodeNames []string +) + +var _ = ReportAfterEach(e2e.GenReport) + +var _ = Describe("Verify Tailscale Configuration", Ordered, func() { + + It("Starts up with no issues", func() { + var err error + if *local { + serverNodeNames, agentNodeNames, err = e2e.CreateLocalCluster(*nodeOS, *serverCount, *agentCount) + } else { + serverNodeNames, agentNodeNames, err = e2e.CreateCluster(*nodeOS, *serverCount, *agentCount) + } + Expect(err).NotTo(HaveOccurred(), e2e.GetVagrantLog(err)) + fmt.Println("CLUSTER CONFIG") + fmt.Println("OS:", *nodeOS) + fmt.Println("Server Nodes:", serverNodeNames) + fmt.Println("Agent Nodes:", agentNodeNames) + kubeConfigFile, err = e2e.GenKubeConfigFile(serverNodeNames[0]) + Expect(err).NotTo(HaveOccurred()) + }) + + // Server node needs to be ready before we continue + It("Checks Node Status", func() { + Eventually(func(g Gomega) { + nodes, err := e2e.ParseNodes(kubeConfigFile, false) + g.Expect(err).NotTo(HaveOccurred()) + for _, node := range nodes { + 
g.Expect(node.Status).Should(Equal("Ready")) + } + }, "620s", "5s").Should(Succeed()) + _, err := e2e.ParseNodes(kubeConfigFile, true) + Expect(err).NotTo(HaveOccurred()) + }) + + It("Change agent's config", func() { + nodeIPs, _ := e2e.GetNodeIPs(kubeConfigFile) + cmd := fmt.Sprintf("sudo sed -i 's/TAILSCALEIP/%s/g' /etc/rancher/k3s/config.yaml", nodeIPs[0].IPv4) + for _, agent := range agentNodeNames { + _, err := e2e.RunCmdOnNode(cmd, agent) + Expect(err).NotTo(HaveOccurred()) + } + }) + + It("Restart agents", func() { + err := e2e.RestartCluster(agentNodeNames) + Expect(err).NotTo(HaveOccurred(), e2e.GetVagrantLog(err)) + }) + + It("Checks Node Status", func() { + Eventually(func(g Gomega) { + nodes, err := e2e.ParseNodes(kubeConfigFile, false) + g.Expect(err).NotTo(HaveOccurred()) + for _, node := range nodes { + g.Expect(node.Status).Should(Equal("Ready")) + } + }, "620s", "5s").Should(Succeed()) + _, err := e2e.ParseNodes(kubeConfigFile, true) + Expect(err).NotTo(HaveOccurred()) + }) + + It("Verifies that server and agent have a tailscale IP as nodeIP", func() { + nodeIPs, err := e2e.GetNodeIPs(kubeConfigFile) + Expect(err).NotTo(HaveOccurred()) + for _, node := range nodeIPs { + Expect(node.IPv4).Should(ContainSubstring("100.")) + } + }) + + It("Verify routing is correct and uses tailscale0 interface for internode traffic", func() { + // table 52 is the one configured by tailscale + cmd := "ip route show table 52" + for _, node := range append(serverNodeNames, agentNodeNames...) { + output, err := e2e.RunCmdOnNode(cmd, node) + fmt.Println(err) + Expect(err).NotTo(HaveOccurred()) + Expect(output).Should(ContainSubstring("10.42.")) + } + }) + +}) + +var failed bool +var _ = AfterEach(func() { + failed = failed || CurrentSpecReport().Failed() +}) + +var _ = AfterSuite(func() { + if failed && !*ci { + fmt.Println("FAILED!") + } else { + Expect(e2e.DestroyCluster()).To(Succeed()) + Expect(os.Remove(kubeConfigFile)).To(Succeed()) + } +})
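The "Verifies that server and agent have a tailscale IP as nodeIP" spec matches `node.IPv4` with `ContainSubstring("100.")`, which would also accept a non-Tailscale address such as `10.100.0.1`. A tighter check is sketched below, assuming only that Tailscale hands out node IPv4 addresses from the 100.64.0.0/10 shared address space (RFC 6598). The helper name `isTailscaleIPv4` is hypothetical and is not part of the `e2e` package.

```go
package main

import (
	"fmt"
	"net/netip"
)

// tailscaleCGNAT is the 100.64.0.0/10 shared address space from which
// Tailscale assigns node IPv4 addresses (assumption stated in the lead-in).
var tailscaleCGNAT = netip.MustParsePrefix("100.64.0.0/10")

// isTailscaleIPv4 is a hypothetical helper: it reports whether s parses as an
// IPv4 address that falls inside 100.64.0.0/10.
func isTailscaleIPv4(s string) bool {
	addr, err := netip.ParseAddr(s)
	if err != nil {
		return false // not an IP address at all
	}
	return addr.Is4() && tailscaleCGNAT.Contains(addr)
}

func main() {
	// "10.100.0.1" contains the substring "100." but is outside the range;
	// "100.128.0.1" starts with "100." but is also outside 100.64.0.0/10.
	for _, ip := range []string{"100.101.102.103", "10.100.0.1", "100.128.0.1", "not-an-ip"} {
		fmt.Printf("%s %v\n", ip, isTailscaleIPv4(ip))
	}
}
```

Inside the Ginkgo spec the assertion would become `Expect(isTailscaleIPv4(node.IPv4)).To(BeTrue())`. A prefix check is used instead of string matching so that both false positives shown in `main` are rejected.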
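The Vagrantfile writes `vpn-auth: "name=tailscale,joinKey=#{TAILSCALE_KEY}"` into each node's config and falls back to an empty string when `E2E_TAILSCALE_KEY` is unset, so a missing key only surfaces later as a failed join. The sketch below validates that comma-separated `key=value` string up front. The `vpnAuth` type and `parseVPNAuth` function are hypothetical illustrations of the format used in this test; they are not k3s's own parser, and rejecting unknown keys is a deliberate choice of this sketch.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// vpnAuth holds the two fields the e2e config sets in vpn-auth.
// Hypothetical type for illustration only.
type vpnAuth struct {
	Name    string
	JoinKey string
}

// parseVPNAuth (hypothetical) splits "name=...,joinKey=..." into fields and
// requires both to be non-empty. Unknown keys are rejected by choice.
func parseVPNAuth(s string) (vpnAuth, error) {
	var a vpnAuth
	for _, field := range strings.Split(s, ",") {
		key, value, ok := strings.Cut(field, "=")
		if !ok {
			return a, fmt.Errorf("malformed field %q", field)
		}
		switch key {
		case "name":
			a.Name = value
		case "joinKey":
			a.JoinKey = value
		default:
			return a, fmt.Errorf("unknown key %q", key)
		}
	}
	if a.Name == "" || a.JoinKey == "" {
		return a, fmt.Errorf("name and joinKey are both required")
	}
	return a, nil
}

func main() {
	// Build the same string the Vagrantfile templates into config.yaml.
	auth, err := parseVPNAuth("name=tailscale,joinKey=" + os.Getenv("E2E_TAILSCALE_KEY"))
	if err != nil {
		fmt.Println("vpn-auth invalid:", err)
		return
	}
	fmt.Println("vpn provider:", auth.Name)
}
```

Run before `vagrant up` (or from a `BeforeSuite`), this fails fast when the environment variable is missing instead of waiting for the 620s node-readiness timeout.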