This repository has been archived by the owner on Dec 7, 2023. It is now read-only.

Enable multiple non-IP interfaces to be connected via tc redirect #836

Merged
merged 28 commits into from
Jun 25, 2021

Conversation

networkop
Contributor

@networkop commented May 10, 2021

Hi @darkowlzz @stealthybox 👋
This is the PR that covers #832 and #831. At a high level it has the following impact:

1. Introduced a new CLI argument --sandbox-env-vars accepting a comma-separated list of key=value pairs.

These values are passed to the respective container runtimes and used as env variables. I had a choice: either create a new API version and add env vars as a new spec field, or pass them around in the VM's annotations. I've opted for the second option to minimise the impact of this change. I'm not sure if it's a good idea; happy to change it if necessary.

2. Introduced a new bool argument called wait to the StartVM function - if set to false, this bypasses the waitForSpawn check.

This flag defaults to true for all existing function invocations to preserve backwards compatibility. However, when used via API, users can set this to false and skip the check for ignite-spawn. The purpose is to get the container PID to configure additional interfaces before ignite-spawn is fully initialised.

3. Ignite-spawn can wait for a number of interfaces to be connected before firing up the VM.

This is controlled through an environment variable called IGNITE_INTFS. To preserve backwards compatibility it defaults to 1, so without any variables set, the behaviour is the same as now. However, if this value is set to 1 or higher, SetupContainerNetworking will wait for that number of interfaces to be connected (up to a maximum timeout).

4. Ignite-spawn will connect additional veth and tap interfaces via tc redirect.

For backwards compatibility, the default behaviour is to always use the current way of interconnecting interfaces (via bridge). However, if there's no IP on the interface, it will be interconnected with the VM via tc redirect.
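For context, "connected via tc redirect" means cross-wiring the ingress traffic of the container-side interface and the VM's tap device with a mirred action. A rough sketch of the equivalent configuration commands (device names eth1/vm1 are placeholders, requires root; the exact commands ignite-spawn issues may differ):

```shell
# Mirror everything arriving on the container-side interface to the tap
# device, and vice versa.
tc qdisc add dev eth1 ingress
tc filter add dev eth1 parent ffff: protocol all u32 match u8 0 0 \
    action mirred egress redirect dev vm1

tc qdisc add dev vm1 ingress
tc filter add dev vm1 parent ffff: protocol all u32 match u8 0 0 \
    action mirred egress redirect dev eth1
```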


In general, all these changes strive to preserve the happy-path behavior of pre-existing code, so no major changes are expected for existing users.

@networkop marked this pull request as draft May 10, 2021 21:38
@networkop marked this pull request as ready for review May 11, 2021 13:58
@networkop
Contributor Author

It seems like a number of e2e tests are failing; however, they are also failing on the main branch. I'm not sure if anyone is aware of that?

=== RUN   TestRunGitops
    gitops_test.go:134: assertion failed: error is not nil: exit status 1: cmd:
        "/home/null/ignite/bin/ignite rm -f my-vm"
        time="2021-05-11T15:14:54+01:00" level=fatal msg="unknown runtime \"\""

    gitops_test.go:134: failed to run "/home/null/ignite/bin/ignite rm -f my-vm": exit status 1

And then a number of lifecycle tests all fail when VMs are re-started, e.g.:

=== RUN   TestVMLifecycleWithDockerAndDockerBridge
    vm_lifecycle_test.go:52: assertion failed: error is not nil: exit status 1: cmd:
        "/home/null/ignite/bin/ignite start e2e-test-vm-lifecycle-docker-and-docker-bridge --runtime=docker --network-plugin=docker-bridge"
        time="2021-05-11T15:17:37+01:00" level=info msg="Networking is handled by \"docker-bridge\""
        time="2021-05-11T15:17:37+01:00" level=info msg="Started Firecracker VM \"999d41998dd1afa4\" in a container with ID \"7bce9d72f1fbbd8eb9b857dda35daf7eefec1d2faf7bff46c9b5d477341fdbf5\""
        time="2021-05-11T15:17:47+01:00" level=fatal msg="timeout waiting for ignite-spawn startup"

@networkop marked this pull request as draft May 13, 2021 16:03
Contributor

@darkowlzz left a comment

Thanks for working on this.
Left an initial review with some suggestions.

func SetupContainerNetworking() ([]DHCPInterface, error) {
var (
maxIntfsVar = "IGNITE_INTFS"
defaultMaxIntfs = 1

A comment above describing these may be helpful. Intfs may be confusing for new readers without the context. It could be thought of as relating to some filesystem 😄

interval := 1 * time.Second
timeout := 1 * time.Minute
maxIntfs, err := strconv.Atoi(args[maxIntfsVar])
if err != nil || maxIntfs < 1 {

I wonder if it'll be helpful to log this error. Silently defaulting may be surprising and hard to debug.

return allIfaces, dhcpIfaces, nil
}

func isIgnored(link net.Interface, allIfaces *[]FCInterface) bool {

allIfaces isn't being used below and can be removed?


interfacesCount++
continue
}

Wouldn't it be better to perform the error check below first and then check for noIPs and setup tc redirect in this block?

@@ -86,7 +86,7 @@ func StopVM(vm *api.VM, kill, silent bool) error {
}

// Remove VM networking
if err = removeNetworking(vm.PrefixedID(), vm.Spec.Network.Ports...); err != nil {
if err = removeNetworking(vm.Status.Runtime.ID, vm.Spec.Network.Ports...); err != nil {

Refer: #827 (comment) . Just a reference to the discussion around this change.

@@ -96,3 +100,18 @@ func patchStopped(vm *api.VM) error {
patch := []byte(`{"status":{"running":false,"network":null,"runtime":null,"startTime":null}}`)
return patchutil.NewPatcher(scheme.Serializer).ApplyOnFile(constants.IGNITE_SPAWN_VM_FILE_PATH, patch, vm.GroupVersionKind())
}

func parseEnvVars(vars []string) map[string]string {

Since this is used just once, above, maybe os.Environ() should be called in this function itself for now. Keeping true to the meaning of the function name.


IMO using map of config key=values without mutating the ignite-spawn Environment is better, but this is conflating a lot of config ideas. (flags producing ENV vs. flags overriding ENV)

In general though, is this the proper strategy for config?

Since we already have the information in the VM object, ignite-spawn has access to it and can read the value from it directly.

That will get rid of all of the flag-passing and parsing code within ignite & ignite-spawn.

nit:
The names of the variables/annotations in this code should not reference any idea of ENV.
We should call it what it is with a proper domain.
There should be an explicit key for each thing you want.

metadata:
  annotations:
    ignite.weave.works/interface-count: '3'
    # ignite.weave.works/interface-mode/eth0: 'dhcp-bridge'
    # ignite.weave.works/interface-mode/eth2: 'tc-redirect'
    # ignite.weave.works/interface-mode/macvtap-uvnrt: 'macvtap'

@@ -95,6 +97,9 @@ func (cf *CreateFlags) NewCreateOptions(args []string, fs *flag.FlagSet) (*Creat
return nil, err
}

// Save env variables
baseVM.SetAnnotation(constants.IGNITE_ENV_VAR_ANNOTATION, cf.EnvVars)

I think we need to provide a way to set this cf.EnvVars via the ignite configuration (global) as well, like other similar VM configurations in

type ConfigurationSpec struct {
Runtime igniteRuntime.Name `json:"runtime,omitempty"`
NetworkPlugin igniteNetwork.PluginName `json:"networkPlugin,omitempty"`
VMDefaults VMSpec `json:"vmDefaults,omitempty"`
IDPrefix string `json:"idPrefix,omitempty"`
}
to allow setting a default for all the VMs and allow overriding that through the cf flag and individual VM configuration.
Since this is set as an annotation, the baseVM should have the annotation inherited from the ignite configuration. Annotations in the VM configuration should override/append the baseVM configuration. The VM create flag value should append/override everything before.

@@ -78,7 +78,7 @@ func Start(so *StartOptions, fs *flag.FlagSet) error {
return err
}

if err := operations.StartVM(so.vm, so.Debug); err != nil {
if err := operations.StartVM(so.vm, so.Debug, true); err != nil {

Maybe before setting true here, the VM configuration/annotation needs to be checked?

defaultMaxIntfs = 1
)

type FCInterface struct {

A description about this interface would be nice to have in the godoc.

@darkowlzz
Contributor

It seems like a number of e2e tests are failing; however, they are also failing on the main branch. I'm not sure if anyone is aware of that?

We are aware of some flaky tests. But these look like new failures; I haven't seen them fail before.

@networkop
Contributor Author

thanks @darkowlzz, I'm still working on this, so I've converted this to draft again. I'm also on PTO next week, so let's put this on hold until Monday week. I'll try to push out some updates by the 24th.

Comment on lines 84 to 93
// Only set envVars if at least one was configured
var envVars []string
parts := strings.Split(vm.GetAnnotation((constants.IGNITE_ENV_VAR_ANNOTATION)), ",")
for _, part := range parts {
if part == "" {
continue
}
envVars = append(envVars, part)
}
config.EnvVars = envVars

It's okay to remove this section since the VM Config file is already a sufficient mechanism to pass the config to ignite-spawn.

MACFilter string
}

func SetupContainerNetworking(args map[string]string) ([]FCInterface, []DHCPInterface, error) {

Here, instead of a map[string]string containing custom-parsed env, we can just parse annotations.
Alternatively, you could pass the entire VM object.
Ultimately, we will have a specific VM.Spec.NetworkInfo sub-type that could be passed here should the annotations graduate into a future API version.

@stealthybox
Contributor

@networkop, thanks so much for working on this change. I'm really excited about it and you've inspired a really good use of annotations for low-cost experiments across ignite.

@networkop
Contributor Author

brilliant, thanks for your comments @darkowlzz and @stealthybox. I think all of your suggestions make perfect sense and I'll make the necessary changes later this week. Hope to see y'all next Monday.

@networkop
Contributor Author

@darkowlzz @stealthybox PR is ready for review.

@networkop requested a review from stealthybox June 13, 2021 07:37
case supportedModes.tc:
result[k] = v
default:
continue

Should we print a warning for unsupported mode here?
Ideally, it will show up in the ignite spawn log.

@stealthybox
Contributor

commit 219c8e89f2b80373cfbbf53de6c21101f24cb672
Author: leigh capili <leigh@null.net>
Date:   Mon Jun 21 10:34:48 2021 -0600

    Use set for interface modes

diff --git pkg/container/network.go pkg/container/network.go
index 2c28e99c..af00e47f 100644
--- pkg/container/network.go
+++ pkg/container/network.go
@@ -33,18 +33,20 @@ ip link set eth0 master br0
 ip link set vm0 master br0
 */
 
-// Array of container interfaces to ignore (not forward to vm)
+// ignoreInterfaces is a Set of container interface names to ignore (not forward to vm)
 var ignoreInterfaces = map[string]struct{}{
 	"lo":    {},
 	"sit0":  {},
 	"tunl0": {},
 }
 
-var supportedModes = struct {
-	dhcp, tc string
-}{
-	dhcp: "dhcp-bridge",
-	tc:   "tc-redirect",
+const MODE_DHCP = "dhcp-bridge"
+const MODE_TC = "tc-redirect"
+
+// supportedModes is a Set of Mode strings
+var supportedModes = map[string]struct{}{
+	MODE_DHCP: {},
+	MODE_TC:   {},
 }
 
 var mainInterface = "eth0"
@@ -57,7 +59,7 @@ func SetupContainerNetworking(vm *api.VM) (firecracker.NetworkInterfaces, []DHCP
 
 	// Setting up mainInterface if not defined
 	if _, ok := vmIntfs[mainInterface]; !ok {
-		vmIntfs[mainInterface] = supportedModes.dhcp
+		vmIntfs[mainInterface] = MODE_DHCP
 	}
 
 	interval := 1 * time.Second
@@ -116,7 +118,7 @@ func collectInterfaces(vmIntfs map[string]string) (bool, error) {
 		// default fallback behaviour to always consider intfs with an address
 		addrs, _ := intf.Addrs()
 		if len(addrs) > 0 {
-			vmIntfs[intf.Name] = supportedModes.dhcp
+			vmIntfs[intf.Name] = MODE_DHCP
 		}
 	}
 
@@ -127,7 +129,7 @@ func collectInterfaces(vmIntfs map[string]string) (bool, error) {
 		}
 
 		// for DHCP interface, we need to make sure IP and route exist
-		if mode == supportedModes.dhcp {
+		if mode == MODE_DHCP {
 			intf := foundIntfs[intfName]
 			_, _, _, noIPs, err := getAddress(&intf)
 			if err != nil {
@@ -163,7 +165,7 @@ func networkSetup(fcIntfs *firecracker.NetworkInterfaces, dhcpIntfs *[]DHCPInter
 		}
 
 		switch vmIntfs[intfName] {
-		case supportedModes.dhcp:
+		case MODE_DHCP:
 			ipNet, gw, err := takeAddress(intf)
 			if err != nil {
 				return fmt.Errorf("error parsing interface %q: %s", intfName, err)
@@ -185,7 +187,7 @@ func networkSetup(fcIntfs *firecracker.NetworkInterfaces, dhcpIntfs *[]DHCPInter
 					HostDevName: dhcpIface.VMTAP,
 				},
 			})
-		case supportedModes.tc:
+		case MODE_TC:
 			tcInterface, err := addTcRedirect(intf)
 			if err != nil {
 				log.Errorf("Failed to setup tc redirect %v", err)
@@ -494,22 +496,20 @@ func maskString(mask net.IPMask) string {
 func parseExtraIntfs(vm *api.VM) map[string]string {
 	result := make(map[string]string)
 
-	for k, v := range vm.GetObjectMeta().Annotations {
-		if !strings.HasPrefix(k, constants.IGNITE_INTERFACE_ANNOTATION) {
+	for intf, mode := range vm.GetObjectMeta().Annotations {
+		if !strings.HasPrefix(intf, constants.IGNITE_INTERFACE_ANNOTATION) {
 			continue
 		}
 
-		k = strings.TrimPrefix(k, constants.IGNITE_INTERFACE_ANNOTATION)
-		if k == "" {
+		intf = strings.TrimPrefix(intf, constants.IGNITE_INTERFACE_ANNOTATION)
+		if intf == "" {
 			continue
 		}
 
-		switch v {
-		case supportedModes.dhcp:
-			result[k] = v
-		case supportedModes.tc:
-			result[k] = v
-		default:
+		if _, ok := supportedModes[mode]; ok {
+			result[intf] = mode
+		} else {
+			log.Warnf("VM specifies unrecognized mode %q for interface %q", mode, intf)
 			continue
 		}
 

// This func returns true if it's done, and optionally an error
retry, err := networkSetup(&dhcpIfaces)
retry, err := collectInterfaces(vmIntfs)

noting here, this will always have eth0 included as the main interface

}
}

return fmt.Errorf("timeout waiting for ignite-spawn startup")

Please move https://github.com/weaveworks/ignite/pull/836/files#diff-fe685b5ad77268d55269c38f483185a644e6b741eddc7b86511bbe3d51fe0ad7L133-L136 here and call providers.Client.VMs().Set(vm) one more time.

	// Set the start time for the VM
	startTime := apiruntime.Timestamp()
	vm.Status.StartTime = &startTime
	vm.Status.Network.Plugin = providers.NetworkPluginName

Goal is for waitForSpawn to update the VM object with the proper VM start time.

@stealthybox
Contributor

diff --git pkg/operations/start.go pkg/operations/start.go
index b553fc3c..06d868a1 100644
--- pkg/operations/start.go
+++ pkg/operations/start.go
@@ -153,30 +153,28 @@ func StartVMNonBlocking(vm *api.VM, debug bool) (*VMChannels, error) {
 		log.Infof("Started Firecracker VM %q in a container with ID %q", vm.GetUID(), containerID)
 	}
 
-	// TODO: This is temporary until we have proper communication to the container
-	go waitForSpawn(vm, vmChans)
-
 	// Set the container ID for the VM
 	vm.Status.Runtime.ID = containerID
 	vm.Status.Runtime.Name = providers.RuntimeName
 
-	// Set the start time for the VM
-	startTime := apiruntime.Timestamp()
-	vm.Status.StartTime = &startTime
-	vm.Status.Network.Plugin = providers.NetworkPluginName
-
 	// Append non-loopback runtime IP addresses of the VM to its state
 	for _, addr := range result.Addresses {
 		if !addr.IP.IsLoopback() {
 			vm.Status.Network.IPAddresses = append(vm.Status.Network.IPAddresses, addr.IP)
 		}
 	}
+	vm.Status.Network.Plugin = providers.NetworkPluginName
+
+	// write the API object in a non-running state before we wait for spawn's network logic and firecracker
+	if err := providers.Client.VMs().Set(vm); err != nil {
+		return vmChans, err
+	}
 
-	// Set the VM's status to running
-	vm.Status.Running = true
+	// TODO: This is temporary until we have proper communication to the container
+	// It's best to perform any imperative changes to the VM object pointer before this go-routine starts
+	go waitForSpawn(vm, vmChans)
 
-	// Write the state changes
-	return vmChans, providers.Client.VMs().Set(vm)
+	return vmChans, nil
 }
 
 // verifyPulled pulls the ignite-spawn image if it's not present
@@ -202,12 +200,26 @@ func waitForSpawn(vm *api.VM, vmChans *VMChannels) {
 	const timeout = 10 * time.Second
 	const checkInterval = 100 * time.Millisecond
 
-	startTime := time.Now()
-	for time.Since(startTime) < timeout {
+	timer := time.Now()
+	for time.Since(timer) < timeout {
 		time.Sleep(checkInterval)
 
 		if util.FileExists(path.Join(vm.ObjectPath(), constants.PROMETHEUS_SOCKET)) {
-			vmChans.SpawnFinished <- nil
+			// Before we write the VM, we should REALLY REALLY re-fetch the API object from storage
+			vm, err := providers.Client.VMs().Get(vm.GetUID())
+			if err != nil {
+				vmChans.SpawnFinished <- err
+			}
+
+			// Set the start time for the VM
+			startTime := apiruntime.Timestamp()
+			vm.Status.StartTime = &startTime
+
+			// Set the VM's status to running
+			vm.Status.Running = true
+
+			// Write the state changes, send any errors through the channel
+			vmChans.SpawnFinished <- providers.Client.VMs().Set(vm)
 			return
 		}
 	}


idOut, idErr := idCmd.Cmd.CombinedOutput()
assert.Check(t, idErr, fmt.Sprintf("vm id not found: \n%q\n%s", idCmd.Cmd, idOut))
vmID := string(idOut[:len(idOut)-2])

@darkowlzz suggests doing something similar to TestRunWithoutName but for the VM ID instead of slicing the stdout:

		With("--template={{.ObjectMeta.Name}}")

@networkop
Contributor Author

networkop commented Jun 21, 2021

@darkowlzz @stealthybox I'm running into an issue after our latest waitForSpawn refactoring. Since we moved the vm.Status.Running = true into a goroutine, I can no longer ignore the vmChannels, i.e. I can't do:

_, err = operations.StartVMNonBlocking(so.vm, so.Debug)

because the VM status never changes to running, and from the POV of ignite it remains in the following state:

VM ID                   IMAGE                           KERNEL                                  SIZE    CPUS    MEMORY          CREATED      STATUS          IPS             PORTS   NAME
a3710f7f8f76395e        weaveworks/ignite-ubuntu:latest weaveworks/ignite-kernel:5.4.108        3.0 GB  1       800.0 MB        32s ago      *Up <nil>       192.168.223.2           e2e-test-vm-multinet
WARN[0000] The symbol * on the VM status indicates that the VM manifest on disk may not be up-to-date with the actual VM status from the container runtime

I'm not sure what the best way forward is now. On the one hand, we can't move Running=true before waitForSpawn because the VM is not started; on the other hand, I need to return from StartVMNonBlocking in order to connect additional interfaces.

In fact, thinking about it, waitForSpawn may never complete when invoked from the CLI and the vmChannels are ignored, since when main() completes, all of the child goroutines are cleaned up, so the final timestamp/state changes may never get done.

What I've done for now is move the vm.Status.Running = true back into the StartVMNonBlocking function. This means the VM will transition to the running state earlier, but the timestamp will still be correct. It feels a bit wrong. Almost as if we need some new state, not just running true/false but something in the middle to indicate that it has been started but may not be running yet.

WDYT?

@networkop
Contributor Author

@darkowlzz @stealthybox can you have a look at the latest commit? I've re-worked one of the e2e tests to use ignite API instead of CLI. This would allow us to drop the --wait CLI flag (because of the above reasons) and wait for the vmChannels to return before doing exec.
If you think it looks alright, I'll update the remaining two tests in a similar way.

@stealthybox
Contributor

@networkop I think this is a good way forward for the tests 👍

Perhaps we could copy the waitForSpawn code into containerlab?
I think the better structural change is to support the channels in containerlab.

@networkop
Contributor Author

cool, I think this is it then, we've got all the changes from yesterday merged. The two major deviations are:

  • dropped the --wait flag in the CLI
  • refactored the e2e tests to use the API

what do y'all think?

@networkop requested a review from stealthybox June 23, 2021 18:59
Comment on lines +39 to +40
_ = igniteDocker.SetDockerRuntime()
_ = igniteDocker.SetDockerNetwork()

It's a natural consequence of globals being side-effected here that these tests cannot be parallelized with other test runs that need different values.
This doesn't affect the existing test suites which make individual invocations of the ignite binary.
Also, we already run our e2e suite in serial anyway.

All of this is fine, but it's worth noting. 👍

Comment on lines +120 to +123
// check that the VM has started before trying exec
if err := <-vmChans.SpawnFinished; err != nil {
t.Fatalf("failed to start a VM: \n%q\n", err)
}

I like that these tests exercise the wait code that polls for the Firecracker Prometheus socket.

Contributor

@stealthybox left a comment

LGTM

@darkowlzz should probably review as well considering the new e2e technique

@networkop do you think you can make containerlab work with the channels?

@networkop
Contributor Author

@stealthybox maybe, I had an idea of adding an extra func to clab's runtime interface, something like waitForWarmup that will make sure all of the individual containers have started. I need to see how it will work with the rest of the runtimes.

Contributor

@darkowlzz left a comment

Overall, looks good to me.
Left a comment about the wait in the exec command, which may not be concerning.

@@ -30,5 +31,8 @@ func (ef *ExecFlags) NewExecOptions(vmMatch string, command ...string) (eo *Exec

// Exec executes command in a VM based on the provided ExecOptions.
func Exec(eo *ExecOptions) error {
if err := waitForSSH(eo.vm, constants.SSH_DEFAULT_TIMEOUT_SECONDS, 5); err != nil {

This may not be a concern, but I noticed that runSSH() below accepts a timeout parameter. If this waitForSSH() with a default timeout of 5 seconds runs before runSSH(), the user-provided timeout will not work: waitForSSH() will return an error after waiting for 5 seconds, and runSSH() with the user-provided timeout will never be executed.
Should the timeout for waitForSSH() be the same as provided in ExecOptions.Timeout?


oh yeah, that's a good catch. I added it there to work around the behavior introduced by the --wait flag and forgot to remove it. Will fix this up.


I think my above conclusion was wrong. Complete removal of waitForSSH leads to multinet test failures when the Prometheus socket is up (the VM is started) but SSH is still not up yet. So we need it there, but I've changed the 5-second timeout to the user-provided timeout value.

@stealthybox merged commit 2f840ad into weaveworks:main Jun 25, 2021
@stealthybox
Contributor

Nice
Shall I purchase some ice-cream cake to celebrate?

@networkop
Contributor Author

brilliant! thanks @darkowlzz and @stealthybox for all the support 🙏

@darkowlzz added the area/networking label Jun 28, 2021