QEMU: Unblock bootpd if start fails due to blocking #16789

Merged: 3 commits, Jul 11, 2023

spowelljr
Member

Closes #16776

Even if bootpd is unblocked in the built-in macOS firewall, it can still end up being blocked. To resolve this, you have to run the unblock command again.

This PR inspects the output of a failed start on QEMU. If the failure is caused by bootpd being blocked, it runs the unblock command and retries the start.

Before:

$ minikube start --driver qemu
😄  minikube v1.30.1 on Darwin 13.4.1 (arm64)
✨  Using the qemu2 driver based on user configuration
🌐  Automatically selected the socket_vmnet network
👍  Starting control plane node minikube in cluster minikube
🔥  Creating qemu2 VM (CPUs=2, Memory=4000MB, Disk=20000MB) ...

❌  Exiting due to IF_BOOTPD_FIREWALL: ip not found: failed to get IP address: could not find an IP address for fe:70:96:84:47:94
💡  Suggestion: 

    Your firewall is likely blocking bootpd, to unblock it run:
    sudo /usr/libexec/ApplicationFirewall/socketfilterfw --add /usr/libexec/bootpd
    sudo /usr/libexec/ApplicationFirewall/socketfilterfw --unblock /usr/libexec/bootpd

╭───────────────────────────────────────────────────────────────────────────────────────────╮
│                                                                                           │
│    😿  If the above advice does not help, please let us know:                             │
│    👉  https://github.com/kubernetes/minikube/issues/new/choose                           │
│                                                                                           │
│    Please run `minikube logs --file=logs.txt` and attach logs.txt to the GitHub issue.    │
│                                                                                           │
╰───────────────────────────────────────────────────────────────────────────────────────────╯

After:

$ minikube start --driver qemu
😄  minikube v1.30.1 on Darwin 13.4.1 (arm64)
✨  Using the qemu2 driver based on user configuration
🌐  Automatically selected the socket_vmnet network
👍  Starting control plane node minikube in cluster minikube
🔥  Creating qemu2 VM (CPUs=2, Memory=4000MB, Disk=20000MB) ...
🔑  Your firewall is blocking bootpd which is required for socket_vmnet. The following commands will be executed to unblock bootpd:

    $ sudo /usr/libexec/ApplicationFirewall/socketfilterfw --add /usr/libexec/bootpd 
    $ sudo /usr/libexec/ApplicationFirewall/socketfilterfw --unblock /usr/libexec/bootpd 


Password: 
🔄  Sucessfully unblocked bootpd process from firewall, retrying
🔥  Deleting "minikube" in qemu2 ...
🤦  StartHost failed, but will try again: creating host: create: creating: ip not found: failed to get IP address: could not find an IP address for 52:cd:17:52:c9:4d
🔥  Creating qemu2 VM (CPUs=2, Memory=4000MB, Disk=20000MB) ...
🐳  Preparing Kubernetes v1.27.3 on Docker 24.0.2 ...
    ▪ Generating certificates and keys ...
    ▪ Booting up control plane ...
    ▪ Configuring RBAC rules ...
🔗  Configuring bridge CNI (Container Networking Interface) ...
    ▪ Using image gcr.io/k8s-minikube/storage-provisioner:v5
🔎  Verifying Kubernetes components...
🌟  Enabled addons: default-storageclass, storage-provisioner
🏄  Done! kubectl is now configured to use "minikube" cluster and "default" namespace by default

k8s-ci-robot added the "cncf-cla: yes" label (Indicates the PR's author has signed the CNCF CLA.) on Jun 28, 2023
k8s-ci-robot added the "approved" (Indicates a PR has been approved by an approver from all required OWNERS files.) and "size/L" (Denotes a PR that changes 100-499 lines, ignoring generated files.) labels on Jun 28, 2023

log.Debugf("IP: %s", d.IPAddress)
if unblockErr := firewall.UnblockBootpd(); unblockErr != nil {
	klog.Errorf("failed unblocking bootpd from firewall: %v", unblockErr)
	exit.Error(reason.IfBootpdFirewall, "ip not found", err)
Member

I am curious whether we could return an error and let the cmd package exit. I generally frown upon calling exit in non-cmd code, but if there is a good reason, or it makes the code much cleaner, I am okay with exit here.

What do you think?

Member Author

spowelljr commented Jun 30, 2023

I used exit.Error because minikube retries if the first start fails. When we know bootpd is being blocked and we failed to unblock it, the retry is almost certain to fail as well, so I exit instead. I could remove it; the downside is that it would go through the start loop one more time, which takes 30+ seconds.

log.Debugf("IP: %s", d.IPAddress)
if unblockErr := firewall.UnblockBootpd(); unblockErr != nil {
	klog.Errorf("failed unblocking bootpd from firewall: %v", unblockErr)
	exit.Error(reason.IfBootpdFirewall, "ip not found", err)
Member

Add a comment explaining that if we return an error here, it will retry, which makes no sense.

Member

We might be able to use the retriable-errors pattern.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: medyagh, spowelljr

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@medyagh
Member

medyagh commented Jul 11, 2023

/ok-to-test

k8s-ci-robot added the "ok-to-test" label (Indicates a non-member PR verified by an org member that is safe to test.) on Jul 11, 2023
@minikube-pr-bot

kvm2 driver with docker runtime

+----------------+----------+---------------------+
|    COMMAND     | MINIKUBE | MINIKUBE (PR 16789) |
+----------------+----------+---------------------+
| minikube start | 53.0s    | 52.4s               |
| enable ingress | 28.3s    | 28.0s               |
+----------------+----------+---------------------+

Times for minikube start: 50.6s 53.0s 52.9s 57.0s 51.7s
Times for minikube (PR 16789) start: 52.7s 54.0s 51.5s 52.2s 51.8s

Times for minikube ingress: 28.3s 29.3s 27.8s 27.7s 28.3s
Times for minikube (PR 16789) ingress: 27.8s 27.8s 28.3s 27.3s 28.7s

docker driver with docker runtime

+----------------+----------+---------------------+
|    COMMAND     | MINIKUBE | MINIKUBE (PR 16789) |
+----------------+----------+---------------------+
| minikube start | 24.6s    | 24.0s               |
| enable ingress | 45.7s    | 48.8s               |
+----------------+----------+---------------------+

Times for minikube ingress: 48.9s 33.5s 48.9s 48.4s 48.9s
Times for minikube (PR 16789) ingress: 49.4s 48.4s 48.9s 48.4s 48.9s

Times for minikube start: 25.8s 25.5s 22.9s 25.6s 23.0s
Times for minikube (PR 16789) start: 22.7s 25.4s 23.1s 23.3s 25.7s

docker driver with containerd runtime

+-------------------+----------+---------------------+
|      COMMAND      | MINIKUBE | MINIKUBE (PR 16789) |
+-------------------+----------+---------------------+
| minikube start    | 23.7s    | 22.8s               |
| ⚠️  enable ingress | 29.0s    | 34.6s ⚠️             |
+-------------------+----------+---------------------+

Times for minikube start: 24.1s 24.7s 23.8s 23.1s 22.9s
Times for minikube (PR 16789) start: 22.8s 24.2s 23.2s 23.2s 20.3s

Times for minikube ingress: 31.4s 31.4s 30.5s 31.4s 20.4s
Times for minikube (PR 16789) ingress: 47.4s 31.4s 31.4s 31.4s 31.4s

@minikube-pr-bot

These are the flake rates of all failed tests.

| Environment | Failed Tests | Flake Rate (%) |
|---|---|---|
| KVM_Linux_containerd | TestRunningBinaryUpgrade (gopogh) | 0.59 (chart) |
| Hyperkit_macOS | TestStartStop/group/embed-certs/serial/AddonExistsAfterStop (gopogh) | 1.22 (chart) |
| Hyperkit_macOS | TestStartStop/group/embed-certs/serial/Pause (gopogh) | 1.22 (chart) |
| Hyperkit_macOS | TestStartStop/group/embed-certs/serial/SecondStart (gopogh) | 1.22 (chart) |
| Hyperkit_macOS | TestStartStop/group/embed-certs/serial/UserAppExistsAfterStop (gopogh) | 1.22 (chart) |
| Hyperkit_macOS | TestStartStop/group/embed-certs/serial/VerifyKubernetesImages (gopogh) | 1.22 (chart) |
| QEMU_macOS | TestFunctional/parallel/CertSync (gopogh) | 1.29 (chart) |
| QEMU_macOS | TestFunctional/parallel/DashboardCmd (gopogh) | 1.29 (chart) |
| QEMU_macOS | TestFunctional/parallel/DockerEnv/bash (gopogh) | 1.29 (chart) |
| QEMU_macOS | TestFunctional/parallel/FileSync (gopogh) | 1.29 (chart) |
| QEMU_macOS | TestFunctional/parallel/ImageCommands/ImageBuild (gopogh) | 1.29 (chart) |
| QEMU_macOS | TestFunctional/parallel/ImageCommands/ImageLoadDaemon (gopogh) | 1.29 (chart) |
| QEMU_macOS | TestFunctional/parallel/ImageCommands/ImageLoadFromFile (gopogh) | 1.29 (chart) |
| QEMU_macOS | TestFunctional/parallel/ImageCommands/ImageReloadDaemon (gopogh) | 1.29 (chart) |
| QEMU_macOS | TestFunctional/parallel/ImageCommands/ImageSaveToFile (gopogh) | 1.29 (chart) |
| QEMU_macOS | TestFunctional/parallel/ImageCommands/ImageTagAndLoadDaemon (gopogh) | 1.29 (chart) |
| QEMU_macOS | TestFunctional/parallel/NodeLabels (gopogh) | 1.29 (chart) |
| QEMU_macOS | TestFunctional/parallel/NonActiveRuntimeDisabled (gopogh) | 1.29 (chart) |
| QEMU_macOS | TestFunctional/parallel/PersistentVolumeClaim (gopogh) | 1.29 (chart) |
| QEMU_macOS | TestFunctional/parallel/ServiceCmd/DeployApp (gopogh) | 1.29 (chart) |
| QEMU_macOS | TestFunctional/parallel/ServiceCmd/Format (gopogh) | 1.29 (chart) |
| QEMU_macOS | TestFunctional/parallel/ServiceCmd/HTTPS (gopogh) | 1.29 (chart) |
| QEMU_macOS | TestFunctional/parallel/ServiceCmd/JSONOutput (gopogh) | 1.29 (chart) |
| QEMU_macOS | TestFunctional/parallel/ServiceCmd/List (gopogh) | 1.29 (chart) |
| QEMU_macOS | TestFunctional/parallel/ServiceCmd/URL (gopogh) | 1.29 (chart) |
| QEMU_macOS | TestFunctional/parallel/StatusCmd (gopogh) | 1.29 (chart) |
| QEMU_macOS | TestFunctional/parallel/TunnelCmd/serial/AccessDirect (gopogh) | 1.29 (chart) |
| QEMU_macOS | TestFunctional/parallel/TunnelCmd/serial/AccessThroughDNS (gopogh) | 1.29 (chart) |
| QEMU_macOS | TestFunctional/parallel/TunnelCmd/serial/DNSResolutionByDig (gopogh) | 1.29 (chart) |
| QEMU_macOS | TestFunctional/parallel/TunnelCmd/serial/WaitService/Setup (gopogh) | 1.29 (chart) |
More tests... Continued...

Too many tests failed - See test logs for more details.

To see the flake rates of all tests by environment, click here.

Development

Successfully merging this pull request may close these issues.

unblock firewall for macos one more time if it it is unblocked and macos still doesnt unblock
4 participants