Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(vm): better handle of ctrl+c when qemu is not responding #627

Merged
merged 3 commits into from
Oct 25, 2023

Conversation

TheNotary
Copy link
Contributor

@TheNotary TheNotary commented Oct 17, 2023

Contributor's Note

  • I have added / updated documentation in /docs for any user-facing features or additions.
  • I have added / updated templates in /example for any new or updated resources / data sources.
  • I have ran make example to verify that the change works as expected.

I'm still getting up to speed with GoLang, and I'm finding it very tricky to know whether my code is readable or not. Usually this type of refactor shortens the code up, but in this case it got longer so I'm reluctant raising this PR. (Was the deep nesting intentional to shorten the code?)

I had a hunch that using a golang tick pattern here might help, but I thought I'd just check in and see what others thought before racking my brain further. If someone wants to pick up this PR and finish the refactor, that would be great, it's been really slow going for me and I might learn faster just seeing the end form.

Proof of Work

To reproduce the results, in terraform, configure a VM to agent.enabled = true but then in the VM you're testing against, just delete the qemu-guest-agent and reboot. Thereafter, a terraform plan operation will likely hang, and ctrl + C will be unable to exit from the provider's operation because of the structure of the loop and the use of continue statements.

Below is a screenshot of this patch allowing ctrl + c to be captured.

Screenshot 2023-10-17 at 1 07 02 PM

Community Note

  • Please vote on this pull request by adding a 👍 reaction to the original pull request comment to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for pull request followers and do not help prioritize the request

@TheNotary TheNotary force-pushed the open-source branch 3 times, most recently from 07306a5 to ffc6689 Compare October 17, 2023 18:16
…u agent lookups would fail

Signed-off-by: TheNotary <799247+TheNotary@users.noreply.github.com>
@otopetrik
Copy link
Contributor

Thanks, it makes the code look more readable.

Improving the behavior on Ctrl-C with Proxmox-enabled yet VM-missing qemu-guest-agent would be nice.

Current behavior:

  • terraform apply, creating VM and waiting for it. Ctrl-C handled quickly
  • terraform plan, refreshing state, Ctrl-C handled quickly
  • terraform apply, refreshing state, Ctrl-C requires second press

Tried the PR, the behavior seems the same (tested on Terraform v1.5.3, newer versions may differ...)

Tried modifying it to checking ctx.Deadline and ctx.Done, but with no luck.
It looks like the ctx has no idea, that it should stop when Ctrl-C is pressed in "refreshing state" as part of terraform apply.
Which is weird, because terraform plan and terraform apply should probably do the same thing, at least until the they display the plan.

TF_LOG=trace terraform apply shows after Ctrl-C:

[TRACE] provider.terraform-provider-proxmox_v0.35.0: plugin received interrupt signal, ignoring: count=1 timestamp=2023-10-21T...
it seems to be from here https://github.com/hashicorp/go-plugin/blob/main/server.go#L448

That gave me an idea for the following modification:

	ch := make(chan os.Signal, 1)
	signal.Notify(ch, os.Interrupt)
	for timeElapsed.Seconds() < timeMaxSeconds {
		timeElapsed = time.Since(timeStart)

		// check if terraform wants to shut us down (we try to poll the ctx every 200ms)
		if ctx.Err() != nil {
			return nil, fmt.Errorf("error waiting for VM network interfaces: %w", ctx.Err())
		}

		select {
		case <-ch:
			{
				return nil, fmt.Errorf("error by signal")
			}
		default:
		}
                ....

terraform apply can then be terminated in planning stages using single Ctrl-C.

Getting signal directly instead of using contex seems like an ugly hack, but I no idea if there even is a non-hacky solution.

@TheNotary
Copy link
Contributor Author

Didn't notice this until just now. First thanks for the code quality feedback! About the difference in behavior between apply vs plan, that's interesting and unfortunate to learn, I hadn't been monitoring apply in my investigation. I quick google found a post from timblacktu announcing his annoyance with TF tending to hang, and him ctrl + cing (twice I assume) and then restoring his state file.

If your sigint check does the trick, I say go for it. I'm getting the impression, Hashicorp didn't spend a lot of time what to do when a user ctrl + cs an apply operation. I don't remember how I adopted my TF workflow, but I catch myself always performing "plan" followed by "apply". I wonder if I picked up on that because plan is easier to bail from on most TF plugins??? Since working with this plugin provider, I've started to jump right to apply because refreshing the state takes a while (maybe it can be speed up, or maybe my infrastructure is too dated to be doing anything in 2023).

I accidentally clicked the update branch but I pushed it back to how it was where I left off. It looks like you're using a case/switch that looks more elegant. If you'd like to own this fix from here, I'd appreciate it. I'm away from my infrastructure at the moment so it would be difficult for me to do testing until I'm done traveling.

@bpg
Copy link
Owner

bpg commented Oct 23, 2023

It looks like the new linter started complaining about some older code not related to this PR. #645 should fix it, so please merge / rebase on main

Signed-off-by: Pavel Boldyrev <627562+bpg@users.noreply.github.com>
@pull-request-size pull-request-size bot added size/L and removed size/M labels Oct 25, 2023
@bpg bpg changed the title fix(vm): resolves issue where ctrl+c was not being responded when qem… fix(vm): better handle of ctrl+c when qemu is not responding Oct 25, 2023
Copy link
Owner

@bpg bpg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, @TheNotary!

I feel like most of those "sleep & loop" waiting patterns can be updated to something more elegant, there are bunch of libraries around that can handle waiting, with different type of strategies, better error handling, etc, etc., but even this improvement is great!

I've incorporated @otopetrik 's suggestion and tested in my env, everything seems to be working as expected, so

LGTM! 🚀

@bpg bpg merged commit aec09e4 into bpg:main Oct 25, 2023
8 checks passed
@ghost ghost mentioned this pull request Oct 25, 2023
@TheNotary TheNotary deleted the open-source branch October 25, 2023 14:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants