Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Bump controller-pkg to expose Provider field #284

Merged
merged 1 commit into from
Oct 11, 2023
Merged

Conversation

Callisto13
Copy link
Member

@Callisto13 Callisto13 commented May 31, 2023

This PR exposes the option to select a provider for a deployment or machine.

The default is set by the operator who configured flintlockd on the remote host and cannot be altered here.
The provider can be set in the spec:

 kind: MicrovmMachineTemplate
 apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
 spec:
   template:
     spec:
       provider: "cloudhypervisor"
...

Merging as-is would introduce a bug high chance of user error, consider...

Scenario: A user sets provider: "cloudhypervisor" in their CAPMVM template. A host is chosen from the list, the request is made to flintlock, but it fails because the cloud-hypervisor-static binary is not present on the host. What do we do here?

Perhaps as part of liquidmetal-dev/flintlock#509 we can add information about available providers. CAPMVM can call this endpoint first (I believe we already do a check that the service is reachable), check the provider is available, and select another host if not. This would complicate the failureDomain selection process.

Second option is to do this on the flintlock side: if the desired provider is not available, cycle through to the next one until all have been tried. We would need to feed this back to the CAPMVM end user so it is clear what has happened, like in a clear log-line or the mvm spec (both flint and capmvm sides).

Or maybe this does not count as a bug? Flintlock already can be misconfigured or deliberately set up with constraints which make CAPMVM creations fail. Perhaps we need to improve documentation (operators need to be clear about what is available on hosts), and put more into both handling failures gracefully and making it easy to get relevant information about the target hosts.

@Callisto13 Callisto13 requested a review from yitsushi May 31, 2023 13:38
@Callisto13
Copy link
Member Author

if merged, follow with liquidmetal-dev/site#14 after releasing

@Callisto13 Callisto13 added the kind/feature New feature or request label May 31, 2023
@yitsushi
Copy link
Contributor

I would say it's a "user-error" kind of bug. Yes, we can make the UX better, but I wouldn't count it as a bug. (see my comment here: liquidmetal-dev/controller-pkg#3 (comment)).

If the reported error shows what's the issue, like "failed to provision microvm because provider binary not found: %provider%", I think it's totally fine. We can't say it's a bug if a user tries to use a linux binary on a mac machine.

@yitsushi
Copy link
Contributor

cycle through to the next one until all have been tried

I don't think it would work:

  1. Different providers have different requirements, like kernel, base image, i don't what else.
  2. A user requested to use X, we should fail if it's not possible and not use Y. They requested X for a reason.

@Callisto13
Copy link
Member Author

Different providers have different requirements, like kernel, base image, i don't what else

oh yeeeeh forgot about all that

@Callisto13
Copy link
Member Author

I would say it's a "user-error" kind of bug

Eh it is and isn't. A user cannot choose which host they get a node on. They could enable cloud-h on some hosts and not others, at which point it becomes annoying luck.

But I suppose operators could say "don't set the provider override at all"... idk this just feels like we are deliberately setting a trap.

@yitsushi
Copy link
Contributor

yitsushi commented Jun 2, 2023

Eh it is and isn't. A user cannot choose which host they get a node on. They could enable cloud-h on some hosts and not others, at which point it becomes annoying luck.

Created an issue that can resolve this and I think we wanted to do it, but never get there. And a place for further discussions.

#285

@Callisto13
Copy link
Member Author

Callisto13 commented Jun 2, 2023

We also have this problem right now if people use the latest flintlock:

  • operator has 2 hosts with firecracker, 2 hosts with cloud-hypervisor
  • capmvm user specifies all hosts in template
  • capmvm user specifies either firecracker or ch kernels in mvm spec
  • wrong host is chosen at random
  • misery ensues

could do with that tagging story sooner rather than later

@yitsushi
Copy link
Contributor

yitsushi commented Jun 5, 2023

could do with that tagging story sooner rather than later

I can only agree 💯

@richardcase richardcase merged commit 29a61b9 into main Oct 11, 2023
@richardcase richardcase deleted the provider branch October 11, 2023 13:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants