feat: cloud-hypervisor support #609

Merged: 3 commits into liquidmetal-dev:main on May 15, 2023

Conversation

richardcase
Member

@richardcase richardcase commented Dec 20, 2022

What this PR does / why we need it:

Flintlock will default to Firecracker for creating microvms but you can:

  • change the default to Cloud Hypervisor when starting Flintlock
  • specify on a per VM basis whether to use Firecracker or Cloud Hypervisor

A couple of things to note:

  • Cloud Hypervisor supports macvtap so you don't have to use the Weaveworks fork
  • Cloud Hypervisor doesn't have a metadata service so you need to find another solution if you rely on this. For cloud-init we attach an additional volume.
  • Cloud Hypervisor supports PCI passthrough, vDPA, etc. These features aren't currently exposed via the API but they will be in the future.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #417

Special notes for your reviewer:

Checklist:

  • squashed commits into logical changes
  • includes documentation
  • adds unit tests
  • adds or updates e2e tests

@richardcase richardcase added the kind/feature, area/api, and area/cloud-hypervisor labels Dec 20, 2022
Member

@Callisto13 Callisto13 left a comment

First pass looks good so far.
Is your linter on? Mine lit up quite a bit on lots of little things.

You may have to wait until after xmas for a more in-depth review 😉

README.md (outdated, resolved)
infrastructure/microvm/cloudhypervisor/cloudinit.go (outdated, resolved)
infrastructure/microvm/cloudhypervisor/cloudinit.go (outdated, resolved)
infrastructure/microvm/cloudhypervisor/create.go (outdated, resolved)
infrastructure/microvm/cloudhypervisor/create.go (outdated, resolved)
infrastructure/microvm/providers.go (outdated, resolved)
infrastructure/microvm/providers.go (outdated, resolved)
internal/command/flags/flags.go (outdated, resolved)
internal/command/metrics/serve.go (outdated, resolved)
userdocs/docs/grpc/services/microvm/v1alpha1/proto.md (outdated, resolved)
@codecov-commenter

codecov-commenter commented Jan 7, 2023

Codecov Report

Patch coverage: 36.00% and project coverage change: +1.89 🎉

Comparison is base (bfd2b3f) 56.29% compared to head (bf79563) 58.18%.

❗ Current head bf79563 differs from pull request most recent head af74aac. Consider uploading reports for the commit af74aac to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #609      +/-   ##
==========================================
+ Coverage   56.29%   58.18%   +1.89%     
==========================================
  Files          57       56       -1     
  Lines        2780     2705      -75     
==========================================
+ Hits         1565     1574       +9     
+ Misses       1071      985      -86     
- Partials      144      146       +2     
Impacted Files                                    Coverage Δ
core/application/app.go                           100.00% <ø> (ø)
core/models/capability.go                         0.00% <ø> (ø)
infrastructure/microvm/firecracker/config.go      0.00% <0.00%> (ø)
infrastructure/microvm/firecracker/create.go      0.00% <ø> (ø)
infrastructure/microvm/firecracker/provider.go    0.00% <0.00%> (ø)
infrastructure/microvm/firecracker/state.go       0.00% <0.00%> (ø)
core/plans/microvm_delete.go                      58.92% <25.00%> (-1.45%) ⬇️
core/plans/microvm_create_update.go               50.00% <28.57%> (ø)
core/application/commands.go                      65.32% <75.00%> (+0.36%) ⬆️

... and 2 files with indirect coverage changes

@Callisto13
Member

Callisto13 commented Jan 18, 2023

Starting to run this myself and something is not obvious looking at the code: How do I make CH the default? Looks like providers are recognised as "supported" by the presence of their bin names in the flintlock run config as it starts. Since we default these bin names, they will always be "on". So to make CH the default, do I have to explicitly set the FC bin to "" in the run flags?

Also do we verify that the bins exist on PATH before calling them? (I would assume we do but I cannot find it).

@Callisto13
Member

Callisto13 commented Jan 18, 2023

Oh also should we log at start saying which providers are enabled? And which is the default?

Can we also have log lines which say which provider is being used for a specific microvm? I am having a hard time seeing which is being picked up. Right now some log lines carry a service field, e.g. msg="checking state of microvm" controller=microvm service=firecracker_microvm vmid=ns1/mvm1/01GQ2EW7B839SF89X3SQC9183D, but this is a CH mvm so it is confusing. So that data needs to be changed, and it would be good to rename that field to provider and have it logged in every line.

@Callisto13
Member

Starting to run this myself and something is not obvious looking at the code: How do I make CH the default? Looks like providers are recognised as "supported" by the presence of their bin names in the flintlock run config as it starts. Since we default these bin names, they will always be "on". So to make CH the default, do I have to explicitly set the FC bin to "" in the run flags?

Hah just spotted the --default-provider flag 🤦‍♀️

@Callisto13
Member

Callisto13 commented Jan 18, 2023

Okay I got a mvm started, but the network is not all there:

ci-info: ++++++++++++++++++++++++++++Net device info++++++++++++++++++++++++++++
ci-info: +---------+-------+-----------+-----------+-------+-------------------+
ci-info: |  Device |   Up  |  Address  |    Mask   | Scope |     Hw-Address    |
ci-info: +---------+-------+-----------+-----------+-------+-------------------+
ci-info: |  bond0  | False |     .     |     .     |   .   | e2:19:63:29:cd:6a |
ci-info: |  dummy0 | False |     .     |     .     |   .   | 6a:36:88:c6:59:20 |
ci-info: |   ens4  | False |     .     |     .     |   .   | 2e:ca:e9:1e:57:44 |
ci-info: | ip6tnl0 | False |     .     |     .     |   .   |         .         |
ci-info: |    lo   |  True | 127.0.0.1 | 255.0.0.0 |  host |         .         |
ci-info: |    lo   |  True |  ::1/128  |     .     |  host |         .         |
ci-info: |  tunl0  | False |     .     |     .     |   .   |         .         |
ci-info: +---------+-------+-----------+-----------+-------+-------------------+
ci-info: +++++++++++++++++++Route IPv6 info+++++++++++++++++++
ci-info: +-------+-------------+---------+-----------+-------+
ci-info: | Route | Destination | Gateway | Interface | Flags |
ci-info: +-------+-------------+---------+-----------+-------+
ci-info: +-------+-------------+---------+-----------+-------+

Is there anything obvious I am missing before I start digging? I remember something about the metadata volume? I forget whether we are mounting that in by default or if I need to add it? Do you have any example config from where you tested it?

@richardcase
Member Author

Will try and get comments sorted today.

@richardcase
Member Author

Is there anything obvious I am missing before I start digging? I remember something about the metadata volume? I forget whether we are mounting that in by default or if I need to add it? Do you have any example config from where you tested it?

I just tried it again this morning and i can create a vm that uses macvtap to connect to my network:

[    2.814946] cloud-init[1506]: ci-info: +---------+-------+------------------------------+---------------+--------+-------------------+
[    2.815033] cloud-init[1506]: ci-info: |  Device |   Up  |           Address            |      Mask     | Scope  |     Hw-Address    |
[    2.815124] cloud-init[1506]: ci-info: +---------+-------+------------------------------+---------------+--------+-------------------+
[    2.815210] cloud-init[1506]: ci-info: |  bond0  | False |              .               |       .       |   .    | 62:8a:f3:75:19:7e |
[    2.815297] cloud-init[1506]: ci-info: |  dummy0 | False |              .               |       .       |   .    | ea:f2:03:8b:d3:42 |
[    2.815386] cloud-init[1506]: ci-info: |   ens4  |  True |         192.168.9.67         | 255.255.255.0 | global | a6:68:f3:f7:83:af |
[    2.815473] cloud-init[1506]: ci-info: |   ens4  |  True | fe80::a468:f3ff:fef7:83af/64 |       .       |  link  | a6:68:f3:f7:83:af |
[    2.815560] cloud-init[1506]: ci-info: | ip6tnl0 | False |              .               |       .       |   .    |         .         |
[    2.815648] cloud-init[1506]: ci-info: |    lo   |  True |          127.0.0.1           |   255.0.0.0   |  host  |         .         |
[    2.815776] cloud-init[1506]: ci-info: |    lo   |  True |           ::1/128            |       .       |  host  |         .         |
[    2.815866] cloud-init[1506]: ci-info: |  tunl0  | False |              .               |       .       |   .    |         .         |
[    2.815950] cloud-init[1506]: ci-info: +---------+-------+------------------------------+---------------+--------+-------------------+

I use fl to make the requests, and specifically the version from this PR.

The command i use is:

fl microvm create --host 127.0.0.1:9090 --name chtest --kernel-image ghcr.io/weaveworks-liquidmetal/flintlock-kernel-cloudhypervisor:5.12 --kernel-filename boot/vmlinux.bin --metadata-hostname chtest --network-interface eth1:macvtap --root-image ghcr.io/weaveworks-liquidmetal/capmvm-kubernetes-cloudhypervisor:1.23.5 --metadata-ssh-key-file ./test_rsa.pub

@Callisto13
Member

👀 cool, will have another crack with that

@richardcase richardcase force-pushed the cloudhypervisor branch 4 times, most recently from 3934047 to b53e475, on February 3, 2023 14:40
@Callisto13
Member

Managed to get it to work this time 🎉 (on the second attempt in which I actually installed cloudhypervisor 🤦‍♀️ )

For posterity here is what I did:

  • Create devices on Equinix with terraform
    • sshed on to mvm host device
    • systemctl stop flintlockd.service
    • add default-provider: cloudhypervisor to /etc/opt/flintlockd/config.yaml
    • install cloud-hypervisor-static bin to /usr/local/bin
    • systemctl start flintlockd
  • Build fl from the above mentioned branch
  • Use suggested command (swapping in own key)
  • Profit
  • Verified interfaces created and could ssh into mvm

Last time i tried this it was using my new images, so i must have been the problem there

@Callisto13
Member

@richardcase there are still a lot of firecracker references in the logs on a CH mvm:

level=info msg="checking state of microvm" controller=microvm service=firecracker_microvm vmid=default/chtest/01GSCZ2AV7492T0V4G2AE94YXV
level=info msg="checking state of microvm" controller=microvm service=firecracker_microvm vmid=default/chtest/01GSCZ2AV7492T0V4G2AE94YXV
level=error msg="failed to reconcile vmid default/chtest/01GSCZ2AV7492T0V4G2AE94YXV: executing plan: executing plan steps: executing steps: executing step microvm_create: creating microvm: starting firecracker process: starting cloud hypervisor process: %!w(<nil>)" controller=microvm
level=info msg="checking state of microvm" controller=microvm service=firecracker_microvm vmid=default/chtest/01GSCZ2AV7492T0V4G2AE94YXV
level=info msg="checking state of microvm" controller=microvm service=firecracker_microvm vmid=default/chtest/01GSCZ2AV7492T0V4G2AE94YXV
... etc on every 'checking state'

@Callisto13
Member

Tested with the new images and all good 🎉 liquidmetal-dev/image-builder#55 (turns out i was missing a mac address originally)

Comment on lines +157 to +162
arg := fmt.Sprintf("fd=%d,mac=", fd)
if netInt.GuestMAC != "" {
arg = arg + netInt.GuestMAC
}
Member

@Callisto13 Callisto13 Feb 16, 2023

A thing to note here is that we are not defaulting the MAC address to the host MAC when GuestMAC is not set. This is what we do in firecracker: https://github.com/weaveworks-liquidmetal/flintlock/blob/main/infrastructure/firecracker/config.go#L184-L192

Do we want to maintain similar behaviour here? It took me a moment or 2 to figure out what i was missing from my usual spec on a failed mvm.
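A minimal sketch of what a fallback could look like, assuming the caller can supply some other MAC address to use when the guest MAC is empty. `netFDArg` and its `fallbackMAC` parameter are hypothetical and do not come from the PR; the `fd=%d,mac=` format is taken from the snippet under review, and `net.ParseMAC` is the standard library validator.

```go
package main

import (
	"fmt"
	"net"
)

// netFDArg builds a file-descriptor based network argument in the
// "fd=<n>,mac=<addr>" form used by the snippet under review.
// fallbackMAC is a hypothetical parameter: the address to use when the
// spec leaves the guest MAC empty.
func netFDArg(fd int, guestMAC, fallbackMAC string) (string, error) {
	mac := guestMAC
	if mac == "" {
		mac = fallbackMAC
	}
	if mac == "" {
		// Refuse to emit "mac=" with no value; fail where the cause is visible.
		return "", fmt.Errorf("no MAC address available for fd %d", fd)
	}
	if _, err := net.ParseMAC(mac); err != nil {
		return "", fmt.Errorf("invalid MAC address %q: %w", mac, err)
	}
	return fmt.Sprintf("fd=%d,mac=%s", fd, mac), nil
}

func main() {
	arg, err := netFDArg(3, "", "a6:68:f3:f7:83:af")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(arg)
}
```

Returning an error instead of an empty `mac=` value would at least make a missing address show up at create time rather than as a microvm with no working network.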

Member Author

I need to reacquaint myself with why I did this; I vaguely remember there was a reason for changing it.

@Callisto13
Member

tagging @yitsushi and @jmickey as well as it is a large pr. if you don't have time no worries :)

@Callisto13
Member

Callisto13 commented Feb 16, 2023

Another thing which would be useful is to write the config which the CH process ends up using somewhere, if that's possible? it does end up in cloudhypervisor.log, but it's a bit messy to read, and it does help debugging to quickly check what we generated for the process

(could be left to follow-up pr)

Contributor

@yitsushi yitsushi left a comment

overall lgtm, the only blocker I see (apart from a few string replacements from @Callisto13's review) is the --net flag in case it goes wild.

core/models/capability.go (outdated, resolved)
core/plans/microvm_create_update.go (outdated, resolved)
@richardcase
Member Author

Another thing which would be useful is to write the config which the CH process ends up using somewhere, if that's possible? it does end up in cloudhypervisor.log, but it's a bit messy to read, and it does help debugging to quickly check what we generated for the process

(could be left to follow-up pr)

We don't have a config file for configuring the microvm like we do for firecracker. Instead it's all command-line args.

We do use files for cloud-init and these are on the disk.

We could write a config file to disk and then create the CH args from the config file, this is what i originally did with the firecracker provider. wdyt?

@richardcase
Member Author

@Callisto13 @yitsushi - thanks for the reviews. I have made changes, the only outstanding comments are around defaulting the mac address and also the config file (i think we could follow up on the config file post merge).

} else if iface.Type == models.IfaceTypeTap {
args = append(args, fmt.Sprintf("tap=%s,mac=%s", status.HostDeviceName, iface.GuestMAC))
} else {
logger.Warn("unknown network interface type", "name", iface.GuestDeviceName, "type", iface.Type)
Contributor

It will still end up with a command where the outcome is not predictable as the next parameter (if there's any) will be used as "dev" argument.

cmd --api-socket /foo --log-file /bar -v --net --other baz

In this case it will either complain that "--net" expects an argument but got none, or it will try to use --other as the argument and then complain that it doesn't know what to do with baz.

@Callisto13
Member

We don't have a config file for configuring the microvm like we do for firecracker. Instead its all command line args.

Yeh I know, I was thinking it would just be a handy thing for us when debugging.

We could write a config file to disk and then create the CH args from the config file, this is what i originally did with the firecracker provider. wdyt?

Yeh that might be good. Either thing can wait until another PR tho

@richardcase
Member Author

@Callisto13 @yitsushi - i will try to get the last of the comments sorted today on this.

Flintlock will default to Firecracker for creating microvms but you can:
- change the default to Cloud Hypervisor when starting Flintlock
- specify on a per VM basis whether to use Firecracker or Cloud Hypervisor

A couple of things to note:
- Cloud Hypervisor supports macvtap so you don't have to use the
  Weaveworks fork
- Cloud Hypervisor doesn't have a metadata service so you need to find another solution if you rely on this. For cloud-init we attach an additional volume.
- Cloud Hypervisor supports PCI passthrough, vDPA, etc. These features aren't currently exposed via the API but they will be in the future.

Signed-off-by: Richard Case <richard.case@outlook.com>
Signed-off-by: Richard Case <richard.case@outlook.com>
Signed-off-by: Richard Case <richard.case@outlook.com>
@richardcase
Member Author

I think this is good to go, i made the change based on feedback from @yitsushi

Contributor

@yitsushi yitsushi left a comment

LGTM :shipit:

@richardcase richardcase merged commit 87684fd into liquidmetal-dev:main May 15, 2023
Labels
area/api (Indicates an issue or PR relates to the APIs), area/cloud-hypervisor (Indicates an issue or PR related to Cloud Hypervisor), kind/feature (New feature or request)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for Cloud Hypervisor
4 participants