Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MCO-1191, MCO-1194: Experiment with creating internal go bindings for bootc , Incorporate support for bootc commands to Machine Config Daemon #4465

Merged
merged 3 commits into from
Jul 29, 2024

Conversation

inesqyx
Copy link
Contributor

@inesqyx inesqyx commented Jul 10, 2024

  • Create a bootc.go to store bootc status reading and command line execution
  • Create bootc internal go-bindings inside bootc.go. Code designed in the way to be easily separated into a standalone lib: https://github.com/inesqyx/bootc-client-go/tree/bootc-client-go
  • Consolidate functions that rpm-ostree and bootc share in common into one place

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 10, 2024
Copy link
Contributor

openshift-ci bot commented Jul 10, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Jul 10, 2024

@inesqyx: This pull request references MCO-1191 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the spike to target the "4.17.0" version, but no target version was set.

This pull request references MCO-1194 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.17.0" version, but no target version was set.

In response to this:

WIP

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jul 10, 2024
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Jul 16, 2024

@inesqyx: This pull request references MCO-1191 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the spike to target the "4.17.0" version, but no target version was set.

This pull request references MCO-1194 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.17.0" version, but no target version was set.

In response to this:

  • Create a bootc.go to store bootc status reading and command line execution
  • Create bootc internal go-bindings inside bootc.go. Code designed in the way to be easily separated into a standalone lib: https://github.com/inesqyx/bootc-client-go/tree/bootc-client-go
  • Consolidate functions that rpm-ostree and bootc share in common into one place

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@inesqyx inesqyx marked this pull request as ready for review July 16, 2024 13:34
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 16, 2024
@inesqyx
Copy link
Contributor Author

inesqyx commented Jul 16, 2024

/test all

Copy link
Member

@cheesesashimi cheesesashimi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this is off to a great start! I have some ideas and suggestions that can make this even better.

}

// Spec collects the host specification
// This mirrors https://github.com/containers/bootc/blob/94521ecb19b8d0f2bfc36220322c60cafa036296/lib/src/spec.rs#L46
Copy link
Member

@cheesesashimi cheesesashimi Jul 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought (non-blocking): This is way beyond the scope of this PR, but it would be great if we could programmatically consume these Rust structs and generate Go structs from it using some kind of code-gen.

Just to be clear, I am not asking you to do that right now 😄

Copy link
Contributor Author

@inesqyx inesqyx Jul 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree!! I think currently we are doing it manually for both bootc bindings and rpm-ostree bindings. It will def be better to have something auto-gen. @cgwalters might have some idea according to comment down: https://issues.redhat.com/browse/MCO-1026 ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

containers/bootc#703 - does that help?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that would help! We'll need a way to consume that JSON schema, but it's a step in the right direction.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some trials on autogen-ing bootc go-bindings by consuming the JSON schema: https://github.com/inesqyx/bootc-client-go

@@ -0,0 +1,245 @@
package daemon
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Let's make the bootc client live in a separate Go module from the rest of the MCD code paths. The reason is because it will help enforce better boundaries between it and the rest of the MCD code path.

To do that:

  1. Create a directory called pkg/daemon/bootc.
  2. Move the bootc.go file into it. The MCD can then import from that package. There may be some overlap between your code and functions in the MCD package. For now, it's fine to duplicate them into the bootc module. You can also move your bootc related constants into this module as well.

In the future, we could make the module name more generic and move the rpm-ostree client into here as well, but that is an idea for another day.

Copy link
Contributor Author

@inesqyx inesqyx Jul 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me! Will do while featuregating - https://issues.redhat.com/browse/MCO-1192

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to this!

return
}

// // GetStatus returns multi-line human-readable text describing system status
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Should this be commented out? If it's not needed, delete it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am having this block here but commented out because according to containers/bootc#408, it seems that the bootc team is actively working on creating options for outputting human-readable bootc status, which is not ready yet. By the time we merge this PR, if they have something handy for us to use, I think we can un-comment the block and bring it in; while if not, then we should delete it :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense 😄 . We can leave it in that case.

// Switch target a new container image reference to boot for the system or errors if already swictehd.
func (b *BootcClient) Switch(imgURL string) (err error) {
// Try to re-link the merged pull secrets if they exist, since it could have been populated without a daemon reboot
useMergedPullSecrets(bootcSystem)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: What happens if useMergedPullSecrets() throws an error?

suggestion: Check the error and return it before continuing.

}

// Switch target a new container image reference to boot for the system or errors if already swictehd.
func (b *BootcClient) Switch(imgURL string) (err error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: You're using a named return here, but you don't declare or assign anything to an err variable within this function.

suggestion: There's nothing wrong with doing that, but for consistency, use a naked return instead:

func (b *BootcClient) Switch(imgURL string) error {
    //...
}

@@ -239,15 +176,15 @@ func (r *RpmOstreeClient) IsNewEnoughForLayering() (bool, error) {
// RebaseLayered rebases system or errors if already rebased.
func (r *RpmOstreeClient) RebaseLayered(imgURL string) (err error) {
// Try to re-link the merged pull secrets if they exist, since it could have been populated without a daemon reboot
useMergedPullSecrets()
useMergedPullSecrets(rpmOstreeSystem)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: What happens when useMergedPullSecrets() fails here?

suggestion: We should be checking for any errors that are returned from here. Not doing so can introduce very difficult-to-track-down bugs. I know you're not the one who introduced this particular change, but I think it would make sense to wrap it in an if err != nil { return err } block.

klog.Infof("Executing rebase to %s", imgURL)
return runRpmOstree("rebase", "--experimental", "ostree-unverified-registry:"+imgURL)
}

// RebaseLayeredFromContainerStorage rebases the system from an existing local container storage image.
func (r *RpmOstreeClient) RebaseLayeredFromContainerStorage(imgURL string) (err error) {
// Try to re-link the merged pull secrets if they exist, since it could have been populated without a daemon reboot
useMergedPullSecrets()
useMergedPullSecrets(rpmOstreeSystem)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Same as above regarding error checking.

// Note that pull secret for 'bootc' to fetch updates from registry which reauires authentication is also stored in
// /etc/ostree/auth.json

func useMergedPullSecrets(system string) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: What happens if an invalid system value is provided?

suggestion: Be more strict about what values this function will accept. We can also use this as an opportunity to do some code de-duplication:

func useMergedPullSecrets(system string) error {
	// Associate each link function with a system value.
	linkFuncs := map[string]func(string) error {
		rpmOstreeSystem: linkOstreeAuthFile,
		bootcSystem: linkBootcAuthFile,
	}

	// Check whether the provided system value is in the map. If not, return an error.
	if _, ok := linkFuncs[system]; !ok {
		return fmt.Errorf("unknown system value %q", system)
	}

	// Then, whenever you want to call one of those functions, just do the following:
	return linkFuncs[system](kubeletAuthFile)
}

Alternatively, we can push the association of the system value into a linkAuthFile() function as described above.

)

const (
rpmOstreeSystem = "rpm-ostree"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: We may want to make rpmOstreeSystem and bootcSystem their own special types:

type imageSystem string

const (
	rpmOstreeSystem imageSystem  = "rpm-ostree"
	bootcSystem imageSystem = "bootc"
)

This will provide a guard against accidentally passing the wrong value to any functions that expect one of these values.

We can also augment that with a simple validation function:

func validateImageSystem(imgSys imageSystem) error {
	// Sets provided by https://github.com/kubernetes/apimachinery/tree/master/pkg/util/sets
	// Imported as k8s.io/apimachinery/pkg/util/sets
	valid := sets.New[imageSystem](rpmOstreeSystem, bootcSystem)
	if valid.Has(imgSys) {
		return nil
	}

	return fmt.Errorf("invalid system %s, valid: %v", imgSys, sets.List(valid))
}

// Pull secret for 'bootc' to fetch updates from registry which requires authentication is stored in /etc/ostree/auth.json
// per https://github.com/containers/bootc/blob/5e9279d6674b28d2c451baeaf981a92a1aa388ff/docs/src/building/secrets.md?plain=1#L4

func linkBootcAuthFile(path string) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (non-blocking): We could potentially deduplicate this code a bit if we pass the system value in and it knows which path to associate with each system type:

func linkAuthFile(path string, system imageSystem) error {
	// Associate each of the systems with a particular authfile path.
	imageSystemAuthFilePaths := map[imageSystem]string{
		rpmOstreeSystem: ostreeAuthFile,
		bootcSystem: bootcAuthFile,
	}

	// Try looking up the system type.
	authFile, ok := imageSystemAuthFilePaths[system]
	if !ok {
		// We got an invalid value, return an error.
		return fmt.Errorf("unknown system %q", system)
	}

	if _, err := os.Lstat(authFile); err != nil {
		if errors.Is(err, os.ErrNotExist) {
			// filepath.Dir() takes a path such as /etc/ostree/auth.json and returns the parent directory path; /etc/ostree in this case.
			if err := os.MkdirAll(filepath.Dir(authFile), 0o544); err != nil {
				return err
			}
		}
	} else {
		// Remove older symlink if it exists since it needs to be overwritten
		if err := os.Remove(authFile); err != nil {
			return err
		}
	}

	klog.Infof("Linking %s authfile (%s) to %s", system, authFile, path)
	return os.Symlink(path, authFile)
}

@inesqyx
Copy link
Contributor Author

inesqyx commented Jul 16, 2024

Thanks, @cheesesashimi , for the detailed review!! I think a lot of the suggestions and issues applies to both rpm-ostree.go and bootc.go. I will make sure to revisit both of them and have errors better guarded, conventions better matched. Ty!

@inesqyx
Copy link
Contributor Author

inesqyx commented Jul 16, 2024

/retest-required

// make sure we get access to them when we Initialize
err := useMergedPullSecrets(bootcSystem)
if err != nil {
klog.Errorf("error while linking bootc pull secrets %v", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: do we just want to log the error and continue when an error occurs here? If not we should do return useMergePullSecrets() to ensure we actually error out when an error happens.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's return the error and cut it there. I was following this check that currently exists in rpm-ostree:

// Commands like update and rebase need the pull secrets to pull images and manifests,
	// make sure we get access to them when we Initialize
	err := useMergedPullSecrets(rpmOstreeSystem)
	if err != nil {
		klog.Errorf("error while linking rpm-ostree pull secrets %v", err)
	}

But with a second thought, I think it is something to be prevented.

Copy link
Contributor Author

@inesqyx inesqyx Jul 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am guessing that pull secrets might not be necessary for some rpm-ostree commands and that is why it allows failed fetch here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering the same thing too, we should confirm this as it is better to completely error out than just log and wait for it to fail at a later stage.

Copy link
Contributor Author

@inesqyx inesqyx Jul 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think at least in the following implications

  • Switch
  • RebaseLayered
  • RebaseLayeredFromContainerStorage

We def need the presence and correct reading of the pull secret.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have error-ed them out in the latest changes according to discussions here:
[1] Keeping the error will result in fail at a later time and hard-to-track bugs: #4465 (comment)
#4465 (comment)
[2] Several frequently invoked rpm-ostree/bootc commands need the presence of the pull secret. Better to set them up at initialization: #4465 (comment)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought: Maybe do this change only for the bootc code path and open another PR for fixing this in the rpm-ostree code path as that is actually affecting code actively used today. This way we can test it separately and avoid any potential regressions. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update:
Kept for rpm-ostree initialization only:

	err := useMergedPullSecrets(...)
	if err != nil {
		klog.Errorf("error while linking rpm-ostree pull secrets %v", err)
	}

Updated to for bootc switch and rpm-ostree rebase because the pull secret is mandatory for future steps

	err := useMergedPullSecrets(...)
	if err != nil {
		return fmt.Errorf("Error while ensuring access to pull secrets: %w", err)
	}

func (b *BootcClient) Switch(imgURL string) error {
// Try to re-link the merged pull secrets if they exist, since it could have been populated without a daemon reboot
if err := useMergedPullSecrets(bootcSystem); err != nil {
klog.Errorf("error while linking bootc pull secrets %v", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: same question here as above, do we just want to log the error and continue on?

// Try to re-link the merged pull secrets if they exist, since it could have been populated without a daemon reboot
useMergedPullSecrets()
if err := useMergedPullSecrets(rpmOstreeSystem); err != nil {
klog.Errorf("error while linking rpm-ostree pull secrets %v", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same question here, do we just want to log the error and continue?

// Try to re-link the merged pull secrets if they exist, since it could have been populated without a daemon reboot
useMergedPullSecrets()
if err := useMergedPullSecrets(rpmOstreeSystem); err != nil {
klog.Errorf("error while linking rpm-ostree pull secrets %v", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and same question here as above again, do we just want to log the error and continue?

@inesqyx
Copy link
Contributor Author

inesqyx commented Jul 17, 2024

/retest-required

@inesqyx inesqyx force-pushed the MCO-1191 branch 2 times, most recently from 6b14dd2 to cb6edf8 Compare July 18, 2024 13:36
@umohnani8
Copy link
Contributor

LGTM
Great work @inesqyx!

@inesqyx
Copy link
Contributor Author

inesqyx commented Jul 18, 2024

/retest-required

@inesqyx inesqyx force-pushed the MCO-1191 branch 2 times, most recently from 29b8106 to fe5b726 Compare July 22, 2024 21:20
@inesqyx
Copy link
Contributor Author

inesqyx commented Jul 23, 2024

/retest-required

Copy link
Member

@cheesesashimi cheesesashimi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nicely done! Just a few more suggestions and ideas to make this even better.

return nil
}

return fmt.Errorf("invalid system %s, valid: %v", imgSys, sets.List(valid))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

quibble (non-blocking): Maybe this should read "valid systems:" instead?

return &bootedImageInfo, nil
}

// Switch target a new container image reference to boot for the system or errors if already swictehd.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: swictehd should be switched

// }

// GetBootcBootedOSImageURL returns the image URL as well as the image version(for logging) and the ostree commit (for comparisons)
func (b *BootcClient) GetBootcBootedOSImageURL() (*BootedImageInfo, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

quibble: This function name feels a bit redundant.

suggestion (non-blocking): Maybe rename to something like GetBootedImageInfo() instead?

baseChecksum = ""
}

bootedImageInfo := BootedImageInfo{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

praise: Nicely done here!

if err := os.MkdirAll(filepath.Dir(authFilePath), 0o544); err != nil {
return err
}
}
Copy link
Member

@cheesesashimi cheesesashimi Jul 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: If os.Lstat() returns an error that is not matched by errors.Is(err, os.ErrNotExist), we should return that error back to the caller. As-is, we'll still try to do the os.Symlink() operation which seems undesirable.

Instead, we might want to do something like this instead:

if _, err := os.Lstat(authFilePath); err != nil {
	if !errors.Is(err, os.ErrNotExist) {
		// Return any non-matching errors.
		return err
	}

	if err := os.MkdirAll(filepath.Dir(authFilePath), 0o544); err != nil {
		return err
	}
} // ...

}

// check if merged secret file exists
if _, err := os.Stat(imageRegistryAuthFile); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: If the error returned by os.Stat() is not matched by errors.Is(err, os.ErrNotExist), we just keep going which could be undesirable. Instead, we might want to do something like this:

if _, err := os.Stat(imageRegistryAuthFile); err != nil {
	if !errors.Is(err, os.ErrNotExist) {
		return fmt.Errorf("could not determine if %s exists: %w", imageRegistryAuthFile, err)
	}

	klog.Errorf("Merged secret file does not exist; defaulting to cluster pull secret")
	return linkAuthFile(system, kubeletAuthFile)
}

@inesqyx
Copy link
Contributor Author

inesqyx commented Jul 25, 2024

/retest-required

Copy link
Contributor

openshift-ci bot commented Jul 25, 2024

@inesqyx: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-vsphere-ovn-upi 10fee6e link false /test e2e-vsphere-ovn-upi
ci/prow/e2e-vsphere-ovn-zones 10fee6e link false /test e2e-vsphere-ovn-zones
ci/prow/e2e-vsphere-ovn-upi-zones 10fee6e link false /test e2e-vsphere-ovn-upi-zones

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

}

// QueryStatus loads the current system state.
func (client *Client) QueryStatus() (*BootcStatus, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought (non-blocking): While doing some work on #4284, something I ended up doing was having to manually parse the output for rpm-ostree there. The reason I had to do that was because my execution context was a lot different. In other words, I was trying to execute rpm-ostree status --json on a cluster node from my workstation (or test pod) as opposed to being the MCD executing it directly on the node.

In the future, we could make a version of this client that accepts a []byte{} array which contains the status and allows one to use all of the usual methods attached to it.

@cheesesashimi
Copy link
Member

/lgtm
/approve

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 29, 2024
Copy link
Contributor

openshift-ci bot commented Jul 29, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cheesesashimi, inesqyx

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 29, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit 4a518fe into openshift:master Jul 29, 2024
14 of 17 checks passed
@openshift-bot
Copy link
Contributor

[ART PR BUILD NOTIFIER]

Distgit: ose-machine-config-operator
This PR has been included in build ose-machine-config-operator-container-v4.17.0-202407300147.p0.g4a518fe.assembly.stream.el9.
All builds following this will include this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants