Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

operator-sdk generate openapi breaks with operator-sdk>=0.12 #434

Closed
10 of 19 tasks
yrobla opened this issue Feb 19, 2020 · 19 comments
Closed
10 of 19 tasks

operator-sdk generate openapi breaks with operator-sdk>=0.12 #434

yrobla opened this issue Feb 19, 2020 · 19 comments
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.

Comments

@yrobla
Copy link
Member

yrobla commented Feb 19, 2020

Original Post

When running make test on the baremetal-operator, it tries to call operator-sdk generate openapi command. When having an operator-sdk version >=0.12, following warning/errors are thrown:

INFO[0000] Running CRD generator.                       
/home/yolanda/go/src/github.com/metal3-io/baremetal-operator/pkg/apis/metal3/v1alpha1/baremetalhost_types.go:284:2: must apply minimum to an integer
/home/yolanda/go/src/github.com/metal3-io/baremetal-operator/pkg/apis/metal3/v1alpha1/baremetalhost_types.go:284:2: must apply maximum to an integer
/home/yolanda/go/src/github.com/metal3-io/baremetal-operator/pkg/apis/metal3/v1alpha1/baremetalhost_types.go:313:2: must apply minimum to an integer
/home/yolanda/go/src/github.com/metal3-io/baremetal-operator/pkg/apis/metal3/v1alpha1/baremetalhost_types.go:313:2: must apply maximum to an integer
/home/yolanda/go/src/github.com/metal3-io/baremetal-operator/pkg/apis/metal3/v1alpha1/baremetalhost_types.go:211:17: unsupported type "float64"
Error: error generating CRDs from APIs in pkg/apis: error generating CRD manifests: error running CRD generator: not all generators ran successfully

Seems that support for float64 is not there, and needs to be updated.

Details

The metal3 CRD for BareMetalHost has a status field containing the CPU speed in MHz stored as float64. Floating point values are not supported in the kubernetes tools for generating API code, including CRD specifications.

It’s not entirely clear how this came to be allowed. It may be related to using an older version of the operator-sdk that did not reject the float field, or perhaps we modified the CRD file by hand. It may be related to incomplete validation in CI. Likely some of each.

This situation poses several issues:

  1. Until the problem is resolved, we cannot introduce better validation to prevent other similar issues because the toolchain errors out on the unsupported type.
  2. Changing the type of the field will not be backwards compatible, so we should provide an upgrade path and conversion support such as web hooks. Managing those using the operator-sdk will be hindered by the underlying issue, that the tools break with the current code base.
  3. Several of us have customers with BareMetalHost resources in clusters that will require upgrading.

These are somewhat mitigated by the fact that the field is in the status struct, so it is unlikely that any users are relying on the values.

Possible Approaches

Change Type in Place, Retain v1alpha1

The API version for BareMetalHost is v1alpha1. Upstream, we could reasonably say that the fix is to change the type of the field and that no backwards compatibility or upgrade support is provided. Fixing the type in place allows the operator-sdk tools to work properly again.

Any existing users would be broken by the type change, because existing data could no longer be deserialized from storage properly.

Downstream, because several of us have shipped the API to customers, we cannot allow the break. We might as well do the conversion work upstream to make maintenance simpler for everyone.

Remove Field, Retain v1alpha1

We could remove the problematic field entirely.

Removing the old field with the bad type would allow the operator-sdk tools to work properly again.

The content of the field in existing CRs would be lost. It’s a status field, so the impact of the loss would be minimal. It would however break downstream tools like GUIs, which is not ideal.

Rename and Change Type, Retain v1alpha1

We could add a new field to replace the old field, using a different name and correct type and continue to use the v1alpha1 version.

The API won’t call conversion web hooks for an object accessed with the same version as its storage version.

As in the case of simply removing the field, the content of the field in existing CRs would be lost. It’s a status field, so the impact of the loss would be minimal but would break downstream GUIs.
Any nodes deployed with the new CRD definition would report the data correctly.

Change Type, Switch to New API Version

Under most circumstances, the most natural thing to do when changing the API would be to increment the version number and provide converter web hooks. Updating to v1alpha2 or v1beta1 would require maintaining the current v1alpha1 types to support the conversion tools. In this case, the API needs to be changed because the tools for managing the code break with the current code base. That makes retaining the old API version with the current type harder.

We could hack around the problem in operator-sdk by changing the type in our code temporarily, so we could generate the new CRD definition, then changing the type back and editing the generated files to make them conform with reality. That would not solve the problem long-term, and would not allow us to enable automated tests for the CRD generation to prevent this issue from recurring.

Dropping the v1alpha1 version from the code base entirely would finally allow us to resume using the CRD validation tools.

Change the “Rules”

Clearly at some point the tools we were using allowed float fields. We could, temporarily, fork operator-sdk’s version of kubebuilder to bypass the error we’re seeing now. This would allow us to update to a v1alpha2 API while maintaining v1alpha1 so we could have a version of the baremetal-operator that can do the conversion. Then, in short order, we could drop v1alpha1 and go back to using the upstream tools.

We need to fork both operator-sdk and controller-tools because the error is actually generated in sigs.k8s.io/controller-tools/pkg/crd/schema.go and we would need a version of operator-sdk that calls our modified code. We can use vendored modules to avoid having forks of 2 git repositories. The changes in https://github.com/dhellmann/operator-sdk/tree/allow-floats eliminate the error and convert float64 types to float in the generated CRD.

To Do

The Change the "Rules" approach seems like the most viable way forward.

@honza
Copy link
Member

honza commented Feb 19, 2020

cc #367

@stbenjam stbenjam added the priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. label Feb 26, 2020
@stbenjam
Copy link
Member

Let's try this again after #367 is merged

@stbenjam
Copy link
Member

#444 is merged, could you try this again?

@kirankt
Copy link
Contributor

kirankt commented Mar 19, 2020

AFAIK, this problem still exists.

@honza
Copy link
Member

honza commented Mar 19, 2020

I'll look into it.

@kirankt
Copy link
Contributor

kirankt commented Mar 19, 2020

So, I did a little bit of digging around and testing on my own and found out that even with operator-sdk version 0.16.0, it still is broken. BUT, if we use kubebuilder's controller-gen [1] command, it will fix all the issues we're seeing and it'll render the CRDs correctly. Here is an example that MAO uses [2] to generate CRDs.

[1] https://github.com/kubernetes-sigs/controller-tools/tree/master/cmd/controller-gen
[2] https://github.com/openshift/machine-api-operator/blob/master/hack/gen-crd.sh

@stbenjam stbenjam added kind/bug Categorizes issue or PR as related to a bug. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. labels Apr 15, 2020
@dhellmann
Copy link
Member

#496 fixes part of the problem.

The other problem I see is that kubebuilder does not like floating point types, and we're using a float64 for the clock speed. Changing that type is going to be an API break. @kirankt are you saying that controller-gen allows floats?

@kirankt
Copy link
Contributor

kirankt commented Apr 29, 2020

#496 fixes part of the problem.

The other problem I see is that kubebuilder does not like floating point types, and we're using a float64 for the clock speed. Changing that type is going to be an API break. @kirankt are you saying that controller-gen allows floats?

Yes, @dhellmann . That's correct. controller-gen works with float64, I have tested it.

@dhellmann
Copy link
Member

#496 fixes part of the problem.
The other problem I see is that kubebuilder does not like floating point types, and we're using a float64 for the clock speed. Changing that type is going to be an API break. @kirankt are you saying that controller-gen allows floats?

Yes, @dhellmann . That's correct. controller-gen works with float64, I have tested it.

Is the copy of controller-gen vendor into the openshift/machine-api-operator repository older than whatever controller tools are in kubebuilder?

@dhellmann
Copy link
Member

When I install controller-gen from source I get the same error that operator-sdk produces.

$ go get sigs.k8s.io/controller-tools/cmd/controller-gen
go: finding sigs.k8s.io/controller-tools v0.3.0
go: downloading sigs.k8s.io/controller-tools v0.3.0
go: extracting sigs.k8s.io/controller-tools v0.3.0
go: downloading sigs.k8s.io/controller-tools v0.2.8
go: extracting sigs.k8s.io/controller-tools v0.2.8
go: downloading github.com/fatih/color v1.7.0
go: downloading github.com/gobuffalo/flect v0.2.0
go: downloading gopkg.in/yaml.v3 v3.0.0-20190905181640-827449938966
go: extracting github.com/fatih/color v1.7.0
go: downloading github.com/mattn/go-colorable v0.1.2
go: downloading github.com/mattn/go-isatty v0.0.12
go: extracting github.com/gobuffalo/flect v0.2.0
go: extracting gopkg.in/yaml.v3 v3.0.0-20190905181640-827449938966
go: extracting github.com/mattn/go-colorable v0.1.2
go: extracting github.com/mattn/go-isatty v0.0.12
go: downloading k8s.io/api v0.18.2
go: downloading k8s.io/apimachinery v0.18.2
go: downloading k8s.io/apiextensions-apiserver v0.18.2
go: extracting k8s.io/apiextensions-apiserver v0.18.2
go: downloading k8s.io/utils v0.0.0-20200324210504-a9aa75ae1b89
go: extracting k8s.io/apimachinery v0.18.2
go: extracting k8s.io/utils v0.0.0-20200324210504-a9aa75ae1b89
go: downloading sigs.k8s.io/structured-merge-diff v1.0.2
go: extracting k8s.io/api v0.18.2
go: extracting sigs.k8s.io/structured-merge-diff v1.0.2
go: downloading sigs.k8s.io/structured-merge-diff/v3 v3.0.0
go: extracting sigs.k8s.io/structured-merge-diff/v3 v3.0.0
go: finding github.com/spf13/cobra v0.0.6
go: finding github.com/fatih/color v1.7.0
go: finding k8s.io/apimachinery v0.18.2
go: finding k8s.io/api v0.18.2
go: finding k8s.io/apiextensions-apiserver v0.18.2
go: finding gopkg.in/yaml.v3 v3.0.0-20190905181640-827449938966
go: finding github.com/gobuffalo/flect v0.2.0
go: finding github.com/mattn/go-isatty v0.0.12
go: finding github.com/mattn/go-colorable v0.1.2
go: finding sigs.k8s.io/structured-merge-diff/v3 v3.0.0
go: finding k8s.io/utils v0.0.0-20200324210504-a9aa75ae1b89

then

$ controller-gen crd crd:crdVersions=v1 paths=$(pwd)/pkg/apis/... output:crd:dir=$(pwd)/deploy/crds/
/home/dhellmann/go/src/github.com/metal3-io/baremetal-operator/pkg/apis/metal3/v1alpha1/baremetalhost_types.go:291:2: must apply minimum to an integer
/home/dhellmann/go/src/github.com/metal3-io/baremetal-operator/pkg/apis/metal3/v1alpha1/baremetalhost_types.go:291:2: must apply maximum to an integer
/home/dhellmann/go/src/github.com/metal3-io/baremetal-operator/pkg/apis/metal3/v1alpha1/baremetalhost_types.go:320:2: must apply minimum to an integer
/home/dhellmann/go/src/github.com/metal3-io/baremetal-operator/pkg/apis/metal3/v1alpha1/baremetalhost_types.go:320:2: must apply maximum to an integer
/home/dhellmann/go/src/github.com/metal3-io/baremetal-operator/pkg/apis/metal3/v1alpha1/baremetalhost_types.go:218:17: unsupported type "float64"
/home/dhellmann/go/src/github.com/metal3-io/baremetal-operator/pkg/apis/metal3/v1alpha1/baremetalhost_types.go:291:2: must apply minimum to an integer
/home/dhellmann/go/src/github.com/metal3-io/baremetal-operator/pkg/apis/metal3/v1alpha1/baremetalhost_types.go:291:2: must apply maximum to an integer
/home/dhellmann/go/src/github.com/metal3-io/baremetal-operator/pkg/apis/metal3/v1alpha1/baremetalhost_types.go:320:2: must apply maximum to an integer
/home/dhellmann/go/src/github.com/metal3-io/baremetal-operator/pkg/apis/metal3/v1alpha1/baremetalhost_types.go:320:2: must apply minimum to an integer
/home/dhellmann/go/src/github.com/metal3-io/baremetal-operator/pkg/apis/metal3/v1alpha1/baremetalhost_types.go:218:17: unsupported type "float64"
Error: not all generators ran successfully

@kirankt
Copy link
Contributor

kirankt commented Apr 30, 2020

@dhellmann Sorry to send you on a wild goose chase. I was wrong. I had made local changes to use json.Number (a string type) and it made it work. From what I gather, floats are not supported in the spec, only string, bool and ints.

https://github.com/kubernetes-sigs/controller-tools/blob/master/pkg/crd/schema.go#L409

@kirankt
Copy link
Contributor

kirankt commented Apr 30, 2020

Found this recommendation in lieu of float64: kubernetes-sigs/controller-tools#245 (comment)

@dhellmann
Copy link
Member

The use of a float is going to be complicated to fix. I wonder how we made that work at all in the first place.

kubebuilder doesn't support floats for some technical reasons having to do with IEEE representations not being very round-tripable. We potentially have a bunch of data in the wild already using floats. Changing the field type to a string (or json.Number) means we can't interpret the old existing data can't be unmarshalled into a string field.

dhellmann added a commit to dhellmann/baremetal-operator that referenced this issue May 1, 2020
Create a symlink from deploy/role.yaml to deploy/rbac/role.yaml so
that the operator-sdk will find the role file it needs for the `add
api` command.

We cannot make the link in the opposite direction because kustomize
refuses to follow symlinks outside of the directory where its
configuration is.

Fixes metal3-io#500
Related to metal3-io#434

Signed-off-by: Doug Hellmann <dhellmann@redhat.com>
@dhellmann dhellmann self-assigned this May 1, 2020
zouy414 added a commit to zouy414/baremetal-operator that referenced this issue May 20, 2020
Signed-off-by: ZouYu <zouy.fnst@cn.fujitsu.com>
@zhouhao3
Copy link
Member

I noticed that this problem has been solved temporarily, and a new version of API will be launched next, and there is also a discussion about API design in the Google group[1]. So will these two situations be combined to launch a new version of the API?
[1] https://groups.google.com/forum/#!topic/metal3-dev/QfMJpyG0Zss

@zaneb
Copy link
Member

zaneb commented May 21, 2020

I noticed that this problem has been solved temporarily, and a new version of API will be launched next, and there is also a discussion about API design in the Google group[1]. So will these two situations be combined to launch a new version of the API?

No, it's likely that there'll be a minor revision to correct this issue and maybe do some other small cleanups in the short term. Any major changes like those discussed in that thread will come later.

@zhouhao3
Copy link
Member

Any major changes like those discussed in that thread will come later.

I see that the rest of the work in this issue is to update the API. Does the community plan to wait until Future API evolution discusses the results before releasing the new API version? Then complete the remaining tasks in this issue.

@metal3-io-bot
Copy link
Contributor

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues will close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@metal3-io-bot metal3-io-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 20, 2020
@metal3-io-bot
Copy link
Contributor

Stale issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle stale.

/close

@metal3-io-bot
Copy link
Contributor

@metal3-io-bot: Closing this issue.

In response to this:

Stale issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle stale.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
Projects
None yet
Development

No branches or pull requests

8 participants