KEP-1027: Union types

Release Signoff Checklist

Items marked with (R) are required prior to targeting to a milestone / release.

(R) Enhancement issue in release milestone, which links to KEP dir in kubernetes/enhancements (not the initial KEP PR)
(R) KEP approvers have approved the KEP status as implementable
(R) Design details are appropriately documented
(R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
- e2e Tests for all Beta API Operations (endpoints)
- (R) Ensure GA e2e tests for meet requirements for Conformance Tests
- (R) Minimum Two Week Window for GA e2e tests to prove flake free
(R) Graduation criteria is in place
- (R) all GA Endpoints must be hit by Conformance Tests
(R) Production readiness review completed
(R) Production readiness review approved
"Implementation History" section is up-to-date for milestone
User-facing documentation has been created in kubernetes/website, for publication to kubernetes.io
Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes

Summary

Modern data model definitions like OpenAPI v3 and protobuf (versions 2 and 3) have a keyword to implement “oneof” or "union". They allow APIs to have a better semantics, typically as a way to say “only one of the given fields can be set”. We currently have multiple occurrences of this semantics in kubernetes core types, at least:

VolumeSource is a structure that holds the definition of all the possible volume types, only one of them must be set, it doesn't have a discriminator.
DeploymentStrategy is a structure that has a discrminator "DeploymentStrategyType" which decides if "RollingUpate" should be set

The problem with the lack of solution is that:

The API is implicit, and people don't know how to use it
Clients can't know how to deal with that, especially if they can't parse the OpenAPI
Server can't understand the user intent and normalize the object properly

Motivation

Currently, changing a value in an oneof type is difficult because the semantics is implicit, which means that nothing can be built to automatically fix unions, leading to many bugs and issues:

kubernetes/kubernetes#35345
kubernetes/kubernetes#24238
kubernetes/kubernetes#34292
kubernetes/kubernetes#6979
kubernetes/kubernetes#33766
kubernetes/kubernetes#24198
kubernetes/kubernetes#60340

And then, for other people:

rancher/rancher#13584
helm/charts#12319
EnMasseProject/enmasse#1974
helm/charts#11546
kubernetes/kubernetes#35343

This is replacing a lot of previous work and long-standing effort:

Initially: kubernetes/community#229, then
kubernetes/community#278
kubernetes/community#620
kubernetes/kubernetes#44597
kubernetes/kubernetes#50296
kubernetes/kubernetes#70436

Server-side apply is what enables this proposal to become possible.

Goals

Provide a simple mechanism for API authors to label fields of a resource as members of a oneOf, in order to receive standardized validation and normalization, rather than having to author it themeselves per resource as currently done as a workaround in various validation functions (e.g. pkg/apis/<group>/validation/validation.go).

Validation - ensuring only one member field is set (or at most one if desired).
Normalization - ensuring the API server can understand the intent of clients that are unable to update/modify fields the clients are unaware of due to version skew.

Non-Goals

Migrating all existing unions away from their bespoke validation logic (e.g validation functions), is an explicit non-goal and will be pursued in a separate KEP or later release.

Proposal

User Stories (Optional)

Story 1

As a CRD owner, I can use simple semantics (such as openapi tags/go markers), to express the desired validation of a oneOf (at most one or exactly one field may be set), and the API server will perform this validation automatically.

Story 2

As a client, I can read, modify, and update the union fields of an object, even if I am not aware of all of the possible fields, and the server will properly interpret my intent.

Notes/Constraints/Caveats (Optional)

Risks and Mitigations

We need to ensure we do not break existing union types. This can be done by not forcing existing unions to conform to the newly proposed union semantics. Integration testing with older types should give us the confidence to be sure we have done so.
There is a lot of risk for errors when there exists skew between clients and server. In the section on normalization, we discuss mitigating these risks.

Design Details

Discriminator Field

We propose that all new unions maintain a "discriminator". This is a field that points to which of the other "union member" fields is to be respected as the truly desired union field in the case that there are any conflicts.

In order to demonstrate the need for the discriminator, we developed an extensive test matrix that looks at various configurations of the performing REST operations on a union where the client or server is unaware of a newly added field to the union (due to version skew).

We present a guide doc on how to interpret the test matrix, but the major conclusions are as follows (along with the test case number from the test matrix):

(Case #22 and #27) If an unstructured client (i.e. a client that represents data as raw json maps with no knowledge of the schema) is unaware of field on the union, but wants to clear the union entirely (assuming the union is optional), it will have no way of doing so without a discriminator. With a discriminator, the client can express its intention by setting the discriminator to the empty value and the server can respect its intentions and clear any fields the client is unaware of.
(Case #12 and #16) If a structured client is unaware of a field in the union that is set and it just wants to echo back the union it received in a get request (such as when updating other parts of the object), a client without a discriminator will silently drop the currently set field, while a client with the discriminator will not change the discriminator value, indicating to the server that no changes are desired in the union.
(Case #34 and #39) If a client sets a union field that the server is not aware of, the server will silently drop it and attempt to clear the object of the union field. With a discriminator, the server will see the unrecognized discriminator value and can fail loudly.
(Case #23 and #28) When a client goes to set a field it knows of, but a separate field it doesn't know about is currently set, the server can simply know to always respect the discriminator. Without a discriminator, the server will have to do convoluted logic to detect that the previously set field has not been modified and that only one of the other union fields has been.

Go Markers

We're proposing a new type of tags for go types (in-tree types, and also kubebuilder types):

// +unionDiscriminator before a field means that this field is the discriminator for the union. This field MUST be an enum defined as a string (see section on discriminator values). This field MUST be required if there is no default option, omitempty if the default option is the empty string, or optional and omitempty if a default value is specified with the // +default marker.
// +unionMember[=<memberName>][,optional] before a field means that this field is a member of a union. The is the name of the field that will be set as the discriminator value. It MUST correspond to one of the valid enum values of the discriminator's enum type. It defaults to the go (i.eCamelCase) representation of the field name if not specified. should only be set if authors want to customize how the fields are represented in the discriminator field.should match the serialized JSON name of the field case-insensitively. The comma separated optional value determines whether or not the member field must be set when the discriminator selects it. Meaning, whenoptionalis present on a field, the discriminator can select the field even if the field is not set. If optional is not present in theunionMembertag, then the object will fail validation if the discriminator selects the field but it is nil. A field can be marked as optional without specifying memberName via// +unionMember,optional`.

Discriminator Values

Here we present a description of how discriminators and their valid values should be defined.

As described above, the discriminator field must be a string and required. Because, there are only a few specific values that the discriminator can be, we propose that all discriminators should be defined as an enum, and should be tagged so via the enum go marker // +enum.

If no option is a valid option for a union, such an option must be defined as a member of the discriminator values enum. By convention, this "no member" discriminator should be the empty string, but there is nothing stopping API authors from defining their own "no option" discriminator value.

Empty Union Members

In some cases there are more discriminator values than there are member fields defined in the struct when that specific member requires no configuration. An example is the DeploymentStrategy where it has one member field rollingUpdate, but two valid discriminator values RollingUpdate and Recreate. By using an enum as the discriminator value we are able to define values beyond the member fields in order to accommodate this pattern.

Examples

Below is an example of how to define a union based on the above design

// +enum
type UnionType string

const (
  FieldA UnionType = "FieldA"
  FieldB UnionType = "FieldB"
  FieldC UnionType = "FieldC"
  FieldD UnionType = "FieldD"
  FieldNone UnionType = ""
)

type Union struct {
  // +unionDiscriminator
  // +required
  UnionType UnionType

  // +unionMember
  // +optional
  FieldA int
  // +unionMember
  // +optional
  FieldB int
}

Note unions can't span across multiple go structures (all the fields that are part of a union has to be together in the same structure), examples of what is allowed:

// This will have one embedded union.
type TopLevelUnion struct {
  Name string `json:"name"`

  Union `json:",inline"`
}

// +enum
type UnionType string

const (
  FieldA UnionType = "FieldA"
  FieldB UnionType = "FieldB"
  FieldC UnionType = "FieldC"
  FieldD UnionType = "FieldD"
  FieldNone UnionType = ""
)

// This will generate one union, with two fields and a discriminator.
type Union struct {
  // +unionDiscriminator
  // +required
  UnionType UnionType `json:"unionType"`

  // +unionMember
  // +optional
  FieldA int `json:"fieldA"`
  // +unionMember,optional
  // +optional
  FieldB int `json:"fieldB"`
}

// +enum
type Union2Type string

const (
  Alpha Union2Type = "ALPHA"
  Beta = "BETA"
)

// This generates a union where the unionMember markers demonstrate how to
customize the names used for each field in the discriminator.
type Union2 struct {
  // +unionDiscriminator
  // +required
  Type2 Union2Type `json:"type"`
  // +unionMember=ALPHA
  // +optional
  Alpha int `json:"alpha"`
  // +unionMember=BETA,optional
  // +optional
  Beta int `json:"beta"`
}

OpenAPI

OpenAPI v3 already allows a "oneOf" form, which is accepted by CRD validation (and will continue to be accepted in the future). That oneOf form will be used for validation, but is "on-top" of this proposal.

A new extension is created in the openapi to describe the behavior: x-kubernetes-unions.

This is a list of unions that are part of this structure/object. Each item in the list represents a discriminator for the union, and for each member field, the discriminator value of that field and whether or not that field is optional.

Conversion between OpenAPI v2 and OpenAPI v3 will preserve these fields.

The following is an example of what the generated OpenAPI definition will look like for a given go type.

const (
  FieldA Union1Type = "FieldA"
  FieldB Union1Type = "FieldB"
  FieldC Union1Type = "FieldC"
  FieldD Union1Type = "FieldD"
  FieldNone Union1Type = ""
)

// This will generate one union, with two fields and a discriminator.
type Union struct {
  // +unionDiscriminator
  // +required
  Union1 Union1Type `json:"union1"`

  // +unionMember
  // +unionDiscriminatedBy=Union1
  // +optional
  FieldA int `json:"fieldA"`
  // +unionMember,optional
  // +unionDiscriminatedBy=Union1
  // +optional
  FieldB int `json:"fieldB"`

}

The OpenAPI x-kubernetes-unions extension will then be attached to the discriminator's property and deserialized into go structs as follows:

// XKubernetesUnions is the top level extension
type XKubernetesUnions struct {
  // FieldMembers are the mapping of all valid discriminator values in a
  // union to the corresponding member field.
  // Discriminator value is the value to which the discriminator is set to
  // in order to indicate that a given member field is the currently set member
  // of the union.
  // MemberField may be nil in the case of empty union members where a valid
  // discriminator value has no corresponding member field.
  FieldMembers map[DiscriminatorValue]*MemberField `json:"fieldMembers"`
}

// DiscriminatorValue is the value that the discriminator is set to 
// in order to indicate the selection of a union member.
type DiscriminatorValue string

// MemberField
type MemberField struct {
  // Name is the name of the field corresponding to the member.
  // It will be the json representation of the field marked with the
  // `// +unionMember` marker in the go type.
  Name string `json:"name"`
  // Optional determines whether the discriminator _may_ select this member
  // even when the member field is empty. Optional defaults to false.
  Optional bool `json:"optional"`
}

Normalization and Validation

Normalization

Normalization refers to the process by which the API server attempts to understand and correct clients which may provide the server with conflicting or incomplete information about a union in update or patch requests.

Issues primarily arise here because of version skew between a client and a server, such as when a client is unaware of new fields added to a union and thus doesn't know how to clear these new fields when trying to set a different field.

For unions that follow this design, normalization is simple: the server should always respect the discriminator.

This means that when the server receives an update request with a discriminator set to a given field, and multiple member fields are set it should clear all fields except the one pointed to by the discriminator if and only if the discriminator has been modified. Having multiple fields set, and a discriminator not modified is invalid and caught later by the validation step (see below).

For both custom resources and built-in types, we expect union normalization to be called by the request handlers shortly after mutating admission occurs.

Validation

Objects must be validated AFTER the normalization process.

Some validation situations specific to unions are:

When multiple union fields are set and the discriminator has not been modified we should error loudly that the client must change the discriminator if it changes any union member fields.
When the server receives a request with a discriminator set to a given field, but that given field is empty and not marked as optional, the server should fail with a clear error message. Note this does not apply to discriminator values that do not correspond to any field (as in the "empty union members case").

For both custom resources and built-in types, validation will occur as part of the request validation, before validating admission occurs.

For custom resources, union validation will be done at the same point as the existing structural schema validation that occurs in the custom resource handler. This ensures that any generic validation changes made to all custom resources (such as the ratcheting validation discussed below), behaves appropriately with union validation.

Ratcheting Validation

When updating CRDs to support union validation, it is possible that existing CRs become invalid.

The naive solution is to require existing CRs to be updated to a valid state before they can be updated again.

This creates many potential landmines, and so ratcheting validation is proposed as an alternative. Ratcheting validation means that objects will ignore stricter validation rules if and only if the existing object also fails the stricter validation for the same reason.

Ratcheting validation for custom resources is a separate effort proposed outside of this unions effort. For the initial alpha graduation of unions, we do not propose supporting ratcheting validation. We will require all invalid CRs to be made valid before they can be updated (the naive solution).

In order to potentially support ratcheting validation in the future, we will ensure that all callers of union validation retain access to both old and new objects, so that future ratcheting validation can be implemented within the union validation library.

Migrating existing unions

As mentioned, one of the goals is to migrate at least one existing union to using the new marker based union validation and normalization. While open questions remain around the priority and urgency of migrating existing unions, nonetheless we should be able to come to a consensus on which types to migrate first.

For discriminated unions, a couple relatively straightforward discriminated types are MetricSpec and MetricStatus. These have clearly defined discriminator values that map one-to-one to a member field, which make them good candidates for initial migration.

For non-discriminated unions, there are a few relatively straightforward types that make good candidates for initial migration, such as ContainerState

Until migrated, union types without a discriminator (i.e. only existing unions that have not been migrated to the current desgin), cannot be tagged with the go markers described above and thus will not be treated as "unions" in the sense of this currently proposed normalization and validation logic.

These legacy unions must continue to perform normalization and validation manually, per resource in the validation functions.

Test Plan

[x] I/we understand the owners of the involved components may require updates to existing tests to make this code solid enough prior to committing the changes necessary to implement this enhancement.

Prerequisite testing updates

Unit tests

Core functionality will be extensively unit tested in the SMD typed package (union_test.go).

Parts of the kubernetes endpoints handlers package that are modified to call into the SMD code will also be unit tested as appropriate.

Integration tests

We will have extensive integration testing of the union code in the test/integration/apiserver package.

We will be testing along the dimensions of:

Which fields of the union get modified (none, existing fields, newly updated fields)
Type of union (discriminated vs non-discriminated)
Whether the client is aware of all the fields
Whether the server is aware of all fields
Whether the union is optional or required

A fully documented test matrix exists in a google spreadsheet along with a guide doc on how to read and understand the test matrix.

As part of implementing the test matrix we will be able to prove the viability of upgrading existing unions by writing tests to mimic using the standardized union semantics on existing unions (even if actually upgrading these unions is outside the scope of alpha graduation)

e2e tests

We are considering adding kubectl e2e tests to mimic kubectl users performing various operations on objects with union fields.

Graduation Criteria

Alpha Graduation

CRDs can be created with union fields and are properly validated when created/updated.
Prove the viability of upgrading existing unions to the new semantics by mimicking existing unions in e2e tests.
Existing unions that don't have discriminators do not break when upgraded.

Upgrade / Downgrade Strategy

Turning the flag on for alpha just enables different runtime codepaths (i.e. performing the unified union validation and normalization)

Any schema markers (added by CRD authors or propagated from tags on built-in types) will appear in the schema, but not do anything if the flag is off.

Version Skew Strategy

See test matrix and commentary about discriminators. It clearly documents how the server will use the discriminator to understand the client's intention even if the client is not aware of all union fields because of version skew.

Skew with alpha flag on/off shouldn't make much of a difference.

Objects created with the union semantics, but applied to a cluster with the alpha flag off will simply not perform union validation and normalization.
Objects created without union semantics will simply not trigger union validation and normalization (regardless of whether the server has the alpha enabled or disabled).

Production Readiness Review Questionnaire

Feature Enablement and Rollback

How can this feature be enabled / disabled in a live cluster?

Feature gate (also fill in values in kep.yaml)
- Feature gate name: APIUnions
- Components depending on the feature gate: kube-apiserver

Request handlers in the api server will call into union validation and normalization function from the structured-merge-diff repo when feature is enabled.

Does enabling the feature change any default behavior?

Enabling the feature could cause existing CRs to fail validation if the correspond CRD has union fields and the existing CRs have invalid unions that were unvalidated when initially created in a cluster that had the unions feature disabled.

These CRs will need to be corrected in order to pass validation (or the feature disabled).

Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?

Yes, requests will simply skip union validation and normalization.

What happens if we reenable the feature if it was previously rolled back?

Custom resources that were skipping union validation when when the feature was rolled back may have allowed invalid data to persist.

For alpha, we require that all modifying requests (update/patch) fail unless the data passes union validation. Retrieving newly invalid CRs should still always succeed.

In the future, we may require looser "ratcheting validation" which would allow modifications to ignore union validation if the existing object fails the union validation for the same reason as the new object (see section on "Ratcheting Validation" above). This is not a priority for alpha.

Are there any tests for feature enablement/disablement?

We will have integration tests demonstrating how CRs with persisted invalid data will need to be corrected when the feature is re-enabled (and requires more strict union validation).

Rollout, Upgrade and Rollback Planning

How can a rollout or rollback fail? Can it impact already running workloads?

N/A

What specific metrics should inform a rollback?

apiserver_request_total could be watched to see if the number of create and update requests that are failing increase substantially.

Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

N/A

Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?

N/A

Monitoring Requirements

How can an operator determine if the feature is in use by workloads?

For builtins in alpha, it won't be possible to break clients since turning on vs off will validate the same thing via different code paths.

For CRDs, you can see if they have the new union markers. If the CRD has no other validation mechanism, turning off the flag may result in CRs accepting invalid input.

How can someone using this feature know that it is working for their instance?

Create a new CRD with a union field (and no other validation mechanism)
Apply the CRD
Create a CR with an invalid union (multiple fields set, no discriminator set), see if the CR is rejected via union validation

When we write the e2e test, a standard union CRD and test CR will be obtainable for users to test on their instance.

What are the reasonable SLOs (Service Level Objectives) for the enhancement?

N/A

What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?

N/A

Are there any missing metrics that would be useful to have to improve observability of this feature?

N/A

Dependencies

Does this feature depend on any specific services running in the cluster?

N/A

Scalability

N/A

Will enabling / using this feature result in any new API calls?

No

Will enabling / using this feature result in introducing new API types?

No

Will enabling / using this feature result in any new calls to the cloud provider?

No

Will enabling / using this feature result in increasing size or count of the existing API objects?

No

Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?

In GA (maybe beta), we might expect resource reduction/reliability improvement, since this removes a need for webhooks.

Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?

New validation and normalization logic should be negligible given that the functions will be in the same SMD path currently used by SSA code.

We will have benchmarking to validate this assumption.

Troubleshooting

How does this feature react if the API server and/or etcd is unavailable?

N/A objects are not reachable

What are other known failure modes?

What steps should be taken if SLOs are not being met to determine the problem?

Implementation History

Unions implemented, but disabled in SMD.

Drawbacks

An issue that one might have with requiring a discriminator is that it might seem redundant to have to set a field and set another field indicating to the server to use the set field. The reasons for doing so are discussed above in the normalization section.

One other drawback is that our approach does not standardize all existing unions into a single format. We don't see a way to do so without drastically changing existing APIs and breaking backwards compatibility

Alternatives

Non-Discrminated

The primary alternative discussed is to not have a discriminator for new union types. As discussed in the normalization section, requiring a discriminator allows the server to better understand the intentions of clients that do not have knowledge of all the fields in a union if newer versions of the server add new fields to the union.

"None" Discriminator Values

A number of strategies were discussed around how to represent the "none" value of the discriminator (see "Discriminator Values" section above).

One alternative was to mandate the "none" value always be the empty string. The advantage to this is its simplicity and not creating a situation where different API authors define there "none" value differently, so that anyone could immediately know that a discriminator set to "" (empty string), is not selecting any of the member fields. Also, it would allow us to not have to define the set of enum values for each discriminator (as we could just use the name of the member field). The disadvantage is that by not defining the set of enum values, we make it impossible to support the "empty union members" case.
Another alternative was to make the discriminator a pointer to a string and its value nil. The disadvantage here is that this requires more complicated union validation logic (first do a nil check, then check the value) and makes it harder to determine client intent on patches where the discriminator is not set.
A third alternative is to require all unions be defined in their own separate struct. This was rejected because there are many existing unions that define random fields that are not members in the union within the same struct as fields that do make up the union and we hope to be able to migrate at least some of the existing unions to the new semantics.

Files

README.md

Latest commit

History