Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: component-base config to beta #105448

Closed
wants to merge 2 commits into from

Conversation

pohly
Copy link
Contributor

@pohly pohly commented Oct 4, 2021

What type of PR is this?

/kind cleanup
/kind api-change

What this PR does / why we need it:

In practice, these structs already became beta when they were embedded in other
structs that themselves are presented as beta to users (kube-scheduler v1beta2,
kubelet v1beta1) because to them it is not visible that some parts of the
overall config is still alpha.

Special notes for your reviewer:

This could have been addressed by calling out entire subsets (like kubelet "logging") as experimental, but wasn't done:

While at it, docs get improved bit.

Does this PR introduce a user-facing change?

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/cloudprovider area/kubeadm sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. sig/network Categorizes an issue or PR as relevant to SIG Network. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. wg/structured-logging Categorizes an issue or PR as relevant to WG Structured Logging. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Oct 4, 2021
@k8s-triage-robot
Copy link

This PR may require API review.

If so, when the changes are ready, complete the pre-review checklist and request an API review.

Status of requested reviews is tracked in the API Review project.

@pohly
Copy link
Contributor Author

pohly commented Oct 4, 2021

I don't know how much https://github.com/kubernetes/community/blob/master/sig-architecture/api-review-process.md#mechanics applies here.

Here are some potentially relevant KEPs:

The second one is relevant because the goal is to add some more fields to LoggingConfiguration. It would be good to clarify the version of that struct first.

The new fields will be alpha. There's precedence for having alpha fields in beta APIs, so this should be okay as long as the new fields are explicitly called out as "alpha" or "experimental" (similar to the alpha log sanitization field).

@liggitt liggitt self-assigned this Oct 4, 2021
@liggitt liggitt added this to Assigned in API Reviews Oct 4, 2021
@pohly
Copy link
Contributor Author

pohly commented Oct 4, 2021

/retest

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 4, 2021
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 4, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: pohly
To complete the pull request process, please ask for approval from liggitt after the PR has been reviewed.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@pohly
Copy link
Contributor Author

pohly commented Oct 5, 2021

/retest

@fedebongio
Copy link
Contributor

/remove-sig api-machinery

@k8s-ci-robot k8s-ci-robot removed the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Oct 5, 2021
@pohly
Copy link
Contributor Author

pohly commented Oct 5, 2021

@fedebongio I saw you removing api-machinery earlier. Sorry for the noise.

I brought up these useless labels in ecb3813#r57456677 which then moved to #105414 (comment).

@@ -14,7 +14,7 @@ See the License for the specific language governing permissions and
limitations under the License.
*/

package v1alpha1
package v1beta1

import (
"time"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the TODO(#80289) below seems relevant to this promotion... we don't actually want to propagate a recommended default of using endpoints objects for leader election to new consumers

I would recommend dropping the default for the ResourceLock field to force callers to explicitly select the resource type to use (the same way they have to set ResourceNamespace/ResourceName today)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is code that I am not familiar with. Same with ResourceLock below. I can try to address these comments, but I am really starting to wonder whether combining all of these unrelated structs in a single config package is the right approach. I find it cleaner to split the package into individual pieces that are aligned with ownership.

Shall I try that as part of this PR or first split it while keeping everything as alpha?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend dropping the default for the ResourceLock field to force callers to explicitly select the resource type to use (the same way they have to set ResourceNamespace/ResourceName today)

@liggitt Why?
I think we should default to Lease - I don't see any advantage of choosing anything else. [And if someone really wants to change, they can always do that.]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The consequences of selecting the wrong one between two versions are really bad (totally breaks leader election). We should make the consumer select the resource type to use and document the recommended one if they're starting fresh.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK - if you're worrying about people upgrading the current usage to the new API without looking into it - that makes sense.
In that case leaving it empty and documenting that Lease is the desired lock type and mentioning the migration risk and migration path to it sounds like desired path.

@@ -14,7 +14,7 @@ See the License for the specific language governing permissions and
limitations under the License.
*/

package v1alpha1
package v1beta1

import (
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the ResourceLock field needs to document

  • the allowed values (leases, endpointsleases, endpoints, configmapsleases, and configmaps)
  • the recommended value if backwards compatibility with an older config is not required (leases)
  • the recommended value if backwards compatibility with an older config using endpoints or configmaps is required (endpointsleases or configmapsleases, respectively)

@@ -81,8 +81,7 @@ type ClientConnectionConfiguration struct {
Burst int32 `json:"burst"`
}

// LoggingConfiguration contains logging options
// Refer [Logs Options](https://github.com/kubernetes/component-base/blob/master/logs/options.go) for more information.
// LoggingConfiguration contains logging options.
type LoggingConfiguration struct {
// Format Flag specifies the structure of log messages.
// default value of format is `text`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what to do with the Sanitization field in the config while the feature is still experimental... it doesn't look like the feature is gated, and the config field doesn't indicate it is experimental like the command-line flag does, so it would be possible for someone to have this enabled in a config and not realize they are using an unstable feature (this is true today, I'm just wanting to use this promotion as a check of existing fields)

possibilities:

  1. drop the field from the beta type
  2. feature gate the functionality (failing if you try to set to set this to true without the feature enabled)
  3. rename this to an explicitly experimental field

what do you think?

Copy link
Contributor Author

@pohly pohly Oct 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it doesn't look like the feature is gated

Based on the comment in https://groups.google.com/g/kubernetes-sig-architecture/c/Nxsc7pfe5rw/m/Sw6FAj57BAAJ a specific feature gate for log sanitization wouldn't have been needed, but hiding it behind a common "logging feature gate" gate would have had value. Neither was done, but it's not too late to add one.

the config field doesn't indicate it is experimental like the command-line flag does

There is an [Experimental] tag in

// [Experimental] When enabled prevents logging of fields tagged as sensitive (passwords, keys, tokens).
?

The field name itself could have included "experimental" to make it even clearer.

possibilities:

  • drop the field from the beta type

This is the same as removing the feature because the KubeletConfiguration and thus command line only supports one version of the struct. If that version then doesn't have the field, it cannot be enabled.

We would have to keep v1alpha1 with the field and add v1beta1 without it, then in kubelet stick with v1alpha1. In practice, we want the alpha features in all commands, so nothing would use v1beta1 and we can delay the promotion until all fields are beta quality and v1alpha1 can be removed.

  • feature gate the functionality (failing if you try to set to set this to true without the feature enabled)

I prefer that. For CPU manager, CPUManagerPolicyOptions was added, but gates for alpha and beta were also discussed. In this case, we could add AlphaLoggingConfiguration as feature gate with alpha as permanent state and put several things behind it (for now):

  • sanitization
  • JSON output
  • the upcoming JSON options for buffering

Then when graduating some of them to beta, we add BetaLoggingConfiguration and check for that for those features.

When GA, they are always enabled.

However, when we do this, why bother with versioning the LoggingConfiguration API at all? What does the version indicate, and to whom? End-users are not seeing it and cannot use it to choose a specific API version. Marking it as alpha could have warned our fellow developers that using it in a beta struct might not be a good idea, but that hasn't really worked either...

So if we add such a feature gate, we should keep v1alpha1 and do the graduation only when all fields are beta. Hmm, but what if we then add a new alpha field? We would have to re-introduce v1alpha1. That's a lot of back-and-forth for little gain.

I'm leaning towards removing the entire versioning unless someone can explain to me what it is good for and why that justifies the extra work.

Perhaps if the LoggingConfiguration was used with its own TypeMeta then it would make sense, but that's not what these structs are meant for, right?

  • rename this to an explicitly experimental field

I prefer the feature gate approach because it causes less churn for users (compared to having to rename fields in a config file: AlphaSanitization -> BetaSanitization -> Sanitization).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just wanted to mention that i started advocating against encoding the maturity of a feature in the names of annotations, flags, fields, etc. It is due to the breaking changes on graduations.

Users who use an experimental feature should know about the maturity from documentation - flag descriptions, go docs. Just using a feature without reading its documentation is not advisable, if copied from examples online.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess v1alpha1.LoggingConfiguration and a separate v1beta1.LoggingConfiguration with less fields (just those which are beta) would have made some sense if the former had been used in v1alpha1.KubeletConfiguration and the latter in v1beta1.KubeletConfiguration. But there is no v1alpha1.KubeletConfiguration...

Copy link
Contributor Author

@pohly pohly Oct 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One problem with feature gates is when the configuration enables a feature, either by default or because of user selection, and the feature gate disables it. Should that trigger a warning?

As an example, suppose that log sanitization was beta and enabled by default. Now kubelet gets run with
---logging-sanitization=true --feature-gates BetaLoggingConfiguration=false

The default value is set before parsing feature gates, so LoggingConfiguration.Sanitization will be true. The code which validates it will run after parsing feature gates, but it cannot complain about its own default value, right? So validation cannot emit an error. Later the code which uses the field can ignore it, but should that then trigger a warning?

That the user explicitly enabled the feature is not tracked, so it has to be treated like the default case.

Copy link
Contributor Author

@pohly pohly Oct 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm leaning towards removing the entire versioning unless someone can explain to me what it is good for and why that justifies the extra work.

I prototyped that here to get a better feeling for what that would look like:

https://github.com/kubernetes/kubernetes/compare/master...pohly:log-configuration-feature-gates?expand=1

Introducing feature gates is not part of that yet, but that would be the goal before merging.

I find that new code structure a lot cleaner, but of course I am biased 😁

  • everything related to logging is under component-base/logs;
    previously this was scattered across different packages and
    different files under "logs" (why some code was in logs/config.go
    vs. logs/options.go vs. logs/logs.go never became clear to me):

    • long-term config and command line API are clearly separated
      into the "api" package underneath that

    • logs/logs.go itself only deals with legacy global flags and
      logging configuration

  • removal of separate Go APIs like logs.BindLoggingFlags and
    logs.Options

@liggitt liggitt moved this from Assigned to Changes requested in API Reviews Oct 6, 2021
@pohly pohly mentioned this pull request Oct 7, 2021
@logicalhan
Copy link
Member

/triage accepted
/assign

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Oct 7, 2021
@k8s-ci-robot
Copy link
Contributor

@pohly: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 8, 2021
@liggitt liggitt moved this from Changes requested to In progress in API Reviews Oct 11, 2021
@pohly pohly changed the title component-base config to beta RFC: component-base config to beta Oct 15, 2021
@liggitt liggitt added this to the v1.23 milestone Nov 11, 2021
@ehashman
Copy link
Member

/priority backlog

@k8s-ci-robot k8s-ci-robot added priority/backlog Higher priority than priority/awaiting-more-evidence. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Nov 15, 2021
@liggitt liggitt modified the milestones: v1.23, v1.24 Nov 17, 2021
@dims
Copy link
Member

dims commented Jan 10, 2022

Is this PR still needed, please rebase if so (or we can close it?)

@pohly
Copy link
Contributor Author

pohly commented Jan 10, 2022

It's on hold, waiting for a decision about #105797.

Some parts of it are still relevant even when that PR is merged.

@liggitt
Copy link
Member

liggitt commented Jan 20, 2022

closing in favor of #105797

/close

@k8s-ci-robot
Copy link
Contributor

@liggitt: Closed this PR.

In response to this:

closing in favor of #105797

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/cloudprovider area/kubeadm cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. priority/backlog Higher priority than priority/awaiting-more-evidence. release-note-none Denotes a PR that doesn't merit a release note. sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. sig/network Categorizes an issue or PR as relevant to SIG Network. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on. wg/structured-logging Categorizes an issue or PR as relevant to WG Structured Logging.
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

None yet

10 participants