Skip to content
This repository has been archived by the owner on Jun 26, 2023. It is now read-only.

HNC: Review API for sub-project graduation #748

Closed
rjbez17 opened this issue May 20, 2020 · 18 comments
Closed

HNC: Review API for sub-project graduation #748

rjbez17 opened this issue May 20, 2020 · 18 comments
Assignees
Labels
api-review Categorizes an issue or PR as actively needing an API review. sig/auth Categorizes an issue or PR as relevant to SIG Auth.
Milestone

Comments

@rjbez17
Copy link

rjbez17 commented May 20, 2020

Complete an API review with approvals from at least one sig-auth technical lead as part of the graduation criteria to become a sub-project.

https://github.com/kubernetes/enhancements/blob/master/keps/sig-auth/1687-hierarchical-namespaces-subproject/README.md#graduation-criteria

@rjbez17 rjbez17 added sig/auth Categorizes an issue or PR as relevant to SIG Auth. api-review Categorizes an issue or PR as actively needing an API review. labels May 20, 2020
@mikedanese
Copy link
Contributor

Really quick pass to get started. @liggitt should do a review as well.

@adrianludwin
Copy link
Contributor

Thanks Mike! cc @yiqigao217 who will probably be making these changes.

Some questions/comments inline.

  • What are CritAncestor, NumPropagatedObjects, etc..?

    Do not use abbreviations in the API, except where they are extremely commonly used, such as "id", "args", or "stdin".

Didn't know about the abbreviation rule - will fix.

We wanted to distinguish between two broad classes of conditions - critical ones (which halt HNC's operation in some way) and noncritical ones. Embedding this into the name seemed to make sense - we could also consider a separate field, but that's not feasible if we use the standard condition scheme. What would you recommend? Simply documenting which codes are critical and which are not?

  • Use a standard conditions schema

Thanks, I didn't know about this. When a problem is resolved, is the practice to keep the condition around for a while in the "false" status and then delete it? Or should all possible conditions be displayed at all times?

  • AffectedObject looks almost exactly like v1.ObjectReference. Why is "Affected" part of the type name and not part of the field name?

We'll switch to ObjectReference, thanks.

  • Fields like "Types []TypeSynchronizationSpec" are not strategically mergable so they won't work with apply. Consider adding a key field.

The combination of APIVersion and Kind are the key - is there any way to mark them as such? Or should we combine them into a single field (eg "v1/Secret" or "foo.example.com/v1alpha1/BarType")?

  • Propagate SynchronizationMode = "propagate": String enums in APIs should be capitalized. Also, bare Propoagtee is going to clutter the go package namespace. Consider using SynchronizationMode SyncModePropagate = "Propoage". e.g. checkout:

Wilco

  • Only runtime objects should generally have APIVersion and Kind. Why do you need them on specs and statuses

Whoops, those are unused and will be fixed. Thanks!

@adrianludwin adrianludwin added this to the hnc-v0.5 milestone Jun 22, 2020
@liggitt
Copy link

liggitt commented Jun 22, 2020

We wanted to distinguish between two broad classes of conditions - critical ones (which halt HNC's operation in some way) and noncritical ones. Embedding this into the name seemed to make sense - we could also consider a separate field, but that's not feasible if we use the standard condition scheme. What would you recommend? Simply documenting which codes are critical and which are not?

Documenting known condition types is what we've done elsewhere, rather than introduce a subschema using stringly-typed meanings ("generically handle unknown conditions based on a type string prefix")

When a problem is resolved, is the practice to keep the condition around for a while in the "false" status and then delete it? Or should all possible conditions be displayed at all times?

It sort of depends on whether clients are expected to wait for definitive status of these conditions before taking action. For example, node conditions add outofdisk/outofmemory conditions, and the scheduler waits for those to be True or False (rather than missing/unknown) before making scheduling decisions.

We'll switch to ObjectReference, thanks.

Be aware that ObjectReference has lots of fields, most of which you probably don't want set and don't want to handle. Don't allow persisting values in fields which are unvalidated and ignored... at best, that is confusing, and at worst that is "time bomb" data that springs to life in the future if you start paying attention to some of those fields.

Fields like "Types []TypeSynchronizationSpec" are not strategically mergable so they won't work with apply. Consider adding a key field.

The combination of APIVersion and Kind are the key - is there any way to mark them as such? Or should we combine them into a single field (eg "v1/Secret" or "foo.example.com/v1alpha1/BarType")?

I don't think strategic merge patch can handle multi-field keys (cc @apelisse to verify), but strategic-merge-patch is also not supported on custom resources, so that doesn't really matter. Server-side apply (which is supported on custom resources) can handle multi-field keys by specifying multiple listMapKey directives. Example from service spec:

// +listType=map
// +listMapKey=containerPort
// +listMapKey=protocol

@adrianludwin
Copy link
Contributor

/cc @yiqigao217

@adrianludwin
Copy link
Contributor

Documenting known condition types is what we've done elsewhere, rather than introduce a subschema using stringly-typed meanings ("generically handle unknown conditions based on a type string prefix")

Thanks. The goal was less to have an automated client handle unknown conditions, and more so that humans would easily be able to understand them. But then again, the existing conditions are not very self-explanatory so simply documenting them (or putting a warning into the human-readable fields) seems sufficient.

When a problem is resolved, is the practice to keep the condition around for a while in the "false" status and then delete it? Or should all possible conditions be displayed at all times?

It sort of depends on whether clients are expected to wait for definitive status of these conditions before taking action. For example, node conditions add outofdisk/outofmemory conditions, and the scheduler waits for those to be True or False (rather than missing/unknown) before making scheduling decisions.

Ah, I see. In our case, this isn't relevant - if things are going well, no HNC object should have any conditions at all. So we can probably just specify that any condition that's present will have a status of true.

We'll switch to ObjectReference, thanks.

Be aware that ObjectReference has lots of fields, most of which you probably don't want set and don't want to handle...

I had a look at the API conventions doc (should have read that earlier) and it sounds like using a subset of ObjectReference is also acceptable? If so, we'll do that.

I don't think strategic merge patch can handle multi-field keys (cc @apelisse to verify), but strategic-merge-patch is also not supported on custom resources, so that doesn't really matter. Server-side apply (which is supported on custom resources) can handle multi-field keys by specifying multiple listMapKey directives...

Thanks, we'll look into that.

@apelisse
Copy link

Fields like "Types []TypeSynchronizationSpec" are not strategically mergable so they won't work with apply. Consider adding a key field.

The combination of APIVersion and Kind are the key - is there any way to mark them as such? Or should we combine them into a single field (eg "v1/Secret" or "foo.example.com/v1alpha1/BarType")?

I don't think strategic merge patch can handle multi-field keys (cc @apelisse to verify), but strategic-merge-patch is also not supported on custom resources, so that doesn't really matter. Server-side apply (which is supported on custom resources) can handle multi-field keys by specifying multiple listMapKey directives. Example from service spec:

// +listType=map
// +listMapKey=containerPort
// +listMapKey=protocol

Correct, strategic-merge-patch doesn't handle with multi-key fields, but server-side apply does.

Some other aspects are important:

  • How big are items of that list, do they have a lot of nested fields?
  • Are we sure that the GroupVersionKind are uniques?
  • Are they key fields required (and non-defaulted)?
  • Do you expect that list to be edited by different people/controller or is it generally fine to just replace the entire list?

@adrianludwin
Copy link
Contributor

Update: we had an API review last Friday with myself, @yiqigao217 , @rjbez17 , @mikedanese and @liggitt . Yiqi is currently writing up a design doc summarizing the decisions and changes we're going to make for v1alpha2.

@adrianludwin adrianludwin modified the milestones: hnc-v0.5, hnc-v0.6 Jul 1, 2020
@adrianludwin
Copy link
Contributor

Thanks @apelisse .

Correct, strategic-merge-patch doesn't handle with multi-key fields, but server-side apply does.

Some other aspects are important:

  • How big are items of that list, do they have a lot of nested fields?
  • Are we sure that the GroupVersionKind are uniques?
  • Are they key fields required (and non-defaulted)?
  • Do you expect that list to be edited by different people/controller or is it generally fine to just replace the entire list?

We want to support 1.15 at least for some time, which implies we can't rely on server-side apply, correct?

To answer your questions:

  • How big: very small, just a GVK (or something similar like group-resource) and a single enum-style string
  • Unique? Yes, each GVK can only have a single setting. In fact, this is all we're really trying to enforce.
  • Required/non-defaulted: Yes, they're all user-specified (in v1alpha2).
  • Different people/controllers: Never written by controllers (in v1alpha2), and generally only be an extremely small number of people. So I think replacing the entire list would generally be fine.

Really, the only thing we're trying to do here is enforce that we don't have duplication. Today, we do this in a webhook but since webhooks can be bypassed, we need a way to detect and report if there's a duplication. If we could express this uniqueness in a way that can't be bypassed (e.g. in a schema), that would be ideal.

@yiqigao217
Copy link
Contributor

Ah right, just confirming that "x-kubernetes-list-type" and "x-kubernetes-list-map-keys" are unknown fields in 1.15.

@liggitt
Copy link

liggitt commented Jul 6, 2020

Ah right, just confirming that "x-kubernetes-list-type" and "x-kubernetes-list-map-keys" are unknown fields in 1.15.

Those were added in 1.16 (kubernetes/kubernetes#77354) and made to enforce item uniqueness in 1.18 (kubernetes/kubernetes#84920)

@liggitt
Copy link

liggitt commented Jul 6, 2020

Note that using CRD defaulting requires use of apiextensions.k8s.io/v1 which is only available in 1.16

@yiqigao217
Copy link
Contributor

yiqigao217 commented Jul 8, 2020

Thanks for the suggestions and the api review on 6/26! This is the proposal for the api changes in v1alpha2 including the transition plan for you to review based on previous discussions. PTAL @liggitt @mikedanese @rjbez17

@fejta-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 6, 2020
@yiqigao217
Copy link
Contributor

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 7, 2020
@adrianludwin adrianludwin modified the milestones: hnc-v0.6, HNC v0.7 Oct 16, 2020
@adrianludwin
Copy link
Contributor

@liggitt, @mikedanese - the v1alpha2 API is now implemented as agreed. Can we close this issue as being completed?

@liggitt
Copy link

liggitt commented Nov 4, 2020

sgtm

@adrianludwin
Copy link
Contributor

adrianludwin commented Nov 4, 2020 via email

@k8s-ci-robot
Copy link
Contributor

@adrianludwin: Closing this issue.

In response to this:

Thanks!
/close

On Wed, Nov 4, 2020 at 11:35 AM Jordan Liggitt notifications@github.com
wrote:

sgtm


You are receiving this because you commented.
Reply to this email directly, view it on GitHub
#748 (comment),
or unsubscribe
https://github.com/notifications/unsubscribe-auth/AE43PZAEQF5VOQBTDFJOHULSOF7FJANCNFSM4NGJDEIQ
.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-review Categorizes an issue or PR as actively needing an API review. sig/auth Categorizes an issue or PR as relevant to SIG Auth.
Projects
None yet
Development

No branches or pull requests

8 participants