KEP-2732 NetworkPolicy Versioning #2806
Signed-off-by: Thomas F Herbert <therbert@redhat.com>
Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA. It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.
Hi @tfherbert. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label.
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: tfherbert The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Welcome @tfherbert! |
The LF Contributor agreement has been signed as required. |
/ok-to-test |
Signed-off-by: Thomas F Herbert <therbert@redhat.com>
/retest |
@tfherbert try ./hack/update-toc.sh (If you didn't :) ) |
/retest |
Just how special/unusual is NetworkPolicy in this regard? Have any | ||
other SIGs already dealt with similar problems? Or if not are there | ||
any other APIs that are similarly failing to deal with the same | ||
problems? |
Other broadly similar challenges (in my opinion):
- `Ingress`, and what features the controller supports. Partly solved by `IngressClass`.
- `PodSecurityPolicy` (control mechanisms depend both on container runtime and on node operating system features; Windows and Linux have obviously different security mechanisms)

`PodSecurityPolicy` is already deprecated.
@sftim The language was cribbed from the earlier KEP proposal, KEP-2136. Will clean up.
I think this problem exists for any aspect of the system where the API and the implementation are not (roughly) lock-step. Ingress yes, but also Gateway. I am sure there are and will be more. We should be looking for a general pattern.
## Proposal | ||
|
||
### User Stories | ||
|
I would add a first “story 0”:
as a developer, I want to find out if NetworkPolicy applies at all in my cluster. It's plausible that the API is present but the cluster network pays it no heed at all.
@sftim That is a good question. The objective of this KEP should be to unify the community around network policy in a future version, since the features and scope of network policy differ across the various CNIs. The implied issue is whether the API would offer a "not asserted" option.
I would add a first “story 0”:
as a developer, I want to find out if NetworkPolicy applies at all in my cluster.
That's sort of a special case of Story 3, for when the plugin doesn't implement any NetworkPolicy features.
If a user is waiting to see the status of a newly-created NetworkPolicy, | ||
there is no entirely-reliable way to distinguish "the plugin has not yet | ||
set `status` but will soon" from "the plugin doesn't know about `status` | ||
and is never going to set it". | ||
|
||
It's not clear how big a problem this is, especially if we suggest that | ||
implementations should create an "empty" `status` right away if it's | ||
going to take them a while to determine the final `status`. |
Kubernetes has a deprecated ComponentStatus API.
If we brought back something similar but as an official CRD, it could allow CNI plugins to report their status including what policy is supported.
fully in effect. | ||
|
||
### Risks and Mitigations | ||
|
Imagine that I perform a rolling replacement of all nodes, intending to apply what I think is a routine OS upgrade. After that upgrade, NetworkPolicy is no longer enforced. What happens to the `.status` of existing NetworkPolicies?
My thoughts on addressing this: maybe a heartbeat mechanism using Lease, that activates when the first Lease appears in a relevant namespace.
@sftim That is a good question. We intentionally left out enforcing, so status without enforcing does seem strange. Maybe status should be in the future KEP that will deal with enforcing. See here for discussion: network-policy-api-thread
- Define rules for dealing with feature gates and alpha APIs in | ||
NetworkPolicy that work well with the 3-way versioning split. | ||
|
||
- Allow network plugins to indicate when a NetworkPolicy has been fully |
Why not put those two status-related items into a different KEP? They do not look related to versioning.
<<[/UNRESOLVED]>> | ||
``` | ||
|
||
Somewhat related to this, there is currently no way for a |
Overlap with #2947 ?
Should these be split or merged?
NetworkPolicy that work well with the 3-way versioning split. | ||
|
||
- Allow network plugins to indicate when a NetworkPolicy has been fully | ||
"programmed" into the cluster network, to allow clients (including |
As before - I am not keen on this idea. Any sort of "all nodes agree" logic is super brittle. All it takes is one node going out to lunch, or worse one node to join the cluster and the consensus is broken.
yes, and this was part of the "Status" part of the original KEP, not the "Versioning" part anyway, so it doesn't really belong here
// conforms to the specified version. If it is not specified, the apiserver | ||
// will fill in the correct minVersion based on the features used by the policy. | ||
// +optional | ||
MinVersion NetworkPolicyVersion `json:"minVersion,omitempty" protobuf:"bytes,5,name=minVersion"` |
This implies that version is a linear feature-accumulator. I'm not convinced that is 100% accurate.
Suppose we start with version X. Then version Y adds "foo" support. Then version Z adds "bar" support.
If an implementation supports "bar" and not "foo" - is that allowed? At least you can assert that if it knows about "bar" it should be aware of "foo". But are we intending to mandate 100% implementation? How can we enforce that?
An alternative might be a list of feature flags. E.g. "this policy uses: [ FeatureFoo, FeatureBar ]". Then controllers can warn "I support Bar, but not Foo". Of course, we have to get all policy implementations to support the features-required field...
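The feature-flag alternative could look roughly like this. It is only a sketch: the `Feature` type, the `FeatureFoo`/`FeatureBar` constants (named after the placeholders in the comment), and `unsupportedFeatures` are hypothetical and do not exist in any Kubernetes API.

```go
package main

import (
	"fmt"
	"sort"
)

// Feature is a hypothetical name for an optional NetworkPolicy capability.
type Feature string

const (
	FeatureFoo Feature = "FeatureFoo"
	FeatureBar Feature = "FeatureBar"
)

// unsupportedFeatures returns the features a policy requires that the
// implementation does not support, sorted for stable output.
func unsupportedFeatures(required []Feature, supported map[Feature]bool) []Feature {
	var missing []Feature
	for _, f := range required {
		if !supported[f] {
			missing = append(missing, f)
		}
	}
	sort.Slice(missing, func(i, j int) bool { return missing[i] < missing[j] })
	return missing
}

func main() {
	// An implementation that supports Bar but not Foo.
	supported := map[Feature]bool{FeatureBar: true}
	// "this policy uses: [ FeatureFoo, FeatureBar ]"
	required := []Feature{FeatureFoo, FeatureBar}

	if missing := unsupportedFeatures(required, supported); len(missing) > 0 {
		fmt.Printf("warning: policy uses unsupported features: %v\n", missing)
	}
}
```

Unlike a single linear version, a set comparison makes no assumption about the order in which features were introduced, which is the point the comment raises.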
I'd also say this is NOT something a user may set themselves.
Suppose we start with version X. Then version Y adds "foo" support. Then version Z adds "bar" support.
If an implementation supports "bar" and not "foo" - is that allowed? At least you can assert that if it knows about "bar" it should be aware of "foo". But are we intending to mandate 100% implementation? How can we enforce that?
The intention in my original KEP is that a network policy implementation has to recognize every feature up to the latest feature that it implements. So an implementation can support "bar" and not "foo", but it would be required to recognize when a policy was using "foo", and fail in an appropriate way. This is discussed more later, in the `NetworkPolicyVersion` descriptions and the section "The Supported Condition".
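The "recognize everything, implement a subset" rule described above can be sketched as follows. The `Version` constants, the `Plugin` type, and `Evaluate` are hypothetical names for illustration only; the versions X, Y, Z and the features "foo" and "bar" are the placeholders from this thread.

```go
package main

import "fmt"

// Version is a hypothetical ordered NetworkPolicy version: X is the
// baseline, Y adds "foo", and Z adds "bar".
type Version int

const (
	VersionX Version = iota
	VersionY
	VersionZ
)

// Plugin is a hypothetical description of one implementation.
type Plugin struct {
	Recognizes Version         // newest version whose fields it can parse
	Implements map[string]bool // features it actually enforces
}

// Evaluate reports whether the plugin can honor a policy, given the policy's
// minVersion and the features it uses, plus a reason when it cannot.
func (p Plugin) Evaluate(minVersion Version, features []string) (bool, string) {
	if minVersion > p.Recognizes {
		return false, "policy requires a newer version than this plugin recognizes"
	}
	for _, f := range features {
		if !p.Implements[f] {
			return false, fmt.Sprintf("feature %q is recognized but not implemented", f)
		}
	}
	return true, ""
}

func main() {
	// Recognizes everything through Z, but only enforces "bar".
	p := Plugin{Recognizes: VersionZ, Implements: map[string]bool{"bar": true}}

	ok, reason := p.Evaluate(VersionY, []string{"foo"})
	fmt.Println(ok, reason)

	ok, reason = p.Evaluate(VersionZ, []string{"bar"})
	fmt.Println(ok, reason)
}
```

Here `minVersion` guards against policies the plugin cannot even parse, while the per-feature check covers the "supports bar, not foo" case by failing explicitly instead of silently ignoring "foo".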
When a user creates a NetworkPolicy, and is using a network plugin that | ||
implements this specification: | ||
|
||
- If the NetworkPolicy uses API fields which are not known to the |
In general this isn't true. kubectl does some validation but that can be bypassed and the apiserver drops unknown fields
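To illustrate the failure mode, here is a small analogy using Go's standard `encoding/json` package. It is not the apiserver's actual decoding path; `oldSpec` and `newField` are hypothetical. A lenient decoder silently discards a field the compiled type does not know about, so nothing downstream can tell that the field was ever set.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// oldSpec is a hypothetical type compiled before "newField" existed.
type oldSpec struct {
	PodSelector string `json:"podSelector"`
}

// decodeLenient silently drops fields that oldSpec does not define.
func decodeLenient(data []byte) (oldSpec, error) {
	var s oldSpec
	err := json.Unmarshal(data, &s)
	return s, err
}

// decodeStrict rejects input containing fields that oldSpec does not define.
func decodeStrict(data []byte) (oldSpec, error) {
	var s oldSpec
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields()
	err := dec.Decode(&s)
	return s, err
}

func main() {
	in := []byte(`{"podSelector":"app=web","newField":true}`)

	s, err := decodeLenient(in)
	fmt.Println("lenient:", s, err) // newField is gone without any error

	_, err = decodeStrict(in)
	fmt.Println("strict:", err) // the unknown field is reported
}
```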
yeah, this was the big problem that led to the original KEP being abandoned
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
- Defining "metric-like" NetworkPolicy status information (eg, how long | ||
a particular rule took to implement). |
again, this is Status, not Versioning
of which plugins (and which versions of which plugins) are expected to | ||
correctly implement which NetworkPolicy features. | ||
|
||
#### Story 2 - More Reliable NetworkPolicy Test Cases |
this is status, not versioning
#### Story 4 - Reporting NetworkPolicy Version | ||
|
||
As a network plugin developer, I want to verify that a CNI implements a specific | ||
NetworkPolicy version. |
I don't understand what this means. Related: "A CNI" is not a thing. There is only one CNI, and it is this. (However, assuming you meant "a network plugin", the story still does not make sense.)
implementations should create an "empty" `status` right away if it's | ||
going to take them a while to determine the final `status`. | ||
|
||
#### Distributed and Delegating NetworkPolicy Implementations |
Most of this is about status, not versioning. There's still the issue of "who actually updates the status to indicate what was supported", but the answer for the versioning-only case is pretty clearly "some centralized part of the network plugin, even if that component isn't actually involved in enforcing the network policy".
// +optional | ||
// +patchMergeKey=type | ||
// +patchStrategy=merge | ||
Conditions []NetworkPolicyStatusCondition `json:"conditions,omitempty" patchStrategy:"merge" patchMergeKey:"type" protobuf:"bytes,3,rep,name=conditions"` |
FTR I used Conditions in the combined Status+Versioning KEP, but it's not necessarily clear that Conditions are right for just Versioning.
#### Other Conditions | ||
|
||
``` | ||
<<[UNRESOLVED matching-conditions ]>> |
ignoring the question of whether this is a good idea, it feels more like Status than Versioning
Not sure why this is claiming to be "KEP-2732", as #2732 is something totally unrelated. |
#2136 was the original "NetworkPolicy versioning and status" enhancement so maybe this could take over that number, since the original PR for that was abandoned. But then, it's not clear if this is still in progress either... |
/lifecycle rotten
/close
@k8s-triage-robot: Closed this PR. In response to this:
Adds a new KEP for Network Policy Versioning
-- The old PR submitted in Nov 2020 was closed: KEP-2136
-- This KEP template isn't fully filled out yet. Additional feedback would be helpful.
/cc @jayunit100 @aojea @rikatz @thockin @astoycos @rpkatz @danwinship @abhiraut