-
Notifications
You must be signed in to change notification settings - Fork 471
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add proposal: communication ingress flows matrix #1588
Add proposal: communication ingress flows matrix #1588
Conversation
643355b
to
231662e
Compare
/cc @cybertron @trozet @wking @danwinship @msherif1234 |
|
||
The communication matrix can be generated on single-node deployments. | ||
|
||
MicroShift is out of scope for this proposal. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MicroShift has the EndpointSlices API. Is there any reason the tool wouldn't work with MicroShift?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yuvalk what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like customers will eventually definitely want this functionality with MicroShift
N/A | ||
|
||
## Version Skew Strategy | ||
N/A |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will the new command work with older clusters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should
@yuvalk what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it'll work on any version with endpointslice (which are stable since 1.21)
problem is with all the exceptions (ie listening ports that are not covered by an endpointslices)
listening ports in a running cluster, `oc adm communication-matrix generate`. | ||
|
||
- A new option will be added to OpenShift web console to generate an up-to-date | ||
communication matrix. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How will that be different from what the network observability features provide?
cc @stleerh
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with this... this seems to be the right approach to use. (Unless it is using that to gather the data and then this is just to make it easier to use in the UI?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we are targeting a declarative approach whereas observability matrix is looking into what is actually running.
IMHO - what is not declared is "wrong" and should be blocked.
observability feature might be used to detect those and correct them but still need also a tool to extract the declared matrix
|
||
- The admin uses the OpenShift command-line interface (CLI) to generate an up-to-date | ||
communication matrix using the following command `oc adm communication-matrix generate`. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
even from CLI pov netobserv adding cli capability to dump flows (ingress and egress)
https://github.com/netobserv/network-observability-cli/blob/main/img/flow-table.png
so its not clear me either what additional info will communication-matrix
option will bring ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I know the network observability features require installing an operator while we would like to introduce a lighter tool to focus on generating a communication matrix that can be easily applied in firewall rules, as well as delivering it as part of the product documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the cli mode we don't use operator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I see that now, thank! checking
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sabinaaledort I think we should collaborate and see if this can be used to extend the current netobserv solution or not best to discuss this over slack in #forum-ocp-network-observability
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
even from CLI pov netobserv adding cli capability to dump flows (ingress and egress)
so its not clear me either what additional info willcommunication-matrix
option will bring ?
AIUI, the netobserv tool shows current traffic, while the communication-matrix tool shows expected traffic.
You can't necessarily reliably generate a firewall based on the netobserv output, since you can't guarantee that all of the required-to-be-allowed flows will actually get used during any particular time period.
Also, the netobserv tool can't tell you what ports will need to be open in the next release. (Well, OK, neither can the communcation-matrix tool, but you can at least look at the "generic"/documented communication-matrix to find that.)
``` | ||
direction Data flow direction (currently ingress only) | ||
protocol IP protocol (TCP/UDP/etc) | ||
port Flow port number |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the service port right not the endpoint , also do u plan to suport SCTP` protocol ?
also for ingress firewall setting the rules might also require knowing the srcIP
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's the endpoint port
@liornoy can/do we support SCTP protocol? also you have a test to apply firewall rules right? does it require the srcIP?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so actually 3 questions here:
- service port vs endpoint port - it's the endpoint / the one seen on the host network side
- SCTP - theoretically can be included too, especially in the proposal. impl will focus on TCP/UDP in the first iteration. can add later
- srcIP - there might be cases where you want to limit a certain service/flow to be accessible only from specific ip ranges. but that's mostly up to the customer. so short answer is for now it's binary yes/no
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Supporting SCTP would only matter if we think OCP will ever use SCTP for internal communication (as opposed to merely supporting customer workloads that use SCTP). This seems extremely unlikely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the tool is to be used by customers to generate their matrix, they would like it to include their SCTP ports too.
for the one we are generating for OCP, you are right.
231662e
to
c86aa0f
Compare
``` | ||
direction Data flow direction (currently ingress only) | ||
protocol IP protocol (TCP/UDP/SCTP/etc) | ||
port Flow port number |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since this ingress
only why do you have direction
in the config ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might add egress in the future, for now we think it is still a valuable field for a user analyzing the matrix
"protocol": "TCP", | ||
"port": 51035, | ||
"service": "rpc.statd", | ||
"nodeRole": "master", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what will be the nodeRole for the SNO
case ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this has significant overlap with Network Observability (as several people have noted). But also with ACS. I would like to make sure we are not duplicating work here.
We met with the Network Observability team and discussed the possibility to integrate our module into the Network Observability CLI. This can be the next step. Currently our module is stored in openshift-kni https://github.com/openshift-kni/commatrix and we plan to release the first documented communication matrix in 4.16 openshift/openshift-docs#69720 |
correct we have a meeting to demonstrate the netobserv cli project and provided info about the repo and how to use it as possible future intg step |
I dont think there's any overlap at the moment as you can see in the proposal, our plan is to use declarative information from EndpointSlices to create the matrix, which is very different from NetObserve. only for our CI, to make sure we are not missing anything, we are using evidence based, currently with |
- A communication matrix describing the expected flows of incoming traffic will | ||
be included in every OpenShift release documentation. | ||
|
||
- A new `oc` command will be added to generate a current snapshot of known |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this really be part of oc? And don't people want more than just a list of listening ports? Shouldn't we be saying why the port is open, and what it is used for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oc so customers can generate a matrix from a running cluster, including their workloads.
it is a great idea, to extend EndpointSlices API so that they include a description too (with guidelines to document why port is open and what it is used for)
but meanwhile, we can cover that in the official doc
Inactive enhancement proposals go stale after 28d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting If this proposal is safe to close now please do so with /lifecycle stale |
/remove-lifecycle stale |
Inactive enhancement proposals go stale after 28d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting If this proposal is safe to close now please do so with /lifecycle stale |
/remove-lifecycle stale |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I missed this earlier...
|
||
## Motivation | ||
|
||
Security-conscious customers need OpenShift flows matrix for regulatory reasons |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're providing this for customers to use "for regulatory reasons", then we need more information about exactly what that entails, so we can be sure this meets the regulations.
|
||
Security-conscious customers need OpenShift flows matrix for regulatory reasons | ||
and/or to implement firewall rules to restrict traffic to the minimum set of | ||
required flows only, on-node firewall or external. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the overall goal is to enable customers to implement restrictive firewalls, then this is probably only one part of the full solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not exactly sure what you call "this" and what, in your mind is the full solution.
my vision is that we have a node level firewall that is enabled by default and adaptively configure itself based on the declerations that are applied. obviously that will require more machinery then what is described in this proposal.
we are incrementally building toward that future
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant that if we are going to support customers who are doing aggressive firewalling, then we need to put a bunch more work into making sure that we don't break them. eg, this comment
|
||
### Non-Goals | ||
|
||
- Egress traffic. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this a non-goal? Do customers not care about firewalling egress traffic? Is this a future goal?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's non-goal in this proposal, indeed a future goal and we are actively working on a research how egress traffic flows can be added to communication matrix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe better add it to the open questions, topics.
but we are actively working and thinking now about egress too (it's much newer work than this proposal, that been lingering for a long while)
|
||
- The admin uses the OpenShift command-line interface (CLI) to generate an up-to-date | ||
communication matrix using the following command `oc adm communication-matrix generate`. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
even from CLI pov netobserv adding cli capability to dump flows (ingress and egress)
so its not clear me either what additional info willcommunication-matrix
option will bring ?
AIUI, the netobserv tool shows current traffic, while the communication-matrix tool shows expected traffic.
You can't necessarily reliably generate a firewall based on the netobserv output, since you can't guarantee that all of the required-to-be-allowed flows will actually get used during any particular time period.
Also, the netobserv tool can't tell you what ports will need to be open in the next release. (Well, OK, neither can the communcation-matrix tool, but you can at least look at the "generic"/documented communication-matrix to find that.)
|
||
The communication matrix can be generated on single-node deployments. | ||
|
||
MicroShift is out of scope for this proposal. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like customers will eventually definitely want this functionality with MicroShift
|
||
Another test will be added to validate the ports in a generated communication | ||
matrix match a snapshot of the node's listening ports (created with the Linux | ||
`ss` utility). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we also need a periodic e2e job that brings up a cluster, firewalls everything that isn't required by the communications matrix, and then does an upgrade; if the upgrade fails, that warns us that the matrix is failing to detect something.
|
||
A user will be able to run `openshift-tests` or a new `oc` command, | ||
`oc adm communication-matrix validate`, to validate the `EndpointSlices` | ||
in the cluster match a current snapshot of the node's listening ports. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's also important that a user be able to validate "the cluster won't break if I upgrade". IOW, they need to be able to prove that their current firewall rules are acceptable according to the matrix of the OCP release they want to upgrade to.
`EndpointSlices` stand better for ingress traffic than for egress, in ingress traffic the | ||
services are less dynamic and mostly stay up during the entire cluster lifetime. | ||
Supporting egress traffic might require changes in the API that should also be reviewed | ||
and agreed with the upstream Kubernetes community. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really understand what you're saying here.
Are you just talking about the fact that, in general, we do not have Services/EndpointSlices pointing to the external IPs that we connect to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I will rephrase
|
||
3. The node hosted services such as sshd, rpc, etc. are currently missing | ||
`EndpointSlices`. We believe it should be created by the Machine API as | ||
part of adding/removing nodes. This can be addressed in later enhancement. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to try out the "firewall everything that isn't listed in the matrix and see what breaks" test, and that should help to answer these questions...
Note that the installer has a bunch of hardcoded rules for setting up networking on cloud platforms (eg, aws) which ideally would be autogenerated from some official list of requirements...
- Use the communication matrix to create and apply firewall rules, and | ||
run E2E `openshift-tests` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can't expect openshift-tests
to work on a firewalled cluster; many of the networking test cases in particular create HostNetwork pods and expect to be able to connect to them.
We could potentially run a subset of tests that are "firewall-safe". But I think doing an OCP version upgrade is probably a better overall test, since an OCP upgrade exercises a substantial amount of core functionality, and because upgrades are something we explicitly want to guarantee will work even with a firewall, so it's a good test case anyway.
c86aa0f
to
1e422d7
Compare
1e422d7
to
befa39f
Compare
- Use the communication matrix to create and apply firewall rules, and | ||
run E2E `openshift-tests` ([implemented](https://github.com/openshift-kni/commatrix/blob/main/test/e2e/commatrix_suite_test.go#L100)) | ||
- Use the communication matrix to create and apply firewall rules, and upgrade | ||
OpenShift version |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These aren't things that would be "added to openshift-tests
"; they're both new jobs rather than additions to openshift-tests
. (So they should be a new top-level bullet point, not under the openshift-tests
bullet point.)
### Drawbacks | ||
N/A | ||
|
||
## Open Questions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add: "Right now the installer has its own hard-coded list of ports to open on different clouds. Can we eventually auto-generate this as well?"
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: danwinship The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
befa39f
to
2e9ecfa
Compare
Communication ingress flows matrix of OpenShift and Operators: This enhancement allows automatically generate an accurate and up-to-date communication flows matrix that can be delivered to customers as part of product documentation for all ingress flows of OpenShift (multi-node and single node deployments) and Operators.
2e9ecfa
to
2edf532
Compare
@sabinaaledort: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
/lgtm |
Communication ingress flows matrix of OpenShift and Operators: This enhancement allows automatically generate an accurate and up-to-date communication flows matrix that can be delivered to customers as part of product documentation for all ingress flows of OpenShift (multi-node and single node deployments) and Operators.