Add proposal: communication ingress flows matrix #1588

sabinaaledort · 2024-03-07T09:06:15Z

Communication ingress flows matrix of OpenShift and Operators: This enhancement makes it possible to automatically generate an accurate, up-to-date communication flows matrix covering all ingress flows of OpenShift (multi-node and single-node deployments) and Operators, which can be delivered to customers as part of the product documentation.

sabinaaledort · 2024-03-07T11:52:04Z

/cc @cybertron @trozet @wking @danwinship @msherif1234
Can you please review?

dhellmann · 2024-03-07T16:02:22Z

enhancements/network/communication-flows-matrix-ingress.md

+
+The communication matrix can be generated on single-node deployments.
+
+MicroShift is out of scope for this proposal.


MicroShift has the EndpointSlices API. Is there any reason the tool wouldn't work with MicroShift?

@yuvalk what do you think?

It seems like customers will eventually definitely want this functionality with MicroShift


dhellmann · 2024-03-07T16:05:47Z

enhancements/network/communication-flows-matrix-ingress.md

+N/A
+
+## Version Skew Strategy
+N/A


Will the new command work with older clusters?

It should

@yuvalk what do you think?

It'll work on any version with EndpointSlices (which have been stable since Kubernetes 1.21).
The problem is with all the exceptions (i.e., listening ports that are not covered by an EndpointSlice).
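The split described here, EndpointSlice-declared flows plus a list of exceptions for listening ports that no EndpointSlice covers, can be sketched as a simple merge. This is a minimal illustration, not the commatrix implementation: the `Flow` tuple, the `merge_flows` helper and the sample rows are hypothetical (the `sshd` and `rpc.statd` names and port 51035 come from this thread; 6443 is the Kubernetes API server port).

```python
from typing import Iterable, List, NamedTuple


class Flow(NamedTuple):
    """One ingress flow (hypothetical, simplified matrix row)."""
    protocol: str
    port: int
    node_role: str
    service: str


def merge_flows(declared: Iterable[Flow], exceptions: Iterable[Flow]) -> List[Flow]:
    """Union EndpointSlice-derived flows with a static exception list.

    Rows are keyed on (protocol, port, node_role). Exceptions are loaded
    first, so a declared (EndpointSlice) row for the same key replaces them.
    """
    merged = {}
    for flow in exceptions:
        merged[(flow.protocol, flow.port, flow.node_role)] = flow
    for flow in declared:
        merged[(flow.protocol, flow.port, flow.node_role)] = flow
    # Sort so the generated matrix is stable and easy to diff.
    return sorted(merged.values())


# Hypothetical sample data for illustration only.
declared = [Flow("TCP", 6443, "master", "kube-apiserver")]
exceptions = [
    Flow("TCP", 22, "master", "sshd"),
    Flow("TCP", 51035, "master", "rpc.statd"),
]

for row in merge_flows(declared, exceptions):
    print(row.protocol, row.port, row.node_role, row.service)
```

Letting the declared entry win keeps EndpointSlices as the source of truth, in line with the declarative approach argued for in this thread; the exception list only fills the gaps.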

dhellmann · 2024-03-07T16:06:45Z

enhancements/network/communication-flows-matrix-ingress.md

+  listening ports in a running cluster, `oc adm communication-matrix generate`.
+
+- A new option will be added to OpenShift web console to generate an up-to-date
+  communication matrix.


How will that be different from what the network observability features provide?

cc @stleerh

Agree with this... this seems to be the right approach to use. (Unless it is using that to gather the data and then this is just to make it easier to use in the UI?)

We are targeting a declarative approach, whereas the observability feature looks at what is actually running.
IMHO, whatever is not declared is "wrong" and should be blocked.
The observability feature might be used to detect and correct those flows, but we still need a tool to extract the declared matrix.

msherif1234 · 2024-03-12T15:57:30Z

enhancements/network/communication-flows-matrix-ingress.md

+
+- The admin uses the OpenShift command-line interface (CLI) to generate an up-to-date 
+  communication matrix using the following command `oc adm communication-matrix generate`.
+


Even from the CLI point of view, netobserv is adding a CLI capability to dump flows (ingress and egress):
https://github.com/netobserv/network-observability-cli/blob/main/img/flow-table.png
so it's not clear to me either what additional info the communication-matrix option will bring.

As far as I know, the network observability features require installing an operator, while we would like to introduce a lighter tool focused on generating a communication matrix that can easily be applied as firewall rules, as well as delivered as part of the product documentation.

In CLI mode we don't use the operator.

OK, I see that now, thanks! Checking.

@sabinaaledort I think we should collaborate and see whether this can be used to extend the current netobserv solution; best to discuss this over Slack in #forum-ocp-network-observability.

> Even from the CLI point of view, netobserv is adding a CLI capability to dump flows (ingress and egress),
> so it's not clear to me either what additional info the communication-matrix option will bring.

AIUI, the netobserv tool shows current traffic, while the communication-matrix tool shows expected traffic.

You can't necessarily reliably generate a firewall based on the netobserv output, since you can't guarantee that all of the required-to-be-allowed flows will actually get used during any particular time period.

Also, the netobserv tool can't tell you what ports will need to be open in the next release. (Well, OK, neither can the communication-matrix tool, but you can at least look at the "generic"/documented communication matrix to find that.)

msherif1234 · 2024-03-13T11:01:19Z

enhancements/network/communication-flows-matrix-ingress.md

+```
+direction      Data flow direction (currently ingress only)
+protocol       IP protocol (TCP/UDP/etc)
+port           Flow port number


This is the service port, right, not the endpoint port? Also, do you plan to support the SCTP protocol?
Also, for ingress firewall settings, the rules might require knowing the srcIP.

That's the endpoint port
@liornoy can/do we support the SCTP protocol? Also, you have a test that applies firewall rules, right? Does it require the srcIP?

So there are actually 3 questions here:

1. Service port vs. endpoint port: it's the endpoint port, the one seen on the host network side.

2. SCTP: theoretically it can be included too, especially in the proposal. The implementation will focus on TCP/UDP in the first iteration; SCTP can be added later.

3. srcIP: there might be cases where you want to limit a certain service/flow to be accessible only from specific IP ranges, but that's mostly up to the customer. So the short answer is that, for now, it's a binary yes/no.

Supporting SCTP would only matter if we think OCP will ever use SCTP for internal communication (as opposed to merely supporting customer workloads that use SCTP). This seems extremely unlikely.

If the tool is to be used by customers to generate their matrix, they would like it to include their SCTP ports too.

For the one we are generating for OCP, you are right.

msherif1234 · 2024-04-15T11:12:03Z

enhancements/network/communication-flows-matrix-ingress.md

+```
+direction      Data flow direction (currently ingress only)
+protocol       IP protocol (TCP/UDP/SCTP/etc)
+port           Flow port number


Since this is ingress only, why do you have direction in the config?

We might add egress in the future, for now we think it is still a valuable field for a user analyzing the matrix
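The fields discussed here map naturally onto a flat record that can be rendered as CSV or JSON. The sketch below is an illustration under assumptions: `MatrixEntry`, `to_csv`, `to_json`, the allowed-protocol set and the `"Ingress"` direction value are hypothetical; only the field names (`direction`, `protocol`, `port`, `service`, `nodeRole`) and the `rpc.statd` example row come from the proposal excerpts.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass, fields

# Protocols mentioned in this thread; treating them as the full set is an assumption.
ALLOWED_PROTOCOLS = {"TCP", "UDP", "SCTP"}


@dataclass(frozen=True)
class MatrixEntry:
    """Hypothetical matrix row using the field names from the proposal excerpts."""
    direction: str  # kept even though only ingress is generated today
    protocol: str
    port: int
    service: str
    nodeRole: str

    def __post_init__(self):
        if self.protocol not in ALLOWED_PROTOCOLS:
            raise ValueError(f"unsupported protocol: {self.protocol}")
        if not 0 < self.port < 65536:
            raise ValueError(f"port out of range: {self.port}")


def to_csv(entries):
    """Render entries as CSV with a header row, one line per flow."""
    buf = io.StringIO()
    names = [f.name for f in fields(MatrixEntry)]
    writer = csv.DictWriter(buf, fieldnames=names, lineterminator="\n")
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))
    return buf.getvalue()


def to_json(entries):
    """Render entries as a JSON array, one object per flow."""
    return json.dumps([asdict(e) for e in entries], indent=2)


entry = MatrixEntry("Ingress", "TCP", 51035, "rpc.statd", "master")
print(to_csv([entry]), end="")
```

Keeping `direction` in every row, as suggested in the reply, means a future egress matrix could reuse the same schema and tooling unchanged.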

msherif1234 · 2024-04-15T11:14:05Z

enhancements/network/communication-flows-matrix-ingress.md

+    "protocol": "TCP",
+    "port": 51035,
+    "service": "rpc.statd",
+    "nodeRole": "master",


What will the nodeRole be for the SNO case?

master&worker for SNO, mentioned here: https://github.com/openshift/enhancements/pull/1588/files/c86aa0f638b5ac2d90847f88dc25d53a46c5de4f#diff-625ec186439e36165d929fa423f20534a358828cb6c40d9ae7b2eeac66d9669fR142
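The `master&worker` value for SNO can be derived from the standard `node-role.kubernetes.io/*` node labels. A minimal sketch, assuming roles are read from those label keys and joined with `&`; the `node_role` helper, the treatment of `control-plane` as an alias for `master`, and the `"unknown"` fallback are assumptions rather than the commatrix logic.

```python
from typing import Mapping

ROLE_LABEL_PREFIX = "node-role.kubernetes.io/"
# Assumption: "control-plane" is reported as "master" to match the matrix values.
ROLE_ALIASES = {"control-plane": "master"}
# Only these roles are reported; custom roles (e.g. infra) are ignored here.
ROLE_ORDER = ["master", "worker"]


def node_role(labels: Mapping[str, str]) -> str:
    """Return "master", "worker", or "master&worker" for a node's labels."""
    roles = set()
    for key in labels:
        if key.startswith(ROLE_LABEL_PREFIX):
            role = key[len(ROLE_LABEL_PREFIX):]
            roles.add(ROLE_ALIASES.get(role, role))
    known = [r for r in ROLE_ORDER if r in roles]
    return "&".join(known) if known else "unknown"


sno_labels = {
    "node-role.kubernetes.io/master": "",
    "node-role.kubernetes.io/control-plane": "",
    "node-role.kubernetes.io/worker": "",
}
print(node_role(sno_labels))  # master&worker
```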

knobunc

I think this has significant overlap with Network Observability (as several people have noted). But also with ACS. I would like to make sure we are not duplicating work here.

sabinaaledort · 2024-05-12T14:33:41Z

> I think this has significant overlap with Network Observability (as several people have noted). But also with ACS. I would like to make sure we are not duplicating work here.

We met with the Network Observability team and discussed the possibility to integrate our module into the Network Observability CLI. This can be the next step. Currently our module is stored in openshift-kni https://github.com/openshift-kni/commatrix and we plan to release the first documented communication matrix in 4.16 openshift/openshift-docs#69720

msherif1234 · 2024-05-28T11:35:13Z

>> I think this has significant overlap with Network Observability (as several people have noted). But also with ACS. I would like to make sure we are not duplicating work here.
>
> We met with the Network Observability team and discussed the possibility to integrate our module into the Network Observability CLI. This can be the next step. Currently our module is stored in openshift-kni https://github.com/openshift-kni/commatrix and we plan to release the first documented communication matrix in 4.16 openshift/openshift-docs#69720

Correct, we had a meeting to demonstrate the netobserv CLI project, and provided info about the repo and how to use it as a possible future integration step.

yuvalk · 2024-06-12T15:24:26Z

> I think this has significant overlap with Network Observability (as several people have noted). But also with ACS. I would like to make sure we are not duplicating work here.

I don't think there's any overlap at the moment,
and if there is in the future, we can converge.

As you can see in the proposal, our plan is to use declarative information from EndpointSlices to create the matrix, which is very different from NetObserv. Only in our CI, to make sure we are not missing anything, do we use an evidence-based approach, currently with `ss` commands; we can certainly move to NetObserv there in the future, especially if/when we start handling egress traffic too.
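To illustrate what "declarative information from EndpointSlices" means in practice, the sketch below extracts `(namespace, service, protocol, port)` rows from a parsed `discovery.k8s.io/v1` EndpointSliceList, such as the JSON that `oc get endpointslices -A -o json` prints. It is a hypothetical helper with made-up sample data; a real tool would also have to decide which slices represent node-level ingress (for example host-network endpoints or NodePort services) and map `endpoints[].nodeName` to a node role, and that filtering is deliberately omitted here.

```python
import json
from typing import List, Tuple

SERVICE_NAME_LABEL = "kubernetes.io/service-name"

Row = Tuple[str, str, str, int]  # (namespace, service, protocol, port)


def flows_from_endpointslices(doc: dict) -> List[Row]:
    """Extract (namespace, service, protocol, port) rows from an EndpointSliceList."""
    rows = set()
    for item in doc.get("items", []):
        meta = item.get("metadata", {})
        namespace = meta.get("namespace", "")
        service = meta.get("labels", {}).get(SERVICE_NAME_LABEL, "")
        for p in item.get("ports") or []:
            port = p.get("port")
            if port is None:
                # An unset port means "all ports"; it cannot become a single row.
                continue
            # EndpointPort.protocol defaults to TCP when omitted.
            rows.add((namespace, service, p.get("protocol", "TCP"), port))
    return sorted(rows)


# Hypothetical sample input; the second port omits its protocol on purpose.
sample = json.loads("""
{"items": [{
  "metadata": {"namespace": "example",
               "labels": {"kubernetes.io/service-name": "demo"}},
  "ports": [{"name": "https", "port": 8443, "protocol": "TCP"},
            {"name": "metrics", "port": 9443}]
}]}
""")
print(flows_from_endpointslices(sample))
```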

knobunc · 2024-06-24T15:03:05Z

enhancements/network/communication-flows-matrix-ingress.md

+- A communication matrix describing the expected flows of incoming traffic will 
+  be included in every OpenShift release documentation.
+
+- A new `oc` command will be added to generate a current snapshot of known 


Should this really be part of oc? And don't people want more than just a list of listening ports? Shouldn't we be saying why the port is open, and what it is used for?

`oc`, so customers can generate a matrix from a running cluster, including their workloads.

It is a great idea to extend the EndpointSlices API so that it includes a description too (with guidelines to document why a port is open and what it is used for).

But meanwhile, we can cover that in the official docs.

openshift-bot · 2024-07-23T01:15:23Z

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

yuvalk · 2024-07-23T01:28:55Z

/remove-lifecycle stale

openshift-bot · 2024-08-20T09:15:16Z


/lifecycle stale

sabinaaledort · 2024-08-20T11:11:11Z

/remove-lifecycle stale

danwinship

Sorry, I missed this earlier...

danwinship · 2024-08-20T13:59:43Z

enhancements/network/communication-flows-matrix-ingress.md

+
+## Motivation
+
+Security-conscious customers need OpenShift flows matrix for regulatory reasons 


If we're providing this for customers to use "for regulatory reasons", then we need more information about exactly what that entails, so we can be sure this meets the regulations.

danwinship · 2024-08-20T14:00:38Z

enhancements/network/communication-flows-matrix-ingress.md

+
+Security-conscious customers need OpenShift flows matrix for regulatory reasons 
+and/or to implement firewall rules to restrict traffic to the minimum set of
+required flows only, on-node firewall or external.


If the overall goal is to enable customers to implement restrictive firewalls, then this is probably only one part of the full solution.

Not exactly sure what you call "this" and what, in your mind, the full solution is.
My vision is that we have a node-level firewall that is enabled by default and adaptively configures itself based on the declarations that are applied. Obviously that will require more machinery than what is described in this proposal.
We are incrementally building toward that future.

I meant that if we are going to support customers who are doing aggressive firewalling, then we need to put a bunch more work into making sure that we don't break them. E.g., this comment

danwinship · 2024-08-20T14:06:07Z

enhancements/network/communication-flows-matrix-ingress.md

+
+### Non-Goals
+
+- Egress traffic.


Why is this a non-goal? Do customers not care about firewalling egress traffic? Is this a future goal?

It's a non-goal in this proposal, but indeed a future goal; we are actively researching how egress traffic flows can be added to the communication matrix.

Maybe better to add it to the open questions.
But we are actively working on and thinking about egress now too (it's much newer work than this proposal, which has been lingering for a long while).

danwinship · 2024-08-20T14:11:17Z

enhancements/network/communication-flows-matrix-ingress.md

+
+- The admin uses the OpenShift command-line interface (CLI) to generate an up-to-date 
+  communication matrix using the following command `oc adm communication-matrix generate`.
+


even from CLI pov netobserv adding cli capability to dump flows (ingress and egress)
so its not clear me either what additional info will communication-matrix option will bring ?

AIUI, the netobserv tool shows current traffic, while the communication-matrix tool shows expected traffic.

You can't necessarily reliably generate a firewall based on the netobserv output, since you can't guarantee that all of the required-to-be-allowed flows will actually get used during any particular time period.

Also, the netobserv tool can't tell you what ports will need to be open in the next release. (Well, OK, neither can the communcation-matrix tool, but you can at least look at the "generic"/documented communication-matrix to find that.)

danwinship · 2024-08-20T14:30:24Z

enhancements/network/communication-flows-matrix-ingress.md

+
+Another test will be added to validate the ports in a generated communication 
+matrix match a snapshot of the node's listening ports (created with the Linux
+`ss` utility).
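The `ss` snapshot side of that test can be approximated by parsing `ss -tuln` output (TCP and UDP listening sockets, numeric ports; add `-H` to drop the header). A minimal sketch, assuming that column layout, where the local `address:port` is the fifth column. The `parse_ss` helper and the sample output are illustrative, skipping loopback-only listeners is an assumption about what counts as ingress, and SCTP sockets (shown by `ss --sctp`) are not covered.

```python
from typing import Set, Tuple

# Assumption: loopback-only listeners are not reachable ingress and are skipped.
LOOPBACK_PREFIXES = ("127.", "[::1]")


def parse_ss(output: str) -> Set[Tuple[str, int]]:
    """Return {(protocol, port)} for listening sockets in `ss -tuln` output."""
    listening = set()
    for line in output.splitlines():
        parts = line.split()
        # Skip blank lines and the header row (present when -H is not used).
        if len(parts) < 5 or parts[0] == "Netid":
            continue
        netid, local = parts[0], parts[4]
        # rpartition handles IPv4 "0.0.0.0:22", IPv6 "[::]:22" and "*:22".
        address, _, port = local.rpartition(":")
        if address.startswith(LOOPBACK_PREFIXES):
            continue
        if port.isdigit():
            listening.add((netid.upper(), int(port)))
    return listening


sample = """\
Netid State  Recv-Q Send-Q Local Address:Port Peer Address:Port
udp   UNCONN 0      0            0.0.0.0:111       0.0.0.0:*
tcp   LISTEN 0      128          0.0.0.0:22        0.0.0.0:*
tcp   LISTEN 0      128             [::]:22           [::]:*
tcp   LISTEN 0      4096       127.0.0.1:10248     0.0.0.0:*
"""
print(sorted(parse_ss(sample)))
```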


I think we also need a periodic e2e job that brings up a cluster, firewalls everything that isn't required by the communications matrix, and then does an upgrade; if the upgrade fails, that warns us that the matrix is failing to detect something.

danwinship · 2024-08-20T14:32:54Z

enhancements/network/communication-flows-matrix-ingress.md

+
+A user will be able to run `openshift-tests` or a new `oc` command, 
+`oc adm communication-matrix validate`, to validate the `EndpointSlices` 
+in the cluster match a current snapshot of the node's listening ports.
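The comparison that `oc adm communication-matrix validate` implies can be sketched as two set differences between the matrix and a listening-port snapshot. This is a hypothetical helper; its name and report shape are not the proposed command's output format.

```python
from typing import Dict, Iterable, List, Tuple

Port = Tuple[str, int]  # (protocol, port)


def validate(matrix: Iterable[Port], listening: Iterable[Port]) -> Dict[str, List[Port]]:
    """Compare matrix ports against a node's listening-port snapshot.

    "undeclared":    listening on the node but missing from the matrix.
    "not_listening": declared in the matrix but nothing is bound right now.
    """
    declared, observed = set(matrix), set(listening)
    return {
        "undeclared": sorted(observed - declared),
        "not_listening": sorted(declared - observed),
    }


report = validate(
    matrix=[("TCP", 22), ("TCP", 6443)],
    listening=[("TCP", 22), ("UDP", 111)],
)
print(report)
```

Under this sketch's interpretation only `undeclared` entries indicate a gap in the matrix; a declared port that is not currently bound may simply belong to a service that is not scheduled on that node.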


It's also important that a user be able to validate "the cluster won't break if I upgrade". IOW, they need to be able to prove that their current firewall rules are acceptable according to the matrix of the OCP release they want to upgrade to.
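One way to frame that check: every flow in the target release's documented matrix must be admitted by at least one existing allow rule. The sketch below assumes rules are already expressed as inclusive `(protocol, first_port, last_port)` ranges; `AllowRule`, `uncovered_flows` and the sample data are hypothetical, and translating real nftables, iptables or cloud security-group rules into this form is left out.

```python
from typing import Iterable, List, NamedTuple, Tuple


class AllowRule(NamedTuple):
    """An existing firewall allow rule covering an inclusive port range."""
    protocol: str
    first_port: int
    last_port: int


def uncovered_flows(rules: Iterable[AllowRule],
                    target_flows: Iterable[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Return target-release flows that no current allow rule admits."""
    rules = list(rules)  # iterated once per flow below
    missing = []
    for protocol, port in sorted(set(target_flows)):
        covered = any(
            r.protocol == protocol and r.first_port <= port <= r.last_port
            for r in rules
        )
        if not covered:
            missing.append((protocol, port))
    return missing


current_rules = [
    AllowRule("TCP", 22, 22),
    AllowRule("TCP", 6443, 6443),
    AllowRule("TCP", 30000, 32767),  # the default Kubernetes NodePort range
]
# Hypothetical flows taken from the matrix of the release being upgraded to.
target = [("TCP", 22), ("TCP", 6443), ("TCP", 30080), ("UDP", 4789)]
print(uncovered_flows(current_rules, target))
```

An empty result would mean the current rules already admit everything the target matrix lists; anything returned must be opened before upgrading.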

danwinship · 2024-08-20T15:04:45Z

enhancements/network/communication-flows-matrix-ingress.md

+`EndpointSlices` stand better for ingress traffic than for egress, in ingress traffic the 
+services are less dynamic and mostly stay up during the entire cluster lifetime.
+Supporting egress traffic might require changes in the API that should also be reviewed
+and agreed with the upstream Kubernetes community.


I don't really understand what you're saying here.

Are you just talking about the fact that, in general, we do not have Services/EndpointSlices pointing to the external IPs that we connect to?

Yes, I will rephrase

danwinship · 2024-08-20T15:25:56Z

enhancements/network/communication-flows-matrix-ingress.md

+
+3. The node hosted services such as sshd, rpc, etc. are currently missing 
+   `EndpointSlices`. We believe it should be created by the Machine API as 
+    part of adding/removing nodes. This can be addressed in later enhancement.


I think we need to try out the "firewall everything that isn't listed in the matrix and see what breaks" test, and that should help to answer these questions...

Note that the installer has a bunch of hardcoded rules for setting up networking on cloud platforms (e.g., AWS), which ideally would be autogenerated from some official list of requirements...
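For the node-hosted services that currently lack EndpointSlices (sshd, rpc, etc.), the object that some component would need to publish could look like the one built below. This is only a sketch of the shape of a `discovery.k8s.io/v1` EndpointSlice: the helper, its default namespace and the `endpointslice.kubernetes.io/managed-by` value are hypothetical, and it does not decide which component (the proposal suggests the Machine API) should create it. Setting a managed-by value other than the built-in controller's keeps the EndpointSlice controller from reconciling it; normally a matching selectorless Service would exist as well.

```python
import json

MANAGED_BY_LABEL = "endpointslice.kubernetes.io/managed-by"
SERVICE_NAME_LABEL = "kubernetes.io/service-name"


def host_service_endpointslice(node_name, node_ip, service, port,
                               protocol="TCP", namespace="default",
                               manager="example.com/host-services"):
    """Build an EndpointSlice dict for a service listening on a node's host network.

    The namespace and manager defaults are hypothetical placeholders.
    """
    return {
        "apiVersion": "discovery.k8s.io/v1",
        "kind": "EndpointSlice",
        "metadata": {
            "name": f"{node_name}-{service}",
            "namespace": namespace,
            "labels": {
                SERVICE_NAME_LABEL: service,
                MANAGED_BY_LABEL: manager,
            },
        },
        "addressType": "IPv4",
        "endpoints": [{"addresses": [node_ip], "nodeName": node_name}],
        # Port names must be DNS labels, so dots (as in "rpc.statd") are replaced.
        "ports": [{"name": service.replace(".", "-"), "port": port, "protocol": protocol}],
    }


es = host_service_endpointslice("master-0", "192.0.2.10", "sshd", 22)
print(json.dumps(es, indent=2))
```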

danwinship · 2024-08-20T15:30:28Z

enhancements/network/communication-flows-matrix-ingress.md

+  - Use the communication matrix to create and apply firewall rules, and 
+    run E2E `openshift-tests`


You can't expect openshift-tests to work on a firewalled cluster; many of the networking test cases in particular create HostNetwork pods and expect to be able to connect to them.

We could potentially run a subset of tests that are "firewall-safe". But I think doing an OCP version upgrade is probably a better overall test, since an OCP upgrade exercises a substantial amount of core functionality, and because upgrades are something we explicitly want to guarantee will work even with a firewall, so it's a good test case anyway.
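To make "create and apply firewall rules" concrete, the sketch below renders matrix ports as an nftables ruleset with a default-drop input policy, which could then be loaded with `nft -f`. Everything here is an illustrative assumption rather than what the commatrix e2e test does: the table name, always allowing loopback, established connections and ICMP/ICMPv6, and taking input as `(protocol, port)` pairs. Loading a default-drop policy on a live node can cut off access (including the HostNetwork test traffic mentioned above), so it only suits disposable test clusters.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def nft_ruleset(flows: Iterable[Tuple[str, int]], table: str = "commatrix") -> str:
    """Render an nftables input chain that accepts only the given (protocol, port) flows."""
    ports = defaultdict(set)
    for protocol, port in flows:
        ports[protocol.lower()].add(port)  # nft uses lowercase tcp/udp/sctp
    lines = [
        f"table inet {table} {{",
        "  chain input {",
        "    type filter hook input priority 0; policy drop;",
        "    ct state established,related accept",
        '    iif "lo" accept',
        "    meta l4proto { icmp, ipv6-icmp } accept",
    ]
    for protocol in sorted(ports):
        port_list = ", ".join(str(p) for p in sorted(ports[protocol]))
        lines.append(f"    {protocol} dport {{ {port_list} }} accept")
    lines += ["  }", "}"]
    return "\n".join(lines) + "\n"


print(nft_ruleset([("TCP", 22), ("TCP", 6443), ("UDP", 4789)]), end="")
```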

danwinship · 2024-10-14T14:49:55Z

enhancements/network/communication-flows-matrix-ingress.md

+  - Use the communication matrix to create and apply firewall rules, and 
+    run E2E `openshift-tests` ([implemented](https://github.com/openshift-kni/commatrix/blob/main/test/e2e/commatrix_suite_test.go#L100))
+  - Use the communication matrix to create and apply firewall rules, and upgrade
+    OpenShift version


These aren't things that would be "added to openshift-tests"; they're both new jobs rather than additions to openshift-tests. (So they should be a new top-level bullet point, not under the openshift-tests bullet point.)

danwinship · 2024-10-14T14:51:37Z

enhancements/network/communication-flows-matrix-ingress.md

+### Drawbacks
+N/A
+
+## Open Questions


Add: "Right now the installer has its own hard-coded list of ports to open on different clouds. Can we eventually auto-generate this as well?"

danwinship · 2024-10-14T14:52:20Z

/approve
just two nitpicks; someone else can lgtm after that

openshift-ci · 2024-10-14T14:52:41Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danwinship

Needs approval from an approver in each of these files:

~~enhancements/network/OWNERS~~ [danwinship]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci · 2024-10-15T09:30:45Z

@sabinaaledort: all tests passed!

yuvalk · 2024-10-15T13:04:38Z

/lgtm

openshift-ci bot requested review from dougbtv and trozet March 7, 2024 09:06

sabinaaledort force-pushed the commatrix_ingress_proposal branch 3 times, most recently from 643355b to 231662e March 7, 2024 10:49

openshift-ci bot requested review from cybertron, danwinship, msherif1234 and wking March 7, 2024 11:52

dhellmann reviewed Mar 7, 2024

msherif1234 reviewed Mar 12, 2024

msherif1234 reviewed Mar 13, 2024

sabinaaledort force-pushed the commatrix_ingress_proposal branch from 231662e to c86aa0f April 9, 2024 11:46

msherif1234 reviewed Apr 15, 2024

knobunc reviewed May 9, 2024

knobunc reviewed Jun 24, 2024

openshift-ci bot added the lifecycle/stale label Jul 23, 2024

openshift-ci bot removed the lifecycle/stale label Jul 23, 2024

openshift-ci bot added the lifecycle/stale label Aug 20, 2024

openshift-ci bot removed the lifecycle/stale label Aug 20, 2024

danwinship reviewed Aug 20, 2024

sabinaaledort force-pushed the commatrix_ingress_proposal branch from c86aa0f to 1e422d7 August 22, 2024 14:38

sabinaaledort force-pushed the commatrix_ingress_proposal branch from 1e422d7 to befa39f September 25, 2024 12:41

danwinship reviewed Oct 14, 2024

openshift-ci bot added the approved label Oct 14, 2024

sabinaaledort force-pushed the commatrix_ingress_proposal branch from befa39f to 2e9ecfa October 15, 2024 09:16

sabinaaledort force-pushed the commatrix_ingress_proposal branch from 2e9ecfa to 2edf532 October 15, 2024 09:18

openshift-ci bot assigned yuvalk Oct 15, 2024

openshift-ci bot added the lgtm label Oct 15, 2024

openshift-merge-bot bot merged commit 96d8a57 into openshift:master Oct 15, 2024
2 checks passed

