Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

All-Ones netmask is not needed for ACLs or flow logging #4440

Merged
merged 1 commit into from
Nov 22, 2024

Conversation

milan-zededa
Copy link
Contributor

@milan-zededa milan-zededa commented Nov 15, 2024

The DHCP server deployed by EVE for Local Network Instances assigns IP addresses to applications with a /32 netmask (all-ones) and uses the Classless Static Route Option (RFC 3442) to configure static routes for the NI subnet. This setup enforces routing even for east-west (app-to-app) traffic, which would otherwise only be forwarded if the actual NI subnet mask (e.g., /24) was used.

This approach was historically implemented to prevent east-west traffic from bypassing ACLs and to ensure that connection tracking (conntrack) entries were created for flow logging purposes. However, it became unnecessary after we enabled the net.bridge.bridge-nf-call-ip(6)tables option, which ensures that even traffic forwarded by a bridge within EVE passes through iptables filtering and has conntrack entries created.

Using a /32 netmask now offers no added value and has some drawbacks. First, applications may use DHCP servers with the Classless Route option disabled, resulting in obtaining all-ones netmask with no reachable destinations due to missing connected routes. Second, enforcing routing adds extra packet processing steps for traffic that could be directly forwarded between applications, increasing overhead and reducing network performance.

We previously added an option to disable the all-ones netmask (while still keeping it enabled by default), but this has increased code complexity since it requires two distinct routing configurations to manage (and test). I propose removing the all-ones netmask configuration altogether to simplify the code and reduce packet processing overhead. While some may consider this a breaking change, I believe the change in the netmask should not impact applications as long as IP addresses are preserved and the overall routing/bridging functionality remains consistent across EVE upgrades (the set of reachable destinations does not change).

This PR is part of a series of network performance optimizations coming into EVE, see documentation (will be submitted in a separate PR)

@milan-zededa
Copy link
Contributor Author

milan-zededa commented Nov 15, 2024

@eriknordmark Do you remember what was the use-case of the /32 netmask given to applications? It does not seem to be needed for ACLs or flow logging.
Could we remove this or would change in netmask be considered a breaking change for some apps (the application IP and the set of reachable destinations will remain the same)?

@milan-zededa milan-zededa force-pushed the remove-all-ones-netmask branch from ded61d4 to 0cf9169 Compare November 15, 2024 15:23
@eriknordmark
Copy link
Contributor

The description in https://eve-os.readthedocs.io/docs/APP-CONNECTIVITY/ matches my memory and your description above. We did have to add the "disable" knob because some NFV applications didn't work with that setting.

Note that we need to communicate to users that the disable knob is no longer needed (and removed). Alternatively we can leave the knob in place and have it be a no-op. (I don't know if anybody notices the errors when a config is referring to a non-existant knob.)

The DHCP server deployed by EVE for Local Network Instances assigns
IP addresses to applications with a /32 netmask (all-ones) and uses
the Classless Static Route Option (RFC 3442) to configure static routes
for the NI subnet. This setup enforces routing even for east-west
(app-to-app) traffic, which would otherwise only be forwarded if
the actual NI subnet mask (e.g., /24) were used.

This approach was historically implemented to prevent east-west traffic
from bypassing ACLs and to ensure that connection tracking (conntrack)
entries were created for flow logging purposes. However, it became
unnecessary after we enabled the "net.bridge.bridge-nf-call-iptables"
option, which ensures that even traffic forwarded by a bridge within
EVE passes through iptables filtering and has conntrack entries created.

Using a /32 netmask now offers no added value and has some drawbacks.
First, applications may use DHCP servers with the Classless Route option
disabled, resulting in obtaining all-ones netmask with no reachable
destinations due to missing connected routes. Second, enforcing routing
adds extra packet processing steps for traffic that could be directly
forwarded between applications, increasing overhead and reducing network
performance.

We previously added an option to disable the all-ones netmask (while still
keeping it enabled by default), but this has increased code complexity
since it requires two distinct routing configurations to manage (and test).
I propose removing the all-ones netmask configuration altogether to simplify
the code and reduce packet processing overhead. While some may consider this
a breaking change, I believe the change in the netmask should not impact
applications as long as IP addresses are preserved and the overall
routing/bridging functionality remains consistent across EVE upgrades
(the set of reachable destinations does not change).

Signed-off-by: Milan Lenco <milan@zededa.com>
@milan-zededa milan-zededa force-pushed the remove-all-ones-netmask branch from 0cf9169 to a225170 Compare November 20, 2024 16:28
@milan-zededa
Copy link
Contributor Author

The description in https://eve-os.readthedocs.io/docs/APP-CONNECTIVITY/ matches my memory and your description above. We did have to add the "disable" knob because some NFV applications didn't work with that setting.

Note that we need to communicate to users that the disable knob is no longer needed (and removed). Alternatively we can leave the knob in place and have it be a no-op. (I don't know if anybody notices the errors when a config is referring to a non-existant knob.)

Good point, I didn't realize that we will publish error in ZInfoDevice.ConfigItemStatus when undefined config item is being used. I decided to leave the knob in place as suggested and just added comment that it no longer has any effect.

Copy link
Contributor

@eriknordmark eriknordmark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@eriknordmark eriknordmark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you put some text in CONFIG-PROPERTIES.md that the knob no longer needs to be set since we now work without the all-ones netmask in all cases but the knob remains to avoid errors if users are setting the knob.

Feel free to do that doc change as a separate PR and merge this one ahead if that makes sense.

@eriknordmark
Copy link
Contributor

@milan-zededa I see lots of failures in the Jenkins-run ztests for this, but that seems to be due to an earlier xen image having problems.

@eriknordmark eriknordmark merged commit a73c78d into lf-edge:master Nov 22, 2024
45 checks passed
@milan-zededa
Copy link
Contributor Author

Can you put some text in CONFIG-PROPERTIES.md that the knob no longer needs to be set since we now work without the all-ones netmask in all cases but the knob remains to avoid errors if users are setting the knob.

Feel free to do that doc change as a separate PR and merge this one ahead if that makes sense.

We do not have this knob documented in CONFIG-PROPERTIES.md. Should I still mention it there even though it no longer has any effect?

@eriknordmark
Copy link
Contributor

Can you put some text in CONFIG-PROPERTIES.md that the knob no longer needs to be set since we now work without the all-ones netmask in all cases but the knob remains to avoid errors if users are setting the knob.
Feel free to do that doc change as a separate PR and merge this one ahead if that makes sense.

We do not have this knob documented in CONFIG-PROPERTIES.md. Should I still mention it there even though it no longer has any effect?

Sorry, I forgot that we had intentionally left this out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants