Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrate NIReconciler with zedrouter #3213

Merged
merged 2 commits into from
Jun 5, 2023

Conversation

milan-zededa
Copy link
Contributor

This commit untangles the close coupling between zedrouter and Linux by replacing all remaining netlink and system calls with the calls to NIReconciler (which has generic interface and replaceable implementations).

Moreover, this commit contains a substantial cleanup of the zedrouter code-base - some of the source files had too many lines, some functions were way overcomplicated, had strange names, obsolete comments, used strings instead of net.IP/net.HardwareAddr, etc.

Lastly, VLANs for switch network instances are now configured by zedrouter and not domainmgr - where it does not belong.

This is the last and arguably the most difficult part of zedrouter refactoring. It was unfortunately not feasible to split this into more smaller PRs. For review, it is probably best to start with the updated documentation.

Copy link
Member

@uncleDecart uncleDecart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, does it make sense to try to test zedrouter?

api/proto/info/info.proto Outdated Show resolved Hide resolved
@milan-zededa milan-zededa force-pushed the nireconciler-partII branch from daacd1c to b186781 Compare May 17, 2023 08:33
@milan-zededa
Copy link
Contributor Author

Rebased to resolve merge conflicts with #3203

Copy link
Contributor

@eriknordmark eriknordmark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a lot of stuff to review here.
Are there any protobuf changes apart from the deprecation/removal of bridgeIPSets?
I think it makes sense doing putting that removal in a separate PR to get it out of the way (which also reduces the conflicts you're likely to have on the generated files).

For the other files, do you have a way to describe or otherwise facilitate seeing what is functionality which has moved to the reconciler but where the logic is unchanged as opposed to new logic and functionality?

A different way to ask the same question, what are the things you'd want the reviewers to look at more closely?

@milan-zededa
Copy link
Contributor Author

milan-zededa commented May 19, 2023

There is a lot of stuff to review here. Are there any protobuf changes apart from the deprecation/removal of bridgeIPSets? I think it makes sense doing putting that removal in a separate PR to get it out of the way (which also reduces the conflicts you're likely to have on the generated files).

For the other files, do you have a way to describe or otherwise facilitate seeing what is functionality which has moved to the reconciler but where the logic is unchanged as opposed to new logic and functionality?

A different way to ask the same question, what are the things you'd want the reviewers to look at more closely?

I have split the API change into a separate PR: #3225
Once it is merged I will rebase this one and fix merge conflicts (edit: Done).

Since refactoring leads to many changes and the resulting diff can be confusing, it is probably better to look at the refactored zedrouter in my fork.
I would recommend to start with the updated and extended documentation.

Here we initialize all components of zedrouter and pubsub channels.

Here is the main event loop of zedrouter.
In the main event loop we process pubsub messages, run config reconciliation from, handle detected IP assignments (and update published info), publish flow logs, etc.

This new file contains all methods used to validate config for network instances.

This new file contains all methods related to IP allocation (aka IPAM).

This new file is where all zedrouter's pubsub handlers are placed.
Those pubsub handlers call various methods for managing network instances (create, activate, deactivate, etc.) and methods for managing application networks (VIFs).

In the refactored zedrouter you will not find any netlink calls - this is now all in LinuxNIReconciler (as one and currently the only implementation of NIReconciler interface). Package cmd/zedrouter contains only generic handling of network instances, not tight to any particular network stack.

@milan-zededa milan-zededa force-pushed the nireconciler-partII branch 2 times, most recently from d23a3be to d9e8add Compare May 19, 2023 08:35
This commit untangles the close coupling between zedrouter and Linux
by replacing all remaining netlink and system calls with the calls
to NIReconciler (which has generic interface and replaceable
implementations).

Moreover, this commit contains a substantial cleanup of the zedrouter
code-base - some of the source files had too many lines, some functions
were way overcomplicated, had strange names, obsolete comments,
used strings instead of net.IP/net.HardwareAddr, etc.

Lastly, VLANs for switch network instances are now configured by
zedrouter and not domainmgr - where it does not belong.

This is the last and arguably the most difficult part of zedrouter
refactoring.

Signed-off-by: Milan Lenco <milan@zededa.com>
Some dependencies are no longer needed.

Signed-off-by: Milan Lenco <milan@zededa.com>
@milan-zededa milan-zededa force-pushed the nireconciler-partII branch from d9e8add to 6c43eb5 Compare May 22, 2023 07:22
@eriknordmark
Copy link
Contributor

I note that there are two unrelated yetus complains against code not modified by this PR which we can ignore.

Copy link
Contributor

@eriknordmark eriknordmark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since @uncleDecart has reviewed this I think we can proceed with testing.
Let me know if we should add some more code reviewers.

@milan-zededa
Copy link
Contributor Author

milan-zededa commented Jun 5, 2023

From the testing results I can see that ZFS takes too much time to fit into 6 hours - soon this will be addressed by splitting eden workflow into multiple smaller ones, run in parallel.
Also we hit the infamous kernel crash that we get often and I still cannot figure out its source. Maybe with more powerful runners this will go away.
Other than that, tests haven't detected any issues introduced by this PR.

@eriknordmark eriknordmark merged commit 8795b00 into lf-edge:master Jun 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants