Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add WireGuard tunnels for Multi-cluster traffic #4606

Merged
merged 1 commit into from
May 17, 2023

Conversation

hjiajing
Copy link
Contributor

@hjiajing hjiajing commented Feb 7, 2023

Add WireGuard tunnels mode for Multi-cluster traffic. WireGuard tunnels will be established between member clusters and responsible for all cross-cluster traffic in this mode.

Implementation:

  • When a Gateway is enqueued, the controller will initialize the WireGuard interface and route on the host. We use the first IP address of the Service CIDR of the cluster as the WireGuard interface's IP. For example, if the ServiceCIDR is 10.10.1.0/24, then the WireGuard interface's IP address is 10.10.1.0.
  • When a ClusterInfoImport is enqueued, the controller will try to update the WireGuard peer based on the ClusterInfoImport.

@codecov
Copy link

codecov bot commented Feb 7, 2023

Codecov Report

Merging #4606 (725f00e) into main (8eb5c8d) will decrease coverage by 0.20%.
The diff coverage is 9.41%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #4606      +/-   ##
==========================================
- Coverage   69.79%   69.60%   -0.20%     
==========================================
  Files         402      403       +1     
  Lines       59570    59981     +411     
==========================================
+ Hits        41579    41747     +168     
- Misses      15189    15424     +235     
- Partials     2802     2810       +8     
Flag Coverage Δ *Carryforward flag
e2e-tests 38.38% <ø> (+<0.01%) ⬆️ Carriedforward from 8eb5c8d
integration-tests 34.36% <ø> (+0.03%) ⬆️ Carriedforward from 8eb5c8d
kind-e2e-tests 47.56% <9.41%> (+0.41%) ⬆️
unit-tests 59.75% <ø> (-0.02%) ⬇️ Carriedforward from 8eb5c8d

*This pull request uses carry forward flags. Click here to find out more.

Impacted Files Coverage Δ
cmd/antrea-agent/agent.go 0.00% <ø> (ø)
...trollers/multicluster/member/gateway_controller.go 80.55% <ø> (-2.53%) ⬇️
...gent/controller/noderoute/node_route_controller.go 67.07% <0.00%> (+0.15%) ⬆️
pkg/agent/multicluster/mc_route_controller.go 34.48% <0.00%> (-21.64%) ⬇️
pkg/agent/openflow/multicluster.go 94.40% <0.00%> (-0.77%) ⬇️
pkg/agent/openflow/pipeline.go 89.12% <ø> (-0.27%) ⬇️
pkg/agent/route/route_linux.go 69.48% <0.00%> (-1.59%) ⬇️
pkg/agent/route/route_windows.go 34.64% <ø> (ø)
pkg/agent/wireguard/client_windows.go 100.00% <ø> (ø)
pkg/agent/wireguard/client_linux.go 76.84% <62.06%> (-4.68%) ⬇️
... and 29 more

@luolanzone luolanzone added the area/multi-cluster Issues or PRs related to multi cluster. label Feb 7, 2023
@luolanzone
Copy link
Contributor

@hjiajing please update your PR summary with well-structure sentences, and also include summary in your top commit to help developer to check what does the commit do without checking the PR in the future.

@hjiajing hjiajing force-pushed the wireguard branch 2 times, most recently from 2d4d744 to 5ca2c28 Compare February 15, 2023 02:36
@hjiajing hjiajing force-pushed the wireguard branch 5 times, most recently from 08769ca to 2154901 Compare February 15, 2023 09:36
@luolanzone luolanzone added this to the Antrea v1.11 release milestone Feb 15, 2023
@hjiajing hjiajing force-pushed the wireguard branch 2 times, most recently from 014d4ae to 4ccdccc Compare February 15, 2023 17:42
cmd/antrea-agent/agent.go Outdated Show resolved Hide resolved
pkg/agent/config/node_config.go Outdated Show resolved Hide resolved
pkg/agent/config/node_config.go Outdated Show resolved Hide resolved
pkg/agent/config/node_config.go Outdated Show resolved Hide resolved
build/charts/antrea/values.yaml Show resolved Hide resolved
@@ -377,7 +377,8 @@ var DispositionToString = map[uint32]string{
var (
// snatPktMarkRange takes an 8-bit range of pkt_mark to store the ID of
// a SNAT IP. The bit range must match SNATIPMarkMask.
snatPktMarkRange = &binding.Range{0, 7}
snatPktMarkRange = &binding.Range{0, 7}
multiclusterPktMarkRange = &binding.Range{0, 31}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to add a comment for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not need this range as the mark is constant. LoadPktMarkRange accepts nil as the second parameter.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One bit is enough if the mark is constant, like 0x10/0x10.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

cmd/antrea-agent/agent.go Outdated Show resolved Hide resolved
pkg/agent/route/interfaces.go Outdated Show resolved Hide resolved
pkg/agent/multicluster/mc_route_controller.go Outdated Show resolved Hide resolved
pkg/agent/multicluster/mc_route_controller.go Outdated Show resolved Hide resolved
pkg/agent/openflow/multicluster.go Outdated Show resolved Hide resolved
pkg/agent/wireguard/interface.go Outdated Show resolved Hide resolved
pkg/agent/wireguard/client_linux.go Outdated Show resolved Hide resolved
@hjiajing hjiajing force-pushed the wireguard branch 6 times, most recently from e25e55c to f5fe28f Compare February 16, 2023 09:41
cmd/antrea-agent/agent.go Show resolved Hide resolved
pkg/agent/config/node_config.go Outdated Show resolved Hide resolved
@@ -377,7 +377,8 @@ var DispositionToString = map[uint32]string{
var (
// snatPktMarkRange takes an 8-bit range of pkt_mark to store the ID of
// a SNAT IP. The bit range must match SNATIPMarkMask.
snatPktMarkRange = &binding.Range{0, 7}
snatPktMarkRange = &binding.Range{0, 7}
multiclusterPktMarkRange = &binding.Range{0, 31}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not need this range as the mark is constant. LoadPktMarkRange accepts nil as the second parameter.


// MulticlusterPktMark is the packet marker used for mark the cross-cluster
// traffic in Antrea Multi-cluster.
MulticlusterPktMark uint32 = 0x4d2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if 0x4d2 (1234 in decimal) is a meaningful value here. Egress allocates [1,255] for the pkt_mark. So in theory any value above 255 should be OK. But we might want to reservice more bits in case we want to extend the egress feature in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Just use one bit now.

Copy link
Contributor

@hongliangl hongliangl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add some description about how the traffic of a Pod is forwarded to another cluster via WireGuard.
A question about the design: why policy-based routing (ip rule, pkt mark and another route table) is used? Is there any possible to implement the target by adding static route in main route table? If we introduce policy-based routing in this PR, do you think we can use policy-based routing in other components of Antrea @jianjuns @tnqn ?

pkg/agent/openflow/multicluster.go Outdated Show resolved Hide resolved
@@ -377,7 +377,8 @@ var DispositionToString = map[uint32]string{
var (
// snatPktMarkRange takes an 8-bit range of pkt_mark to store the ID of
// a SNAT IP. The bit range must match SNATIPMarkMask.
snatPktMarkRange = &binding.Range{0, 7}
snatPktMarkRange = &binding.Range{0, 7}
multiclusterPktMarkRange = &binding.Range{0, 31}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One bit is enough if the mark is constant, like 0x10/0x10.

pkg/agent/route/interfaces.go Outdated Show resolved Hide resolved
pkg/agent/route/interfaces.go Outdated Show resolved Hide resolved
pkg/agent/route/interfaces.go Outdated Show resolved Hide resolved
pkg/agent/multicluster/mc_route_controller.go Outdated Show resolved Hide resolved
pkg/agent/multicluster/mc_route_controller.go Outdated Show resolved Hide resolved
}

func (c *MCRouteController) initializeWireGuardInterface() error {
c.wireGuardConfig.MTU = c.nodeConfig.NodeMTU - config.WireGuardOverhead - config.GeneveOverhead
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why geneve overhead is included?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, @hjiajing you should check current tunnel type in NetworkConfig.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the advice, will do it later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use NetworkConfig.MTUDeduction in Lan's PR #4407.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for reminder, will rebase and fix it .

pkg/agent/route/route_windows.go Outdated Show resolved Hide resolved
@hjiajing hjiajing force-pushed the wireguard branch 3 times, most recently from 9017a57 to d979b4e Compare February 22, 2023 08:13
luolanzone
luolanzone previously approved these changes Apr 26, 2023
@luolanzone
Copy link
Contributor

/test-all

@luolanzone
Copy link
Contributor

@hjiajing could you check if possible to improve the patch test coverage?

@luolanzone luolanzone added the action/release-note Indicates a PR that should be included in release notes. label May 8, 2023
@luolanzone
Copy link
Contributor

@hjiajing please resolve the file conflict.

@hjiajing
Copy link
Contributor Author

/test-all

@luolanzone
Copy link
Contributor

/test-e2e
/test-multicluster-e2e
/test-networkpolicy

Copy link
Contributor

@luolanzone luolanzone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall, a few nits on the test file. @jianjuns could you take a look again? thanks.

cmd/antrea-agent/agent.go Outdated Show resolved Hide resolved
multicluster/apis/multicluster/v1alpha1/gateway_types.go Outdated Show resolved Hide resolved
pkg/agent/multicluster/mc_route_controller.go Outdated Show resolved Hide resolved
pkg/agent/multicluster/mc_route_controller.go Outdated Show resolved Hide resolved
pkg/agent/multicluster/mc_route_controller.go Outdated Show resolved Hide resolved
pkg/agent/openflow/client.go Show resolved Hide resolved
LinkIndex: linkIndex,
}

return c.netlink.RouteDel(route)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should check the error is not NotFound, otherwise if the route doesn't exist somehow, the caller will keep retrying.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A check is added, when got "no such process", which means the route does not exist, the function will return nil

pkg/agent/route/route_windows.go Outdated Show resolved Hide resolved
pkg/agent/route/route_windows.go Outdated Show resolved Hide resolved
build/charts/antrea/README.md Outdated Show resolved Hide resolved
@hjiajing hjiajing force-pushed the wireguard branch 2 times, most recently from e02018e to 0055bef Compare May 12, 2023 06:39
@hjiajing
Copy link
Contributor Author

@tnqns Hi Quan. I have addressed the comments you left, but after I pushed the latest code, some comments disappeared both in the conversation and Files changed, so I didn't reply every comment you left. Could you please take a look? Thanks.

pkg/agent/multicluster/mc_route_controller.go Outdated Show resolved Hide resolved
pkg/agent/multicluster/mc_route_controller.go Outdated Show resolved Hide resolved
pkg/agent/multicluster/mc_route_controller.go Outdated Show resolved Hide resolved
Comment on lines 366 to 368
remoteWireGuardIP := serviceIP.Mask(serviceCIDR.Mask)
remoteWireGuardNet := &net.IPNet{IP: remoteWireGuardIP, Mask: net.CIDRMask(32, 32)}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean the value is provided be user, not discovered by program?

Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall, some minor comments

pkg/agent/multicluster/mc_route_controller.go Outdated Show resolved Hide resolved
Comment on lines 656 to 645
_, serviceCIDR, err := net.ParseCIDR(spec.ServiceCIDR)
if err != nil {
klog.Warningf("Failed to get the peer Gateway IP for WireGuard, error: %v", err)
return nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can now remove the error handling of net.ParseCIDR assuming the CIDR in the spec is either empty or valid. We should validate input before it's accepted and no need to validate it again when consuming it.

func getLocalGatewayIP(gateway *mcv1alpha1.Gateway, enableWireGuard bool) net.IP {
if enableWireGuard {
if gateway.ServiceCIDR == "" {
klog.InfoS("The ServiceCIDR has not been updated, skip.", "Gateway", klog.KObj(gateway))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
klog.InfoS("The ServiceCIDR has not been updated, skip.", "Gateway", klog.KObj(gateway))
klog.InfoS("The ServiceCIDR of the Gateway has not been updated, skip it", "Gateway", klog.KObj(gateway))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 667 to 660
if err != nil {
klog.Warningf("Failed to get the local Gateway IP for WireGuard, error: %v", err)
return nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Comment on lines 678 to 682
if cache.Spec.ServiceCIDR == cur.Spec.ServiceCIDR && cache.Spec.WireGuard != nil && cur.Spec.WireGuard != nil &&
cache.Spec.WireGuard.PublicKey == cur.Spec.WireGuard.PublicKey {
return false
}
return true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should return false when both old WireGuard and new WireGuard are nil.

Suggested change
if cache.Spec.ServiceCIDR == cur.Spec.ServiceCIDR && cache.Spec.WireGuard != nil && cur.Spec.WireGuard != nil &&
cache.Spec.WireGuard.PublicKey == cur.Spec.WireGuard.PublicKey {
return false
}
return true
if cache.Spec.ServiceCIDR != cur.Spec.ServiceCIDR {
return true
}
if cache.Spec.WireGuard == nil && cur.Spec.WireGuard == nil {
return false
}
if cache.Spec.WireGuard == nil || cur.Spec.WireGuard == nil {
return true
}
return cache.Spec.WireGuard.PublicKey != cur.Spec.WireGuard.PublicKey

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}

func (c *Client) DeleteRouteForLink(dstCIDR *net.IPNet, linkIndex int) error {
return errors.New("AddRouteForLink is not implemented on Windows")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return errors.New("AddRouteForLink is not implemented on Windows")
return errors.New("DeleteRouteForLink is not implemented on Windows")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Add a traffic encryption mode for cross-cluster traffic. If WireGuard enabled,
corresponding WireGuard configuration will be created on the Gateway
Node. And all cross-cluster traffic will go through the WireGuard tunnel
to remote Gateway.

Signed-off-by: hjiajing <hjiajing@vmware.com>
Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tnqn
Copy link
Member

tnqn commented May 16, 2023

@jianjuns @luolanzone do you have other comments for this one?

Copy link
Contributor

@jianjuns jianjuns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jianjuns @luolanzone do you have other comments for this one?

I am good.

@tnqn
Copy link
Member

tnqn commented May 17, 2023

/test-multicluster-e2e
/test-all

@tnqn
Copy link
Member

tnqn commented May 17, 2023

/test-multicluster-e2e

@tnqn tnqn merged commit 45a06d7 into antrea-io:main May 17, 2023
ceclinux pushed a commit to ceclinux/antrea that referenced this pull request Jun 5, 2023
…ntrea-io#4606)

Add a traffic encryption mode for cross-cluster traffic. If WireGuard is
enabled, corresponding WireGuard configuration will be created on the
Gateway Node. And all cross-cluster traffic will go through the
WireGuard tunnel to remote Gateway.

Signed-off-by: hjiajing <hjiajing@vmware.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
action/release-note Indicates a PR that should be included in release notes. area/multi-cluster Issues or PRs related to multi cluster.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Antrea Multi-cluster Service with security tunnel between Clusters
6 participants