Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Antrea secondary network implementation for SRIOV #2651

Merged

Conversation

arunvelayutham
Copy link
Contributor

Implement secondary network for Antrea CNI configured pods using SRIOV interface. (CNF use case)

Signed-off-by: root arunkumar.velayutham@intel.com

@arunvelayutham arunvelayutham marked this pull request as draft August 26, 2021 07:00
@@ -3246,6 +3246,11 @@ metadata:
app: antrea
name: antrea-agent
rules:
- apiGroups: ["k8s.cni.cncf.io"]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@antoninbas , @jianjuns Yaml file patch using generate-manifest.sh is not working good for RBAC changes. I would like to discuss with both of you to see how we can handle the antea.yml changes required for k8s.cni.cncf.io apiGroups access (required for net-attach-def access from client side)

@@ -0,0 +1,12 @@
apiVersion: rbac.authorization.k8s.io/v1
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This patch needs review and get it addressed as in a way to patch properly at antrea.yml. (As of now, it replaces entire RBAC section for antrea-agent).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this doesn't need to be done conditionally though patch, we can give the required permissions (to access Network Attachment Definitions) to the Agent by modifying: https://github.com/antrea-io/antrea/blob/main/build/yamls/base/agent-rbac.yml

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. thanks for the suggestion. taken care.


set -u -e

# Inspired by: https://github.com/intel/multus-cni/blob/83556f49bd6706a885eda847210b542669279cd0/images/entrypoint.sh#L161-L222
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Required to install kubeconfig for whereabouts to talk to API server. this script will be called as part of init_container scope to be installed at the defined location (as per whereabouts requirements).
NOTE: Script copied/taken from whereabouts-cni repo.

@@ -309,6 +322,30 @@ func run(o *Options) error {

go networkPolicyController.Run(stopCh)

if features.DefaultFeatureGate.Enabled(features.SecondaryInterfaceSupport) {
//TBD: Initialize Secondary Network Infra here.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Secondary network infra code changes will be merged separate (which is responsible to create the OVS bridges and config - not required for SRIOV based secondary network config).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would move this block to a separate func.

// See the License for the specific language governing permissions and
// limitations under the License.

package cnipodcache
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CNI cacheStore implementation to cache CNIserver information of a pod, which is required to be used by podwatch pkg while configuring secondary network

// See the License for the specific language governing permissions and
// limitations under the License.

package ipamglobal
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Global IPAM implementation to interface with whereabouts-cni config and assign Cluster wide unique IP address (maintained by whereabouts at api server or etcd - configurable)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we just name the package "ipam", and the file ipam_delegator.go? I feel no need to emphasize on "global".

@@ -105,7 +107,7 @@ type CNIServer struct {
hostProcPathPrefix string
kubeClient clientset.Interface
containerAccess *containerAccessArbitrator
podConfigurator *podConfigurator
PodConfigurator *podConfigurator
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a temporary/interim change until we redo the cniserver package and split them into 3 different pkg, make it reusable for primary and secondary interface needs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid touch too many code, could you add a func like GetPodConfigurator() to CNIServer instead?

@@ -461,6 +463,15 @@ func (s *CNIServer) CmdAdd(ctx context.Context, request *cnipb.CniCmdRequest) (*
klog.Infof("CmdAdd for container %v succeeded", cniConfig.ContainerId)
// mark success as true to avoid rollback
success = true

if features.DefaultFeatureGate.Enabled(features.SecondaryInterfaceSupport) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is where we cache CNI server provided pod information into cnipodcache.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add and pass a boolean like secInterfaceEnabled to CNIServer, instead of checking the feature gate inside the CNIServer code? I would avoid featuregate in all code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or based on whether the cniPodInfoStore argument passed to Initialize() is nil, we can save the value to a boolean flag of CNIServer struct, like cacheCNIPodInfo, and check s.cacheCNIPodInfo here.

@arunvelayutham arunvelayutham force-pushed the antrea-secondary-network-sriov-support branch from c977d6f to ec21933 Compare August 26, 2021 07:57
@codecov-commenter
Copy link

codecov-commenter commented Aug 26, 2021

Codecov Report

Merging #2651 (9aa489e) into main (0f5df48) will decrease coverage by 8.85%.
The diff coverage is 5.12%.

❗ Current head 9aa489e differs from pull request most recent head 7e0a14e. Consider uploading reports for the commit 7e0a14e to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##             main    #2651      +/-   ##
==========================================
- Coverage   59.23%   50.37%   -8.86%     
==========================================
  Files         302      300       -2     
  Lines       25873    35722    +9849     
==========================================
+ Hits        15325    17994    +2669     
- Misses       8892    15993    +7101     
- Partials     1656     1735      +79     
Flag Coverage Δ
e2e-tests 50.37% <5.12%> (?)
kind-e2e-tests ?
unit-tests ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
pkg/agent/cniserver/sriov.go 0.00% <0.00%> (ø)
pkg/agent/cniserver/sriov_linux.go 0.00% <0.00%> (ø)
pkg/agent/config/node_config.go 88.88% <ø> (-11.12%) ⬇️
pkg/agent/secondarynetwork/cnipodcache/cache.go 0.00% <0.00%> (ø)
pkg/agent/secondarynetwork/ipam/ipam_delegator.go 0.00% <0.00%> (ø)
pkg/agent/secondarynetwork/podwatch/controller.go 0.00% <0.00%> (ø)
pkg/features/antrea_features.go 22.22% <ø> (+5.55%) ⬆️
pkg/util/k8s/client.go 36.36% <0.00%> (-14.86%) ⬇️
pkg/agent/cniserver/server.go 40.58% <19.23%> (-28.31%) ⬇️
pkg/agent/route/route_linux.go 29.91% <54.83%> (-16.30%) ⬇️
... and 304 more

@arunvelayutham arunvelayutham force-pushed the antrea-secondary-network-sriov-support branch from ec21933 to 42f0692 Compare August 26, 2021 20:48
Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some initial comments

@@ -0,0 +1,12 @@
apiVersion: rbac.authorization.k8s.io/v1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this doesn't need to be done conditionally though patch, we can give the required permissions (to access Network Attachment Definitions) to the Agent by modifying: https://github.com/antrea-io/antrea/blob/main/build/yamls/base/agent-rbac.yml

Comment on lines 10 to 12
- '*'
verbs:
- '*'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should specify just the resource type we need (network-attachment-definition) and just the permissions / verbs we need

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, Taken care (I used "*" only for dev needs to ensure that initial agent able to fetch the net-attach-def CR info from APIserver).

WHEREABOUTS_KUBECONFIG_LITERAL=$(echo "$WHEREABOUTS_KUBECONFIG" | sed -e s'|/host||')

# ------------------------------- Generate a "kube-config"
SERVICE_ACCOUNT_PATH=/var/run/secrets/kubernetes.io/serviceaccount
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will use the antrea-agent Pod's ServiceAccount, which has a lot of permissions. We should restrict the whereabouts permissions to what's actually needed (access to resources in the whereabouts.cni.cncf.io API group). Maybe @tnqn will have a suggestion on what's the best way to generate and install that Kubeconfig file (which is consumed by the whereabouts plugin).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yet to take care on this concern and I fully agree with the suggestion. found new issue with whereabouts IP config and working on further to debug and address it.

if features.DefaultFeatureGate.Enabled(features.SecondaryInterfaceSupport) {
//TBD: Initialize Secondary Network Infra here.

//Create NetworkAttachmentDefinitions client handle to access secondary network object definition from API Server.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we typically leave a space between the // and the actual comment

informerFactory,
nodeConfig.Name,
cniPodInfoStore,
cniServer)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should change that, there should be no need for the "pod watcher" to access the instance of the cniServer. I believe you are already working on that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes certainly. I'm using it as temporary solution for this PR and I will take care of this as part of CNIserver pkg refactor PR.

return errors.New("Secondary network annotation parsing failed")
} else {
for _, netinfo := range networklist {
klog.Infof("Secondary Network Information: %+v", netinfo)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have transitioned to structured logging function (klog.InfoS, klog.ErrorS) for all new logs. Please update the PR accordingly. You can find some examples in the rest of the PR.

@@ -0,0 +1,80 @@
// Copyright 2021 Antrea Authors
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to suggest a different package name instead of pkg/agent/cnf. Maybe pkg/agent/secondarynetworks?

cc @jianjuns

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Secondary interfaces might be used for other cases besides CNF.

How about secondaryinterfaces?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also feel we can just name this file "cache.go", as it is already in the "cnipodcache" package.

//If container information is not available yet in the CNICache, return error to requeue work
podCniInfo := podWatcher.podCache.GetCniConfigInfoPerPod(pod.Name, pod.Namespace)
if len(podCniInfo) == 0 {
return errors.New("CNI Cache Information not found")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we will hit this "normal" case pretty often since there will be some delay between the Pod object being created in the k8s API and the CNI being invoked. We should avoid logging this error as it will be misleading to the user, but we also need to ensure that the Pod is added back to the workqueue so that there can be a retry later.

//TBD after secondary network specific OVS init code merge.
klog.Errorf("VethPair not supported. Failed to configure interfaces for container %s: %v", containerInfo.ContainerId, err)
}
} //containerInfo loop
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't typically include such comments in the Antrea codebase. If there is too much indentation, it would probably make sense to move that logic to its own function.

FlowExporter: {Default: false, PreRelease: featuregate.Alpha},
NetworkPolicyStats: {Default: true, PreRelease: featuregate.Beta},
NodePortLocal: {Default: false, PreRelease: featuregate.Alpha},
SecondaryInterfaceSupport: {Default: false, PreRelease: featuregate.Alpha},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest SecondaryNetworks as the Feature Gate name
In any case "Support" is not needed, it's implied for Feature Gates

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about SecondaryInterfaces? I just feel it is more about creating secondary interfaces, and we not necessarily handle networks.

}

//Add cniPodInfo to local cache store.
func (cni *cniPodInfoCache) AddCniConfigInfo(cniConfig *CNIConfigInfo) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cni -> CNI. We use all upper case for abbrs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please change all occurrences.

pkg/agent/cnf/podwatch/k8s/podwatch_controller.go Outdated Show resolved Hide resolved
@arunvelayutham arunvelayutham marked this pull request as ready for review August 26, 2021 23:07
}

} else {
err = cniServer.Initialize(ovsBridgeClient, ofClient, ifaceStore, entityUpdates)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can reuse the same Initialize() func, but based on whether the cniPodInfoStore argument is nil, the func can know it needs to initialize CNI cache or not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

introduce another argument (and passing it nil for non secondary interface flow) impact significant of cniserver specific existing test code (also calls these Initialize function). To avoid additional changes, I spilt it separate function with cache.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not get why that is a problem. You can just pass false/nil for existing callers inc. test code. Why it is significant impact?

if err != nil {
return fmt.Errorf("error initializing CNI server: %v", err)

cniPodInfoStore := cnipodcache.NewCniPodInfoStore()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably assign cniPodInfoStore to cnipodcache.NewCniPodInfoStore inside the "if features.DefaultFeatureGate.Enabled(features.SecondaryInterfaceSupport)" block.

@@ -309,6 +322,30 @@ func run(o *Options) error {

go networkPolicyController.Run(stopCh)

if features.DefaultFeatureGate.Enabled(features.SecondaryInterfaceSupport) {
//TBD: Initialize Secondary Network Infra here.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would move this block to a separate func.

}

//Add cniPodInfo to local cache store.
func (cni *cniPodInfoCache) AddCniConfigInfo(cniConfig *CNIConfigInfo) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please change all occurrences.

@@ -0,0 +1,80 @@
// Copyright 2021 Antrea Authors
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Secondary interfaces might be used for other cases besides CNF.

How about secondaryinterfaces?

@@ -0,0 +1,59 @@
// Copyright 2021 Antrea Authors
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel no need to add a separate file, and the func can be in podwatch_controller.go. But no strong opinion.

@@ -105,7 +107,7 @@ type CNIServer struct {
hostProcPathPrefix string
kubeClient clientset.Interface
containerAccess *containerAccessArbitrator
podConfigurator *podConfigurator
PodConfigurator *podConfigurator
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid touch too many code, could you add a func like GetPodConfigurator() to CNIServer instead?

@@ -461,6 +463,15 @@ func (s *CNIServer) CmdAdd(ctx context.Context, request *cnipb.CniCmdRequest) (*
klog.Infof("CmdAdd for container %v succeeded", cniConfig.ContainerId)
// mark success as true to avoid rollback
success = true

if features.DefaultFeatureGate.Enabled(features.SecondaryInterfaceSupport) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add and pass a boolean like secInterfaceEnabled to CNIServer, instead of checking the feature gate inside the CNIServer code? I would avoid featuregate in all code.

cniPodInfoStore cnipodcache.CniPodInfoStore,
) error {
var err error
s.PodConfigurator, err = newPodConfiguratorWithCniCache(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, could we just add cniPodInfoStore argument to newPodConfigurator() instead of adding new func with code duplication?

FlowExporter: {Default: false, PreRelease: featuregate.Alpha},
NetworkPolicyStats: {Default: true, PreRelease: featuregate.Beta},
NodePortLocal: {Default: false, PreRelease: featuregate.Alpha},
SecondaryInterfaceSupport: {Default: false, PreRelease: featuregate.Alpha},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about SecondaryInterfaces? I just feel it is more about creating secondary interfaces, and we not necessarily handle networks.

@arunvelayutham arunvelayutham force-pushed the antrea-secondary-network-sriov-support branch 2 times, most recently from c0b4929 to 658c301 Compare September 2, 2021 04:48
WHEREABOUTS_KUBECONFIG_LITERAL=$(echo "$WHEREABOUTS_KUBECONFIG" | sed -e s'|/host||')

# ------------------------------- Generate a "kube-config"
SERVICE_ACCOUNT_PATH=/var/run/secrets/kubernetes.io/serviceaccount
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yet to take care on this concern and I fully agree with the suggestion. found new issue with whereabouts IP config and working on further to debug and address it.

@@ -3393,6 +3393,27 @@ rules:
- get
- watch
- list
- apiGroups:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make manifest (after updating base rbac file) updated all the antrea-*.yml file. But, not sure if we really wanted to enable secondarynetwork config on all the variants?. if not, I can remove these changes.

@@ -0,0 +1,12 @@
apiVersion: rbac.authorization.k8s.io/v1
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. thanks for the suggestion. taken care.

Comment on lines 10 to 12
- '*'
verbs:
- '*'
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, Taken care (I used "*" only for dev needs to ensure that initial agent able to fetch the net-attach-def CR info from APIserver).

}

} else {
err = cniServer.Initialize(ovsBridgeClient, ofClient, ifaceStore, entityUpdates)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

introduce another argument (and passing it nil for non secondary interface flow) impact significant of cniserver specific existing test code (also calls these Initialize function). To avoid additional changes, I spilt it separate function with cache.

informerFactory,
nodeConfig.Name,
cniPodInfoStore,
cniServer)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes certainly. I'm using it as temporary solution for this PR and I will take care of this as part of CNIserver pkg refactor PR.

@@ -461,6 +468,15 @@ func (s *CNIServer) CmdAdd(ctx context.Context, request *cnipb.CniCmdRequest) (*
klog.Infof("CmdAdd for container %v succeeded", cniConfig.ContainerId)
// mark success as true to avoid rollback
success = true

if s.secNetworkEnabled {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

took care as suggested by @jianjuns (do not use feature gate all over the code. used bool variable to decide whether we need to cache the cni info or not)

@@ -0,0 +1,59 @@
// Copyright 2021 Antrea Authors
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jianjuns separate file for init was introduced based on secondary network infra (OVS) init needs. I will revisit on this file needs while pulling in the OVS bridge init changes in.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then it should not be called "podwatch_controller_init.go", but maybe just "init.go" under "/secondarynetwork"? For now, I still suggest to move the code to "controller.go".

FlowExporter: {Default: false, PreRelease: featuregate.Alpha},
NetworkPolicyStats: {Default: true, PreRelease: featuregate.Beta},
NodePortLocal: {Default: false, PreRelease: featuregate.Alpha},
SecondaryNetwork: {Default: true, PreRelease: featuregate.Alpha},
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed as SecondaryNetwork (more relevant to state it as network instead of interface, in my opinion - as we do configure secondary network to address CNF use cases with this feature gate)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be SecondaryNetworks...

@arunvelayutham arunvelayutham force-pushed the antrea-secondary-network-sriov-support branch from 658c301 to a9e2b0a Compare September 8, 2021 05:28
@@ -309,6 +322,30 @@ func run(o *Options) error {

go networkPolicyController.Run(stopCh)

if features.DefaultFeatureGate.Enabled(features.SecondaryNetwork) {
// TBD: Initialize Secondary Network Infra here.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, suggest to add a func for this block.

return nil
}

//get POD/Container information (container id, namespace, etc.,) from CNI info cache (added by CNI Server CmdAdd upon
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a space after "//".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

POD -> Pod

}

//get POD/Container information (container id, namespace, etc.,) from CNI info cache (added by CNI Server CmdAdd upon
//successful pod and primary network creation).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pod -> Pod

var ret bool
pod := obj.(*corev1.Pod)

secondaryNetwork, ret := checkForPodSecondaryNetworkAttachement(pod)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case, you missed the previous comments:

You can check the old version of the Pod obj to know if the CNF annotation got removed from the Pod.
Check: https://github.com/antrea-io/antrea/blob/main/pkg/agent/nodeportlocal/k8s/npl_controller.go#L211

networklist, err := parsePodSecondaryNetworkAnnotation(secondaryNetwork)
if err != nil {
return errors.New("Secondary network annotation parsing failed")
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: you can remove the "else {}".

for _, netinfo := range networklist {
klog.InfoS("Secondary Network Information:", netinfo)
if len(netinfo.NetworkName) > 0 {
netDefCRD, err := podWatcher.netAttachDefClient.NetworkAttachmentDefinitions(pod.Namespace).Get(context.TODO(), netinfo.NetworkName, metav1.GetOptions{})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not support changes of NetworkAttachmentDefinition, or NetworkAttachmentDefinition created after Pods?

for _, containerInfo := range podCNIInfo {
cmdArgs = &invoke.Args{Command: string("ADD"), ContainerID: containerInfo.ContainerId,
NetNS: containerInfo.ContainerNetNS, IfName: containerInfo.ContainerIfDev,
Path: string("/opt/cni/bin/")}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Define a constant for this path.

podSriovVFDeviceID, err := cniserver.GetPodContainerDeviceIDs(pod.Name, pod.Namespace)
if err != nil {
klog.ErrorS(err, "handleAddUpdatePod: GetPodContainerDeviceIDs failed for ", pod.Name, pod.Namespace)
//return nil here. we do not want to retry/requeue work if VF id fetch fails (user to check if VFs are created at node).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a space after //.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we -> We

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

node -> Node


ipamResult, err = ipam.GetIPAMGlobalAddress(cniconfig, cmdArgs)
if err != nil {
return errors.New("Secondary network global IPAM failed")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question - how Multus handles such errors. Does it report the failure back to users via some ways (e.g. annotation, events, etc.)?

)

const (
ipamGlobal = "whereabouts"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we remove "Global" from (variable, func) names? It will be confusing, as we will support IPAM in Antrea too (see here: #2740).

If you think that is better, we can call it ipamWhereabouts.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RawExec: &invoke.RawExec{Stderr: os.Stderr},
}

func delegateCommon(delegatePlugin string, exec invoke.Exec, cniPath string) (string, invoke.Exec, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like there is a fair amount of code duplication with /agent/cniserver/ipam/ipam_delegator.go, is there an option to reuse the code?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree.

Copy link
Contributor Author

@arunvelayutham arunvelayutham Sep 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apologies for the delayed reply on this comment (though I communicate with Jianjun/Antonin over Slack on the inputs for their comments).
The reason I didn't reuse the existing implementation for ipam was due to,

  1. Antonin and Jianjun suggested to keep the secondary interface changes as isolated as feasible.
  2. during initial development, I thought we will have more to be done w.r.t whereabouts CNI parsing the retrieved IPAM context from networkAttachementDefinition. But, that wasn't the case when I finalized the implementation.

Indeed, I see that we would be able to reuse the IPAM package with changes/modifications to the existing implementation. If this is fine, I can get it addressed asap.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, from what I saw a few funcs here are exactly the same as those in cniserver/ipam, so probably we should reuse them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, you plan to refactor this part and share code with /agent/cniserver/ipam/ipam_delegator.go in a later PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes @jianjuns , @annakhm I will do this as separate PR. Need to ensure that the existing IPAM code refactor will not impact primary interface IPAM use case, while it will be used for both the needs (host-local and whereabouts).

@arunvelayutham arunvelayutham force-pushed the antrea-secondary-network-sriov-support branch 2 times, most recently from 49f43cb to 9b0c0a0 Compare September 22, 2021 02:38
WHEREABOUTS_KUBECONFIG_LITERAL=$(echo "$WHEREABOUTS_KUBECONFIG" | sed -e s'|/host||')

# ------------------------------- Generate a "kube-config"
SERVICE_ACCOUNT_PATH=/var/run/secrets/whereabouts
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whereabouts restricted to use separate serviceaccount/secrets.

@arunvelayutham arunvelayutham force-pushed the antrea-secondary-network-sriov-support branch from 9b0c0a0 to 81f4b7b Compare September 22, 2021 20:45
}

// Add CNIPodInfo to local cache store.
func (pod *CNIPodInfoCache) AddCNIConfigInfo(CNIConfig *CNIConfigInfo) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But why you use "pod" for short name of CNIPodInfoCache, but not "cache" or just "c"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CNIConfig -> cniConfig (as it is not a global variable).

package cnipodcache

type CNIConfigInfo struct {
CniVersion string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CniVersion -> CNIVersion

We use all upper cases for abbrs.

CniVersion string
PodName string
PodNameSpace string
ContainerId string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ContainerId -> ContainerID.

RawExec: &invoke.RawExec{Stderr: os.Stderr},
}

func delegateCommon(delegatePlugin string, exec invoke.Exec, cniPath string) (string, invoke.Exec, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree.

netDev = "eth1"
)

type PodWatchController struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember Antonin suggested to rename it to PodController.

And in my mind, we should put this file under /podwatch directly (remove /k8s), and rename it to "controller.go".

@@ -0,0 +1,59 @@
// Copyright 2021 Antrea Authors
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then it should not be called "podwatch_controller_init.go", but maybe just "init.go" under "/secondarynetwork"? For now, I still suggest to move the code to "controller.go".

@arunvelayutham arunvelayutham force-pushed the antrea-secondary-network-sriov-support branch from 81f4b7b to 54b9c12 Compare September 24, 2021 06:58
pkg/agent/cniserver/pod_configuration.go Outdated Show resolved Hide resolved
pkg/agent/cniserver/server.go Outdated Show resolved Hide resolved
cmd/antrea-agent/agent.go Outdated Show resolved Hide resolved
cmd/antrea-agent/agent.go Outdated Show resolved Hide resolved
pkg/features/antrea_features.go Show resolved Hide resolved
Type string `json:"type,omitempty"`
IPAM IPAMConfig `json:"ipam,omitempty"`
// Options to be passed in by the runtime.
//RuntimeConfig RuntimeConfig `json:"runtimeConfig"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you just remove it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will do this clean-up as part of another PR which does veth pair and OVS init handle

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, but why not remove these commented lines in this PR?

pkg/agent/secondarynetwork/podwatch/types.go Outdated Show resolved Hide resolved
pkg/agent/secondarynetwork/podwatch/types.go Outdated Show resolved Hide resolved
pkg/agent/secondarynetwork/podwatch/controller.go Outdated Show resolved Hide resolved
pkg/agent/secondarynetwork/podwatch/controller.go Outdated Show resolved Hide resolved
@@ -0,0 +1,12 @@
apiVersion: rbac.authorization.k8s.io/v1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this file needs to be removed? Comments have been addressed and it doesn't seem to be used anywhere anymore.

Comment on lines 8 to 10
- whereabouts-rbac.yml
- agent.yml
- whereabouts-secret.yml
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we discussed that there should be a flag for generate-manifest.sh, that would take care of applying all the yaml patches (except changes to agent-rbac.yml). Wasn't that the case @jianjuns?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initially I had a flag defined for whereabouts (as part of rbac changes). But, since I got inputs to put them directly on the agent-rbac file, I thought we are doing the same for other changes as well.
Is that ok, If I do it in separate PR right after this one?.

@@ -309,6 +321,29 @@ func run(o *Options) error {

go networkPolicyController.Run(stopCh)

if features.DefaultFeatureGate.Enabled(features.SecondaryNetwork) {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unnecessary empty line

// Create NetworkAttachmentDefinitions client handle to access secondary network object definition from API Server.
netAttachDefClient, err := k8s.CreateNetworkAttachDefClient(o.config.ClientConnection, o.config.KubeAPIServerOverride)
if err != nil {
return fmt.Errorf("networkAttachmentDefinition creation failed. %v", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return fmt.Errorf("networkAttachmentDefinition creation failed. %v", err)
return fmt.Errorf("networkAttachmentDefinition client creation failed: %v", err)

@@ -309,6 +321,29 @@ func run(o *Options) error {

go networkPolicyController.Run(stopCh)

if features.DefaultFeatureGate.Enabled(features.SecondaryNetwork) {

// Create NetworkAttachmentDefinitions client handle to access secondary network object definition from API Server.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Create NetworkAttachmentDefinitions client handle to access secondary network object definition from API Server.
// Create the NetworkAttachmentDefinitions client which handles access to secondary network object definitions from the API Server.

podSriovVFDeviceID, err := cniserver.GetPodContainerDeviceIDs(pod.Name, pod.Namespace)
if err != nil {
klog.ErrorS(err, "handleAddUpdatePod: GetPodContainerDeviceIDs failed for ", pod.Name, pod.Namespace)
// return nil here. we do not want to retry/requeue work if VF id fetch fails (user to check if VFs are created at node).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/node/Node
I'm not sure I see the harm in requeueing here. Yes, we will keep logging the error periodically, but maybe it's not bad. And if VFs are eventually created, then we will be able to recover. Otherwise, the used as to delete the Pod and recreate it, or something similar.


ipamResult, err = ipam.GetIPAMSubnetAddress(cniconfig, cmdArgs)
if err != nil {
return errors.New("Secondary network global IPAM failed")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we do not capitalize error messages like this one

for _, ip := range result.IPs {
ip.Interface = current.Int(1)
}
// If secondary interface name not provided in pod annotation, assign default name.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/pod/Pod

if len(netinfo.NetworkName) > 0 {
netDefCRD, err := pc.netAttachDefClient.NetworkAttachmentDefinitions(pod.Namespace).Get(context.TODO(), netinfo.NetworkName, metav1.GetOptions{})
if err != nil {
klog.Errorf("NetworkAttachmentDefinitions for %s returns err: %s", netinfo.NetworkName, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use structured logging

}
cniconfig, err := netdefutils.GetCNIConfig(netDefCRD, "")
if err != nil {
klog.Errorf("net-attach-def: CNI config spec read error %v", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@arunvelayutham arunvelayutham force-pushed the antrea-secondary-network-sriov-support branch from 54b9c12 to ee696e3 Compare October 1, 2021 01:29
cmd/antrea-agent/agent.go Outdated Show resolved Hide resolved
pkg/agent/secondarynetwork/cnipodcache/types.go Outdated Show resolved Hide resolved
pkg/agent/secondarynetwork/podwatch/controller.go Outdated Show resolved Hide resolved
pkg/agent/secondarynetwork/podwatch/controller.go Outdated Show resolved Hide resolved
pkg/agent/secondarynetwork/podwatch/controller.go Outdated Show resolved Hide resolved
pkg/agent/secondarynetwork/podwatch/controller.go Outdated Show resolved Hide resolved
pkg/agent/secondarynetwork/podwatch/controller.go Outdated Show resolved Hide resolved
pkg/agent/secondarynetwork/podwatch/controller.go Outdated Show resolved Hide resolved
pkg/agent/secondarynetwork/podwatch/controller.go Outdated Show resolved Hide resolved
pkg/agent/secondarynetwork/podwatch/controller.go Outdated Show resolved Hide resolved
@arunvelayutham arunvelayutham force-pushed the antrea-secondary-network-sriov-support branch 2 times, most recently from b075a3a to e9853b7 Compare October 1, 2021 16:07
)

const (
controllerName = "AntreaAgentPodController"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one more comment: not a big fan of this name, it doesn't imply that this is specific to secondary networks

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"SecondaryNetworkController"?

@antoninbas
Copy link
Contributor

@arunvelayutham Our jenkins CI jobs are currently broken (at leasts some of them are). However, I see that you have some Github CI jobs currently failing for this PR.

@arunvelayutham arunvelayutham force-pushed the antrea-secondary-network-sriov-support branch 2 times, most recently from c766b26 to 042049f Compare December 26, 2021 20:43
@arunvelayutham
Copy link
Contributor Author

/test-all

@arunvelayutham arunvelayutham force-pushed the antrea-secondary-network-sriov-support branch from 042049f to e9e90f8 Compare December 26, 2021 20:48
@arunvelayutham
Copy link
Contributor Author

/test-all

@arunvelayutham arunvelayutham force-pushed the antrea-secondary-network-sriov-support branch from e9e90f8 to aae7453 Compare January 5, 2022 23:30
@arunvelayutham
Copy link
Contributor Author

/test-all

@arunvelayutham arunvelayutham force-pushed the antrea-secondary-network-sriov-support branch from aae7453 to f3a8068 Compare January 5, 2022 23:49
@arunvelayutham
Copy link
Contributor Author

/test-all

@arunvelayutham arunvelayutham force-pushed the antrea-secondary-network-sriov-support branch from f3a8068 to 67ef0b8 Compare January 6, 2022 19:29
@arunvelayutham
Copy link
Contributor Author

/test-all

@arunvelayutham
Copy link
Contributor Author

/test-all-features-conformance

@arunvelayutham
Copy link
Contributor Author

/test-conformance

@arunvelayutham
Copy link
Contributor Author

/test-all

1 similar comment
@arunvelayutham
Copy link
Contributor Author

/test-all

Signed-off-by: arunvelayutham <arunkumar.velayutham@intel.com>
@arunvelayutham
Copy link
Contributor Author

/test-all

@arunvelayutham arunvelayutham force-pushed the antrea-secondary-network-sriov-support branch from 67ef0b8 to 7e0a14e Compare January 18, 2022 18:11
@arunvelayutham
Copy link
Contributor Author

/test-all

@arunvelayutham
Copy link
Contributor Author

/test-all-features-conformance

@jianjuns
Copy link
Contributor

/test-integration

@jianjuns
Copy link
Contributor

/test-all

@jianjuns
Copy link
Contributor

/test-integration

@jianjuns
Copy link
Contributor

/test-e2e

1 similar comment
@tnqn
Copy link
Member

tnqn commented Jan 19, 2022

/test-e2e

@jianjuns jianjuns merged commit f9dba58 into antrea-io:main Jan 19, 2022
yanjunz97 pushed a commit to yanjunz97/antrea that referenced this pull request Feb 14, 2022
Implement SR-IOV secondary network for Antrea CNI configured Pods

This commit implements SR-IOV secondary network that creates Pod
secondary network interfaces with SR-IOV VFs. At this moment, the
implementation supports only baremetal Nodes, not SR-IOV interfaces
on VM Nodes.

Signed-off-by: arunvelayutham <arunkumar.velayutham@intel.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants