-
Notifications
You must be signed in to change notification settings - Fork 362
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Check the existence of CRD AntreaAgentInfo before operating on it #4762
Conversation
pkg/monitor/controller.go
Outdated
@@ -365,3 +379,14 @@ func (monitor *controllerMonitor) deleteAgentCRD(name string) error { | |||
} | |||
return nil | |||
} | |||
|
|||
func (monitor *controllerMonitor) AgentInfoCRDInstalled() bool { | |||
_, err := monitor.apiExtensionClient.ApiextensionsV1().CustomResourceDefinitions().Get(context.TODO(), antreaAgentInfoCRDName, metav1.GetOptions{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this check is needed? I suppose CRD should be always there? Will CRD be deleted in any case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When antrea is running with agentless mode, CRD of antreaagentinfo may not create as it is not needed.
pkg/monitor/controller.go
Outdated
@@ -365,3 +379,14 @@ func (monitor *controllerMonitor) deleteAgentCRD(name string) error { | |||
} | |||
return nil | |||
} | |||
|
|||
func (monitor *controllerMonitor) AgentInfoCRDInstalled() bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clientset.Interface
has Discovery()
which can be used to check if an API exists. We already use it to check EndpointSlice API. This is an example:
antrea/pkg/agent/proxy/proxier.go
Lines 1158 to 1173 in f1545c1
func endpointSliceAPIAvailable(k8sClient clientset.Interface) (bool, error) { | |
resources, err := k8sClient.Discovery().ServerResourcesForGroupVersion(discovery.SchemeGroupVersion.String()) | |
if err != nil { | |
// The group version doesn't exist. | |
if errors.IsNotFound(err) { | |
return false, nil | |
} | |
return false, fmt.Errorf("error getting server resources for GroupVersion %s: %v", discovery.SchemeGroupVersion.String(), err) | |
} | |
for _, resource := range resources.APIResources { | |
if resource.Kind == "EndpointSlice" { | |
return true, nil | |
} | |
} | |
return false, nil | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it should be a private method, and something like antreaAgentInfoAPIAvailable
reflects the implementation more accurately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reminder
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated.
pkg/monitor/controller.go
Outdated
if !c.AgentInfoCRDInstalled() { | ||
return nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should just not run these goroutines instead of check the API's existence on every Node event?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible that at time 1 the antreaagentinfo CRD is not deployed, but at time2 the CRD is applied. If we only have a single check at the initial run, the case is not supported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code doesn't automatically call deleteStaleAgentCRDs
and syncNode
when the CRD is added after starting controller, right? Then it would just lead to an incomplete situation that some Nodes are handled while others are not, so a restart is needed anyway. Besides, there seems no reason to apply CRD after running controller? Could other features consuming CRD work correctly? But anyway I feel not worth to add repeated API calls to partially support a scenario that we don't have yet.
And I think the point of the PR is to avoid adding a configuration that controls whether to run the AntreaAgentInfo controller, by auto-discovering it. If the auto-discovery becomes more complicated and even double APIs call of normal mode, it may be not worth.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed.
25470da
to
c9b1e6a
Compare
pkg/monitor/controller.go
Outdated
@@ -365,3 +379,14 @@ func (monitor *controllerMonitor) deleteAgentCRD(name string) error { | |||
} | |||
return nil | |||
} | |||
|
|||
func (monitor *controllerMonitor) AgentInfoCRDInstalled() bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it should be a private method, and something like antreaAgentInfoAPIAvailable
reflects the implementation more accurately.
07922a2
to
73015ee
Compare
pkg/monitor/controller.go
Outdated
klog.ErrorS(err, "error getting server resources for GroupVersion", "groupVersion", v1beta1.SchemeGroupVersion.String()) | ||
return false | ||
} | ||
return false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should retry on non-NotFound error, which may mean the APIServer is temporarily unavailable or there is a networking issue. Otherwise it may return wrong result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated accordingly.
pkg/monitor/controller.go
Outdated
go wait.Until(monitor.nodeWorker, time.Second, stopCh) | ||
if monitor.externalNodeEnabled { | ||
go wait.Until(monitor.externalNodeWorker, time.Second, stopCh) | ||
if monitor.antreaAgentInfoAPIAvailable() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should log the result for visibility and troubleshooting. For example:
if monitor.antreaAgentInfoAPIAvailable() {
klog.InfoS("The AntreaAgentInfo API is available, running node workers")
monitor.deleteStaleAgentCRDs()
...
} else {
klog.InfoS("The AntreaAgentInfo API is unavailable, will not run node workers")
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
650d2a7
to
b739e68
Compare
pkg/monitor/controller_test.go
Outdated
@@ -624,3 +642,38 @@ func TestEnqueueExternalNode(t *testing.T) { | |||
obj, _ := controller.controllerMonitor.externalNodeQueue.Get() | |||
assert.Equal(t, expectedkey, obj.(string)) | |||
} | |||
|
|||
func TestAgentInfoCRDInstalled(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test name inconsistent with func name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated.
Signed-off-by: wenyingd <wenyingd@vmware.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
/skip-all |
…trea-io#4762) Signed-off-by: wenyingd <wenyingd@vmware.com>
…trea-io#4762) Signed-off-by: wenyingd <wenyingd@vmware.com>
Antrea Controller checks the existence of AntreaAgentInfo API before creating or deleting the resources for worker Node or ExternalNode.