Wait for ovs-vswitchd PID before calling ovs-appctl #2695

Merged
merged 1 commit into from
Sep 2, 2021

tnqn
Member

@tnqn tnqn commented Sep 1, 2021

Wait for the ovs-vswitchd PID to be available before calling ovs-appctl; otherwise the call may fail and crash the process.

In addition, clean up the OVS run files on exit so that a stale PID
is not used.

Signed-off-by: Quan Tian <qtian@vmware.com>

Fixes #2694
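
As a rough illustration of the idea in this description, the following sketch polls a PID file until it contains a valid PID, and only then would a caller go on to run ovs-appctl. It is not the code from this PR: the names `readPID`, `waitForPID` and `demoWaitForPID`, the intervals, and the temporary directory standing in for the OVS run directory are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
	"time"
)

// readPID reads a PID file and parses its content as a decimal number.
// (Hypothetical helper, not taken from the PR.)
func readPID(pidFile string) (int, error) {
	data, err := os.ReadFile(pidFile)
	if err != nil {
		return 0, err
	}
	return strconv.Atoi(strings.TrimSpace(string(data)))
}

// waitForPID polls pidFile every interval until it holds a valid PID or the
// timeout expires. The last read error is wrapped in the timeout error.
func waitForPID(pidFile string, interval, timeout time.Duration) (int, error) {
	deadline := time.Now().Add(timeout)
	for {
		pid, err := readPID(pidFile)
		if err == nil {
			return pid, nil
		}
		if time.Now().After(deadline) {
			return 0, fmt.Errorf("timed out waiting for %s: %w", pidFile, err)
		}
		time.Sleep(interval)
	}
}

// demoWaitForPID simulates a daemon that writes its PID file shortly after
// the caller starts waiting. A temporary directory stands in for the run
// directory, and 4242 is an arbitrary example PID.
func demoWaitForPID() (int, error) {
	dir, err := os.MkdirTemp("", "ovs-run")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)

	pidFile := filepath.Join(dir, "ovs-vswitchd.pid")
	go func() {
		time.Sleep(50 * time.Millisecond)
		_ = os.WriteFile(pidFile, []byte("4242\n"), 0o644)
	}()
	return waitForPID(pidFile, 10*time.Millisecond, 2*time.Second)
}

func main() {
	pid, err := demoWaitForPID()
	fmt.Println(pid, err)
}
```

A real caller would run the ovs-appctl command only after `waitForPID` returns without error, and would treat a timeout as a genuine failure.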

@codecov-commenter

codecov-commenter commented Sep 1, 2021

Codecov Report

Merging #2695 (f010674) into main (dded211) will increase coverage by 6.17%.
The diff coverage is 43.75%.

@@            Coverage Diff             @@
##             main    #2695      +/-   ##
==========================================
+ Coverage   59.50%   65.67%   +6.17%     
==========================================
  Files         285      285              
  Lines       23020    26380    +3360     
==========================================
+ Hits        13697    17326    +3629     
+ Misses       7865     7437     -428     
- Partials     1458     1617     +159     

| Flag | Coverage Δ |
|---|---|
| e2e-tests | 57.18% <43.75%> (?) |
| kind-e2e-tests | 48.40% <46.66%> (+1.65%) ⬆️ |
| unit-tests | 41.00% <0.00%> (-0.05%) ⬇️ |

| Impacted Files | Coverage Δ |
|---|---|
| pkg/ovs/ovsctl/interface.go | 0.00% <0.00%> (ø) |
| pkg/ovs/ovsctl/ovsctl_others.go | 57.14% <50.00%> (+7.14%) ⬆️ |
| pkg/controller/egress/ipallocator/allocator.go | 65.00% <0.00%> (-15.42%) ⬇️ |
| pkg/controller/networkpolicy/endpoint_querier.go | 77.64% <0.00%> (-13.79%) ⬇️ |
| pkg/legacyapis/core/v1alpha2/register.go | 69.23% <0.00%> (-10.77%) ⬇️ |
| pkg/apis/stats/register.go | 71.42% <0.00%> (-10.39%) ⬇️ |
| pkg/legacyapis/stats/register.go | 71.42% <0.00%> (-10.39%) ⬇️ |
| pkg/ovs/openflow/ofctrl_meter.go | 33.84% <0.00%> (-10.16%) ⬇️ |
| pkg/legacyapis/security/v1alpha1/register.go | 73.33% <0.00%> (-10.00%) ⬇️ |
| .../registry/networkpolicy/clustergroupmember/rest.go | 78.26% <0.00%> (-9.98%) ⬇️ |
... and 271 more

@tnqn tnqn added this to the Antrea v1.3 release milestone Sep 1, 2021
@tnqn
Member Author

tnqn commented Sep 1, 2021

/test-all

@jianjuns
Contributor

jianjuns commented Sep 1, 2021

What code at agent startup runs appctl? Why does it crash? Should we just retry there instead?

antoninbas
antoninbas previously approved these changes Sep 1, 2021
Contributor

@antoninbas antoninbas left a comment

LGTM

@tnqn
Member Author

tnqn commented Sep 2, 2021

What code at agent startup runs appctl? Why does it crash? Should we just retry there instead?

It's the step that validates whether OVS supports NAT. The linked issue #2694 has the details of the problem.
We could retry there, but I don't think that would be more readable: the higher-layer function doesn't know the details of the error, so it would look like it is retrying blindly, and people might then start adding retries to other steps too. Since this failure is expected and recoverable, I feel that hiding it from the caller and retrying directly is simpler.
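
The design choice argued for in this comment can be sketched roughly as follows: the low-level wrapper recognizes one specific, recoverable error and retries on it, while the higher-layer startup step calls it once and contains no retry logic. Everything here is hypothetical (`ctlClient`, `errNotReady`, `errFatal`, `startupCheck`, the command string, and the retry counts); the command execution is injected so the sketch runs without OVS.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errNotReady is a hypothetical sentinel for the expected, recoverable case
// in which the daemon's PID is not available yet.
var errNotReady = errors.New("daemon PID not available yet")

// errFatal is a hypothetical example of an error that must not be retried.
var errFatal = errors.New("permission denied")

// ctlClient is a hypothetical low-level wrapper around a control command.
type ctlClient struct {
	// exec performs a single attempt; it is injected for the sketch.
	exec     func(args ...string) (string, error)
	interval time.Duration
	attempts int
}

// RunCommand retries only on errNotReady, which this layer can recognize.
// Any other error is returned to the caller immediately.
func (c *ctlClient) RunCommand(args ...string) (string, error) {
	var lastErr error
	for i := 0; i < c.attempts; i++ {
		out, err := c.exec(args...)
		if err == nil {
			return out, nil
		}
		if !errors.Is(err, errNotReady) {
			return "", err
		}
		lastErr = err
		time.Sleep(c.interval)
	}
	return "", fmt.Errorf("still not ready after %d attempts: %w", c.attempts, lastErr)
}

// startupCheck stands in for a higher-layer startup step. It has no retry
// logic of its own and does not need to know why a call might be retried.
func startupCheck(c *ctlClient) error {
	_, err := c.RunCommand("hypothetical/command")
	return err
}

// demoHiddenRetry runs startupCheck against a fake exec function that either
// always fails with fatal, or fails notReadyCalls times with errNotReady
// before succeeding. It reports how many attempts were made.
func demoHiddenRetry(notReadyCalls int, fatal error) (calls int, err error) {
	c := &ctlClient{
		exec: func(args ...string) (string, error) {
			calls++
			if fatal != nil {
				return "", fatal
			}
			if calls <= notReadyCalls {
				return "", errNotReady
			}
			return "ok", nil
		},
		interval: time.Millisecond,
		attempts: 5,
	}
	err = startupCheck(c)
	return calls, err
}

func main() {
	fmt.Println(demoHiddenRetry(2, nil))
	fmt.Println(demoHiddenRetry(0, errFatal))
}
```

With this split, a blanket retry in the caller is unnecessary, and unexpected errors still surface on the first attempt.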

antoninbas
antoninbas previously approved these changes Sep 2, 2021
@tnqn
Member Author

tnqn commented Sep 2, 2021

/test-all

@tnqn
Member Author

tnqn commented Sep 2, 2021

@antoninbas While verifying the change, I found that I never got the expected "Waited for" log and it still crashed sometimes. This was because I was checking the condition func's return value err instead of readErr.
I have fixed and verified it, PTAL.
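
The kind of mistake described in this comment might look like the sketch below. It is a hypothetical reconstruction, not the PR's code: `poll` is a small standard-library stand-in for a polling helper that takes a condition func returning `(bool, error)`, and `flakyReader`, `waitBuggy` and `waitFixed` are names invented for the example. In the buggy variant the condition inspects `err`, which is still nil while the condition runs, so the wait ends on the first attempt even though the read failed.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// poll calls cond every interval until it reports done, returns an error,
// or the timeout expires. (Minimal stand-in for a polling helper.)
func poll(interval, timeout time.Duration, cond func() (bool, error)) error {
	deadline := time.Now().Add(timeout)
	for {
		done, err := cond()
		if err != nil {
			return err
		}
		if done {
			return nil
		}
		if time.Now().After(deadline) {
			return errors.New("timed out waiting for the condition")
		}
		time.Sleep(interval)
	}
}

// flakyReader returns a read function that fails for the first n calls and
// then returns the arbitrary example PID 4242.
func flakyReader(n int) func() (int, error) {
	calls := 0
	return func() (int, error) {
		calls++
		if calls <= n {
			return 0, errors.New("pid file not found")
		}
		return 4242, nil
	}
}

// waitBuggy checks the wrong variable inside the condition.
func waitBuggy(read func() (int, error)) (pid int, err error) {
	var readErr error
	err = poll(time.Millisecond, time.Second, func() (bool, error) {
		pid, readErr = read()
		// BUG: err is the outer result, still nil at this point, so the
		// first attempt always counts as success.
		if err != nil {
			return false, nil
		}
		return true, nil
	})
	if err != nil {
		return 0, fmt.Errorf("failed to read PID: %v", readErr)
	}
	return pid, nil
}

// waitFixed checks readErr, the result of the read attempt itself.
func waitFixed(read func() (int, error)) (pid int, err error) {
	var readErr error
	err = poll(time.Millisecond, time.Second, func() (bool, error) {
		pid, readErr = read()
		if readErr != nil {
			return false, nil // not ready yet, keep polling
		}
		return true, nil
	})
	if err != nil {
		return 0, fmt.Errorf("failed to read PID: %v", readErr)
	}
	return pid, nil
}

func main() {
	fmt.Println(waitBuggy(flakyReader(2)))
	fmt.Println(waitFixed(flakyReader(2)))
}
```

Because both variants compile, this kind of slip only shows up at runtime, for example when an expected log line never appears.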


Successfully merging this pull request may close these issues.

Agent crashed because of error when listing DP features occasionally
4 participants