Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[windows] Fix bug in configureContainerLink #3089

Merged
merged 1 commit into from
Dec 17, 2021

Conversation

XinShuYang
Copy link
Contributor

@XinShuYang XinShuYang commented Dec 3, 2021

We should use containerNetNS when deciding if a container is an infra one.

Signed-off-by: Shuyang Xin gavinx@vmware.com

Fixes #3077

@codecov-commenter
Copy link

codecov-commenter commented Dec 3, 2021

Codecov Report

Merging #3089 (ccdf733) into main (c135609) will decrease coverage by 0.70%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3089      +/-   ##
==========================================
- Coverage   58.76%   58.06%   -0.71%     
==========================================
  Files         292      293       +1     
  Lines       24724    24823      +99     
==========================================
- Hits        14530    14414     -116     
- Misses       8622     8828     +206     
- Partials     1572     1581       +9     
Flag Coverage Δ
kind-e2e-tests 44.49% <ø> (-0.90%) ⬇️
unit-tests 40.02% <ø> (-0.13%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...ver/registry/controlplane/nodestatssummary/rest.go 50.00% <0.00%> (-50.00%) ⬇️
...egator/apiserver/handlers/recordmetrics/handler.go 0.00% <0.00%> (-44.45%) ⬇️
pkg/apiserver/handlers/loglevel/handler.go 0.00% <0.00%> (-38.47%) ⬇️
pkg/agent/cniserver/pod_configuration_linux.go 66.66% <0.00%> (-33.34%) ⬇️
...gregator/apiserver/handlers/flowrecords/handler.go 0.00% <0.00%> (-23.64%) ⬇️
pkg/agent/nodeportlocal/k8s/annotations.go 83.87% <0.00%> (-16.13%) ⬇️
pkg/agent/controller/networkpolicy/reject.go 75.80% <0.00%> (-12.10%) ⬇️
pkg/log/log_level.go 0.00% <0.00%> (-10.00%) ⬇️
.../listers/security/v1alpha1/clusternetworkpolicy.go 0.00% <0.00%> (-9.10%) ⬇️
pkg/controller/networkpolicy/status_controller.go 71.24% <0.00%> (-5.89%) ⬇️
... and 25 more

Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change itself LGTM. However, I didn't figure out how it resolves the issue in #3077, could you explain more details?

BTW, the signed email address seems wrong.

@XinShuYang
Copy link
Contributor Author

XinShuYang commented Dec 5, 2021

I think using the wrong parameter in isInfraContainer() could cause errors during reattach HNS Endpoint to the container. And this test failed due to the timeout waiting for connectivity test after updating pod label.
The email address has been corrected, thanks. @tnqn

@tnqn
Copy link
Member

tnqn commented Dec 7, 2021

I think using the wrong parameter in isInfraContainer() could cause errors during reattach HNS Endpoint to the container. And this test failed due to the timeout waiting for connectivity test after updating pod label.

I thought the changed code only affects the cleanup of HNSEndpoint when there was something wrong. When a Pod was recreated, did it encounter the attach error? Did you see the log "failed to configure container IP" when the test failed?

/test-windows-networkpolicy

The email address has been corrected, thanks. @tnqn

You also need to update the email address of your git setting and update the author and email of the patch to be consistent with the DCO sign. Currently DCO check failed:

Commit sha: 779b736, Author: Shuyang Xin, Committer: Shuyang Xin; Expected "Shuyang Xin gavinx@gavinx-a01.vmware.com", but got "Shuyang Xin gavinx@vmware.com".

We should use containerNetNS when deciding if a container is an infra one.

Signed-off-by: Shuyang Xin <gavinx@vmware.com>
@XinShuYang
Copy link
Contributor Author

I think using the wrong parameter in isInfraContainer() could cause errors during reattach HNS Endpoint to the container. And this test failed due to the timeout waiting for connectivity test after updating pod label.

I thought the changed code only affects the cleanup of HNSEndpoint when there was something wrong. When a Pod was recreated, did it encounter the attach error? Did you see the log "failed to configure container IP" when the test failed?

Yes, I can see "failed to configure container IP" many times when test failed from the agent log. I thought it was because the endpoint was always removed after any non-pause container failed to attach the endpoint. This incorrect behavior happened because the isInfraContainer() function only return true at this time.

After fixing this bug, there was no incorrect endpoint removing info in the log. The test can be passed with good connection status of the client pod.

/test-windows-networkpolicy

The email address has been corrected, thanks. @tnqn

You also need to update the email address of your git setting and update the author and email of the patch to be consistent with the DCO sign. Currently DCO check failed:

Commit sha: 779b736, Author: Shuyang Xin, Committer: Shuyang Xin; Expected "Shuyang Xin gavinx@gavinx-a01.vmware.com", but got "Shuyang Xin gavinx@vmware.com".

The DCO check can pass this time, thanks for your reminder.

@tnqn
Copy link
Member

tnqn commented Dec 9, 2021

@XinShuYang thanks for the explanation. Why wasn't there an issue before this change ff17f09? It seems it also removed the hnsendpoint unconditionally. Was it because of the change at ff17f09#diff-17bccd4a78564ad9445047f0815ca61a4590f9119a333b9f619a14c9c3999b9bL302?

@XinShuYang
Copy link
Contributor Author

XinShuYang commented Dec 10, 2021

@XinShuYang thanks for the explanation. Why wasn't there an issue before this change ff17f09? It seems it also removed the hnsendpoint unconditionally. Was it because of the change at ff17f09#diff-17bccd4a78564ad9445047f0815ca61a4590f9119a333b9f619a14c9c3999b9bL302?

@tnqn Before ff17f09 , attachContainerLink() won't return error if non-pause container failed to attach endpoint. According to this commit, any container which failed to attach endpoint will return error, then isInfraContainer() should figure out whether it is a pause container. If it is not a pause container, we should never remove the endpoint. The new bug fix make the isInfraContainer() work correctly on windows.

@XinShuYang
Copy link
Contributor Author

/test-windows-all

@XinShuYang
Copy link
Contributor Author

@tnqn Hi quan, since this bug is blocking windows networkpolicy CI test, could you merge it first?

Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tnqn
Copy link
Member

tnqn commented Dec 17, 2021

/skip-e2e
/skip-conformance
/skip-networkpolicy
As it doesn't change linux code.

@tnqn tnqn merged commit ebeb2d4 into antrea-io:main Dec 17, 2021
@tnqn tnqn added action/backport Indicates a PR that requires backports. and removed action/backport Indicates a PR that requires backports. labels Jan 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Windows network policy case fail: should allow ingress access from updated pod
3 participants