
problem encountered while using HostNetworkDNSPolicy #92276

Closed
david-enli opened this issue Jun 18, 2020 · 11 comments
Labels
kind/bug: Categorizes issue or PR as related to a bug.
sig/network: Categorizes an issue or PR as relevant to SIG Network.
triage/needs-information: Indicates an issue needs more information in order to work on it.

Comments

@david-enli

Hi all, first time posting and apologies if I'm doing it wrong. Since the release of k8s 1.16.10, our Pure Storage CSI node plugin pods for iSCSI, deployed by a DaemonSet, have had trouble talking to Kubernetes services.

We deploy a CSI node plugin pod on each master and worker node. However, it looks like network routing on the worker nodes is problematic. The masters resolve our internal k8s service correctly:

# nslookup pso-db-public.nstk
Server:    10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local
Name:      pso-db-public.nstk
Address 1: 10.102.98.125 pso-db-public.nstk.svc.cluster.local

but the worker nodes show the problem; the lookup never gets past the server line:

/ # nslookup pso-db-public.nstk
Server:    10.96.0.10

Here are our DaemonSet pod network and DNS settings:

hostNetwork: true
dnsPolicy: ClusterFirstWithHostNet
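
For illustration, here is a minimal sketch of how those two fields sit in the DaemonSet manifest; the name, image, and labels below are placeholders, not our actual plugin spec:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: csi-node-plugin   # placeholder name
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: csi-node-plugin
  template:
    metadata:
      labels:
        app: csi-node-plugin
    spec:
      hostNetwork: true                    # pod shares the node's network namespace
      dnsPolicy: ClusterFirstWithHostNet   # but still resolve cluster services via kube-dns
      tolerations:
        - operator: Exists                 # run on master nodes despite taints
      containers:
        - name: plugin
          image: example.com/csi-node-plugin:v1.0   # placeholder image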

Since we first discovered this issue, we've been hitting it in 1.17.1, 1.17.5, and 1.17.6. We're still trying out more versions, but it looks like this hasn't been fixed since 1.16.10. Thank you very much for your help.

@david-enli david-enli added the kind/bug Categorizes issue or PR as related to a bug. label Jun 18, 2020
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Jun 18, 2020
@david-enli
Author

/sig network

@k8s-ci-robot k8s-ci-robot added sig/network Categorizes an issue or PR as relevant to SIG Network. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jun 18, 2020
@athenabot

/triage unresolved

Comment /remove-triage unresolved when the issue is assessed and confirmed.

🤖 I am a bot run by vllry. 👩‍🔬

@k8s-ci-robot k8s-ci-robot added the triage/unresolved Indicates an issue that can not or will not be resolved. label Jun 18, 2020
@wangyira

/assign @rikatz

@rikatz
Contributor

rikatz commented Jun 29, 2020

Hey @david-enli let's take a look into that.

I didn't understand the first part of the issue: when you run nslookup from the nodes, do they fail to resolve the address correctly?

Also, can you please provide some further information:

  • Kubernetes version (use kubectl version):
  • Cloud provider or hardware configuration:
  • OS (e.g. cat /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Network plugin (CNI) and version (if this is a network-related bug):
  • Others:
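
If it helps, most of this can be gathered with the following (run where you have kubectl access and on an affected node, respectively):

# cluster and client versions
kubectl version
# OS and kernel details from an affected node
cat /etc/os-release
uname -a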

Thank you

@rikatz
Contributor

rikatz commented Jun 29, 2020

/triage needs-information

@k8s-ci-robot k8s-ci-robot added the triage/needs-information Indicates an issue needs more information in order to work on it. label Jun 29, 2020
@rikatz
Contributor

rikatz commented Jul 1, 2020

@david-enli friendly ping

@athenabot

@rikatz
If this issue has been triaged, please comment /remove-triage unresolved.

If you aren't able to handle this issue, consider unassigning yourself and/or adding the help-wanted label.

🤖 I am a bot run by vllry. 👩‍🔬

@rikatz
Contributor

rikatz commented Jul 6, 2020

/remove-triage unresolved

@k8s-ci-robot k8s-ci-robot removed the triage/unresolved Indicates an issue that can not or will not be resolved. label Jul 6, 2020
@david-enli
Author

david-enli commented Jul 6, 2020

hi @rikatz I sincerely apologize for the delay; this was logged using a different git account and I missed your previous pings. In the meantime, a team member came across the open issue flannel-io/flannel#1243, and we have adopted one of the solutions provided in its comments: logging into the node and turning off checksum offload on the flannel device by running ethtool -K flannel.1 tx-checksum-ip-generic off.
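
For anyone else hitting this, a rough sketch of how we check and apply the workaround on an affected node (this assumes flannel in VXLAN mode, so the flannel.1 interface exists):

# inspect the current offload setting on the flannel VXLAN interface
ethtool -k flannel.1 | grep tx-checksum-ip-generic

# disable checksum offload (workaround from flannel-io/flannel#1243);
# note the setting does not persist across reboots or interface re-creation
ethtool -K flannel.1 tx-checksum-ip-generic off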

This issue has been present for us since 1.16.10, in both Ubuntu and CentOS environments. The network plugin is flannel. Again, apologies if this is not the right place for the issue.

@rikatz
Contributor

rikatz commented Jul 6, 2020

@david-enli no problem :D

I was trying to figure out whether this was also related to #88986, and it seems to be. There's a PR for this already merged (#92035), along with a good explanation of why it happens.

As this is a dup and already solved, I'll close this issue, but please feel free to reopen if you think there's anything else to deal with.

Tks

/close

@k8s-ci-robot
Contributor

@rikatz: Closing this issue.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
