Malformed errors returned when key-value store is locked #232

Closed
PatrickLang opened this issue Aug 15, 2018 · 4 comments

@PatrickLang
Contributor

Is this a request for help?: No


Is this an ISSUE or FEATURE REQUEST? (choose one): Issue


Which release version?: v1.0.11


Which component (CNI/IPAM/CNM/CNS): CNI


Which Operating System (Linux/Windows): Windows Server version 1803

Which Orchestrator and version (e.g. Kubernetes, Docker): Kubernetes v1.10.6


What happened:

When I scale up to several pods on the same node, overlapping calls to the Azure CNI binary run into store-locking errors.

2018/08/15 21:11:45 Going to send Telemetry report to hostnetagent http://169.254.169.254/machine/plugins?comp=netagent&type=cnireport
2018/08/15 21:11:45 "Start Flag false CniSucceeded false Name CNI Version v1.0.11 ErrorMessage Store is locked vnet [] 
				Context AzureCNI SubContext "
2018/08/15 21:11:45 OrchestratorDetails &{  kubectl command failed due to exit status 1}
2018/08/15 21:11:45 OSDetails &{windows    }
2018/08/15 21:11:45 SystemDetails &{0 0 0 0 0 0 }
2018/08/15 21:11:45 InterfaceDetails &{Primary 10.240.0.0/12 10.240.0.4 00:0d:3a:f9:3a:fe vEthernet (Ethernet 2) 30 0 }
2018/08/15 21:11:45 BridgeDetails <nil>
2018/08/15 21:11:45 Send telemetry success 200
2018/08/15 21:11:45 Going to send Telemetry report to hostnetagent http://169.254.169.254/machine/plugins?comp=netagent&type=cnireport
2018/08/15 21:11:45 "Start Flag false CniSucceeded false Name CNI Version v1.0.11 ErrorMessage Store is locked vnet [] 
				Context AzureCNI SubContext "
2018/08/15 21:11:45 OrchestratorDetails &{  kubectl command failed due to exit status 1}
2018/08/15 21:11:45 OSDetails &{windows    }
2018/08/15 21:11:45 SystemDetails &{0 0 0 0 0 0 }
2018/08/15 21:11:45 InterfaceDetails &{Primary 10.240.0.0/12 10.240.0.4 00:0d:3a:f9:3a:fe vEthernet (Ethernet 2) 30 0 }
2018/08/15 21:11:45 BridgeDetails <nil>
2018/08/15 21:11:45 Send telemetry success 200
2018/08/15 21:11:45 Going to send Telemetry report to hostnetagent http://169.254.169.254/machine/plugins?comp=netagent&type=cnireport
2018/08/15 21:11:45 "Start Flag false CniSucceeded false Name CNI Version v1.0.11 ErrorMessage Store is locked vnet [] 
				Context AzureCNI SubContext "
2018/08/15 21:11:45 OrchestratorDetails &{  kubectl command failed due to exit status 1}
2018/08/15 21:11:45 OSDetails &{windows    }
2018/08/15 21:11:45 SystemDetails &{0 0 0 0 0 0 }
2018/08/15 21:11:45 InterfaceDetails &{Primary 10.240.0.0/12 10.240.0.4 00:0d:3a:f9:3a:fe vEthernet (Ethernet 2) 30 0 }
2018/08/15 21:11:45 BridgeDetails <nil>
2018/08/15 21:11:45 Send telemetry success 200
2018/08/15 21:11:45 Going to send Telemetry report to hostnetagent http://169.254.169.254/machine/plugins?comp=netagent&type=cnireport
2018/08/15 21:11:45 "Start Flag false CniSucceeded true Name CNI Version v1.0.11 ErrorMessage  vnet [] 
				Context AzureCNI SubContext "
2018/08/15 21:11:45 OrchestratorDetails &{  kubectl command failed due to exit status 1}
2018/08/15 21:11:45 OSDetails &{windows    }
2018/08/15 21:11:45 SystemDetails &{0 0 0 0 0 0 }
2018/08/15 21:11:45 InterfaceDetails &{Primary 10.240.0.0/12 10.240.0.4 00:0d:3a:f9:3a:fe vEthernet (Ethernet 2) 30 0 }
2018/08/15 21:11:45 BridgeDetails <nil>
2018/08/15 21:11:45 Send telemetry success 200
2018/08/15 21:11:45 SetReportState succeeded
2018/08/15 21:11:45 SetReportState succeeded
2018/08/15 21:11:45 SetReportState succeeded
2018/08/15 21:11:45 SetReportState succeeded
E0815 21:11:45.950994    4188 cni.go:259] Error adding network: netplugin failed but error parsing its diagnostic message "": unexpected end of JSON input
E0815 21:11:45.950994    4188 cni_windows.go:49] error while adding to cni network: netplugin failed but error parsing its diagnostic message "": unexpected end of JSON input
W0815 21:11:45.950994    4188 docker_sandbox.go:353] failed to read pod IP from plugin/docker: NetworkPlugin cni failed on the status hook for pod "iis-1803-5ffd8b84d6-46msm_default": netplugin failed but error parsing its diagnostic message "": unexpected end of JSON input
I0815 21:11:45.957988    4188 kubelet_node_status.go:491] Using Node Hostname from cloudprovider: "15453k8s9001"
E0815 21:11:45.971011    4188 cni.go:259] Error adding network: netplugin failed but error parsing its diagnostic message "": unexpected end of JSON input
E0815 21:11:45.971011    4188 cni_windows.go:49] error while adding to cni network: netplugin failed but error parsing its diagnostic message "": unexpected end of JSON input
W0815 21:11:45.971011    4188 docker_sandbox.go:353] failed to read pod IP from plugin/docker: NetworkPlugin cni failed on the status hook for pod "iis-1803-5ffd8b84d6-vdppz_default": netplugin failed but error parsing its diagnostic message "": unexpected end of JSON input
E0815 21:11:45.991997    4188 cni.go:259] Error adding network: netplugin failed but error parsing its diagnostic message "": unexpected end of JSON input
E0815 21:11:45.992999    4188 cni_windows.go:49] error while adding to cni network: netplugin failed but error parsing its diagnostic message "": unexpected end of JSON input
W0815 21:11:45.992999    4188 docker_sandbox.go:353] failed to read pod IP from plugin/docker: NetworkPlugin cni failed on the status hook for pod "iis-1803-5ffd8b84d6-sjlmm_default": netplugin failed but error parsing its diagnostic message "": unexpected end of JSON input
E0815 21:11:46.006993    4188 cni.go:259] Error adding network: netplugin failed but error parsing its diagnostic message "": unexpected end of JSON input
E0815 21:11:46.006993    4188 cni_windows.go:49] error while adding to cni network: netplugin failed but error parsing its diagnostic message "": unexpected end of JSON input
W0815 21:11:46.006993    4188 docker_sandbox.go:353] failed to read pod IP from plugin/docker: NetworkPlugin cni failed on the status hook for pod "iis-1803-5ffd8b84d6-2rz9c_default": netplugin failed but error parsing its diagnostic message "": unexpected end of JSON input
2018/08/15 21:11:46 SetReportState succeeded
2018/08/15 21:11:41 [cni] Timed out on locking store, err:Store is locked.
2018/08/15 21:11:46 Failed to initialize key-value store of network plugin, err:Store is locked.
2018/08/15 21:11:46 Report plugin error
2018/08/15 21:11:41 [cni] Timed out on locking store, err:Store is locked.
2018/08/15 21:11:46 Failed to initialize key-value store of network plugin, err:Store is locked.
2018/08/15 21:11:46 Report plugin error
2018/08/15 21:11:41 [cni] Timed out on locking store, err:Store is locked.
2018/08/15 21:11:46 Failed to initialize key-value store of network plugin, err:Store is locked.
2018/08/15 21:11:46 Report plugin error
2018/08/15 21:11:41 [cni] Timed out on locking store, err:Store is locked.
2018/08/15 21:11:46 Failed to initialize key-value store of network plugin, err:Store is locked.
2018/08/15 21:11:46 Report plugin error
2018/08/15 21:11:41 [cni] Timed out on locking store, err:Store is locked.
2018/08/15 21:11:46 Failed to initialize key-value store of network plugin, err:Store is locked.
2018/08/15 21:11:46 Report plugin error
2018/08/15 21:11:46 [cni] Timed out on locking store, err:Store is locked.
2018/08/15 21:11:46 Failed to initialize key-value store of network plugin, err:Store is locked.
2018/08/15 21:11:46 Report plugin error
2018/08/15 21:11:46 [cni] Timed out on locking store, err:Store is locked.
2018/08/15 21:11:46 Failed to initialize key-value store of network plugin, err:Store is locked.
2018/08/15 21:11:46 Report plugin error

What you expected to happen:

After CNI fails to get the lock, kubelet logs E0815 21:11:45.950994 4188 cni.go:259] Error adding network: netplugin failed but error parsing its diagnostic message "": unexpected end of JSON input. I would expect a valid, parseable error instead.

I looked at the source:

log.Printf("Failed to initialize key-value store of network plugin, err:%v.\n", err)

and it has return code 1 with nothing on stderr
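
To illustrate why an empty diagnostic produces exactly this message, here is a minimal sketch, assuming the caller decodes the plugin's output with Go's encoding/json. The cniError type and parseDiagnostic function are hypothetical names for illustration, not the actual kubelet or libcni code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cniError mirrors the error object from the CNI spec (hypothetical type name).
type cniError struct {
	CNIVersion string `json:"cniVersion"`
	Code       uint   `json:"code"`
	Msg        string `json:"msg"`
	Details    string `json:"details,omitempty"`
}

// parseDiagnostic is a hypothetical stand-in for what a caller does with the
// plugin's output after a non-zero exit: decode it as a CNI error object.
func parseDiagnostic(out []byte) (*cniError, error) {
	var e cniError
	if err := json.Unmarshal(out, &e); err != nil {
		return nil, fmt.Errorf("netplugin failed but error parsing its diagnostic message %q: %v", string(out), err)
	}
	return &e, nil
}

func main() {
	// Empty output: decoding fails before any error details can be read.
	_, err := parseDiagnostic(nil)
	fmt.Println(err)

	// Well-formed error object: the message survives and can be logged.
	e, err := parseDiagnostic([]byte(`{"cniVersion":"0.3.1","code":100,"msg":"Store is locked"}`))
	fmt.Println(e.Msg, err)
}
```

With empty output the decoder has nothing to read, so the real cause ("Store is locked") never reaches the caller.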

What I would expect is a return code >100 (vendor specific), along with this JSON structure on stdout:

{
  "cniVersion": "0.3.1",
  "code": <numeric-error-code>,
  "msg": <short-error-message>,
  "details": <long-error-message> (optional)
}
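
A rough sketch of how a plugin could emit that structure when the store cannot be opened. The names formatCNIError and reportError, and the code value 100, are assumptions made for illustration; this is not the Azure CNI implementation.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"os"
)

// errCodeStoreLocked is a hypothetical plugin-specific error code.
const errCodeStoreLocked = 100

// cniError mirrors the error object shown above (hypothetical type name).
type cniError struct {
	CNIVersion string `json:"cniVersion"`
	Code       uint   `json:"code"`
	Msg        string `json:"msg"`
	Details    string `json:"details,omitempty"`
}

// formatCNIError renders the error object as JSON.
func formatCNIError(code uint, msg, details string) ([]byte, error) {
	return json.Marshal(cniError{CNIVersion: "0.3.1", Code: code, Msg: msg, Details: details})
}

// reportError writes the error object to w and returns the exit status the
// plugin should use (always non-zero here).
func reportError(w io.Writer, code uint, msg string, cause error) int {
	b, err := formatCNIError(code, msg, cause.Error())
	if err != nil {
		return 1
	}
	fmt.Fprintln(w, string(b))
	return 1
}

func main() {
	// Simulated failure, standing in for the store initialization error.
	cause := errors.New("Store is locked")
	status := reportError(os.Stdout, errCodeStoreLocked,
		"Failed to initialize key-value store of network plugin", cause)
	// A real plugin would call os.Exit(status) here; the sketch only reports it.
	fmt.Fprintln(os.Stderr, "would exit with status", status)
}
```

The point of the sketch is only that the JSON is written before the process exits, so the caller has something to parse.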

How to reproduce it (as minimally and precisely as possible):

Try to scale anything up on Windows. It will eventually succeed, but there will be errors in the process as multiple pods need addEndpoint called simultaneously.

@daschott

Could this be related at all to moby/libnetwork#1950 ?

@lowenna
Contributor

lowenna commented Sep 12, 2018

Looking at the code here, I strongly suspect this is another instance of the problem addressed by etcd-io/bbolt#122, which fixes moby/libnetwork#1950. This repo uses libnetwork, which in turn uses libkv, so it would make sense that the two are related.

So far:

Once moby/libnetwork#2268 is merged, someone needs to fix up <insert component, CNI - where is that repo - @PatrickLang any idea?> to move to the updated libnetwork to fix this.

@daschott

@dineshgovindasamy @madhanrm Are there any other CNI fixes needed here to move to updated libnetwork as @jhowardmsft indicates?

@PatrickLang
Contributor Author

Resolved in #247, merged into acs-engine Azure/acs-engine#3989


3 participants