Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Log to standard output result in garbage character in add function #1248

Closed
ozaditya opened this issue Oct 4, 2020 · 6 comments
Closed

Log to standard output result in garbage character in add function #1248

ozaditya opened this issue Oct 4, 2020 · 6 comments
Assignees
Labels

Comments

@ozaditya
Copy link

ozaditya commented Oct 4, 2020

What happened:

After setting environment variable AWS_VPC_K8S_PLUGIN_LOG_FILE to stdout and deploying CNI version 1.7.3.
I have observed that CFN fails to add ip to pods.

Logs from CNI:

Send AddNetworkReply: IPv4Addr 10.1.161.151, DeviceNumber: 0, err: \u003cnil\u003e"}
Received DelNetwork for Sandbox 4da84e08dd17a4e2cceacbf31d04801575cfab2e19fae462a17c95921b422ee5"}
DelNetworkRequest: ClientVersion:\"v1.7.3\" K8S_POD_NAME:\"datadog-agent-wt4hd\" K8S_POD_NAMESPACE:\"datadog\" K8S_POD_INFRA_CONTAINER_ID:\"4da84e08dd17a4e2cceacbf31d04801575cfab2e19fae462a17c95921b422ee5\" Reason:\"PodDeleted\" ContainerID:\"4da84e08dd17a4e2cceacbf31d04801575cfab2e19fae462a17c95921b422ee5\"

Logs from kubelet:

Error adding datadog_datadog-agent-wt4hd/4da84e08dd17a4e2cceacbf31d04801575cfab2e19fae462a17c95921b422ee5 to network aws-cni/aws-cni: invalid character '{' after top-level value
RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to set up sandbox container "4da84e08dd17a4e2cceacbf31d04801575cfab2e19fae462a17c95921b422ee5" network for pod "datadog-agent-wt4hd": NetworkPlugin cni failed to set up pod "datadog-agent-wt4hd_datadog" network: invalid character '{' after top-level value

What you expected to happen:

Since its documented in Read Me that CNI log can be sent on stdout. I expect CNI to function properly.

snippet:
AWS_VPC_K8S_CNI_LOG_FILE
Type: String
Default: /host/var/log/aws-routed-eni/ipamd.log
Valid Values: stdout or a file path
Specifies where to write the logging output of ipamd. Either to stdout or to override the default file (i.e., /var/log/aws-routed-eni/ipamd.log).

How to reproduce it (as minimally and precisely as possible):
Set above environment variable to stdout or simply change cni config file present at path “/etc/cni/net.d” with name “10-aws.conflist” to below.

{
"cniVersion": "0.3.1",
"name": "aws-cni",
"plugins": [
{
"name": "aws-cni",
"type": "aws-cni",
"vethPrefix": "eni",
"mtu": "9001",
"pluginLogFile": "stdout",
"pluginLogLevel": "Debug"
},
{
"type": "portmap",
"capabilities": {"portMappings": true},
"snat": true
}
]
}

Anything else we need to know?:

I believe error was generated by add function.

resp := rpc.AddNetworkReply{

I believe it trace back to project https://github.com/containernetworking/cni/

Environment:

  • Kubernetes version (use kubectl version): Currrently tested on version 1.15
  • CNI Version: 1.7.x same is true for 1.6.x
  • OS (e.g: cat /etc/os-release): Amazon linux 2
  • Kernel (e.g. uname -a): Linux ip-172-31-45-225.ec2.internal 4.14.171-136.231.amzn2.x86_64 Initial commit of amazon-vpc-cni-k8s #1 SMP Thu Feb 27 20:22:48 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
@ozaditya ozaditya added the bug label Oct 4, 2020
@jayanthvn
Copy link
Contributor

jayanthvn commented Oct 5, 2020

Thanks for opening the issue @ozaditya . Looking into it.

@jayanthvn
Copy link
Contributor

jayanthvn commented Oct 6, 2020

We did a initial code walk through, pending root cause. Currently we are seeing the below log in kubelet -

E1005 18:22:11.154374   11718 cni.go:364] Error adding default_my-nginx-86b7cfc89-
kb8d5/cc40109840c171aced733ee1a15b20a403106c301ceca55d4a28792eebd3e7b6 to network aws-cni/aws-cni: netplugin
 failed but error parsing its diagnostic message "{\"level\":\"info\",\"ts\":\"2020-10-05T18:22:11.145Z\",\"caller\":\"routed-eni-cni-plugin/cni.go:111\",\"msg\":\"Received CNI add request: 
ContainerID(cc40109840c171aced733ee1a15b20a403106c301ceca55d4a28792eebd3e7b6) Netns(/proc/12295/ns/net) 
IfName(eth0) Args(IgnoreUnknown=1;K8S_POD_NAMESPACE=default;K8S_POD_NAME=my-nginx-86b7cfc89-
kb8d5;K8S_POD_INFRA_CONTAINER_ID=cc40109840c171aced733ee1a15b20a403106c301ceca55d4a28792eebd3e7b6) 
Path(/opt/cni/bin) argsStdinData({\\\"cniVersion\\\":\\\"0.3.1\\\",\\\"mtu\\\":\\\"9001\\\",\\\"name\\\":\\\"aws-
cni\\\",\\\"pluginLogFile\\\":\\\"stdout\\\",\\\"pluginLogLevel\\\":\\\"trace\\\",\\\"type\\\":\\\"aws-
cni\\\",\\\"vethPrefix\\\":\\\"eni\\\"})\"}\n{\"level\":\"debug\",\"ts\":\"2020-10-05T18:22:11.145Z\",\"caller\":\"routed-eni-cni-
plugin/cni.go:111\",\"msg\":\"MTU value set is 9001:\"}\n{\"level\":\"info\",\"ts\":\"2020-10-
05T18:22:11.148Z\",\"caller\":\"routed-eni-cni-plugin/cni.go:111\",\"msg\":\"Received add network response for container 
cc40109840c171aced733ee1a15b20a403106c301ceca55d4a28792eebd3e7b6 interface eth0: Success:true 
DeviceNumber:-1 VPCcidrs:\\\"192.168.0.0/16\\\" \"}\n{\"level\":\"debug\",\"ts\":\"2020-10-
05T18:22:11.148Z\",\"caller\":\"routed-eni-cni-plugin/cni.go:188\",\"msg\":\"SetupNS: hostVethName=enid9fae0540d6, 
contVethName=eth0, netnsPath=/proc/12295/ns/net, deviceNumber=-1, mtu=9001\"}\n{\"level\":\"error\",\"ts\":\"2020-10-
05T18:22:11.149Z\",\"caller\":\"driver/driver.go:185\",\"msg\":\"Failed to setup veth network setup NS network: failed to add IP
 addr to \\\"eth0\\\": numerical result out of range\"}\n{\"level\":\"error\",\"ts\":\"2020-10-
05T18:22:11.149Z\",\"caller\":\"routed-eni-cni-plugin/cni.go:111\",\"msg\":\"Failed SetupPodNetwork for container 
cc40109840c171aced733ee1a15b20a403106c301ceca55d4a28792eebd3e7b6: setupNS network: failed to setup veth pair.:
 setupVeth network: failed to setup veth network: setup NS network: failed to add IP addr to \\\"eth0\\\": numerical result out of range\"}\n{\"level\":\"error\",\"ts\":\"2020-10-05T18:22:11.149Z\",\"caller\":\"routed-eni-cni-
plugin/cni.go:111\",\"msg\":\"Error received from DelNetwork grpc call for container 
cc40109840c171aced733ee1a15b20a403106c301ceca55d4a28792eebd3e7b6: rpc error: code = Unknown desc = 
datastore: unknown pod\"}\n": invalid character '{' after top-level value

Error is at addToNetwork [https://github.com/kubernetes/kubernetes/blob/v1.15.11/pkg/kubelet/dockershim/network/cni/cni.go#L331]

Call trace-

addToNetwork -> AddNetworkList -> addNetwork [1] -> ExecPluginWithResult [2]

[1] https://github.com/containernetworking/cni/blob/2d4bc3d57aae747b6d68df7c3631bc9c120dfcde/libcni/api.go#L429
[2] https://github.com/containernetworking/cni/blob/2d4bc3d57aae747b6d68df7c3631bc9c120dfcde/libcni/api.go#L421

Need to add more tracing and check around this version.NewResult(confVersion, stdoutBytes) -
https://github.com/containernetworking/cni/blob/f145359defec1db69929c67dead380fdf8dfce1b/pkg/invoke/exec.go#L93

and exec.ExecPlugin - https://github.com/containernetworking/cni/blob/f145359defec1db69929c67dead380fdf8dfce1b/pkg/invoke/exec.go#L81

ExecPlugin on c.Run() success -[https://github.com/containernetworking/cni/blob/77436456f2ab8443ba0450d61b1499bd50e8a704/pkg/invoke/raw_exec.go#L34] -> return stdout.Bytes()

Also based on the spec - https://github.com/containernetworking/cni/blob/master/SPEC.md#result recommendation is to use "stderr can be used for unstructured output such as logs". Verified "stderr" instead of "stdout" and issue is not seen.

For completeness, Call trace from CNI -

add [1]-> PrintResult [2] -> prettyPrint[3] -> _, err = os.Stdout.Write(data) [o/p is written to STDOUT]
[1] https://github.com/containernetworking/cni/blob/f145359defec1db69929c67dead380fdf8dfce1b/pkg/invoke/exec.go#L81
[2] https://github.com/containernetworking/cni/blob/master/pkg/types/types.go#L105
[3] https://github.com/containernetworking/cni/blob/master/pkg/types/types.go#L200

@nithu0115
Copy link
Contributor

nithu0115 commented Oct 6, 2020

I did some further testing last night to see if it is possible set stdout value for AWS_VPC_K8S_PLUGIN_LOG_FILE env var and found that it is not possible to do it today.

As @jayanthvn mentioned above, we use skel.PluginMainWithError method which will redirect the result to stdout in json format. Apart from result which is being returned back by cmdAdd func, when PLUGIN log is set to stdout, we write some addition log information about CNI request and routes information which uses zapcore.Lock(os.Stdout) to write to os.Stdout which is messing up the json structure. This is causing kubelet to consider this as failure of pod creation and retry even though this is not an actual error.

That said, I am trying to understand the use-case here. CNI plugin is a binary on the host which is invoked by kubelet whenever we want to add/del a pod. If we are writing PLUGIN logs to stdout, we will not be able to retrieve this logs if there are any failure as the process will write these logs to /proc/self/fd/1 which cannot be read.

IMO, we should make a change to documentation stating that we do not support stdout logging for AWS_VPC_K8S_PLUGIN_LOG_FILE env var.

WDYT, @jayanthvn @SaranBalaji90

@ghost
Copy link

ghost commented Oct 9, 2020

This bug hit me and has been confusing me all week. I didn't believe in the Quantum Observer Effect until today!

I do think that having the debug logs to to stdout would be a massively useful feature though if it could be made to work.

@jayanthvn
Copy link
Contributor

@wtistio Thanks, will explore the options to see if it is doable for CNI plugin.

@jayanthvn
Copy link
Contributor

#1251 is merged and planning to release part of 1.7.6 release. Closing this issue for now.

@jayanthvn jayanthvn changed the title Log to standard output result in garbage character in add funcation Log to standard output result in garbage character in add function Feb 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants