cAdvisor e2e failing 100% on core OS #1344
Comments
@pwittrock do you know when you restarted the Jenkins CI VMs? At first glance, the PR where the failures started doesn't look like it would have caused this. |
/cc @dchen1107 |
Failing since kubekins build #4096 (sorry, Google internal only). Actually, ignore my comment about #1333: many builds succeeded on that revision before the failures started. I'm guessing the CoreOS VM auto-updated and is no longer setting the CPU mask. I'll ping the pr-builder Jenkins to see if it's having the same issue. |
@timstclair Were you able to root-cause this issue? |
No, I haven't tracked it down yet. It looks like this has started affecting the build jobs as well (it didn't yesterday). I think this lends credence to it being caused by a CoreOS update. |
Let me see if I can reproduce on a new GCE coreos instance. |
We can disable the CoreOS node from the e2e suite until this issue is resolved. |
I'll disable it for the builder. |
Do you have more information on how these images are set up (sorry, I'm not familiar with the cadvisor test setup)? For reference, node_e2e now disables updates with this. If the cadvisor tests fail on newer CoreOS versions post-update, that sounds like something to triage and fix. |
Ok, I was able to reproduce this on a fresh coreOS beta image (
@euank can you take it from here? |
I'll take a guess that this is related to systemd >= 226's change in cgroup hierarchy, but not sure yet. I was able to reproduce on a machine with systemd 226 and docker 1.11 launched with The machine in question is gentoo, but I expect it'll reproduce broadly in that configuration. I'll dig further... |
It finds a cpuset root at both |
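To illustrate the kind of check being discussed, here is a minimal sketch that lists every cgroup v1 mount carrying a given subsystem, using the documented `/proc/self/mountinfo` line format. It is not cAdvisor's or runc's actual lookup code. The function name `find_subsystem_mounts`, the sample data, and in particular the second mount path are hypothetical and are not the paths seen on the affected machine.

```python
# Sketch only: find all cgroup v1 mounts for one subsystem in mountinfo text.
# A mountinfo line looks like:
#   ID PARENT MAJ:MIN ROOT MOUNTPOINT OPTS [optional...] - FSTYPE SOURCE SUPEROPTS

def find_subsystem_mounts(mountinfo_text, subsystem):
    """Return (root, mount_point) pairs for cgroup mounts carrying subsystem."""
    mounts = []
    for line in mountinfo_text.splitlines():
        # The " - " separator splits per-mount fields from filesystem fields.
        left, sep, right = line.partition(" - ")
        if not sep:
            continue
        left_fields = left.split()
        right_fields = right.split()
        if len(left_fields) < 5 or len(right_fields) < 3:
            continue
        fs_type, super_opts = right_fields[0], right_fields[2]
        if fs_type != "cgroup":
            continue
        # For cgroup v1, the attached subsystems appear in the super options.
        if subsystem in super_opts.split(","):
            mounts.append((left_fields[3], left_fields[4]))
    return mounts


# Hypothetical sample: the cpuset hierarchy is visible at two mount points.
SAMPLE = """
25 20 0:22 / /sys/fs/cgroup/cpu,cpuacct rw,nosuid - cgroup cgroup rw,cpu,cpuacct
26 20 0:23 / /sys/fs/cgroup/cpuset rw,nosuid - cgroup cgroup rw,cpuset
41 20 0:23 / /example/second/cpuset rw,nosuid - cgroup cgroup rw,cpuset
42 20 0:30 / /tmp rw - tmpfs tmpfs rw
"""

cpuset_mounts = find_subsystem_mounts(SAMPLE, "cpuset")
if len(cpuset_mounts) > 1:
    print("cpuset is mounted more than once:", cpuset_mounts)
```

On a real Linux host, the same function can be given the contents of `/proc/self/mountinfo` instead of `SAMPLE`. Code that assumes exactly one mount point per subsystem may pick the wrong one when more than one is reported.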
Upstream bug to point to as well: opencontainers/runc#931 Our options are, I think:
My preference is (3), to put off needing a better solution, and hope that (1) happens in the meantime. Sound reasonable? |
(3) sounds reasonable to me, and I think we should do it anyway (filed #1361). Once the jenkins jobs are updated to use the cAdvisor jenkins script (kubernetes/test-infra#248) I'll add a coreos-stable VM. |
We should be able to update the CoreOS node, if no one has already, now that we've switched CoreOS to use cgroupfs by default. I don't have access to the images referenced by |
I'm not sure what (if anything) needs to be changed from the unmodified coreos image, so it might just work. If you're up for trying it and figuring out what (if anything) needs to be added, I'd certainly welcome the help :) You can see the command used to run the tests here. |
@euank I am assigning this one to you to delegate. We need better support for CoreOS as one of our basic images. Re-assign it back to us or ask for help if you need to. Thanks! |
I don't expect we'll need to do more than is done for the node_e2e setup (basically the user-data in https://github.com/euank/kubernetes/blob/5a5ba51b24c9e62aa775de1f568d365c2761aeb5/test/e2e_node/jenkins/coreos-init.json). I'm on vacation for the next couple of weeks, so I won't be able to verify that. Regardless, someone with access to the Jenkins account where these test instances run will need to start one up, unless we switch to a node_e2e-style model where instances are launched and specified by code in this repository rather than entirely out of band (#1361 could perhaps fix that). cc @yifan-gu @crawford to help with or delegate further on this one, thanks! |
Minor nit, the correct extension for that file should be |
subscribe |
Error:
failing line
This has been failing 100% on e2e-cadvisor-coreos-beta since #1333 was merged.
@Random-Liu @pwittrock @euank
NOTE: the cadvisor-pull-build-test-e2e coreos-beta VM is disabled. Re-enable it once this issue is resolved.