kola tests fail in GCP environment when running in pipeline #487

Closed
dustymabe opened this issue May 19, 2020 · 5 comments · Fixed by coreos/fedora-coreos-pipeline#240

@dustymabe
Member

I recently added a kola GCP pipeline so we can run CI against the images we're now uploading to GCP. For some reason the tests fail in CI but pass when I run them from a local box. The only real difference I know of is that one run happens in the pipeline and the other on my desktop.

Below is the output when I run it locally (all PASS):

cosa kola run -b fcos -j 5 --no-test-exit-error --platform gce --gce-project=fedora-coreos-testing --gce-image=projects/fedora-coreos-cloud/global/images/fedora-coreos-31-20200517-2-0-gcp-x86-64 --gce-json-key=/srv/fedora-coreos-testing-gcp-kola-tests.json
kola -p qemu-unpriv --output-dir tmp/kola run -b fcos -j 5 --no-test-exit-error --platform gce --gce-project=fedora-coreos-testing --gce-image=projects/fedora-coreos-cloud/global/images/fedora-coreos-31-20200517-2-0-gcp-x86-64 --gce-json-key=/srv/fedora-coreos-testing-gcp-kola-tests.json
=== RUN   rpmostree.upgrade-rollback
=== RUN   podman.base
=== RUN   coreos.ignition.v2.users
=== RUN   fcos.filesystem
=== RUN   coreos.selinux.enforce
=== RUN   systemd.sysusers.gshadow
=== RUN   fcos.network.listeners
=== RUN   rpmostree.status
=== RUN   coreos.ignition.resource.local
=== RUN   podman.workflow
=== RUN   coreos.ignition.once
=== RUN   coreos.ignition.groups
=== RUN   fcos.ignition.v3.noop
=== RUN   ext.config
=== RUN   fcos.internet
=== RUN   coreos.ignition.resource.remote
=== RUN   fcos.ignition.misc.empty
=== RUN   coreos.ignition.instantiated.enable-service
=== RUN   coreos.auth.verify
=== RUN   coreos.ignition.ssh.key
=== RUN   ostree.hotfix
=== RUN   coreos.ignition.security.tls
=== RUN   coreos.ignition.journald-log
=== RUN   coreos.ignition.sethostname
=== RUN   coreos.selinux.boolean
=== RUN   rootfs.uuid
=== RUN   ostree.unlock
=== RUN   basic
=== RUN   coreos.tls.fetch-urls
=== RUN   podman.network-single
=== RUN   rpmostree.install-uninstall
=== RUN   ostree.remote
=== RUN   rhcos.selinux.boolean.persist
=== RUN   ostree.unlock/unlock
=== RUN   rpmostree.upgrade-rollback/upgrade
=== RUN   rpmostree.install-uninstall/install
--- PASS: coreos.ignition.journald-log (68.13s)
=== RUN   ostree.unlock/install
--- PASS: coreos.ignition.instantiated.enable-service (70.53s)
=== RUN   ostree.unlock/uninstall
=== RUN   ostree.unlock/discard
=== RUN   rpmostree.upgrade-rollback/rollback
--- PASS: ostree.unlock (108.93s)
    --- PASS: ostree.unlock/unlock (7.01s)
    --- PASS: ostree.unlock/install (1.98s)
    --- PASS: ostree.unlock/uninstall (1.70s)
    --- PASS: ostree.unlock/discard (32.88s)
=== RUN   podman.workflow/run
=== RUN   podman.workflow/exec
=== RUN   podman.workflow/stop
=== RUN   podman.workflow/remove
=== RUN   podman.workflow/delete
--- PASS: podman.workflow (79.40s)
    --- PASS: podman.workflow/run (10.73s)
            cluster.go:141: Trying to pull docker.io/library/nginx...
            cluster.go:141: Getting image source signatures
            cluster.go:141: Copying blob sha256:b90c53a0b69244e37b3f8672579fc3dec13293eeb574fa0fdddf02da1e192fd6
            cluster.go:141: Copying blob sha256:11fa52a0fdc084d7fc3bbcb774389fd37b148ee98e7829cea4af189735acf848
            cluster.go:141: Copying blob sha256:afb6ec6fdc1c3ba04f7a56db32c5ff5ff38962dc4cd0ffdef5beaa0ce2eb77e2
            cluster.go:141: Copying config sha256:9beeba249f3ee158d3e495a6ac25c5667ae2de8a43ac2a8bfd2bf687a58c06c9
            cluster.go:141: Writing manifest to image destination
            cluster.go:141: Storing signatures
    --- PASS: podman.workflow/exec (0.72s)
    --- PASS: podman.workflow/stop (1.73s)
    --- PASS: podman.workflow/remove (1.50s)
    --- PASS: podman.workflow/delete (1.47s)
            cluster.go:141: grep: docker.io/library/nginx:latest: No such file or directory
            cluster.go:141: bash: line 1: Deleted:: command not found
--- PASS: rpmostree.upgrade-rollback (151.77s)
    --- PASS: rpmostree.upgrade-rollback/upgrade (43.88s)
    --- PASS: rpmostree.upgrade-rollback/rollback (41.26s)
=== RUN   rpmostree.install-uninstall/uninstall
--- PASS: rhcos.selinux.boolean.persist (97.18s)
--- PASS: fcos.ignition.misc.empty (61.99s)
--- PASS: rpmostree.install-uninstall (213.41s)
    --- PASS: rpmostree.install-uninstall/install (91.28s)
    --- PASS: rpmostree.install-uninstall/uninstall (53.36s)
--- PASS: fcos.ignition.v3.noop (65.18s)
=== RUN   fcos.internet/PodmanEcho
--- PASS: ext.config (85.53s)
=== RUN   fcos.internet/PodmanWgetHead
--- PASS: coreos.ignition.resource.remote (140.18s)
--- PASS: fcos.internet (139.37s)
    --- PASS: fcos.internet/PodmanEcho (36.68s)
    --- PASS: fcos.internet/PodmanWgetHead (1.44s)
=== RUN   ostree.hotfix/unlock
--- PASS: coreos.ignition.groups (99.21s)
=== RUN   ostree.hotfix/install
=== RUN   ostree.hotfix/uninstall
=== RUN   ostree.hotfix/persist
--- PASS: coreos.ignition.once (100.96s)
=== RUN   ostree.remote/add
=== RUN   ostree.hotfix/rollback
=== RUN   ostree.remote/list
=== RUN   ostree.remote/show-url
=== RUN   ostree.remote/refs
=== RUN   ostree.remote/summary
=== RUN   fcos.filesystem/suid
=== RUN   fcos.filesystem/sgid
=== RUN   fcos.filesystem/writablefiles
=== RUN   ostree.remote/delete
=== RUN   fcos.filesystem/writabledirs
=== RUN   fcos.filesystem/stickydirs
=== RUN   fcos.filesystem/blacklist
--- PASS: ostree.remote (98.99s)
    --- PASS: ostree.remote/add (0.90s)
    --- PASS: ostree.remote/list (0.57s)
    --- PASS: ostree.remote/show-url (0.57s)
    --- PASS: ostree.remote/refs (17.04s)
    --- PASS: ostree.remote/summary (10.45s)
    --- PASS: ostree.remote/delete (1.89s)
--- PASS: fcos.filesystem (79.18s)
    --- PASS: fcos.filesystem/suid (1.52s)
    --- PASS: fcos.filesystem/sgid (0.68s)
    --- PASS: fcos.filesystem/writablefiles (0.66s)
    --- PASS: fcos.filesystem/writabledirs (0.64s)
    --- PASS: fcos.filesystem/stickydirs (1.70s)
    --- PASS: fcos.filesystem/blacklist (0.69s)
--- PASS: ostree.hotfix (146.15s)
    --- PASS: ostree.hotfix/unlock (11.17s)
    --- PASS: ostree.hotfix/install (1.93s)
    --- PASS: ostree.hotfix/uninstall (1.65s)
    --- PASS: ostree.hotfix/persist (31.74s)
    --- PASS: ostree.hotfix/rollback (37.40s)
--- PASS: coreos.selinux.enforce (116.69s)
--- PASS: coreos.ignition.security.tls (165.57s)
--- PASS: rpmostree.status (66.09s)
--- PASS: coreos.selinux.boolean (93.36s)
--- PASS: coreos.tls.fetch-urls (65.20s)
--- PASS: coreos.ignition.resource.local (129.75s)
=== RUN   rootfs.uuid/RandomUUID
--- PASS: fcos.network.listeners (67.74s)
--- PASS: rootfs.uuid (71.35s)
    --- PASS: rootfs.uuid/RandomUUID (0.59s)
=== RUN   podman.base/info
=== RUN   podman.base/resources
--- PASS: coreos.ignition.ssh.key (61.91s)
=== RUN   basic/NetworkScripts
=== RUN   basic/ServicesActive
=== RUN   basic/ReadOnly
=== RUN   basic/Useradd
=== RUN   basic/MachineID
--- PASS: coreos.ignition.v2.users (69.16s)
=== RUN   basic/PortSSH
=== RUN   basic/DbusPerms
--- PASS: basic (73.61s)
    --- PASS: basic/NetworkScripts (0.63s)
    --- PASS: basic/ServicesActive (0.63s)
    --- PASS: basic/ReadOnly (0.56s)
    --- PASS: basic/Useradd (0.93s)
    --- PASS: basic/MachineID (0.56s)
    --- PASS: basic/PortSSH (0.55s)
    --- PASS: basic/DbusPerms (0.68s)
--- PASS: podman.base (82.23s)
    --- PASS: podman.base/info (1.25s)
    --- PASS: podman.base/resources (15.99s)
            cluster.go:141: Getting image source signatures
            cluster.go:141: Copying blob sha256:fcd519b8abb692b7712345d2a42bf9aed6b5779fddac1700e548bf7f746ee5c5
            cluster.go:141: Copying config sha256:76f82e67b27926196d15b440e15620fcef2d09e99b687fb9deae590679698025
            cluster.go:141: Writing manifest to image destination
            cluster.go:141: Storing signatures
            cluster.go:141: Your kernel does not support Block I/O weight or the cgroup is not mounted. Weight discarded.
--- PASS: coreos.ignition.sethostname (70.80s)
--- PASS: coreos.auth.verify (67.70s)
--- PASS: systemd.sysusers.gshadow (64.37s)
--- PASS: podman.network-single (251.19s)
        cluster.go:141: Getting image source signatures
        cluster.go:141: Copying blob sha256:b7e611706509c3c698adf1d675ef828250d8271e993f69a4ac65affa34cb264b
        cluster.go:141: Copying config sha256:ea5f7657f6d9379ac06a6aaab087854cc1b85c2a111849ab60353e14516e7770
        cluster.go:141: Writing manifest to image destination
        cluster.go:141: Storing signatures
PASS, output in tmp/kola

I wonder if the "Error saving console" messages are what's marking the run as failed?

2020-05-19T18:27:34Z platform/machine/gcloud: Error saving console for instance kola-beada810ca850184817b: open tmp/kola/systemd.sysusers.gshadow/kola-beada810ca850184817b/console.txt: no such file or directory

2020-05-19T18:27:34Z platform/machine/gcloud: Error saving console for instance kola-820720f73e597422baf9: open tmp/kola/rpmostree.install-uninstall/kola-820720f73e597422baf9/console.txt: no such file or directory

2020-05-19T18:27:34Z platform/machine/gcloud: Error saving console for instance kola-13537cbed711d31e2b74: open tmp/kola/fcos.internet/kola-13537cbed711d31e2b74/console.txt: no such file or directory

2020-05-19T18:27:34Z platform/machine/gcloud: Error saving console for instance kola-f5c6f11e0a7d72fb95fa: open tmp/kola/coreos.ignition.sethostname/kola-f5c6f11e0a7d72fb95fa/console.txt: no such file or directory
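
For reference, `open ...: no such file or directory` is the error you'd expect when a file is created under a per-test directory that doesn't exist at write time. Below is a minimal Python sketch of the usual guard. `save_console` is a hypothetical helper made up for illustration, not kola's code (kola is written in Go), and nothing here establishes that these messages are related to the CI failure.

```python
import tempfile
from pathlib import Path


def save_console(output_dir: Path, test: str, instance: str, text: str) -> Path:
    """Write console output to <output_dir>/<test>/<instance>/console.txt.

    Hypothetical helper for illustration; not kola's implementation.
    """
    target = output_dir / test / instance / "console.txt"
    # Without this mkdir, write_text raises FileNotFoundError
    # ("No such file or directory") when the parent directories are missing.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(text)
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = save_console(Path(tmp), "fcos.internet", "kola-example", "boot log\n")
        print(path.read_text(), end="")
```
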
@dustymabe
Member Author

OK, after a few changes I made (coreos/fedora-coreos-pipeline#238, coreos/fedora-coreos-pipeline#239) I've had a few successful runs of the non-upgrade tests. Now that I've got the upgrade tests running, I'm seeing this failure in them:

--- FAIL: fcos.upgrade.basic (229.84s)
    --- PASS: fcos.upgrade.basic/setup (23.26s)
    --- FAIL: fcos.upgrade.basic/upgrade-from-previous (125.72s)
            basic.go:294: failed waiting for machine reboot: timed out after 2m0s waiting for machine to reboot
    --- SKIP: fcos.upgrade.basic/upgrade-from-current (0.00s)
            cluster.go:51: A previous test has already failed

Do we maybe need to bump the timeout?
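
As a rough picture of what a "timed out waiting for machine to reboot" check does, here is a Python sketch of deadline-based polling. It assumes a reboot is detected by the machine's boot ID changing. `wait_for_reboot` and its parameters are hypothetical and are not taken from kola's `basic.go`.

```python
import time
from typing import Callable


def wait_for_reboot(get_boot_id: Callable[[], str], old_boot_id: str,
                    timeout: float = 120.0, interval: float = 1.0,
                    clock: Callable[[], float] = time.monotonic,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the boot ID differs from old_boot_id or the deadline passes.

    Hypothetical sketch; get_boot_id stands in for whatever queries the machine.
    """
    deadline = clock() + timeout
    while clock() < deadline:
        try:
            boot_id = get_boot_id()
        except OSError:
            # Machine unreachable (e.g. mid-reboot); keep polling.
            boot_id = old_boot_id
        if boot_id != old_boot_id:
            return boot_id
        sleep(interval)
    raise TimeoutError(f"timed out after {timeout}s waiting for machine to reboot")
```

A longer timeout only helps if the reboot eventually happens; if nothing ever triggers it, the loop runs to the deadline whatever value is chosen.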

@dustymabe
Member Author

OK after a few changes I made (coreos/fedora-coreos-pipeline#238, coreos/fedora-coreos-pipeline#239) I've had a few successful runs of the non-upgrade tests.

I think the fix was as simple as passing the --platform option with an = sign. It was some combination of adding the = in coreos/fedora-coreos-pipeline#238 and the work I did in coreos/coreos-assembler#1461. Either way it's working now.
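
To illustrate why the spelling of a flag can matter: any wrapper that inspects its arguments before handing them on has to recognise both `--platform gce` and `--platform=gce`. The Python sketch below is hypothetical. Neither function is taken from coreos-assembler, and it is only an assumption that something along these lines was involved here. It shows a naive prefix check missing the space-separated form, next to a version that handles both.

```python
from typing import List, Optional


def find_platform_naive(argv: List[str]) -> Optional[str]:
    """Only recognises the --platform=VALUE spelling (hypothetical example)."""
    for arg in argv:
        if arg.startswith("--platform="):
            return arg.split("=", 1)[1]
    return None


def find_platform(argv: List[str]) -> Optional[str]:
    """Recognises both --platform=VALUE and --platform VALUE (hypothetical example)."""
    for i, arg in enumerate(argv):
        if arg.startswith("--platform="):
            return arg.split("=", 1)[1]
        if arg == "--platform" and i + 1 < len(argv):
            return argv[i + 1]
    return None


if __name__ == "__main__":
    spaced = ["run", "-b", "fcos", "--platform", "gce"]
    joined = ["run", "-b", "fcos", "--platform=gce"]
    print(find_platform_naive(spaced))  # None
    print(find_platform_naive(joined))  # gce
    print(find_platform(spaced))        # gce
    print(find_platform(joined))        # gce
```
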

Only remaining issue is to debug the upgrade test failure because of the timeout.

@lucab
Contributor

lucab commented May 21, 2020

I don't think it's a timeout issue, but something else wrong with the test itself (possibly a version/checksum mismatch in the synthetic update graph).

Zincati is scraping the update server on localhost and picks up a graph with two nodes and one edge, but the graph doesn't seem to encode a valid update from this machine's point of view:

May 21 03:09:12.753693 zincati[1488]: [TRACE] request to list local deployments
[...]
May 21 03:09:12.782074 zincati[1488]: [TRACE] found 1 local deployments
May 21 03:09:12.782240 zincati[1488]: [TRACE] checking upstream Cincinnati server for updates
May 21 03:09:12.804020 zincati[1488]: [TRACE] got an update graph with 2 nodes and 1 edges
May 21 03:09:12.804200 zincati[1488]: [TRACE] update agent tick, current state: NoNewUpdate
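
A rough illustration of the "two nodes, one edge, but no update" situation in the log above. The Python sketch assumes the Cincinnati graph JSON layout (`nodes` carrying `version` and `payload`, `edges` as pairs of node indices) and assumes the client locates itself by matching its booted checksum against a node's `payload`. The graph contents and `find_updates` are made up for the example and are not Zincati's code.

```python
from typing import Dict, List


def find_updates(graph: Dict, booted_checksum: str) -> List[str]:
    """Return versions reachable in one hop from the node matching the booted checksum."""
    nodes = graph["nodes"]
    # Locate this machine in the graph by payload checksum.
    current = [i for i, n in enumerate(nodes) if n["payload"] == booted_checksum]
    if not current:
        return []  # machine is not in the graph, so no edge applies to it
    start = current[0]
    return [nodes[dst]["version"] for src, dst in graph["edges"] if src == start]


# Hypothetical graph: two nodes, one edge (values are placeholders).
GRAPH = {
    "nodes": [
        {"version": "1.0", "payload": "aaa"},
        {"version": "2.0", "payload": "bbb"},
    ],
    "edges": [[0, 1]],
}

if __name__ == "__main__":
    print(find_updates(GRAPH, "aaa"))  # ['2.0']
    print(find_updates(GRAPH, "ccc"))  # []
```

With a checksum that matches no node, the graph still has two nodes and one edge, yet the result is empty, which would leave an agent with nothing to apply.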

@jlebon
Member

jlebon commented May 21, 2020

Sorry, I missed this ticket previously in my notifications.

kola run-upgrade -p gce --find-parent-image won't work until we add GCP support for --find-parent-image.

(There should've been an obvious error message to help with this, but the error-handling was buggy.)

Working on that now.

jlebon added a commit to jlebon/fedora-coreos-pipeline that referenced this issue May 21, 2020
Upgrade tests use the image of the previous release, not the candidate
release we're testing itself. The `--find-parent-image` switch will
automatically find the starting image to use.

Closes: coreos/fedora-coreos-tracker#487
jlebon added a commit to jlebon/fedora-coreos-pipeline that referenced this issue May 21, 2020
Upgrade tests use the image of the previous release, not the candidate
release we're testing itself. The `--find-parent-image` switch will
automatically find the starting image to use.

Closes: coreos/fedora-coreos-tracker#487
@jlebon
Member

jlebon commented May 21, 2020

dustymabe pushed a commit to coreos/fedora-coreos-pipeline that referenced this issue May 22, 2020
Upgrade tests use the image of the previous release, not the candidate
release we're testing itself. The `--find-parent-image` switch will
automatically find the starting image to use.

Closes: coreos/fedora-coreos-tracker#487