
vfs: pod start: Error: crun: make /path private: OCI permission denied #20332

Open
edsantiago opened this issue Oct 11, 2023 · 14 comments
Labels: flakes (Flakes from Continuous Integration), kind/bug (Categorizes issue or PR as related to a bug), remote (Problem is in podman-remote)

@edsantiago (Member) commented Oct 11, 2023:
Another VFS flake. Only one so far, in remote root:

[+0840s] not ok 310 rootful pod with custom ID mapping
...
<+038ms> # # podman-remote pod create --uidmap 0:200000:5000 --name=foo
<+524ms> # SHA1
         #
<+011ms> # # podman-remote pod start foo
<+192ms> # Error: starting container SHA2: crun: make `/var/lib/containers/storage/vfs/dir/SHA3` private: Permission denied: OCI permission denied

edsantiago added the flakes (Flakes from Continuous Integration) and kind/bug (Categorizes issue or PR as related to a bug) labels on Oct 11, 2023.

@edsantiago (Member, author) commented:

Grumble. Exactly the same context (remote root, vfs, same subtest, same step in the subtest)... but a different message:

<+010ms> # # podman-remote  pod start foo
<+172ms> # Error: starting container SHA2: crun: open `/var/lib/containers/storage/vfs/dir/SHA3`: Permission denied: OCI permission denied

(That is: the original report was "make vfs-dir private: EACCES"; this one is "open vfs-dir: EACCES".) Even though the error message is different, I'm betting it's the same root cause, so I'm assigning it to the same issue.
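
For reference, the two messages plausibly correspond to a mount(2) call with MS_PRIVATE and to a plain open(2) on the rootfs directory; that mapping is my assumption about crun's internals, and the sketch below is in Go rather than crun's C, purely for illustration (the path is the SHA3 placeholder from the log, not a real layer ID):

package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

func main() {
	// Placeholder path from the log above; a real failure names a layer ID.
	dir := "/var/lib/containers/storage/vfs/dir/SHA3"

	// Variant 1: "make `<dir>` private" -- switch mount propagation to
	// private on the directory. This is the call failing with EACCES in
	// the first reports.
	if err := unix.Mount("", dir, "", unix.MS_PRIVATE, ""); err != nil {
		fmt.Printf("make `%s` private: %v\n", dir, err)
	}

	// Variant 2: "open `<dir>`" -- a plain open(2) of the same directory,
	// matching the later "open ...: Permission denied" flavor.
	fd, err := unix.Open(dir, unix.O_RDONLY|unix.O_DIRECTORY, 0)
	if err != nil {
		fmt.Printf("open `%s`: %v\n", dir, err)
		return
	}
	unix.Close(fd)
}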

@rhatdan (Member) commented Oct 12, 2023:

SELinux?

edsantiago added the remote (Problem is in podman-remote) label on Oct 23, 2023.

@edsantiago (Member, author) commented Oct 23, 2023:

It does not look like SELinux. This is today's failure:

[+0780s] not ok 312 rootful pod with custom ID mapping
...
<+010ms> # # podman-remote --url unix:///var/tmp/podman_tmp_qzin pod start nUItqpMZ7eZvH5YJQdt0lUjOW4Utue
<+152ms> # Error: starting container c1c96f0fdc40f6e8159dd0488e6ef57a10815dbe43faf228807159d21f143dc1: crun: make `/var/lib/containers/storage/vfs/dir/fc7be3ff398fccca8c96dd672e98050926e2410dc4db23ff4fe317744bbed182` private: Permission denied: OCI permission denied

...and here is its corresponding journal:

Oct 23 07:40:02 cirrus-task-4763541958295552 podman[120556]: 2023-10-23 07:40:02.170392299 -0500 CDT m=+0.219908570 container remove 698d712a752db46efb7f1a005477c9a36c414e6a6f2c853f29b6b521ed4777e3 (image=quay.io/libpod/testimage:20221018, name=optimistic_saha, created_at=2022-10-18T16:28:08Z, created_by=test/system/build-testimage, io.buildah.version=1.28.0)
Oct 23 07:40:02 cirrus-task-4763541958295552 systemd[1]: libpod-conmon-698d712a752db46efb7f1a005477c9a36c414e6a6f2c853f29b6b521ed4777e3.scope: Deactivated successfully.
Oct 23 07:40:02 cirrus-task-4763541958295552 podman[3875]: 2023-10-23 07:40:02.338266058 -0500 CDT m=+776.337317026 volume remove JR9FywnqMl
Oct 23 07:40:02 cirrus-task-4763541958295552 systemd[1]: run-netns-netns\x2d4430bbc1\x2d265b\x2d6a9c\x2d278b\x2dbb660517fc90.mount: Deactivated successfully.
Oct 23 07:40:02 cirrus-task-4763541958295552 systemd[1]: var-lib-containers-storage-vfs\x2dcontainers-698d712a752db46efb7f1a005477c9a36c414e6a6f2c853f29b6b521ed4777e3-userdata-shm.mount: Deactivated successfully.
Oct 23 07:40:02 cirrus-task-4763541958295552 systemd[1]: libpod-conmon-830e9d8518e7cfe10b5cc9e2055d1ad161b44788f583f87fe15c83ca8a253c8a.scope: Deactivated successfully.
Oct 23 07:40:03 cirrus-task-4763541958295552 systemd[1]: libpod-conmon-6e44a9e3f36c7289e635d5ce82b9fb76295a62cb307d82a823dc792a717c71df.scope: Deactivated successfully.
Oct 23 07:40:04 cirrus-task-4763541958295552 podman[3875]: 2023-10-23 07:40:04.576254342 -0500 CDT m=+778.575305308 image build  ff9295d5cd263d2747e0b3c65a0e160eef97b660f4b2048dd7a77dc0590327ca
Oct 23 07:40:04 cirrus-task-4763541958295552 systemd[1]: Created slice machine-libpod_pod_365f555b3e967bd1d8f76065a6ae5eb8906e30cbe1052f312af7672b47c72bc9.slice - cgroup machine-libpod_pod_365f555b3e967bd1d8f76065a6ae5eb8906e30cbe1052f312af7672b47c72bc9.slice.
Oct 23 07:40:04 cirrus-task-4763541958295552 podman[3875]: 2023-10-23 07:40:04.661255357 -0500 CDT m=+778.660306308 container create c1c96f0fdc40f6e8159dd0488e6ef57a10815dbe43faf228807159d21f143dc1 (image=localhost/podman-pause:4.8.0-dev-1698062895, name=365f555b3e96-infra, pod_id=365f555b3e967bd1d8f76065a6ae5eb8906e30cbe1052f312af7672b47c72bc9, io.buildah.version=1.33.0-dev)
Oct 23 07:40:04 cirrus-task-4763541958295552 podman[3875]: 2023-10-23 07:40:04.669619768 -0500 CDT m=+778.668670727 pod create 365f555b3e967bd1d8f76065a6ae5eb8906e30cbe1052f312af7672b47c72bc9 (image=, name=nUItqpMZ7eZvH5YJQdt0lUjOW4Utue)
Oct 23 07:40:04 cirrus-task-4763541958295552 systemd[1]: Started libpod-conmon-c1c96f0fdc40f6e8159dd0488e6ef57a10815dbe43faf228807159d21f143dc1.scope.
Oct 23 07:40:04 cirrus-task-4763541958295552 systemd[1]: tmp-crun.akFohE.mount: Deactivated successfully.
Oct 23 07:40:04 cirrus-task-4763541958295552 systemd[1]: Started libpod-c1c96f0fdc40f6e8159dd0488e6ef57a10815dbe43faf228807159d21f143dc1.scope - libcrun container.
Oct 23 07:40:04 cirrus-task-4763541958295552 systemd[1]: libpod-conmon-42cd395bdc1fbc72e2a49f10626f03f93494988d25f9e94fe290802c70c363df.scope: Deactivated successfully.
Oct 23 07:40:04 cirrus-task-4763541958295552 systemd[1]: libpod-c1c96f0fdc40f6e8159dd0488e6ef57a10815dbe43faf228807159d21f143dc1.scope: Deactivated successfully.
Oct 23 07:40:04 cirrus-task-4763541958295552 systemd[1]: Stopped libpod-c1c96f0fdc40f6e8159dd0488e6ef57a10815dbe43faf228807159d21f143dc1.scope - libcrun container.
Oct 23 07:40:04 cirrus-task-4763541958295552 conmon[121136]: conmon c1c96f0fdc40f6e8159d <nwarn>: runtime stderr: make `/var/lib/containers/storage/vfs/dir/fc7be3ff398fccca8c96dd672e98050926e2410dc4db23ff4fe317744bbed182` private: Permission denied
Oct 23 07:40:04 cirrus-task-4763541958295552 conmon[121136]: conmon c1c96f0fdc40f6e8159d <error>: Failed to create container: exit status 1
Oct 23 07:40:04 cirrus-task-4763541958295552 systemd[1]: tmp-crun.QmSnzU.mount: Deactivated successfully.
Oct 23 07:40:04 cirrus-task-4763541958295552 systemd[1]: libpod-conmon-c1c96f0fdc40f6e8159dd0488e6ef57a10815dbe43faf228807159d21f143dc1.scope: Deactivated successfully.
Oct 23 07:40:04 cirrus-task-4763541958295552 podman[3875]: 2023-10-23 07:40:04.942095917 -0500 CDT m=+778.941146869 container remove c1c96f0fdc40f6e8159dd0488e6ef57a10815dbe43faf228807159d21f143dc1 (image=localhost/podman-pause:4.8.0-dev-1698062895, name=365f555b3e96-infra, pod_id=365f555b3e967bd1d8f76065a6ae5eb8906e30cbe1052f312af7672b47c72bc9, io.buildah.version=1.33.0-dev)

That's a broad selection, from before and after fc7be3, but the string "AVC" does not appear there.

The story so far:

  • fedora-38 : sys remote fedora-38 root host boltdb [remote]
    • 10-11 18:21 in [sys] rootful pod with custom ID mapping
  • fedora-38 : sys remote fedora-38 root host sqlite [remote]
    • 10-23 08:48 in [sys] rootful pod with custom ID mapping
    • 10-10 13:04 in [sys] rootful pod with custom ID mapping

Seen in: sys remote fedora-38 root host boltdb+sqlite

I've added the remote tag for now, since all three are remote.

@rhatdan (Member) commented Oct 23, 2023:

@giuseppe Any thoughts on why this could happen?

@giuseppe (Member) commented:

One reason could be that 8ac2aa7 does not affect vfs: unlike with the other drivers, we do not create a new mount point on the host.
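
To make the distinction concrete: with a mount-based driver such as overlay, the container rootfs is its own mount point; with vfs it is just an ordinary directory. A hedged sketch of one common way to detect this (isMountPoint is a hypothetical helper invented here, not podman's actual code, and the device-ID heuristic misses bind mounts that stay on the same filesystem):

package main

import (
	"fmt"
	"path/filepath"

	"golang.org/x/sys/unix"
)

// isMountPoint (hypothetical): a path whose device ID differs from its
// parent's sits on a filesystem boundary, i.e. it is a mount point.
// An overlay rootfs passes this test; a vfs rootfs does not.
func isMountPoint(path string) (bool, error) {
	var st, parent unix.Stat_t
	if err := unix.Lstat(path, &st); err != nil {
		return false, err
	}
	if err := unix.Lstat(filepath.Dir(path), &parent); err != nil {
		return false, err
	}
	return st.Dev != parent.Dev, nil
}

func main() {
	// Placeholder path; a vfs layer directory would report false here.
	mounted, err := isMountPoint("/var/lib/containers/storage/vfs/dir/SHA3")
	fmt.Println(mounted, err)
}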

@rhatdan (Member) commented Oct 26, 2023:

Should we just make this a no-op if driver=vfs? Or does VFS always return not mounted?
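
A hedged sketch of what such a no-op could look like (maybeMakePrivate and driverName are invented for illustration; this is not podman's actual API):

package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

// maybeMakePrivate (hypothetical): skip the remount entirely for vfs,
// since that driver never creates a mount to make private.
func maybeMakePrivate(driverName, path string) error {
	if driverName == "vfs" {
		return nil // nothing mounted, nothing to do
	}
	return unix.Mount("", path, "", unix.MS_PRIVATE, "")
}

func main() {
	// For vfs this returns nil without touching the mount table.
	fmt.Println(maybeMakePrivate("vfs", "/var/lib/containers/storage/vfs/dir/SHA3"))
}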

@giuseppe (Member) commented:

The patch above doesn't cause this issue; it just doesn't solve the problem for vfs, since there is never a mount. Since the flake always happens with podman-remote and pods, it looks like the same scenario that 8ac2aa7 fixed for the drivers that do create a mount. I need to look into it more; I'm not sure at the moment how we could do the same thing for vfs. We kill the podman cleanup process randomly, and the mount counter gets out of sync (see the sketch below).
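
A hedged illustration of that drift (layerStore and its fields are invented for this example and are not containers/storage code): the recorded mount count is only correct if the cleanup process survives long enough to record its own unmount.

package main

import (
	"fmt"
	"sync"
)

// layerStore: a made-up stand-in for a store that tracks how many times
// each layer is currently mounted.
type layerStore struct {
	mu     sync.Mutex
	counts map[string]int // layer ID -> recorded mount count
}

func (s *layerStore) incMount(id string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.counts[id]++
}

func (s *layerStore) decMount(id string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.counts[id]--
}

func main() {
	s := &layerStore{counts: map[string]int{}}
	s.incMount("fc7be3") // container start records one mount

	// Cleanup path: perform the real umount(2), then record it. If the
	// cleanup process is SIGKILLed between those two steps, decMount
	// never runs: the store still says "mounted once" while nothing is
	// mounted, and later operations act on that stale view.
	// s.decMount("fc7be3") // skipped: process was killed here

	fmt.Println("recorded mounts:", s.counts["fc7be3"]) // 1, but reality is 0
}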

A friendly reminder that this issue had no activity for 30 days.

@rhatdan (Member) commented Nov 27, 2023:

@giuseppe any update?

@edsantiago (Member, author) commented:

Another one. Here's the list so far. It consistently seems to be root, vfs, remote.

Seen in: sys(5) remote(5) fedora-38(5) root(5) host(5) boltdb(3) sqlite(2)

@edsantiago (Member, author) commented:

New variation:

<+011ms> # # podman-remote --url unix:///var/tmp/podman_tmp_PjnZ pod start xnC2MABdwwVeBQjb0LhcBmVRWKq7Rb
<+191ms> # Error: starting container xxx: crun: open `/var/lib/containers/storage/vfs/dir/yyy`: Permission denied: OCI permission denied

(That is: the string "make /path private" does not appear in this error message.)

Seen in: sys(7) remote(7) fedora-38(7) root(7) host(7) boltdb(5) sqlite(2)

@edsantiago (Member, author) commented:

Pretty please?

Seen in: sys(10) remote(10) fedora-38(10) root(10) host(10) boltdb(8) sqlite(2)

@edsantiago (Member, author) commented:

Still happening with new VMs: see the f39 failure today. This suggests that the failure is related to the prior-fedora CI job, which points toward vfs or boltdb. Make of that what you will; it could also be pure coincidence that we haven't seen this elsewhere.

Seen in: sys(9) remote(9) fedora-38(8) root(9) host(9) boltdb(9) fedora-39(1)

@giuseppe (Member) commented:

The root cause is that there is no mount for vfs, so 8ac2aa7 does not work there. We could use a bind mount, but that would rather defeat the point of using vfs, which doesn't require one (the mount is only needed because of the weird interaction between the podman cleanup process and pod cgroups).
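
For concreteness, a hedged sketch of that rejected workaround (the path is a placeholder and this is illustrative, not proposed podman code): bind-mounting the vfs directory onto itself would give mount-tracking logic like 8ac2aa7's a real mount to observe, at the cost of reintroducing a per-container mount.

package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

func main() {
	// Placeholder layer path, for illustration only.
	dir := "/var/lib/containers/storage/vfs/dir/SHA3"

	// Bind-mount the directory onto itself: the rootfs now is a real
	// mount point whose lifetime can be tracked...
	if err := unix.Mount(dir, dir, "", unix.MS_BIND, ""); err != nil {
		fmt.Println("bind-mounting vfs dir:", err)
		return
	}
	defer unix.Unmount(dir, unix.MNT_DETACH)

	// ...but every vfs container now carries a mount, which is exactly
	// what the vfs driver exists to avoid.
}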

Maybe a topic for our next cabal, but how much do we care to test VFS in the CI? The VFS driver is a last resort; it shouldn't really be used for anything serious.
