Fix pull fails on unexpected EOF #31

anmaxvl · 2021-12-27T19:21:12Z

Currently, containerd doesn't restart pull when it encounters unexpected EOF of
blob strem withtout error codes.
There are cases where this lead to pull failure.
This commit tries to fix this issue.

Signed-off-by: Kohei Tokunaga ktokunaga.mail@gmail.com
(cherry picked from commit 7bc5aa7)

The rollback mechanism is implemented by calling deleteDevice() and RemoveDevice(). But RemoveDevice() is internally calling deleteDevice() as well. Since a device will be deleted by first deleteDevice(), RemoveDevice() always will see ENODATA. The specific error must be ignored to remove the device's metadata correctly. Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>

[release/1.4 backport] snapshots/devmapper: fix rollback

Signed-off-by: Wei Fu <fuweid89@gmail.com> (cherry picked from commit 73b1449) Signed-off-by: Wei Fu <fuweid89@gmail.com>

[release/1.4] runtime: ignore ErrNotExist when remove rootfs

…SYSLOG This call is what is used to implement `dmesg` to get kernel messages about the host. This can leak substantial information about the host. It is normally available to unprivileged users on the host, unless the sysctl `kernel.dmesg_restrict = 1` is set, but this is not set by standard on the majority of distributions. Blocking this to restrict leaks about the configuration seems correct. Relates to moby/moby#37897 "docker exposes dmesg to containers by default" See also https://googleprojectzero.blogspot.com/2018/09/a-cache-invalidation-bug-in-linux.html Signed-off-by: Sebastiaan van Stijn <github@gone.nl> (cherry picked from commit 267a0cf) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

Signed-off-by: Sebastiaan van Stijn <github@gone.nl> (cherry picked from commit 7e7545e) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

Enabled adjtimex in the default profile without requiring CAP_SYS_TIME privilege. The kernel will check CAP_SYS_TIME and won't allow setting the time. Fixes: Getting the system time with ntptime returns an error in an unprivileged container To verify, inside a CentOS 7 container: yum install -y ntp ntptime # ntp_gettime() returns code 0 (OK) ntpdate -v time.nist.gov # ntpdate[84]: Can't adjust the time of day: Operation not permitted Signed-off-by: Sebastiaan van Stijn <github@gone.nl> (cherry picked from commit 1746a19) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

Add the membarrier syscall to the default seccomp profile. It is for example used in the implementation of dlopen() in the musl libc of Alpine images. Signed-off-by: Sebastiaan van Stijn <github@gone.nl> (cherry picked from commit fc9e5d1) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

From personality(2): Have uname(2) report a 2.6.40+ version number rather than a 3.x version number. Added as a stopgap measure to support broken applications that could not handle the kernel version-numbering switch from 2.6.x to 3.x. This allows both "UNAME26|PER_LINUX" and "UNAME26|PER_LINUX32". Fixes: "setarch broken in docker packages from Debian stretch" Signed-off-by: Sebastiaan van Stijn <github@gone.nl> (cherry picked from commit 117d678) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

On a ppc64le host, running postgres (tried with 9.4 to 9.6) gives the following warning when trying to flush data to disks (which happens very frequently): WARNING: could not flush dirty data: Operation not permitted. A quick dig in postgres source code indicate it uses sync_file_range(2) to flush data; which on ppe64le and arm64 is translated to sync_file_range2(2) for alignements reasons. The profile did not allow sync_file_range2(2), making postgres sad because it can not flush its buffers. arm_sync_file_range(2) is an ancient alias to sync_file_range2(2), the syscall was renamed in Linux 2.6.22 when the same syscall was added for PowerPC. Signed-off-by: Sebastiaan van Stijn <github@gone.nl> (cherry picked from commit 5862285) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

This allows the quotactl syscall in the default seccomp profile, gated by CAP_SYS_ADMIN. Signed-off-by: Sebastiaan van Stijn <github@gone.nl> (cherry picked from commit 5cdb6e8) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

Signed-off-by: Sebastiaan van Stijn <github@gone.nl> (cherry picked from commit 0a5ee7e) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

Adds the io-uring related system call introduced in kernel 5.1 to the seccomp whitelist. With older kernels or older versions of libseccomp, this configure will be omitted. Note that io_uring will grow support for more syscalls in the future so we should keep an eye on this. Signed-off-by: Sebastiaan van Stijn <github@gone.nl> (cherry picked from commit 325bac7) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

…p_updates [release/1.4] backport seccomp profile updates

When containerd is restarted, only v1 tasks are monitored again. This leads to the lack of existing v2 task metrics. Signed-off-by: Li Yuxuan <liyuxuan04@baidu.com> (cherry picked from commit 4422ae3) Signed-off-by: Wei Fu <fuweid89@gmail.com>

[release/1.4] tasks: Monitor v2 tasks in initFunc as well

Previously the signal loop can end up racing with the process exiting. Intead of logging and continuing the loop, exit early. Signed-off-by: Brian Goff <cpuguy83@gmail.com> (cherry picked from commit 6650510) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

Starting with go1.14, the go runtime hijacks SIGURG but with no way to not send to other signal handlers. In practice, we get this signal frequently. I found this while testing out go1.15 with ctr and multiple execs with only `echo hello`. When the process exits quickly, if the previous commit is not applied, you end up with an error message that it couldn't forward SIGURG to the container (due to the process being gone). Signed-off-by: Brian Goff <cpuguy83@gmail.com> (cherry picked from commit 899b4e3) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

…d_signal_not_found [release/1.4 backport] Fix some signal forwarder issues

This `Info` log shows up for all exec processes that use the v1 shim with Docker because Docker deletes the process once it receives the exit event from containerd. Signed-off-by: Brian Goff <cpuguy83@gmail.com> (cherry picked from commit 5f9d15e) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

related to https://patchwork.kernel.org/patch/11167585/ Signed-off-by: Jintao Zhang <zhangjintao9020@gmail.com> (cherry picked from commit e28e55f) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

related to https://patchwork.kernel.org/patch/11545287/ Signed-off-by: Jintao Zhang <zhangjintao9020@gmail.com> (cherry picked from commit 6a915a1) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

bump cni dependencies so we can benefits from its bugfixes and improvements Signed-off-by: Antonio Ojea <antonio.ojea.garcia@gmail.com> (cherry picked from commit e3d27f9) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

* `-linkmode external` is required since Go 1.15 for static builds: golang/go#40711 * Clarify the meaning of "loading plugins" Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp> (cherry picked from commit 43cbdf8) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

This prevents packages with no Go included files due to build constraints being included in the package list. These packages cause the test command to fail with "can't load package build constraints exclude all Go files". Signed-off-by: Derek McGowan <derek@mcg.dev> (cherry picked from commit 3275a21) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

Windows container stats were reporting incorrect metrics for cpu kernel runtime. Signed-off-by: Prashant Bhutani <prbhutan@microsoft.com> (cherry picked from commit 35b63c0) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

…xec_p_debug [release/1.4 backport] shimv1: downgrade poroccess missing log to debug

…enat2_syscall [release/1.4] seccomp: add `openat2` and `faccessat2` syscall

…atic_plugin [release/1.4 backport] BUILDING.md: fix description about static builds

…le_test_tags [release/1.4 backport] Update go list to respect build tags

Update fork/release/1.4 to upstream v1.4.3

Signed-off-by: Kathryn Baldauf <kabaldau@microsoft.com> (cherry picked from commit 7d926d28178fc8b7158dd4363080aff9975d495e) Signed-off-by: Kathryn Baldauf <kabaldau@microsoft.com>

Signed-off-by: Kathryn Baldauf <kabaldau@microsoft.com> (cherry picked from commit 4c4f3d1811657e41c39fe27396503ceea45a73b6) Signed-off-by: Kathryn Baldauf <kabaldau@microsoft.com>

[fork/release/1.4] Backport signals parsing for LCOW

This changes the stack dumping code so that we split the giant string containing every goroutine's stack and log each stack as a separate log message. This makes the output easier to parse, and especially helps in cases where the log output is sent through a system with a log message size limit, such as Windows's ETW. Signed-off-by: Kevin Parsons <kevpar@microsoft.com> (cherry picked from commit dbb117b) Signed-off-by: Kevin Parsons <kevpar@microsoft.com>

[fork/release/1.4] Dump each goroutine stack as a separate log message

…n the final snapshot For LCOW currently we copy (or create) the scratch.vhdx for every single snapshot so there ends up being a sandbox.vhdx in every directory seemingly unnecessarily. With the default scratch size of 20GB the size on disk is about 17MB so there's a 17MB overhead per layer plus the time to copy the file with every snapshot. Only the final sandbox.vhdx is actually used so this would be a nice little optimization. For WCOW we essentially do the exact same except copy the blank vhdx from the base layer. Signed-off-by: Daniel Canter <dcanter@microsoft.com> (cherry picked from commit a91c298) Signed-off-by: Daniel Canter <dcanter@microsoft.com>

Currently we would create a new disk and mount this into the LCOW UVM for every container but there are certain scenarios where we'd rather just mount a single disk and then have every container share this one storage space instead of every container having it's own xGB of space to play around with. This is accomplished by just making a symlink to the disk that we'd like to share and then using ref counting later on down the stack in hcsshim if we see that we've already mounted this disk. Signed-off-by: Daniel Canter <dcanter@microsoft.com> (cherry picked from commit 3e5acb9) Signed-off-by: Daniel Canter <dcanter@microsoft.com>

Cherry pick LCOW snapshot work

…inID Signed-off-by: Kathryn Baldauf <kabaldau@microsoft.com> (cherry picked from commit 14df541)

[fork/release/1.4] Backport ctr image pull support to print image chain ID

We already have code in hcsshim that logs the contents of the panic.log file during shim delete command. However, that doesn't handle all of the cases. This change should fix that. Signed-off-by: Amit Barve <ambarve@microsoft.com>

Log shim panic logs before shim cleanup

During snapshotter cleanup lcow/wcow snapshotters try to rename the snapshot directory before cleaning it up. This rename operation sometimes fails if the sandbox.vhdx file inside that directory is still open in some process (ideally it should not be open but can stay open in case of a crash or exception) or if the directory is busy for some other reason. However, if this rename operation fails we return that error and then the entire snapshot garbage collection operation fails. Due to this we end up not cleaning other snapshots which otherwise could have been cleaned up without any errors. If this goes on for a long time we will end up filling the entire disk with stale snapshots. This change fixes that issue by continuing the snapshot cleanup if the rename operation fails because of such open handles. Signed-off-by: Amit Barve <ambarve@microsoft.com>

Snapshotter cleanup fix

Currently on windows we preserve only 2 containerd panic.log files at a time, this means that in case of multiple containerd restarts we may loses old panic.log information. This PR attempts to preserve that info, by logging it to debug via logrus. Logging individual goroutines can get too noisy, instead log the file content in 16KB chunks. There seems to be a limit on the message size and the maximum power of 2 that worked was 16KB. This also enables for quicker containerd restart RCAs, since the time of restart and the dumps become available to the event/log collectors. Signed-off-by: Maksim An <maksiman@microsoft.com>

anmaxvl · 2021-12-27T19:49:16Z

what's our approach in backporting fixes from upstream? do we backport from upstream into our fork/master and then cherry-pick from fork/master into fork/release-* or cherry-pick from upstream directly into fork/release-*?

helsaawy · 2021-12-27T21:06:47Z

I am unsure the backport approach, but lgtm regardless

dcantah · 2021-12-28T18:03:30Z

I was also a little curious if we'd ever backported a change from main into fork/release/1.4 but seems like we did here #14. But Maksim I think the approach you laid out rings a bell

dcantah · 2021-12-28T18:08:17Z

This is a shoddy example as the change made isn't actually in upstream but we first got this into master and then brought it back to 1.4

master: #10
1.4: #11

Truth be told the happy path is if this change got into the upstream 1.4 branch and then we can just merge in all of the changes to 1.4.x at once instead of cherry-picking, but doesn't seem like that change will be getting backported. There is an example of us pulling in a change that was only added to main and not backported to 1.4 here though #5

anmaxvl · 2021-12-28T19:29:38Z

Thanks, @dcantah . I agree that ideally the fix would make it into upstream 1.4 branch and we just merge it into our fork, but oh well. We probably should merge upstream 1.4 into our fork at some point. I'll draft this for now, until we get people back from vacation and can discuss this.

Currently, containerd doesn't restart pull when it encounters unexpected EOF of blob strem withtout error codes. There are cases where this lead to pull failure. This commit tries to fix this issue. Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com> (cherry picked from commit 7bc5aa7)

kzys and others added 30 commits August 24, 2020 13:14

Merge pull request containerd#4496 from kzys/backport-1.4-4437

94c8bd9

[release/1.4 backport] snapshots/devmapper: fix rollback

runtime: ignore ErrNotExist when remove rootfs

feff914

Signed-off-by: Wei Fu <fuweid89@gmail.com> (cherry picked from commit 73b1449) Signed-off-by: Wei Fu <fuweid89@gmail.com>

Merge pull request containerd#4499 from fuweid/cherry-pick-4472

881db9b

[release/1.4] runtime: ignore ErrNotExist when remove rootfs

seccomp: allow add preadv2 and pwritev2 syscalls

2b80b7d

Signed-off-by: Sebastiaan van Stijn <github@gone.nl> (cherry picked from commit 7e7545e) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

seccomp: allow quotactl with CAP_SYS_ADMIN

e5cc7d5

This allows the quotactl syscall in the default seccomp profile, gated by CAP_SYS_ADMIN. Signed-off-by: Sebastiaan van Stijn <github@gone.nl> (cherry picked from commit 5cdb6e8) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

seccomp: allow clock_settime when CAP_SYS_TIME is added

3d28944

Signed-off-by: Sebastiaan van Stijn <github@gone.nl> (cherry picked from commit 0a5ee7e) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

Merge pull request containerd#4503 from thaJeztah/1.4_backport_seccom…

12f20c9

…p_updates [release/1.4] backport seccomp profile updates

Merge pull request containerd#4512 from fuweid/14-cherry-pick-4486

1a36776

[release/1.4] tasks: Monitor v2 tasks in initFunc as well

Merge pull request containerd#4542 from thaJeztah/1.4_backport_forwar…

6b5fc7f

…d_signal_not_found [release/1.4 backport] Fix some signal forwarder issues

seccomp: add openat2 syscall.

8bd2bec

related to https://patchwork.kernel.org/patch/11167585/ Signed-off-by: Jintao Zhang <zhangjintao9020@gmail.com> (cherry picked from commit e28e55f) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

seccomp: add faccessat2 syscall.

1575c88

related to https://patchwork.kernel.org/patch/11545287/ Signed-off-by: Jintao Zhang <zhangjintao9020@gmail.com> (cherry picked from commit 6a915a1) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

bump cni version to v0.8.0

16712ae

bump cni dependencies so we can benefits from its bugfixes and improvements Signed-off-by: Antonio Ojea <antonio.ojea.garcia@gmail.com> (cherry picked from commit e3d27f9) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

Report correct stats for windows containers

f247618

Windows container stats were reporting incorrect metrics for cpu kernel runtime. Signed-off-by: Prashant Bhutani <prbhutan@microsoft.com> (cherry picked from commit 35b63c0) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

Merge pull request containerd#4552 from thaJeztah/1.4_backport_shim_e…

98a733e

…xec_p_debug [release/1.4 backport] shimv1: downgrade poroccess missing log to debug

Merge pull request containerd#4554 from thaJeztah/1.4_backport_add_op…

3bcce81

…enat2_syscall [release/1.4] seccomp: add `openat2` and `faccessat2` syscall

Merge pull request containerd#4556 from thaJeztah/1.4_backport_fix_st…

9b5066a

…atic_plugin [release/1.4 backport] BUILDING.md: fix description about static builds

Merge pull request containerd#4557 from thaJeztah/1.4_backport_makefi…

db93194

…le_test_tags [release/1.4 backport] Update go list to respect build tags

dcantah and others added 16 commits December 1, 2020 13:45

Merge pull request #6 from dcantah/update-fork

12e49bb

Update fork/release/1.4 to upstream v1.4.3

Add the ability to parse signals based on a platform string

83e318c

Signed-off-by: Kathryn Baldauf <kabaldau@microsoft.com> (cherry picked from commit 7d926d28178fc8b7158dd4363080aff9975d495e) Signed-off-by: Kathryn Baldauf <kabaldau@microsoft.com>

Fix bug in signals tests for linux

fa5dca9

Signed-off-by: Kathryn Baldauf <kabaldau@microsoft.com> (cherry picked from commit 4c4f3d1811657e41c39fe27396503ceea45a73b6) Signed-off-by: Kathryn Baldauf <kabaldau@microsoft.com>

Merge pull request #9 from katiewasnothere/backport_signals_parsing

4fd5652

[fork/release/1.4] Backport signals parsing for LCOW

Merge pull request #11 from kevpar/backport-dump-stacks

17959a9

[fork/release/1.4] Dump each goroutine stack as a separate log message

Merge pull request #13 from dcantah/cherry-pick-snapshot

e8cf51e

Cherry pick LCOW snapshot work

Add a new flag to images pull for printing out the pulled image's cha…

33be33d

…inID Signed-off-by: Kathryn Baldauf <kabaldau@microsoft.com> (cherry picked from commit 14df541)

Merge pull request #14 from katiewasnothere/backport_ctr_image_pull

2f482c9

[fork/release/1.4] Backport ctr image pull support to print image chain ID

Log contents of panic.log during bundle cleanup

39ad7f3

We already have code in hcsshim that logs the contents of the panic.log file during shim delete command. However, that doesn't handle all of the cases. This change should fix that. Signed-off-by: Amit Barve <ambarve@microsoft.com>

Merge pull request #16 from ambarve/shim_panic_log

1f460c3

Log shim panic logs before shim cleanup

Merge pull request #18 from ambarve/sn_cleanup_fix

a828a5e

Snapshotter cleanup fix

anmaxvl requested review from ambarve, dcantah, helsaawy, katiewasnothere and kevpar December 27, 2021 19:21

anmaxvl marked this pull request as draft December 28, 2021 19:29

anmaxvl closed this Jan 3, 2022

anmaxvl deleted the cherry-pick-fix-fail-precondition branch January 3, 2022 23:26

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix pull fails on unexpected EOF #31

Fix pull fails on unexpected EOF #31

anmaxvl commented Dec 27, 2021

anmaxvl commented Dec 27, 2021

helsaawy commented Dec 27, 2021

dcantah commented Dec 28, 2021 •

edited

Loading

dcantah commented Dec 28, 2021 •

edited

Loading

anmaxvl commented Dec 28, 2021

Fix pull fails on unexpected EOF #31

Fix pull fails on unexpected EOF #31

Conversation

anmaxvl commented Dec 27, 2021

anmaxvl commented Dec 27, 2021

helsaawy commented Dec 27, 2021

dcantah commented Dec 28, 2021 • edited Loading

dcantah commented Dec 28, 2021 • edited Loading

anmaxvl commented Dec 28, 2021

dcantah commented Dec 28, 2021 •

edited

Loading

dcantah commented Dec 28, 2021 •

edited

Loading