-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix pull fails on unexpected EOF #31
Fix pull fails on unexpected EOF #31
Conversation
The rollback mechanism is implemented by calling deleteDevice() and RemoveDevice(). But RemoveDevice() is internally calling deleteDevice() as well. Since a device will be deleted by first deleteDevice(), RemoveDevice() always will see ENODATA. The specific error must be ignored to remove the device's metadata correctly. Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
[release/1.4 backport] snapshots/devmapper: fix rollback
Signed-off-by: Wei Fu <fuweid89@gmail.com> (cherry picked from commit 73b1449) Signed-off-by: Wei Fu <fuweid89@gmail.com>
[release/1.4] runtime: ignore ErrNotExist when remove rootfs
…SYSLOG This call is what is used to implement `dmesg` to get kernel messages about the host. This can leak substantial information about the host. It is normally available to unprivileged users on the host, unless the sysctl `kernel.dmesg_restrict = 1` is set, but this is not set by standard on the majority of distributions. Blocking this to restrict leaks about the configuration seems correct. Relates to moby/moby#37897 "docker exposes dmesg to containers by default" See also https://googleprojectzero.blogspot.com/2018/09/a-cache-invalidation-bug-in-linux.html Signed-off-by: Sebastiaan van Stijn <github@gone.nl> (cherry picked from commit 267a0cf) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl> (cherry picked from commit 7e7545e) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Enabled adjtimex in the default profile without requiring CAP_SYS_TIME privilege. The kernel will check CAP_SYS_TIME and won't allow setting the time. Fixes: Getting the system time with ntptime returns an error in an unprivileged container To verify, inside a CentOS 7 container: yum install -y ntp ntptime # ntp_gettime() returns code 0 (OK) ntpdate -v time.nist.gov # ntpdate[84]: Can't adjust the time of day: Operation not permitted Signed-off-by: Sebastiaan van Stijn <github@gone.nl> (cherry picked from commit 1746a19) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Add the membarrier syscall to the default seccomp profile. It is for example used in the implementation of dlopen() in the musl libc of Alpine images. Signed-off-by: Sebastiaan van Stijn <github@gone.nl> (cherry picked from commit fc9e5d1) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
From personality(2): Have uname(2) report a 2.6.40+ version number rather than a 3.x version number. Added as a stopgap measure to support broken applications that could not handle the kernel version-numbering switch from 2.6.x to 3.x. This allows both "UNAME26|PER_LINUX" and "UNAME26|PER_LINUX32". Fixes: "setarch broken in docker packages from Debian stretch" Signed-off-by: Sebastiaan van Stijn <github@gone.nl> (cherry picked from commit 117d678) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
On a ppc64le host, running postgres (tried with 9.4 to 9.6) gives the following warning when trying to flush data to disks (which happens very frequently): WARNING: could not flush dirty data: Operation not permitted. A quick dig in postgres source code indicate it uses sync_file_range(2) to flush data; which on ppe64le and arm64 is translated to sync_file_range2(2) for alignements reasons. The profile did not allow sync_file_range2(2), making postgres sad because it can not flush its buffers. arm_sync_file_range(2) is an ancient alias to sync_file_range2(2), the syscall was renamed in Linux 2.6.22 when the same syscall was added for PowerPC. Signed-off-by: Sebastiaan van Stijn <github@gone.nl> (cherry picked from commit 5862285) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This allows the quotactl syscall in the default seccomp profile, gated by CAP_SYS_ADMIN. Signed-off-by: Sebastiaan van Stijn <github@gone.nl> (cherry picked from commit 5cdb6e8) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl> (cherry picked from commit 0a5ee7e) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Adds the io-uring related system call introduced in kernel 5.1 to the seccomp whitelist. With older kernels or older versions of libseccomp, this configure will be omitted. Note that io_uring will grow support for more syscalls in the future so we should keep an eye on this. Signed-off-by: Sebastiaan van Stijn <github@gone.nl> (cherry picked from commit 325bac7) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
…p_updates [release/1.4] backport seccomp profile updates
When containerd is restarted, only v1 tasks are monitored again. This leads to the lack of existing v2 task metrics. Signed-off-by: Li Yuxuan <liyuxuan04@baidu.com> (cherry picked from commit 4422ae3) Signed-off-by: Wei Fu <fuweid89@gmail.com>
[release/1.4] tasks: Monitor v2 tasks in initFunc as well
Previously the signal loop can end up racing with the process exiting. Intead of logging and continuing the loop, exit early. Signed-off-by: Brian Goff <cpuguy83@gmail.com> (cherry picked from commit 6650510) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Starting with go1.14, the go runtime hijacks SIGURG but with no way to not send to other signal handlers. In practice, we get this signal frequently. I found this while testing out go1.15 with ctr and multiple execs with only `echo hello`. When the process exits quickly, if the previous commit is not applied, you end up with an error message that it couldn't forward SIGURG to the container (due to the process being gone). Signed-off-by: Brian Goff <cpuguy83@gmail.com> (cherry picked from commit 899b4e3) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
…d_signal_not_found [release/1.4 backport] Fix some signal forwarder issues
This `Info` log shows up for all exec processes that use the v1 shim with Docker because Docker deletes the process once it receives the exit event from containerd. Signed-off-by: Brian Goff <cpuguy83@gmail.com> (cherry picked from commit 5f9d15e) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
related to https://patchwork.kernel.org/patch/11167585/ Signed-off-by: Jintao Zhang <zhangjintao9020@gmail.com> (cherry picked from commit e28e55f) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
related to https://patchwork.kernel.org/patch/11545287/ Signed-off-by: Jintao Zhang <zhangjintao9020@gmail.com> (cherry picked from commit 6a915a1) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
bump cni dependencies so we can benefits from its bugfixes and improvements Signed-off-by: Antonio Ojea <antonio.ojea.garcia@gmail.com> (cherry picked from commit e3d27f9) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
* `-linkmode external` is required since Go 1.15 for static builds: golang/go#40711 * Clarify the meaning of "loading plugins" Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp> (cherry picked from commit 43cbdf8) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This prevents packages with no Go included files due to build constraints being included in the package list. These packages cause the test command to fail with "can't load package build constraints exclude all Go files". Signed-off-by: Derek McGowan <derek@mcg.dev> (cherry picked from commit 3275a21) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Windows container stats were reporting incorrect metrics for cpu kernel runtime. Signed-off-by: Prashant Bhutani <prbhutan@microsoft.com> (cherry picked from commit 35b63c0) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
…xec_p_debug [release/1.4 backport] shimv1: downgrade poroccess missing log to debug
…enat2_syscall [release/1.4] seccomp: add `openat2` and `faccessat2` syscall
…atic_plugin [release/1.4 backport] BUILDING.md: fix description about static builds
…le_test_tags [release/1.4 backport] Update go list to respect build tags
Update fork/release/1.4 to upstream v1.4.3
Signed-off-by: Kathryn Baldauf <kabaldau@microsoft.com> (cherry picked from commit 7d926d28178fc8b7158dd4363080aff9975d495e) Signed-off-by: Kathryn Baldauf <kabaldau@microsoft.com>
Signed-off-by: Kathryn Baldauf <kabaldau@microsoft.com> (cherry picked from commit 4c4f3d1811657e41c39fe27396503ceea45a73b6) Signed-off-by: Kathryn Baldauf <kabaldau@microsoft.com>
[fork/release/1.4] Backport signals parsing for LCOW
This changes the stack dumping code so that we split the giant string containing every goroutine's stack and log each stack as a separate log message. This makes the output easier to parse, and especially helps in cases where the log output is sent through a system with a log message size limit, such as Windows's ETW. Signed-off-by: Kevin Parsons <kevpar@microsoft.com> (cherry picked from commit dbb117b) Signed-off-by: Kevin Parsons <kevpar@microsoft.com>
[fork/release/1.4] Dump each goroutine stack as a separate log message
…n the final snapshot For LCOW currently we copy (or create) the scratch.vhdx for every single snapshot so there ends up being a sandbox.vhdx in every directory seemingly unnecessarily. With the default scratch size of 20GB the size on disk is about 17MB so there's a 17MB overhead per layer plus the time to copy the file with every snapshot. Only the final sandbox.vhdx is actually used so this would be a nice little optimization. For WCOW we essentially do the exact same except copy the blank vhdx from the base layer. Signed-off-by: Daniel Canter <dcanter@microsoft.com> (cherry picked from commit a91c298) Signed-off-by: Daniel Canter <dcanter@microsoft.com>
Currently we would create a new disk and mount this into the LCOW UVM for every container but there are certain scenarios where we'd rather just mount a single disk and then have every container share this one storage space instead of every container having it's own xGB of space to play around with. This is accomplished by just making a symlink to the disk that we'd like to share and then using ref counting later on down the stack in hcsshim if we see that we've already mounted this disk. Signed-off-by: Daniel Canter <dcanter@microsoft.com> (cherry picked from commit 3e5acb9) Signed-off-by: Daniel Canter <dcanter@microsoft.com>
Cherry pick LCOW snapshot work
…inID Signed-off-by: Kathryn Baldauf <kabaldau@microsoft.com> (cherry picked from commit 14df541)
[fork/release/1.4] Backport ctr image pull support to print image chain ID
We already have code in hcsshim that logs the contents of the panic.log file during shim delete command. However, that doesn't handle all of the cases. This change should fix that. Signed-off-by: Amit Barve <ambarve@microsoft.com>
Log shim panic logs before shim cleanup
During snapshotter cleanup lcow/wcow snapshotters try to rename the snapshot directory before cleaning it up. This rename operation sometimes fails if the sandbox.vhdx file inside that directory is still open in some process (ideally it should not be open but can stay open in case of a crash or exception) or if the directory is busy for some other reason. However, if this rename operation fails we return that error and then the entire snapshot garbage collection operation fails. Due to this we end up not cleaning other snapshots which otherwise could have been cleaned up without any errors. If this goes on for a long time we will end up filling the entire disk with stale snapshots. This change fixes that issue by continuing the snapshot cleanup if the rename operation fails because of such open handles. Signed-off-by: Amit Barve <ambarve@microsoft.com>
Snapshotter cleanup fix
Currently on windows we preserve only 2 containerd panic.log files at a time, this means that in case of multiple containerd restarts we may loses old panic.log information. This PR attempts to preserve that info, by logging it to debug via logrus. Logging individual goroutines can get too noisy, instead log the file content in 16KB chunks. There seems to be a limit on the message size and the maximum power of 2 that worked was 16KB. This also enables for quicker containerd restart RCAs, since the time of restart and the dumps become available to the event/log collectors. Signed-off-by: Maksim An <maksiman@microsoft.com>
what's our approach in backporting fixes from upstream? do we backport from upstream into our |
I am unsure the backport approach, but lgtm regardless |
I was also a little curious if we'd ever backported a change from main into fork/release/1.4 but seems like we did here #14. But Maksim I think the approach you laid out rings a bell |
This is a shoddy example as the change made isn't actually in upstream but we first got this into master and then brought it back to 1.4 Truth be told the happy path is if this change got into the upstream 1.4 branch and then we can just merge in all of the changes to 1.4.x at once instead of cherry-picking, but doesn't seem like that change will be getting backported. There is an example of us pulling in a change that was only added to main and not backported to 1.4 here though #5 |
Thanks, @dcantah . I agree that ideally the fix would make it into upstream 1.4 branch and we just merge it into our fork, but oh well. We probably should merge upstream 1.4 into our fork at some point. I'll draft this for now, until we get people back from vacation and can discuss this. |
Currently, containerd doesn't restart pull when it encounters unexpected EOF of blob strem withtout error codes. There are cases where this lead to pull failure. This commit tries to fix this issue. Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com> (cherry picked from commit 7bc5aa7)
Currently, containerd doesn't restart pull when it encounters unexpected EOF of
blob strem withtout error codes.
There are cases where this lead to pull failure.
This commit tries to fix this issue.
Signed-off-by: Kohei Tokunaga ktokunaga.mail@gmail.com
(cherry picked from commit 7bc5aa7)