toolbox instructions do not work as expected #1072

Closed
tormath1 opened this issue Jun 12, 2023 · 6 comments
Labels: channel/stable, kind/bug

Comments

@tormath1
Contributor

tormath1 commented Jun 12, 2023

Description

I noticed this on the latest Flatcar Stable (3510.2.2) in a QEMU setup:

Impact

Can't follow the documentation to run the toolbox: https://www.flatcar.org/docs/latest/setup/debug/install-debugging-tools/#spawn-a-toolbox-with-tmux-in-the-background or https://www.flatcar.org/docs/latest/setup/debug/install-debugging-tools/#quick-debugging

Environment and steps to reproduce

systemd-run transient unit is not working as expected

  1. Boot Flatcar
$ systemd-run --user toolbox sh -c 'dnf install -y tmux strace procps-ng; tmux new-session -d -s sharedsession; strace -p "$(pidof tmux)"'
Running as unit: run-rd9efc558fff944d68670fe1c2f0d641e.service
$ systemctl --no-pager --user status run-rd9efc558fff944d68670fe1c2f0d641e.service
× run-rd9efc558fff944d68670fe1c2f0d641e.service - /usr/bin/toolbox sh -c dnf install -y tmux strace procps-ng; tmux new-session -d -s sharedsession; strace -p "$(pidof tmux)"
     Loaded: loaded (/run/user/500/systemd/transient/run-rd9efc558fff944d68670fe1c2f0d641e.service; transient)
  Transient: yes
     Active: failed (Result: exit-code) since Mon 2023-06-12 08:31:16 UTC; 2min 23s ago
   Duration: 12ms
    Process: 1553 ExecStart=/usr/bin/toolbox sh -c dnf install -y tmux strace procps-ng; tmux new-session -d -s sharedsession; strace -p "$(pidof tmux)" (code=exited, status=127)
   Main PID: 1553 (code=exited, status=127)
        CPU: 13ms

Jun 12 08:31:16 localhost systemd[1214]: Started run-rd9efc558fff944d68670fe1c2f0d641e.service.
Jun 12 08:31:16 localhost systemd[1214]: run-rd9efc558fff944d68670fe1c2f0d641e.service: Main process exited, code=exited, status=127/n/a
Jun 12 08:31:16 localhost systemd[1214]: run-rd9efc558fff944d68670fe1c2f0d641e.service: Failed with result 'exit-code'.

systemd appears to see only the dnf part of the command, because of how ExecStart is formatted:

ExecStart="/usr/bin/toolbox" "sh" "-c" "dnf install -y tmux strace procps-ng\; tmux new-session -d -s sharedsession\; strace -p \"\$(pidof tmux)\""

EDIT: It works on Beta and Alpha (maybe the bash upgrade?)
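
To check the quoting independently of systemd, here is a minimal sketch that shows how the shell splits the documented command before any program sees it. show_args is a hypothetical helper (not part of Flatcar or toolbox) standing in for "systemd-run --user toolbox", so nothing is actually started. It only covers the shell level; what systemd or the bash version inside the image does afterwards is not covered.

```shell
#!/bin/sh
# show_args is a hypothetical helper: it prints every argument it receives
# in brackets, one per line, so the word boundaries are visible.
show_args() {
  for arg in "$@"; do
    printf '[%s]\n' "$arg"
  done
}

# Same quoting as the documented command. Because the script is in single
# quotes, the semicolons and "$(pidof tmux)" are passed through untouched.
show_args sh -c 'dnf install -y tmux strace procps-ng; tmux new-session -d -s sharedsession; strace -p "$(pidof tmux)"'
```

This prints three bracketed lines: sh, -c, and the whole script as a single argument, which matches the three quoted words after /usr/bin/toolbox in the ExecStart line above.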

dnf is being OOM killed

Let's continue without using the systemd-run command:

$ toolbox sh -c 'dnf install -y tmux strace procps-ng; tmux new-session -d -s sharedsession; strace -p "$(pidof tmux)"'
Spawning container core-docker.iolibraryfedora-latest on /var/lib/toolbox/core-docker.io_library_fedora-latest.
Press ^] three times within 1s to kill container
sh: line 1: tmux: command not found
sh: line 1: pidof: command not found
sh: line 1: strace: command not found
Container core-docker.iolibraryfedora-latest failed with error code 127.
$ dmesg
...
[ 1133.983661] dnf invoked oom-killer: gfp_mask=0x1100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
[ 1133.985659] CPU: 3 PID: 1617 Comm: dnf Not tainted 5.15.111-flatcar #1
[ 1133.986930] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-20220807_005459-localhost 04/01/2014
[ 1133.989057] Call Trace:
[ 1133.989665]  <TASK>
[ 1133.990214]  dump_stack_lvl+0x46/0x5e
[ 1133.991025]  dump_header+0x4a/0x1f7
[ 1133.991837]  oom_kill_process.cold+0xb/0x10
[ 1133.992719]  out_of_memory+0x1b9/0x4c0
[ 1133.993574]  __alloc_pages_slowpath.constprop.0+0xbcf/0xca0
[ 1133.994728]  __alloc_pages+0x30d/0x320
[ 1133.995544]  pagecache_get_page+0x150/0x430
[ 1133.996431]  filemap_fault+0x5bf/0x900
[ 1133.997246]  ? filemap_map_pages+0x12a/0x5d0
[ 1133.998173]  __do_fault+0x36/0x120
[ 1133.998932]  __handle_mm_fault+0xed1/0x1450
[ 1133.999812]  handle_mm_fault+0xcf/0x2b0
[ 1134.000639]  do_user_addr_fault+0x1c5/0x680
[ 1134.001556]  exc_page_fault+0x68/0x140
[ 1134.002366]  asm_exc_page_fault+0x22/0x30
[ 1134.003219] RIP: 0033:0x7f83f00963b0
[ 1134.004012] Code: Unable to access opcode bytes at RIP 0x7f83f0096386.
[ 1134.005295] RSP: 002b:00007ffc0ded3938 EFLAGS: 00010202
[ 1134.006346] RAX: 000000001b000000 RBX: 000000001b000000 RCX: 000000000000003f
[ 1134.007702] RDX: 0000000000000000 RSI: 000000001b000000 RDI: 00007f83c407c010
[ 1134.009087] RBP: 00007ffc0ded3960 R08: 0000000000000004 R09: 0000000006899c73
[ 1134.010479] R10: 0000000000001237 R11: 0000000000000000 R12: 0000563ed082a400
[ 1134.011864] R13: 0000000000001235 R14: 0000000000000002 R15: 0000000000014e49
[ 1134.013224]  </TASK>
[ 1134.013810] Mem-Info:
[ 1134.014781] active_anon:12086 inactive_anon:203866 isolated_anon:0
                active_file:30 inactive_file:11 isolated_file:0
                unevictable:0 dirty:0 writeback:0
                slab_reclaimable:5424 slab_unreclaimable:8149
                mapped:7961 shmem:62421 pagetables:794 bounce:0
                kernel_misc_reclaimable:0
                free:12115 free_pcp:0 free_cma:0
[ 1134.025436] Node 0 active_anon:48344kB inactive_anon:815464kB active_file:120kB inactive_file:44kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:31844kB dirty:0kB writeback:0kB shmem:249684kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 49152kB writeback_tmp:0kB kernel_stack:2368kB pagetables:3176kB all_unreclaimable? yes
[ 1134.034357] Node 0 DMA free:4336kB min:732kB low:912kB high:1092kB reserved_highatomic:0KB active_anon:0kB inactive_anon:10848kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[ 1134.039686] lowmem_reserve[]: 0 907 907 907
[ 1134.040725] Node 0 DMA32 free:44124kB min:44320kB low:55400kB high:66480kB reserved_highatomic:0KB active_anon:48344kB inactive_anon:804416kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:1032052kB managed:977200kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[ 1134.046151] lowmem_reserve[]: 0 0 0 0
[ 1134.047072] Node 0 DMA: 12*4kB (UME) 14*8kB (UME) 7*16kB (ME) 8*32kB (UME) 4*64kB (ME) 4*128kB (UE) 4*256kB (UME) 2*512kB (UE) 1*1024kB (M) 0*2048kB 0*4096kB = 4368kB
[ 1134.059047] Node 0 DMA32: 1462*4kB (UME) 747*8kB (UME) 287*16kB (UME) 180*32kB (UME) 89*64kB (UME) 46*128kB (UE) 20*256kB (UME) 10*512kB (UM) 1*1024kB (M) 0*2048kB 0*4096kB = 45024kB
[ 1134.062435] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[ 1134.064321] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[ 1134.066181] 62501 total pagecache pages
[ 1134.067090] 0 pages in swap cache
[ 1134.067921] Swap cache stats: add 0, delete 0, find 0/0
[ 1134.069082] Free swap  = 0kB
[ 1134.069808] Total swap = 0kB
[ 1134.070543] 262011 pages RAM
[ 1134.071285] 0 pages HighMem/MovableOnly
[ 1134.072200] 13871 pages reserved
[ 1134.072974] 0 pages hwpoisoned
[ 1134.073688] Tasks state (memory values in pages):
[ 1134.074770] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[ 1134.076662] [    964]     0   964     8532      328    81920        0          -250 systemd-journal
[ 1134.078648] [    991]     0   991     7908      587    81920        0         -1000 systemd-udevd
[ 1134.080560] [   1001]   244  1001     4743      330    77824        0             0 systemd-network
[ 1134.082531] [   1002]     0  1002     4317      248    69632        0             0 systemd-userdbd
[ 1134.084527] [   1043]   245  1043     5299      521    86016        0             0 systemd-resolve
[ 1134.086532] [   1044]   195  1044    22865      793    86016        0             0 systemd-timesyn
[ 1134.088491] [   1066]   201  1066     2358      237    53248        0          -900 dbus-daemon
[ 1134.090407] [   1072]     0  1072     4719      425    73728        0             0 systemd-logind
[ 1134.092357] [   1076]     0  1076     7886      803    98304        0             0 update_engine
[ 1134.094299] [   1089]     0  1089   374066    13974   331776        0          -999 containerd
[ 1134.096187] [   1190]     0  1190   253744     2137   122880        0             0 locksmithd
[ 1134.098078] [   1210]     0  1210     1546      128    45056        0             0 login
[ 1134.099881] [   1211]     0  1211     1664      170    49152        0             0 login
[ 1134.101716] [   1214]   500  1214     5213      613    81920        0           100 systemd
[ 1134.103537] [   1215]   500  1215     6197     1264    86016        0           100 (sd-pam)
[ 1134.105399] [   1220]   500  1220     1261      146    49152        0             0 bash
[ 1134.107221] [   1221]   500  1221     1114      134    45056        0             0 bash
[ 1134.109026] [   1235]     0  1235     3369      364    69632        0             0 sshd
[ 1134.110745] [   1237]   500  1237     3434      433    69632        0             0 sshd
[ 1134.112555] [   1238]   500  1238     1139      164    45056        0             0 bash
[ 1134.114329] [   1326]     0  1326     4389      263    69632        0             0 systemd-machine
[ 1134.116287] [   1565]     0  1565     4457      270    77824        0             0 systemd-userwor
[ 1134.118245] [   1566]     0  1566     4413      267    77824        0             0 systemd-userwor
[ 1134.120208] [   1567]     0  1567     4413      267    73728        0             0 systemd-userwor
[ 1134.122175] [   1607]   500  1607     1081      110    49152        0             0 toolbox
[ 1134.123951] [   1613]   500  1613     3117      236    61440        0             0 sudo
[ 1134.125738] [   1614]     0  1614     4398      248    69632        0             0 systemd-nspawn
[ 1134.127682] [   1616]     0  1616     1118       58    45056        0             0 sh
[ 1134.129447] [   1617]     0  1617   150859   135176  1220608        0             0 dnf
[ 1134.131246] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=payload,mems_allowed=0,global_oom,task_memcg=/machine.slice/core-docker.iolibraryfedora-latest.scope/payload,task=dnf,pid=1617,uid=0
[ 1134.134969] Out of memory: Killed process 1617 (dnf) total-vm:603436kB, anon-rss:540704kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:1192kB oom_score_adj:0
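
The task table above reports memory in pages, so a small conversion helps to read it. This is a sketch assuming 4 KiB pages (the smallest block size in the free lists above is 4kB); pages_to_kib and pages_to_mib are hypothetical helper names.

```shell
#!/bin/sh
# Assumption: 4 KiB pages, as suggested by the "*4kB" entries in the log.
PAGE_KIB=4

# Convert a page count to KiB or to whole MiB (integer division).
pages_to_kib() { echo $(( $1 * PAGE_KIB )); }
pages_to_mib() { echo $(( $1 * PAGE_KIB / 1024 )); }

pages_to_kib 135176   # rss of dnf (pid 1617) in the task table
pages_to_mib 262011   # "pages RAM" line
pages_to_mib 12115    # "free:" in Mem-Info
```

The first value is 540704, which matches the anon-rss:540704kB in the final "Out of memory" line. The other two come out at 1023 MiB of RAM and 47 MiB free, so dnf alone held roughly half of the VM's memory when it was killed.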

Adjusting the oom_score_adj makes it "work", but you end up with other processes being killed instead (for example systemd-networkd). As a workaround, I used an Alpine image:

$ cat .toolboxrc
TOOLBOX_DOCKER_IMAGE=docker.io/library/alpine
$ toolbox sh -c 'apk add --update tmux strace procps-ng; tmux new-session -d -s sharedsession; strace -p "$(pidof tmux)"'
Spawning container core-docker.iolibraryalpine-latest on /var/lib/toolbox/core-docker.io_library_alpine-latest.
Press ^] three times within 1s to kill container.
fetch https://dl-cdn.alpinelinux.org/alpine/v3.18/main/x86_64/APKINDEX.tar.gz
fetch https://dl-cdn.alpinelinux.org/alpine/v3.18/community/x86_64/APKINDEX.tar.gz
OK: 13 MiB in 28 packages
strace: Process 1781 attached

Additional information

Regarding the dnf OOM: https://bugzilla.redhat.com/show_bug.cgi?id=1907030

tormath1 added the kind/bug label on Jun 12, 2023
@t-lo
Member

t-lo commented Jun 12, 2023

Currently our QEMU script provides 1 GB of RAM to the VM. Maybe we should increase the default in the script?
I think it's reasonable for Fedora to require more than 1GB.
However, I'm also happy to investigate Alpine as a potential alternative for the Fedora-based toolbox (because I think Alpine is pretty awesome for basing container images on).

@tormath1
Contributor Author

Tested in a VM with 2048 MB and there is no more OOM. I think we can keep the helper as it is and update the documentation to mention the 2 GB requirement for the Fedora-based toolbox.
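
As a rough sketch of how the two findings could be combined (Fedora needs more than the 1 GB VM, Alpine works on it), the snippet below picks an image from the amount of RAM. Everything here is an assumption for illustration: pick_toolbox_image is a hypothetical helper, the 1500000 KiB threshold is an arbitrary point between the failing 1 GB VM and the working 2048 MB VM, and the Fedora image name is inferred from the container name in the log. It only prints the .toolboxrc line and does not write any file.

```shell
#!/bin/sh
# Hypothetical threshold in KiB, somewhere between the 1 GB VM (OOM) and the
# 2048 MB VM (no OOM). The real limit was not measured in this thread.
FEDORA_MIN_KIB=1500000

# Echo the image to use for a given MemTotal value in KiB.
pick_toolbox_image() {
  if [ "$1" -ge "$FEDORA_MIN_KIB" ]; then
    echo "docker.io/library/fedora"   # inferred from the container name
  else
    echo "docker.io/library/alpine"   # workaround image from this issue
  fi
}

# Read MemTotal from /proc/meminfo; fall back to 0 if it is unavailable.
mem_total_kib=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo 2>/dev/null)
echo "TOOLBOX_DOCKER_IMAGE=$(pick_toolbox_image "${mem_total_kib:-0}")"
```

The printed line has the same form as the .toolboxrc entry shown in the description.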

@till

till commented Jun 13, 2023

That issue is like an evergreen. 😅 Did you try breaking it up with makecache? That worked on a small VM, but I haven't tried it in the toolbox.
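
A minimal sketch of what "breaking it up" could look like: refresh the metadata with dnf makecache first, then install in a second dnf run. build_dnf_script is a hypothetical helper that only builds the command string, and whether this split actually avoids the OOM inside the toolbox is untested here.

```shell
#!/bin/sh
# build_dnf_script is a hypothetical helper: it returns a script that runs
# "dnf makecache" as a separate step before installing the given packages.
build_dnf_script() {
  printf 'dnf makecache && dnf install -y %s' "$*"
}

# Print the toolbox invocation instead of running it.
script=$(build_dnf_script tmux strace procps-ng)
echo "toolbox sh -c '$script'"
```

The echoed command can then be pasted into the Flatcar shell to try it out.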

@pothos
Member

pothos commented Jun 13, 2023

Had a similar issue without Flatcar, just plain Fedora on a 1 GB RAM arm64 SoC. A workaround is to use zram, and we also have a small docs section to enable it with Flatcar (but it's not using the systemd generator): https://www.flatcar.org/docs/latest/setup/storage/adding-swap/#using-zram
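
For orientation, a generic zram swap setup on Linux usually looks like the sketch below. This is not necessarily the method from the linked Flatcar docs. zram_swap_commands is a hypothetical helper and the sizes are assumptions. It only prints the commands (a dry run), since running them requires root.

```shell
#!/bin/sh
# zram_swap_commands is a hypothetical dry-run helper: it prints the usual
# steps to create a zram-backed swap device of the given size.
zram_swap_commands() {
  size="${1:-512M}"   # assumed default size, pick what fits the machine
  printf '%s\n' \
    "modprobe zram" \
    "echo ${size} > /sys/block/zram0/disksize" \
    "mkswap /dev/zram0" \
    "swapon /dev/zram0"
}

zram_swap_commands 1G
```

Once swap is active, the "Total swap = 0kB" line seen in the OOM report above would no longer be zero.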

@tormath1
Contributor Author

I investigated further: it works with Beta and Alpha (it just takes a simple fix, which I'll send as a documentation PR). I suspect the bash upgrade fixed this behavior, as the systemd unit is formatted in the same way. I'll keep this open for tracking purposes until Beta is promoted to Stable.

@tormath1
Contributor Author

It works fine on Stable too - closing this.
