This repository has been archived by the owner on Jan 20, 2022. It is now read-only.

Stress-ng test suite causing memory fault and heap corruption failure in Graphene #2419

Closed
jinengandhi-intel opened this issue Jun 3, 2021 · 22 comments

@jinengandhi-intel
Contributor

Description of the problem

While trying to enable the different stressors with Graphene, I am seeing several failures: memory faults, heap corruption, and other errors from Graphene.

Failure 1: Testing procfs.

Run 1:
$ graphene-direct stress-ng --procfs 8 --timeout 40s
error: Using insecure argv source. Graphene will continue application execution, but this configuration must not be used in production!
error: Forwarding host environment variables to the app is enabled. Graphene will continue application execution, but this configuration must not be used in production!
stress-ng: info: [1] dispatching hogs: 8 procfs
stress-ng: error: [1] glob on regex "/sys/devices/system/cpu/cpu0/cache/index[0-9]*" failed: 1
stress-ng: info: [1] cache allocate: using built-in defaults as unable to determine cache details
[P15695:T2:stress-ng] error: Internal memory fault at 0x100000013 (IP = +0x2d234, VMID = 15695, TID = 2)
[P15697:i1:stress-ng] error: IPC worker: unexpected event (4) on exit handle
[P15711:i1:stress-ng] error: Internal memory fault at 0x100000014 (IP = +0x39e67, VMID = 15711, TID = 0)
[P15718:i1:stress-ng] error: Internal memory fault at 0x100000014 (IP = +0x39e67, VMID = 15718, TID = 0)
[P15725:i1:stress-ng] error: Internal memory fault at 0x100000014 (IP = +0x39e67, VMID = 15725, TID = 0)
[P15732:i1:stress-ng] error: Internal memory fault at 0x100000014 (IP = +0x39e67, VMID = 15732, TID = 0)
[P15739:i1:stress-ng] error: Internal memory fault at 0x100000014 (IP = +0x39e67, VMID = 15739, TID = 0)
stress-ng: info: [1] unsuccessful run completed in 40.06s

Run 2:
$ graphene-direct stress-ng --procfs 8 --timeout 40s
error: Using insecure argv source. Graphene will continue application execution, but this configuration must not be used in production!
error: Forwarding host environment variables to the app is enabled. Graphene will continue application execution, but this configuration must not be used in production!
stress-ng: info: [1] dispatching hogs: 8 procfs
stress-ng: error: [1] glob on regex "/sys/devices/system/cpu/cpu0/cache/index[0-9]*" failed: 1
stress-ng: info: [1] cache allocate: using built-in defaults as unable to determine cache details
[P16061:i1:stress-ng] error: Internal memory fault at 0x100000012 (IP = +0x39e67, VMID = 16061, TID = 0)
[P16059:T2:stress-ng] error: Internal memory fault at 0x100000014 (IP = +0x2d234, VMID = 16059, TID = 2)
[P16075:i1:stress-ng] error: Internal memory fault at 0x100000014 (IP = +0x39e67, VMID = 16075, TID = 0)
[P16089:i1:stress-ng] error: Internal memory fault at 0x100000014 (IP = +0x39e67, VMID = 16089, TID = 0)
error: *** Unexpected memory fault occurred inside PAL (PID = 16093, TID = 16102, RIP = +0x00004c30)! ***
error: *** Unexpected memory fault occurred inside PAL (PID = 16103, TID = 16109, RIP = +0x00004c30)! ***
[P16068:i1:stress-ng] error: Internal memory fault at 0x100000016 (IP = +0x39e67, VMID = 16068, TID = 0)
[P16082:T6:stress-ng] error: Failed to send IPC msg to 16075: -32
stress-ng: info: [1] unsuccessful run completed in 40.08s

Failure 2: Testing getdents

Run 1:
$ graphene-direct stress-ng --getdent 8 --timeout 60s
error: Using insecure argv source. Graphene will continue application execution, but this configuration must not be used in production!
error: Forwarding host environment variables to the app is enabled. Graphene will continue application execution, but this configuration must not be used in production!
stress-ng: info: [1] dispatching hogs: 8 getdent
stress-ng: error: [1] glob on regex "/sys/devices/system/cpu/cpu0/cache/index[0-9]*" failed: 1
stress-ng: info: [1] cache allocate: using built-in defaults as unable to determine cache details
[P14981:T6:stress-ng] assert failed ../LibOS/shim/include/../../../common/include/slabmgr.h:400 *m == SLAB_CANARY_STRING
[P14990:i1:stress-ng] error: Internal memory fault at 0x100000017 (IP = +0x39e67, VMID = 14990, TID = 0)
[P14970:T2:stress-ng] assert failed ../LibOS/shim/include/../../../common/include/slabmgr.h:400 *m == SLAB_CANARY_STRING
[P14972:T3:stress-ng] Heap corruption detected: invalid heap level 8
stress-ng: info: [1] unsuccessful run completed in 60.12s (1 min, 0.12 secs)

Run 2:
$ graphene-direct stress-ng --getdent 8 --timeout 60s
error: Using insecure argv source. Graphene will continue application execution, but this configuration must not be used in production!
error: Forwarding host environment variables to the app is enabled. Graphene will continue application execution, but this configuration must not be used in production!
stress-ng: info: [1] dispatching hogs: 8 getdent
stress-ng: error: [1] glob on regex "/sys/devices/system/cpu/cpu0/cache/index[0-9]*" failed: 1
stress-ng: info: [1] cache allocate: using built-in defaults as unable to determine cache details
[P15426:T6:stress-ng] assert failed ../LibOS/shim/include/../../../common/include/slabmgr.h:400 *m == SLAB_CANARY_STRING
[P15429:T7:stress-ng] assert failed ../LibOS/shim/include/../../../common/include/slabmgr.h:400 *m == SLAB_CANARY_STRING
[P15432:T8:stress-ng] assert failed ../LibOS/shim/include/../../../common/include/slabmgr.h:400 *m == SLAB_CANARY_STRING
[P15435:T9:stress-ng] assert failed ../LibOS/shim/include/../../../common/include/slabmgr.h:400 *m == SLAB_CANARY_STRING
stress-ng: info: [1] unsuccessful run completed in 60.09s (1 min, 0.09 secs)

Failure 3: Testing mknod

$ graphene-direct stress-ng --mknod 8 --timeout 40s
error: Using insecure argv source. Graphene will continue application execution, but this configuration must not be used in production!
error: Forwarding host environment variables to the app is enabled. Graphene will continue application execution, but this configuration must not be used in production!
stress-ng: info: [1] dispatching hogs: 8 mknod
stress-ng: error: [1] glob on regex "/sys/devices/system/cpu/cpu0/cache/index[0-9]*" failed: 1
stress-ng: info: [1] cache allocate: using built-in defaults as unable to determine cache details
stress-ng: fail: [2] stress-ng-mknod: mknod failed, errno=22 (Invalid argument)
stress-ng: fail: [2] stress-ng-mknod: mknod failed, errno=22 (Invalid argument)
stress-ng: fail: [2] stress-ng-mknod: mknod failed, errno=22 (Invalid argument)
stress-ng: fail: [2] stress-ng-mknod: mknod failed, errno=22 (Invalid argument)
stress-ng: fail: [2] stress-ng-mknod: mknod failed, errno=22 (Invalid argument)
info: 5 failures reached, aborting stress process
stress-ng: fail: [3] stress-ng-mknod: mknod failed, errno=22 (Invalid argument)
stress-ng: fail: [3] stress-ng-mknod: mknod failed, errno=22 (Invalid argument)
stress-ng: fail: [3] stress-ng-mknod: mknod failed, errno=22 (Invalid argument)
stress-ng: fail: [3] stress-ng-mknod: mknod failed, errno=22 (Invalid argument)
[P26573:T3:stress-ng] error: Internal memory fault at 0xc0000005b (IP = +0x1bdd9, VMID = 26573, TID = 3)
stress-ng: fail: [4] stress-ng-mknod: mknod failed, errno=22 (Invalid argument)
stress-ng: fail: [4] stress-ng-mknod: mknod failed, errno=22 (Invalid argument)
stress-ng: fail: [4] stress-ng-mknod: mknod failed, errno=22 (Invalid argument)
stress-ng: fail: [4] stress-ng-mknod: mknod failed, errno=22 (Invalid argument)
[P26577:T4:stress-ng] error: Internal memory fault at 0x400000053 (IP = +0x1bdd9, VMID = 26577, TID = 4)
stress-ng: fail: [5] stress-ng-mknod: mknod failed, errno=22 (Invalid argument)
stress-ng: fail: [5] stress-ng-mknod: mknod failed, errno=22 (Invalid argument)
stress-ng: fail: [5] stress-ng-mknod: mknod failed, errno=22 (Invalid argument)
stress-ng: fail: [5] stress-ng-mknod: mknod failed, errno=22 (Invalid argument)
stress-ng: fail: [5] stress-ng-mknod: mknod failed, errno=22 (Invalid argument)
info: 5 failures reached, aborting stress process
stress-ng: fail: [6] stress-ng-mknod: mknod failed, errno=22 (Invalid argument)
stress-ng: fail: [6] stress-ng-mknod: mknod failed, errno=22 (Invalid argument)
stress-ng: fail: [6] stress-ng-mknod: mknod failed, errno=22 (Invalid argument)
stress-ng: fail: [6] stress-ng-mknod: mknod failed, errno=22 (Invalid argument)
stress-ng: fail: [6] stress-ng-mknod: mknod failed, errno=22 (Invalid argument)
info: 5 failures reached, aborting stress process
stress-ng: fail: [7] stress-ng-mknod: mknod failed, errno=22 (Invalid argument)
stress-ng: fail: [7] stress-ng-mknod: mknod failed, errno=22 (Invalid argument)
stress-ng: fail: [7] stress-ng-mknod: mknod failed, errno=22 (Invalid argument)
stress-ng: fail: [7] stress-ng-mknod: mknod failed, errno=22 (Invalid argument)
stress-ng: fail: [7] stress-ng-mknod: mknod failed, errno=22 (Invalid argument)
info: 5 failures reached, aborting stress process
stress-ng: fail: [8] stress-ng-mknod: mknod failed, errno=22 (Invalid argument)
stress-ng: fail: [8] stress-ng-mknod: mknod failed, errno=22 (Invalid argument)
[P26591:T8:stress-ng] error: Internal memory fault at 0x400000053 (IP = +0x1bdd9, VMID = 26591, TID = 8)
stress-ng: fail: [9] stress-ng-mknod: mknod failed, errno=22 (Invalid argument)
stress-ng: fail: [9] stress-ng-mknod: mknod failed, errno=22 (Invalid argument)
stress-ng: fail: [9] stress-ng-mknod: mknod failed, errno=22 (Invalid argument)
stress-ng: fail: [9] stress-ng-mknod: mknod failed, errno=22 (Invalid argument)
[P26594:T9:stress-ng] error: Internal memory fault at 0xc0000005b (IP = +0x1bdd9, VMID = 26594, TID = 9)

Failure 4: Testing rename.

$ graphene-direct stress-ng --rename 8 --timeout 40s
error: Using insecure argv source. Graphene will continue application execution, but this configuration must not be used in production!
error: Forwarding host environment variables to the app is enabled. Graphene will continue application execution, but this configuration must not be used in production!
stress-ng: info: [1] dispatching hogs: 8 rename
stress-ng: error: [1] glob on regex "/sys/devices/system/cpu/cpu0/cache/index[0-9]*" failed: 1
stress-ng: info: [1] cache allocate: using built-in defaults as unable to determine cache details
error: *** Unexpected memory fault occurred inside PAL (PID = 16073, TID = 16073, RIP = +0x00005a4d)! ***
error: *** Unexpected memory fault occurred inside PAL (PID = 16053, TID = 16053, RIP = +0x00005a4d)! ***
error: *** Unexpected memory fault occurred inside PAL (PID = 16061, TID = 16061, RIP = +0x00005a4d)! ***
stress-ng: info: [1] unsuccessful run completed in 28.54s

Steps to reproduce

Installation of the tool is simple; just run the following command: apt install stress-ng

Manifest files are attached to the issue.
stress-ng.manifest.template.txt
stress-ng.manifest.txt

Expected results

Actual results

@dimakuv

dimakuv commented Jun 3, 2021

Thanks @jinengandhi-intel for the detailed bug report!

This is easy to reproduce. I spent a couple hours debugging this, and the culprit is always the same: our IPC communication code at the LibOS layer. In particular, all that stuff in ipc-thread.c and shim_ipc_pid.c / shim_ipc_ranges.c / shim_ipc_sysv.c.

The immediate bug is something like this: we have lists of objects and operate on them using LISTP macros. Any object that can be included in a list has the LIST_TYPE() list field. In particular, I observed this bug with shim_ipc_worker.c: g_ipc_connections. This list got corrupted in such a way that the second list item had a corrupted pointer value in its list.next (which resulted in Internal memory fault at 0x100000012).

The most interesting part here is that g_ipc_connections is trivial and seems to be correctly operated on (with proper locks and stuff). So the only other explanation is that some unrelated part of Graphene code corrupts this memory. I wasn't able to debug with watchpoints (well, was lazy), but disabling ipc-thread.c: proc_list_ipc_thread() solved the issue.

So yes, there is some memory corruption happening somewhere inside get_all_pid_status() and all those IPC functions.

At this point I decided to stop, because this is the code that @boryspoplawski is currently fixing. Borys, if you want a test for your IPC rewrite, this stress-ng is a good candidate.

@dimakuv

dimakuv commented Jun 3, 2021

Some quick comments on this stress-ng program, maybe helpful for future debugs.

The program spawns several processes. Each process has a bunch of threads (around 6; I don't understand why this particular number). Each thread does the same thing -- e.g., repeatedly opens the file /proc/self/maps if you run with stress-ng --procfs.

Specifying stress-ng --procfs 1 means: there is one main process waiting for one child process. The main process doesn't do anything, the child process spawns ~6 threads, each thread opens /proc/self/maps repeatedly (and I think not only /proc/self/maps but other files under /proc as well, but I didn't look deeply into this).

Specifying stress-ng --procfs 2 means: there is one main process waiting for two child processes. Each child process does the same hogging job as described above.

Note that stress-ng --procfs 1 works fine. Whereas stress-ng --procfs 2 fails non-deterministically. So it is clearly some data races/memory corruptions in the IPC code.

@dimakuv

dimakuv commented Jun 3, 2021

Btw, putting watchpoints on g_ipc_connections object wouldn't be too hard, but the problem is that our misbehaving application has at least three processes, and GDB constantly failed on me when I tried to stop exactly in Inferior 2 (the "middle" process). I tried different combinations of set detach-on-fork and set follow-fork-mode, but GDB always ended up in an Internal GDB error.

@pwmarcz Do you know about this behavior of GDB? Do you have an idea why it behaves this way? Do you know of any workarounds?

@pwmarcz
Contributor

pwmarcz commented Jun 4, 2021

@pwmarcz Do you know about this behavior of GDB? Do you have an idea why it behaves this way? Do you know of any workarounds?

Not sure if this exactly, but sometimes GDB outright freezes for me with some applications that use fork (even simple regression tests). I haven't been able to debug it yet.

@anjalirai-intel

@dimakuv @jinengandhi-intel
Heap corruption is not seen with the IPC PR.

However, the mknod issue still exists, and procfs throws a different error.

Procfs:

rasp@rasp-WHITLEY:~/Anjali/stress-ng/example-jobs$ graphene-direct stress-ng --procfs 8 --timeout 40s
error: Using insecure argv source. Graphene will continue application execution, but this configuration must not be used in production!
error: Forwarding host environment variables to the app is enabled. Graphene will continue application execution, but this configuration must not be used in production!
stress-ng: info:  [1] dispatching hogs: 8 procfs
stress-ng: info:  [1] cache allocate: using built-in defaults as unable to determine cache details
[P161036:T35:stress-ng] assert failed ../LibOS/shim/src/fs/shim_fs_pseudo.c:74 parent_node->type == PSEUDO_DIR
[P161034:shim] error: Child process (vmid: 0x2750c) got disconnected
[P161080:T257:stress-ng] assert failed ../LibOS/shim/src/fs/shim_fs_pseudo.c:74 parent_node->type == PSEUDO_DIR
[P161034:shim] error: Child process (vmid: 0x27538) got disconnected
stress-ng: info:  [1] unsuccessful run completed in 77.04s (1 min, 17.04 secs)

mknod:

rasp@rasp-WHITLEY:~/Anjali/stress-ng/example-jobs$ graphene-direct stress-ng --mknod 8 --timeout 40s
error: Using insecure argv source. Graphene will continue application execution, but this configuration must not be used in production!
error: Forwarding host environment variables to the app is enabled. Graphene will continue application execution, but this configuration must not be used in production!
stress-ng: info:  [1] dispatching hogs: 8 mknod
stress-ng: info:  [1] cache allocate: using built-in defaults as unable to determine cache details
stress-ng: fail:  [2] stress-ng-mknod: mknod failed, errno=22 (Invalid argument)
stress-ng: fail:  [2] stress-ng-mknod: mknod failed, errno=22 (Invalid argument)
stress-ng: fail:  [2] stress-ng-mknod: mknod failed, errno=22 (Invalid argument)
stress-ng: fail:  [2] stress-ng-mknod: mknod failed, errno=22 (Invalid argument)
stress-ng: fail:  [2] stress-ng-mknod: mknod failed, errno=22 (Invalid argument)
info: 5 failures reached, aborting stress process
stress-ng: fail:  [3] stress-ng-mknod: mknod failed, errno=22 (Invalid argument)
stress-ng: fail:  [3] stress-ng-mknod: mknod failed, errno=22 (Invalid argument)
stress-ng: fail:  [3] stress-ng-mknod: mknod failed, errno=22 (Invalid argument)
stress-ng: fail:  [3] stress-ng-mknod: mknod failed, errno=22 (Invalid argument)
stress-ng: fail:  [3] stress-ng-mknod: mknod failed, errno=22 (Invalid argument)
info: 5 failures reached, aborting stress process
stress-ng: fail:  [4] stress-ng-mknod: mknod failed, errno=22 (Invalid argument)
stress-ng: fail:  [4] stress-ng-mknod: mknod failed, errno=22 (Invalid argument)
stress-ng: fail:  [4] stress-ng-mknod: mknod failed, errno=22 (Invalid argument)
stress-ng: fail:  [4] stress-ng-mknod: mknod failed, errno=22 (Invalid argument)
stress-ng: fail:  [4] stress-ng-mknod: mknod failed, errno=22 (Invalid argument)
info: 5 failures reached, aborting stress process
stress-ng: fail:  [5] stress-ng-mknod: mknod failed, errno=22 (Invalid argument)
stress-ng: fail:  [5] stress-ng-mknod: mknod failed, errno=22 (Invalid argument)
stress-ng: fail:  [5] stress-ng-mknod: mknod failed, errno=22 (Invalid argument)
stress-ng: fail:  [5] stress-ng-mknod: mknod failed, errno=22 (Invalid argument)
stress-ng: fail:  [5] stress-ng-mknod: mknod failed, errno=22 (Invalid argument)
info: 5 failures reached, aborting stress process
stress-ng: fail:  [6] stress-ng-mknod: mknod failed, errno=22 (Invalid argument)
stress-ng: fail:  [6] stress-ng-mknod: mknod failed, errno=22 (Invalid argument)
[P161352:T6:stress-ng] error: Internal memory fault at 0x400000053 (IP = +0x1aae8, VMID = 161352, TID = 6)
[P161339:shim] error: Child process (vmid: 0x27648) got disconnected
stress-ng: fail:  [7] stress-ng-mknod: mknod failed, errno=22 (Invalid argument)
stress-ng: fail:  [7] stress-ng-mknod: mknod failed, errno=22 (Invalid argument)
stress-ng: fail:  [7] stress-ng-mknod: mknod failed, errno=22 (Invalid argument)
stress-ng: fail:  [7] stress-ng-mknod: mknod failed, errno=22 (Invalid argument)
[P161355:T7:stress-ng] error: Internal memory fault at 0x400000053 (IP = +0x1aae8, VMID = 161355, TID = 7)
[P161339:shim] error: Child process (vmid: 0x2764b) got disconnected
stress-ng: fail:  [8] stress-ng-mknod: mknod failed, errno=22 (Invalid argument)
stress-ng: fail:  [8] stress-ng-mknod: mknod failed, errno=22 (Invalid argument)
[P161358:T8:stress-ng] error: Internal memory fault at 0x400000053 (IP = +0x1aae8, VMID = 161358, TID = 8)
[P161339:shim] error: Child process (vmid: 0x2764e) got disconnected
stress-ng: fail:  [9] stress-ng-mknod: mknod failed, errno=22 (Invalid argument)
stress-ng: fail:  [9] stress-ng-mknod: mknod failed, errno=22 (Invalid argument)
[P161361:T9:stress-ng] error: Internal memory fault at 0x400000053 (IP = +0x1aae8, VMID = 161361, TID = 9)
[P161339:shim] error: Child process (vmid: 0x27651) got disconnected
stress-ng: info:  [1] unsuccessful run completed in 0.66s

@dimakuv

dimakuv commented Jun 28, 2021

Interesting, thanks for testing! @pwmarcz You'll want to take a look at this.

@pwmarcz
Contributor

pwmarcz commented Jul 2, 2021

I wasn't able to reproduce the procfs issue, but I fixed a crash in shim_do_poll. @anjalirx-intel Does PR #2498 fix the procfs crash for you?

I was able to reproduce the mknod one and I'm working on a fix.

@jinengandhi-intel
Contributor Author

@pwmarcz @mkow I don't see the Internal memory fault with the mknod test, but I see it 3/5 times with the procfs test.

Logs below:

intel@intel-Ice-Lake-Client-Platform:~/graphene/Examples/stress-ng$ graphene-direct stress-ng --procfs 8 --timeout 40s
error: Using insecure argv source. Graphene will continue application execution, but this configuration must not be used in production!
error: Forwarding host environment variables to the app is enabled. Graphene will continue application execution, but this configuration must not be used in production!
stress-ng: info:  [1] dispatching hogs: 8 procfs
stress-ng: info:  [1] cache allocate: using built-in defaults as unable to determine cache details
[P8417:T35:stress-ng] error: Internal memory fault at 0x00000000 (IP = +0x4bca3, VMID = 8417, TID = 35)
[P8415:shim] error: Child process (vmid: 0x20e1) got disconnected
[P8426:T99:stress-ng] error: Internal memory fault at 0x00000000 (IP = +0x4bca3, VMID = 8426, TID = 99)
[P8415:shim] error: Child process (vmid: 0x20ea) got disconnected
[P8419:T65:stress-ng] error: Internal memory fault at 0x00000000 (IP = +0x4bca3, VMID = 8419, TID = 65)
[P8415:shim] error: Child process (vmid: 0x20e3) got disconnected
[P8433:T132:stress-ng] error: Internal memory fault at 0x00000000 (IP = +0x4bca3, VMID = 8433, TID = 132)
[P8415:shim] error: Child process (vmid: 0x20f1) got disconnected
[P8454:T228:stress-ng] error: Internal memory fault at 0x00000000 (IP = +0x4bca3, VMID = 8454, TID = 228)
[P8415:shim] error: Child process (vmid: 0x2106) got disconnected
[P8440:T164:stress-ng] error: Internal memory fault at 0x00000000 (IP = +0x4bca3, VMID = 8440, TID = 164)
[P8415:shim] error: Child process (vmid: 0x20f8) got disconnected
[P8461:T257:stress-ng] error: Internal memory fault at 0x00000000 (IP = +0x4bca3, VMID = 8461, TID = 257)
[P8415:shim] error: Child process (vmid: 0x210d) got disconnected
stress-ng: info:  [1] unsuccessful run completed in 40.09s

@pwmarcz
Contributor

pwmarcz commented Jul 6, 2021

Initially I couldn't reproduce the procfs crash, but I found out that it appears only in Ubuntu 18.04 (stress-ng 0.09.25), not Ubuntu 20.04 (stress-ng 0.11.07).

The exact line in stress-ng causing the crash is here: stress-procfs.c:129

It looks like we're calling open() on a path that changes in the meantime. I think that actually stress-ng is doing something bad here. The path is read from a global pointer (static char *proc_path), which is protected by a lock, but the content of the string (which is in a local array) is modified without a lock.

Some later commits (ColinIanKing/stress-ng@b6c62a3, ColinIanKing/stress-ng@5a598e5) replace this global pointer with a global array, and it looks like stress-ng 0.11.07 uses locks correctly.

In conclusion, it looks like a bug in stress-ng, as the path changes while Graphene is processing it. It's not good that Graphene crashes on it, though: I guess we could mitigate the impact of such bugs by making sure that we read user data only once, or at least by copying the filename at the beginning of the path lookup function?

CC @dimakuv, I remember you investigating exactly this issue before. What do you think?

@dimakuv

dimakuv commented Jul 6, 2021

@pwmarcz I did only superficial analysis of this, but here's my dump:


The assert fires here:

~/graphene/Examples/stress-ng$ graphene-direct stress-ng --procfs 2 --timeout 5s
[P21242:T34:stress-ng] ----------------- path = /proc/self/maps
[P21242:T34:stress-ng] ----------------- name = /maps
[P21242:T34:stress-ng] assert failed ../LibOS/shim/src/fs/shim_namei.c:211 *name != '/'

I am looking at the code of shim_namei.c: do_path_lookupat() and can't understand how the name can end up starting with /.
There seems to be a data race. I have a feeling that somewhere in the pseudo/proc FS code, we have a temporary substitution like *name = '\0'. It's some kind of memory corruption in an unrelated piece of code.


Looking at stress-ng changes, it indeed looks like there was a bug in the locking scheme (path could be changed by another thread while performing open(path) in this thread). So yes, looks like a bug in stress-ng.

I guess we could mitigate the impact of such bugs by making sure that we read user data only once. Or at least copy the filename at the beginning of path lookup function?

I think we should copy the filename at the beginning of each syscall-entry function that uses the open_namei()/path_lookup() family of functions. This is also what Linux does -- it copies the filename into kernel space and then operates on this copy.
I'm talking about adding strdup() to shim_do_open(), shim_do_mkdir(), shim_do_chmod(), etc. -- but if there is a common function where we can add this copying, then yes, we can add it only once, in this common function.

Though I would highly prefer the explicit copying in each syscall-entry func.

@boryspoplawski
Contributor

boryspoplawski commented Jul 6, 2021

Though I would highly prefer the explicit copying in each syscall-entry func.

I guess we do not do that currently for performance reasons, as we do not really have to (if the app is not buggy). Linux has to do that for security reasons (but we do not consider user app vs. Graphene to be a security boundary).

@dimakuv

dimakuv commented Jul 6, 2021

@boryspoplawski True, but sometimes we have these kinds of buggy apps. Plus, the performance drop should be negligible. So it sounds like a reasonable thing to do.

@mkow
Member

mkow commented Jul 7, 2021

I agree with Borys, Linux has to do this, but that's because its security model is opposite to ours.

@boryspoplawski True, but sometimes we have this kind of buggy apps. Plus, performance drop should be negligible. So sounds like a reasonable thing to do.

We shouldn't try to patch around such bugs in apps. It's just another complexity for which I don't see a good enough justification.

@dimakuv

dimakuv commented Jul 7, 2021

Ok, I agree that it doesn't make sense to "hide" bugs in the user application. On the other hand, I don't think it has any detrimental effect on performance or complexity.

But I agree, if there is no good justification to add something to Graphene, then we shouldn't add it. And here we don't have a good justification.

So I guess we should just recommend using stress-ng version 0.11.07 or newer.

@pwmarcz
Contributor

pwmarcz commented Jul 7, 2021

I think there is a justification: the bug causes Graphene to behave in internally inconsistent ways (broken assertions, etc.), this is confusing to us as Graphene developers, and we waste time trying to find a bug in Graphene[1]. That could be avoided if we copied the path and could depend on it not changing afterwards.

We already usually check if user memory is readable, and return -EFAULT otherwise. We could have a function that does that for paths (or null-terminated strings), and creates Graphene's copy of the string (strdup-like) at the same time.

[1] @dimakuv suspected that "somewhere in pseudo/proc FS code, we have temporary substitute like this *name = '\0'".

@dimakuv

dimakuv commented Jul 7, 2021

This also leads to an interesting question: if an application (perhaps inadvertently) relies on a side-effect of the Linux implementation, do we emulate this side-effect in Graphene? Or do we declare this application buggy/not conformant?

@boryspoplawski
Contributor

This also leads to an interesting question: if an application (perhaps inadvertently) relies on a side-effect of the Linux implementation, do we emulate this side-effect in Graphene? Or do we declare this application buggy/not conformant?

I guess that depends on the complexity and impact of such quirks. Currently we do support some, and I would say we should, as we emulate Linux (Graphene is not just some UNIX system); but if the implementation would be hard and the particular quirk weird, I would say we could ignore it.

As for adding a path copying to open-like syscalls: I do not directly oppose it (because at some level I agree with @pwmarcz that this could save us - Graphene devs - some debugging), but I would definitely not make this a rule (trying to circumvent app bugs inside Graphene).

@mkow
Member

mkow commented Jul 7, 2021

Actually, shouldn't this particular bug also fail sporadically on Linux? And in general, this whole class of bugs: if the path changes in the meantime, then it can also start changing during the kernel's copy, and the kernel would get a partially updated version?

@pwmarcz
Contributor

pwmarcz commented Jul 7, 2021

One of the commit messages mentions unexplained errors on some systems, so maybe it did fail on Linux. But the kernel's copy takes much less time, so it would rarely fail anyway. Graphene's lookup takes locks and performs I/O while traversing the path, so there is more time for the string to change.

If we add the proposed fix to Graphene and run the old version of stress-ng, I would expect Graphene open() to sometimes return ENOENT on invalid/garbage paths, and maybe EFAULT if there is no null terminator and we attempt to copy too much (provided we keep checking if the memory is readable).

@mkow
Member

mkow commented Jul 11, 2021

So, at least to me, it seems that:

  • Such kind of races should also fail on Linux, just more rarely.
  • Implementing the copying is syscall-specific and will take a non-negligible amount of work.
  • But sometimes may save us some debugging of crashes.

So, overall I'm rather opposed to implementing this, mostly because of the first point.

@dimakuv

dimakuv commented Jul 12, 2021

Ok, looks like the consensus is that bugs in the app are allowed to crash/trigger asserts in Graphene. Then the solution for this issue is to recommend a newer stress-ng version, and there is nothing to do in Graphene. Should we close this issue then, @jinengandhi-intel?

@jinengandhi-intel
Contributor Author

Sure, we will upgrade the systems and try the newer version in the coming sprint and raise a separate issue if we still see similar errors. For now, closing the issue.
