Skip to content
This repository has been archived by the owner on Jan 20, 2022. It is now read-only.

IPC helper cleanup and ensuing killXX bug fixes #811

Merged
merged 15 commits into from
Jul 19, 2019

Conversation

dimakuv
Copy link

@dimakuv dimakuv commented Jul 8, 2019

Affected components

  • README and global configuration
  • Linux PAL
  • SGX PAL
  • FreeBSD PAL
  • Common PAL code
  • Library OS (i.e., SHIM), including GLIBC

Description of the changes

This PR contains commits which serve to cleanup the IPC (Inter-Process Communication) subsystem of Graphene which contained several intermittent bugs (data races between Graphene processes). Please refer to commit messages.

This PR also updates the https://github.com/oscarlab/graphene-tests submodule, see the corresponding PR oscarlab/graphene-tests#3.

Also, for some preliminary info on the bugs fixed by this PR, see #803. This issue also contains the LTP tests fixed by this PR.

Fixes #803.

How to test this PR?

All LTP tests must pass, including newly enabled killXX and waitpid05.


This change is Reviewable

@dimakuv dimakuv force-pushed the dimakuv/ipc-helper-cleanup branch from d635c08 to 8133db4 Compare July 9, 2019 03:43
Copy link
Contributor

@yamahata yamahata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very nice clean up!

Reviewed 16 of 18 files at r1, 2 of 2 files at r2.
Reviewable status: all files reviewed, 10 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @dimakuv)


LibOS/shim/src/shim_async.c, line 278 at r2 (raw file):

        return 0;

    enable_locking();

I guess this is for code resembrance to ipc helper.
If so, enable_locking() should be called somewhere. probaby in init_async() or intall_async_event()


LibOS/shim/src/bookkeep/shim_signal.c, line 757 at r2 (raw file):

}

#define __WCOREDUMP_BIT 0x80

this definition should go to header file. shim_types.h


LibOS/shim/src/ipc/shim_ipc.c, line 451 at r2 (raw file):

    if (!off) {
        off = ADD_CP_OFFSET(sizeof(struct shim_ipc_info));
        ADD_TO_CP_MAP(obj, off);

Is this bug fix?


LibOS/shim/src/ipc/shim_ipc_child.c, line 161 at r2 (raw file):

mesage

typo: message.


LibOS/shim/src/ipc/shim_ipc_helper.c, line 330 at r2 (raw file):

    unlock(&ipc_helper_lock);

    for (int i = 0; i < MAX_IPC_PORT_FINI_CB; i++)

Here is it guaranteed refcount == 1 or can del_ipc_port_fini() be called multiple times?
Original code indicates the latter. But the new code indicates the former.
At least the contract is that fini function is called only once.


LibOS/shim/src/ipc/shim_ipc_helper.c, line 381 at r2 (raw file):

32

Please introduce symbol instead of raw immediate.


LibOS/shim/src/ipc/shim_ipc_helper.c, line 398 at r2 (raw file):

        int cnt = 0;
        struct shim_ipc_port** target_ports_heap =
            malloc(sizeof(struct shim_ipc_port *) * target_ports_cnt);

do we want to introduce realloc() which is currently commented out in sim_malloc.c?


LibOS/shim/src/ipc/shim_ipc_helper.c, line 410 at r2 (raw file):

Quoted 4 lines of code…
    unlock(&ipc_helper_lock);

    for (int i = 0; i < target_ports_cnt; i++)
        get_ipc_port(target_ports[i]);

unlock() should be after incrementing refcount to avoid dereferening by other thread.


LibOS/shim/src/ipc/shim_ipc_helper.c, line 495 at r2 (raw file):

                memcpy(tmp_buf, msg, bytes);
                free(msg);
                msg = tmp_buf;

realloc would help.


LibOS/shim/src/ipc/shim_ipc_helper.c, line 746 at r2 (raw file):

    if (notme || !stack) {
        put_thread(self);

free stack if (stack).


LibOS/shim/src/ipc/shim_ipc_helper.c, line 777 at r2 (raw file):

        ipc_helper_state = HELPER_NOTALIVE;
        put_thread(new);
        return -PAL_ERRNO;

This is (potentially) dangerous because put_thread may overwrite errno. (maybe it can be debug message).

@dimakuv
Copy link
Author

dimakuv commented Jul 12, 2019

Note to myself: I forgot to change shim_do_vfork() implementation. In particular, gcc regression fails.

@dimakuv
Copy link
Author

dimakuv commented Jul 13, 2019

Retest Jenkins-SGX please (Large File Support LibOS test timed out, that's weird...)

Update: Apparently, we have so many LibOS tests now that 5 minutes is not enough to run them all under Linux-SGX. On my local machine, LibOS tests take almost 5 minutes to complete. I added a new commit increasing the timeouts, let's see now.

@dimakuv
Copy link
Author

dimakuv commented Jul 13, 2019

Retest this please (run 2 out of 10, hoping to see no failures except for clone03)

@dimakuv
Copy link
Author

dimakuv commented Jul 13, 2019

Retest this please (run 3 out of 10, hoping to see no failures except for clone03)

1 similar comment
@dimakuv
Copy link
Author

dimakuv commented Jul 13, 2019

Retest this please (run 3 out of 10, hoping to see no failures except for clone03)

@dimakuv
Copy link
Author

dimakuv commented Jul 13, 2019

Retest this please (run 4 out of 10, hoping to see no failures except for clone03) (clone03 and clone06 failed but not others)

@dimakuv
Copy link
Author

dimakuv commented Jul 13, 2019

Retest this please (run 5 out of 10, hoping to see no failures except for clone03)

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Impressive clean-up, thanks!

Reviewed 16 of 18 files at r1, 2 of 2 files at r2, 22 of 22 files at r3, 7 of 7 files at r4, 2 of 2 files at r5.
Reviewable status: all files reviewed, 96 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @dimakuv and @yamahata)

a discussion (no related file):
Please fix poortly typo in the commit messages when doing the final rebase.


a discussion (no related file):
From one of the commit messages:

We take a shortcut for now and emulate vfork() via fork(): this is allowed by POSIX and almost all Linux apps correctly use vfork().

I'm not sure how to understand the end of this sentence. Maybe you meant currently, not correctly?



LibOS/shim/include/shim_ipc.h, line 457 at r2 (raw file):

enum {
    LISTEN=0,   /* listening */

This is not needed, 0 is the default for the first element of an enum.


LibOS/shim/include/shim_ipc.h, line 501 at r2 (raw file):

static_always_inline size_t get_ipc_msg_duplex_size(size_t payload) {
    return get_ipc_msg_size(payload) +
        (sizeof(struct shim_ipc_msg_duplex) - sizeof(struct shim_ipc_msg));

I'd prefer to have it implemented here the same way as get_ipc_msg_size. Now, if sizeof(struct shim_ipc_msg_duplex) would be smaller than sizeof(struct shim_ipc_msg) we could return something smaller than IPC_MSG_MINIMAL_SIZE.


LibOS/shim/src/shim_init.c, line 844 at r3 (raw file):

        return -convert_pal_errno(-ret);
    debug("creating pipe: pipe.srv:%u\n", pipeid);
    if ((len = snprintf(uri, size, "pipe.srv:%u", pipeid)) == size)

should be >= size, not == size


LibOS/shim/src/shim_init.c, line 846 at r3 (raw file):

    if ((len = snprintf(uri, size, "pipe.srv:%u", pipeid)) == size)
        return -ERANGE;
    *((IDTYPE *) id) = pipeid;

no space after cast


LibOS/shim/src/bookkeep/shim_signal.c, line 757 at r2 (raw file):

Previously, yamahata wrote…

this definition should go to header file. shim_types.h

Do we want to export it if it's only used here, internally?


LibOS/shim/src/bookkeep/shim_signal.c, line 803 at r2 (raw file):

        /* SIGALRM */   &sighandler_kill,
        /* SIGTERM */   &sighandler_kill,
        /* SIGSTKFLT */ NULL,

Shouldn't this be &sighandler_kill?


LibOS/shim/src/bookkeep/shim_signal.c, line 819 at r2 (raw file):

        /* SIGPWR  */   &sighandler_kill,
        /* SIGSYS  */   &sighandler_core,
        /* SIGRTMIN  */ NULL,

Why do we include SIGRTMIN in the list?


LibOS/shim/src/ipc/shim_ipc.c, line 20 at r2 (raw file):

 * shim_ipc.c
 *
 * This file contains codes to maintain generic bookkeeping of IPC: operations

codes -> code


LibOS/shim/src/ipc/shim_ipc.c, line 48 at r2 (raw file):

struct shim_process cur_process;

#define CLIENT_HASH_LEN     6

I think "BITLEN" would be better here, we usually use "LEN" for byte-size.


LibOS/shim/src/ipc/shim_ipc.c, line 152 at r2 (raw file):

    struct shim_ipc_info* info;
    size_t len = strlen(uri);

Why does this function calculate length from strlen, but the previous one requires passing it in an argument?


LibOS/shim/src/ipc/shim_ipc.c, line 156 at r2 (raw file):

    lock(&ipc_info_lock);

    /* check if info with this vmid and uri already exists and return it */

exists -> exist


LibOS/shim/src/ipc/shim_ipc.c, line 305 at r2 (raw file):

    assert(thread);

    /* prepare thread which sends the message for waiting for response

either sends messages or will send the message


LibOS/shim/src/ipc/shim_ipc.c, line 326 at r2 (raw file):

        *seq = msg->msg.seq;

    debug("Start waiting for response (seq = %lu)\n", msg->msg.seq);

--> Waiting for [...]


LibOS/shim/src/ipc/shim_ipc.c, line 328 at r2 (raw file):

    debug("Start waiting for response (seq = %lu)\n", msg->msg.seq);

    /* force thread which sends the message to wait for response;

ditto (is sending)


LibOS/shim/src/ipc/shim_ipc.c, line 406 at r2 (raw file):

    init_ipc_msg(msg, IPC_CHECKPOINT, total_msg_size, 0);

    struct shim_ipc_checkpoint* msgin = (struct shim_ipc_checkpoint *) &msg->msg;

no space after cast


LibOS/shim/src/ipc/shim_ipc.c, line 413 at r2 (raw file):

    /* broadcast to all including myself (so I can also checkpoint) */
    ret = broadcast_ipc(msg, IPC_PORT_DIRCLD|IPC_PORT_DIRPRT, /*exclude_port*/ NULL);

/*exclude_port=*/NULL
This way we'll be able to verify correctness of this name when we introduce clang-tidy and use bugprone-argument-comment check.


LibOS/shim/src/ipc/shim_ipc_child.c, line 46 at r2 (raw file):

    struct thread_info* info = (struct thread_info *) arg;
    int found_exiting_thread = 0;

Please use bool.


LibOS/shim/src/ipc/shim_ipc_child.c, line 70 at r2 (raw file):

    struct thread_info* info = (struct thread_info *) arg;
    int found_exiting_thread = 0;

ditto (bool)


LibOS/shim/src/ipc/shim_ipc_child.c, line 125 at r2 (raw file):

    debug("Child process %u got disconnected: assume that child exited and "
          "force %d of its threads to exit\n", vmid & 0xFFFF, exited_threads_cnt);

assume -> assuming
force -> forcing


LibOS/shim/src/ipc/shim_ipc_child.c, line 156 at r2 (raw file):

          ppid, tid, exitcode, term_signal);

    int ret = broadcast_ipc(msg, IPC_PORT_DIRPRT|IPC_PORT_DIRCLD, /*exclude_port*/ NULL);

ditto


LibOS/shim/src/ipc/shim_ipc_child.c, line 192 at r2 (raw file):

    assert(msg->src != cur_process.vmid);

    /* First try to find remote thread who sent this message among normal

who -> which


LibOS/shim/src/ipc/shim_ipc_child.c, line 209 at r2 (raw file):

        /* Remote thread is "virtually" exited: SIGCHLD is generated for the
         * parent thread and exit events are arranged for subsequent wait4(). */
        ret = thread_exit(thread, /*send_ipc*/ false);

ditto


LibOS/shim/src/ipc/shim_ipc_child.c, line 247 at r2 (raw file):

int ipc_cld_profile_send(void) {
    struct shim_ipc_port* port = NULL;
    IDTYPE dest = (IDTYPE) -1;

no space after cast


LibOS/shim/src/ipc/shim_ipc_child.c, line 257 at r2 (raw file):

    unlock(&cur_process.lock);

    if (!port || (dest == (IDTYPE) -1))

ditto


LibOS/shim/src/ipc/shim_ipc_child.c, line 326 at r2 (raw file):

    debug("IPC callback from %u: IPC_CLD_PROFILE\n", msg->src & 0xFFFF);

    struct shim_ipc_cld_profile* msgin = (struct shim_ipc_cld_profile *) &msg->msg;

no space after cast


LibOS/shim/src/ipc/shim_ipc_helper.c, line 47 at r2 (raw file):

static LISTP_TYPE(shim_ipc_port) port_list;

/* can be read without ipc_helper_lock but always written with lock held */

That sounds like a race condition? Why would you lock only writes?


LibOS/shim/src/ipc/shim_ipc_helper.c, line 59 at r2 (raw file):

static ipc_callback ipc_callbacks[IPC_CODE_NUM] = {
    /* RESP             */  &ipc_resp_callback,

You could use [index] = val, syntax here.


LibOS/shim/src/ipc/shim_ipc_helper.c, line 222 at r2 (raw file):

                port->fini[i] = fini;
                break;
            }

Shouldn't we fail if we didn't find any free entry? If this shouldn't happen, then at least an assert would be a nice-to-have.


LibOS/shim/src/ipc/shim_ipc_helper.c, line 252 at r2 (raw file):

        if (msg->thread) {
            debug("Deleted pending message on port %p, wake up blocking thread %d\n",
                    port, msg->thread->tid);

Please fix wrapping here (there are 2 redundant spaces here)


LibOS/shim/src/ipc/shim_ipc_helper.c, line 373 at r2 (raw file):

    struct shim_ipc_port* port;
    struct shim_ipc_port** target_ports;
    int target_ports_cnt = 0;

size_t


LibOS/shim/src/ipc/shim_ipc_helper.c, line 396 at r2 (raw file):

        /* Rare case when there are more than 32 ports. Allocate big-enough
         * array on the heap and collect all ports again. */
        int cnt = 0;

size_t


LibOS/shim/src/ipc/shim_ipc_helper.c, line 398 at r2 (raw file):

Previously, yamahata wrote…

do we want to introduce realloc() which is currently commented out in sim_malloc.c?

It's commented out because it didn't work with migrated memory. I don't know how much work it's required to fix this.


LibOS/shim/src/ipc/shim_ipc_helper.c, line 412 at r2 (raw file):

    unlock(&ipc_helper_lock);

    for (int i = 0; i < target_ports_cnt; i++)

size_t


LibOS/shim/src/ipc/shim_ipc_helper.c, line 416 at r2 (raw file):

    /* send msg to each collected port (note that ports cannot be freed in meantime) */
    for (int i = 0; i < target_ports_cnt; i++) {

size_t


LibOS/shim/src/ipc/shim_ipc_helper.c, line 426 at r2 (raw file):

        if (ret < 0) {
            debug("Broadcast to port %p (handle %p) for process %u failed (errno = %d)!\n",
                    port, port->pal_handle, port->vmid & 0xFFFF, ret);

2 spaces less


LibOS/shim/src/ipc/shim_ipc_helper.c, line 433 at r2 (raw file):

    ret = 0;
out:
    for (int i = 0; i < target_ports_cnt; i++)

size_t


LibOS/shim/src/ipc/shim_ipc_helper.c, line 450 at r2 (raw file):

    struct shim_ipc_msg_duplex* req_msg = pop_ipc_msg_duplex(port, msg->seq);

    /* if some thread waits for response, wake it up with response retval */

waits -> is waiting


LibOS/shim/src/ipc/shim_ipc_helper.c, line 461 at r2 (raw file):

}

int send_response_ipc_message(struct shim_ipc_port* port, IDTYPE dest, int ret, uint64_t seq) {

Wasn't seq an unsigned long in other places?


LibOS/shim/src/ipc/shim_ipc_helper.c, line 470 at r2 (raw file):

    resp_msg->seq = seq;

    struct shim_ipc_resp* resp = (struct shim_ipc_resp *) resp_msg->msg;

no space after cast


LibOS/shim/src/ipc/shim_ipc_helper.c, line 484 at r2 (raw file):

    struct shim_ipc_msg* msg = malloc(bufsize);
    int expected_size = IPC_MSG_MINIMAL_SIZE;
    int bytes = 0;

Could you make all memory size variables here size_t?


LibOS/shim/src/ipc/shim_ipc_helper.c, line 498 at r2 (raw file):

            }

            int read = DkStreamRead(port->pal_handle, /*offset*/ 0, expected_size - bytes + readahead,

size_t


LibOS/shim/src/ipc/shim_ipc_helper.c, line 498 at r2 (raw file):

            }

            int read = DkStreamRead(port->pal_handle, /*offset*/ 0, expected_size - bytes + readahead,

ditto (named arg formatting)


LibOS/shim/src/ipc/shim_ipc_helper.c, line 525 at r2 (raw file):

            if (msg->code < IPC_CODE_NUM && ipc_callbacks[msg->code]) {
                /* invoke callback to this msg */
                ret = (*ipc_callbacks[msg->code]) (msg, port);

no space between function and parens


LibOS/shim/src/ipc/shim_ipc_helper.c, line 531 at r2 (raw file):

                    if (ret < 0) {
                        debug("Sending IPC_RESP msg on port %p (handle %p) to %u failed\n",
                                port, port->pal_handle, msg->src & 0xFFFF);

ditto (wrapping)


LibOS/shim/src/ipc/shim_ipc_helper.c, line 544 at r2 (raw file):

            /* we may have started reading the next message, move this message
             * to beginning of msg buffer and reset expected size */
            memmove(msg, (void *) msg + expected_size, bytes);

no space after cast


LibOS/shim/src/ipc/shim_ipc_helper.c, line 589 at r2 (raw file):

     * - object_list collects IPC port objects and is the main handled list
     * - palhandle_list collects corresponding PAL handles of IPC port objects
     *   and is needed for DkObjectsWaitAny(.., <arrya-of-PAL-handles>, ..)

arrya -> array


LibOS/shim/src/ipc/shim_ipc_helper.c, line 649 at r2 (raw file):

                    if (attr.disconnected) {
                        debug("Port %p (handle %p) disconnected\n",
                                polled_port, polled_port->pal_handle);

ditto (wrapping)


LibOS/shim/src/ipc/shim_ipc_helper.c, line 654 at r2 (raw file):

                } else {
                    debug("Port %p (handle %p) was removed during attr querying\n",
                            polled_port, polled_port->pal_handle);

ditto


LibOS/shim/src/ipc/shim_ipc_helper.c, line 662 at r2 (raw file):

        /* done handling ports; put their references so they can be freed */
        for (size_t i = 0; i < object_list_size; i++) {
            struct shim_ipc_port* port = object_list[i];

This variable seems to be unnecessary.


LibOS/shim/src/ipc/shim_ipc_helper.c, line 676 at r2 (raw file):

            __get_ipc_port(port);

            /* grow object_list and palhandle_list to accomodate more objects */

This comment should be inside the if.


LibOS/shim/src/ipc/shim_ipc_helper.c, line 683 at r2 (raw file):

                        sizeof(PAL_HANDLE) * (1 + object_list_maxsize * 2));
                memcpy(tmp_array, object_list,
                        sizeof(struct shim_ipc_port *) * (object_list_maxsize));

object_list_maxsize -> object_list_size (for consistency with the next memcpy)


LibOS/shim/src/ipc/shim_ipc_helper.c, line 712 at r2 (raw file):

    /* IPC thread exits; put acquired port references so they can be freed */
    for (size_t i = 0; i < object_list_size; i++) {
        struct shim_ipc_port* port = object_list[i];

ditto (unnecessary variable)


LibOS/shim/src/ipc/shim_ipc_helper.c, line 98 at r3 (raw file):

    if (!cur_process.self) {
        /* very first process or clone/fork case: create IPC port from scratch */
        cur_process.self = create_ipc_info_cur_process(/*is_self_ipc_info*/ true);

ditto


LibOS/shim/src/ipc/shim_ipc_helper.c, line 110 at r3 (raw file):

                           cur_process.self->pal_handle,
                           IPC_PORT_SERVER,
                           /* fini callback */ NULL,

ditto


LibOS/shim/src/ipc/shim_ipc_helper.c, line 138 at r3 (raw file):

                       cur_process.parent->pal_handle,
                       IPC_PORT_DIRPRT | IPC_PORT_LISTEN,
                       /* fini callback */ NULL,

ditto


LibOS/shim/src/ipc/shim_ipc_helper.c, line 173 at r3 (raw file):

                       cur_process.ns[ns_idx]->pal_handle,
                       type | IPC_PORT_LISTEN,
                       /* fini callback */ NULL,

ditto


LibOS/shim/src/ipc/shim_ipc_pid.c, line 97 at r2 (raw file):

    init_ipc_msg(msg, IPC_PID_KILL, total_msg_size, dest);

    struct shim_ipc_pid_kill* msgin = (struct shim_ipc_pid_kill *) &msg->msg;

no space after cast


LibOS/shim/src/ipc/shim_ipc_pid.c, line 106 at r2 (raw file):

        debug("IPC broadcast: IPC_PID_KILL(%u, %d, %u, %d)\n",
              sender, type, target, signum);
        ret = broadcast_ipc(msg, IPC_PORT_DIRCLD|IPC_PORT_DIRPRT, /*exclude_port*/ NULL);

ditto (named arg formatting)


LibOS/shim/src/ipc/shim_ipc_pid.c, line 108 at r2 (raw file):

        ret = broadcast_ipc(msg, IPC_PORT_DIRCLD|IPC_PORT_DIRPRT, /*exclude_port*/ NULL);
    }
    else {

This should be on the previous line (} else {)


LibOS/shim/src/sys/shim_pipe.c, line 44 at r3 (raw file):

    if ((ret = create_pipe(pipeid, uri, PIPE_URI_SIZE, &hdl0,
                           qstr, /*use_vmid_for_name*/ false)) < 0) {

ditto


LibOS/shim/src/sys/shim_semget.c, line 577 at r2 (raw file):

            resp_msg->seq = sops->client.seq;

            struct shim_ipc_resp* resp = (struct shim_ipc_resp *) resp_msg->msg;

no space after cast


LibOS/shim/src/sys/shim_semget.c, line 800 at r2 (raw file):

            resp_msg->seq = client->seq;

            struct shim_ipc_resp* resp = (struct shim_ipc_resp *) resp_msg->msg;

ditto


LibOS/shim/src/sys/shim_sigaction.c, line 525 at r2 (raw file):

       calling process has permission to send */
    else if (pid == -1) {
        ipc_pid_kill_send(cur->tid, /*target*/ 0, KILL_ALL, sig);

ditto


LibOS/shim/src/sys/shim_sigaction.c, line 211 at r4 (raw file):

            atomic_read(&cur->signal_logs[sig - 1].tail)) {
            /* at least one signal of type sig... */
            if (!__sigismember(mask, sig)) {

Isn't this whole check racy?


LibOS/shim/src/sys/shim_sleep.c, line 44 at r5 (raw file):

    for (int sig = 1; sig <= NUM_SIGS; sig++) {
        if (atomic_read(&cur->signal_logs[sig - 1].head) !=
            atomic_read(&cur->signal_logs[sig - 1].tail)) {

This is racy, isn't this a problem? If not, why?


LibOS/shim/test/apps, line 1 at r2 (raw file):

Subproject commit 064b86df3e18f34f6916377b45c954b1a00c7f13

Please remember to update this after merging oscarlab/graphene-tests#3


LibOS/shim/test/regression/00_bootstrap.py, line 53 at r4 (raw file):

if rv: sys.exit(rv)

# Running Vfork and Exec

vfork and exec


LibOS/shim/test/regression/00_bootstrap.py, line 56 at r4 (raw file):

regression = Regression(loader, "vfork_and_exec")

regression.add_check(name="Vfork and exec 2 page child binary",

vfork


LibOS/shim/test/regression/fork_and_exec.c, line 6 at r3 (raw file):

#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

Please sort includes


LibOS/shim/test/regression/fork_and_exec.c, line 13 at r3 (raw file):

    /* duplicate STDOUT into newfd and pass it as exec_victim argument
     * (it will be inherited by exec_victim) */
    int newfd = dup(1);

return value should be checked


LibOS/shim/test/regression/fork_and_exec.c, line 14 at r3 (raw file):

     * (it will be inherited by exec_victim) */
    int newfd = dup(1);
    char fd_argv[4];

4 -> 12


LibOS/shim/test/regression/fork_and_exec.c, line 19 at r3 (raw file):

    /* set environment variable to test that it is inherited by exec_victim */
    setenv("IN_EXECVE", "1", 1);

ditto


LibOS/shim/test/regression/vfork_and_exec.c, line 6 at r5 (raw file):

#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

Please sort includes


LibOS/shim/test/regression/vfork_and_exec.c, line 13 at r5 (raw file):

    /* duplicate STDOUT into newfd and pass it as exec_victim argument
     * (it will be inherited by exec_victim) */
    int newfd = dup(1);

please check return value


LibOS/shim/test/regression/vfork_and_exec.c, line 14 at r5 (raw file):

     * (it will be inherited by exec_victim) */
    int newfd = dup(1);
    char fd_argv[4];

4 -> 12 (just to be on the safe side, as the number of open descriptors at the time of dup() is externally controlled)


LibOS/shim/test/regression/vfork_and_exec.c, line 19 at r5 (raw file):

    /* set environment variable to test that it is inherited by exec_victim */
    setenv("IN_EXECVE", "1", 1);

ditto (retval)


Pal/src/host/Linux-SGX/db_main.c, line 305 at r3 (raw file):

    if (rv) {
        SGX_DBG(DBG_E, "Failed to initialize enclave properties: %d\n", rv);
        ocall_exit(rv, /*is_exitgroup*/ true);

ditto


Pal/src/host/Linux-SGX/db_main.c, line 326 at r3 (raw file):

        if ((rv = init_child_process(&parent)) < 0) {
            SGX_DBG(DBG_E, "Failed to initialize child process: %d\n", rv);
            ocall_exit(rv, /*is_exitgroup*/ true);

ditto


Pal/src/host/Linux-SGX/db_main.c, line 369 at r3 (raw file):

    if ((rv = read_config(root_config, loader_filter, &errstring)) < 0) {
        SGX_DBG(DBG_E, "Can't read manifest: %s, error code %d\n", errstring, rv);
        ocall_exit(rv, /*is_exitgroup*/ true);

ditto


Pal/src/host/Linux-SGX/db_process.c, line 330 at r3 (raw file):

    if (exitcode)
        SGX_DBG(DBG_I, "DkProcessExit: Returning exit code %d\n", exitcode);
    ocall_exit(exitcode, /*is_exitgroup*/ true);

ditto


Pal/src/host/Linux-SGX/db_threading.c, line 143 at r3 (raw file):

noreturn void _DkThreadExit (void)
{
    ocall_exit(0, /*is_exitgroup*/ false);

ditto


Pal/src/host/Linux-SGX/enclave_framework.c, line 1240 at r3 (raw file):

    if (((uint64_t) ctx) != ctx->rsp - (sizeof(sgx_context_t) + RED_ZONE_SIZE)) {
        SGX_DBG(DBG_E, "Invalid sgx_context_t pointer passed to restore_sgx_context!\n");
        ocall_exit(1, /*is_exitgroup*/ false);

ditto


Pal/src/host/Linux-SGX/enclave_ocalls.h, line 11 at r3 (raw file):

#include <linux/poll.h>

noreturn void ocall_exit (int exitcode, int is_exitgroup);

Why not bool?


Pal/src/host/Linux-SGX/enclave_pages.c, line 63 at r3 (raw file):

                __asm__ volatile ("int $3" ::: "memory");
#endif
            ocall_exit(1, /*is_exitgroup*/ true);

ditto


Pal/src/host/Linux-SGX/ocall_types.h, line 63 at r3 (raw file):

typedef struct {
    int ms_exitcode;
    int ms_is_exitgroup;

ditto (bool)

@dimakuv
Copy link
Author

dimakuv commented Jul 13, 2019

Note to myself: Jenkins-SGX failed on Vfork and exec 2 page child binary. Stress-test and fix.

Update: these failures are not related to my changes. They are described in issues #826 and #828 and should be fixed by other PRs. Continuing my tests of killXX.

@dimakuv
Copy link
Author

dimakuv commented Jul 16, 2019

Retest this please (run 6 out of 10, hoping to see no failures except for clone03 and vfork_and_exec)

@dimakuv
Copy link
Author

dimakuv commented Jul 16, 2019

Retest this please (run 7 out of 10, hoping to see no failures except for clone03 and vfork_and_exec)

@dimakuv
Copy link
Author

dimakuv commented Jul 16, 2019

Retest this please (run 8 out of 10, hoping to see no failures except for clone03 and vfork_and_exec) (clone03 failed on Jenkins-Debug)

@dimakuv
Copy link
Author

dimakuv commented Jul 16, 2019

Retest this please (run 9 out of 10, hoping to see no failures except for clone03 and vfork_and_exec)

@dimakuv
Copy link
Author

dimakuv commented Jul 16, 2019

Retest this please (run 10 out of 10, hoping to see no failures except for clone03 and vfork_and_exec)

@dimakuv
Copy link
Author

dimakuv commented Jul 16, 2019

Retest Jenkins-SGX please (vfork_and_exec failed)

Copy link
Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 94 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @dimakuv, @mkow, and @yamahata)

a discussion (no related file):

Previously, mkow (Michał Kowalczyk) wrote…

From one of the commit messages:

We take a shortcut for now and emulate vfork() via fork(): this is allowed by POSIX and almost all Linux apps correctly use vfork().

I'm not sure how to understand the end of this sentence. Maybe you meant currently, not correctly?

I meant "correctly" in the sense that Linux apps I saw (except one test program in LTP) do not abuse vfork(). I will simply remove the last part in the new commit message, smth like: We take a shortcut for now and emulate vfork() via fork(): this is allowed by POSIX.



LibOS/shim/include/shim_ipc.h, line 457 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

This is not needed, 0 is the default for the first element of an enum.

Done.


LibOS/shim/include/shim_ipc.h, line 501 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

I'd prefer to have it implemented here the same way as get_ipc_msg_size. Now, if sizeof(struct shim_ipc_msg_duplex) would be smaller than sizeof(struct shim_ipc_msg) we could return something smaller than IPC_MSG_MINIMAL_SIZE.

Done. But notice that this function's logic is slightly different than get_ipc_msg_size().


LibOS/shim/src/shim_async.c, line 278 at r2 (raw file):

Previously, yamahata wrote…

I guess this is for code resembrance to ipc helper.
If so, enable_locking() should be called somewhere. probaby in init_async() or intall_async_event()

Done.


LibOS/shim/src/shim_init.c, line 844 at r3 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

should be >= size, not == size

Done.


LibOS/shim/src/shim_init.c, line 846 at r3 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

no space after cast

Done.


LibOS/shim/src/bookkeep/shim_signal.c, line 757 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Do we want to export it if it's only used here, internally?

Agreed with @mkow, I would leave it here. There is no reason to have it in the header.


LibOS/shim/src/bookkeep/shim_signal.c, line 803 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Shouldn't this be &sighandler_kill?

Done.


LibOS/shim/src/bookkeep/shim_signal.c, line 819 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Why do we include SIGRTMIN in the list?

Done.


LibOS/shim/src/ipc/shim_ipc.c, line 20 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

codes -> code

Done.


LibOS/shim/src/ipc/shim_ipc.c, line 48 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

I think "BITLEN" would be better here, we usually use "LEN" for byte-size.

Done.


LibOS/shim/src/ipc/shim_ipc.c, line 152 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Why does this function calculate length from strlen, but the previous one requires passing it in an argument?

Done.


LibOS/shim/src/ipc/shim_ipc.c, line 156 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

exists -> exist

No, there is just one info object. But I changed to check if info with this vmid & uri already exists and return it to make it more explicit that there is one info which has two fields: vmid & uri.


LibOS/shim/src/ipc/shim_ipc.c, line 305 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

either sends messages or will send the message

Done.


LibOS/shim/src/ipc/shim_ipc.c, line 326 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

--> Waiting for [...]

Done.


LibOS/shim/src/ipc/shim_ipc.c, line 328 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ditto (is sending)

Done.


LibOS/shim/src/ipc/shim_ipc.c, line 406 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

no space after cast

Done.


LibOS/shim/src/ipc/shim_ipc.c, line 413 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

/*exclude_port=*/NULL
This way we'll be able to verify correctness of this name when we introduce clang-tidy and use bugprone-argument-comment check.

Done.


LibOS/shim/src/ipc/shim_ipc.c, line 451 at r2 (raw file):

Previously, yamahata wrote…

Is this bug fix?

Yes.


LibOS/shim/src/ipc/shim_ipc_child.c, line 46 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Please use bool.

This doesn't make sense without changing the callback-function signature of: int walk_thread_list (int (*callback) (struct shim_thread *, void *, bool *). In this particular case, child_stread_exit() is a callback passed to walk_thread_list() and must have return value of type int. Our local found_exiting_thread is the return value therefore must be int.

We may have a separate PR to fix the signature of the walk_thread_list() callback but not here.


LibOS/shim/src/ipc/shim_ipc_child.c, line 70 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ditto (bool)

ditto


LibOS/shim/src/ipc/shim_ipc_child.c, line 125 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

assume -> assuming
force -> forcing

Done.


LibOS/shim/src/ipc/shim_ipc_child.c, line 156 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ditto

Done.


LibOS/shim/src/ipc/shim_ipc_child.c, line 161 at r2 (raw file):

Previously, yamahata wrote…
mesage

typo: message.

Done.


LibOS/shim/src/ipc/shim_ipc_child.c, line 192 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

who -> which

Done.


LibOS/shim/src/ipc/shim_ipc_child.c, line 209 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ditto

Done.


LibOS/shim/src/ipc/shim_ipc_child.c, line 247 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

no space after cast

Done.


LibOS/shim/src/ipc/shim_ipc_child.c, line 257 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ditto

Done.


LibOS/shim/src/ipc/shim_ipc_child.c, line 326 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

no space after cast

Done.


LibOS/shim/src/ipc/shim_ipc_helper.c, line 47 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

That sounds like a race condition? Why would you lock only writes?

Done.


LibOS/shim/src/ipc/shim_ipc_helper.c, line 59 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

You could use [index] = val, syntax here.

Not really, because there is this annoying IPC_NS_CALLBACKS(pid) macro in the middle of the array initialization. We will hopefully change that macro ugliness one day, but not in this PR.


LibOS/shim/src/ipc/shim_ipc_helper.c, line 222 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Shouldn't we fail if we didn't find any free entry? If this shouldn't happen, then at least an assert would be a nice-to-have.

Done. Added assert; this shouldn't happen in our code.


LibOS/shim/src/ipc/shim_ipc_helper.c, line 252 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Please fix wrapping here (there are 2 redundant spaces here)

Done.


LibOS/shim/src/ipc/shim_ipc_helper.c, line 330 at r2 (raw file):

Previously, yamahata wrote…

Here is it guaranteed refcount == 1 or can del_ipc_port_fini() be called multiple times?
Original code indicates the latter. But the new code indicates the former.
At least the contract is that fini function is called only once.

Done. Indeed I forgot to add port->fini[i] = NULL to make sure the fini function is called only once.

As for the other part of the question, del_ipc_port_fini() can be called multiple times. If this happens, then the check if (LIST_EMPTY(port, list)) { ... return...} prevents from decrementing refcount multiple times. And because we're doing __get_ipc_port(port), we guarantee that refcount >= 1.


LibOS/shim/src/ipc/shim_ipc_helper.c, line 373 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

size_t

Done.


LibOS/shim/src/ipc/shim_ipc_helper.c, line 381 at r2 (raw file):

Previously, yamahata wrote…
32

Please introduce symbol instead of raw immediate.

Done.


LibOS/shim/src/ipc/shim_ipc_helper.c, line 396 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

size_t

Done.


LibOS/shim/src/ipc/shim_ipc_helper.c, line 398 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

It's commented out because it didn't work with migrated memory. I don't know how much work it's required to fix this.

Yes, let's not introduce realloc (and fix all related bugs with checkpointing) in this PR.


LibOS/shim/src/ipc/shim_ipc_helper.c, line 410 at r2 (raw file):

Previously, yamahata wrote…
    unlock(&ipc_helper_lock);

    for (int i = 0; i < target_ports_cnt; i++)
        get_ipc_port(target_ports[i]);

unlock() should be after incrementing refcount to avoid dereferening by other thread.

Done.


LibOS/shim/src/ipc/shim_ipc_helper.c, line 412 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

size_t

Done.


LibOS/shim/src/ipc/shim_ipc_helper.c, line 416 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

size_t

Done.


LibOS/shim/src/ipc/shim_ipc_helper.c, line 426 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

2 spaces less

Done.


LibOS/shim/src/ipc/shim_ipc_helper.c, line 433 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

size_t

Done.


LibOS/shim/src/ipc/shim_ipc_helper.c, line 450 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

waits -> is waiting

Done.


LibOS/shim/src/ipc/shim_ipc_helper.c, line 461 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Wasn't seq an unsigned long in other places?

Done.


LibOS/shim/src/ipc/shim_ipc_helper.c, line 470 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

no space after cast

Done.


LibOS/shim/src/ipc/shim_ipc_helper.c, line 484 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Could you make all memory size variables here size_t?

Done.


LibOS/shim/src/ipc/shim_ipc_helper.c, line 495 at r2 (raw file):

Previously, yamahata wrote…

realloc would help.

As mentioned above, cannot use it.


LibOS/shim/src/ipc/shim_ipc_helper.c, line 498 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

size_t

Done.


LibOS/shim/src/ipc/shim_ipc_helper.c, line 498 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ditto (named arg formatting)

Done.


LibOS/shim/src/ipc/shim_ipc_helper.c, line 525 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

no space between function and parens

Done.


LibOS/shim/src/ipc/shim_ipc_helper.c, line 531 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ditto (wrapping)

Done.


LibOS/shim/src/ipc/shim_ipc_helper.c, line 544 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

no space after cast

Done.


LibOS/shim/src/ipc/shim_ipc_helper.c, line 589 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

arrya -> array

Done.


LibOS/shim/src/ipc/shim_ipc_helper.c, line 649 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ditto (wrapping)

Done.


LibOS/shim/src/ipc/shim_ipc_helper.c, line 654 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ditto

Done.


LibOS/shim/src/ipc/shim_ipc_helper.c, line 662 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

This variable seems to be unnecessary.

Done.


LibOS/shim/src/ipc/shim_ipc_helper.c, line 676 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

This comment should be inside the if.

Done.


LibOS/shim/src/ipc/shim_ipc_helper.c, line 683 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

object_list_maxsize -> object_list_size (for consistency with the next memcpy)

Done.


LibOS/shim/src/ipc/shim_ipc_helper.c, line 712 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ditto (unnecessary variable)

Done.


LibOS/shim/src/ipc/shim_ipc_helper.c, line 746 at r2 (raw file):

Previously, yamahata wrote…

free stack if (stack).

Done.


LibOS/shim/src/ipc/shim_ipc_helper.c, line 777 at r2 (raw file):

Previously, yamahata wrote…

This is (potentially) dangerous because put_thread may overwrite errno. (maybe it can be debug message).

Done.


LibOS/shim/src/ipc/shim_ipc_helper.c, line 98 at r3 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ditto

Done.


LibOS/shim/src/ipc/shim_ipc_helper.c, line 110 at r3 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ditto

Done.


LibOS/shim/src/ipc/shim_ipc_helper.c, line 138 at r3 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ditto

Done.


LibOS/shim/src/ipc/shim_ipc_helper.c, line 173 at r3 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ditto

Done.


LibOS/shim/src/ipc/shim_ipc_pid.c, line 97 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

no space after cast

Done.


LibOS/shim/src/ipc/shim_ipc_pid.c, line 106 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ditto (named arg formatting)

Done.


LibOS/shim/src/ipc/shim_ipc_pid.c, line 108 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

This should be on the previous line (} else {)

Done.


LibOS/shim/src/sys/shim_pipe.c, line 44 at r3 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ditto

Done.


LibOS/shim/src/sys/shim_semget.c, line 577 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

no space after cast

Done.


LibOS/shim/src/sys/shim_semget.c, line 800 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ditto

Done.


LibOS/shim/src/sys/shim_sigaction.c, line 525 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ditto

Done.


LibOS/shim/src/sys/shim_sigaction.c, line 211 at r4 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Isn't this whole check racy?

Done.


LibOS/shim/src/sys/shim_sleep.c, line 44 at r5 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

This is racy, isn't this a problem? If not, why?

Done.


LibOS/shim/test/apps, line 1 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Please remember to update this after merging oscarlab/graphene-tests#3

Done.


LibOS/shim/test/regression/00_bootstrap.py, line 53 at r4 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

vfork and exec

Done.


LibOS/shim/test/regression/00_bootstrap.py, line 56 at r4 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

vfork

Done.


LibOS/shim/test/regression/fork_and_exec.c, line 6 at r3 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Please sort includes

Done.


LibOS/shim/test/regression/fork_and_exec.c, line 13 at r3 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

return value should be checked

Done.


LibOS/shim/test/regression/fork_and_exec.c, line 14 at r3 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

4 -> 12

Done.


LibOS/shim/test/regression/fork_and_exec.c, line 19 at r3 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ditto

Done.


LibOS/shim/test/regression/vfork_and_exec.c, line 6 at r5 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Please sort includes

Done.


LibOS/shim/test/regression/vfork_and_exec.c, line 13 at r5 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

please check return value

Done.


LibOS/shim/test/regression/vfork_and_exec.c, line 14 at r5 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

4 -> 12 (just to be on the safe side, as the number of open descriptors at the time of dup() is externally controlled)

Done.


LibOS/shim/test/regression/vfork_and_exec.c, line 19 at r5 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ditto (retval)

Done.


Pal/src/host/Linux-SGX/db_main.c, line 305 at r3 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ditto

Done.


Pal/src/host/Linux-SGX/db_main.c, line 326 at r3 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ditto

Done.


Pal/src/host/Linux-SGX/db_main.c, line 369 at r3 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ditto

Done.


Pal/src/host/Linux-SGX/db_process.c, line 330 at r3 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ditto

Done.


Pal/src/host/Linux-SGX/db_threading.c, line 143 at r3 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ditto

Done.


Pal/src/host/Linux-SGX/enclave_framework.c, line 1240 at r3 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ditto

Done.


Pal/src/host/Linux-SGX/enclave_ocalls.h, line 11 at r3 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Why not bool?

There is no hard reason, but since there is the ms_is_exitgroup which serves as an interface between enclave and untrusted part, and all such interfaces have only primitive types, I opted for int.


Pal/src/host/Linux-SGX/enclave_pages.c, line 63 at r3 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ditto

Done.


Pal/src/host/Linux-SGX/ocall_types.h, line 63 at r3 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ditto (bool)

It serves as an interface between enclave and untrusted part, and all such interfaces have only primitive types, so I opted for int.

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 22 of 22 files at r6, 1 of 1 files at r7.
Reviewable status: all files reviewed, 11 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @dimakuv, @mkow, and @yamahata)


LibOS/shim/include/shim_ipc.h, line 501 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Done. But notice that this function's logic is slightly different than get_ipc_msg_size().

With this assert you can leave the old (simpler) code. It should be either:

  • the old version + assert, or
  • the current version without the assert, but with that min size checked on base_size + add_size, not base_size

LibOS/shim/src/ipc/shim_ipc.c, line 156 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

No, there is just one info object. But I changed to check if info with this vmid & uri already exists and return it to make it more explicit that there is one info which has two fields: vmid & uri.

Ah, now I see. New version looks good.


LibOS/shim/src/ipc/shim_ipc_child.c, line 46 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

This doesn't make sense without changing the callback-function signature of: int walk_thread_list (int (*callback) (struct shim_thread *, void *, bool *). In this particular case, child_stread_exit() is a callback passed to walk_thread_list() and must have return value of type int. Our local found_exiting_thread is the return value therefore must be int.

We may have a separate PR to fix the signature of the walk_thread_list() callback but not here.

Ah, I didn't notice that this is a callback. Let's leave it for now as is.


LibOS/shim/src/ipc/shim_ipc_helper.c, line 59 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Not really, because there is this annoying IPC_NS_CALLBACKS(pid) macro in the middle of the array initialization. We will hopefully change that macro ugliness one day, but not in this PR.

Argh :/

Copy link
Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 11 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @dimakuv, @mkow, and @yamahata)


LibOS/shim/include/shim_ipc.h, line 501 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

With this assert you can leave the old (simpler) code. It should be either:

  • the old version + assert, or
  • the current version without the assert, but with that min size checked on base_size + add_size, not base_size

Done. Old version + assert.

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r8.
Reviewable status: all files reviewed, 10 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @dimakuv, @mkow, and @yamahata)

mkow
mkow previously approved these changes Jul 18, 2019
Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 10 unresolved discussions (waiting on @dimakuv, @mkow, and @yamahata)

Copy link
Contributor

@yamahata yamahata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @dimakuv and @mkow)


LibOS/shim/src/shim_checkpoint.c, line 1134 at r8 (raw file):

256

nitpick: sizeof(new_process_self_uri)


LibOS/shim/src/bookkeep/shim_signal.c, line 803 at r8 (raw file):

        /* SIGALRM */   &sighandler_kill,
        /* SIGTERM */   &sighandler_kill,
        /* SIGSTKFLT */ &sighandler_kill,

Linux/x86-64 doesn't generate it. it doesn't harm, though.


LibOS/shim/src/bookkeep/shim_thread.c, line 659 at r8 (raw file):

    __UNUSED(offset);

    thread->vmid = cur_process.vmid;

This seems accidental addition.
thread here can be one in other process.


LibOS/shim/src/sys/shim_vfork.c, line 67 at r8 (raw file):

int shim_do_vfork (void)
{
#ifdef ALIAS_VFORK_AS_FORK

Maybe we'd like to #ifdef shim_thread::dummy and related code.

Copy link
Contributor

@yamahata yamahata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 8 of 22 files at r3, 2 of 7 files at r4, 2 of 2 files at r5, 20 of 22 files at r6, 1 of 1 files at r7, 1 of 1 files at r8.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @dimakuv and @mkow)

Copy link
Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dismissed @yamahata from a discussion.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @dimakuv, @mkow, and @yamahata)


LibOS/shim/src/bookkeep/shim_signal.c, line 803 at r8 (raw file):

Previously, yamahata wrote…

Linux/x86-64 doesn't generate it. it doesn't harm, though.

Yes, we decided to add it (see resolved discussion with @mkow)


LibOS/shim/src/bookkeep/shim_thread.c, line 659 at r8 (raw file):

Previously, yamahata wrote…

This seems accidental addition.
thread here can be one in other process.

Done. Good catch.


LibOS/shim/src/sys/shim_vfork.c, line 67 at r8 (raw file):

Previously, yamahata wrote…

Maybe we'd like to #ifdef shim_thread::dummy and related code.

Done. Good idea. I still left thread->dummy field because I personally dislike ifdefs inside structs (but added a comment). But we can also change that if you want.

Copy link
Contributor

@yamahata yamahata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 6 files at r9.
Reviewable status: 37 of 41 files reviewed, 2 unresolved discussions, not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @dimakuv, @mkow, and @yamahata)

yamahata
yamahata previously approved these changes Jul 19, 2019
Copy link
Contributor

@yamahata yamahata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 6 files at r9.
Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @dimakuv and @mkow)

mkow
mkow previously approved these changes Jul 19, 2019
Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 6 of 6 files at r9.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @dimakuv and @mkow)

@dimakuv dimakuv dismissed stale reviews from mkow and yamahata via 89a25d1 July 19, 2019 06:57
@dimakuv dimakuv force-pushed the dimakuv/ipc-helper-cleanup branch from 8df30ca to 89a25d1 Compare July 19, 2019 06:57
Copy link
Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 34 of 41 files reviewed, 2 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @mkow and @yamahata)

a discussion (no related file):

Previously, mkow (Michał Kowalczyk) wrote…

Please fix poortly typo in the commit messages when doing the final rebase.

Done.


a discussion (no related file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

I meant "correctly" in the sense that Linux apps I saw (except one test program in LTP) do not abuse vfork(). I will simply remove the last part in the new commit message, smth like: We take a shortcut for now and emulate vfork() via fork(): this is allowed by POSIX.

Done.


Dmitrii Kuvaiskii added 15 commits July 19, 2019 01:03
Also increases timeout on LibOS's "udp" test to 50 seconds. It seems to
timeout frequently otherwise.
- Removed unnecessary macros (IPC_PORT_IFPOLL, DEBUG_REF, etc).
- Simplified function signatures and changed to better names.
- Removed unused functions (del_ipc_port, del_ipc_port_by_id).
- Malloc instead of huge stack allocations.
- Removed complex logic of exit_with_ipc_helper(), now IPC thread exits
  similarly to Async helper thread.
- Removed ipc_port_pool hash list (used as perf optimization but
  providing no tangible benefit at the cost of high complexity).
- Simplified IPC helper thread states to only ALIVE & NOTALIVE.
- Removed unused broadcast_port.
- Reworked IPC helper thread's while-loop similarly to Async helper
  thread; removed perf optimization of keeping the same list of ports to
  listen on DkObjectsWaitAny(), instead simply repopulate this list
  every time (may become too slow if lots of IPC on hundreds of ports).
- Changed type of shim_ipc_msg.size from int to size_t.
- Renamed shim_ipc_msg_obj to shim_ipc_msg_duplex for readability.
- Simplified function signatures and changed to better names.
- Removed unused IPC_FINDURI & IPC_TELLURI and corresponding functions.
- Replaced macros IPC_MSG_SIZE & IPC_MSGOBJ_SIZE with inline functions.
- Removed dangerous create_xxx_on_stack() functions and changed all
  invocations to have explicit __alloca's. Those functions relied on
  being inlined in the callers otherwise their created objects would
  become corrupted. Explicit __alloca's avoid this brittle
  implementation and make object ownership clear.
- Removed unnecessary wrapper function do_ipc_duplex(), replacing it
  with send_ipc_message_duplex().
- Removed unused IPC_CLD_JOIN and corresponding functions
  ipc_cld_join_send/ipc_cld_join_callback.
- Renamed poorly named ipc_child_exit to ipc_port_with_child_fini.
- Added thread locking during child_sthread_exit/child_thread_exit.
- Added extensive comments.
Previously, there was an incorrect corner case during discovering of the
current namespace leader. If the parent process would exit before the
child, the child process would fail on sending FINDNS message to the
parent (because of closed parent socket). The previous code logic would
assume that since NS_LEADER is set to some value, the leader process
exists. In reality, the child loses the only source of information about
the leader process (note that FINDNS is sent only to the parent), so the
only meaningful action is to set myself as the new leader.
Previously, the IPC subsystem incorrectly sent IPC_PID_KILL message
(generated as part of kill() syscall) as a duplex message, i.e., the
sender thread (the one issuing kill()) was paused until the receiving
child process handled IPC_PID_KILL callback and sent the acknowledgement
reply message back to sender.

This incorrect logic created a data race between the IPC_PID_KILL ack
message and the exiting child process. In particular, the child could
exit and all its resources (including IPC port to communicate with
parent) could be reclaimed by host OS. This could lead to IPC_PID_KILL
ack message being lost (because of the closed IPC port), and the paused
parent thread would wake up with -ECONNRESET instead of the ack message.
This would lead the kill() implementation to believe that child process
never existed in the first place and to return -ESRCH.

The fix to this data race is to send IPC_PID_KILL without waiting for
acknowledgement. Specification of kill() syscall does not require it to
be synchronous (indeed it is not on Linux), so this fix is correct. This
fix also enabled to merge broadcast_signal() into ipc_pid_kill_send().
Previously, when a thread exited via thread_exit(), it could send two
identical IPC_CLD_EXIT messages under certain conditions. This commit
fixes this bug and forces thread_exit() to send IPC_CLD_EXIT at most once.
This commit adds additional logic around DkStreamWrite() and
DkStreamRead() in send_ipc_message() and receive_ipc_message()
respectively: interrupts and partial reads/writes are handled correctly.
This commit adds default signal dispositions (as per Linux) to all 32
standard signals. It also adds WCOREDUMP bit to correctly inform wait4()
status word.

This commit updates graphene-tests submodules to enable new killXX LTP
regression tests, as well as fix the waitpid05 LTP test. It is important
to update this submodule reference in this commit because otherwise
waitpid05 tests will fail Graphene's CI.
Previously, both DkProcessExit() and DkThreadExit() used SGX OCALL
ocall_exit(exitcode), which finally issued exit() syscall. This is
incorrect because DkProcessExit() must exit the whole process and not
just a single thread. This led to abandoned IPC/Async helper threads
in some Graphene-SGX corner cases. This commit forces DkProcessExit()
to result in exit_group() syscall, achieved by adding a new argument
to ocall_exit(exitcode, is_exitgroup).
Previously, Graphene incorrectly treated execve() under SGX PAL:
Graphene would emulate execve() as fork + execve, and the new forked
process didn't try to "assume" the identity of its parent (which
violated execve specification "all process attributes are preserved").

This commit reworks the implementations of clone/fork and execve. In
particular, the IPC subsystem clearly distinguishes between the two
cases: clone/fork works as before whereas execve forks new "real"
process (which starts executing the requested program) and silently
exits the now-useless "temporary" process. New "real" process assumes
the identity of "temporary" process by inheriting its VMID (ID of
process for IPC purposes) and IPC-info objects with their PAL handles.

This commit also cleans up initialization of four IPC-info objects: self
(creates process-unique server pipe for IPC), parent (holds pipe for IPC
with parent process), and two namespace leaders (hold pipes for IPC with
leader processes). To correctly identify new-process server pipe, the
implementation of create_pipe() now allows to create VMID-based pipe URI.
Sometimes apps are built with the ELF header containing OS ABI ==
ELFOSABI_LINUX (GCC is one example). Previously, LibOS dynamic loader
only allowed OS ABI == ELFOSABI_SYSV (most apps are built with it) and
failed on ELFOSABI_LINUX. This commit teaches LibOS loader to accept
ELFOSABI_LINUX binaries and removes some redundant checks on ELF header.
New implementation is now in line with dynamic loader of PAL.
Previous implementation of vfork() was both buggy (worked only in simple
cases) and not compatible with the new IPC implementation. We take a
shortcut for now and emulate vfork() via fork(); this is allowed by
POSIX. This commit also adds LibOS regression test for vfork().
Previously, sigsuspend(), pause(), and nanosleep() syscalls did not
check for pending signals before waiting on signals. This commit adds
this logic, similar to how this is implemented in Linux. This fixes data
races in regression tests relying on these syscalls (killXX in ltp).
Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 7 of 7 files at r10.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved

@mkow mkow merged commit 89a25d1 into master Jul 19, 2019
@mkow mkow deleted the dimakuv/ipc-helper-cleanup branch July 19, 2019 14:17
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[LTP tests] killXX tests do not work properly
3 participants