runtime: C popen hangs when linked with Go shared library #12873

Closed
pavel-odintsov opened this issue Oct 7, 2015 · 15 comments

@pavel-odintsov

Hello, Folks!

I built a shared library with Go:

go build -buildmode=c-shared -o libgobgp.so *.go

I link this shared library with my C++ code, which daemonizes itself and then calls popen:

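#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <string>
#include <vector>

// Fork once: the parent exits, the child returns 0, and -1 signals failure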
int do_fork() {
    int status = 0;

    switch (fork()) {
    case 0:
        // It's child
        break;
    case -1:
        /* fork failed */
        status = -1;
        break;
    default:
        // Terminate the parent process with _exit(0).
        // Do not call exit() here: it would run atexit handlers and destroy the program's global objects.
        _exit(0);
    }

    return status;
}

void redirect_fds() {
    // Close stdin, stdout and stderr
    close(0);
    close(1);
    close(2);

    if (open("/dev/null", O_RDWR) != 0) { 
        // We can't notify anybody now
        exit(1);
    }    

    // Duplicate descriptor 0 onto descriptors 1 and 2.
    // The return values are not needed; they are stored only to silence compiler warnings.
    int first_dup_result  = dup(0);
    int second_dup_result = dup(0);
}

std::vector<std::string> exec(std::string cmd) {
    std::vector<std::string> output_list;

    FILE* pipe = popen(cmd.c_str(), "r");
    if (!pipe) return output_list;

    char buffer[256];
    // Stop on EOF or on a read error; testing feof() alone would loop forever after an error
    while (fgets(buffer, sizeof(buffer), pipe) != NULL) {
        size_t newbuflen = strlen(buffer);

        // remove newline at the end
        if (newbuflen > 0 && buffer[newbuflen - 1] == '\n') {
            buffer[newbuflen - 1] = '\0';
        }

        output_list.push_back(buffer);
    }

    pclose(pipe);
    return output_list;
}

// In the real tool this is set by the --daemonize command-line flag
bool daemonize = true;

int main () {
    if (daemonize) {
        int status = 0;

        printf("We will run in daemonized mode\n");

        if ((status = do_fork()) < 0) {
            // fork failed
            status = -1;
        } else if (setsid() < 0) {
            // Create new session
            status = -1;
        } else if ((status = do_fork()) < 0) {
            status = -1;
        } else {
            // Clear inherited umask
            umask(0);

            // Chdir to root
            int chdir_result = chdir("/");

            // close all descriptors because we are daemon!
            redirect_fds();
        }
    }

    // The shell started by this call becomes a zombie
    exec("ip link show");

    // This loop is never reached
    while (true) {

   }
}

When I run my tool in normal (non-daemon) mode, everything works perfectly.

But when I run it with --daemonize, things go wrong and the child started for "/sbin/ip" becomes a zombie (shown as sh below):

15747 root       20   0  297M 18332 12408 S  0.0  0.1  0:00.00 ├─ ./fastnetmon --daemonize
15748 root       20   0     0     0     0 Z  0.0  0.0  0:00.00 │  └─ sh

cat /proc/15748/status
Name:   sh
State:  Z (zombie)

In strace I saw:

[pid 15475] write(1, "11: eth5: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000\\    link/ether a0:36:9f:0c:8d:d6 brd ff:ff:ff:ff:ff:ff\n", 171 <unfinished ...>
[pid 15473] <... read resumed> "10: eth6: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000\\    link/ether a0:36:9f:0c:8d:d4 brd ff:ff:ff:ff:ff:ff\n", 4096) = 171
[pid 15475] <... write resumed> )       = 171
[pid 15473] read(4,  <unfinished ...>
[pid 15475] exit_group(0)               = ?
[pid 15473] <... read resumed> "11: eth5: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000\\    link/ether a0:36:9f:0c:8d:d6 brd ff:ff:ff:ff:ff:ff\n", 4096) = 171
[pid 15473] read(4,  <unfinished ...>
[pid 15475] +++ exited with 0 +++
[pid 15474] <... wait4 resumed> [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 15475
[pid 15474] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=15475, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
[pid 15474] rt_sigreturn()              = 15475
[pid 15474] exit_group(0)               = ?
[pid 15474] +++ exited with 0 +++
<... read resumed> "", 4096)            = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=15474, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
rt_sigprocmask(SIG_SETMASK, NULL, ~[KILL STOP], 8) = 0
sigaltstack({ss_sp=0xc820032000, ss_flags=0, ss_size=32672}, NULL) = 0
gettid()                                = 15473
rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP BUS FPE KILL SEGV STKFLT CHLD STOP PROF], NULL, 8) = 0
futex(0xc820030110, FUTEX_WAIT, 0, NULL

I see many signal reconfigurations coming from the Go runtime, so I assume the problem is related to the Go runtime itself: a conflict between Go's signal handlers and the SIGCHLD that is delivered when the popen child exits.

Do you have any workaround for this case?
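
One way to test the hypothesis about signal handlers is to ask the kernel what is currently installed for a signal. The sketch below is only an illustration: `query_disposition` and the `Disposition` enum are hypothetical names, not part of the reporter's code or of Go. It relies on the fact that `sigaction` with a null new action only reads the current setting.

```cpp
#include <signal.h>

// Hypothetical classification of what is installed for a signal.
enum class Disposition { Default, Ignored, Handler, Unknown };

// Query the current disposition of signo without changing it.
Disposition query_disposition(int signo) {
    struct sigaction current;
    // A null second argument means "do not install anything, just report".
    if (sigaction(signo, nullptr, &current) != 0) {
        return Disposition::Unknown;
    }
    // With SA_SIGINFO the three-argument handler field is in use.
    if (current.sa_flags & SA_SIGINFO) {
        return Disposition::Handler;
    }
    if (current.sa_handler == SIG_DFL) {
        return Disposition::Default;
    }
    if (current.sa_handler == SIG_IGN) {
        return Disposition::Ignored;
    }
    return Disposition::Handler;
}
```

Calling `query_disposition(SIGCHLD)` right before the popen call, once with and once without the Go library linked in, would show whether something replaced the default handling.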

@ianlancetaylor changed the title from "Golang's dynamic library broke C++ code when linked together" to "runtime: Golang's dynamic library broke C++ code when linked together" on Oct 7, 2015
@ianlancetaylor
Member

Can you pin down whether the problem is daemonize or whether the problem is popen?

@pavel-odintsov
Author

The problem is related to popen. Daemonization works fine, but the subsequent popen call blocks execution of main().

@ianlancetaylor changed the title from "runtime: Golang's dynamic library broke C++ code when linked together" to "runtime: C popen hangs when linked with Go shared library" on Oct 7, 2015
@ianlancetaylor
Member

I don't know whether this is your problem, but it's not clear that you can call fork in a C program linked against a Go shared library. In general fork does not work for threaded programs, and Go code is always threaded. You may have to dlopen the shared library after calling fork.

@ianlancetaylor added this to the Go1.6 milestone on Oct 7, 2015
@brknstrngz

fork() only works in threaded programs if the child process calls nothing but async-signal-safe functions until it calls exec*(). IOW, if you ONLY call exec*() in the child, all is fine. Otherwise the behaviour is undefined. Another approach is to make sure you call popen() before executing any code from the Go shared library.
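
The fork-then-exec rule can be sketched as follows. This is a minimal illustration of the rule, not a confirmed fix for the hang in this issue. `run_and_capture` is a hypothetical helper; it assumes `argv[0]` is an absolute path so that plain `execv` can be used, and the child does nothing between `fork` and `execv` except `close`, `dup2` and `_exit`.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cassert>
#include <cerrno>
#include <string>

// Hypothetical helper: runs argv[0] (an absolute path) and returns what it
// wrote to stdout. Returns an empty string if pipe() or fork() fails.
std::string run_and_capture(char* const argv[]) {
    assert(argv != nullptr && argv[0] != nullptr);

    int fds[2];
    if (pipe(fds) != 0) return "";

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return "";
    }

    if (pid == 0) {
        // Child: only async-signal-safe calls from here until exec.
        close(fds[0]);
        if (fds[1] != STDOUT_FILENO) {
            if (dup2(fds[1], STDOUT_FILENO) < 0) _exit(127);
            close(fds[1]);
        }
        execv(argv[0], argv);
        _exit(127); // reached only if exec failed
    }

    // Parent: read until the child closes its end of the pipe.
    close(fds[1]);
    std::string output;
    char buffer[4096];
    while (true) {
        ssize_t n = read(fds[0], buffer, sizeof(buffer));
        if (n > 0) {
            output.append(buffer, static_cast<size_t>(n));
        } else if (n < 0 && errno == EINTR) {
            continue; // interrupted by a signal, retry
        } else {
            break; // end of file or a real error
        }
    }
    close(fds[0]);

    // Reap the child so that it does not remain a zombie.
    int status = 0;
    while (waitpid(pid, &status, 0) < 0 && errno == EINTR) {
    }
    return output;
}
```

Unlike popen, this keeps every step visible, which makes it easier to see what runs in the child before exec.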

@ianlancetaylor
Member

You basically can't call popen (or fork) before executing any code from the Go shared library, because the Go shared library has an initializer that runs when the library is loaded. If you link directly against the shared library rather than using dlopen, that initializer will run before the main function.
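
As a plain C++ analogy (not how the Go runtime is implemented), the sketch below shows that load-time initializers have already run by the time main starts. `RuntimeStarter` and `runtime_started_before_main` are hypothetical names standing in for a library's initializer.

```cpp
#include <atomic>

// Hypothetical stand-in for a library's load-time initializer.
struct RuntimeStarter {
    // Function-local static so the flag exists before any constructor uses it.
    static std::atomic<bool>& started() {
        static std::atomic<bool> flag{false};
        return flag;
    }
    // Runs during static initialization, before main is entered.
    RuntimeStarter() { started().store(true); }
};

// The object whose construction plays the role of the initializer.
static RuntimeStarter runtime_starter;

// Reports whether the initializer has already run.
bool runtime_started_before_main() {
    return RuntimeStarter::started().load();
}
```

Anything called from main, including the first line of main itself, already sees the flag set. In the same way, a program linked directly against the Go shared library has no point in main that comes before the library's initializer.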

@rakyll
Contributor

rakyll commented Oct 8, 2015

You need to dynamically load the Go shared library and export a symbol you can dlsym and invoke from your non-Go main. Go mobile's app package provides a good reference if you can avoid the JNI calls. https://github.com/golang/mobile/blob/master/app/android.c#L64

Android apps dlopen Go mobile shared libraries and use ANativeActivity_onCreate as an entry point.
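
A minimal sketch of the dlopen/dlsym approach follows. `load_symbol` is a hypothetical helper. The library path and the exported symbol name passed to it are the caller's choice, and the handle is intentionally never passed to dlclose.

```cpp
#include <dlfcn.h>

#include <cstdio>

// Hypothetical helper: loads library_path at run time and looks up
// symbol_name in it. Returns nullptr if either step fails.
void* load_symbol(const char* library_path, const char* symbol_name) {
    void* handle = dlopen(library_path, RTLD_NOW);
    if (handle == nullptr) {
        std::fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return nullptr;
    }

    dlerror(); // clear any stale error before calling dlsym
    void* symbol = dlsym(handle, symbol_name);
    const char* error = dlerror();
    if (error != nullptr) {
        std::fprintf(stderr, "dlsym failed: %s\n", error);
        return nullptr;
    }
    return symbol;
}
```

In a daemon, the call would go after the fork and setsid steps, so that the library's initializer first runs in the final process. The result then has to be cast to the function pointer type of the exported Go function before it can be called.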

@pavel-odintsov
Author

Thanks for the detailed answers!

It would be good to have an exported function that runs the initialization code explicitly. What about this approach?

pavel-odintsov added a commit to pavel-odintsov/fastnetmon that referenced this issue Oct 8, 2015
… compilation time to runtime linking with dynamic library
@pavel-odintsov
Author

Hello!

Thanks again! I have switched to loading the library at run time, so the Go runtime now starts after the daemonization code. You can find an example here: https://github.com/pavel-odintsov/fastnetmon/blob/master/src/actions/gobgp_action.cpp#L290

@pavel-odintsov
Author

But I hit another issue with dlclose.

When I use code like this: https://github.com/pavel-odintsov/gobgp_api_cpp_client/blob/master/gobgp_api_client.cc#L265 and call dlclose before the tool exits, I get a segmentation fault.

LD_LIBRARY_PATH="/opt/grpc_0_11_1_7a94236d698477636dd06282f12f706cad527029/lib;/opt/protobuf_3.0.0_alpha4/lib;/opt//libgobgp_1_0_0/lib/" gdb ./gobgp_api_client 
GNU gdb (Debian 7.7.1+dfsg-5) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./gobgp_api_client...done.
(gdb) run
Starting program: /usr/src/gobgp_api_cpp_client/gobgp_api_client 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
warning: File "/usr/local/go/src/runtime/runtime-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
To enable execution of this file add
    add-auto-load-safe-path /usr/local/go/src/runtime/runtime-gdb.py
line to your configuration file "/root/.gdbinit".
To completely disable this security protection add
    set auto-load safe-path /
line to your configuration file "/root/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
    info "(gdb)Auto-loading safe path"
[New Thread 0x7ffff4c71700 (LWP 22000)]
[New Thread 0x7fffeffff700 (LWP 22001)]
[New Thread 0x7fffef7fe700 (LWP 22002)]
[New Thread 0x7fffeeffd700 (LWP 22003)]
[New Thread 0x7fffee7fc700 (LWP 22004)]
[New Thread 0x7fffedffb700 (LWP 22005)]
[Thread 0x7fffedffb700 (LWP 22005) exited]
List of announced prefixes for route family: 65537

Prefix: 10.10.20.33/32
NLRI: {"nlri":{"prefix":"10.10.20.33/32"},"attrs":[{"type":1,"value":2},{"type":3,"nexthop":"10.10.1.99"}]}


List of announced prefixes for route family: 65669

Prefix: [destination:10.0.0.0/24][protocol: tcp][source:20.0.0.0/24]
NLRI: {"nlri":{"value":[{"type":1,"value":{"prefix":"10.0.0.0/24"}},{"type":3,"value":[{"op":129,"value":6}]},{"type":2,"value":{"prefix":"20.0.0.0/24"}}]},"attrs":[{"type":1,"value":2},{"type":14,"nexthop":"0.0.0.0","afi":1,"safi":133,"value":[{"value":[{"type":1,"value":{"prefix":"10.0.0.0/24"}},{"type":3,"value":[{"op":129,"value":6}]},{"type":2,"value":{"prefix":"20.0.0.0/24"}}]}]},{"type":16,"value":[{"type":128,"subtype":8,"value":"10:10"}]}]}

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffee7fc700 (LWP 22004)]
0x00007ffff50c91a1 in ?? ()
(gdb) bt
#0  0x00007ffff50c91a1 in ?? ()
#1  0x00007ffff50932b3 in ?? ()
#2  0x000000c820066110 in ?? ()
#3  0x0000000000000000 in ?? ()
(gdb) quit
A debugging session is active.

    Inferior 1 [process 21996] will be killed.

Quit anyway? (y or n) y

Is there any way to shut down the Go runtime cleanly?

@ianlancetaylor
Member

You can not dlclose a Go shared library. It simply doesn't work at all.

@pavel-odintsov
Author

:(

ianlancetaylor added a commit that referenced this issue Oct 8, 2015
Go shared libraries do not support dlclose, and there is no likelihood
that they will support dlclose in the future.  Set the DF_1_NODELETE
flag to tell the dynamic linker to not attempt to remove them from
memory.  This makes the shared library act as though every call to
dlopen passed the RTLD_NODELETE flag.

Fixes #12582.
Update #11100.
Update #12873.

Change-Id: Id4b6e90a1b54e2e6fc8355b5fb22c5978fc762b4
Reviewed-on: https://go-review.googlesource.com/15605
Reviewed-by: Michael Hudson-Doyle <michael.hudson@canonical.com>
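
For a library that was built without that flag, a caller can request the same behavior explicitly. The sketch below is an illustration only: `open_pinned` is a hypothetical helper, and `RTLD_NODELETE` is an extension provided by glibc rather than a POSIX flag.

```cpp
#include <dlfcn.h>

// Hypothetical helper: opens a library so that a later dlclose will not
// unload it from memory. Returns nullptr if the library cannot be loaded.
void* open_pinned(const char* library_path) {
    return dlopen(library_path, RTLD_NOW | RTLD_NODELETE);
}
```

With a handle obtained this way, dlclose no longer removes the library from the address space.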
@rsc
Contributor

rsc commented Oct 14, 2015

Is there anything to be done here? It seems like the problem has been identified (can't fork with threads) and is not Go's fault.

Ian, in #12873 (comment) I think s/before/after/ or something in the first sentence.

@pavel-odintsov
Author

The only remaining issue here is dlclose.

So we need some documentation on how the Go runtime behaves when it is linked with C. Before this issue, I did not know that multiple threads are already running when the main application starts.

@ianlancetaylor
Member

@rsc What I meant by that sentence is that even if you would like to call popen before executing Go code, you can't, because Go code will get in before you anyhow.

@ianlancetaylor
Member

I've modified Go so that you can no longer dlclose a Go shared library (https://golang.org/cl/15605). If that is the only issue here, perhaps we should close this bug.

Yes, more docs are always good.

@rsc closed this as completed on Nov 5, 2015
pavel-odintsov added a commit to pavel-odintsov/fastnetmon that referenced this issue Sep 17, 2016
… compilation time to runtime linking with dynamic library
@golang locked and limited conversation to collaborators on Nov 4, 2016