Using Fibers causes epic crash #46

Open
withinboredom opened this issue Oct 20, 2022 · 13 comments · May be fixed by #387 or #171
Labels
bug Something isn't working

Comments

@withinboredom
Collaborator

withinboredom commented Oct 20, 2022

Minimal code to reproduce:

```php
<?php

do {
    $running = false;
    //$running = frankenphp_handle_request(function (): void {
        $fiber = new Fiber(function() {
            // Output from inside the fiber is what triggers the crash.
            echo "Starting Fiber\n";
        });
        $fiber->start();
    //});
} while ($running);
```

With some slight modifications, it can also be reproduced in worker mode.

@dunglas dunglas added the bug Something isn't working label Oct 21, 2022
krakjoe added a commit to krakjoe/frankenphp that referenced this issue Nov 10, 2022
  This is the bare minimum required to make fibers work within the go
  runtime.
dunglas pushed a commit that referenced this issue Aug 4, 2023
  This is the bare minimum required to make fibers work within the go
  runtime.
@dunglas dunglas linked a pull request Aug 4, 2023 that will close this issue
@dunglas dunglas pinned this issue Aug 6, 2023
dunglas pushed a commit that referenced this issue Aug 16, 2023
  This is the bare minimum required to make fibers work within the go
  runtime.
dunglas pushed a commit that referenced this issue Sep 8, 2023
  This is the bare minimum required to make fibers work within the go
  runtime.
@withinboredom
Collaborator Author

withinboredom commented Dec 13, 2023

@dunglas the following Dockerfile (props to @cdaguerre in #374) appears to "fix" fibers, at least for this reproducer with manual testing. It needs more testing:

```dockerfile
FROM dunglas/frankenphp:latest-builder-php8.3-alpine AS builder

COPY --from=caddy:builder-alpine /usr/bin/xcaddy /usr/bin/xcaddy

ENV CGO_ENABLED=1 XCADDY_SETCAP=1 CGO_CXXFLAGS=-fPIE CGO_CFLAGS=-fPIE CGO_LDFLAGS=-pie XCADDY_GO_BUILD_FLAGS='-buildmode=pie -ldflags="-w -s" -trimpath'
RUN xcaddy build \
    --output /usr/local/bin/frankenphp \
    --with github.com/dunglas/frankenphp=./ \
    --with github.com/dunglas/frankenphp/caddy=./caddy/ \
    --with github.com/dunglas/mercure/caddy \
    --with github.com/dunglas/vulcain/caddy
```

🥳 🤞 🤞 still testing...

@dunglas
Owner

dunglas commented Dec 13, 2023

Great news! Don't hesitate to open a PR with these changes, so we can see if this fixes the issue for all architectures.

@withinboredom
Collaborator Author

I'll do some proper testing by Monday (by updating the fiber branch), but I haven't seen a crash yet via manual testing.

@piotrekkr

@withinboredom I had issues with fibers, so I could also test this on my Cloud Run service, but I'm not sure where I can get a Docker image with this fix.

@withinboredom
Collaborator Author

withinboredom commented Jan 10, 2024

It doesn't fix it per se; it more or less just reduces the probability of a crash.

Edit to add: the best way to prevent a crash is to just not output anything at all inside a fiber.

dunglas pushed a commit that referenced this issue Jan 14, 2024
  This is the bare minimum required to make fibers work within the go
  runtime.
@erikfrerejean

I've just encountered this issue, and using the workaround from @withinboredom did resolve the exception. In this project, the culprit seems to be the Monolog logger, as that is the only place fibers are being used.

@withinboredom
Collaborator Author

I started working on a cgo library several weeks ago to allow output to be passed from C to Go without calling into Go. It's still a WIP: https://github.com/withinboredom/cgoc

There's a segfault once the number of concurrent requests gets high (due to the use of some C synchronization primitives from Go), and a memory leak, but it's pretty fast by itself (~8gbs on my machine).

I hope to have it working sometime in the next few months as a potential solution.

@dunglas
Owner

dunglas commented Jul 9, 2024

@withinboredom IMHO the best option would be to fix the issue directly in Go!

dunglas pushed a commit that referenced this issue Jul 9, 2024
  This is the bare minimum required to make fibers work within the go
  runtime.
@withinboredom
Collaborator Author

@dunglas I highly doubt it will ever be fixable, for very valid reasons. The reason it is failing boils down to the following:

  1. C creates a new thread
  2. C calls go_handle_request (ncgo = 1)
  3. Go calls frankenphp_execute_script (reenter C)
  4. PHP creates a fiber
  5. C calls Go (go_ub_write for example) (ncgo = 2)
  6. crash as designed

According to the CL (https://go-review.googlesource.com/c/go/+/530480), this means changing the stack when ncgo > 1 will never be possible, for very valid safety reasons. This was a huge part of my approach in taking over Go threads (ncgo <= 1 always).

If we can fix the ncgo issue, then we are free to muck around with the stack as much as we want.
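A toy Go model of the guard that fires in step 6 may help. It mirrors the condition in `callbackUpdateSystemStack` (quoted in the runtime diff later in this thread); `threadState`, `callbackWouldThrow` and the addresses are hypothetical. The model assumes `ncgo` counts Go-to-C calls still in progress, as the runtime comment ("this M was in Go further up the stack (it called C and is now receiving a callback)") describes, which is one lower than the depth numbered in the steps above.

```go
package main

import "fmt"

// threadState is a hypothetical, heavily simplified stand-in for the
// runtime's M: only the fields used by the callback stack check are kept.
type threadState struct {
	ncgo    int     // assumed: Go-to-C calls still in progress on this thread
	stackLo uintptr // lower bound of the system stack Go has recorded
	stackHi uintptr // upper bound of the system stack Go has recorded
}

// callbackWouldThrow mirrors the runtime guard: a C-to-Go callback whose
// stack pointer lies outside the recorded bounds is fatal only while a
// Go-to-C call is in progress on the same thread.
func callbackWouldThrow(ts threadState, sp uintptr) bool {
	inBound := sp > ts.stackLo && sp <= ts.stackHi
	return ts.ncgo > 0 && !inBound
}

func main() {
	// Bounds recorded when C first called go_handle_request on this thread.
	ts := threadState{stackLo: 0x7000_0000, stackHi: 0x7010_0000}

	// Step 3: Go calls frankenphp_execute_script, so one Go-to-C call is open.
	ts.ncgo = 1

	// go_ub_write from ordinary PHP code: sp is on the thread's own stack.
	fmt.Println(callbackWouldThrow(ts, 0x7008_0000)) // false

	// go_ub_write from inside a fiber: sp is on the fiber's separate stack.
	fmt.Println(callbackWouldThrow(ts, 0x5500_0000)) // true

	// With no Go-to-C call open, the same foreign sp passes this check
	// (per this thread, the bounds are then re-estimated for C-created threads).
	ts.ncgo = 0
	fmt.Println(callbackWouldThrow(ts, 0x5500_0000)) // false
}
```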

@withinboredom
Collaborator Author

withinboredom commented Jul 9, 2024

One way to fix it might be to have go_handle_request return a pointer that we can continue with (making ncgo = 0), and then continue in C to frankenphp_execute_script. That way, if a fiber is created and we call things like go_ub_write, ncgo == 1 and Go will just reset the stack bounds (in theory).
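The reordering can be checked with a small trace model. In this hypothetical sketch (the `crossing` type and `ncgoAtLastCallback` are illustrative, not FrankenPHP or runtime code), only Go-to-C calls raise the counter and their returns lower it, following the runtime comment that `ncgo > 0` means the thread "called C and is now receiving a callback". Under that assumption the runtime value is one lower than the ncgo numbers used in this comment, so the proposed flow reaches go_ub_write with 0 instead of 1.

```go
package main

import "fmt"

// crossing is one transition across the C/Go boundary on a single thread.
type crossing int

const (
	goCallsC     crossing = iota // cgo call from Go into C: ncgo++ (assumed)
	cReturnsToGo                 // that C call returns to Go: ncgo-- (assumed)
	cCallsGo                     // C invokes an exported Go function (callback)
	goReturnsToC                 // the callback returns to its C caller
)

// ncgoAtLastCallback replays a trace and returns the number of Go-to-C calls
// still in progress when the final C-to-Go callback happens.
func ncgoAtLastCallback(trace []crossing) int {
	ncgo, atCallback := 0, 0
	for _, c := range trace {
		switch c {
		case goCallsC:
			ncgo++
		case cReturnsToGo:
			ncgo--
		case cCallsGo:
			atCallback = ncgo
		}
	}
	return atCallback
}

func main() {
	// Current flow: C -> go_handle_request -> frankenphp_execute_script -> (fiber) go_ub_write.
	current := []crossing{cCallsGo, goCallsC, cCallsGo}

	// Proposed flow: go_handle_request returns a pointer to C, C then calls
	// frankenphp_execute_script itself, and the fiber calls go_ub_write.
	proposed := []crossing{cCallsGo, goReturnsToC, cCallsGo}

	fmt.Println(ncgoAtLastCallback(current))  // 1: the stack check applies, so a fiber stack throws
	fmt.Println(ncgoAtLastCallback(proposed)) // 0: the check is skipped (in theory, bounds are reset)
}
```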

@dunglas
Copy link
Owner

dunglas commented Jul 9, 2024

According to golang/go#62130 (comment), this seems fixable directly in Go for our case.

@withinboredom
Collaborator Author

withinboredom commented Jul 10, 2024

This would work: C changes stack back

I've been tearing apart the Fiber/boost context implementation to see if I can pop the stack back to the original, jump to Go, and then replace the stack on return. The only problem with this approach (and FWIW, I do have it mostly working) is that it requires assembly, and I am only familiar with x86-64 assembly. We would need to write assembly for every architecture (and there are some big perf hits here).

@withinboredom
Copy link
Collaborator Author

It turns out the patch to get it working is pretty darn simple.

```diff
diff --git a/src/runtime/cgocall.go b/src/runtime/cgocall.go
index 0d3cc40903..609c5dbc52 100644
--- a/src/runtime/cgocall.go
+++ b/src/runtime/cgocall.go
@@ -215,34 +215,6 @@ func cgocall(fn, arg unsafe.Pointer) int32 {
 func callbackUpdateSystemStack(mp *m, sp uintptr, signal bool) {
        g0 := mp.g0

-       inBound := sp > g0.stack.lo && sp <= g0.stack.hi
-       if mp.ncgo > 0 && !inBound {
-               // ncgo > 0 indicates that this M was in Go further up the stack
-               // (it called C and is now receiving a callback).
-               //
-               // !inBound indicates that we were called with SP outside the
-               // expected system stack bounds (C changed the stack out from
-               // under us between the cgocall and cgocallback?).
-               //
-               // It is not safe for the C call to change the stack out from
-               // under us, so throw.
-
-               // Note that this case isn't possible for signal == true, as
-               // that is always passing a new M from needm.
-
-               // Stack is bogus, but reset the bounds anyway so we can print.
-               hi := g0.stack.hi
-               lo := g0.stack.lo
-               g0.stack.hi = sp + 1024
-               g0.stack.lo = sp - 32*1024
-               g0.stackguard0 = g0.stack.lo + stackGuard
-               g0.stackguard1 = g0.stackguard0
-
-               print("M ", mp.id, " procid ", mp.procid, " runtime: cgocallback with sp=", hex(sp), " out of bounds [", hex(lo), ", ", hex(hi), "]")
-               print("\n")
-               exit(2)
-       }
-
        if !mp.isextra {
                // We allocated the stack for standard Ms. Don't replace the
                // stack bounds with estimated ones when we already initialized
```

It turns out, because of a few conditions, nothing fancy is required:

  1. pthread is nice enough to give us proper stack bounds from the fiber
  2. we are just "popping into go" to send some data in a channel and "pop back out"
  3. we aren't jumping to/from other threads and then calling back into go from a different thread (the stack is coherent)
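Condition 2 is part of why nothing fancy is required: the Go side of the callback stays tiny. The minimal pure-Go sketch below only illustrates that shape. `ubWrite` and `drain` are hypothetical stand-ins, and the cgo plumbing of the real `go_ub_write` (which presumably receives a C pointer and a length) is omitted so the example runs without a C toolchain. The callback copies the bytes, because the C caller may reuse its buffer once the call returns, pushes them onto a channel, and returns immediately.

```go
package main

import (
	"fmt"
	"strings"
)

// ubWrite is a hypothetical stand-in for the exported go_ub_write callback:
// it copies the bytes (the C caller may reuse its buffer after we return),
// sends them on a channel, and returns the number of bytes accepted.
func ubWrite(out chan<- []byte, p []byte) int {
	buf := make([]byte, len(p))
	copy(buf, p)
	out <- buf
	return len(p)
}

// drain collects everything sent on out until it is closed; in a real server
// this goroutine would write to the HTTP response instead.
func drain(out <-chan []byte, done chan<- string) {
	var sb strings.Builder
	for chunk := range out {
		sb.Write(chunk)
	}
	done <- sb.String()
}

func main() {
	out := make(chan []byte, 16) // buffered so the callback rarely blocks
	done := make(chan string)
	go drain(out, done)

	scratch := []byte("Starting Fiber\n")
	n := ubWrite(out, scratch)
	copy(scratch, "XXXXXXXX") // caller reuses its buffer; the sent copy is unaffected
	close(out)

	fmt.Printf("wrote %d bytes: %q\n", n, <-done)
}
```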

If we are OK with maintaining a custom version of Go forever, then this is likely the best solution, but I highly doubt it would be accepted into Go. Note that this will probably cause a very ugly crash if output is sent from a thread created by the parallel extension, because condition (3) above would be violated. This can probably be mitigated by marshaling the output in C to the "main" thread whenever the current thread isn't the "main" thread. This needs some further testing.

Before I go further with this, are we OK with a custom Go patch for the foreseeable future, @dunglas? I will create a PR to Go arguing for this patch, but I suspect it won't be accepted.

If we are, this is what I propose:

A. test for (3) above and verify whether any further work is required
B. create a PR to apply the patch (might be better to just maintain a fork of Go?)
C. create a separate PR to apply any fixes/optimizations from (A)

dunglas pushed a commit that referenced this issue Jul 25, 2024
  This is the bare minimum required to make fibers work within the go
  runtime.
4 participants