🐛 wrangler dev error: Received signal #11: Segmentation fault: 11 #1422
Comments
Also seeing this (M1), but I don't use alarms at all, so I don't get any of the additional logs you provided. Maybe it's just something with Durable Objects in general? I'm also using Hibernatable Websockets if that helps narrow it down. I thought I was going crazy 😅 |
We are also using the socket hibernation API. |
In case anybody else needs it, here is the script I wrote to wrap wrangler and restart it after a segfault.
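For readers unfamiliar with the API being discussed, here is a minimal sketch of a Durable Object using the WebSocket Hibernation API. This is illustrative only, not any commenter's actual code, and it assumes @cloudflare/workers-types for the DurableObjectState and WebSocket typings.

```ts
// Illustrative sketch only — not taken from this thread's projects.
export class ExampleRoom {
  constructor(private state: DurableObjectState) {}

  async fetch(request: Request): Promise<Response> {
    if (request.headers.get('Upgrade') !== 'websocket') {
      return new Response('Expected a WebSocket upgrade', { status: 426 })
    }
    const [client, server] = Object.values(new WebSocketPair())
    // Hand the server end to the runtime so the DO can hibernate between messages.
    this.state.acceptWebSocket(server)
    return new Response(null, { status: 101, webSocket: client })
  }

  // Invoked when a message arrives, waking the DO from hibernation if needed.
  async webSocketMessage(ws: WebSocket, message: string | ArrayBuffer) {
    ws.send(`echo: ${message}`)
  }

  // Invoked when the peer disconnects; after this the DO may be evicted.
  async webSocketClose(ws: WebSocket, code: number, reason: string, wasClean: boolean) {
    ws.close(code, 'closing')
  }
}
```

The eviction after the socket closes is the point where several commenters below report the segfault occurring.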
import { ChildProcessWithoutNullStreams, spawn } from 'child_process'
import stripAnsi from 'strip-ansi'

class WranglerMonitor {
  private process: ChildProcessWithoutNullStreams | null = null

  public start(): void {
    this.stop() // Ensure any existing process is stopped
    console.log(`Starting wrangler...`)
    this.process = spawn('wrangler', ['dev', '--env', 'dev'], {
      env: {
        NODE_ENV: 'development',
        ...process.env,
      },
    })
    this.process.stdout.on('data', (data: Buffer) => {
      this.handleOutput(stripAnsi(data.toString().replace('\r', '').trim()))
    })
    this.process.stderr.on('data', (data: Buffer) => {
      this.handleOutput(stripAnsi(data.toString().replace('\r', '').trim()), true)
    })
  }

  private handleOutput(output: string, err = false): void {
    if (!output) return
    if (output.includes('Segmentation fault')) {
      console.error('Segfault detected. Restarting Wrangler...')
      this.restart()
    } else if (!err) {
      console.log(output.replace('[mf:inf]', ''))
    }
  }

  private restart(): void {
    console.log('Restarting wrangler...')
    this.stop()
    setTimeout(() => this.start(), 100) // Restart after a short delay
  }

  private stop(): void {
    if (this.process) {
      this.process.kill()
      this.process = null
    }
  }
}
new WranglerMonitor().start()

Run it with … There's probably a bash one-liner that can do the same thing, but I am not bashfully gifted. |
This looks like the same bug as #1386. |
Unfortunately still seeing this on |
I've spent way too much time trying to get a reproduction that is consistent; there isn't one. Here's a log from one of the crashes, hopefully I left enough unredacted: … This appears to only affect M1 chips, and one of the contributing factors is … However, we have other projects that do not segfault. This leads us to believe it could be because of …
Here's the code of the Durable Object I'm primarily testing with:
This has become such a frustrating issue for my team; we've sunk way too many hours into diagnosing it and are desperately looking for a solution. We hope this issue gets the attention it deserves. |
Hey @thecatanon! 👋 Thanks for trying to put together a minimal reproduction. Unfortunately, the stacks from the release builds aren't very useful for debugging these kinds of issues. I've compiled an M1 debug build with ASAN enabled from 8c47f13 for you to try: https://drive.google.com/file/d/1OVMhhzU9HltFLyde9T1sg74UNaNn9w1S/view?usp=sharing. To use this build, unzip the file, then run |
Thanks @mrbbot! Unfortunately I have just a few (unhelpful) findings:
Maybe that bus error is another manifestation of the same issue? It will actually load the HTML, but crashes immediately after. Could it be a caching thing? EDIT: Just created my own ASAN build and received the same bus error. |
I seem to be able to reliably trigger the error when using the Hibernating Sockets API: when a socket disconnects and the Durable Object is then evicted, the error occurs. I'm only assuming it's the Durable Object eviction, as it happens 10 seconds after the last WebSocket message, which I believe is the Durable Object eviction timer. |
Can you provide a repro? When do you see the segfault? Also are you on M1 mac as well? |
Yeah that might be a bit tricky given I'm using |
@thecatanon I thought it might be which is why I tried closing out that bug, but I suspect it's unrelated. I don't see how that would have caused a segfault. |
@MellowYarker Yeah, still seeing it in the latest release. Will attempt creating an ASAN build again (got a bus error whenever HTML responses were delivered last time). |
@thecatanon are you referring to the latest release of Workerd or Miniflare (which I think is still pointing to a prev version of Workerd)? I'm not familiar with how Miniflare works but it seems like earlier in this thread you were manually changing the dep to point to a specific version, so I figure you may have already changed it to point to the latest Workerd? |
@MellowYarker Yeah, using the |
Additionally, I might be seeing this in production now (saw someone get a disconnected screen, went to reproduce with Real-Time Logs enabled, saw it happen, and found …). Could be unrelated, but thought it was worth adding. "Cancelled" CF-Ray: … (Hope that helps; happy to share any other details privately on Discord, I'm in the CF Developers server as …) |
For what it's worth, we're seeing this a lot in reflect.net too. I have not been able to get a reproduction but will keep trying. |
I started to see this only after switching to Hibernatable WebSockets. Pretty much no other changes, so this seems very much related. FWIW we're using both regular and hibernatable sockets in the same DO:
Every subsequent WebSocket connection is rejected with
and for GET requests I see
I have to restart |
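The commenter's code isn't shown in the thread, but a hedged sketch of what mixing regular and hibernatable sockets in one Durable Object might look like is below. The route split on /hibernate, the class name, and the echo logic are assumptions for illustration, not the project's actual design.

```ts
// Hypothetical sketch — not codefrau's actual code, which isn't shown above.
export class MixedSocketsExample {
  constructor(private state: DurableObjectState) {}

  async fetch(request: Request): Promise<Response> {
    const [client, server] = Object.values(new WebSocketPair())
    if (new URL(request.url).pathname === '/hibernate') {
      // Hibernatable path: the runtime owns the socket and may evict the DO between events.
      this.state.acceptWebSocket(server)
    } else {
      // Regular path: the DO stays resident and handles events via listeners.
      server.accept()
      server.addEventListener('message', (event) => server.send(`echo: ${event.data}`))
    }
    return new Response(null, { status: 101, webSocket: client })
  }

  // Only sockets accepted via acceptWebSocket() are delivered to this handler.
  async webSocketMessage(ws: WebSocket, message: string | ArrayBuffer) {
    ws.send(`echo: ${message}`)
  }
}
```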
@codefrau do you have a repro we could take a look at? Are you also on mac? |
Just for tracking purposes (in case Cloudflare can't reproduce this on their end), still seeing this on … Additionally, when creating ASAN builds (could absolutely be doing this wrong; I'm no C++ dev), when HTML is delivered I receive |
The repo is not public, unfortunately. But yes, I'm on a Mac (2021 MacBook Pro with M1 Max, running Sonoma 14.2.1), wrangler 3.28.1, node v20.11.0. |
So this is curious: when it happens, the segfault always occurs about 5 minutes after the last regular WebSocket disconnected and we sent the last message through the single remaining hibernatable socket. I looked through more logs than this, but it's always ~5 minutes, never less than 4, never more than 6. There is no 5-minute timeout in our code, and there was no other request. We only have an auto-response; the client pings every 30s.
|
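For context, the "auto-response" mentioned here is presumably configured with the hibernation API's setWebSocketAutoResponse. A minimal sketch follows; the 'ping'/'pong' strings and class name are placeholders, not the project's actual protocol.

```ts
// Sketch of a hibernation auto-response — assumed configuration, not from the thread.
export class AutoResponseExample {
  constructor(private state: DurableObjectState) {
    // Answer "ping" frames with "pong" without waking the DO from hibernation.
    this.state.setWebSocketAutoResponse(new WebSocketRequestResponsePair('ping', 'pong'))
  }

  async fetch(request: Request): Promise<Response> {
    const [client, server] = Object.values(new WebSocketPair())
    this.state.acceptWebSocket(server)
    return new Response(null, { status: 101, webSocket: client })
  }
}
```

Because the auto-response handles the client's pings without waking the object, the only activity on a hibernating object can be these pings, which is consistent with the segfault showing up only after a delay.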
Upvoted, +1! This one is particularly annoying, as there doesn't seem to be a workaround, and it makes it impossible for us to unit test waking up from hibernation. I've been able to make a minimal, reliable replication of this crash: It uses a Durable Object with the WebSocket Hibernation API, and pretty much nothing else. I'm on an M3 Max, on Sonoma 14.4.1.
|
+1 on this issue; it makes development with wrangler on hibernatable sockets extremely irritating and difficult. |
@nvie thanks for the repro! I see a segfault on linux too (very surprising tbh). It looks like it's segfaulting in the |
Let me know if this is still a problem after the next release goes out. It looks like we aren't putting one out this week, so I suspect it should go out next week. Edit: Looks like a release has gone out now. Just need to wait for Miniflare to catch up. |
@nvie I'm running your repro with |
Yes! This is amazing — thanks so much, @MellowYarker! 🙌 |
Closing for now as this seems to have been resolved. Feel free to reopen if still experiencing a segfault after |
Hello!
Recently we've been seeing this issue with one of our worker scripts (we have three, the other two seem fine) where after some time being used in dev mode it will segfault and become unresponsive. This seems to happen anywhere from 10 seconds to 10 minutes after booting. I have not been able to find a way to reproduce it on demand, but it will happen every time, at some point.
This is on MacOS, using M1 chips.
The full error that is printed when this happens is (--log-level debug):

There are always other errors logged out above, but they seem to change every time and are maybe unrelated, since they seem to happen all the time regardless of how long it takes to hit the segfault. Here's an example (again --log-level debug):

Let me know if any other info I can provide would be helpful.
For now I'm going to try to wrap the wrangler dev process in a retry loop.