Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crashes in 1.1.1 #158

Closed
nuclearace opened this issue Jan 20, 2016 · 24 comments
Closed

Crashes in 1.1.1 #158

nuclearace opened this issue Jan 20, 2016 · 24 comments

Comments

@nuclearace
Copy link
Contributor

In the latest version I've noticed some crashes after the delegate is released. It doesn't happen all the time, but it happens pretty often. Not sure if I'm doing something crazy.

Test project: https://github.com/nuclearace/WebSocketCrash

@nuclearace
Copy link
Contributor Author

I believe you need to add

deinit {
    outputStream?.close()
    inputStream?.close()
}

@nuclearace
Copy link
Contributor Author

Closing the streams seems to work most of the time, but there's still the occasional crash

@nuclearace
Copy link
Contributor Author

Actually it seems like if there's a lot of activity going on with the socket when it's being deinit it's more likely to crash

@mattlilek
Copy link

I just noticed this when upgrading my project to 1.1.1. If you run with zombies enabled, you get:
*** -[Starscream.WebSocket respondsToSelector:]: message sent to deallocated instance 0x1ff86bdc0

I'm not sure about the close() calls you mention @nuclearace, but I think the delegate for each of the streams need to be cleared. NSStream's delegate is unowned(unsafe) not weak so they're left dangling, just like in the Obj-C days. At least that consistently fixed the crashes for my project.

@nuclearace
Copy link
Contributor Author

@mattlilek even setting the delegates to nil seems to cause a crash occasionally.
of2vdtm

daltoniam added a commit that referenced this issue Jan 21, 2016
@daltoniam
Copy link
Owner

yeah this is a weird one (and I didn't see it when I ran my Autobahn test project, really strange). I ran the profiler and it looks like it is still getting delegate messages from the CFStreams which obviously is a problem if the object has been deallocated. Let me know if the fixes works and I will do a new version.

@nuclearace
Copy link
Contributor Author

Still seeing some crashes

daltoniam added a commit that referenced this issue Jan 21, 2016
@daltoniam
Copy link
Owner

hmmm, ok I checked in another possible fix. I am not able to reproduce the issue with the same node server and example app, let me know if that resolve it.

@nuclearace
Copy link
Contributor Author

That fixes most of the crashes, but I still get that occasional crash that looks like this

@nuclearace
Copy link
Contributor Author

It's much less common than the other ones though, and certainly a race condition

@daltoniam
Copy link
Owner

ok good to know. We will probably have to add some protection before scheduling the to queue if the WebSocket has been deallocated. I will try to do that tomorrow if time permits.

@daltoniam
Copy link
Owner

I am trying to reproduce that last crash with the test app + node server, is there anything I can do to modify the example to trigger the crash more frequently? So far I haven't been able to reproduce it after multiple tries.

@nuclearace
Copy link
Contributor Author

Let me see if there's a way to boost the probability of it happening.

@nuclearace
Copy link
Contributor Author

@daltoniam I updated the test project. It's not the exact same crash that gets triggered, but it's the same type of crash--trying to form a closure while the object is in the process of being released.

@nuclearace
Copy link
Contributor Author

One way to fix that I've come up with is to have some private property that tracks if its being released and set that to true in deint and then check if it's true before places where dispatch_async is called. But I'm still getting a weird crash that happens once in a while.
el3o43q

@nuclearace
Copy link
Contributor Author

Ack, on second thought, I keep finding places where it wants to crash because of trying to retain the websocket while it's being released

@nuclearace
Copy link
Contributor Author

But adding that released property has definitely made it seem less likely to crash.

@daltoniam
Copy link
Owner

@nuclearace ok I think I found a way around the issue, just not sure I love it though. If you add a mutex protected boolean before all of the dispatch_async calls, the issue will be resolved (NSLock + bool property).

I'm not sure I love that solution though and I'm reconsidering if switching over to the dispatch queue based code is really worth it at this point. I might just switch over to a single pthread that does that same thing had has its own run loop since I don't think we had these issues with the older code.

@nuclearace
Copy link
Contributor Author

Yeah, I didn't like my pseudo-solution and it made me wonder if this threading approach was really worth it, since it seems to be fairly prone to bugginess.

@daltoniam
Copy link
Owner

Ok, so I did some more research and testing on this. I implemented the pthread and runloop solution, but it runs into the same problem. This is kinda of encouraging though, as it confirms the race condition will happen regardless of our setup. The big issue is because the WebSocket object is being deallocated from the main thread (which makes sense), but it maintains its own queue of background work inside which can cause the race condition. I don't love the idea of adding a mutex lock but it is the only way to ensure that if the object is randomly deallocated while in the middle of work it won't crash. I don't necessarily want to switch back to the old system as it "abusing" GCD by blocking the global queue it is on, hence the issue when a lot of WebSockets are created it keeps all the background work from being able to execute.

I know this restates a lot of the obvious but I wanted to document all of this incase questions come up later about it. I think the "safest" solution at this point is to add the mutex protection. Ideally in most cases it won't be needed as the WebSockets will be properly disconnected before being deallocated, but I want to protect against so Starscream isn't blamed or causing a crash in the case when a WebSocket is deallocated before disconnect.

Any questions or input, please add 😄

@GuyKahlon
Copy link

+1

@daltoniam
Copy link
Owner

I just checked in the fix discussed. Let me know if that seems to resolve it and I will do another release.

@nuclearace
Copy link
Contributor Author

Seems to have fixed things

@daltoniam
Copy link
Owner

1.1.2 released!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants