[ENG] Help! websocket write: broken pipe, and ws.cleanupConnection() Unlock() does not work #291
Comments
Maybe I found the problem. The problem is using RLock() on a code path that may also need Lock():
We know that when several messages arrive at the same time, several RLock() calls are taken.
cleanupConnection() then calls Lock(), which gets blocked behind those outstanding RLock() holders, producing a deadlock.
So we need to move the RUnlock() so that the read lock is no longer held at the point where Lock() is attempted.
I'm not sure whether this is correct, please help me confirm, thanks.
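To make the idea concrete, here is a minimal, self-contained sketch of the change. The Server/Write()/outQueue shapes below are assumed from this issue and are not copied from the actual ws/websocket.go:

```go
// Minimal sketch of the proposed change -- NOT the actual ws/websocket.go code.
// Look up the connection under the read lock, but release the read lock BEFORE the
// potentially blocking send to outQueue, so cleanupConnection() can still acquire Lock().
package main

import (
	"fmt"
	"sync"
)

type webSocket struct {
	outQueue chan []byte
}

type server struct {
	connMutex   sync.RWMutex
	connections map[string]*webSocket
}

func (s *server) Write(webSocketID string, data []byte) error {
	s.connMutex.RLock()
	ws, ok := s.connections[webSocketID]
	s.connMutex.RUnlock() // released here, instead of being deferred until after the send
	if !ok {
		return fmt.Errorf("no connection with id %s", webSocketID)
	}
	ws.outQueue <- data // may still block, but no longer while holding the read lock
	return nil
}

func main() {
	s := &server{connections: map[string]*webSocket{
		"AAAAAA": {outQueue: make(chan []byte, 1)},
	}}
	if err := s.Write("AAAAAA", []byte("heartbeat")); err != nil {
		fmt.Println(err)
	}
	fmt.Println("message queued without holding the read lock")
}
```

One caveat with releasing the lock early: cleanupConnection() may then close ws.outQueue concurrently, so the send would also need to be guarded (for example by selecting on ws.closeC); the sketch only shows the locking change itself.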
This would only deadlock if the queue is blocked and/or not buffered.
Hi @AndrewYEEE, I also had the same problem. In my environment it seems to be caused by the client sending a burst of messages to the server at once and then being cut off at the network layer (for example a VPN disconnect or a very slow network/4G link). After the server hits the write timeout, the call inside the write path stays blocked. I just changed that part, and it worked as a temporary solution.
@dwibudut Thank you very much!
I will try the solution you gave, thank you!
Hi @dwibudut @lorenzodonini,
[Method 2] In addition, we are working on another solution. The simplest fix is to give the channel a large enough buffer (admittedly a poor solution): with a large enough channel, queued messages can be pushed into it immediately instead of blocking.
If this method is adopted, the first modification (to Write()) is not needed. Due to time constraints, the second method has not been tested for very long, but I would like to hear everyone's thoughts first on whether it is correct. Thanks.
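For reference, a minimal sketch of what Method 2 amounts to; the type, field, and buffer size below are illustrative only, not the library's real connection setup:

```go
// Minimal sketch of Method 2 -- illustrative only, not the library's real API.
// A sufficiently large buffer lets a burst of outgoing messages be queued immediately,
// so Write() never blocks on the channel send while holding the read lock.
package main

import "fmt"

type webSocket struct {
	outQueue chan []byte
}

const outQueueBufferSize = 256 // hypothetical value standing in for "large enough"

func main() {
	ws := &webSocket{outQueue: make(chan []byte, outQueueBufferSize)}

	// A burst of 100 heartbeats is absorbed without blocking, even if writePump
	// has already stopped draining the queue.
	for i := 0; i < 100; i++ {
		ws.outQueue <- []byte("heartbeat")
	}
	fmt.Println("queued messages:", len(ws.outQueue))
}
```

The obvious weakness is that the buffer only hides the problem: if the peer stays broken for longer than the buffer can absorb, Write() blocks again, which is why I call it a poor solution above.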
OCPP version:
[x] 1.6
[ ] 2.0.1
I'm submitting a ...
[x] bug report
[ ] feature request
Current behavior:
ChargingPoint sent a large number of Heartbeats at the same time due to an exception,
and we encountered the following error:
time="2024-08-31T06:45:32+08:00" level=info msg=CCCCCC____Heartbeat
time="2024-08-31T06:45:32+08:00" level=info msg=BBBBBB____Heartbeat
time="2024-08-31T06:45:33+08:00" level=info msg=AAAAAA____Heartbeat
time="2024-08-31T06:45:33+08:00" level=info msg=AAAAAA____Heartbeat
time="2024-08-31T06:45:33+08:00" level=info msg="write failed for AAAAAA: %!w(*net.OpError=&{write tcp 0xc0007a24b0 0xc0007a24e0 0xc000682a60})" logger=websocket
time="2024-08-31T06:45:33+08:00" level=error msg="write failed for AAAAAA: write tcp 172.18.0.xx:9001->172.18.0.xx:57006: write: broken pipe" logger=websocket
time="2024-08-31T06:45:33+08:00" level=info msg=AAAAAA____Heartbeat
time="2024-08-31T06:45:33+08:00" level=info msg=EEEEEE____Heartbeat
But the next step is not printed:
"closed connection to AAAAAA"
According to writePump() in ws/websocket.go, when the "write failed" error occurs, the next step should be to execute server.cleanupConnection(ws), close the connection, and print "closed connection to AAAAAA", but that never happens.
We suspect the program is blocked inside server.cleanupConnection(ws) and server.connMutex is never unlocked.
As a result, all subsequent messages are blocked waiting for server.connMutex, and the server can no longer respond to the ChargingPoint.
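To illustrate what we suspect is happening, here is a standalone sketch of the blocking pattern; the names are borrowed from this description and it is not the actual ws/websocket.go code:

```go
// Standalone sketch of the suspected blocking pattern -- not the actual ws/websocket.go code.
// A sender holds the read lock while pushing into a full, no-longer-drained queue,
// so the cleanup path can never obtain the write lock.
package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	var connMutex sync.RWMutex
	outQueue := make(chan []byte, 1) // small buffer, standing in for ws.outQueue
	outQueue <- []byte("pending")    // queue already full: writePump stopped draining after the write error

	// Simulates Write(): holds the read lock while sending to the full queue.
	go func() {
		connMutex.RLock()
		defer connMutex.RUnlock()
		outQueue <- []byte("heartbeat") // blocks forever: nobody drains outQueue any more
	}()

	// Simulates cleanupConnection(): needs the write lock to remove the connection.
	go func() {
		time.Sleep(100 * time.Millisecond)
		connMutex.Lock() // blocks: the read lock above is never released
		defer connMutex.Unlock()
		close(outQueue)
		fmt.Println("closed connection") // never printed
	}()

	time.Sleep(time.Second)
	fmt.Println("cleanup is still blocked, no \"closed connection\" was logged")
}
```

This matches what we see in the logs: the write error is printed, but "closed connection to AAAAAA" never appears.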
Can anyone give an answer?
Is it possible that a race condition occurs in cleanupConnection(), or that close(ws.outQueue) and close(ws.closeC) in server.cleanupConnection(ws) are blocked?
Please help me! thank you!