Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

window post scheduler notifs channel closed #7653

Closed
12 of 18 tasks
Araquiel opened this issue Nov 22, 2021 · 6 comments
Closed
12 of 18 tasks

window post scheduler notifs channel closed #7653

Araquiel opened this issue Nov 22, 2021 · 6 comments

Comments

@Araquiel
Copy link

Checklist

  • This is not a security-related bug/issue. If it is, please follow please follow the security policy.
  • This is not a question or a support request. If you have any lotus related questions, please ask in the lotus forum.
  • This is not a new feature request. If it is, please file a feature request instead.
  • This is not an enhancement request. If it is, please file a improvement suggestion instead.
  • I have searched on the issue tracker and the lotus forum, and there is no existing related issue or discussion.
  • I am running the Latest release, or the most recent RC(release canadiate) for the upcoming release or the dev branch(master), or have an issue updating to any of these.
  • I did not make any code changes to lotus.

Lotus component

  • lotus daemon - chain sync
  • lotus miner - mining and block production
  • lotus miner/worker - sealing
  • lotus miner - proving(WindowPoSt)
  • lotus miner/market - storage deal
  • lotus miner/market - retrieval deal
  • lotus miner/market - data transfer
  • lotus client
  • lotus JSON-RPC API
  • lotus message management (mpool)
  • Other

Lotus Version

lotus version 1.13.0+mainnet+git.7a55e8e89.dirty
lotus-miner version 1.13.0+mainnet+git.7a55e8e89.dirty

Describe the Bug

daemon and miner work on one box.
conect by localhost (127.0.0.1)

  1. start lotus-miner
  2. wait few hours
  3. get error message "window post scheduler notifs channel closed"

expected
lotus-miner work without connection problem or resart connection

connection closed with no visible reason. box is not overload. chain store on SSD.

Logging Information

2021-11-21T09:47:23.660+0500    ERROR   rpc     go-jsonrpc@v0.1.4-0.20210217175800-45ea43ac2bec/websocket.go:667        Connection timeout      {"remote": "127.0.0.1:1234"}
2021-11-21T09:47:23.661+0500    ERROR   rpc     go-jsonrpc@v0.1.4-0.20210217175800-45ea43ac2bec/websocket.go:648        sendReqest failed (Handle me): write tcp 127.0.0.1:53256->127.0.0.1:1234: use of closed network connection
2021-11-21T09:47:23.661+0500    ERROR   rpc     go-jsonrpc@v0.1.4-0.20210217175800-45ea43ac2bec/websocket.go:97 setting read deadlineset tcp 127.0.0.1:53256: use of closed network connection
2021-11-21T09:47:23.661+0500    ERROR   rpc     go-jsonrpc@v0.1.4-0.20210217175800-45ea43ac2bec/websocket.go:648        sendReqest failed (Handle me): write tcp 127.0.0.1:53256->127.0.0.1:1234: use of closed network connection
2021-11-21T09:47:23.661+0500    ERROR   rpc     go-jsonrpc@v0.1.4-0.20210217175800-45ea43ac2bec/websocket.go:648        sendReqest failed (Handle me): write tcp 127.0.0.1:53256->127.0.0.1:1234: use of closed network connection
2021-11-21T09:47:23.661+0500    WARN    events  events/observer.go:60   listenHeadChanges quit
2021-11-21T09:47:23.661+0500    ERROR   miner   miner/miner.go:233      failed to get best mining candidate: handler: websocket connection closed
2021-11-21T09:47:23.661+0500    WARN    rpc     go-jsonrpc@v0.1.4-0.20210217175800-45ea43ac2bec/websocket.go:290        failed to send request  {"method": "xrpc.cancel", "id": 20, "error": "write tcp 127.0.0.1:53256->127.0.0.1:1234: use of closed network connection"}
2021-11-21T09:47:23.661+0500    WARN    storageminer    storage/wdpost_sched.go:123     window post scheduler notifs channel closed
2021-11-21T09:47:23.661+0500    WARN    sectors storage-sealing/terminate_batch.go:104  TerminateBatcher processBatch error     {"error": "getting proving deadline info failed: handler: websocket connection closed", "errorVerbose": "getting proving deadline info failed:\n    github.com/filecoin-project/lotus/extern/storage-sealing.(*TerminateBatcher).processBatch\n        /home/user/lotus/extern/storage-sealing/terminate_batch.go:112\n  - handler: websocket connection closed"}
2021-11-21T09:47:23.662+0500    ERROR   rpc     go-jsonrpc@v0.1.4-0.20210217175800-45ea43ac2bec/websocket.go:648        sendReqest failed (Handle me): write tcp 127.0.0.1:53256->127.0.0.1:1234: use of closed network connection
2021-11-21T09:47:23.682+0500    WARN    nat     go-libp2p-nat@v0.0.6/nat.go:186 failed to establish port mapping: goupnp: error performing SOAP HTTP request: Post "http://192.168.0.1:52869/upnp/control/WANIPConn1": EOF
2021-11-21T09:47:24.026+0500    ERROR   rpc     go-jsonrpc@v0.1.4-0.20210217175800-45ea43ac2bec/handler.go:326  write tcp4 127.0.0.1:2345->127.0.0.1:52842: write: broken pipe
2021-11-21T09:47:24.026+0500    ERROR   rpc     go-jsonrpc@v0.1.4-0.20210217175800-45ea43ac2bec/websocket.go:129        handle me:write tcp4 127.0.0.1:2345->127.0.0.1:52842: write: broken pipe
2021-11-21T09:47:24.200+0500    ERROR   rpc     go-jsonrpc@v0.1.4-0.20210217175800-45ea43ac2bec/websocket.go:122        handle me:write tcp4 127.0.0.1:2345->127.0.0.1:42590: write: broken pipe

Repo Steps

  1. Run lotus daemon, run lotus-miner
  2. wait a few hours/days
  3. See error WARN storageminer storage/wdpost_sched.go:123 window post scheduler notifs channel closed
@Araquiel
Copy link
Author

happened again.
error from miner:

2021-11-22T13:02:09.045+0500	ERROR	rpc	go-jsonrpc@v0.1.4-0.20210217175800-45ea43ac2bec/handler.go:326	writev tcp4 127.0.0.1:2345->127.0.0.1:63994: writev: connection reset by peer
2021-11-22T13:02:09.045+0500	ERROR	rpc	go-jsonrpc@v0.1.4-0.20210217175800-45ea43ac2bec/websocket.go:129	handle me:writev tcp4 127.0.0.1:2345->127.0.0.1:63994: writev: connection reset by peer
<....>
2021-11-22T13:09:26.008+0500    ERROR   rpc     go-jsonrpc@v0.1.4-0.20210217175800-45ea43ac2bec/websocket.go:667        Connection timeout      {"remote": "127.0.0.1:1234"}
2021-11-22T13:09:26.009+0500    ERROR   rpc     go-jsonrpc@v0.1.4-0.20210217175800-45ea43ac2bec/websocket.go:498        sending ping message: write tcp 127.0.0.1:38122->127.0.0.1:1234: use of closed network connection
2021-11-22T13:09:26.009+0500    ERROR   rpc     go-jsonrpc@v0.1.4-0.20210217175800-45ea43ac2bec/websocket.go:97 setting read deadlineset tcp 127.0.0.1:38122: use of closed network connection
2021-11-22T13:09:26.010+0500    ERROR   rpc     go-jsonrpc@v0.1.4-0.20210217175800-45ea43ac2bec/websocket.go:648        sendReqest failed (Handle me): write tcp 127.0.0.1:38122->127.0.0.1:1234: use of closed network connection
2021-11-22T13:09:26.010+0500    ERROR   rpc     go-jsonrpc@v0.1.4-0.20210217175800-45ea43ac2bec/websocket.go:648        sendReqest failed (Handle me): write tcp 127.0.0.1:38122->127.0.0.1:1234: use of closed network connection
2021-11-22T13:09:26.010+0500    ERROR   rpc     go-jsonrpc@v0.1.4-0.20210217175800-45ea43ac2bec/websocket.go:648        sendReqest failed (Handle me): write tcp 127.0.0.1:38122->127.0.0.1:1234: use of closed network connection
2021-11-22T13:09:26.010+0500    ERROR   rpc     go-jsonrpc@v0.1.4-0.20210217175800-45ea43ac2bec/websocket.go:97 setting read deadlineset tcp 127.0.0.1:38122: use of closed network connection
2021-11-22T13:09:26.010+0500    ERROR   rpc     go-jsonrpc@v0.1.4-0.20210217175800-45ea43ac2bec/websocket.go:648        sendReqest failed (Handle me): write tcp 127.0.0.1:38122->127.0.0.1:1234: use of closed network connection
2021-11-22T13:09:26.011+0500    ERROR   storageminer    storage/wdpost_sched.go:180     handling head updates in window post sched: handler: websocket connection closed
2021-11-22T13:09:26.011+0500    WARN    storageminer    storage/wdpost_sched.go:123     window post scheduler notifs channel closed
2021-11-22T13:09:26.011+0500    WARN    events  events/observer.go:60   listenHeadChanges quit

@Shekelme
Copy link

So is your lotus-miner process crashing at this poit? Or what's going on there?

@Araquiel
Copy link
Author

So is your lotus-miner process crashing at this poit? Or what's going on there?

sealing process work, but lotus-miner can't start prooving. I think mining not work too. after restart works fine for the while
for the now just monitor "notif channel closed" on log and restart if need.
advice from the slack - just split market and miner subsystems for the different boxes

@TippyFlitsUK TippyFlitsUK added area/markets/provider kind/enhancement Kind: Enhancement and removed kind/bug Kind: Bug labels Mar 29, 2022
@rjan90
Copy link
Contributor

rjan90 commented Apr 21, 2022

Hey! Just wanted to updated everybody in this thread that this issue will be tracked in #8362 going forward. It´s an issue that is on our radar, and one that we really want to find find a fix for, but it´s also one that we would certainly need additional help from everybody.

  1. Ideally we are looking for a very detailed, and an easy way to get a good repro (reliable/easy/fast) on this issue - so if anybody here has some insights to how we can easily repro it, post the steps in wdPoSt scheduler notifs channel closes when lotus-miner is under heavy load #8362.
  2. Goroutine dumps from both sides (lotus daemon & lotus-miner) could be helpful, but ideally we are looking for 1.

@rjan90 rjan90 closed this as completed Apr 21, 2022
@daveyhook
Copy link

I have the same problem at version 1.23.3, how to solve it?

@Araquiel
Copy link
Author

Araquiel commented Nov 7, 2023

I have the same problem at version 1.23.3, how to solve it?

in my case the cause was connection problems. when connection have problems - we have "notifs channel closed ERROR". and need to restart miner after it.
my solution was to replace the NIC.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants
@Araquiel @TippyFlitsUK @daveyhook @rjan90 @Shekelme and others