Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Grandpa notifications dropping to 0 #7983

Closed
tomaka opened this issue Jan 26, 2021 · 1 comment · Fixed by #7985
Closed

Grandpa notifications dropping to 0 #7983

tomaka opened this issue Jan 26, 2021 · 1 comment · Fixed by #7985
Labels
I3-bug The node fails to follow expected behavior.

Comments

@tomaka
Copy link
Contributor

tomaka commented Jan 26, 2021

We have a node on the master branch of Polkadot. It is connected only to nodes on Polkadot 0.8.27.
It seems that, occasionally, and without any apparent cause, the Grandpa notifications seem to stop being received.

Some information I've gathered from debugging:

  • The block announces are still work fine. Most of the code paths of Grandpa messages are shared with block announces.
  • We still occasionally attempt to send out Grandpa messages. It's just the receiving that seems to stop.
  • The authority-discovery still runs and receives Event::Dht events. In other words, it's most likely not a problem with distributing events to the rest of Substrate.
  • The problem has started to happen after Rework priority groups, take 2 #7700
  • The logs don't show anything suspicious happening when the problem happens. The notifications simply stop being received. There's no substream closing nor disconnect.
  • The problem always seems to happen roughly at the same time as a block that takes a long time to import. However, there are a lot of blocks that take a long time to import and that do not necessarily cause the problem to happen.
  • Restarting a node connected to the buggy node doesn't fix the problem.
@tomaka tomaka added the I3-bug The node fails to follow expected behavior. label Jan 26, 2021
@tomaka
Copy link
Contributor Author

tomaka commented Jan 26, 2021

We tried to restart 12D3KooWBBVAW4jbiaNF1B5orYUaeRwZVq2FwfrxCfx7spN6v6hn, which was connected to the faulty node:

Jan 26 10:21:30 canary-validator-ew4-0 kusama[5410]: 2021-01-26 10:21:30.891  DEBUG tokio-runtime-worker sub-libp2p: Libp2p => Disconnected(12D3KooWBBVAW4jbiaNF1B5orYUaeRwZVq2FwfrxCfx7spN6v6hn, SetId(0), ConnectionId(TaskId(99916))): Enabled.
Jan 26 10:21:30 canary-validator-ew4-0 kusama[5410]: 2021-01-26 10:21:30.891  DEBUG tokio-runtime-worker sub-libp2p: External API <= Closed(12D3KooWBBVAW4jbiaNF1B5orYUaeRwZVq2FwfrxCfx7spN6v6hn, SetId(0))
Jan 26 10:21:30 canary-validator-ew4-0 kusama[5410]: 2021-01-26 10:21:30.891  DEBUG tokio-runtime-worker sub-libp2p: PSM <= Dropped(12D3KooWBBVAW4jbiaNF1B5orYUaeRwZVq2FwfrxCfx7spN6v6hn, SetId(0))
Jan 26 10:21:30 canary-validator-ew4-0 kusama[5410]: 2021-01-26 10:21:30.897  DEBUG tokio-runtime-worker sub-libp2p: Libp2p => Disconnected(12D3KooWBBVAW4jbiaNF1B5orYUaeRwZVq2FwfrxCfx7spN6v6hn, SetId(1), ConnectionId(TaskId(99916))): Disabled.
Jan 26 10:21:30 canary-validator-ew4-0 kusama[5410]: 2021-01-26 10:21:30.897  DEBUG tokio-runtime-worker sub-libp2p: Libp2p => Disconnected(12D3KooWBBVAW4jbiaNF1B5orYUaeRwZVq2FwfrxCfx7spN6v6hn, SetId(2), ConnectionId(TaskId(99916))): Enabled.
Jan 26 10:21:30 canary-validator-ew4-0 kusama[5410]: 2021-01-26 10:21:30.898  DEBUG tokio-runtime-worker sub-libp2p: External API <= Closed(12D3KooWBBVAW4jbiaNF1B5orYUaeRwZVq2FwfrxCfx7spN6v6hn, SetId(2))
Jan 26 10:21:30 canary-validator-ew4-0 kusama[5410]: 2021-01-26 10:21:30.898  DEBUG tokio-runtime-worker sub-libp2p: PSM <= Dropped(12D3KooWBBVAW4jbiaNF1B5orYUaeRwZVq2FwfrxCfx7spN6v6hn, SetId(2))
Jan 26 10:21:30 canary-validator-ew4-0 kusama[5410]: 2021-01-26 10:21:30.898  TRACE tokio-runtime-worker sub-libp2p: Libp2p => Disconnected(PeerId("12D3KooWBBVAW4jbiaNF1B5orYUaeRwZVq2FwfrxCfx7spN6v6hn"), Some(IO(Custom { kind: Other, error: Closed })))
Jan 26 10:21:30 canary-validator-ew4-0 kusama[5410]: 2021-01-26 10:21:30.898   WARN tokio-runtime-worker sync: Reserved peer 12D3KooWBBVAW4jbiaNF1B5orYUaeRwZVq2FwfrxCfx7spN6v6hn disconnected
Jan 26 10:21:30 canary-validator-ew4-0 kusama[5410]: 2021-01-26 10:21:30.898  DEBUG tokio-runtime-worker sub-libp2p: PSM => Connect(12D3KooWBBVAW4jbiaNF1B5orYUaeRwZVq2FwfrxCfx7spN6v6hn, SetId(0)): Will start to connect at until Instant { tv_sec: 4236514, tv_nsec: 742269773 }

... (lots of attempt at reconnecting)

Jan 26 10:22:31 canary-validator-ew4-0 kusama[5410]: 2021-01-26 10:22:31.040  DEBUG tokio-runtime-worker sub-libp2p: Libp2p => Connected(12D3KooWBBVAW4jbiaNF1B5orYUaeRwZVq2FwfrxCfx7spN6v6hn, SetId(0), Dialer { address: "/ip4/35.204.65.51/tcp/30333" }): Connection was requested by PSM.
Jan 26 10:22:31 canary-validator-ew4-0 kusama[5410]: 2021-01-26 10:22:31.040  DEBUG tokio-runtime-worker sub-libp2p: Handler(PeerId("12D3KooWBBVAW4jbiaNF1B5orYUaeRwZVq2FwfrxCfx7spN6v6hn"), ConnectionId(TaskId(954900))) <= Open(SetId(0))
Jan 26 10:22:31 canary-validator-ew4-0 kusama[5410]: 2021-01-26 10:22:31.040  DEBUG tokio-runtime-worker sub-libp2p: Libp2p => Connected(12D3KooWBBVAW4jbiaNF1B5orYUaeRwZVq2FwfrxCfx7spN6v6hn, SetId(1), Dialer { address: "/ip4/35.204.65.51/tcp/30333" }, ConnectionId(TaskId(954900))): Not requested by PSM, disabling.
Jan 26 10:22:31 canary-validator-ew4-0 kusama[5410]: 2021-01-26 10:22:31.040  DEBUG tokio-runtime-worker sub-libp2p: Libp2p => Connected(12D3KooWBBVAW4jbiaNF1B5orYUaeRwZVq2FwfrxCfx7spN6v6hn, SetId(2), Dialer { address: "/ip4/35.204.65.51/tcp/30333" }, ConnectionId(TaskId(954900))): Not requested by PSM, disabling.
Jan 26 10:22:31 canary-validator-ew4-0 kusama[5410]: 2021-01-26 10:22:31.040  TRACE tokio-runtime-worker sub-libp2p: Libp2p => Connected(PeerId("12D3KooWBBVAW4jbiaNF1B5orYUaeRwZVq2FwfrxCfx7spN6v6hn"))
Jan 26 10:22:31 canary-validator-ew4-0 kusama[5410]: 2021-01-26 10:22:31.040  DEBUG tokio-runtime-worker sub-libp2p: Handler(PeerId("12D3KooWBBVAW4jbiaNF1B5orYUaeRwZVq2FwfrxCfx7spN6v6hn"), ConnectionId(TaskId(954900))]) => OpenDesiredByRemote(SetId(1))
Jan 26 10:22:31 canary-validator-ew4-0 kusama[5410]: 2021-01-26 10:22:31.040  DEBUG tokio-runtime-worker sub-libp2p: PSM <= Incoming(12D3KooWBBVAW4jbiaNF1B5orYUaeRwZVq2FwfrxCfx7spN6v6hn, IncomingIndex(6719)).
Jan 26 10:22:31 canary-validator-ew4-0 kusama[5410]: 2021-01-26 10:22:31.045  DEBUG tokio-runtime-worker sub-libp2p: PSM => Reject(IncomingIndex(6719), 12D3KooWBBVAW4jbiaNF1B5orYUaeRwZVq2FwfrxCfx7spN6v6hn, SetId(1)): Rejecting connections.
Jan 26 10:22:31 canary-validator-ew4-0 kusama[5410]: 2021-01-26 10:22:31.045  DEBUG tokio-runtime-worker sub-libp2p: Handler(PeerId("12D3KooWBBVAW4jbiaNF1B5orYUaeRwZVq2FwfrxCfx7spN6v6hn"), ConnectionId(TaskId(954900))) <= Close(SetId(1))
Jan 26 10:22:31 canary-validator-ew4-0 kusama[5410]: 2021-01-26 10:22:31.045  DEBUG tokio-runtime-worker sub-libp2p: Handler(PeerId("12D3KooWBBVAW4jbiaNF1B5orYUaeRwZVq2FwfrxCfx7spN6v6hn"), ConnectionId(TaskId(954900))]) => OpenDesiredByRemote(SetId(0))
Jan 26 10:22:31 canary-validator-ew4-0 kusama[5410]: 2021-01-26 10:22:31.045  DEBUG tokio-runtime-worker sub-libp2p: Handler(PeerId("12D3KooWBBVAW4jbiaNF1B5orYUaeRwZVq2FwfrxCfx7spN6v6hn"), ConnectionId(TaskId(954900))]) => OpenDesiredByRemote(SetId(2))
Jan 26 10:22:31 canary-validator-ew4-0 kusama[5410]: 2021-01-26 10:22:31.045  DEBUG tokio-runtime-worker sub-libp2p: PSM <= Incoming(12D3KooWBBVAW4jbiaNF1B5orYUaeRwZVq2FwfrxCfx7spN6v6hn, IncomingIndex(6720)).
Jan 26 10:22:31 canary-validator-ew4-0 kusama[5410]: 2021-01-26 10:22:31.045  DEBUG tokio-runtime-worker sub-libp2p: PSM => Reject(IncomingIndex(6720), 12D3KooWBBVAW4jbiaNF1B5orYUaeRwZVq2FwfrxCfx7spN6v6hn, SetId(2)): Rejecting connections.
Jan 26 10:22:31 canary-validator-ew4-0 kusama[5410]: 2021-01-26 10:22:31.045  DEBUG tokio-runtime-worker sub-libp2p: Handler(PeerId("12D3KooWBBVAW4jbiaNF1B5orYUaeRwZVq2FwfrxCfx7spN6v6hn"), ConnectionId(TaskId(954900))) <= Close(SetId(2))
Jan 26 10:22:31 canary-validator-ew4-0 kusama[5410]: 2021-01-26 10:22:31.045  DEBUG tokio-runtime-worker sub-libp2p: Handler(12D3KooWBBVAW4jbiaNF1B5orYUaeRwZVq2FwfrxCfx7spN6v6hn, ConnectionId(TaskId(954900))) => OpenResultOk(SetId(0))
Jan 26 10:22:31 canary-validator-ew4-0 kusama[5410]: 2021-01-26 10:22:31.045  DEBUG tokio-runtime-worker sub-libp2p: External API <= Open(PeerId("12D3KooWBBVAW4jbiaNF1B5orYUaeRwZVq2FwfrxCfx7spN6v6hn"))
Jan 26 10:22:31 canary-validator-ew4-0 kusama[5410]: 2021-01-26 10:22:31.046  DEBUG tokio-runtime-worker sub-libp2p: Handler(12D3KooWBBVAW4jbiaNF1B5orYUaeRwZVq2FwfrxCfx7spN6v6hn, ConnectionId(TaskId(954900))) => CloseResult(SetId(1))
Jan 26 10:22:31 canary-validator-ew4-0 kusama[5410]: 2021-01-26 10:22:31.046  DEBUG tokio-runtime-worker sub-libp2p: Handler(12D3KooWBBVAW4jbiaNF1B5orYUaeRwZVq2FwfrxCfx7spN6v6hn, ConnectionId(TaskId(954900))) => CloseResult(SetId(2))

...
then we're back to the previous situation of trying to send notifications but not receiving any

This issue was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
I3-bug The node fails to follow expected behavior.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant