Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Does not GRAFT after receiving subscription #554

Closed
D4nte opened this issue Mar 31, 2021 · 15 comments
Closed

Does not GRAFT after receiving subscription #554

D4nte opened this issue Mar 31, 2021 · 15 comments

Comments

@D4nte
Copy link

D4nte commented Mar 31, 2021

Hi, I noticed this behaviour using nim-waku. Can you please clarify the expected behaviour?

Scenario

no_graft_after_subscribe.log

Steps:

  1. js node dials nim node
  2. nim node sends subscription message to js node
  3. js node sends subscription message to nim node

4. Expected

nim node sends graft message to js node to inform that nim node added js node to local mesh for default topic

4. Actual

No further processing is done. From my logs, the subscription has been received at
TRC 2021-03-30 10:47:14.497+11:00 peer subscribed to topic
but nothing happens until nim waku is killed at
nim-waku 4093597 getting SIGINT at 10:47:19 am (5 seconds).

Notes

Heartbeat

What is the heartbeat configuration for nim-waku? Is the issue that the graft message is sent at heartbeat time but I do not leave enough time for it to happen?

Subscribe before identify

In the attached logs, we can see that nim-waku sends the subscription message at
TRC 2021-03-30 10:47:14.443+11:00 sending msg to peer topics="libp2p pubsubpeer" tid=4093597 file=pubsubpeer.nim:237 peer=QmS*x2SoN8 rpcMsg="(subscriptions: @[(subscribe: true, topic: \"/waku/2/default-waku/proto\")], messages: @[], control: (ihave: @[], iwant: @[], graft: @[], prune: @[]))" .

However, the dial did not really finish at this stage as we can see the identify protocol still happening after:

DBG 2021-03-30 10:47:14.444+11:00 decodeMsg: decoded message                 topics="libp2p identify" tid=4093597 file=identify.nim:94 pubkey="RSA ke...10001\n" addresses="@[/ip4/127.0.0.1/tcp/46819, /ip4/192.168.178.251/tcp/46819]" protocols="@[\"/libp2p/circuit/relay/0.1.0\", \"/ipfs/id/1.0.0\", \"/ipfs/id/push/1.0.0\", \"/ipfs/ping/1.0.0\", \"/vac/waku/relay/2.0.0-beta2\"]" observable_address=Some(/ip4/127.0.0.1/tcp/64140/p2p/16Uiu2HAmPRqSvBdczSQYjqA4FnF4DXVhttHkhbQk2dU93xdJwPD9) proto_version="Some(\"ipfs/0.1.0\")" agent_version="Some(\"js-libp2p/0.30.10\")"
TRC 2021-03-30 10:47:14.444+11:00 identified remote peer                     tid=4093597 file=upgrade.nim:80 conn=QmS*x2SoN8:6062670231038b2d390eebe6 peerInfo="(peerId: \"QmSTjj31KM9GrLA4SUho397RsfJ781FhppUPmSuLx2SoN8\", addrs: @[\"/ip4/127.0.0.1/tcp/46819\", \"/ip4/192.168.178.251/tcp/46819\"], protocols: @[\"/libp2p/circuit/relay/0.1.0\", \"/ipfs/id/1.0.0\", \"/ipfs/id/push/1.0.0\", \"/ipfs/ping/1.0.0\", \"/vac/waku/relay/2.0.0-beta2\"], protoVersion: \"ipfs/0.1.0\", agentVersion: \"js-libp2p/0.30.10\")"

As the identify is not yet done when nim waku sends a subscribe to js waku, I believe it is dropped simply because js waku does not know yet that nim waku supports the waku relay protocol.

@sinkingsugar
Copy link
Contributor

sinkingsugar commented Mar 31, 2021

Is your node subscribed to that topic? If not the heardbeat won't rebalance that specific topic as no mesh is needed in such case, (fanout will deliver)

@sinkingsugar
Copy link
Contributor

Just realized there are logs attached, altho they are "colorized" will try to read them :)

@sinkingsugar
Copy link
Contributor

Why is the log silent after peer subscribed to topic ? are we sure async code is properly working?

@sinkingsugar
Copy link
Contributor

To sum up, first of all something is blocking it seems, and that's bad.
Second I think your node is not subscribed to the topic it was to create a mesh with.

@D4nte
Copy link
Author

D4nte commented Mar 31, 2021

Why is the log silent after peer subscribed to topic ? are we sure async code is properly working?

So this Nim-Waku node was started as part of a test suite. Indeed, nothing happens after that and then I kill the node timeout elapse.

Should I be waiting for a heartbeat? How often do they occur?

@D4nte
Copy link
Author

D4nte commented Mar 31, 2021

To sum up, first of all something is blocking it seems, and that's bad.
Second I think your node is not subscribed to the topic it was to create a mesh with.

If you look at the start of the log file you can see that the Nim-Waku node subscribe to a topic. This is the same topic than the peer subscribe to at the end.

@D4nte
Copy link
Author

D4nte commented Mar 31, 2021

Just realized there are logs attached, altho they are "colorized" will try to read them :)

Sorry, I'll be sure to clean them up next time :)

@sinkingsugar
Copy link
Contributor

sinkingsugar commented Mar 31, 2021

Why is the log silent after peer subscribed to topic ? are we sure async code is properly working?

So this Nim-Waku node was started as part of a test suite. Indeed, nothing happens after that and then I kill the node timeout elapse.

Should I be waiting for a heartbeat? How often do they occur?

https://github.com/status-im/nim-libp2p/blob/df497660bcf7aa23005f22aa7daced15b5668e3a/libp2p/protocols/pubsub/gossipsub/types.nim#L23

1 second if everything is default

I see no traces of heartbeat and maintenance in the logs.. something is wrong.

@sinkingsugar
Copy link
Contributor

To sum up, first of all something is blocking it seems, and that's bad.
Second I think your node is not subscribed to the topic it was to create a mesh with.

If you look at the start of the log file you can see that the Nim-Waku node subscribe to a topic. This is the same topic than the peer subscribe to at the end.

Right I missed that

@D4nte
Copy link
Author

D4nte commented Mar 31, 2021

Why is the log silent after peer subscribed to topic ? are we sure async code is properly working?

So this Nim-Waku node was started as part of a test suite. Indeed, nothing happens after that and then I kill the node timeout elapse.
Should I be waiting for a heartbeat? How often do they occur?

https://github.com/status-im/nim-libp2p/blob/df497660bcf7aa23005f22aa7daced15b5668e3a/libp2p/protocols/pubsub/gossipsub/types.nim#L23

1 second if everything is default

I see no traces of heartbeat and maintenance in the logs.. something is wrong.

Thanks! Indeed, looking at the nim-waku node deployed in the fleet, I do not see any heartbeat either.

@jm-clius, what do you think? Are we missing something?

@D4nte
Copy link
Author

D4nte commented Mar 31, 2021

Starting nim-waku aa9a74c60bb809c04936843ad9ea1cda2c292fec locally with trace logs, I cannot see any maintenance or heartbeat trace either.

@jm-clius
Copy link
Contributor

Starting nim-waku aa9a74c60bb809c04936843ad9ea1cda2c292fec locally with trace logs, I cannot see any maintenance or heartbeat trace either.

Mmm, I'll try to trace this and let you know.

@jm-clius
Copy link
Contributor

jm-clius commented Mar 31, 2021

Starting nim-waku aa9a74c60bb809c04936843ad9ea1cda2c292fec locally with trace logs, I cannot see any maintenance or heartbeat trace either.

Seems like this could be a pretty serious bug -> waku-org/nwaku#445. Working on a fix, but just trying to determine to what extent we should change the order in which we mount protocols and start listening.

@sinkingsugar
Copy link
Contributor

@jm-clius @D4nte can we close this? was on your side right?

@jm-clius
Copy link
Contributor

jm-clius commented Apr 6, 2021

Closing, as issue was fixed and will be tested on nim-waku side.

@jm-clius jm-clius closed this as completed Apr 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants