
Gossipsub interop issue with go-ipfs #1671

Closed
rklaehn opened this issue Jul 22, 2020 · 24 comments

@rklaehn
Contributor

rklaehn commented Jul 22, 2020

I have the following scenario: I create a libp2p based program using gossipsub and connect to a go-ipfs peer.

When I subscribe to a topic immediately at startup, sometimes I don't ever get any messages for that topic despite the remote nodes clearly publishing on that topic.

When adding a small delay (tokio::time::delay_for(Duration::from_secs(2)).await) to allow for some time to pass before subscribing, everything works as expected.

I tried to work around this issue by regularly unsubscribing and subscribing to all topics that somebody is interested in at the moment, but that did not work either.

Just calling subscribe again will not work, since subscribe will abort early with a false return value, indicating that the topic is already subscribed.

Calling unsubscribe immediately followed by subscribe did both return true, but after the first unsubscribe/subscribe I did not get any pubsub msgs anymore.

Am I using the API correctly? It seems that I should be able to call subscribe once for a topic and then rely on the topic being subscribed despite not yet having any peers...
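The subscribe/unsubscribe semantics described above can be modeled roughly as follows. This is a hypothetical minimal sketch of the bookkeeping, not the actual rust-libp2p implementation; the type and method names are invented for illustration:

```rust
use std::collections::BTreeSet;

// Hypothetical model of the subscription bookkeeping described above.
struct PubsubModel {
    subscribed: BTreeSet<String>,
}

impl PubsubModel {
    fn new() -> Self {
        Self { subscribed: BTreeSet::new() }
    }

    // Returns false when the topic is already subscribed, mirroring the
    // early-abort behaviour observed in the issue.
    fn subscribe(&mut self, topic: &str) -> bool {
        self.subscribed.insert(topic.to_string())
    }

    fn unsubscribe(&mut self, topic: &str) -> bool {
        self.subscribed.remove(topic)
    }
}

fn main() {
    let mut ps = PubsubModel::new();
    assert!(ps.subscribe("/klk/prod/2020-01-09"));   // first call succeeds
    assert!(!ps.subscribe("/klk/prod/2020-01-09"));  // second call aborts early
    assert!(ps.unsubscribe("/klk/prod/2020-01-09")); // unsubscribe succeeds
    assert!(ps.subscribe("/klk/prod/2020-01-09"));   // re-subscribe succeeds
}
```

Under this model, repeated `subscribe` calls are no-ops by design, which is why the unsubscribe-then-subscribe workaround was the only way to force a fresh subscription.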

@rklaehn
Contributor Author

rklaehn commented Jul 22, 2020

@AgeManning I would really appreciate your help here. I can provide detailed info about this scenario.

@tcharding
Contributor

tcharding commented Jul 22, 2020

We are doing this too, subscribing on startup. Interested to hear the result of this thread.

tcharding pushed a commit to comit-network/comit-rs that referenced this issue Jul 23, 2020
If this issue on GitHub is correct

libp2p/rust-libp2p#1671 (comment)

then we should have an open connection to another node on the network
before subscribing.

Add a method to the orderbook and all the layers above it to facilitate
subscribing. Call it after a successful dial is made.
@rklaehn
Contributor Author

rklaehn commented Jul 23, 2020

I have implemented a debug instance for the Gossipsub behaviour (https://github.com/Actyx/rust-libp2p/tree/rkl/gossipsub-debug-instances) and discovered a few weird things. Here is the state of the gossipsub in the case where I added a delay fudge to make things work:

Gossipsub {
    config: GossipsubConfig {
        protocol_id: "/meshsub/1.0.0",
        history_length: 5,
        history_gossip: 3,
        mesh_n: 6,
        mesh_n_low: 4,
        mesh_n_high: 12,
        gossip_lazy: 6,
        heartbeat_initial_delay: 5s,
        heartbeat_interval: 1s,
        fanout_ttl: 60s,
        max_transmit_size: 262144,
        hash_topics: false,
        no_source_id: false,
        manual_propagation: false,
    },
    events: [],
    control_pool: {},
    local_peer_id: PeerId(
        "12D3KooWSm42gMt9XAp5pkaaYBDQfWmJvCvY3WfvV1cUBeCYnCeo",
    ),
    topic_peers: {
        TopicHash {
            hash: "/klk/prod/2020-01-09",
        }: [
            PeerId(
                "QmWR3WVgkVUNG3HEdU2xjW4wW5pcKF3AL3Aht3ixYHJVrW",
            ),
        ],
        TopicHash {
            hash: "/klk/prod/2020-01-09/monitoring",
        }: [
            PeerId(
                "QmWR3WVgkVUNG3HEdU2xjW4wW5pcKF3AL3Aht3ixYHJVrW",
            ),
        ],
        TopicHash {
            hash: "owa-dev",
        }: [
            PeerId(
                "QmWR3WVgkVUNG3HEdU2xjW4wW5pcKF3AL3Aht3ixYHJVrW",
            ),
        ],
    },
    peer_topics: {
        PeerId(
            "QmWR3WVgkVUNG3HEdU2xjW4wW5pcKF3AL3Aht3ixYHJVrW",
        ): [
            TopicHash {
                hash: "/klk/prod/2020-01-09",
            },
            TopicHash {
                hash: "/klk/prod/2020-01-09/monitoring",
            },
            TopicHash {
                hash: "owa-dev",
            },
        ],
    },
    mesh: {
        TopicHash {
            hash: "discovery",
        }: [],
        TopicHash {
            hash: "/klk/prod/2020-01-09",
        }: [
            PeerId(
                "QmWR3WVgkVUNG3HEdU2xjW4wW5pcKF3AL3Aht3ixYHJVrW",
            ),
            PeerId(
                "QmWR3WVgkVUNG3HEdU2xjW4wW5pcKF3AL3Aht3ixYHJVrW",
            ),
            PeerId(
                "QmWR3WVgkVUNG3HEdU2xjW4wW5pcKF3AL3Aht3ixYHJVrW",
            ),
        ],
    },
    fanout: {},
    fanout_last_pub: {},
    mcache: MessageCache {
        msgs: {
            MessageId(
                "QmP5fEfa6Cmk22yF9hFRfokU42mN9NHBLAkyuFzGK4J4vT1595325081332460989",
            ): GossipsubMessage {
                data.len: 284,
                source: PeerId(
                    "QmP5fEfa6Cmk22yF9hFRfokU42mN9NHBLAkyuFzGK4J4vT",
                ),
                sequence_number: 1595325081332460989,
                topics: [
                    TopicHash {
                        hash: "/klk/prod/2020-01-09",
                    },
                ],
            },
            MessageId(
                "QmcMA53bUskoAGdWE5XrQwxRXQm9EhjE6RcTpoyfg6Dv6W1595236462749115763",
            ): GossipsubMessage {
                data.len: 284,
                source: PeerId(
                    "QmcMA53bUskoAGdWE5XrQwxRXQm9EhjE6RcTpoyfg6Dv6W",
                ),
                sequence_number: 1595236462749115763,
                topics: [
                    TopicHash {
                        hash: "/klk/prod/2020-01-09",
                    },
                ],
            },
        },
        history: [
            [],
            [
                CacheEntry {
                    mid: MessageId(
                        "QmP5fEfa6Cmk22yF9hFRfokU42mN9NHBLAkyuFzGK4J4vT1595325081332460989",
                    ),
                    topics: [
                        TopicHash {
                            hash: "/klk/prod/2020-01-09",
                        },
                    ],
                },
            ],
            [
                CacheEntry {
                    mid: MessageId(
                        "QmcMA53bUskoAGdWE5XrQwxRXQm9EhjE6RcTpoyfg6Dv6W1595236462749115763",
                    ),
                    topics: [
                        TopicHash {
                            hash: "/klk/prod/2020-01-09",
                        },
                    ],
                },
            ],
            [],
            [],
        ],
        gossip: 3,
    },
    received: LruCache {
        len: 24,
        cap: 256,
    },
    heartbeat: Interval {
        delay: Delay {
            when: Instant {
                tv_sec: 158183,
                tv_nsec: 868651819,
            },
        },
        interval: 1s,
    },
}

Never mind the topic names. What I found strange is that the mesh for the topic /klk/prod/2020-01-09 contains the same peer three times. Is that correct behaviour? I would have expected the peers for a topic to be a set, not a list that can possibly contain duplicates...

 mesh: {
        TopicHash {
            hash: "discovery",
        }: [],
        TopicHash {
            hash: "/klk/prod/2020-01-09",
        }: [
            PeerId(
                "QmWR3WVgkVUNG3HEdU2xjW4wW5pcKF3AL3Aht3ixYHJVrW",
            ),
            PeerId(
                "QmWR3WVgkVUNG3HEdU2xjW4wW5pcKF3AL3Aht3ixYHJVrW",
            ),
            PeerId(
                "QmWR3WVgkVUNG3HEdU2xjW4wW5pcKF3AL3Aht3ixYHJVrW",
            ),
        ],
    },

@rklaehn
Contributor Author

rklaehn commented Jul 23, 2020

I definitely think the duplicate peer ids in the mesh are wrong and will cause trouble...

See #1674

@AgeManning
Contributor

AgeManning commented Jul 23, 2020

Hey @rklaehn - Thanks for raising this.

Sorry I've not been more vocal, there are a few known bugs in gossipsub. I'm trying to resolve them all in #1583.

Let me enumerate known issues as some may apply to you:

  • Subscribing to a topic only sends subscriptions to peers of that topic (when it should send to all peers). So if you connect a peer and then get each peer to subscribe to a topic, the two peers often don't know that each other is subscribed to that topic. However, if you subscribe to a topic and then connect a peer, the peers inform each other about their subscriptions on connection (and it should work as expected). I assume this issue is probably the one causing your problems. I've already corrected this in Gossipsub message signing #1583.

  • Message Validation - The current implementation has a callback of sorts that allows the application to optionally forward a message. Its original intention was to not propagate messages that the application deemed invalid. This works as expected; however, gossipsub stores a cache of messages and still gossips about them (including ones that have not been strictly validated). In 1.1, with scoring introduced, this could look like a malicious peer advertising invalid messages. (In the process of fixing this.)
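The first bug above can be sketched as a toy simulation. This is a hypothetical model (invented names, not rust-libp2p code): on `subscribe`, a buggy node announces the subscription only to peers it already knows are on that topic, instead of to all connected peers, so the connect-then-subscribe ordering fails:

```rust
use std::collections::{BTreeMap, BTreeSet};

// What a node believes each remote peer is subscribed to.
#[derive(Default)]
struct Node {
    peer_topics: BTreeMap<&'static str, BTreeSet<&'static str>>,
}

// Returns whether B ends up knowing that A subscribed to the topic.
fn simulate(buggy: bool) -> bool {
    let mut a = Node::default();
    let mut b = Node::default();

    // 1. A and B connect; neither is subscribed yet, so the on-connect
    //    subscription exchange carries nothing.
    a.peer_topics.insert("b", BTreeSet::new());
    b.peer_topics.insert("a", BTreeSet::new());

    // 2. A subscribes to "t" and announces it.
    let topic = "t";
    let announce_to_b = if buggy {
        // Buggy: only announce to peers already known to be on the topic.
        a.peer_topics["b"].contains(topic)
    } else {
        // Fixed: announce to every connected peer.
        true
    };
    if announce_to_b {
        b.peer_topics.get_mut("a").unwrap().insert(topic);
    }

    b.peer_topics["a"].contains(topic)
}

fn main() {
    assert!(!simulate(true));  // connect-then-subscribe fails with the bug
    assert!(simulate(false));  // announcing to all peers fixes it
}
```

This also matches why the fudge delay helps: subscribing before the connection completes lets the on-connect exchange carry the subscription instead.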

I've not seen the duplicate peer ids in the mesh before. I'll look into it and try to add the fix in #1583.

#1583 is almost complete (however not usable in its current form). For it to be completed we need to add some configurable signing options so that it can match other implementations, specifically around whether or not to sign messages and whether or not to validate signed messages.

@AgeManning
Contributor

To test, you could try adding this and seeing if it helps your subscription issue: 7d678f5

@rklaehn
Contributor Author

rklaehn commented Jul 23, 2020

I tried the diff in 7d678f5, and it does not seem to help. The only thing that helps is that cursed fudge delay after connect.

However, I have been banging my head against this all day. I will try again with a clear mind tomorrow.

Regarding the duplicate peer ids: the mesh for a topic is a set of peers, so what is the reason to use Vec? Using a set here would make such errors fundamentally impossible. Vec might be slightly faster, but I doubt that it is measurable.

Here is a draft PR that changes all Vec sets to proper (BTree)Sets:
#1675
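The design point can be demonstrated with plain strings standing in for `PeerId`s (a sketch, not the PR's actual code): with a `Vec`, repeated insertions accumulate duplicates unless every call site remembers a `contains` guard, whereas a `BTreeSet` makes duplicates impossible by construction:

```rust
use std::collections::BTreeSet;

/// Push the same id `n` times into a Vec-backed mesh; duplicates accumulate.
fn vec_mesh_len(peer: &str, n: usize) -> usize {
    let mut mesh: Vec<&str> = Vec::new();
    for _ in 0..n {
        mesh.push(peer); // no `contains` guard, so each push adds a copy
    }
    mesh.len()
}

/// Insert the same id `n` times into a set-backed mesh; re-insertion is a no-op.
fn set_mesh_len(peer: &str, n: usize) -> usize {
    let mut mesh: BTreeSet<&str> = BTreeSet::new();
    for _ in 0..n {
        mesh.insert(peer);
    }
    mesh.len()
}

fn main() {
    let peer = "QmWR3WVgkVUNG3HEdU2xjW4wW5pcKF3AL3Aht3ixYHJVrW";
    assert_eq!(vec_mesh_len(peer, 3), 3); // same peer three times, as in the dump
    assert_eq!(set_mesh_len(peer, 3), 1); // duplicates are structurally impossible
}
```

The duplicate Prune control messages in the later logs are consistent with this: each duplicate mesh entry produces its own Prune for the same topic.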

@AgeManning
Contributor

Ok, I'm curious about your scenario.

I have applications which subscribe on startup and work as expected.

Could you collect some gossipsub logs for the scenario you are describing? Perhaps your peer is not being added to the mesh and the gossip is not working as expected.

On a new peer connection you should share your subscriptions with the newly connected nodes. If you can paste some gossipsub logs of the node starting up and having the new node connect, I might be able to see something.

@rklaehn
Contributor Author

rklaehn commented Jul 24, 2020

Ok, I'm curious about your scenario.

I have applications which subscribe on startup and work as expected.

Yes, me too. This does not happen all the time. In fact I did not see this before. But it is reproducible with this particular node.

Could you collect some gossipsub logs for the scenario you are describing? Perhaps your peer is not being added to the mesh and the gossip is not working as expected.

You mean from my side (just dumping the traffic with some println! statements) or from the go-ipfs side? Not sure how to get the latter. I will try to get you logs asap.

On a new peer connection you should share your subscription with newly peer'd nodes. If you can paste some gossipsub logs of the node starting up and having the new node connect, I might be able to see something.

@rklaehn
Contributor Author

rklaehn commented Jul 24, 2020

Here is some traffic, just dumping msgs as they go in or out:

No initial delay between connect and subscribe - not working

send initial GossipsubRpc { messages: [], subscriptions: [GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "discovery" } }], control_msgs: [] }
recv GossipsubRpc { messages: [], subscriptions: [GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "owa-dev" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09/monitoring" } }], control_msgs: [] }
recv GossipsubRpc { messages: [], subscriptions: [GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09/monitoring" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "owa-dev" } }], control_msgs: [] }
recv GossipsubRpc { messages: [], subscriptions: [GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09/monitoring" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "owa-dev" } }], control_msgs: [] }
recv GossipsubRpc { messages: [], subscriptions: [GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09/monitoring" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "owa-dev" } }], control_msgs: [] }
recv GossipsubRpc { messages: [], subscriptions: [GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09/monitoring" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "owa-dev" } }], control_msgs: [] }
recv GossipsubRpc { messages: [], subscriptions: [GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09/monitoring" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "owa-dev" } }], control_msgs: [] }
recv GossipsubRpc { messages: [], subscriptions: [GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09/monitoring" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "owa-dev" } }], control_msgs: [] }
recv GossipsubRpc { messages: [], subscriptions: [GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09/monitoring" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "owa-dev" } }], control_msgs: [] }
recv GossipsubRpc { messages: [], subscriptions: [GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09/monitoring" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "owa-dev" } }], control_msgs: [] }
recv GossipsubRpc { messages: [], subscriptions: [GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09/monitoring" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "owa-dev" } }], control_msgs: [] }
recv GossipsubRpc { messages: [], subscriptions: [GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09/monitoring" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "owa-dev" } }], control_msgs: [] }
recv GossipsubRpc { messages: [], subscriptions: [GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09/monitoring" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "owa-dev" } }], control_msgs: [] }
recv GossipsubRpc { messages: [], subscriptions: [GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09/monitoring" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "owa-dev" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09" } }], control_msgs: [] }
send GossipsubRpc { messages: [], subscriptions: [], control_msgs: [Prune { topic_hash: TopicHash { hash: "/klk/prod/2020-01-09" } }, Prune { topic_hash: TopicHash { hash: "/klk/prod/2020-01-09" } }, Prune { topic_hash: TopicHash { hash: "/klk/prod/2020-01-09" } }, Prune { topic_hash: TopicHash { hash: "/klk/prod/2020-01-09" } }, Prune { topic_hash: TopicHash { hash: "/klk/prod/2020-01-09" } }, Prune { topic_hash: TopicHash { hash: "/klk/prod/2020-01-09" } }, Prune { topic_hash: TopicHash { hash: "/klk/prod/2020-01-09" } }] }
recv GossipsubRpc { messages: [], subscriptions: [GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09/monitoring" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "owa-dev" } }], control_msgs: [] }
recv GossipsubRpc { messages: [], subscriptions: [GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09/monitoring" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "owa-dev" } }], control_msgs: [] }
recv GossipsubRpc { messages: [], subscriptions: [GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09/monitoring" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "owa-dev" } }], control_msgs: [] }
recv GossipsubRpc { messages: [], subscriptions: [GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09/monitoring" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "owa-dev" } }], control_msgs: [] }
recv GossipsubRpc { messages: [], subscriptions: [GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09/monitoring" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "owa-dev" } }], control_msgs: [] }
recv GossipsubRpc { messages: [], subscriptions: [GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09/monitoring" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "owa-dev" } }], control_msgs: [] }
recv GossipsubRpc { messages: [], subscriptions: [GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "owa-dev" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09/monitoring" } }], control_msgs: [] }
recv GossipsubRpc { messages: [], subscriptions: [GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09/monitoring" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "owa-dev" } }], control_msgs: [] }
send GossipsubRpc { messages: [], subscriptions: [], control_msgs: [Prune { topic_hash: TopicHash { hash: "/klk/prod/2020-01-09" } }, Prune { topic_hash: TopicHash { hash: "/klk/prod/2020-01-09" } }, Prune { topic_hash: TopicHash { hash: "/klk/prod/2020-01-09" } }, Prune { topic_hash: TopicHash { hash: "/klk/prod/2020-01-09" } }, Prune { topic_hash: TopicHash { hash: "/klk/prod/2020-01-09" } }, Prune { topic_hash: TopicHash { hash: "/klk/prod/2020-01-09" } }, Prune { topic_hash: TopicHash { hash: "/klk/prod/2020-01-09" } }] }
recv GossipsubRpc { messages: [], subscriptions: [GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09/monitoring" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "owa-dev" } }], control_msgs: [] }
recv GossipsubRpc { messages: [], subscriptions: [GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "owa-dev" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09/monitoring" } }], control_msgs: [] }
recv GossipsubRpc { messages: [], subscriptions: [GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09/monitoring" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "owa-dev" } }], control_msgs: [] }
recv GossipsubRpc { messages: [], subscriptions: [GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09/monitoring" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "owa-dev" } }], control_msgs: [] }
recv GossipsubRpc { messages: [], subscriptions: [GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09/monitoring" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "owa-dev" } }], control_msgs: [] }
recv GossipsubRpc { messages: [], subscriptions: [GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09/monitoring" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "owa-dev" } }], control_msgs: [] }
recv GossipsubRpc { messages: [], subscriptions: [GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09/monitoring" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "owa-dev" } }], control_msgs: [] }
send GossipsubRpc { messages: [], subscriptions: [], control_msgs: [Prune { topic_hash: TopicHash { hash: "/klk/prod/2020-01-09" } }, Prune { topic_hash: TopicHash { hash: "/klk/prod/2020-01-09" } }, Prune { topic_hash: TopicHash { hash: "/klk/prod/2020-01-09" } }, Prune { topic_hash: TopicHash { hash: "/klk/prod/2020-01-09" } }, Prune { topic_hash: TopicHash { hash: "/klk/prod/2020-01-09" } }, Prune { topic_hash: TopicHash { hash: "/klk/prod/2020-01-09" } }, Prune { topic_hash: TopicHash { hash: "/klk/prod/2020-01-09" } }] }
recv GossipsubRpc { messages: [], subscriptions: [GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09/monitoring" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "owa-dev" } }], control_msgs: [] }
recv GossipsubRpc { messages: [], subscriptions: [GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "owa-dev" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09/monitoring" } }], control_msgs: [] }

@rklaehn
Contributor Author

rklaehn commented Jul 24, 2020

Here is the exact same code as above, but with a delay between the connection to the peer and the pubsub subscribe. As you can see, now we do get data:

XXXXT send initial GossipsubRpc { messages: [], subscriptions: [GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "discovery" } }], control_msgs: [] }
XXXXT recv GossipsubRpc { messages: [], subscriptions: [GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "owa-dev" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09/monitoring" } }], control_msgs: [] }
XXXXT send GossipsubRpc { messages: [], subscriptions: [GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09" } }], control_msgs: [] }
XXXXT recv GossipsubRpc { messages: [], subscriptions: [GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09/monitoring" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "owa-dev" } }], control_msgs: [] }
XXXXT recv GossipsubRpc { messages: [], subscriptions: [GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "/klk/prod/2020-01-09/monitoring" } }, GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "owa-dev" } }], control_msgs: [] }
XXXXT send GossipsubRpc { messages: [], subscriptions: [], control_msgs: [Graft { topic_hash: TopicHash { hash: "/klk/prod/2020-01-09" } }] }
XXXXT recv GossipsubRpc { messages: [GossipsubMessage { data: a36474797065666576656e747366736f757263.., source: PeerId("QmcMA53bUskoAGdWE5XrQwxRXQm9EhjE6RcTpoyfg6Dv6W"), sequence_number: 1595236462749176036, topics: [TopicHash { hash: "/klk/prod/2020-01-09" }] }], subscriptions: [], control_msgs: [] }
XXXXT recv GossipsubRpc { messages: [GossipsubMessage { data: a36474797065666576656e747366736f757263.., source: PeerId("QmP5fEfa6Cmk22yF9hFRfokU42mN9NHBLAkyuFzGK4J4vT"), sequence_number: 1595325081332521105, topics: [TopicHash { hash: "/klk/prod/2020-01-09" }] }], subscriptions: [], control_msgs: [] }
XXXXT recv GossipsubRpc { messages: [GossipsubMessage { data: a5647479706568736e617073686f7466736f75.., source: PeerId("QmYNgJr34Etz7tjZeMN5Pk4vxL6ektqnUuLBJwEtBLGRd4"), sequence_number: 1594900249707574952, topics: [TopicHash { hash: "/klk/prod/2020-01-09" }] }], subscriptions: [], control_msgs: [] }
XXXXT recv GossipsubRpc { messages: [GossipsubMessage { data: a36474797065666576656e747366736f757263.., source: PeerId("QmcMA53bUskoAGdWE5XrQwxRXQm9EhjE6RcTpoyfg6Dv6W"), sequence_number: 1595236462749176038, topics: [TopicHash { hash: "/klk/prod/2020-01-09" }] }], subscriptions: [], control_msgs: [] }
XXXXT recv GossipsubRpc { messages: [GossipsubMessage { data: a36474797065666576656e747366736f757263.., source: PeerId("QmP5fEfa6Cmk22yF9hFRfokU42mN9NHBLAkyuFzGK4J4vT"), sequence_number: 1595325081332521109, topics: [TopicHash { hash: "/klk/prod/2020-01-09" }] }], subscriptions: [], control_msgs: [] }
XXXXT recv GossipsubRpc { messages: [GossipsubMessage { data: a5647479706568736e617073686f7466736f75.., source: PeerId("Qmf9HJdwHccrEJDefDynAjAxbtUCdwEjdFkdGrTdsRXz1j"), sequence_number: 1594903579435181961, topics: [TopicHash { hash: "/klk/prod/2020-01-09" }] }], subscriptions: [], control_msgs: [] }
XXXXT recv GossipsubRpc { messages: [GossipsubMessage { data: a5647479706568736e617073686f7466736f75.., source: PeerId("QmP5fEfa6Cmk22yF9hFRfokU42mN9NHBLAkyuFzGK4J4vT"), sequence_number: 1595325081332521112, topics: [TopicHash { hash: "/klk/prod/2020-01-09" }] }], subscriptions: [], control_msgs: [] }
XXXXT recv GossipsubRpc { messages: [GossipsubMessage { data: a36474797065666576656e747366736f757263.., source: PeerId("QmcMA53bUskoAGdWE5XrQwxRXQm9EhjE6RcTpoyfg6Dv6W"), sequence_number: 1595236462749176044, topics: [TopicHash { hash: "/klk/prod/2020-01-09" }] }], subscriptions: [], control_msgs: [] }
XXXXT recv GossipsubRpc { messages: [GossipsubMessage { data: a36474797065666576656e747366736f757263.., source: PeerId("QmP5fEfa6Cmk22yF9hFRfokU42mN9NHBLAkyuFzGK4J4vT"), sequence_number: 1595325081332521114, topics: [TopicHash { hash: "/klk/prod/2020-01-09" }] }], subscriptions: [], control_msgs: [] }
XXXXT recv GossipsubRpc { messages: [GossipsubMessage { data: a5647479706568736e617073686f7466736f75.., source: PeerId("QmcMA53bUskoAGdWE5XrQwxRXQm9EhjE6RcTpoyfg6Dv6W"), sequence_number: 1595236462749176046, topics: [TopicHash { hash: "/klk/prod/2020-01-09" }] }], subscriptions: [], control_msgs: [] }
XXXXT recv GossipsubRpc { messages: [GossipsubMessage { data: a5647479706568736e617073686f7466736f75.., source: PeerId("Qme4zisTBfxhmnAxtEkvDHFq6HsQ7bdbR3ErcUkrKrmh1h"), sequence_number: 1595306484405753623, topics: [TopicHash { hash: "/klk/prod/2020-01-09" }] }], subscriptions: [], control_msgs: [] }
XXXXT recv GossipsubRpc { messages: [GossipsubMessage { data: a36474797065666576656e747366736f757263.., source: PeerId("QmcMA53bUskoAGdWE5XrQwxRXQm9EhjE6RcTpoyfg6Dv6W"), sequence_number: 1595236462749176049, topics: [TopicHash { hash: "/klk/prod/2020-01-09" }] }], subscriptions: [], control_msgs: [] }
XXXXT recv GossipsubRpc { messages: [GossipsubMessage { data: a36474797065666576656e747366736f757263.., source: PeerId("QmP5fEfa6Cmk22yF9hFRfokU42mN9NHBLAkyuFzGK4J4vT"), sequence_number: 1595325081332521119, topics: [TopicHash { hash: "/klk/prod/2020-01-09" }] }], subscriptions: [], control_msgs: [] }
XXXXT recv GossipsubRpc { messages: [GossipsubMessage { data: a36474797065666576656e747366736f757263.., source: PeerId("QmcMA53bUskoAGdWE5XrQwxRXQm9EhjE6RcTpoyfg6Dv6W"), sequence_number: 1595236462749176053, topics: [TopicHash { hash: "/klk/prod/2020-01-09" }] }], subscriptions: [], control_msgs: [] }
XXXXT recv GossipsubRpc { messages: [GossipsubMessage { data: a36474797065666576656e747366736f757263.., source: PeerId("QmP5fEfa6Cmk22yF9hFRfokU42mN9NHBLAkyuFzGK4J4vT"), sequence_number: 1595325081332521123, topics: [TopicHash { hash: "/klk/prod/2020-01-09" }] }], subscriptions: [], control_msgs: [] }
XXXXT recv GossipsubRpc { messages: [GossipsubMessage { data: a5647479706568736e617073686f7466736f75.., source: PeerId("QmRZbxgN3fmtkdL4DEbezF2PCC1tjAtgijdWPPewANwx1K"), sequence_number: 1590042983659808291, topics: [TopicHash { hash: "/klk/prod/2020-01-09" }] }], subscriptions: [], control_msgs: [] }
XXXXT recv GossipsubRpc { messages: [GossipsubMessage { data: a5647479706568736e617073686f7466736f75.., source: PeerId("QmXewoKmfHAsc4DCur7DsuBnvP9MNX97QJf62zoCAsB5AV"), sequence_number: 1595219844902287567, topics: [TopicHash { hash: "/klk/prod/2020-01-09" }] }], subscriptions: [], control_msgs: [] }
XXXXT recv GossipsubRpc { messages: [GossipsubMessage { data: a5647479706568736e617073686f7466736f75.., source: PeerId("QmaSD6KBmn8oDcVdiJ5t2b4JTMg2hn6H6c4S7M8Qm9H7oC"), sequence_number: 1595552489221850258, topics: [TopicHash { hash: "/klk/prod/2020-01-09" }] }], subscriptions: [], control_msgs: [] }
XXXXT recv GossipsubRpc { messages: [GossipsubMessage { data: a36474797065666576656e747366736f757263.., source: PeerId("QmcMA53bUskoAGdWE5XrQwxRXQm9EhjE6RcTpoyfg6Dv6W"), sequence_number: 1595236462749176057, topics: [TopicHash { hash: "/klk/prod/2020-01-09" }] }], subscriptions: [], control_msgs: [] }
XXXXT recv GossipsubRpc { messages: [GossipsubMessage { data: a36474797065666576656e747366736f757263.., source: PeerId("QmP5fEfa6Cmk22yF9hFRfokU42mN9NHBLAkyuFzGK4J4vT"), sequence_number: 1595325081332521128, topics: [TopicHash { hash: "/klk/prod/2020-01-09" }] }], subscriptions: [], control_msgs: [] }
XXXXT recv GossipsubRpc { messages: [GossipsubMessage { data: a36474797065666576656e747366736f757263.., source: PeerId("QmcMA53bUskoAGdWE5XrQwxRXQm9EhjE6RcTpoyfg6Dv6W"), sequence_number: 1595236462749176061, topics: [TopicHash { hash: "/klk/prod/2020-01-09" }] }], subscriptions: [], control_msgs: [] }
XXXXT recv GossipsubRpc { messages: [GossipsubMessage { data: a36474797065666576656e747366736f757263.., source: PeerId("QmP5fEfa6Cmk22yF9hFRfokU42mN9NHBLAkyuFzGK4J4vT"), sequence_number: 1595325081332521130, topics: [TopicHash { hash: "/klk/prod/2020-01-09" }] }], subscriptions: [], control_msgs: [] }
XXXXT recv GossipsubRpc { messages: [GossipsubMessage { data: a5647479706568736e617073686f7466736f75.., source: PeerId("QmYNgJr34Etz7tjZeMN5Pk4vxL6ektqnUuLBJwEtBLGRd4"), sequence_number: 1594900249707574959, topics: [TopicHash { hash: "/klk/prod/2020-01-09" }] }], subscriptions: [], control_msgs: [] }
XXXXT recv GossipsubRpc { messages: [GossipsubMessage { data: a36474797065666576656e747366736f757263.., source: PeerId("QmcMA53bUskoAGdWE5XrQwxRXQm9EhjE6RcTpoyfg6Dv6W"), sequence_number: 1595236462749176063, topics: [TopicHash { hash: "/klk/prod/2020-01-09" }] }], subscriptions: [], control_msgs: [] }
XXXXT recv GossipsubRpc { messages: [GossipsubMessage { data: a36474797065666576656e747366736f757263.., source: PeerId("QmP5fEfa6Cmk22yF9hFRfokU42mN9NHBLAkyuFzGK4J4vT"), sequence_number: 1595325081332521134, topics: [TopicHash { hash: "/klk/prod/2020-01-09" }] }], subscriptions: [], control_msgs: [] }
XXXXT recv GossipsubRpc { messages: [GossipsubMessage { data: a5647479706568736e617073686f7466736f75.., source: PeerId("Qmf9HJdwHccrEJDefDynAjAxbtUCdwEjdFkdGrTdsRXz1j"), sequence_number: 1594903579435181973, topics: [TopicHash { hash: "/klk/prod/2020-01-09" }] }], subscriptions: [], control_msgs: [] }
XXXXT recv GossipsubRpc { messages: [GossipsubMessage { data: a5647479706568736e617073686f7466736f75.., source: PeerId("QmP5fEfa6Cmk22yF9hFRfokU42mN9NHBLAkyuFzGK4J4vT"), sequence_number: 1595325081332521136, topics: [TopicHash { hash: "/klk/prod/2020-01-09" }] }], subscriptions: [], control_msgs: [] }
XXXXT recv GossipsubRpc { messages: [GossipsubMessage { data: a36474797065666576656e747366736f757263.., source: PeerId("QmcMA53bUskoAGdWE5XrQwxRXQm9EhjE6RcTpoyfg6Dv6W"), sequence_number: 1595236462749176068, topics: [TopicHash { hash: "/klk/prod/2020-01-09" }] }], subscriptions: [], control_msgs: [] }
XXXXT recv GossipsubRpc { messages: [GossipsubMessage { data: a5647479706568736e617073686f7466736f75.., source: PeerId("QmcMA53bUskoAGdWE5XrQwxRXQm9EhjE6RcTpoyfg6Dv6W"), sequence_number: 1595236462749176070, topics: [TopicHash { hash: "/klk/prod/2020-01-09" }] }], subscriptions: [], control_msgs: [] }
XXXXT recv GossipsubRpc { messages: [GossipsubMessage { data: a36474797065666576656e747366736f757263.., source: PeerId("QmP5fEfa6Cmk22yF9hFRfokU42mN9NHBLAkyuFzGK4J4vT"), sequence_number: 1595325081332521139, topics: [TopicHash { hash: "/klk/prod/2020-01-09" }] }], subscriptions: [], control_msgs: [] }

@rklaehn
Contributor Author

rklaehn commented Jul 24, 2020

This is a manually instrumented debug branch, but it does include the changes you suggested in 7d678f5 .

Here is the branch: https://github.com/Actyx/rust-libp2p/tree/rkl/gossip-debug

@rklaehn
Contributor Author

rklaehn commented Jul 24, 2020

I have run this with the message signing branch. Same behaviour. Works with the delay, but not without.

@rklaehn
Contributor Author

rklaehn commented Jul 24, 2020

FYI I have made a PR with a few minor improvements against your message signing branch: sigp#32

@AgeManning
Contributor

Hey awesome. Thanks for the help.
It would be nice to get the actual logs to see what's going on.
You could use the environment logger as in the example here: https://github.com/libp2p/rust-libp2p/blob/master/examples/gossipsub-chat.rs#L64
Or more simply in the test here: https://github.com/libp2p/rust-libp2p/blob/master/protocols/gossipsub/tests/smoke.rs#L169

And set the environment variable RUST_LOG=libp2p_gossipsub=debug before running your application.

It should output the gossipsub logs, which might help us diagnose this.

@rklaehn
Contributor Author

rklaehn commented Jul 24, 2020

Got another PR, which you may or may not want to merge. I hope you don't mind: sigp#33

Will get you the tracing logs. There is something wrong with the tracing setup in my app though, so this might take a bit longer.

@rklaehn
Contributor Author

rklaehn commented Jul 24, 2020

OK, here are the tracing logs with both delay and no delay.

https://gist.github.com/rklaehn/db7b167987a37a3898c07adedba4a3e4

There is some application data mixed in for the "works" case; just ignore that.

Let me know if you need more. One thing that is already interesting is that apparently some proto messages from go-ipfs cannot be parsed.

Jul 24 13:32:25.721  WARN libp2p_gossipsub::handler: Invalid message received. Error: failed to decode Protobuf message: ControlIHave.message_ids: ControlMessage.ihave: Rpc.control: invalid string value: data is not UTF-8 encoded 

BTW this is against the gossipsub-signing branch, which seems to be your current dev branch.

@rklaehn
Contributor Author

rklaehn commented Jul 24, 2020

OK, now this is weird. I looked at the proto msgs in detail. E.g.

$ echo 1ac9010ac6010a142f6b6c6b2f70726f642f323032302d30312d3039122a1220d024d0c98917394729c109ddf8d94c98d36ff3899b7f3f6d148b8e6baf084dc316236b1c66ea8a8e122a12202fe480d71af77f741f0a3abf054e30c5d414a8b61890d612c075f0a82a3e682e1610f7ab52a56bb2122a12200b04920eca11fe080538fd2e165d7e9547b90ee42879e36a267f37dbaa0beee01623bbb58618d1f8122a12208a68f5e8b6a57ef6515d1a9deebb060766edf4dd71f925a37eafa2fadc125ac616235bff420e07ec  | xxd -r -p | protoc --decode_raw
3 {
  1 {
    1: "/klk/prod/2020-01-09"
    2: "\022 \320$\320\311\211\0279G)\301\t\335\370\331L\230\323o\363\211\233\177?m\024\213\216k\257\010M\303\026#k\034f\352\212\216"
    2: "\022 /\344\200\327\032\367\177t\037\n:\277\005N0\305\324\024\250\266\030\220\326\022\300u\360\250*>h.\026\020\367\253R\245k\262"
    2: "\022 \013\004\222\016\312\021\376\010\0058\375.\026]~\225G\271\016\344(y\343j&\1777\333\252\013\356\340\026#\273\265\206\030\321\370"
    2: "\022 \212h\365\350\266\245~\366Q]\032\235\356\273\006\007f\355\364\335q\371%\243~\257\242\372\334\022Z\306\026#[\377B\016\007\354"
  }
}

@rklaehn
Contributor Author

rklaehn commented Jul 24, 2020

I am getting message IDs in the control messages which are not UTF-8 strings, causing the parsing of the entire message to fail...

Jul 24 14:26:05.599  WARN libp2p_gossipsub::protocol: message id not an utf8 string: 12202fe480d71af77f741f0a3abf054e30c5d414a8b61890d612c075f0a82a3e682e1610f7ab52a57069    
Jul 24 14:26:05.599  WARN libp2p_gossipsub::protocol: message id not an utf8 string: 1220d024d0c98917394729c109ddf8d94c98d36ff3899b7f3f6d148b8e6baf084dc316236b1c66ea8e82    
Jul 24 14:26:05.599  WARN libp2p_gossipsub::protocol: message id not an utf8 string: 12208a68f5e8b6a57ef6515d1a9deebb060766edf4dd71f925a37eafa2fadc125ac616235bff420e08f9    
Jul 24 14:26:05.599  WARN libp2p_gossipsub::protocol: message id not an utf8 string: 12200b04920eca11fe080538fd2e165d7e9547b90ee42879e36a267f37dbaa0beee01623bbb58618d5f2
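For context: these IDs look like go-libp2p-pubsub's default message ID, which (as far as I can tell) is the raw source peer ID bytes (a sha2-256 multihash, prefix 0x12 0x20) concatenated with the sequence number bytes, so there is no reason for them to be valid UTF-8. A std-only Rust sketch checking one of the IDs from the log above (the multihash interpretation is an assumption, not something the log itself proves):

```rust
// Sketch: decode one of the logged message IDs and show why it is not UTF-8.
// Assumption: 0x12 = sha2-256 multihash code, 0x20 = 32-byte digest length,
// as in go-libp2p-pubsub's default message-id function (peer ID + seqno).

fn hex_to_bytes(s: &str) -> Vec<u8> {
    (0..s.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&s[i..i + 2], 16).unwrap())
        .collect()
}

fn main() {
    // Second ID from the warnings above.
    let id = hex_to_bytes(
        "1220d024d0c98917394729c109ddf8d94c98d36ff3899b7f3f6d148b8e6baf084dc316236b1c66ea8e82",
    );
    assert_eq!(id[0], 0x12); // multihash code: sha2-256
    assert_eq!(id[1], 0x20); // digest length: 32 bytes
    assert!(std::str::from_utf8(&id).is_err()); // not valid UTF-8
    println!("id is {} raw bytes, not a UTF-8 string", id.len());
}
```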

@rklaehn
Contributor Author

rklaehn commented Jul 24, 2020

I changed MessageId to be an arbitrary Vec<u8> instead of just a string, and suddenly things work 🤷. Going to check with Protocol Labs on how to proceed with this...
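The shape of that change, sketched with illustrative names (not the exact code from the branch): a newtype over Vec<u8> with hex formatting for logs, instead of a String:

```rust
// Sketch only: a byte-based message ID with a hex Display impl, so arbitrary
// IDs from go peers can be stored and still logged readably.
#[derive(Clone, PartialEq, Eq, Hash, Debug)]
pub struct MessageId(pub Vec<u8>);

impl std::fmt::Display for MessageId {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        for b in &self.0 {
            write!(f, "{:02x}", b)?;
        }
        Ok(())
    }
}

fn main() {
    let id = MessageId(vec![0x12, 0x20, 0xd0, 0x24]);
    assert_eq!(id.to_string(), "1220d024");
    println!("message id: {}", id);
}
```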

@rklaehn
Contributor Author

rklaehn commented Jul 24, 2020

Issue in go-libp2p-gossipsub: libp2p/go-libp2p-pubsub#361

Workaround: Actyx@536ce88 (requires changing the .proto, which we probably don't want to do)
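For reference, the shape of the .proto change (field names per the pubsub spec's rpc.proto; shown only to illustrate why the workaround touches the wire schema):

```proto
message ControlIHave {
  optional string topicID = 1;
  repeated bytes messageIDs = 2; // previously: repeated string messageIDs
}

message ControlIWant {
  repeated bytes messageIDs = 1; // previously: repeated string messageIDs
}
```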

@rklaehn rklaehn changed the title Gossipsub subscribe race condition? Gossipsub interop issue with go-ipfs Jul 24, 2020
@AgeManning
Contributor

Ok, awesome. Thanks for tracking this down. Seems like we should relax the string condition in the protobuf and generalise the message ID.

I'll update this and merge your commit.

Thanks again.

If you're happy with this solution, I'll make these changes and finish off #1583, which should resolve this issue as well as #1674

@rklaehn
Contributor Author

rklaehn commented Jul 25, 2020

Yep, both issues should be fixed. I might have a few other small things, now that I am a bit more familiar with the code base.

The PR to the specs has been merged, so the message ID is now officially bytes... libp2p/specs@c5b328a

@AgeManning
Contributor

This should be resolved in #1583
