Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug: strange errors when light pushing messages with payload >= 300 kb #1076

Closed
fbarbu15 opened this issue Apr 3, 2024 · 11 comments · Fixed by #1083
Closed

bug: strange errors when light pushing messages with payload >= 300 kb #1076

fbarbu15 opened this issue Apr 3, 2024 · 11 comments · Fixed by #1083
Assignees
Labels
bug Something isn't working

Comments

@fbarbu15
Copy link

fbarbu15 commented Apr 3, 2024

When the payload is >= 300 kb the light push fails with some strange messages

  1. Payload ~= 300 KB
payload=$(head -c $((1024 * 225)) /dev/urandom | base64 -w 0) && \
echo "{\"pubsubTopic\": \"/waku/2/rs/0/0\", \"message\": {\"payload\": \"${payload}\", \"contentTopic\": \"/myapp/1/latest/proto\", \"timestamp\": 1712135330213797632}}" > /tmp/message_payload.json && \
curl -v -X POST "http://127.0.0.1:59667/lightpush/v1/message" -H "Content-Type: application/json" -d @/tmp/message_payload.json

Error is EOF with 503 Service Unavailable

  1. Payload larger than 1MB
payload=$(head -c $((1024 * 1024)) /dev/urandom | base64 -w 0) && \
echo "{\"pubsubTopic\": \"/waku/2/rs/0/0\", \"message\": {\"payload\": \"${payload}\", \"contentTopic\": \"/myapp/1/latest/proto\", \"timestamp\": 1712135330213797632}}" > /tmp/message_payload.json && \
curl -v -X POST "http://127.0.0.1:59667/lightpush/v1/message" -H "Content-Type: application/json" -d @/tmp/message_payload.json

Error is stream reset with 503 Service Unavailable

relay_node.log
light_push_node_that_sends_the_messages.log

@fbarbu15
Copy link
Author

Issue still reproduces with latest master

@richard-ramos
Copy link
Member

I'm getting this response with latest master: Could not publish message: message size exceeds gossipsub max message size for both 300kb and 1mb message:

* Mark bundle as not supporting multiuse
< HTTP/1.1 100 Continue
* We are completely uploaded and fine
* Mark bundle as not supporting multiuse
< HTTP/1.1 503 Service Unavailable
< Access-Control-Allow-Origin: *
< Cache-Control: no-cache, no-store, no-transform, must-revalidate, private, max-age=0
< Expires: Wed, 31 Dec 1969 19:30:00 -0430
< Pragma: no-cache
< X-Accel-Expires: 0
< Date: Tue, 16 Apr 2024 19:38:12 GMT
< Content-Length: 74
< Content-Type: text/plain; charset=utf-8
< 
* Connection #0 to host 127.0.0.1 left intact
Could not publish message: message size exceeds gossipsub max message size

please do share the script you used to test this scenario. I might be missing some flag

@chaitanyaprem
Copy link
Collaborator

chaitanyaprem commented Apr 17, 2024

This could be due to the flag defaultMaxMsgSize which is set to 150KB by default for TWN.
The lightpush service/relay node should include the flag --max-msg-size. Flag value format is Maximum message size. Supported formats are B, KiB, KB, MiB. If no suffix, default is bytes

@fbarbu15
Copy link
Author

@richard-ramos here's the script

#!/bin/bash
printf "\nAssuming you already have a docker network called waku\n"
# if not something like this should create it: docker network create --driver bridge --subnet 172.18.0.0/16 --gateway 172.18.0.1 waku


cluster_id=2
pubsub_topic="/waku/2/rs/$cluster_id/0"
node_1=harbor.status.im/wakuorg/nwaku:latest
node_2=harbor.status.im/wakuorg/go-waku:latest
ext_ip="172.18.204.9"
tcp_port="37344"
ts=`date +%s%N`

printf "\nStarting containers\n"

container_id1=$(docker run -d -i -t -p 37343:37343 -p $tcp_port:$tcp_port -p 37345:37345 -p 37346:37346 -p 37347:37347 $node_1 --listen-address=0.0.0.0 --rest=true --rest-admin=true --websocket-support=true --log-level=TRACE --rest-relay-cache-capacity=100 --websocket-port=37345 --rest-port=37343 --tcp-port=$tcp_port --discv5-udp-port=37346 --rest-address=0.0.0.0 --nat=extip:$ext_ip --peer-exchange=true --discv5-discovery=true --cluster-id=$cluster_id --metrics-server=true --metrics-server-address=0.0.0.0 --metrics-server-port=37347 --metrics-logging=true --pubsub-topic=/waku/2/rs/2/0 --lightpush=true --relay=true)
docker network connect --ip $ext_ip waku $container_id1

printf "\nSleeping 2 seconds\n"
sleep 2

response=$(curl -X GET "http://127.0.0.1:37343/debug/v1/info" -H "accept: application/json")
enrUri=$(echo $response | jq -r '.enrUri')

# Extract the first non-WebSocket address
ws_address=$(echo $response | jq -r '.listenAddresses[] | select(contains("/ws") | not)')

# Check if we got an address, and construct the new address with it
if [[ $ws_address != "" ]]; then
    identifier=$(echo $ws_address | awk -F'/p2p/' '{print $2}')
    if [[ $identifier != "" ]]; then
        multiaddr_with_id="/ip4/${ext_ip}/tcp/${tcp_port}/p2p/${identifier}"
        echo $multiaddr_with_id
    else
        echo "No identifier found in the address."
        exit 1
    fi
else
    echo "No non-WebSocket address found."
    exit 1
fi

container_id2=$(docker run -d -i -t -p 25908:25908 -p 25909:25909 -p 25910:25910 -p 25911:25911 -p 25912:25912 $node_2 --listen-address=0.0.0.0 --rest=true --rest-admin=true --websocket-support=true --log-level=DEBUG --rest-relay-cache-capacity=100 --websocket-port=25910 --rest-port=25908 --tcp-port=25909 --discv5-udp-port=25911 --rest-address=0.0.0.0 --nat=extip:172.18.141.214 --peer-exchange=true --discv5-discovery=true --cluster-id=$cluster_id --min-relay-peers-to-publish=1 --rest-filter-cache-capacity=50 --pubsub-topic=/waku/2/rs/2/0 --lightpush=true --relay=false --discv5-bootstrap-node=$enrUri --lightpushnode=$multiaddr_with_id)

docker network connect --ip 172.18.141.214 waku $container_id2

printf "\nSleeping 10 seconds\n"
sleep 10

printf "\nSubscribe\n"
curl -v -X POST "http://127.0.0.1:37343/relay/v1/subscriptions" -H "Content-Type: application/json" -d '["/waku/2/rs/2/0"]'


printf "\nSleeping 2 seconds\n"
sleep 2

printf "\nLightpush messagewith 300kb\n"                            
payload=$(head -c $((1024 * 225)) /dev/urandom | base64 -w 0) && \
echo "{\"pubsubTopic\": \"${pubsub_topic}\", \"message\": {\"payload\": \"${payload}\", \"contentTopic\": \"/myapp/1/latest/proto\", \"timestamp\": $ts}}" > /tmp/message_payload.json && \
curl -v -X POST "http://127.0.0.1:25908/lightpush/v1/message" -H "Content-Type: application/json" -d @/tmp/message_payload.json


printf "\nPayload larger than 1MB\n" 
payload=$(head -c $((1024 * 1024)) /dev/urandom | base64 -w 0) && \
echo "{\"pubsubTopic\": \"${pubsub_topic}\", \"message\": {\"payload\": \"${payload}\", \"contentTopic\": \"/myapp/1/latest/proto\", \"timestamp\": $ts}}" > /tmp/message_payload.json && \
curl -v -X POST "http://127.0.0.1:25908/lightpush/v1/message" -H "Content-Type: application/json" -d @/tmp/message_payload.json

@chaitanyaprem
Copy link
Collaborator

@richard-ramos here's the script

#!/bin/bash
printf "\nAssuming you already have a docker network called waku\n"
# if not something like this should create it: docker network create --driver bridge --subnet 172.18.0.0/16 --gateway 172.18.0.1 waku


cluster_id=2
pubsub_topic="/waku/2/rs/$cluster_id/0"
node_1=harbor.status.im/wakuorg/nwaku:latest
node_2=harbor.status.im/wakuorg/go-waku:latest
ext_ip="172.18.204.9"
tcp_port="37344"
ts=`date +%s%N`

printf "\nStarting containers\n"

container_id1=$(docker run -d -i -t -p 37343:37343 -p $tcp_port:$tcp_port -p 37345:37345 -p 37346:37346 -p 37347:37347 $node_1 --listen-address=0.0.0.0 --rest=true --rest-admin=true --websocket-support=true --log-level=TRACE --rest-relay-cache-capacity=100 --websocket-port=37345 --rest-port=37343 --tcp-port=$tcp_port --discv5-udp-port=37346 --rest-address=0.0.0.0 --nat=extip:$ext_ip --peer-exchange=true --discv5-discovery=true --cluster-id=$cluster_id --metrics-server=true --metrics-server-address=0.0.0.0 --metrics-server-port=37347 --metrics-logging=true --pubsub-topic=/waku/2/rs/2/0 --lightpush=true --relay=true)
docker network connect --ip $ext_ip waku $container_id1

printf "\nSleeping 2 seconds\n"
sleep 2

response=$(curl -X GET "http://127.0.0.1:37343/debug/v1/info" -H "accept: application/json")
enrUri=$(echo $response | jq -r '.enrUri')

# Extract the first non-WebSocket address
ws_address=$(echo $response | jq -r '.listenAddresses[] | select(contains("/ws") | not)')

# Check if we got an address, and construct the new address with it
if [[ $ws_address != "" ]]; then
    identifier=$(echo $ws_address | awk -F'/p2p/' '{print $2}')
    if [[ $identifier != "" ]]; then
        multiaddr_with_id="/ip4/${ext_ip}/tcp/${tcp_port}/p2p/${identifier}"
        echo $multiaddr_with_id
    else
        echo "No identifier found in the address."
        exit 1
    fi
else
    echo "No non-WebSocket address found."
    exit 1
fi

container_id2=$(docker run -d -i -t -p 25908:25908 -p 25909:25909 -p 25910:25910 -p 25911:25911 -p 25912:25912 $node_2 --listen-address=0.0.0.0 --rest=true --rest-admin=true --websocket-support=true --log-level=DEBUG --rest-relay-cache-capacity=100 --websocket-port=25910 --rest-port=25908 --tcp-port=25909 --discv5-udp-port=25911 --rest-address=0.0.0.0 --nat=extip:172.18.141.214 --peer-exchange=true --discv5-discovery=true --cluster-id=$cluster_id --min-relay-peers-to-publish=1 --rest-filter-cache-capacity=50 --pubsub-topic=/waku/2/rs/2/0 --lightpush=true --relay=false --discv5-bootstrap-node=$enrUri --lightpushnode=$multiaddr_with_id)

docker network connect --ip 172.18.141.214 waku $container_id2

printf "\nSleeping 10 seconds\n"
sleep 10

printf "\nSubscribe\n"
curl -v -X POST "http://127.0.0.1:37343/relay/v1/subscriptions" -H "Content-Type: application/json" -d '["/waku/2/rs/2/0"]'


printf "\nSleeping 2 seconds\n"
sleep 2

printf "\nLightpush messagewith 300kb\n"                            
payload=$(head -c $((1024 * 225)) /dev/urandom | base64 -w 0) && \
echo "{\"pubsubTopic\": \"${pubsub_topic}\", \"message\": {\"payload\": \"${payload}\", \"contentTopic\": \"/myapp/1/latest/proto\", \"timestamp\": $ts}}" > /tmp/message_payload.json && \
curl -v -X POST "http://127.0.0.1:25908/lightpush/v1/message" -H "Content-Type: application/json" -d @/tmp/message_payload.json


printf "\nPayload larger than 1MB\n" 
payload=$(head -c $((1024 * 1024)) /dev/urandom | base64 -w 0) && \
echo "{\"pubsubTopic\": \"${pubsub_topic}\", \"message\": {\"payload\": \"${payload}\", \"contentTopic\": \"/myapp/1/latest/proto\", \"timestamp\": $ts}}" > /tmp/message_payload.json && \
curl -v -X POST "http://127.0.0.1:25908/lightpush/v1/message" -H "Content-Type: application/json" -d @/tmp/message_payload.json

Looks like the max-msg-size flag is not passed in the relay node. Can you rerun passing it with 1024KiB since it is nwaku node? nwaku only supports KiB, KB and Bytes for this flag.

@fbarbu15
Copy link
Author

@chaitanyaprem I did but there's not change, I get the same 503 errors

@chaitanyaprem
Copy link
Collaborator

When i finally ran the script after making some changes as i am using mac, i get 503 error but due to nwaku not able to publish as there are no peers for relay. I think it could be because the min-peers required for publishing via relay maybe set to 1 .


:6620e42e1b879d00e6d59e64 ms=/vac/waku/lightpush/2.0.0-beta1
2024-04-18 14:43:18 TRC 2024-04-18 09:13:18.372+00:00 found handler                              topics="libp2p multistream" tid=1 file=multistream.nim:181 conn=16U*6U3NiN:6620e42e1b879d00e6d59e64 protocol=/vac/waku/lightpush/2.0.0-beta1
2024-04-18 14:43:18 TRC 2024-04-18 09:13:18.372+00:00 try to send the buffer                     topics="libp2p yamux" tid=1 file=yamux.nim:361 h="{Data, {}, streamId: 5, length: 33}"
2024-04-18 14:43:18 TRC 2024-04-18 09:13:18.372+00:00 found handler                              topics="libp2p multistream" tid=1 file=multistream.nim:207 conn=16U*6U3NiN:6620e42e1b879d00e6d59e64 protocol=/vac/waku/lightpush/2.0.0-beta1
2024-04-18 14:43:18 DBG 2024-04-18 09:13:18.375+00:00 push request                               topics="waku lightpush" tid=1 file=protocol.nim:63 peerId=16U*6U3NiN requestId=5771ff90b387bd2d154ef843fa8e631d962a4016168ae6c5baab7265f2da31bc pubsubTopic=/waku/2/rs/2/0 hash=0xc1a16b9f01f143cfad006e3a94b7146dc8a9d3aafcac4e11ecb972ea39ffb818
2024-04-18 14:43:18 TRC 2024-04-18 09:13:18.378+00:00 waku.relay received                        topics="waku node" tid=1 file=waku_node.nim:223 peerId=16U*SJwZi2 pubsubTopic=/waku/2/rs/2/0 hash=0xc1a16b9f01f143cfad006e3a94b7146dc8a9d3aafcac4e11ecb972ea39ffb818 receivedTime=1713431598378322176 payloadSizeBytes=15
2024-04-18 14:43:18 TRC 2024-04-18 09:13:18.381+00:00 Publishing message on topic                topics="libp2p gossipsub" tid=1 file=gossipsub.nim:617 data=0a0f54657374...aa0100f80100 topic=/waku/2/rs/2/0
2024-04-18 14:43:18 TRC 2024-04-18 09:13:18.382+00:00 about to replenish fanout                  topics="libp2p gossipsub" tid=1 file=behavior.nim:558 topic=/waku/2/rs/2/0
2024-04-18 14:43:18 TRC 2024-04-18 09:13:18.383+00:00 replenishing fanout                        topics="libp2p gossipsub" tid=1 file=behavior.nim:562 peers=0 topic=/waku/2/rs/2/0
2024-04-18 14:43:18 TRC 2024-04-18 09:13:18.383+00:00 fanout replenished with peers              topics="libp2p gossipsub" tid=1 file=behavior.nim:569 peers=0 topic=/waku/2/rs/2/0
2024-04-18 14:43:18 DBG 2024-04-18 09:13:18.383+00:00 No peers for topic, skipping publish       topics="libp2p gossipsub" tid=1 file=gossipsub.nim:671 peersOnTopic=0 connectedPeers=0 topic=/waku/2/rs/2/0
2024-04-18 14:43:18 DBG 2024-04-18 09:13:18.384+00:00 Lightpush request has not been published to any peers topics="waku node" tid=1 file=waku_node.nim:817

Looks like nim-libp2p gossip publish fails if there are no peers to publish.

@chaitanyaprem
Copy link
Collaborator

Further update, as a work-around i have enabled relay on go-waku node as well the 300Kb message goes through but 1MB still fails. Trying to figure out the cause for the same.

@chaitanyaprem
Copy link
Collaborator

chaitanyaprem commented Apr 18, 2024

For the second request i see following error where stream is being reset from nwaku.

2024-04-18 15:25:55 2024-04-18T09:55:55.140Z    INFO    gowaku.lightpush        rest/lightpush_rest.go:81       =================================> lightpush request proto conversion successful   {"node": "16Uiu2HAmJzb3z9mNrwGhUeEakLRJAsnzLgmaPtALeJQ1h19fYxTk"}
2024-04-18 15:25:55 2024-04-18T09:55:55.140Z    DEBUG   gowaku.node2.lightpush  lightpush/waku_lightpush.go:331 publishing message      {"node": "16Uiu2HAmJzb3z9mNrwGhUeEakLRJAsnzLgmaPtALeJQ1h19fYxTk", "hash": "0x13d2d97958ab0f1b03b93231c8a836a2660f20eac28ec8f3ee5ec7f7634157be", "pubsubTopic": "/waku/2/rs/2/0", "contentTopic": "/myapp/1/latest/proto", "timestamp": 1713434138, "peerID": "16Uiu2HAm1CHSSyo4MzJyuAgzP1bewEVLajKAGRLYa5e7pdmhXBge"}
2024-04-18 15:25:55 2024-04-18T09:55:55.141Z    DEBUG   swarm2  swarm/swarm.go:455      [16Uiu2HAmJzb3z9mNrwGhUeEakLRJAsnzLgmaPtALeJQ1h19fYxTk] opening stream to peer [16Uiu2HAm1CHSSyo4MzJyuAgzP1bewEVLajKAGRLYa5e7pdmhXBge]     {"node": "16Uiu2HAmJzb3z9mNrwGhUeEakLRJAsnzLgmaPtALeJQ1h19fYxTk"}
2024-04-18 15:25:55 2024-04-18T09:55:55.146Z    ERROR   gowaku.node2.lightpush  lightpush/waku_lightpush.go:217 writing request {"node": "16Uiu2HAmJzb3z9mNrwGhUeEakLRJAsnzLgmaPtALeJQ1h19fYxTk", "peer": "16Uiu2HAm1CHSSyo4MzJyuAgzP1bewEVLajKAGRLYa5e7pdmhXBge", "error": "stream reset"}
2024-04-18 15:25:55 2024-04-18T09:55:55.146Z    INFO    gowaku.node2.connection-notifier        node/connectedness.go:78        peer disconnected       {"node": "16Uiu2HAmJzb3z9mNrwGhUeEakLRJAsnzLgmaPtALeJQ1h19fYxTk", "peer": "16Uiu2HAm1CHSSyo4MzJyuAgzP1bewEVLajKAGRLYa5e7pdmhXBge"}
2024-04-18 15:25:55 2024-04-18T09:55:55.146Z    ERROR   gowaku.node2.lightpush  lightpush/waku_lightpush.go:335 could not publish message       {"node": "16Uiu2HAmJzb3z9mNrwGhUeEakLRJAsnzLgmaPtALeJQ1h19fYxTk", "hash": "0x13d2d97958ab0f1b03b93231c8a836a2660f20eac28ec8f3ee5ec7f7634157be", "pubsubTopic": "/waku/2/rs/2/0", "contentTopic": "/myapp/1/latest/proto", "timestamp": 1713434138, "peerID": "16Uiu2HAm1CHSSyo4MzJyuAgzP1bewEVLajKAGRLYa5e7pdmhXBge", "error": "stream reset"}
2024-04-18 15:25:55 2024/04/18 09:55:55 "POST http://127.0.0.1:25908/lightpush/v1/message HTTP/1.1" from 192.168.65.1:19925 - 503 12B in 19.26725ms

Looking at nwaku logs, i notice the following error , looks like some limit is being hit.


2024-04-18 15:25:55 TRC 2024-04-18 09:55:55.142+00:00 waiting for header                         topics="libp2p yamux" tid=1 file=yamux.nim:520
2024-04-18 15:25:55 TRC 2024-04-18 09:55:55.142+00:00 readFrame                                  topics="libp2p noise" tid=1 file=noise.nim:312 sconn=16U*9fYxTk:6620ee1c2d9a70069c096332 size=65535
2024-04-18 15:25:55 TRC 2024-04-18 09:55:55.142+00:00 handle: got request                        topics="libp2p multistream" tid=1 file=multistream.nim:160 conn=16U*9fYxTk:6620ee2b2d9a70069c09633a ms=/multistream/1.0.0
2024-04-18 15:25:55 TRC 2024-04-18 09:55:55.142+00:00 try to send the buffer                     topics="libp2p yamux" tid=1 file=yamux.nim:361 h="{Data, {}, streamId: 7, length: 20}"
2024-04-18 15:25:55 TRC 2024-04-18 09:55:55.142+00:00 handle: got request                        topics="libp2p multistream" tid=1 file=multistream.nim:160 conn=16U*9fYxTk:6620ee2b2d9a70069c09633a ms=/vac/waku/lightpush/2.0.0-beta1
2024-04-18 15:25:55 TRC 2024-04-18 09:55:55.142+00:00 found handler                              topics="libp2p multistream" tid=1 file=multistream.nim:181 conn=16U*9fYxTk:6620ee2b2d9a70069c09633a protocol=/vac/waku/lightpush/2.0.0-beta1
2024-04-18 15:25:55 TRC 2024-04-18 09:55:55.142+00:00 try to send the buffer                     topics="libp2p yamux" tid=1 file=yamux.nim:361 h="{Data, {}, streamId: 7, length: 33}"
2024-04-18 15:25:55 TRC 2024-04-18 09:55:55.142+00:00 found handler                              topics="libp2p multistream" tid=1 file=multistream.nim:207 conn=16U*9fYxTk:6620ee2b2d9a70069c09633a protocol=/vac/waku/lightpush/2.0.0-beta1
2024-04-18 15:25:55 TRC 2024-04-18 09:55:55.142+00:00 Exception in multistream                   topics="libp2p multistream" tid=1 file=multistream.nim:228 conn=16U*9fYxTk:6620ee2b2d9a70069c09633a msg="Message exceeds maximum length"

@chaitanyaprem
Copy link
Collaborator

chaitanyaprem commented Apr 18, 2024

Ok, on further debugging i have figured out there is an issue in nwaku wrt how max-msg-size is being used in other protocols like lightpush which seems to be causing this 503 error.

@chaitanyaprem
Copy link
Collaborator

Reran your test script after the changes in waku-org/nwaku#2601 and it passed.

So, closing this issue as this is not go-waku related anymore. Once that PR is merged, please rerun.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

3 participants