Distributed Inference tries to connect twice to the same client #3364
Comments
I see only a single worker being started:
Are you sure that isn't crashing because you don't have enough memory to run two workers?
I tried with some other models and yeah, that was exactly the problem. Sorry and thanks for your time!
LocalAI version:
v2.20.1
Environment, CPU architecture, OS, and Version:
Ubuntu Server, Raspberry Pi OS
Describe the bug
When trying to initiate inference through P2P, each worker gets connected to twice and the main node tries to use twice the maximum amount of memory actually available.
To Reproduce
Start the main node: ./local-ai-Linux-x86_64 run --p2p
Start worker1: ./local-ai-Linux-arm64 worker p2p-llama-cpp-rpc
Start worker2: ./local-ai-Linux-x86_64 worker p2p-llama-cpp-rpc
(The same token was established on all three nodes; see the consolidated sketch below.)
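For reference, a minimal sketch of the three invocations as I ran them. Passing the shared token via a TOKEN environment variable and the placeholder value are assumptions here; substitute however you actually set the token:

```sh
# Main node (x86_64 box), started with p2p enabled
./local-ai-Linux-x86_64 run --p2p

# Worker 1 (arm64 box) -- TOKEN is assumed; use your real shared p2p token
TOKEN=<shared-p2p-token> ./local-ai-Linux-arm64 worker p2p-llama-cpp-rpc

# Worker 2 (x86_64 box)
TOKEN=<shared-p2p-token> ./local-ai-Linux-x86_64 worker p2p-llama-cpp-rpc
```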
Expected behavior
The main node should only try to use the memory that is actually available on each worker.
Logs
Main node logs:
6:15PM INF Trying to load the model 'Meta-Llama-3.1-70B-Instruct.Q4_K_M.gguf' with the backend '[llama-cpp llama-ggml llama-cpp-fallback piper rwkv whisper huggingface bert-embeddings]'
6:15PM INF [llama-cpp] Attempting to load
6:15PM INF Loading model 'Meta-Llama-3.1-70B-Instruct.Q4_K_M.gguf' with backend llama-cpp
6:15PM INF [llama-cpp-grpc] attempting to load with GRPC variant
6:15PM INF Redirecting 127.0.0.1:40815 to /ip4/192.168.18.44/udp/50955/quic-v1
6:15PM INF Redirecting 127.0.0.1:39147 to /ip4/192.168.18.103/udp/50707/quic-v1
6:15PM INF Redirecting 127.0.0.1:40815 to /ip4/192.168.18.44/udp/50955/quic-v1
6:15PM INF Redirecting 127.0.0.1:39147 to /ip4/192.168.18.103/udp/50707/quic-v1
6:16PM INF Redirecting 127.0.0.1:40815 to /ip4/192.168.18.44/udp/50955/quic-v1
6:16PM INF [llama-cpp] Fails: could not load model: rpc error: code = Canceled desc =
6:16PM INF [llama-cpp] Autodetection failed, trying the fallback
6:16PM INF Loading model 'Meta-Llama-3.1-70B-Instruct.Q4_K_M.gguf' with backend llama-cpp-avx2
Worker1 logs:
6:15PM INF Setting logging to info
{"level":"INFO","time":"2024-08-23T18:15:12.365+0200","caller":"config/config.go:288","message":"connmanager disabled\n"}
6:15PM INF Starting llama-cpp-rpc-server on '127.0.0.1:46589'
{"level":"INFO","time":"2024-08-23T18:15:12.368+0200","caller":"config/config.go:292","message":" go-libp2p resource manager protection disabled"}
{"level":"INFO","time":"2024-08-23T18:15:12.373+0200","caller":"node/node.go:118","message":" Starting EdgeVPN network"}
create_backend: using CPU backend
Starting RPC server on 127.0.0.1:46589, backend memory: 7810 MB
2024/08/23 18:15:12 failed to sufficiently increase receive buffer size (was: 208 kiB, wanted: 7168 kiB, got: 416 kiB). See https://github.com/quic-go/quic-go/wiki/UDP-Buffer-Sizes for details.
{"level":"INFO","time":"2024-08-23T18:15:12.580+0200","caller":"node/node.go:172","message":" Node ID: 12D3KooWJop5nYfGyoTD1sFXBhk5rDDSevDhXVBJRqXsd2otATkT"}
{"level":"INFO","time":"2024-08-23T18:15:12.581+0200","caller":"node/node.go:173","message":" Node Addresses: [/ip4/127.0.0.1/tcp/42603 /ip4/127.0.0.1/udp/38003/quic-v1 /ip4/127.0.0.1/udp/50707/quic-v1/webtransport/certhash/uEiB-CnmLhKE3dajiHaTCVUnMUNKuJdd_otIHVL0SxCMNuw/certhash/uEiDTFDCkNza6tRwGRAa18mMaeyT-gW5FP57RB7Si5ZGV1Q /ip4/127.0.0.1/udp/52519/webrtc-direct/certhash/uEiAYQZdGzXH8vs2krcAiHVRZvPEjnnKwYtBGUw0rAryi2Q /ip4/192.168.18.103/tcp/42603 /ip4/192.168.18.103/udp/38003/quic-v1 /ip4/192.168.18.103/udp/50707/quic-v1/webtransport/certhash/uEiB-CnmLhKE3dajiHaTCVUnMUNKuJdd_otIHVL0SxCMNuw/certhash/uEiDTFDCkNza6tRwGRAa18mMaeyT-gW5FP57RB7Si5ZGV1Q /ip4/192.168.18.103/udp/52519/webrtc-direct/certhash/uEiAYQZdGzXH8vs2krcAiHVRZvPEjnnKwYtBGUw0rAryi2Q /ip6/::1/tcp/44529 /ip6/::1/udp/41447/quic-v1 /ip6/::1/udp/44894/quic-v1/webtransport/certhash/uEiB-CnmLhKE3dajiHaTCVUnMUNKuJdd_otIHVL0SxCMNuw/certhash/uEiDTFDCkNza6tRwGRAa18mMaeyT-gW5FP57RB7Si5ZGV1Q /ip6/::1/udp/52411/webrtc-direct/certhash/uEiAYQZdGzXH8vs2krcAiHVRZvPEjnnKwYtBGUw0rAryi2Q]"}
{"level":"INFO","time":"2024-08-23T18:15:12.586+0200","caller":"discovery/dht.go:104","message":" Bootstrapping DHT"}
{"level":"WARN","time":"2024-08-23T18:15:12.591+0200","caller":"node/connection.go:226","message":"publish error: no message room available\n"}
{"level":"WARN","time":"2024-08-23T18:15:12.593+0200","caller":"node/connection.go:226","message":"publish error: no message room available\n"}
Accepted client connection, free_mem=8189411328, total_mem=8189411328
Client connection closed
Accepted client connection, free_mem=8189411328, total_mem=8189411328
Client connection closed
Worker2 logs:
4:15PM INF Setting logging to info
{"level":"INFO","time":"2024-08-23T16:15:05.787Z","caller":"config/config.go:288","message":"connmanager disabled\n"}
4:15PM INF Starting llama-cpp-rpc-server on '127.0.0.1:39571'
{"level":"INFO","time":"2024-08-23T16:15:05.787Z","caller":"config/config.go:292","message":" go-libp2p resource manager protection disabled"}
{"level":"INFO","time":"2024-08-23T16:15:05.788Z","caller":"node/node.go:118","message":" Starting EdgeVPN network"}
create_backend: using CPU backend
Starting RPC server on 127.0.0.1:39571, backend memory: 11854 MB
2024/08/23 16:15:05 failed to sufficiently increase receive buffer size (was: 208 kiB, wanted: 7168 kiB, got: 416 kiB). See https://github.com/quic-go/quic-go/wiki/UDP-Buffer-Sizes for details.
{"level":"INFO","time":"2024-08-23T16:15:05.805Z","caller":"node/node.go:172","message":" Node ID: 12D3KooWPDeW43Br3JLzghRS6iZ8zt7czRnYmjtystNhggiwLVR1"}
{"level":"INFO","time":"2024-08-23T16:15:05.805Z","caller":"node/node.go:173","message":" Node Addresses: [/ip4/127.0.0.1/tcp/35197 /ip4/127.0.0.1/udp/34605/quic-v1/webtransport/certhash/uEiAzVFqmJ6DXPuTPEI27rSYGhhvItfVf61PkHnfrRaIQkw/certhash/uEiBcxxt249su8zlNVJzkJibG80o5MVnRnTFVAcUsl4t3aA /ip4/127.0.0.1/udp/36434/webrtc-direct/certhash/uEiBKiZJ4GK-ATY0rWZvpFoJBoH-wFYhtz3hnFLWsI2Cljg /ip4/127.0.0.1/udp/50955/quic-v1 /ip4/192.168.18.44/tcp/35197 /ip4/192.168.18.44/udp/34605/quic-v1/webtransport/certhash/uEiAzVFqmJ6DXPuTPEI27rSYGhhvItfVf61PkHnfrRaIQkw/certhash/uEiBcxxt249su8zlNVJzkJibG80o5MVnRnTFVAcUsl4t3aA /ip4/192.168.18.44/udp/36434/webrtc-direct/certhash/uEiBKiZJ4GK-ATY0rWZvpFoJBoH-wFYhtz3hnFLWsI2Cljg /ip4/192.168.18.44/udp/50955/quic-v1 /ip6/::1/tcp/40095 /ip6/::1/udp/43292/webrtc-direct/certhash/uEiBKiZJ4GK-ATY0rWZvpFoJBoH-wFYhtz3hnFLWsI2Cljg /ip6/::1/udp/56079/quic-v1 /ip6/::1/udp/58500/quic-v1/webtransport/certhash/uEiAzVFqmJ6DXPuTPEI27rSYGhhvItfVf61PkHnfrRaIQkw/certhash/uEiBcxxt249su8zlNVJzkJibG80o5MVnRnTFVAcUsl4t3aA]"}
{"level":"INFO","time":"2024-08-23T16:15:05.806Z","caller":"discovery/dht.go:104","message":" Bootstrapping DHT"}
Accepted client connection, free_mem=12430540800, total_mem=12430540800
Client connection closed
Accepted client connection, free_mem=12430540800, total_mem=12430540800
Client connection closed
Accepted client connection, free_mem=12430540800, total_mem=12430540800
ggml_backend_cpu_buffer_type_alloc_buffer: failed to allocate buffer of size 24512430112
Client connection closed
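For what it's worth, the buffer llama.cpp tried to allocate on worker2 is almost exactly twice the memory that worker reports as free, which is what makes me suspect the double connection. A rough check of the two byte counts from the log above:

```sh
# 24512430112 = failed buffer allocation on worker2, 12430540800 = worker2 free_mem (bytes)
awk 'BEGIN { printf "%.2f\n", 24512430112 / 12430540800 }'   # prints ~1.97, i.e. roughly 2x
```

Note also that the two workers together only advertise about 8.2 GB + 12.4 GB ≈ 20.6 GB, which (as discussed in the comments above) is well short of what a 70B Q4_K_M model typically needs, on the order of 40 GB.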
Additional context
I tried a couple of times; when worker1 was the first one used, it showed the same error as worker2, but with double its corresponding total memory.
It would be nice if we could set a maximum amount of memory per worker, either from the UI or from the command line (a rough idea is sketched below).
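As a rough illustration of what I mean: upstream llama.cpp's standalone rpc-server takes a memory cap via -m/--mem (in MB), so the request is essentially about exposing something like that through the LocalAI worker. Whether the rpc-server bundled with LocalAI accepts the same flag, and how the worker would forward it, is an assumption on my part:

```sh
# Sketch only (assumes the bundled rpc-server behaves like upstream llama.cpp's rpc-server):
# advertise at most ~6 GB to the main node instead of all free RAM
./rpc-server --host 127.0.0.1 --port 46589 --mem 6144
```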