Distributed Inference tries to connect twice to the same client #3364
Comments
I see only a single worker being started:
Are you sure that isn't crashing because you don't have enough memory to run two workers?
I tried with some other models and yeah, that was exactly the problem. Sorry and thanks for your time!
LocalAI version:
v2.20.1
Environment, CPU architecture, OS, and Version:
Ubuntu Server, Raspberry Pi OS
Describe the bug
When trying to initiate inference through P2P, each worker gets connected to twice and the main node tries to use twice the maximum amount of memory actually available.
To Reproduce
Start the main node: ./local-ai-Linux-x86_64 run --p2p
Start worker1: ./local-ai-Linux-arm64 worker p2p-llama-cpp-rpc
Start worker2: ./local-ai-Linux-x86_64 worker p2p-llama-cpp-rpc
(The same token was established on all three nodes; see the consolidated sketch below.)
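For reference, a minimal sketch of the three invocations as I ran them. Passing the shared token via a TOKEN environment variable and the placeholder value are assumptions here; substitute however you actually set the token:

```sh
# Main node (x86_64 box), started with p2p enabled
./local-ai-Linux-x86_64 run --p2p

# Worker 1 (arm64 box) -- TOKEN is assumed; use your real shared p2p token
TOKEN=<shared-p2p-token> ./local-ai-Linux-arm64 worker p2p-llama-cpp-rpc

# Worker 2 (x86_64 box)
TOKEN=<shared-p2p-token> ./local-ai-Linux-x86_64 worker p2p-llama-cpp-rpc
```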
Expected behavior
The main node should only try to use the memory that is actually available on each worker.
Logs
Main node logs:
6:15PM INF Trying to load the model 'Meta-Llama-3.1-70B-Instruct.Q4_K_M.gguf' with the backend '[llama-cpp llama-ggml llama-cpp-fallback piper rwkv whisper huggingface bert-embeddings]'
6:15PM INF [llama-cpp] Attempting to load
6:15PM INF Loading model 'Meta-Llama-3.1-70B-Instruct.Q4_K_M.gguf' with backend llama-cpp
6:15PM INF [llama-cpp-grpc] attempting to load with GRPC variant
6:15PM INF Redirecting 127.0.0.1:40815 to /ip4/192.168.18.44/udp/50955/quic-v1
6:15PM INF Redirecting 127.0.0.1:39147 to /ip4/192.168.18.103/udp/50707/quic-v1
6:15PM INF Redirecting 127.0.0.1:40815 to /ip4/192.168.18.44/udp/50955/quic-v1
6:15PM INF Redirecting 127.0.0.1:39147 to /ip4/192.168.18.103/udp/50707/quic-v1
6:16PM INF Redirecting 127.0.0.1:40815 to /ip4/192.168.18.44/udp/50955/quic-v1
6:16PM INF [llama-cpp] Fails: could not load model: rpc error: code = Canceled desc =
6:16PM INF [llama-cpp] Autodetection failed, trying the fallback
6:16PM INF Loading model 'Meta-Llama-3.1-70B-Instruct.Q4_K_M.gguf' with backend llama-cpp-avx2
Worker1 logs:
6:15PM INF Setting logging to info
{"level":"INFO","time":"2024-08-23T18:15:12.365+0200","caller":"config/config.go:288","message":"connmanager disabled\n"}
6:15PM INF Starting llama-cpp-rpc-server on '127.0.0.1:46589'
{"level":"INFO","time":"2024-08-23T18:15:12.368+0200","caller":"config/config.go:292","message":" go-libp2p resource manager protection disabled"}
{"level":"INFO","time":"2024-08-23T18:15:12.373+0200","caller":"node/node.go:118","message":" Starting EdgeVPN network"}
create_backend: using CPU backend
Starting RPC server on 127.0.0.1:46589, backend memory: 7810 MB
2024/08/23 18:15:12 failed to sufficiently increase receive buffer size (was: 208 kiB, wanted: 7168 kiB, got: 416 kiB). See https://github.com/quic-go/quic-go/wiki/UDP-Buffer-Sizes for details.
{"level":"INFO","time":"2024-08-23T18:15:12.580+0200","caller":"node/node.go:172","message":" Node ID: 12D3KooWJop5nYfGyoTD1sFXBhk5rDDSevDhXVBJRqXsd2otATkT"}
{"level":"INFO","time":"2024-08-23T18:15:12.581+0200","caller":"node/node.go:173","message":" Node Addresses: [/ip4/127.0.0.1/tcp/42603 /ip4/127.0.0.1/udp/38003/quic-v1 /ip4/127.0.0.1/udp/50707/quic-v1/webtransport/certhash/uEiB-CnmLhKE3dajiHaTCVUnMUNKuJdd_otIHVL0SxCMNuw/certhash/uEiDTFDCkNza6tRwGRAa18mMaeyT-gW5FP57RB7Si5ZGV1Q /ip4/127.0.0.1/udp/52519/webrtc-direct/certhash/uEiAYQZdGzXH8vs2krcAiHVRZvPEjnnKwYtBGUw0rAryi2Q /ip4/192.168.18.103/tcp/42603 /ip4/192.168.18.103/udp/38003/quic-v1 /ip4/192.168.18.103/udp/50707/quic-v1/webtransport/certhash/uEiB-CnmLhKE3dajiHaTCVUnMUNKuJdd_otIHVL0SxCMNuw/certhash/uEiDTFDCkNza6tRwGRAa18mMaeyT-gW5FP57RB7Si5ZGV1Q /ip4/192.168.18.103/udp/52519/webrtc-direct/certhash/uEiAYQZdGzXH8vs2krcAiHVRZvPEjnnKwYtBGUw0rAryi2Q /ip6/::1/tcp/44529 /ip6/::1/udp/41447/quic-v1 /ip6/::1/udp/44894/quic-v1/webtransport/certhash/uEiB-CnmLhKE3dajiHaTCVUnMUNKuJdd_otIHVL0SxCMNuw/certhash/uEiDTFDCkNza6tRwGRAa18mMaeyT-gW5FP57RB7Si5ZGV1Q /ip6/::1/udp/52411/webrtc-direct/certhash/uEiAYQZdGzXH8vs2krcAiHVRZvPEjnnKwYtBGUw0rAryi2Q]"}
{"level":"INFO","time":"2024-08-23T18:15:12.586+0200","caller":"discovery/dht.go:104","message":" Bootstrapping DHT"}
{"level":"WARN","time":"2024-08-23T18:15:12.591+0200","caller":"node/connection.go:226","message":"publish error: no message room available\n"}
{"level":"WARN","time":"2024-08-23T18:15:12.593+0200","caller":"node/connection.go:226","message":"publish error: no message room available\n"}
Accepted client connection, free_mem=8189411328, total_mem=8189411328
Client connection closed
Accepted client connection, free_mem=8189411328, total_mem=8189411328
Client connection closed
Worker2 logs:
4:15PM INF Setting logging to info
{"level":"INFO","time":"2024-08-23T16:15:05.787Z","caller":"config/config.go:288","message":"connmanager disabled\n"}
4:15PM INF Starting llama-cpp-rpc-server on '127.0.0.1:39571'
{"level":"INFO","time":"2024-08-23T16:15:05.787Z","caller":"config/config.go:292","message":" go-libp2p resource manager protection disabled"}
{"level":"INFO","time":"2024-08-23T16:15:05.788Z","caller":"node/node.go:118","message":" Starting EdgeVPN network"}
create_backend: using CPU backend
Starting RPC server on 127.0.0.1:39571, backend memory: 11854 MB
2024/08/23 16:15:05 failed to sufficiently increase receive buffer size (was: 208 kiB, wanted: 7168 kiB, got: 416 kiB). See https://github.com/quic-go/quic-go/wiki/UDP-Buffer-Sizes for details.
{"level":"INFO","time":"2024-08-23T16:15:05.805Z","caller":"node/node.go:172","message":" Node ID: 12D3KooWPDeW43Br3JLzghRS6iZ8zt7czRnYmjtystNhggiwLVR1"}
{"level":"INFO","time":"2024-08-23T16:15:05.805Z","caller":"node/node.go:173","message":" Node Addresses: [/ip4/127.0.0.1/tcp/35197 /ip4/127.0.0.1/udp/34605/quic-v1/webtransport/certhash/uEiAzVFqmJ6DXPuTPEI27rSYGhhvItfVf61PkHnfrRaIQkw/certhash/uEiBcxxt249su8zlNVJzkJibG80o5MVnRnTFVAcUsl4t3aA /ip4/127.0.0.1/udp/36434/webrtc-direct/certhash/uEiBKiZJ4GK-ATY0rWZvpFoJBoH-wFYhtz3hnFLWsI2Cljg /ip4/127.0.0.1/udp/50955/quic-v1 /ip4/192.168.18.44/tcp/35197 /ip4/192.168.18.44/udp/34605/quic-v1/webtransport/certhash/uEiAzVFqmJ6DXPuTPEI27rSYGhhvItfVf61PkHnfrRaIQkw/certhash/uEiBcxxt249su8zlNVJzkJibG80o5MVnRnTFVAcUsl4t3aA /ip4/192.168.18.44/udp/36434/webrtc-direct/certhash/uEiBKiZJ4GK-ATY0rWZvpFoJBoH-wFYhtz3hnFLWsI2Cljg /ip4/192.168.18.44/udp/50955/quic-v1 /ip6/::1/tcp/40095 /ip6/::1/udp/43292/webrtc-direct/certhash/uEiBKiZJ4GK-ATY0rWZvpFoJBoH-wFYhtz3hnFLWsI2Cljg /ip6/::1/udp/56079/quic-v1 /ip6/::1/udp/58500/quic-v1/webtransport/certhash/uEiAzVFqmJ6DXPuTPEI27rSYGhhvItfVf61PkHnfrRaIQkw/certhash/uEiBcxxt249su8zlNVJzkJibG80o5MVnRnTFVAcUsl4t3aA]"}
{"level":"INFO","time":"2024-08-23T16:15:05.806Z","caller":"discovery/dht.go:104","message":" Bootstrapping DHT"}
Accepted client connection, free_mem=12430540800, total_mem=12430540800
Client connection closed
Accepted client connection, free_mem=12430540800, total_mem=12430540800
Client connection closed
Accepted client connection, free_mem=12430540800, total_mem=12430540800
ggml_backend_cpu_buffer_type_alloc_buffer: failed to allocate buffer of size 24512430112
Client connection closed
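For what it's worth, the buffer llama.cpp tried to allocate on worker2 is almost exactly twice the memory that worker reports as free, which is what makes me suspect the double connection. A rough check of the two byte counts from the log above:

```sh
# 24512430112 = failed buffer allocation on worker2, 12430540800 = worker2 free_mem (bytes)
awk 'BEGIN { printf "%.2f\n", 24512430112 / 12430540800 }'   # prints ~1.97, i.e. roughly 2x
```

Note also that the two workers together only advertise about 8.2 GB + 12.4 GB ≈ 20.6 GB, which (as discussed in the comments above) is well short of what a 70B Q4_K_M model typically needs, on the order of 40 GB.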
Additional context
I tried a couple of times; when worker1 was the first one used, it showed the same error as worker2, but with double its corresponding total memory.
It would be nice if we could set a maximum amount of memory per worker, either from the UI or from the command line (a rough idea is sketched below).
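As a rough illustration of what I mean: upstream llama.cpp's standalone rpc-server takes a memory cap via -m/--mem (in MB), so the request is essentially about exposing something like that through the LocalAI worker. Whether the rpc-server bundled with LocalAI accepts the same flag, and how the worker would forward it, is an assumption on my part:

```sh
# Sketch only (assumes the bundled rpc-server behaves like upstream llama.cpp's rpc-server):
# advertise at most ~6 GB to the main node instead of all free RAM
./rpc-server --host 127.0.0.1 --port 46589 --mem 6144
```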