
Distributed Inference tries to connect twice to the same client #3364

Closed · ManuXD32 opened this issue Aug 23, 2024 · 2 comments
Labels: bug (Something isn't working), unconfirmed

ManuXD32 commented Aug 23, 2024

LocalAI version:
v2.20.1

Environment, CPU architecture, OS, and Version:
Ubuntu Server (x86_64) and Raspberry Pi OS (arm64)

Describe the bug
When initiating inference through P2P, each worker gets connected twice and the main node tries to use twice the maximum amount of memory available.

To Reproduce
Start the main node: ./local-ai-Linux-x86_64 run --p2p
Start worker1: ./local-ai-Linux-arm64 worker p2p-llama-cpp-rpc
Start worker2: ./local-ai-Linux-x86_64 worker p2p-llama-cpp-rpc
(The same token is established on all three; see the sketch below.)
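
A minimal sketch of the setup, assuming the shared token is exported as TOKEN in each shell (TOKEN is the variable used in the maintainer's example further down; the token can also be configured through the UI):

# Main node (x86_64)
TOKEN=xxx ./local-ai-Linux-x86_64 run --p2p
# Worker 1 (Raspberry Pi, arm64)
TOKEN=xxx ./local-ai-Linux-arm64 worker p2p-llama-cpp-rpc
# Worker 2 (x86_64)
TOKEN=xxx ./local-ai-Linux-x86_64 worker p2p-llama-cpp-rpc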

Expected behavior
Each worker should be connected once, so that only the actually available memory is used.

Logs
Main node logs:

6:15PM INF Trying to load the model 'Meta-Llama-3.1-70B-Instruct.Q4_K_M.gguf' with the backend '[llama-cpp llama-ggml llama-cpp-fallback piper rwkv whisper huggingface bert-embeddings]'
6:15PM INF [llama-cpp] Attempting to load
6:15PM INF Loading model 'Meta-Llama-3.1-70B-Instruct.Q4_K_M.gguf' with backend llama-cpp
6:15PM INF [llama-cpp-grpc] attempting to load with GRPC variant
6:15PM INF Redirecting 127.0.0.1:40815 to /ip4/192.168.18.44/udp/50955/quic-v1
6:15PM INF Redirecting 127.0.0.1:39147 to /ip4/192.168.18.103/udp/50707/quic-v1
6:15PM INF Redirecting 127.0.0.1:40815 to /ip4/192.168.18.44/udp/50955/quic-v1
6:15PM INF Redirecting 127.0.0.1:39147 to /ip4/192.168.18.103/udp/50707/quic-v1
6:16PM INF Redirecting 127.0.0.1:40815 to /ip4/192.168.18.44/udp/50955/quic-v1
6:16PM INF [llama-cpp] Fails: could not load model: rpc error: code = Canceled desc =
6:16PM INF [llama-cpp] Autodetection failed, trying the fallback
6:16PM INF Loading model 'Meta-Llama-3.1-70B-Instruct.Q4_K_M.gguf' with backend llama-cpp-avx2

Worker1 logs:

6:15PM INF Setting logging to info
{"level":"INFO","time":"2024-08-23T18:15:12.365+0200","caller":"config/config.go:288","message":"connmanager disabled\n"}
6:15PM INF Starting llama-cpp-rpc-server on '127.0.0.1:46589'
{"level":"INFO","time":"2024-08-23T18:15:12.368+0200","caller":"config/config.go:292","message":" go-libp2p resource manager protection disabled"}
{"level":"INFO","time":"2024-08-23T18:15:12.373+0200","caller":"node/node.go:118","message":" Starting EdgeVPN network"}
create_backend: using CPU backend
Starting RPC server on 127.0.0.1:46589, backend memory: 7810 MB
2024/08/23 18:15:12 failed to sufficiently increase receive buffer size (was: 208 kiB, wanted: 7168 kiB, got: 416 kiB). See https://github.com/quic-go/quic-go/wiki/UDP-Buffer-Sizes for details.
{"level":"INFO","time":"2024-08-23T18:15:12.580+0200","caller":"node/node.go:172","message":" Node ID: 12D3KooWJop5nYfGyoTD1sFXBhk5rDDSevDhXVBJRqXsd2otATkT"}
{"level":"INFO","time":"2024-08-23T18:15:12.581+0200","caller":"node/node.go:173","message":" Node Addresses: [/ip4/127.0.0.1/tcp/42603 /ip4/127.0.0.1/udp/38003/quic-v1 /ip4/127.0.0.1/udp/50707/quic-v1/webtransport/certhash/uEiB-CnmLhKE3dajiHaTCVUnMUNKuJdd_otIHVL0SxCMNuw/certhash/uEiDTFDCkNza6tRwGRAa18mMaeyT-gW5FP57RB7Si5ZGV1Q /ip4/127.0.0.1/udp/52519/webrtc-direct/certhash/uEiAYQZdGzXH8vs2krcAiHVRZvPEjnnKwYtBGUw0rAryi2Q /ip4/192.168.18.103/tcp/42603 /ip4/192.168.18.103/udp/38003/quic-v1 /ip4/192.168.18.103/udp/50707/quic-v1/webtransport/certhash/uEiB-CnmLhKE3dajiHaTCVUnMUNKuJdd_otIHVL0SxCMNuw/certhash/uEiDTFDCkNza6tRwGRAa18mMaeyT-gW5FP57RB7Si5ZGV1Q /ip4/192.168.18.103/udp/52519/webrtc-direct/certhash/uEiAYQZdGzXH8vs2krcAiHVRZvPEjnnKwYtBGUw0rAryi2Q /ip6/::1/tcp/44529 /ip6/::1/udp/41447/quic-v1 /ip6/::1/udp/44894/quic-v1/webtransport/certhash/uEiB-CnmLhKE3dajiHaTCVUnMUNKuJdd_otIHVL0SxCMNuw/certhash/uEiDTFDCkNza6tRwGRAa18mMaeyT-gW5FP57RB7Si5ZGV1Q /ip6/::1/udp/52411/webrtc-direct/certhash/uEiAYQZdGzXH8vs2krcAiHVRZvPEjnnKwYtBGUw0rAryi2Q]"}
{"level":"INFO","time":"2024-08-23T18:15:12.586+0200","caller":"discovery/dht.go:104","message":" Bootstrapping DHT"}
{"level":"WARN","time":"2024-08-23T18:15:12.591+0200","caller":"node/connection.go:226","message":"publish error: no message room available\n"}
{"level":"WARN","time":"2024-08-23T18:15:12.593+0200","caller":"node/connection.go:226","message":"publish error: no message room available\n"}
Accepted client connection, free_mem=8189411328, total_mem=8189411328
Client connection closed
Accepted client connection, free_mem=8189411328, total_mem=8189411328
Client connection closed

Worker2 logs:

4:15PM INF Setting logging to info
{"level":"INFO","time":"2024-08-23T16:15:05.787Z","caller":"config/config.go:288","message":"connmanager disabled\n"}
4:15PM INF Starting llama-cpp-rpc-server on '127.0.0.1:39571'
{"level":"INFO","time":"2024-08-23T16:15:05.787Z","caller":"config/config.go:292","message":" go-libp2p resource manager protection disabled"}
{"level":"INFO","time":"2024-08-23T16:15:05.788Z","caller":"node/node.go:118","message":" Starting EdgeVPN network"}
create_backend: using CPU backend
Starting RPC server on 127.0.0.1:39571, backend memory: 11854 MB
2024/08/23 16:15:05 failed to sufficiently increase receive buffer size (was: 208 kiB, wanted: 7168 kiB, got: 416 kiB). See https://github.com/quic-go/quic-go/wiki/UDP-Buffer-Sizes for details.
{"level":"INFO","time":"2024-08-23T16:15:05.805Z","caller":"node/node.go:172","message":" Node ID: 12D3KooWPDeW43Br3JLzghRS6iZ8zt7czRnYmjtystNhggiwLVR1"}
{"level":"INFO","time":"2024-08-23T16:15:05.805Z","caller":"node/node.go:173","message":" Node Addresses: [/ip4/127.0.0.1/tcp/35197 /ip4/127.0.0.1/udp/34605/quic-v1/webtransport/certhash/uEiAzVFqmJ6DXPuTPEI27rSYGhhvItfVf61PkHnfrRaIQkw/certhash/uEiBcxxt249su8zlNVJzkJibG80o5MVnRnTFVAcUsl4t3aA /ip4/127.0.0.1/udp/36434/webrtc-direct/certhash/uEiBKiZJ4GK-ATY0rWZvpFoJBoH-wFYhtz3hnFLWsI2Cljg /ip4/127.0.0.1/udp/50955/quic-v1 /ip4/192.168.18.44/tcp/35197 /ip4/192.168.18.44/udp/34605/quic-v1/webtransport/certhash/uEiAzVFqmJ6DXPuTPEI27rSYGhhvItfVf61PkHnfrRaIQkw/certhash/uEiBcxxt249su8zlNVJzkJibG80o5MVnRnTFVAcUsl4t3aA /ip4/192.168.18.44/udp/36434/webrtc-direct/certhash/uEiBKiZJ4GK-ATY0rWZvpFoJBoH-wFYhtz3hnFLWsI2Cljg /ip4/192.168.18.44/udp/50955/quic-v1 /ip6/::1/tcp/40095 /ip6/::1/udp/43292/webrtc-direct/certhash/uEiBKiZJ4GK-ATY0rWZvpFoJBoH-wFYhtz3hnFLWsI2Cljg /ip6/::1/udp/56079/quic-v1 /ip6/::1/udp/58500/quic-v1/webtransport/certhash/uEiAzVFqmJ6DXPuTPEI27rSYGhhvItfVf61PkHnfrRaIQkw/certhash/uEiBcxxt249su8zlNVJzkJibG80o5MVnRnTFVAcUsl4t3aA]"}
{"level":"INFO","time":"2024-08-23T16:15:05.806Z","caller":"discovery/dht.go:104","message":" Bootstrapping DHT"}
Accepted client connection, free_mem=12430540800, total_mem=12430540800
Client connection closed
Accepted client connection, free_mem=12430540800, total_mem=12430540800
Client connection closed
Accepted client connection, free_mem=12430540800, total_mem=12430540800
ggml_backend_cpu_buffer_type_alloc_buffer: failed to allocate buffer of size 24512430112
Client connection closed

Additional context
I tried a couple of times; when worker1 was the first one used, it showed the same error as worker2, but with double its corresponding total memory.

It would be nice if we could set a maximum amount of memory per worker, either from the UI or the command line.
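
For what it's worth, the standalone rpc-server in upstream llama.cpp accepts a backend memory cap in MB via -m/--mem; if LocalAI forwarded something similar, the desired usage might look like the sketch below. The --backend-mem flag here is purely illustrative, not an existing LocalAI option:

# Hypothetical sketch only: --backend-mem is an illustrative flag, not an
# existing LocalAI option; upstream llama.cpp's rpc-server has a similar -m/--mem cap.
TOKEN=xxx ./local-ai-Linux-arm64 worker p2p-llama-cpp-rpc --backend-mem 4096  # cap worker at 4096 MB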

ManuXD32 added the bug and unconfirmed labels on Aug 23, 2024
mudler (Owner) commented Aug 24, 2024

I see only a single worker being started:

TOKEN=xxx ./local-ai worker p2p-llama-cpp-rpc
10:01AM INF env file found, loading environment variables from file envFile=.env
10:01AM INF Setting logging to info
{"level":"INFO","time":"2024-08-24T10:01:46.659+0200","caller":"config/config.go:288","message":"connmanager disabled\n"}
{"level":"INFO","time":"2024-08-24T10:01:46.659+0200","caller":"config/config.go:292","message":" go-libp2p resource manager protection disabled"}
10:01AM INF Starting llama-cpp-rpc-server on '127.0.0.1:46717'
{"level":"INFO","time":"2024-08-24T10:01:46.659+0200","caller":"node/node.go:118","message":" Starting EdgeVPN network"}
create_backend: using CPU backend
Starting RPC server on 127.0.0.1:46717, backend memory: 63988 MB
2024/08/24 10:01:46 failed to sufficiently increase receive buffer size (was: 208 kiB, wanted: 7168 kiB, got: 416 kiB). See https://github.com/quic-go/quic-go/wiki/UDP-Buffer-Sizes for details.
{"level":"INFO","time":"2024-08-24T10:01:46.670+0200","caller":"node/node.go:172","message":" Node ID: 12D3KooWPUzGU7qvFkeJdhzBkBzFRUd4NaV16e6NPgvrE3ALkU9U"}
{"level":"INFO","time":"2024-08-24T10:01:46.670+0200","caller":"node/node.go:173","message":" Node Addresses: [/ip4/127.0.0.1/tcp/44211 /ip4/127.0.0.1/udp/40522/webrtc-direct/certhash/uEiAGrZ_CWqY3_m6ighXQVOaFQ-ZRbI8b_8hpaqOPxiWEIw /ip4/127.0.0.1/udp/43715/quic-v1/webtransport/certhash/uEiCnmkOorxDeME_56tbuMu6n3OgVmKbGadBWqsnqK-mgLQ/certhash/uEiDJyxhZuuqt4cjGWfWJefLfFJq7SdeHJ6cqu06jQ3hG0g /ip4/127.0.0.1/udp/45769/quic-v1 /ip4/192.168.68.123/tcp/44211 /ip4/192.168.68.123/udp/40522/webrtc-direct/certhash/uEiAGrZ_CWqY3_m6ighXQVOaFQ-ZRbI8b_8hpaqOPxiWEIw /ip4/192.168.68.123/udp/43715/quic-v1/webtransport/certhash/uEiCnmkOorxDeME_56tbuMu6n3OgVmKbGadBWqsnqK-mgLQ/certhash/uEiDJyxhZuuqt4cjGWfWJefLfFJq7SdeHJ6cqu06jQ3hG0g /ip4/192.168.68.123/udp/45769/quic-v1 /ip6/::1/tcp/46059 /ip6/::1/udp/36028/quic-v1/webtransport/certhash/uEiCnmkOorxDeME_56tbuMu6n3OgVmKbGadBWqsnqK-mgLQ/certhash/uEiDJyxhZuuqt4cjGWfWJefLfFJq7SdeHJ6cqu06jQ3hG0g /ip6/::1/udp/41877/webrtc-direct/certhash/uEiAGrZ_CWqY3_m6ighXQVOaFQ-ZRbI8b_8hpaqOPxiWEIw /ip6/::1/udp/44299/quic-v1]"}
{"level":"INFO","time":"2024-08-24T10:01:46.670+0200","caller":"discovery/dht.go:104","message":" Bootstrapping DHT"}

Are you sure that isn't crashing because you don't have enough memory to run two workers?
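
As an aside, the worker2 log earlier already quantifies the shortfall mudler is pointing at; checking the numbers taken straight from those log lines:

# From the worker2 log: free_mem = 12430540800 bytes (~11.6 GiB), while the
# failed buffer allocation asked for 24512430112 bytes (~22.8 GiB).
echo "scale=2; 24512430112 / 12430540800" | bc   # -> 1.97, i.e. roughly 2x the free memory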

ManuXD32 (Author) commented

> I see only a single worker being started: […]
> Are you sure that isn't crashing because you don't have enough memory to run two workers?

I tried with some other models and yeah, that was exactly the problem. Sorry and thanks for your time!
