When using Ollama as the LLM engine, is the model reloaded on every request? #1803

17Reset · 2024-03-28T06:28:25Z

When you use Ollama on its own, it loads the model into the GPU once, so the model does not have to be reloaded on every call to Ollama's API. In PrivateGPT, however, the model is reloaded every time a question is asked, which greatly increases the Q&A time.

dbzoo · 2024-04-06T20:20:04Z

I think #1800 solves your problem. The default `keep_alive` is 5m; increase it:

ollama:
  keep_alive: 30m
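
For anyone calling Ollama directly rather than through PrivateGPT, the same idea can be sketched against Ollama's REST API, whose `/api/generate` endpoint accepts a `keep_alive` field. This is a minimal sketch, not what PrivateGPT does internally: the helper names `build_payload` and `generate` are hypothetical, and the URL assumes a local Ollama server on its default port.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(model: str, prompt: str, keep_alive: str = "30m") -> dict:
    """Build the JSON body for /api/generate (hypothetical helper).

    keep_alive asks Ollama to keep the model in memory for that long
    after the request, so the next call does not reload it.
    """
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # return one JSON object instead of a stream
        "keep_alive": keep_alive,
    }


def generate(model: str, prompt: str, keep_alive: str = "30m",
             url: str = OLLAMA_URL) -> str:
    """Send a single non-streaming request and return the generated text."""
    body = json.dumps(build_payload(model, prompt, keep_alive)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

Calling `generate("llama2", "Hello")` twice within 30 minutes should then reuse the already loaded model; the model name here is only an example.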

17Reset closed this as completed Apr 7, 2024
