
Ollama QOL settings #1800

Merged
7 commits merged into zylon-ai:main on Apr 2, 2024

Conversation

Robinsane
Contributor

Ollama settings: ability to keep the LLM in memory for a longer time + ability to run Ollama embeddings on another instance

We've got a butter-smooth production setup right now by doing the following:

  1. Run the embeddings on a separate Ollama instance (Docker container)
    This avoids the waiting times caused by Ollama swapping the LLM out of (V)RAM for the embedding model and back

  2. Explicitly state with each request that the model in use should stay in (V)RAM for another 6 hours (see the request sketch below)
    By default Ollama unloads a model from (V)RAM after 5 minutes of inactivity, which caused long waits to reload the LLM after more than 5 minutes idle (we're running a 20 GB quant at the moment)

(3. ingest_mode: pipeline)

I hope this PR can make others as happy as I am right now ;)
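
For context, the keep-alive in point 2 is a per-request parameter in Ollama's REST API. A minimal sketch of what such a request looks like (the host and model name are placeholders, and this is not the PR's code):

import requests

# Minimal sketch: ask Ollama to keep the model loaded for 6 hours after this call.
# Host, port and model name are placeholders; http://localhost:11434 is Ollama's default address.
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "mistral",       # placeholder model name
        "prompt": "Hello",
        "stream": False,
        "keep_alive": "6h",       # Ollama's default is 5m; -1 keeps the model loaded indefinitely
    },
    timeout=120,
)
print(response.json()["response"])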

@Robinsane
Contributor Author

If anyone feels like fixing the failing mypy checks for private_gpt/components/llm/custom/ollama.py, feel free.

I got some errors when trying to import things that are only needed for annotations, just to satisfy mypy...
I feel like it's not really necessary, since you can just look at the Ollama superclass to understand it all, right?
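
(For what it's worth, the usual way to import something purely for annotations without paying a runtime cost is a TYPE_CHECKING guard. A generic sketch, not tied to this file, with the import path assumed:)

from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only evaluated by type checkers such as mypy, never at runtime.
    # The import path below is an assumption and may differ between llama-index versions.
    from llama_index.core.llms import CompletionResponse


def complete(prompt: str) -> CompletionResponse:
    ...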

Contributor

@dbzoo dbzoo left a comment


An alternative implementation could wrap the methods instead of creating a new subclass. This might not be technically correct, but you get the general idea:

from typing import Any, Callable

def add_keep_alive(func: Callable[..., Any]) -> Callable[..., Any]:
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        # Add the keep_alive='5m' keyword argument to every call
        kwargs['keep_alive'] = '5m'
        # Call the original function with the updated kwargs
        return func(*args, **kwargs)
    return wrapper

self.llm.chat = add_keep_alive(self.llm.chat)
self.llm.stream_chat = add_keep_alive(self.llm.stream_chat)
self.llm.complete = add_keep_alive(self.llm.complete)
self.llm.stream_complete = add_keep_alive(self.llm.stream_complete)

@Robinsane Robinsane marked this pull request as draft March 28, 2024 06:02
@Robinsane Robinsane force-pushed the ollama_QOL_settings branch from 61970f0 to a8fd51d on March 28, 2024 06:51
@Robinsane Robinsane force-pushed the ollama_QOL_settings branch from 0a28a79 to 437f921 on March 28, 2024 07:15
@Robinsane Robinsane marked this pull request as ready for review March 28, 2024 07:28
@Robinsane
Contributor Author

Thanks for the suggestion @dbzoo

Extra: if the default keep_alive is left unchanged, I don't wrap the methods at all, so the requests go out exactly like they used to :)
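
Roughly, that conditional wrapping could look like the sketch below (illustration only, not the literal PR code; ollama_settings and self.llm are assumed names):

from functools import wraps
from typing import Any, Callable

def _add_keep_alive(func: Callable[..., Any], keep_alive: str) -> Callable[..., Any]:
    @wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        kwargs["keep_alive"] = keep_alive
        return func(*args, **kwargs)
    return wrapper

# Only wrap when a non-default value is configured, so the default behaviour
# stays identical to how it was before this change.
keep_alive = ollama_settings.keep_alive  # assumed settings field, e.g. "6h"
if keep_alive != "5m":
    for name in ("chat", "stream_chat", "complete", "stream_complete"):
        setattr(self.llm, name, _add_keep_alive(getattr(self.llm, name), keep_alive))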

@dbzoo
Contributor

dbzoo commented Mar 28, 2024

Thanks for taking that suggestion to heart. The code looks better for it, too. Nice job.

Collaborator

@imartinez imartinez left a comment


Really powerful contribution!

@imartinez imartinez merged commit b3b0140 into zylon-ai:main Apr 2, 2024
6 checks passed
mrepetto-certx pushed a commit to mrepetto-certx/privateGPT that referenced this pull request Apr 18, 2024