Update to latest llama.cpp #118
Conversation
LGTM, just two comments
@@ -0,0 +1,305 @@
+# Migrate ggml file(s) with ggmf magic to ggml file with ggjt magic
Can't we just `cp` this file in the Dockerfile after the `git clone`?
The file is different: I modified the original script so it can be called as a function, and it shuffles files around differently from the original so it works better with serge. I hope the upstream content doesn't change too often 😅
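For context, a minimal sketch of the kind of check such a migration script performs; the helper names are illustrative, not the actual serge code (the magic values themselves are the real llama.cpp constants):

```python
import struct

GGMF_MAGIC = 0x67676D66  # "ggmf": old format with a separate version field
GGJT_MAGIC = 0x67676A74  # "ggjt": new format with mmap-friendly tensor layout

def read_magic(path: str) -> int:
    """Return the little-endian uint32 magic at the start of a ggml model file."""
    with open(path, "rb") as f:
        (magic,) = struct.unpack("<I", f.read(4))
    return magic

def needs_migration(path: str) -> bool:
    """True when the file still carries the old ggmf magic."""
    return read_magic(path) == GGMF_MAGIC
```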
@@ -124,7 +124,7 @@ async def event_generator():
         prompt=full_prompt,
         params=chat.parameters,
     ):
-        await asyncio.sleep(0.1)
+        await asyncio.sleep(0.01)
What's the purpose of these sleeps?
At best we generate a token once every 100 ms on fast machines, so there's no point in polling the program's output buffer more often than that. The sleep was there to keep the infinite loop from hogging resources by running constantly.

What I realized was that we read the buffer in 4-byte chunks every 0.1 s, so we were fetching at most (1/0.1) * 4 = 40 bytes per second. Usually that's enough, but the initial prompt is echoed back much faster than that, and we were slowing it down for no reason. It was bad design on my side :/

The symptom was that you would see CPU activity decrease, yet the answer would still take a while to appear in the chat: it was fully generated and just being read slowly from the output buffer. 🤦 Now, with a 0.01 s sleep and a 64-byte chunk size (a ceiling of 64 / 0.01 = 6400 bytes per second), I don't expect we'll have a problem haha
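To make the arithmetic concrete, here is a rough sketch of the kind of polling loop being described; the structure and names are illustrative, not the actual event generator code:

```python
import asyncio

CHUNK_SIZE = 64       # bytes read per iteration (was 4)
POLL_INTERVAL = 0.01  # seconds slept between reads (was 0.1)

async def stream_output(stdout: asyncio.StreamReader):
    """Drain a subprocess output buffer in small chunks.

    The throughput ceiling is CHUNK_SIZE / POLL_INTERVAL bytes per second:
    4 / 0.1 = 40 B/s before this change, 64 / 0.01 = 6400 B/s after.
    """
    while True:
        chunk = await stdout.read(CHUNK_SIZE)
        if not chunk:
            break  # process closed its stdout
        yield chunk.decode(errors="ignore")
        # Sleep so the loop doesn't spin constantly; tokens arrive at
        # roughly 10/s at best, so finer-grained polling buys nothing.
        await asyncio.sleep(POLL_INTERVAL)
```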
The new llama.cpp version has breaking changes that require a conversion script. This PR adds the conversion script and updates the version used in the Dockerfile.