Move model download from the Docker image build step to the run script #19
Conversation
…odel url. Delete unused Dockerfiles.
@edgar971 that looks pretty good. If it gets merged, I can rebase onto it and add the GPU container things on top.
Awesome! Thanks for working on this @edgar971. I'm testing this now.
Great work @edgar971! I left some suggestions.
Co-authored-by: Mayank Chhabra <mayankchhabra9@gmail.com>
@edgar971 @mayankchhabra how well does the model perform if it's served over the network vs. locally? I'm wondering whether it's worth mounting a network share with lots of models, or if storage should always be local to the node where it's running.
@todaywasawesome, I've not tried a setup like that, so I'm not sure. But as long as the model is locked into RAM on start (currently enabled for the 13B and 70B models in their docker-compose.yml files), then, in theory, only the initial loading time should be bandwidth-bound by the local network, and inference speed should be unaffected.
@mayankchhabra when it runs a model, does it always load it all into memory? In that case, network storage should be fine.
Yep!
The REST API is fast. I think llama.cpp has a streaming option instead of just waiting for the full response.
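For reference, llama-cpp-python's OpenAI-compatible server does accept a `stream` flag on its completions endpoint. A minimal sketch of trying it with curl — the port (3001) and endpoint path are assumptions and may differ in this repo's compose setup:

```sh
# -N disables curl's output buffering so streamed chunks print as they arrive.
# Port 3001 and the /v1/completions path are illustrative assumptions.
curl -N http://localhost:3001/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello, world", "max_tokens": 64, "stream": true}'
```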
@todaywasawesome FWIW, if you're looking to do this, one should be able to modify the compose file to point the model volume at a network share. I was planning on exposing mine over NFS once this change is merged, as my compute nodes don't have much local storage.
Are you able to mount the EFS to ./models and then use that as the docker compose mount? |
@edgar971 For Kubernetes, yeah, we can just mount the external file share to that folder and we're done.
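For plain Docker hosts, one low-effort way to try this without any compose changes is to mount the share on the host at ./models, so the existing bind mount picks it up. A minimal sketch, assuming an NFS-style export (the EFS filesystem ID, region, and export path are placeholders):

```sh
# Placeholder filesystem ID and region; adjust for your NFS server or EFS export.
sudo mkdir -p ./models
sudo mount -t nfs4 -o nfsvers=4.1 \
  fs-0123456789abcdef.efs.us-east-1.amazonaws.com:/ ./models

# The compose file's bind mount at ./models now resolves to the network share.
docker compose up -d
```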
Perfect. Any changes needed for the PR, @mayankchhabra?
Awesome, just tested and everything looks good! Thanks again for your work on this @edgar971!
Here's the PR that addresses issue #8 by removing the model download step from the Dockerfile for the API service. It updates the API service in the compose files to use a Docker volume mounted at /models, and it enhances the run.sh script with a model download manager that checks whether the model already exists and downloads it if absent. As an optional improvement, this PR also exposes the model as an environment variable, reducing the need for multiple Dockerfiles; the run.sh download manager can then use this variable to specify, optionally download, and launch the desired model.
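For illustration, the download-manager logic described above could look roughly like the sketch below. The `MODEL` and `MODEL_DOWNLOAD_URL` variable names, the default model file, and the server launch command are assumptions, not necessarily the exact contents of this PR:

```sh
#!/bin/sh
set -e

# Illustrative defaults; the actual variable names and model file used by
# this PR may differ.
MODEL=${MODEL:-/models/llama-2-7b-chat.bin}

# Download the model only if it isn't already present in the mounted volume.
if [ ! -f "$MODEL" ]; then
  echo "Model not found at $MODEL, downloading from $MODEL_DOWNLOAD_URL..."
  curl -fSL --create-dirs -o "$MODEL" "$MODEL_DOWNLOAD_URL"
fi

# Launch the API server against the (now present) model.
exec python3 -m llama_cpp.server --model "$MODEL"
```

Because the check runs at container start rather than at image build, the same image works for any model: swapping models becomes a matter of changing two environment variables in the compose file instead of rebuilding with a different Dockerfile.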