
Add Dockerfile + build workflow #73

Merged
merged 5 commits into abetlen:main on May 2, 2023

Conversation

Niek
Contributor

@Niek Niek commented Apr 12, 2023

Fixes #70

This PR adds a Dockerfile and updates the release workflow to build the latest Docker image as well. Both amd64 and arm64 architectures are built.
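For context, a multi-arch image build in a GitHub Actions release workflow typically looks something like the sketch below. This is an illustrative reconstruction, not necessarily the exact workflow in this PR; the job name and image tag are assumptions.

```yaml
# Hypothetical sketch of a multi-arch Docker build job (names are illustrative)
jobs:
  docker:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: docker/setup-qemu-action@v2     # QEMU emulation so arm64 can build on x86 runners
      - uses: docker/setup-buildx-action@v2   # Buildx enables multi-platform builds
      - uses: docker/build-push-action@v4
        with:
          context: .
          platforms: linux/amd64,linux/arm64  # both architectures in one build
          push: true
          tags: ghcr.io/abetlen/llama-cpp-python:latest  # assumed tag
```

With `setup-qemu-action` in place, Buildx cross-builds the arm64 layers under emulation and pushes a single multi-arch manifest, so `docker pull` resolves the right image on either architecture.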

@abetlen
Owner

abetlen commented Apr 12, 2023

@Niek do you mind moving this to the build release workflow?

@Niek
Contributor Author

Niek commented Apr 12, 2023

@abetlen are you referring to build-and-release.yml? If we move the Docker step into that workflow, it can't use pip install, though; it will have to download the build artifacts and use those instead. I'm not sure if that's what you intend.
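Concretely, the artifact-based variant being described would look roughly like the fragment below. This is a hypothetical sketch: the artifact name, wheel directory, and build-arg are assumptions, not the PR's actual workflow.

```yaml
      # Hypothetical: fetch the wheels built earlier in build-and-release.yml
      - uses: actions/download-artifact@v3
        with:
          name: wheels        # assumed artifact name from the build job
          path: dist/
      # Build the image from the local wheel instead of pip-installing from PyPI
      - run: docker build --build-arg WHEEL_DIR=dist -t llama-cpp-python .
```

The trade-off is exactly what the comment raises: the Dockerfile would then need a matching `ARG WHEEL_DIR` and a `pip install dist/*.whl` step rather than a plain `pip install llama-cpp-python[server]`.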

@jmtatsch

Maybe we should directly add OpenBLAS support? It would need these two lines:

```dockerfile
RUN apt update && apt install -y libopenblas-dev
RUN LLAMA_OPENBLAS=1 pip install llama-cpp-python[server]
```

@Niek
Contributor Author

Niek commented Apr 15, 2023

Good idea @jmtatsch, added now.

@jmtatsch

jmtatsch commented Apr 21, 2023

Here is a Dockerfile for a cuBLAS-capable container that should bring huge speed-ups for CUDA GPU owners after the next sync with upstream:

```dockerfile
FROM nvidia/cuda:12.1.0-devel-ubuntu22.04

EXPOSE 8000
ENV MODEL=/models/ggml-vicuna-13b-1.1-q4_0.bin
# allow non-local connections to the API
ENV HOST=0.0.0.0

RUN apt update && apt install -y python3 python3-pip && LLAMA_CUBLAS=1 pip install llama-cpp-python[server]

ENTRYPOINT [ "python3", "-m", "llama_cpp.server" ]
```
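For anyone trying that image: running it requires GPU access to be passed through to the container (via the NVIDIA Container Toolkit on the host). A typical invocation would look something like the following; the image name and model path are examples, not names from this PR.

```shell
# Assumes nvidia-container-toolkit is installed on the host
docker build -t llama-cpp-cuda .
docker run --gpus all -p 8000:8000 \
  -v /path/to/models:/models \
  -e MODEL=/models/ggml-vicuna-13b-1.1-q4_0.bin \
  llama-cpp-cuda
```

Without `--gpus all` (or an equivalent `--runtime=nvidia` setup) the CUDA libraries inside the container have no device to talk to, and the server falls back to failing at startup rather than running on CPU.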

@gjmulder
Contributor

> Here is a Dockerfile for a cuBLAS-capable container that should bring huge speed-ups for CUDA GPU owners after the next sync with upstream:

@jmtatsch where is requirements.txt coming from?

@jmtatsch

jmtatsch commented Apr 22, 2023

> @jmtatsch where is requirements.txt coming from?

Good catch, it isn't necessary at all. I cleaned it up above.
In 0.1.36 cuBLAS is broken for me anyhow; waiting for ggerganov/llama.cpp#1128.

@Niek
Contributor Author

Niek commented Apr 24, 2023

@abetlen do you need any other changes?

@abetlen
Owner

abetlen commented Apr 24, 2023

@Niek if possible, can we include @jmtatsch's nvidia-docker container example in this PR as well? The ability to docker pull and run a GPU-accelerated container would be very helpful.

@jmtatsch

@abetlen We should make these two different containers then, because the nvidia container with cuBLAS is quite fat and not everyone has an Nvidia card.
I will make a pull request once this one is merged.
Sorry for hijacking your pull request, @Niek.

@abetlen abetlen mentioned this pull request May 2, 2023
@abetlen abetlen merged commit 8476b32 into abetlen:main May 2, 2023
@abetlen
Owner

abetlen commented May 2, 2023

@Niek finally got a chance to merge this, great work! We now have a Docker image.

@jmtatsch if you're still interested it would be awesome to get that cuBLAS-based image, happy to help there also.

Development

Successfully merging this pull request may close these issues.

Problems when i try to use this inside the default python 3.10 docker container
4 participants