diff --git a/Dockerfile b/Dockerfile
new file mode 100644
index 00000000..02cb78c4
--- /dev/null
+++ b/Dockerfile
@@ -0,0 +1,21 @@
+# Start with a Rust alpine image
+FROM rust:alpine3.17 AS builder
+# This is important, see https://github.com/rust-lang/docker-rust/issues/85
+ENV RUSTFLAGS="-C target-feature=-crt-static"
+# if needed, add additional dependencies here
+RUN apk add --no-cache musl-dev
+# set the workdir and copy the source into it
+WORKDIR /app
+COPY ./ /app
+# do a release build
+RUN cargo build --release --bin llama-cli
+RUN strip target/release/llama-cli
+
+# use a plain alpine image, the alpine version needs to match the builder
+FROM alpine:3.17
+# if needed, install additional dependencies here
+RUN apk add --no-cache libgcc
+# copy the binary into the final image
+COPY --from=builder /app/target/release/llama-cli .
+# set the binary as entrypoint
+ENTRYPOINT ["/llama-cli"]
diff --git a/README.md b/README.md
index 19344ba0..a9e6c04c 100644
--- a/README.md
+++ b/README.md
@@ -80,13 +80,12 @@ kinds of sources.
 After acquiring the weights, it is necessary to convert them into a format
 that is compatible with ggml. To achieve this, follow the steps outlined
 below:
 
-> **Warning** 
-> 
+> **Warning**
+>
 > To run the Python scripts, a Python version of 3.9 or 3.10 is required. 3.11
 > is unsupported at the time of writing.
 
-
-``` shell
+```shell
 # Convert the model to f16 ggml format
 python3 scripts/convert-pth-to-ggml.py /path/to/your/models/7B/ 1
@@ -95,7 +94,7 @@ python3 scripts/convert-pth-to-ggml.py /path/to/your/models/7B/ 1
 ```
 
 > **Note**
-> 
+>
 > The [llama.cpp repository](https://github.com/ggerganov/llama.cpp) has
 > additional information on how to obtain and run specific models. With some
 > caveats:
@@ -104,17 +103,15 @@ python3 scripts/convert-pth-to-ggml.py /path/to/your/models/7B/ 1
 > (versioned) ggml formats, but not the mmap-ready version that was [recently
 > merged](https://github.com/ggerganov/llama.cpp/pull/613).
 
-
-*Support for other open source models is currently planned. For models where
+_Support for other open source models is currently planned. For models where
 weights can be legally distributed, this section will be updated with scripts
 to make the install process as user-friendly as possible. Due to the model's
 legal requirements, this is currently not possible with LLaMA itself and a more
-lengthy setup is required.*
+lengthy setup is required._
 
 - https://github.com/rustformers/llama-rs/pull/85
 - https://github.com/rustformers/llama-rs/issues/75
-
 ### Running
 
 For example, try the following prompt:
 
@@ -147,6 +144,19 @@ Some additional things to try:
 A modern-ish C toolchain is required to compile `ggml`. A C++ toolchain should
 not be necessary.
 
+### Docker
+
+```shell
+# To build (this will take some time, go grab some coffee):
+docker build -t llama-rs .
+
+# To run with a prompt:
+docker run --rm --name llama-rs -it -v ${PWD}/data:/data -v ${PWD}/examples:/examples llama-rs -m data/gpt4all-lora-quantized-ggml.bin -p "Tell me how cool the Rust programming language is:"
+
+# To run with a prompt file and REPL (will wait for user input):
+docker run --rm --name llama-rs -it -v ${PWD}/data:/data -v ${PWD}/examples:/examples llama-rs -m data/gpt4all-lora-quantized-ggml.bin -f examples/alpaca_prompt.txt --repl
+```
+
 ## Q&A
 
 ### Why did you do this?
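
A few notes on this patch. First, the `RUSTFLAGS="-C target-feature=-crt-static"` workaround (see the linked docker-rust issue) makes the musl target produce a dynamically linked binary, which is exactly why the runtime stage installs `libgcc`. A quick inspection along these lines should confirm it (a sketch, assuming the `llama-rs` tag from the README's build step; `ldd` ships with the Alpine base image):

```shell
# Build the image, then list the binary's dynamic dependencies.
# Expect musl's loader and libgcc_s.so.1 in the output; a fully static
# binary would print "not a dynamic executable" instead.
docker build -t llama-rs .
docker run --rm --entrypoint ldd llama-rs /llama-cli
```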
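Second, `COPY ./ /app` sends the entire build context to the daemon, so a host-side `target/` directory or downloaded model weights would be copied into every build. A minimal `.dockerignore` (a suggested companion to this patch, not part of the diff; the `data/` entry mirrors the volume mount used in the README) keeps the context small:

```shell
# Write a minimal .dockerignore next to the Dockerfile.
cat > .dockerignore <<'EOF'
target/
data/
.git/
EOF
```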
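Finally, the `docker run` examples assume the converted model already exists under `./data` on the host. Because the final image never sets a `WORKDIR`, the process runs in `/`, so the relative `-m data/...` argument resolves to the `/data` volume mount. A one-time setup might look like this (the filename matches the README's example; the source path is a placeholder):

```shell
# Place the converted model where the volume mount expects it.
mkdir -p data
cp /path/to/your/gpt4all-lora-quantized-ggml.bin data/
```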