This repository has been archived by the owner on Jun 24, 2024. It is now read-only.

Add Docker file #112

Merged 3 commits on Apr 6, 2023
21 changes: 21 additions & 0 deletions Dockerfile
@@ -0,0 +1,21 @@
# Start with a rust alpine image
FROM rust:alpine3.17 as builder
# This is important, see https://github.com/rust-lang/docker-rust/issues/85
ENV RUSTFLAGS="-C target-feature=-crt-static"
# if needed, add additional dependencies here
RUN apk add --no-cache musl-dev
# set the workdir and copy the source into it
WORKDIR /app
COPY ./ /app
# do a release build
RUN cargo build --release --bin llama-cli
RUN strip target/release/llama-cli

# use a plain alpine image, the alpine version needs to match the builder
FROM alpine:3.17
# if needed, install additional dependencies here
RUN apk add --no-cache libgcc
# copy the binary into the final image
COPY --from=builder /app/target/release/llama-cli .
# set the binary as entrypoint
ENTRYPOINT ["/llama-cli"]
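
Not part of this PR, but a common companion to a Dockerfile like this: because the build stage copies the entire working directory with `COPY ./ /app`, a `.dockerignore` keeps host build artifacts and VCS metadata out of the build context. A minimal sketch (the ignored entries are suggestions, not requirements of this change):

```shell
# Create a .dockerignore next to the Dockerfile so `docker build` does not
# send the host's target/ directory and .git history to the daemon.
# (Entries are suggestions; adjust to the repository's layout.)
cat > .dockerignore <<'EOF'
target
.git
EOF
```
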
28 changes: 19 additions & 9 deletions README.md
@@ -80,13 +80,12 @@ kinds of sources.
After acquiring the weights, it is necessary to convert them into a format that
is compatible with ggml. To achieve this, follow the steps outlined below:

> **Warning**
>
> To run the Python scripts, a Python version of 3.9 or 3.10 is required. 3.11
> is unsupported at the time of writing.
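
If the system `python3` is newer than 3.10, one option is to run the conversion scripts from a virtual environment built on an older interpreter. A minimal sketch, not part of this change, assuming a `python3.10` binary is already installed on the host:

```shell
# Check which interpreter would run the scripts; 3.9.x or 3.10.x is required
python3 --version
# Create and activate an isolated environment from a compatible interpreter
# (assumes python3.10 is installed; any 3.9/3.10 interpreter works)
python3.10 -m venv .venv
. .venv/bin/activate
```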


```shell
# Convert the model to f16 ggml format
python3 scripts/convert-pth-to-ggml.py /path/to/your/models/7B/ 1

@@ -95,7 +94,7 @@ python3 scripts/convert-pth-to-ggml.py /path/to/your/models/7B/ 1
```

> **Note**
>
> The [llama.cpp repository](https://github.com/ggerganov/llama.cpp) has
> additional information on how to obtain and run specific models. With some
> caveats:
@@ -104,17 +103,15 @@ python3 scripts/convert-pth-to-ggml.py /path/to/your/models/7B/ 1
> (versioned) ggml formats, but not the mmap-ready version that was [recently
> merged](https://github.com/ggerganov/llama.cpp/pull/613).


_Support for other open source models is currently planned. For models where
weights can be legally distributed, this section will be updated with scripts to
make the install process as user-friendly as possible. Due to the model's legal
requirements, this is currently not possible with LLaMA itself and a more
lengthy setup is required._

- https://github.com/rustformers/llama-rs/pull/85
- https://github.com/rustformers/llama-rs/issues/75


### Running

For example, try the following prompt:
@@ -147,6 +144,19 @@ Some additional things to try:
A modern-ish C toolchain is required to compile `ggml`. A C++ toolchain
should not be necessary.
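
As a hedged sketch of a native (non-Docker) build, assuming a Debian/Ubuntu host; the package names are assumptions and any recent C compiler should satisfy `ggml`:

```shell
# Install a C compiler and make, then build the CLI natively
# (package names assume Debian/Ubuntu; adapt for other distributions)
sudo apt-get install -y gcc make
cargo build --release --bin llama-cli
```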

### Docker

```shell
# To build (This will take some time, go grab some coffee):
docker build -t llama-rs .

# To run with a prompt:
docker run --rm --name llama-rs -it -v ${PWD}/data:/data -v ${PWD}/examples:/examples llama-rs -m data/gpt4all-lora-quantized-ggml.bin -p "Tell me how cool the Rust programming language is:"

# To run with a prompt file and REPL (waits for user input):
docker run --rm --name llama-rs -it -v ${PWD}/data:/data -v ${PWD}/examples:/examples llama-rs -m data/gpt4all-lora-quantized-ggml.bin -f examples/alpaca_prompt.txt --repl
```

## Q&A

### Why did you do this?