
Feature/split arch #146

Open. Wants to merge 54 commits into base: main.
54 commits
b7bc665
Initial support for coqui TTS, disable SpeechT5 in WIS by default
kristiankielhofner Oct 29, 2023
b0b2e01
WIP: Remove SpeechT5 support
kristiankielhofner Oct 29, 2023
ba27e96
Bye bye SpeechT5 - it wasn't great while it lasted
kristiankielhofner Oct 29, 2023
df0e953
Use only the first GPU by default
kristiankielhofner Oct 29, 2023
01277f0
Update requirements
kristiankielhofner Oct 29, 2023
3cfc0c3
Remove chatbot support
kristiankielhofner Oct 29, 2023
73ba436
Dynamic library fixes
kristiankielhofner Oct 29, 2023
cdc125a
GPU batch log tweak
kristiankielhofner Oct 29, 2023
b0d76ef
Initial flake8 fixes
kristiankielhofner Oct 29, 2023
3a38b45
Nvidia driver version check
kristiankielhofner Oct 30, 2023
43491c7
Fix CPU handling and Nvidia detection
kristiankielhofner Oct 30, 2023
b6c9da4
Batch works on CPU and GPU, remove GPU from log line to not confuse p…
kristiankielhofner Oct 30, 2023
f0d704d
Suppress warning from coqui on generation
kristiankielhofner Oct 30, 2023
44a5a14
Add support for X-API-Key authentication with Willow and other clients
kristiankielhofner Oct 31, 2023
a34d239
Generate x25519 key by default
kristiankielhofner Oct 31, 2023
b8ea73d
Implement WIS Basic HTTP Auth
kristiankielhofner Oct 31, 2023
7cd04a3
Support configuring of nginx tag, fix coqui tag config, bump nginx to…
kristiankielhofner Oct 31, 2023
93c74df
Disable 3DES cipher suites in nginx - it's 2023
kristiankielhofner Oct 31, 2023
f582ad5
Generate 2048 DH params
kristiankielhofner Oct 31, 2023
24ba2a6
Wrap openssl calls to nginx docker images
kristiankielhofner Oct 31, 2023
df7adbd
Add auth-basic.config.template
kristiankielhofner Oct 31, 2023
d604900
Multiple security fixes:
kristiankielhofner Oct 31, 2023
258c76a
Massive CPU performance improvement.
kristiankielhofner Oct 31, 2023
ea57276
Based on extensive profiling:
kristiankielhofner Nov 4, 2023
6c20f5d
Add useradd/del/list helper args
kristiankielhofner Nov 7, 2023
a54dc21
Add little help text for useradd
kristiankielhofner Nov 7, 2023
9369355
Add header Cache-Control: public so CF will cache TTS even with auth
kristiankielhofner Nov 7, 2023
634788c
Add header Cache-Control: public so CF will cache TTS even with auth
kristiankielhofner Nov 7, 2023
23bd57f
Initial pass at ensuring proper permissions on the nginx cache
kristiankielhofner Nov 14, 2023
3502dc3
Update to official ctranslate2 release for distil-whisper and large-v3
kristiankielhofner Nov 14, 2023
6b20352
Add detect compute to relevant args
kristiankielhofner Nov 15, 2023
5e1e8f3
Add support for Coqui XTTS
kristiankielhofner Nov 15, 2023
176f335
Incorporate some XTTS tweaks from their HF space
kristiankielhofner Nov 15, 2023
ed6f0bd
Check supported language, warm TTS on start
kristiankielhofner Nov 15, 2023
aa2a569
For consistency update docker-compose-cpu - XTTS not supported anyway
kristiankielhofner Nov 15, 2023
a9ebc1c
XTTS: Support passing of all xtts model args
kristiankielhofner Nov 16, 2023
07c2bae
Update to latest from xtts-streaming-server
kristiankielhofner Nov 16, 2023
db12d50
Add URL param nocache to /api/tts to disable TTS caching
kristiankielhofner Nov 16, 2023
b795186
Support adding custom speakers with POST to /api/tts
kristiankielhofner Nov 16, 2023
4ecb88c
Update XTTS Dockerfile to copy all speaker json in dir
kristiankielhofner Nov 16, 2023
28bd71b
Add female and male speakers
kristiankielhofner Nov 16, 2023
f219252
Support CPU - SLOW
kristiankielhofner Nov 16, 2023
3eeafe0
Bump Coqui
kristiankielhofner Nov 16, 2023
25883a8
Update to latest XTTS streaming-server-base with model DL fix
kristiankielhofner Nov 21, 2023
4b701f3
Update XTTS base image
kristiankielhofner Dec 23, 2023
9883ed3
Turn down default XTTS temperature to 0.1
kristiankielhofner Dec 23, 2023
1809732
Revert "Update to official ctranslate2 release for distil-whisper and…
kristiankielhofner Apr 8, 2024
a542d33
Revert "Based on extensive profiling:"
kristiankielhofner Apr 8, 2024
ddb412a
Revert "Massive CPU performance improvement."
kristiankielhofner Apr 8, 2024
9ab27a9
Revert "WIP: Initial CUDA 12 support"
kristiankielhofner Apr 8, 2024
65b8003
Update all deps
kristiankielhofner Apr 8, 2024
15b0eea
Update nginx to latest
kristiankielhofner Apr 8, 2024
3f25c00
Update coqui to latest
kristiankielhofner Apr 8, 2024
35d709d
Update xtts to latest
kristiankielhofner Apr 8, 2024
2 changes: 1 addition & 1 deletion .dockerignore
@@ -1,7 +1,7 @@
 venv
 __pycache__
 client
-cache
+cache*
 models
 acme.json
 audio
2 changes: 1 addition & 1 deletion .gitignore
@@ -1,6 +1,6 @@
 venv
 __pycache__
-cache
+cache*
 acme.json
 models
 .env
94 changes: 7 additions & 87 deletions Dockerfile
@@ -1,100 +1,20 @@
# Builder
FROM nvcr.io/nvidia/tensorrt:23.08-py3 as builder

# Set in environment in case we need to build any extensions
ENV TORCH_CUDA_ARCH_LIST="6.0;6.1;7.0;7.5;8.0;8.6;8.9;9.0+PTX"

RUN apt-get update && \
apt-get install -y --no-install-recommends \
python3-dev \
python3-pip \
wget \
&& \
apt-get clean && \
rm -rf /var/lib/apt/lists/*

WORKDIR /root

ENV ONEAPI_VERSION=2023.0.0
RUN wget -q https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB && \
apt-key add *.PUB && \
rm *.PUB && \
echo "deb https://apt.repos.intel.com/oneapi all main" > /etc/apt/sources.list.d/oneAPI.list && \
apt-get update && \
apt-get install -y --no-install-recommends \
intel-oneapi-mkl-devel-$ONEAPI_VERSION \
&& \
apt-get clean && \
rm -rf /var/lib/apt/lists/*

RUN --mount=type=cache,target=/root/.cache pip install cmake==3.22.*

ENV ONEDNN_VERSION=3.1.1
RUN wget -q https://github.com/oneapi-src/oneDNN/archive/refs/tags/v${ONEDNN_VERSION}.tar.gz && \
tar xf *.tar.gz && \
rm *.tar.gz && \
cd oneDNN-* && \
cmake -DCMAKE_BUILD_TYPE=Release -DONEDNN_LIBRARY_TYPE=STATIC -DONEDNN_BUILD_EXAMPLES=OFF -DONEDNN_BUILD_TESTS=OFF -DONEDNN_ENABLE_WORKLOAD=INFERENCE -DONEDNN_ENABLE_PRIMITIVE="CONVOLUTION;REORDER" -DONEDNN_BUILD_GRAPH=OFF . && \
make -j$(nproc) install && \
cd .. && \
rm -r oneDNN-*

RUN git clone --recursive https://github.com/OpenNMT/CTranslate2.git

WORKDIR /root/CTranslate2

RUN git checkout 2203ad5

ARG CXX_FLAGS
ENV CXX_FLAGS=${CXX_FLAGS:-"-msse4.1"}
ARG CUDA_NVCC_FLAGS
ENV CUDA_NVCC_FLAGS=${CUDA_NVCC_FLAGS:-"-Xfatbin=-compress-all"}
ARG CUDA_ARCH_LIST
ENV CUDA_ARCH_LIST=${TORCH_CUDA_ARCH_LIST:-"Common"}
ENV CTRANSLATE2_ROOT=/opt/ctranslate2

RUN mkdir build && \
cd build && \
cmake -DCMAKE_INSTALL_PREFIX=${CTRANSLATE2_ROOT} \
-DWITH_CUDA=ON -DWITH_CUDNN=ON -DWITH_MKL=ON -DWITH_DNNL=ON -DOPENMP_RUNTIME=COMP \
-DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS="${CXX_FLAGS}" \
-DCUDA_NVCC_FLAGS="${CUDA_NVCC_FLAGS}" -DCUDA_ARCH_LIST="${CUDA_ARCH_LIST}" .. && \
VERBOSE=1 make -j$(nproc) install

ENV LANG=en_US.UTF-8
COPY README.md .

RUN --mount=type=cache,target=/root/.cache cd python && \
pip --no-cache-dir install -r install_requirements.txt && \
python3 setup.py bdist_wheel --dist-dir $CTRANSLATE2_ROOT

# Runtime

FROM nvcr.io/nvidia/tensorrt:23.08-py3

WORKDIR /app

# Set in environment in case we need to build any extensions
ENV TORCH_CUDA_ARCH_LIST="6.0;6.1;6.2;7.0;7.2;7.5;8.0;8.6;8.9;9.0+PTX"

# Install zstd and git-lfs for model compression and distribution
RUN apt-get update && apt-get install -y zstd git-lfs && rm -rf /var/lib/apt/lists/*
RUN apt-get update && apt-get install -y zstd git-lfs && rm -rf /var/lib/apt/lists/*

# Install our torch ver matching cuda
RUN --mount=type=cache,target=/root/.cache pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2

COPY requirements.txt .
# Run pip install with cache so we speedup subsequent rebuilds
RUN --mount=type=cache,target=/root/.cache pip install -r requirements.txt

# Install our torch ver matching cuda
RUN --mount=type=cache,target=/root/.cache pip install -U torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0

# Install compiled ctranslate2
ENV CTRANSLATE2_ROOT=/opt/ctranslate2
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CTRANSLATE2_ROOT/lib

COPY --from=builder $CTRANSLATE2_ROOT $CTRANSLATE2_ROOT
RUN python3 -m pip --no-cache-dir install $CTRANSLATE2_ROOT/*.whl && \
rm $CTRANSLATE2_ROOT/*.whl

# Install auto-gptq
RUN --mount=type=cache,target=/root/.cache pip install auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/

COPY . .

CMD ./entrypoint.sh
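The builder stage's ARG/ENV pairs (CXX_FLAGS, CUDA_NVCC_FLAGS, CUDA_ARCH_LIST) rely on shell-style default expansion. Note that the `ENV CUDA_ARCH_LIST=${TORCH_CUDA_ARCH_LIST:-"Common"}` line expands TORCH_CUDA_ARCH_LIST rather than the ARG CUDA_ARCH_LIST declared just above it, so that build arg has no effect as written. A minimal sketch of the `${VAR:-default}` idiom, with illustrative values:

```shell
# ${VAR:-default}: the fallback applies only when VAR is unset or empty.
unset CXX_FLAGS
printf '%s\n' "${CXX_FLAGS:-"-msse4.1"}"   # prints the fallback: -msse4.1

CXX_FLAGS="-mavx2"
printf '%s\n' "${CXX_FLAGS:-"-msse4.1"}"   # explicit value wins: -mavx2
```

The same expansion rules apply inside Dockerfile ENV instructions, which is why `docker build --build-arg CXX_FLAGS=...` can override the compiled-in default.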
6 changes: 6 additions & 0 deletions Dockerfile.nginx
@@ -0,0 +1,6 @@
ARG NGINX_TAG

FROM nginx:${NGINX_TAG}

RUN apt-get update && apt-get install --no-install-recommends -y apache2-utils \
&& rm -rf /var/lib/apt/lists/*
6 changes: 6 additions & 0 deletions Dockerfile.xtts
@@ -0,0 +1,6 @@
FROM ghcr.io/coqui-ai/xtts-streaming-server:main-cuda121-99286c10883cb9b9dcecdb6c68933c4dc0ecbec3
WORKDIR /xtts

COPY xtts/main.py .
COPY xtts/*.json .
EXPOSE 5002
21 changes: 20 additions & 1 deletion docker-compose-cpu.yml
@@ -18,11 +18,30 @@ services:
       - ./cache:/root/.cache
     command: ./entrypoint.sh
 
+  coqui:
+    restart: unless-stopped
+    image: ${COQUI_IMAGE}:${COQUI_TAG}
+    environment:
+      - FORCE_CPU
+    env_file:
+      - .env
+    shm_size: ${SHM_SIZE}
+    ipc: host
+    ulimits:
+      memlock: -1
+      stack: 67108864
+    volumes:
+      - ./:/app
+      - ./cache:/root/.cache
+      - ./cache-local:/root/.local
+    entrypoint: /app/entrypoint-coqui.sh
+
   nginx:
     restart: unless-stopped
     depends_on:
+      - coqui
       - wis
-    image: nginx:1.25.2
+    image: ${WIS_NGINX_IMAGE}:${WIS_NGINX_TAG}
     volumes:
       - ./nginx:/nginx
       - ./nginx/nginx.conf:/etc/nginx/nginx.conf
27 changes: 26 additions & 1 deletion docker-compose.yml
@@ -19,16 +19,41 @@ services:
         devices:
           - driver: nvidia
             capabilities: [gpu]
+            device_ids: ['0']
     volumes:
       - ./:/app
       - ./cache:/root/.cache
     command: ./entrypoint.sh
 
+  coqui:
+    restart: unless-stopped
+    image: ${COQUI_IMAGE}:${COQUI_TAG}
+    env_file:
+      - .env
+    shm_size: ${SHM_SIZE}
+    ipc: host
+    ulimits:
+      memlock: -1
+      stack: 67108864
+    deploy:
+      resources:
+        reservations:
+          devices:
+            - driver: nvidia
+              capabilities: [gpu]
+              device_ids: ['0']
+    volumes:
+      - ./:/app
+      - ./cache:/root/.cache
+      - ./cache-local:/root/.local
+    entrypoint: /app/entrypoint-coqui.sh
+
   nginx:
     restart: unless-stopped
     depends_on:
+      - coqui
       - wis
-    image: nginx:1.25.2
+    image: ${WIS_NGINX_IMAGE}:${WIS_NGINX_TAG}
     volumes:
       - ./nginx:/nginx
       - ./nginx/nginx.conf:/etc/nginx/nginx.conf
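Both compose files consume several variables from .env (COQUI_IMAGE, COQUI_TAG, WIS_NGINX_IMAGE, WIS_NGINX_TAG, SHM_SIZE). A hypothetical .env fragment showing the expected shape; every value below is a placeholder, not a project default:

```shell
# Placeholder .env values; the real image names and tags will differ.
COQUI_IMAGE=ghcr.io/example/wis-coqui
COQUI_TAG=latest
WIS_NGINX_IMAGE=nginx
WIS_NGINX_TAG=1.25.3
SHM_SIZE=1gb
```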
28 changes: 28 additions & 0 deletions entrypoint-coqui.sh
@@ -0,0 +1,28 @@
#!/bin/bash
set -e

if [ "$FORCE_CPU" ]; then
    COQUI_CUDA="false"
else
    COQUI_CUDA="true"
fi

export COQUI_TOS_AGREED=1

if [ -r "/xtts/main.py" ]; then
    echo "Starting coqui xtts"
    cd /xtts
    uvicorn main:app --host 0.0.0.0 --port 5002
else
    # Fix/suppress cudnn warning to not confuse people
    ln -sf /usr/local/lib/python3.10/dist-packages/torch/lib/libnvrtc-*.so.11.2 \
        /usr/local/lib/python3.10/dist-packages/torch/lib/libnvrtc.so

    if [ "$TTS_MODEL_NAME" ]; then
        echo "Using coqui model $TTS_MODEL_NAME"
        python3 TTS/server/server.py --model_name "$TTS_MODEL_NAME" --use_cuda "$COQUI_CUDA"
    else
        echo "Using default coqui model"
        python3 TTS/server/server.py --use_cuda "$COQUI_CUDA"
    fi
fi
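The FORCE_CPU branch at the top of the script treats any non-empty value as true. A standalone sketch of that logic (the function name is illustrative, not part of the script):

```shell
#!/bin/bash
# Mirrors the [ "$FORCE_CPU" ] test in entrypoint-coqui.sh: any non-empty
# value disables CUDA, so even FORCE_CPU=0 forces CPU mode.
coqui_cuda_flag() {
    if [ "$1" ]; then
        echo "false"   # CPU forced
    else
        echo "true"    # default: use CUDA
    fi
}

coqui_cuda_flag ""    # true
coqui_cuda_flag "1"   # false
coqui_cuda_flag "0"   # false (non-empty, so CPU is still forced)
```

This is why the CPU compose file only needs to list FORCE_CPU under environment: the variable's presence with any non-empty value is what matters, not its contents.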