From f825081b0655fc408be867b0c96dcac2eabd766a Mon Sep 17 00:00:00 2001 From: Aleksandr Movchan Date: Fri, 12 Jul 2024 08:33:32 +0000 Subject: [PATCH 1/7] Moved SDK logo into docs. --- README.md | 4 ++-- .../images/AanaSDK_logo_dark_theme.png | Bin .../images/AanaSDK_logo_light_theme.png | Bin 3 files changed, 2 insertions(+), 2 deletions(-) rename Aana-SDK-whitetext.png => docs/images/AanaSDK_logo_dark_theme.png (100%) rename Aanalogo.png => docs/images/AanaSDK_logo_light_theme.png (100%) diff --git a/README.md b/README.md index db5bab06..af4e7fe2 100644 --- a/README.md +++ b/README.md @@ -6,8 +6,8 @@

-    <source media="(prefers-color-scheme: dark)" srcset="Aana-SDK-whitetext.png"> -    <img alt="Aana Logo" src="Aanalogo.png"> +    <source media="(prefers-color-scheme: dark)" srcset="docs/images/AanaSDK_logo_dark_theme.png"> +    <img alt="Aana Logo" src="docs/images/AanaSDK_logo_light_theme.png">

diff --git a/Aana-SDK-whitetext.png b/docs/images/AanaSDK_logo_dark_theme.png similarity index 100% rename from Aana-SDK-whitetext.png rename to docs/images/AanaSDK_logo_dark_theme.png diff --git a/Aanalogo.png b/docs/images/AanaSDK_logo_light_theme.png similarity index 100% rename from Aanalogo.png rename to docs/images/AanaSDK_logo_light_theme.png From 1bc331e60c84dbf4f891d806fbdf801bef8f872e Mon Sep 17 00:00:00 2001 From: Aleksandr Movchan Date: Fri, 12 Jul 2024 08:33:51 +0000 Subject: [PATCH 2/7] Added docs index --- docs/README.md | 44 ++++++++++++++++++++++++++++++++++++++++++ docs/code_standards.md | 13 ------------- docs/testing.md | 11 +++++++++++ 3 files changed, 55 insertions(+), 13 deletions(-) create mode 100644 docs/README.md diff --git a/docs/README.md b/docs/README.md new file mode 100644 index 00000000..e3d10ffb --- /dev/null +++ b/docs/README.md @@ -0,0 +1,44 @@ +# Documentation + +Welcome to the documentation for Aana SDK. + +## Table of Contents + +1. [Getting Started](#getting-started) +2. [Development](#development) +3. [Deployment](#deployment) +4. [Integrations](#integrations) +5. [Configuration](#configuration) +6. [Best Practices](#best-practices) + +## Documentation Files + +### Getting Started +- [Tutorial](tutorial): A step-by-step tutorial to help you get started with using Aana SDK. + +### Development +- [Development Guide](development): A guide for developers working on the project, including code structure, dev container setup, and database management. +- [Testing](testing): This document covers the testing procedures and guidelines for our project. +- [Deployment Test Cache](deployment_test_cache): Information on how deployment test caching works and its configuration. + +### Deployment +- [Docker](docker): Instructions for using Docker with Aana SDK. 
+- [Serve Config Files](serve_config_files): Information about [Serve Config Files](https://docs.ray.io/en/latest/serve/production-guide/config.html#serve-config-files) for production deployment, how to build them, and deploy applications using them. + +### Integrations +- [Integrations](integrations): Overview of the available predefined deployments like Whisper, vLLM, Hugging Face Transformers, Haystack etc. +- [OpenAI API](openai_api): Overview of the OpenAI-compatible Chat Completions API. + +### Configuration +- [Settings](settings): Documentation on the available settings and configuration options for the project. + +### Best Practices +- [Code Standards](code_standards): Learn about our coding standards and best practices for contributing to the project. + +## Getting Started + +If you're new to the project, we recommend starting with the [Tutorial](tutorial) to get a hands-on introduction. From there, you can explore the other documentation files based on your specific needs or interests. + +For developers looking to contribute, make sure to review the [Code Standards](code_standards) and [Development Guide](development). + +If you have any questions or need further assistance, please don't hesitate to reach out to our support team or community forums. diff --git a/docs/code_standards.md b/docs/code_standards.md index 9599115a..2e0f359d 100644 --- a/docs/code_standards.md +++ b/docs/code_standards.md @@ -22,16 +22,3 @@ poetry run ruff format aana For users of VS Code, the included `settings.json` should ensure that Ruff problems appear while you edit, and formatting is applied automatically on save. - - -# Testing - -The project uses pytest for testing. To run the tests, use the following command: - -```bash -poetry run pytest -``` - -If you are using VS Code, you can run the tests using the Test Explorer that is installed with the [Python extension](https://code.visualstudio.com/docs/python/testing). 
- -Testing ML models poses a couple of problems: loading and running models may be very time consuming, and you may wish to run tests on systems that lack hardware support necessary for the models, for example a subnotebook without a GPU or a CI/CD server. To solve this issue, we created a **deployment test cache**. See [the documentation](docs/deployment_test_cache.md). \ No newline at end of file diff --git a/docs/testing.md b/docs/testing.md index e69de29b..8be38fb0 100644 --- a/docs/testing.md +++ b/docs/testing.md @@ -0,0 +1,11 @@ +# Testing + +The project uses pytest for testing. To run the tests, use the following command: + +```bash +poetry run pytest +``` + +If you are using VS Code, you can run the tests using the Test Explorer that is installed with the [Python extension](https://code.visualstudio.com/docs/python/testing). + +Testing ML models poses a couple of problems: loading and running models may be very time consuming, and you may wish to run tests on systems that lack hardware support necessary for the models, for example a subnotebook without a GPU or a CI/CD server. To solve this issue, we created a **deployment test cache**. See [the documentation](docs/deployment_test_cache.md). \ No newline at end of file From 98fadba253e2fde62192e3c17bc4f3625e4700cf Mon Sep 17 00:00:00 2001 From: Aleksandr Movchan Date: Fri, 12 Jul 2024 08:40:52 +0000 Subject: [PATCH 3/7] Added missing extension for docs. --- docs/README.md | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/docs/README.md b/docs/README.md index e3d10ffb..ed63e52d 100644 --- a/docs/README.md +++ b/docs/README.md @@ -14,31 +14,31 @@ Welcome to the documentation for Aana SDK. ## Documentation Files ### Getting Started -- [Tutorial](tutorial): A step-by-step tutorial to help you get started with using Aana SDK. +- [Tutorial](tutorial.md): A step-by-step tutorial to help you get started with using Aana SDK. 
### Development -- [Development Guide](development): A guide for developers working on the project, including code structure, dev container setup, and database management. -- [Testing](testing): This document covers the testing procedures and guidelines for our project. -- [Deployment Test Cache](deployment_test_cache): Information on how deployment test caching works and its configuration. +- [Development Guide](development.md): A guide for developers working on the project, including code structure, dev container setup, and database management. +- [Testing](testing.md): This document covers the testing procedures and guidelines for our project. +- [Deployment Test Cache](deployment_test_cache.md): Information on how deployment test caching works and its configuration. ### Deployment -- [Docker](docker): Instructions for using Docker with Aana SDK. -- [Serve Config Files](serve_config_files): Information about [Serve Config Files](https://docs.ray.io/en/latest/serve/production-guide/config.html#serve-config-files) for production deployment, how to build them, and deploy applications using them. +- [Docker](docker.md): Instructions for using Docker with Aana SDK. +- [Serve Config Files](serve_config_files.md): Information about [Serve Config Files](https://docs.ray.io/en/latest/serve/production-guide/config.html#serve-config-files) for production deployment, how to build them, and deploy applications using them. ### Integrations -- [Integrations](integrations): Overview of the available predefined deployments like Whisper, vLLM, Hugging Face Transformers, Haystack etc. -- [OpenAI API](openai_api): Overview of the OpenAI-compatible Chat Completions API. +- [Integrations](integrations.md): Overview of the available predefined deployments like Whisper, vLLM, Hugging Face Transformers, Haystack etc. +- [OpenAI API](openai_api.md): Overview of the OpenAI-compatible Chat Completions API. 
### Configuration -- [Settings](settings): Documentation on the available settings and configuration options for the project. +- [Settings](settings.md): Documentation on the available settings and configuration options for the project. ### Best Practices -- [Code Standards](code_standards): Learn about our coding standards and best practices for contributing to the project. +- [Code Standards](code_standards.md): Learn about our coding standards and best practices for contributing to the project. ## Getting Started -If you're new to the project, we recommend starting with the [Tutorial](tutorial) to get a hands-on introduction. From there, you can explore the other documentation files based on your specific needs or interests. +If you're new to the project, we recommend starting with the [Tutorial](tutorial.md) to get a hands-on introduction. From there, you can explore the other documentation files based on your specific needs or interests. -For developers looking to contribute, make sure to review the [Code Standards](code_standards) and [Development Guide](development). +For developers looking to contribute, make sure to review the [Code Standards](code_standards.md) and [Development Guide](development.md). If you have any questions or need further assistance, please don't hesitate to reach out to our support team or community forums. 
From 10c1c712b8aa2976881442c2b01380820520ade3 Mon Sep 17 00:00:00 2001 From: Aleksandr Movchan Date: Fri, 12 Jul 2024 09:24:22 +0000 Subject: [PATCH 4/7] Moved docs pages into subdir --- README.md | 22 ++++++++++++--------- docs/README.md | 24 +++++++++++------------ docs/{ => pages}/code_standards.md | 0 docs/{ => pages}/deployment_test_cache.md | 2 +- docs/{ => pages}/development.md | 0 docs/{ => pages}/docker.md | 10 +++++++--- docs/{ => pages}/integrations.md | 2 +- docs/{ => pages}/openai_api.md | 2 +- docs/{ => pages}/serve_config_files.md | 0 docs/{ => pages}/settings.md | 0 docs/{ => pages}/testing.md | 2 +- docs/{ => pages}/tutorial.md | 2 +- notebooks/chat_with_video_demo.ipynb | 6 +----- notebooks/hf_text_gen_deployment.ipynb | 4 ++-- notebooks/rag.ipynb | 2 +- 15 files changed, 41 insertions(+), 37 deletions(-) rename docs/{ => pages}/code_standards.md (100%) rename docs/{ => pages}/deployment_test_cache.md (94%) rename docs/{ => pages}/development.md (100%) rename docs/{ => pages}/docker.md (63%) rename docs/{ => pages}/integrations.md (97%) rename docs/{ => pages}/openai_api.md (95%) rename docs/{ => pages}/serve_config_files.md (100%) rename docs/{ => pages}/settings.md (100%) rename docs/{ => pages}/testing.md (89%) rename docs/{ => pages}/tutorial.md (99%) diff --git a/README.md b/README.md index af4e7fe2..27f021e1 100644 --- a/README.md +++ b/README.md @@ -54,7 +54,7 @@ Aana SDK simplifies this process by providing a framework that allows: - **Task Queue Support**: - Run every endpoint you define as a task in the background without any changes to your code. - **Integrations**: - - Aana SDK has integrations with various machine learning models and libraries: Whisper, vLLM, Hugging Face Transformers, Deepset Haystack, and more to come (for more information see [Integrations](docs/integrations.md)). 
+ - Aana SDK has integrations with various machine learning models and libraries: Whisper, vLLM, Hugging Face Transformers, Deepset Haystack, and more to come (for more information see [Integrations](docs/pages/integrations.md)). ## Installation @@ -66,7 +66,9 @@ To install Aana SDK via PyPI, you can use the following command: pip install aana ``` -Make sure you have the necessary dependencies installed, such as `libgl1` for OpenCV. +For optimal performance install [PyTorch](https://pytorch.org/get-started/locally/) version >=2.1 appropriate for your system. You can skip it, but it will install a default version that may not make optimal use of your system's resources, for example, a GPU or even some SIMD operations. Therefore we recommend choosing your PyTorch package carefully and installing it manually. + +Some models use Flash Attention. Install Flash Attention library for better performance. See [flash attention installation instructions](https://github.com/Dao-AILab/flash-attention?tab=readme-ov-file#installation-and-features) for more details and supported GPUs. ### Installing from GitHub @@ -78,7 +80,7 @@ git clone https://github.com/mobiusml/aana_sdk.git 2. Install additional libraries. -You should install [PyTorch](https://pytorch.org/get-started/locally/) version >=2.1 appropriate for your system. You can continue directly to the next step, but it will install a default version that may not make optimal use of your system's resources, for example, a GPU or even some SIMD operations. Therefore we recommend choosing your PyTorch package carefully and installing it manually. +For optimal performance install [PyTorch](https://pytorch.org/get-started/locally/) version >=2.1 appropriate for your system. You can continue directly to the next step, but it will install a default version that may not make optimal use of your system's resources, for example, a GPU or even some SIMD operations. 
Therefore we recommend choosing your PyTorch package carefully and installing it manually. Some models use Flash Attention. Install Flash Attention library for better performance. See [flash attention installation instructions](https://github.com/Dao-AILab/flash-attention?tab=readme-ov-file#installation-and-features) for more details and supported GPUs. @@ -190,7 +192,7 @@ This will return the full transcription of the video, transcription for each seg Aana SDK comes with a set of example applications that demonstrate the capabilities of the SDK. You can run the example applications using the Aana CLI. The following applications are available: -- `chat_with_video`: A multimodal chat application that allows users to upload a video and ask questions about the video content based on the visual and audio information. This example requires `HF_TOKEN` to access [Llama 3 8B model from Meta](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct). See [Chat with Video Demo notebook](/notebooks/chat_with_video_demo.ipynb) for more information. +- `chat_with_video`: A multimodal chat application that allows users to upload a video and ask questions about the video content based on the visual and audio information. See [Chat with Video Demo notebook](/notebooks/chat_with_video_demo.ipynb) for more information. - `whisper`: An application that demonstrates the Whisper model for automatic speech recognition (ASR). - `llama2`: An application that deploys LLaMa2 7B Chat model. @@ -221,8 +223,6 @@ aana deploy aana.projects.whisper.app:aana_app > - `llama2` requires at least 16GB. > - `whisper` requires at least 4GB. - - ### Main components There are three main components in Aana SDK: deployments, endpoints, and AanaSDK. 
@@ -296,12 +296,16 @@ All you need to do is define the deployments and endpoints you want to use in yo ## Serve Config Files -The [Serve Config Files](https://docs.ray.io/en/latest/serve/production-guide/config.html#serve-config-files) is the recommended way to deploy and update your applications in production. Aana SDK provides a way to build the Serve Config Files for the Aana applications. See the [Serve Config Files documentation](docs/serve_config_files.md) on how to build and deploy the applications using the Serve Config Files. +The [Serve Config Files](https://docs.ray.io/en/latest/serve/production-guide/config.html#serve-config-files) is the recommended way to deploy and update your applications in production. Aana SDK provides a way to build the Serve Config Files for the Aana applications. See the [Serve Config Files documentation](docs/pages/serve_config_files.md) on how to build and deploy the applications using the Serve Config Files. ## Run with Docker -You can deploy example applications using Docker. See the [documentation on how to run Aana SDK with Docker](docs/docker.md). +You can deploy example applications using Docker. See the [documentation on how to run Aana SDK with Docker](docs/pages/docker.md). + +## Documentation + +For more information on how to use Aana SDK, see the [documentation](docs/README.md). ## License @@ -311,6 +315,6 @@ Aana SDK is licensed under the [Apache License 2.0](./LICENSE). Commercial licen We welcome contributions from the community to enhance Aana SDK's functionality and usability. Feel free to open issues for bug reports, feature requests, or submit pull requests to contribute code improvements. -Before contributing, please read our [Code Standards](docs/code_standards.md) and [Development Documentation](docs/development.md). +Before contributing, please read our [Code Standards](docs/pages/code_standards.md) and [Development Documentation](docs/pages/development.md). 
We have adopted the [Contributor Covenant](https://www.contributor-covenant.org/) as our code of conduct. diff --git a/docs/README.md b/docs/README.md index ed63e52d..88a92a57 100644 --- a/docs/README.md +++ b/docs/README.md @@ -14,31 +14,31 @@ Welcome to the documentation for Aana SDK. ## Documentation Files ### Getting Started -- [Tutorial](tutorial.md): A step-by-step tutorial to help you get started with using Aana SDK. +- [Tutorial](pages/tutorial.md): A step-by-step tutorial to help you get started with using Aana SDK. ### Development -- [Development Guide](development.md): A guide for developers working on the project, including code structure, dev container setup, and database management. -- [Testing](testing.md): This document covers the testing procedures and guidelines for our project. -- [Deployment Test Cache](deployment_test_cache.md): Information on how deployment test caching works and its configuration. +- [Development Guide](pages/development.md): A guide for developers working on the project, including code structure, dev container setup, and database management. +- [Testing](pages/testing.md): This document covers the testing procedures and guidelines for our project. +- [Deployment Test Cache](pages/deployment_test_cache.md): Information on how deployment test caching works and its configuration. ### Deployment -- [Docker](docker.md): Instructions for using Docker with Aana SDK. -- [Serve Config Files](serve_config_files.md): Information about [Serve Config Files](https://docs.ray.io/en/latest/serve/production-guide/config.html#serve-config-files) for production deployment, how to build them, and deploy applications using them. +- [Docker](pages/docker.md): Instructions for using Docker with Aana SDK. 
+- [Serve Config Files](pages/serve_config_files.md): Information about [Serve Config Files](https://docs.ray.io/en/latest/serve/production-guide/config.html#serve-config-files) for production deployment, how to build them, and deploy applications using them. ### Integrations -- [Integrations](integrations.md): Overview of the available predefined deployments like Whisper, vLLM, Hugging Face Transformers, Haystack etc. -- [OpenAI API](openai_api.md): Overview of the OpenAI-compatible Chat Completions API. +- [Integrations](pages/integrations.md): Overview of the available predefined deployments like Whisper, vLLM, Hugging Face Transformers, Haystack etc. +- [OpenAI API](pages/openai_api.md): Overview of the OpenAI-compatible Chat Completions API. ### Configuration -- [Settings](settings.md): Documentation on the available settings and configuration options for the project. +- [Settings](pages/settings.md): Documentation on the available settings and configuration options for the project. ### Best Practices -- [Code Standards](code_standards.md): Learn about our coding standards and best practices for contributing to the project. +- [Code Standards](pages/code_standards.md): Learn about our coding standards and best practices for contributing to the project. ## Getting Started -If you're new to the project, we recommend starting with the [Tutorial](tutorial.md) to get a hands-on introduction. From there, you can explore the other documentation files based on your specific needs or interests. +If you're new to the project, we recommend starting with the [Tutorial](pages/tutorial.md) to get a hands-on introduction. From there, you can explore the other documentation files based on your specific needs or interests. -For developers looking to contribute, make sure to review the [Code Standards](code_standards.md) and [Development Guide](development.md). 
+For developers looking to contribute, make sure to review the [Code Standards](pages/code_standards.md) and [Development Guide](pages/development.md). If you have any questions or need further assistance, please don't hesitate to reach out to our support team or community forums. diff --git a/docs/code_standards.md b/docs/pages/code_standards.md similarity index 100% rename from docs/code_standards.md rename to docs/pages/code_standards.md diff --git a/docs/deployment_test_cache.md b/docs/pages/deployment_test_cache.md similarity index 94% rename from docs/deployment_test_cache.md rename to docs/pages/deployment_test_cache.md index bc766586..c54b80ae 100644 --- a/docs/deployment_test_cache.md +++ b/docs/pages/deployment_test_cache.md @@ -2,7 +2,7 @@ The deployment test cache is a feature of integration tests that allows you to simulate running the model endpoints without having to go to the effort of downloading the model, loading it, and running on a GPU. This is useful to save time as well as to be able to run the integration tests without needing a GPU (for example if you are on your laptop without internet access). -To mark a function as cacheable so its output can be stored in the deployment cache, annotate it with @test_cache imported from `[aana.utils.test](../aana/utils/test.py)`. Here's our StableDiffusion 2 deployment from above with the generate method annotated: +To mark a function as cacheable so its output can be stored in the deployment cache, annotate it with `@test_cache` imported from [aana.deployments.base_deployment](/aana/deployments/base_deployment.py). 
Here's our StableDiffusion 2 deployment from above with the generate method annotated: ```python class StableDiffusion2Deployment(BaseDeployment): diff --git a/docs/development.md b/docs/pages/development.md similarity index 100% rename from docs/development.md rename to docs/pages/development.md diff --git a/docs/docker.md b/docs/pages/docker.md similarity index 63% rename from docs/docker.md rename to docs/pages/docker.md index 2d98c072..0bfb92bb 100644 --- a/docs/docker.md +++ b/docs/pages/docker.md @@ -13,10 +13,10 @@ docker build -t aana:latest . 3. Run the Docker container. ```bash -docker run --rm --init -p 8000:8000 --gpus all -e TARGET="llama2" -v aana_cache:/root/.aana -v aana_hf_cache:/root/.cache/huggingface --name aana_instance aana:latest +docker run --rm --init -p 8000:8000 --gpus all -e TARGET="whisper" -v aana_cache:/root/.aana -v aana_hf_cache:/root/.cache/huggingface --name aana_instance aana:latest ``` -Use the environment variable TARGET to specify the application you want to run. The available applications are `chat_with_video`, `whisper`, and `llama2`. +Use the environment variable TARGET to specify the application you want to run. The available applications are `chat_with_video`, `whisper`, `llama2`, `summarize_transcript` etc. See [Projects](/aana/projects/) for the list of available projects. The first run might take a while because the models will be downloaded from the Internet and cached. The models will be stored in the `aana_cache` volume. The HuggingFace models will be stored in the `aana_hf_cache` volume. If you want to remove the cached models, remove the volume. @@ -30,4 +30,8 @@ The app documentation is available as a [Swagger UI](http://localhost:8000/docs) 5. Send a request to the server. -You can find examples in the [demo notebook](notebooks/demo.ipynb). 
\ No newline at end of file +For example, if your application has `/video/transcribe` endpoint that accepts videos (like `whisper` app), you can send a POST request like this: + +```bash +curl -X POST http://127.0.0.1:8000/video/transcribe -Fbody='{"video":{"url":"https://www.youtube.com/watch?v=VhJFyyukAzA"}}' +``` \ No newline at end of file diff --git a/docs/integrations.md b/docs/pages/integrations.md similarity index 97% rename from docs/integrations.md rename to docs/pages/integrations.md index b5c43d8a..1210e849 100644 --- a/docs/integrations.md +++ b/docs/pages/integrations.md @@ -71,4 +71,4 @@ See [Haystack integration notebook](/notebooks/haystack_integration.ipynb) for a ## OpenAI-compatible Chat Completions API -The OpenAI-compatible Chat Completions API allows you to access the Aana applications with any OpenAI-compatible client. See [OpenAI-compatible API docs](/docs/openai_api.md) for more details. +The OpenAI-compatible Chat Completions API allows you to access the Aana applications with any OpenAI-compatible client. See [OpenAI-compatible API docs](openai_api.md) for more details. diff --git a/docs/openai_api.md b/docs/pages/openai_api.md similarity index 95% rename from docs/openai_api.md rename to docs/pages/openai_api.md index a9da2fd3..aa63a2ba 100644 --- a/docs/openai_api.md +++ b/docs/pages/openai_api.md @@ -49,7 +49,7 @@ for chunk in stream: print(chunk.choices[0].delta.content or "", end="") ``` -The API requires an LLM deployment. Aana SDK provides support for [vLLM](/docs/integrations.md#vllm) and [Hugging Face Transformers](/docs/integrations.md#hugging-face-transformers). +The API requires an LLM deployment. Aana SDK provides support for [vLLM](integrations.md#vllm) and [Hugging Face Transformers](integrations.md#hugging-face-transformers). The name of the model matches the name of the deployment. 
For example, if you registered a vLLM deployment with the name `llm_deployment`, you can use it with the OpenAI API as `model="llm_deployment"`. diff --git a/docs/serve_config_files.md b/docs/pages/serve_config_files.md similarity index 100% rename from docs/serve_config_files.md rename to docs/pages/serve_config_files.md diff --git a/docs/settings.md b/docs/pages/settings.md similarity index 100% rename from docs/settings.md rename to docs/pages/settings.md diff --git a/docs/testing.md b/docs/pages/testing.md similarity index 89% rename from docs/testing.md rename to docs/pages/testing.md index 8be38fb0..bd393b89 100644 --- a/docs/testing.md +++ b/docs/pages/testing.md @@ -8,4 +8,4 @@ poetry run pytest If you are using VS Code, you can run the tests using the Test Explorer that is installed with the [Python extension](https://code.visualstudio.com/docs/python/testing). -Testing ML models poses a couple of problems: loading and running models may be very time consuming, and you may wish to run tests on systems that lack hardware support necessary for the models, for example a subnotebook without a GPU or a CI/CD server. To solve this issue, we created a **deployment test cache**. See [the documentation](docs/deployment_test_cache.md). \ No newline at end of file +Testing ML models poses a couple of problems: loading and running models may be very time consuming, and you may wish to run tests on systems that lack hardware support necessary for the models, for example a subnotebook without a GPU or a CI/CD server. To solve this issue, we created a **deployment test cache**. See [deployment test cache docs](deployment_test_cache.md) for more information. 
\ No newline at end of file diff --git a/docs/tutorial.md b/docs/pages/tutorial.md similarity index 99% rename from docs/tutorial.md rename to docs/pages/tutorial.md index 992e7c3f..a00bace7 100644 --- a/docs/tutorial.md +++ b/docs/pages/tutorial.md @@ -119,7 +119,7 @@ The application consists of 3 main components: the deployment, the endpoint, and ### Deployments -Deployments are the building blocks of Aana SDK. They represent the machine learning models that you want to deploy. Aana SDK comes with a set of predefined deployments that you can use or you can define your own deployments. See [Integrations](#integrations) section for more information about predefined deployments. +Deployments are the building blocks of Aana SDK. They represent the machine learning models that you want to deploy. Aana SDK comes with a set of predefined deployments that you can use or you can define your own deployments. See [Integrations](integrations.md) for more information about predefined deployments. Each deployment has a main class that defines it and a configuration class that allows you to specify the deployment parameters. diff --git a/notebooks/chat_with_video_demo.ipynb b/notebooks/chat_with_video_demo.ipynb index df706d8a..e7fb7570 100644 --- a/notebooks/chat_with_video_demo.ipynb +++ b/notebooks/chat_with_video_demo.ipynb @@ -16,11 +16,7 @@ "\n", "```bash\n", "HF_TOKEN=\"\" CUDA_VISIBLE_DEVICES=\"0\" aana deploy aana.projects.chat_with_video.app:aana_app\n", - "```\n", - "\n", - "> **⚠️ Warning**\n", - ">\n", - "> The application is using [Llama 3 8B model from Meta](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) which is a gated model. That means that you need to have access to the model in order to run the application. If you don't yet have access to the model, you can request access by filling out the form on the model page. It's completely free even for commertial use (with certain limitations) but requires to accept the terms of use. 
Once you have access to the model, you can set the `HF_TOKEN` environment variable to your token and run the application." + "```" ] }, { diff --git a/notebooks/hf_text_gen_deployment.ipynb b/notebooks/hf_text_gen_deployment.ipynb index 3ad7f9d1..aeb420dc 100644 --- a/notebooks/hf_text_gen_deployment.ipynb +++ b/notebooks/hf_text_gen_deployment.ipynb @@ -348,9 +348,9 @@ "source": [ "Congratulations! You have successfully deployed an LLM using Aana SDK. You can add Aana Endpoints to your application to interact with the deployed model.\n", "\n", - "Aana SDK also provides OpenAI-compatible API to interact with the deployed model. It allows you to access the Aana applications with any OpenAI-compatible client. See [OpenAI-compatible API docs](/docs/openai_api.md) for more details.\n", + "Aana SDK also provides OpenAI-compatible API to interact with the deployed model. It allows you to access the Aana applications with any OpenAI-compatible client. See [OpenAI-compatible API docs](/docs/pages/openai_api.md) for more details.\n", "\n", - "You can also deploy LLMs using [vLLM integration](/docs/integrations.md#vllm) with Aana SDK. It is a more efficient way to deploy LLMs if you have a GPU." + "You can also deploy LLMs using [vLLM integration](/docs/pages/integrations.md#vllm) with Aana SDK. It is a more efficient way to deploy LLMs if you have a GPU." ] } ], diff --git a/notebooks/rag.ipynb b/notebooks/rag.ipynb index c4465802..52b28379 100644 --- a/notebooks/rag.ipynb +++ b/notebooks/rag.ipynb @@ -762,7 +762,7 @@ "source": [ "This is it! Now we have two pipelines: one for indexing the transcribed text and one for answering user's questions. We used the Whisper model for ASR, the text embedder for generating embeddings, and the LLM model for generating answers. We also used Qdrant as the datastore to store the chunks and their embeddings. 
We used `PromptBuilder` to generate a prompt and then used the LLM model to generate an answer based on the prompt.\n", "\n", - "Now you can package these pipelines into Aana Endpoints to create an Aana Application. See [tutorial](/docs/tutorial.md) for more details on how to create an Aana Application." + "Now you can package these pipelines into Aana Endpoints to create an Aana Application. See [tutorial](/docs/pages/tutorial.md) for more details on how to create an Aana Application." ] }, { From a2120f53739365f9bbb70eafed8de1c8a44c879f Mon Sep 17 00:00:00 2001 From: Aleksandr Movchan Date: Fri, 12 Jul 2024 09:27:48 +0000 Subject: [PATCH 5/7] Added link to Aana app template repo. --- README.md | 2 ++ docs/pages/tutorial.md | 2 ++ 2 files changed, 4 insertions(+) diff --git a/README.md b/README.md index 27f021e1..68dbd0a7 100644 --- a/README.md +++ b/README.md @@ -100,6 +100,8 @@ sh install.sh You can quickly develop multimodal applications using Aana SDK's intuitive APIs and components. +If you want to start building a new application, you can use the following GitHub template: [Aana App Template](https://github.com/mobiusml/aana_app_template). It will help you get started with the Aana SDK and provide you with a basic structure for your application and its dependencies. + Let's create a simple application that transcribes a video. The application will download a video from YouTube, extract the audio, and transcribe it using an ASR model. Aana SDK already provides a deployment for ASR (Automatic Speech Recognition) based on the Whisper model. We will use this [deployment](#Deployments) in the example. diff --git a/docs/pages/tutorial.md b/docs/pages/tutorial.md index a00bace7..185781be 100644 --- a/docs/pages/tutorial.md +++ b/docs/pages/tutorial.md @@ -4,6 +4,8 @@ Aana SDK is a powerful framework for building multimodal applications. It facili Aana SDK comes with a set of example applications that demonstrate the capabilities of the SDK. 
These applications can be used as a reference to build your own applications. See the [projects](/aana/projects/) directory for the example applications.

+If you want to start building a new application, you can use the following GitHub template: [Aana App Template](https://github.com/mobiusml/aana_app_template). It will help you get started with the Aana SDK and provide you with a basic structure for your application and its dependencies.
+
In this tutorial, we will walk you through the process of creating a new project with Aana SDK. By the end of this tutorial, you will have a runnable application that transcribes a video and summarizes the transcript using a Language Model (LLM).

We will use the video transcription application from the [README](/README.md) as a starting point and extend it to include the LLM model for summarization and new endpoints.

From c1485400e6dd53d8e28b8191f47b5783ed455340 Mon Sep 17 00:00:00 2001
From: Aleksandr Movchan
Date: Fri, 12 Jul 2024 10:16:03 +0000
Subject: [PATCH 6/7] Added cluster setup docs

---
 docs/README.md              |   1 +
 docs/pages/cluster_setup.md | 319 ++++++++++++++++++++++++++++++++++++
 2 files changed, 320 insertions(+)
 create mode 100644 docs/pages/cluster_setup.md

diff --git a/docs/README.md b/docs/README.md
index 88a92a57..c653dfef 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -24,6 +24,7 @@ Welcome to the documentation for Aana SDK.
 ### Deployment
 - [Docker](pages/docker.md): Instructions for using Docker with Aana SDK.
 - [Serve Config Files](pages/serve_config_files.md): Information about [Serve Config Files](https://docs.ray.io/en/latest/serve/production-guide/config.html#serve-config-files) for production deployment, how to build them, and deploy applications using them.
+- [Cluster Setup](pages/cluster_setup.md): Instructions for setting up a Ray cluster for deployment.
### Integrations
- [Integrations](pages/integrations.md): Overview of the available predefined deployments like Whisper, vLLM, Hugging Face Transformers, Haystack, etc.

diff --git a/docs/pages/cluster_setup.md b/docs/pages/cluster_setup.md
new file mode 100644
index 00000000..b0d626ea
--- /dev/null
+++ b/docs/pages/cluster_setup.md
@@ -0,0 +1,319 @@
+# Cluster Setup
+
+Based on the [documentation](https://docs.ray.io/en/latest/cluster/vms/user-guides/community/index.html#using-a-custom-cloud-or-cluster-manager), Ray supports the following cloud providers out of the box: AWS, Azure, GCP, Aliyun, vSphere, and KubeRay. We can also use Ray on other cloud providers, such as Oracle Cloud, by implementing the node provider interface ourselves, but that requires extra manual work.
+
+Another option is [Ray on Vertex AI](https://cloud.google.com/vertex-ai/docs/open-source/ray-on-vertex-ai/overview), a managed service that allows you to run Ray on Google Cloud. It lets you set up a Ray cluster without setting up a Kubernetes cluster manually.
+
+## Aana on Kubernetes
+
+**Step 1: Create a Kubernetes cluster**
+
+The first step is to create a Kubernetes cluster on the cloud provider of your choice. Ray has instructions on how to do this for AWS, Azure, and GCP in the [Managed Kubernetes services docs](https://docs.ray.io/en/latest/cluster/kubernetes/user-guides/k8s-cluster-setup.html).
+
+
+**Step 2: Deploy Ray on Kubernetes**
+
+Once you have a Kubernetes cluster, you need to install KubeRay on it. KubeRay is a Kubernetes operator that manages Ray clusters on Kubernetes. You can install KubeRay using Helm. Here is an example of how to install KubeRay on a Kubernetes cluster:
+
+```sh
+helm repo add kuberay https://ray-project.github.io/kuberay-helm/
+helm repo update
+
+# Install both CRDs and KubeRay operator v1.1.1.
+helm install kuberay-operator kuberay/kuberay-operator --version 1.1.1 + +# Confirm that the operator is running in the namespace `default`. +kubectl get pods +# NAME READY STATUS RESTARTS AGE +# kuberay-operator-7fbdbf8c89-pt8bk 1/1 Running 0 27s +``` + +KubeRay offers multiple options for operator installations, such as Helm, Kustomize, and a single-namespaced operator. For further information, please refer to [the installation instructions in the KubeRay documentation](https://ray-project.github.io/kuberay/deploy/installation/). + +**Step 3: Create a YAML file for your application** + +Next, you need to create a YAML file that describes your Ray application. See the example below to get an idea of what the YAML file should look like: + +```yaml +apiVersion: ray.io/v1 +kind: RayService +metadata: + name: +spec: + serviceUnhealthySecondThreshold: 900 # Config for the health check threshold for Ray Serve applications. Default value is 900. + deploymentUnhealthySecondThreshold: 900 # Config for the health check threshold for Ray dashboard agent. Default value is 900. + serveConfigV2: | + + + rayClusterConfig: + rayVersion: '2.20.0' # Should match the Ray version in the image of the containers + # Ray head pod template. + headGroupSpec: + # The `rayStartParams` are used to configure the `ray start` command. + # See https://github.com/ray-project/kuberay/blob/master/docs/guidance/rayStartParams.md for the default settings of `rayStartParams` in KubeRay. + # See https://docs.ray.io/en/latest/cluster/cli.html#ray-start for all available options in `rayStartParams`. 
+ rayStartParams: + dashboard-host: '0.0.0.0' + # Pod template + template: + spec: + containers: + - name: ray-head + image: + ports: + - containerPort: 6379 + name: gcs + - containerPort: 8265 + name: dashboard + - containerPort: 10001 + name: client + - containerPort: 8000 + name: serve + resources: + limits: + cpu: "3" # CPU limit for the head pod + memory: "28G" # Memory limit for the head pod + ephemeral-storage: "95Gi" # Ephemeral storage limit for the head pod + requests: + cpu: "3" # CPU request for the head pod + memory: "28G" # Memory request for the head pod + ephemeral-storage: "95Gi" # Ephemeral storage request for the head pod + workerGroupSpecs: + # The pod replicas in this group typed worker + - replicas: 1 # Number of worker nodes + minReplicas: 1 + maxReplicas: 10 + groupName: gpu-group + rayStartParams: {} + # Pod template + template: + spec: + containers: + - name: ray-worker + image: + resources: + limits: + cpu: "3" # CPU limit for the worker pod + memory: "28G" # Memory limit for the worker pod + ephemeral-storage: "95Gi" # Ephemeral storage limit for the worker pod + requests: + cpu: "3" # CPU request for the worker pod + memory: "28G" # Memory request for the worker pod + ephemeral-storage: "95Gi" # Ephemeral storage request for the worker pod + # Please add the following taints to the GPU node. + tolerations: + - key: "ray.io/node-type" + operator: "Equal" + value: "worker" + effect: "NoSchedule" +``` + +`serveConfigV2` can be generated by the `aana build` command. It contains the configuration for the Ray Serve applications. + +The full file will look like this: + +```yaml +apiVersion: ray.io/v1 +kind: RayService +metadata: + name: aana-sdk +spec: + serviceUnhealthySecondThreshold: 900 # Config for the health check threshold for Ray Serve applications. Default value is 900. + deploymentUnhealthySecondThreshold: 900 # Config for the health check threshold for Ray dashboard agent. Default value is 900. 
+ serveConfigV2: | + applications: + + - name: asr_deployment + + route_prefix: /asr_deployment + + import_path: test_project.app_config:asr_deployment + + runtime_env: + working_dir: "https://mobius-public.s3.eu-west-1.amazonaws.com/test_project.zip" + env_vars: + DB_CONFIG: '{"datastore_type": "sqlite", "datastore_config": {"path": "/tmp/aana_db.sqlite"}}' + + deployments: + + - name: WhisperDeployment + num_replicas: 1 + max_ongoing_requests: 1000 + user_config: + model_size: tiny + compute_type: float32 + ray_actor_options: + num_cpus: 1.0 + + - name: vad_deployment + + route_prefix: /vad_deployment + + import_path: test_project.app_config:vad_deployment + + runtime_env: + working_dir: "https://mobius-public.s3.eu-west-1.amazonaws.com/test_project.zip" + env_vars: + DB_CONFIG: '{"datastore_type": "sqlite", "datastore_config": {"path": "/tmp/aana_db.sqlite"}}' + + deployments: + + - name: VadDeployment + num_replicas: 1 + max_ongoing_requests: 1000 + user_config: + model: https://whisperx.s3.eu-west-2.amazonaws.com/model_weights/segmentation/0b5b3216d60a2d32fc086b47ea8c67589aaeb26b7e07fcbe620d6d0b83e209ea/pytorch_model.bin + onset: 0.5 + offset: 0.363 + min_duration_on: 0.1 + min_duration_off: 0.1 + sample_rate: 16000 + ray_actor_options: + num_cpus: 1.0 + + - name: whisper_app + + route_prefix: / + + import_path: test_project.app_config:whisper_app + + runtime_env: + working_dir: "https://mobius-public.s3.eu-west-1.amazonaws.com/test_project.zip" + env_vars: + DB_CONFIG: '{"datastore_type": "sqlite", "datastore_config": {"path": "/tmp/aana_db.sqlite"}}' + + deployments: + + - name: RequestHandler + num_replicas: 2 + ray_actor_options: + num_cpus: 0.1 + + + rayClusterConfig: + rayVersion: '2.20.0' # Should match the Ray version in the image of the containers + ######################headGroupSpecs################################# + # Ray head pod template. + headGroupSpec: + # The `rayStartParams` are used to configure the `ray start` command. 
+ # See https://github.com/ray-project/kuberay/blob/master/docs/guidance/rayStartParams.md for the default settings of `rayStartParams` in KubeRay. + # See https://docs.ray.io/en/latest/cluster/cli.html#ray-start for all available options in `rayStartParams`. + rayStartParams: + dashboard-host: '0.0.0.0' + # Pod template + template: + spec: + containers: + - name: ray-head + image: europe-docker.pkg.dev/customised-training-app/eu.gcr.io/aana/aana:0.2-ray-2.20@sha256:8814a3c12c6249a3c2bb216c0cba6eef01267d4c91bb58700f7ffc2311d21a3d + ports: + - containerPort: 6379 + name: gcs + - containerPort: 8265 + name: dashboard + - containerPort: 10001 + name: client + - containerPort: 8000 + name: serve + resources: + limits: + cpu: "3" + memory: "28G" + ephemeral-storage: "95Gi" + requests: + cpu: "3" + memory: "28G" + ephemeral-storage: "95Gi" + workerGroupSpecs: + # The pod replicas in this group typed worker + - replicas: 1 + minReplicas: 1 + maxReplicas: 10 + groupName: gpu-group + rayStartParams: {} + # Pod template + template: + spec: + containers: + - name: ray-worker + image: europe-docker.pkg.dev/customised-training-app/eu.gcr.io/aana/aana:0.2-ray-2.20@sha256:8814a3c12c6249a3c2bb216c0cba6eef01267d4c91bb58700f7ffc2311d21a3d + resources: + limits: + cpu: "3" + memory: "28G" + ephemeral-storage: "95Gi" + requests: + cpu: "3" + memory: "28G" + ephemeral-storage: "95Gi" + # Please add the following taints to the GPU node. + tolerations: + - key: "ray.io/node-type" + operator: "Equal" + value: "worker" + effect: "NoSchedule" +``` + +Let's take a look at a few critical sections of the YAML file: + +runtime_env: This section specifies the runtime environment for the application. It includes the working directory, environment variables, and potentially python packages that need to be installed. + +The working directory should be a URL pointing to a zip file containing the application code. 
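One way to produce such an archive is sketched below; the directory name and upload destination are placeholders, not taken from this repository (using `python3 -m zipfile` avoids depending on a separate `zip` utility):

```shell
# Create a zip archive of the application code for runtime_env.working_dir.
# "test_project" and the S3 destination below are placeholder assumptions.
mkdir -p test_project
printf 'app_config = "placeholder"\n' > test_project/app_config.py

# Recursively zip the directory into test_project.zip.
python3 -m zipfile -c test_project.zip test_project

# Upload the archive to any HTTPS-accessible location, for example:
#   aws s3 cp test_project.zip s3://your-bucket/test_project.zip
ls -l test_project.zip
```

The resulting URL of the uploaded archive is what goes into the `working_dir` field.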
It is possible to include the working directory directly in the Docker image, but this is not recommended because it makes it harder to update the application code. See the [Remote URIs docs](https://docs.ray.io/en/master/ray-core/handling-dependencies.html#remote-uris) for more information.

The environment variables are passed to the application as a dictionary. In this example, we are passing a configuration for a SQLite database.

You can also specify additional Python dependencies using keys like `py_modules`, `pip`, and `conda`. For more information, see the [docs about handling dependencies](https://docs.ray.io/en/master/ray-core/handling-dependencies.html#api-reference).

You can also change the deployment parameters if needed. You can specify the number of replicas for each deployment or even change the model parameters.

Another important section is the base image for the application. Usually, you can use a pre-built image from the [Ray project](https://hub.docker.com/r/rayproject/ray). However, Aana requires some additional dependencies to be installed, and it also makes sense to include Aana and all other Python dependencies in the image.

Here is an example of a Dockerfile that includes Aana and Ray:

```Dockerfile
FROM rayproject/ray:2.20.0.0ae93f-py310
RUN sudo apt-get update && sudo apt-get install -y libgl1 libglib2.0-0 ffmpeg
RUN pip install https://test-files.pythonhosted.org/packages/2e/e7/822893595c45f91acec902612c458fec9ed2684567dcd57bd3ba1770f2ed/aana-0.2.0-py3-none-any.whl
RUN pip install ray[serve]==2.20
```

Keep in mind that this image does not have GPU support. If you need GPU support, choose a different base image from the [Ray project](https://hub.docker.com/r/rayproject/ray).

Ideally, we should build a few base images for Aana so they can be used directly in the YAML file without any additional build and push steps.

In the example, we are using Artifact Registry from Google Cloud.
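Building the image from a Dockerfile like the one above and publishing it can be sketched as follows. The registry path is a hypothetical placeholder, and the `docker` commands are shown commented out because they require a running Docker daemon and registry credentials:

```shell
# Hypothetical image path (assumption) -- substitute your own registry, project, and tag.
IMAGE="europe-docker.pkg.dev/my-project/aana/aana:0.2-ray-2.20"
echo "Image to build and push: ${IMAGE}"

# Requires Docker and credentials for the target registry:
# docker build -t "${IMAGE}" .
# docker push "${IMAGE}"
```

Once pushed, this image path is what you reference in the `image:` fields of the RayService YAML.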
You can use any registry that supports Docker images, such as Docker Hub or GitHub Container Registry.

Another thing that needs adjustment is the resource limits and requests. You can adjust them based on your application requirements, but keep in mind that the ephemeral storage needs to be set to a reasonably high value; otherwise, the application will not deploy.

**Step 4: Deploy the application**

After creating the YAML file, you can deploy the application to the Kubernetes cluster using the following command:

```sh
kubectl apply -f <config-file>.yaml
```

This will create the necessary resources in the Kubernetes cluster to run your Ray application.

You can also use the same command to update the application if you make changes to the YAML file. For example, if you want to scale the number of replicas for an ASR deployment, you can set `num_replicas: 2` in the WhisperDeployment section and then run `kubectl apply -f <config-file>.yaml` again, and Kubernetes will start another replica of the ASR deployment.

**Step 5: Monitor the application**

To access the Ray dashboard locally, you can use port forwarding:

```sh
kubectl port-forward service/aana-sdk-head-svc 8265:8265 8000:8000
```

This will forward ports 8265 and 8000 from the Ray head pod to your local machine. You can then access the Ray dashboard by opening a browser and going to `http://localhost:8265`. The application will be available at `http://localhost:8000`, with API documentation at `http://localhost:8000/docs` and `http://localhost:8000/redoc`.


## Things to Consider

### Shared storage

The application stores some files on the local disk that will not be accessible from other nodes in the cluster. This can be a problem if the application is deployed on a multi-node cluster. The solution is to use shared storage such as NFS.
This is a recommendation from the [Ray documentation](https://docs.ray.io/en/master/cluster/kubernetes/user-guides/storage.html#best-practices-for-storage-and-dependencies). GKE has [Filestore](https://cloud.google.com/filestore/docs/multishares), which can be used as shared storage.

### Database

By default, Aana SDK uses SQLite as its database. For cluster deployments, it's recommended to use a more robust database like PostgreSQL. You can use a managed database service like Cloud SQL on GCP or RDS on AWS.

From 1a7e90fa483cf5e00d16ea3d4e1a363c0db62129 Mon Sep 17 00:00:00 2001
From: Aleksandr Movchan
Date: Fri, 12 Jul 2024 10:26:01 +0000
Subject: [PATCH 7/7] Ray dashboard intro

---
 README.md              | 2 ++
 docs/pages/tutorial.md | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/README.md b/README.md
index 68dbd0a7..fda3e97b 100644
--- a/README.md
+++ b/README.md
@@ -181,6 +181,8 @@ You have a few options to run the application:

Once the application is running, you will see the message `Deployed successfully.` in the logs. You can now send a request to the application to transcribe a video.

+To get an overview of the Ray cluster, you can use the Ray Dashboard. The Ray Dashboard is available at `http://127.0.0.1:8265` by default. You can see the status of the Ray cluster, the resources used, running applications and deployments, logs, and more. It is a useful tool for monitoring and debugging your applications. See the [Ray Dashboard documentation](https://docs.ray.io/en/latest/ray-observability/getting-started.html) for more information.
+
Let's transcribe [Gordon Ramsay's perfect scrambled eggs tutorial](https://www.youtube.com/watch?v=VhJFyyukAzA) using the application.
```bash diff --git a/docs/pages/tutorial.md b/docs/pages/tutorial.md index 185781be..33ae2181 100644 --- a/docs/pages/tutorial.md +++ b/docs/pages/tutorial.md @@ -109,6 +109,8 @@ You have a few options to run the application: Once the application is running, you will see the message `Deployed successfully.` in the logs. You can now send a request to the application to transcribe a video. +To get an overview of the Ray cluster, you can use the Ray Dashboard. The Ray Dashboard is available at `http://127.0.0.1:8265` by default. You can see the status of the Ray cluster, the resources used, running applications and deployments, logs, and more. It is a useful tool for monitoring and debugging your applications. See [Ray Dashboard documentation](https://docs.ray.io/en/latest/ray-observability/getting-started.html) for more information. + Let's transcribe [Gordon Ramsay's perfect scrambled eggs tutorial](https://www.youtube.com/watch?v=VhJFyyukAzA) using the application. ```bash