
01. Getting Started

Brian edited this page Dec 5, 2024 · 5 revisions

Prerequisites

To get started, make sure you have the following installed on your system:

  • Python 3.x (preferably 3.11) with pip

    • Do NOT install Python from the Microsoft Store! This will cause issues with pip.
    • Alternatively, you can use Miniconda if it's present on your system.

Note

You can install Miniconda3 on your system, which gives you the benefit of having both Python and conda!
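A quick way to confirm which interpreter and pip you'll be using (assuming `python3` is on your PATH; on Windows it may be `python` or `py`):

```shell
# Print the interpreter version and the pip tied to that interpreter
python3 --version
python3 -m pip --version
```

Calling pip as `python3 -m pip` ensures it belongs to the interpreter you just checked, rather than some other installation on your PATH.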

Warning

CUDA and ROCm aren't prerequisites because torch can install them for you. However, if this doesn't work (e.g. DLL load failed), install the CUDA toolkit or ROCm on your system.

Warning

Sometimes Windows reports an error that VS Build Tools needs to be installed. This means a package doesn't ship a prebuilt wheel for your Python version. You can install VS Build Tools 17.8 and build the wheel locally. In addition, please open an issue stating which dependency is building a wheel.

Installing

For Beginners

  1. Clone this repository to your machine: git clone https://github.com/theroyallab/tabbyAPI

  2. Navigate to the project directory: cd tabbyAPI

  3. Run the appropriate start script (start.bat for Windows and start.sh for Linux).

    1. Follow the on-screen instructions and select the correct GPU library.
    2. Assuming that the prerequisites are installed and can be located, a virtual environment will be created for you and dependencies will be installed.
  4. The API should start with no model loaded.
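Put together, the beginner steps above look like this in one shell session (Linux shown; on Windows run start.bat instead of start.sh):

```shell
git clone https://github.com/theroyallab/tabbyAPI
cd tabbyAPI
./start.sh   # prompts for a GPU library, creates a venv, and installs dependencies
```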

For Advanced Users

Note

TabbyAPI has recently switched to pyproject.toml. These instructions may look different than before.

  1. Follow steps 1-2 in the For Beginners section
  2. Create a python environment through venv:
    1. python -m venv venv
    2. Activate the venv
      1. On Windows: .\venv\Scripts\activate
      2. On Linux: source venv/bin/activate
  3. Install the pyproject features based on your system:
    1. CUDA 12.x: pip install -U .[cu121]
    2. ROCm 5.6: pip install -U .[amd]
  4. Start the API in one of two ways:
    1. Run start.bat/start.sh. The script will check if you're in a conda environment and skip the venv checks.
    2. Run python main.py. This won't automatically upgrade your dependencies.
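The venv steps can be verified with a short session; once the environment is active, Python's `sys.prefix` points inside it:

```shell
# Create and activate a virtual environment, then confirm it's active
python3 -m venv venv
. venv/bin/activate                         # Windows: .\venv\Scripts\activate
python -c "import sys; print(sys.prefix)"   # should end in /venv
```

With the venv active, `pip install -U .[cu121]` (or `.[amd]`) installs TabbyAPI's dependencies into it rather than into your system Python.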

Configuration

Running only the API may not be your optimal use case. Therefore, a config.yml exists to tune initial launch parameters and other configuration options.

A config.yml file is required for overriding project defaults. If you are okay with the defaults, you don't need a config file!

If you do want a config file, copy over config_sample.yml to config.yml. All the fields are commented, so make sure to read the descriptions and comment out or remove fields that you don't need.

In addition, if you want to set the API keys manually, copy over api_keys_sample.yml to api_keys.yml and fill in the fields. However, doing this is less secure, and autogenerated keys should be used instead.
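Copying the sample files is done from the repository root; the guards below simply make the commands safe to re-run (file names as given above):

```shell
# Create config.yml / api_keys.yml from the shipped samples, if present
if [ -f config_sample.yml ]; then cp config_sample.yml config.yml; fi
if [ -f api_keys_sample.yml ]; then cp api_keys_sample.yml api_keys.yml; fi
```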

You can also access the configuration parameters under 2. Configuration in this wiki!

Where next?

  1. Take a look at the usage docs
  2. Get started with community projects: Find loaders, UIs, and more created by the wider AI community. Any OAI compatible client is also supported.

Updating

There are a couple ways to update TabbyAPI:

  1. Update scripts - Inside the update_scripts folder, you can run the following scripts:
    1. update_deps: Updates dependencies to their latest versions.
    2. update_deps_and_pull: Updates dependencies and pulls the latest commit of the Github repository.

These scripts exit after running their respective tasks. To start TabbyAPI, run start.bat or start.sh.

  2. Manual - Install the pyproject features and update dependencies depending on your GPU:
    1. CUDA 12.x: pip install -U .[cu121]
    2. ROCm 6.0: pip install -U .[amd]

If you don't want to update dependencies that come from wheels (torch, exllamav2, and flash attention 2), use pip install . or pass the --nowheel flag when invoking the start scripts.
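A manual update from inside an activated environment might look like this (feature names as listed above; note that `--nowheel` is a start-script flag, not a pip one):

```shell
cd tabbyAPI
git pull                  # grab the latest commit
pip install -U .[cu121]   # or .[amd]; use plain `pip install .` to skip wheel-based deps
```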

Update Exllamav2

Warning

These instructions are meant for advanced users.

Important

If you're installing a custom Exllamav2 wheel, make sure to use pip install . when updating! Otherwise, each update will overwrite your custom exllamav2 version.

Note

  • TabbyAPI enforces the latest Exllamav2 version for compatibility purposes.
  • Any upgrades using a pyproject GPU library feature will overwrite your installed wheel.
    • To fix this, change the feature in pyproject.toml locally, create an issue or PR, or install your version of exllamav2 after upgrades.

Here are ways to install exllamav2:

  1. From a wheel/release (Recommended)
    1. Find the version that corresponds with your CUDA and Python version. For example, a wheel with cu121 and cp311 corresponds to CUDA 12.1 and Python 3.11.
  2. From pip: pip install exllamav2
    1. This is a JIT-compiled extension, which means the initial launch of TabbyAPI will take some time. The build may also fail due to improper environment configuration.
  3. From source
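When matching a wheel to your setup, the Python tag (the `cp311`-style part of the filename) can be derived from the running interpreter:

```shell
# Print the CPython tag to look for in a wheel filename, e.g. cp311
python3 -c "import sys; print(f'cp{sys.version_info.major}{sys.version_info.minor}')"
```

The CUDA part of the tag (e.g. `cu121`) should match your installed CUDA version; `nvcc --version` reports it if the toolkit is installed.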

Other installation methods

These are short-form instructions for other methods that users can use to install TabbyAPI.

Warning

Using methods other than venv may not play nice with startup scripts. Using these methods indicates that you're an advanced user and know what you're doing.

Conda

  1. Install Miniconda3 with Python 3.11 as your base Python
  2. Create a new conda environment conda create -n tabbyAPI python=3.11
  3. Activate the conda environment conda activate tabbyAPI
  4. Install optional dependencies if they aren't present
    1. CUDA via
      1. CUDA 12 - conda install -c "nvidia/label/cuda-12.2.2" cuda
    2. Git via conda install -k git
  5. Clone TabbyAPI via git clone https://github.com/theroyallab/tabbyAPI
  6. Continue installation steps from:
    1. For Beginners - Step 3. The start scripts detect if you're in a conda environment and skip the venv check.
    2. For Advanced Users - Step 3
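Collected into one session, the conda route looks roughly like this (channels and package names as given above; the optional installs only matter if CUDA or git are missing):

```shell
conda create -n tabbyAPI python=3.11
conda activate tabbyAPI
conda install -c "nvidia/label/cuda-12.2.2" cuda   # optional: CUDA 12
conda install -k git                               # optional: git
git clone https://github.com/theroyallab/tabbyAPI
cd tabbyAPI && ./start.sh                          # or continue with the venv-free pip install
```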

Docker

  1. Install Docker and docker compose from the [docs](https://docs.docker.com/compose/install/)
  2. Install the Nvidia container compatibility layer
    1. For Linux: Nvidia container toolkit
    2. For Windows: Cuda Toolkit on WSL
  3. Clone TabbyAPI via git clone https://github.com/theroyallab/tabbyAPI
  4. Enter the tabbyAPI directory by cd tabbyAPI.
    1. Optional: Set up a config.yml or api_tokens.yml (configuration)
  5. Update the volume mount section in the docker/docker-compose.yml file
volumes:
  # - /path/to/models:/app/models                       # Change me
  # - /path/to/config.yml:/app/config.yml               # Change me
  # - /path/to/api_tokens.yml:/app/api_tokens.yml       # Change me
  6. Optional: If you'd like to build the Dockerfile from source, follow the instructions below in docker/docker-compose.yml:
    # Uncomment this to build a docker image from source
    #build:
    #  context: ..
    #  dockerfile: ./docker/Dockerfile

    # Comment this to build a docker image from source
    image: ghcr.io/theroyallab/tabbyapi:latest
  7. Run docker compose -f docker/docker-compose.yml up to pull (or build) the image and start the server.
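End to end, the Docker route is just a few commands once the volume mounts have been edited as in step 5:

```shell
git clone https://github.com/theroyallab/tabbyAPI
cd tabbyAPI
# Edit docker/docker-compose.yml volume mounts first, then:
docker compose -f docker/docker-compose.yml up
```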