Commit: Merge branch 'main' into main

vincelwt authored Mar 21, 2024
2 parents 1cbfd31 + bcd6203 · commit 29e8c14

Showing 68 changed files with 2,239 additions and 765 deletions.
2 changes: 2 additions & 0 deletions .circleci/config.yml
@@ -46,6 +46,7 @@ jobs:
pip install "apscheduler==3.10.4"
pip install "PyGithub==1.59.1"
pip install argon2-cffi
pip install "pytest-mock==3.12.0"
pip install python-multipart
- save_cache:
paths:
@@ -148,6 +149,7 @@ jobs:
python -m pip install --upgrade pip
python -m pip install -r .circleci/requirements.txt
pip install "pytest==7.3.1"
pip install "pytest-mock==3.12.0"
pip install "pytest-asyncio==0.21.1"
pip install mypy
pip install "google-generativeai>=0.3.2"
5 changes: 5 additions & 0 deletions Dockerfile
@@ -38,6 +38,11 @@ RUN pip wheel --no-cache-dir --wheel-dir=/wheels/ -r requirements.txt
# install semantic-cache [Experimental]- we need this here and not in requirements.txt because redisvl pins to pydantic 1.0
RUN pip install redisvl==0.0.7 --no-deps

# ensure pyjwt is used, not jwt
RUN pip uninstall jwt -y
RUN pip uninstall PyJWT -y
RUN pip install PyJWT --no-cache-dir

# Build Admin UI
RUN chmod +x build_admin_ui.sh && ./build_admin_ui.sh
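The uninstall/reinstall above guards against the unrelated `jwt` package, which shares the same import name as PyJWT. A quick sanity check (an illustrative sketch, not part of the Dockerfile) that the reinstalled PyJWT is the module actually being imported:

```python
# With PyJWT installed, the "jwt" module exposes module-level encode/decode
import jwt

token = jwt.encode({"sub": "litellm"}, "secret", algorithm="HS256")
claims = jwt.decode(token, "secret", algorithms=["HS256"])
print(claims)  # {'sub': 'litellm'} when PyJWT is the installed package
```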

5 changes: 5 additions & 0 deletions Dockerfile.database
@@ -53,6 +53,11 @@ RUN pip install *.whl /wheels/* --no-index --find-links=/wheels/ && rm -f *.whl
# install semantic-cache [Experimental]- we need this here and not in requirements.txt because redisvl pins to pydantic 1.0
RUN pip install redisvl==0.0.7 --no-deps

# ensure pyjwt is used, not jwt
RUN pip uninstall jwt -y
RUN pip uninstall PyJWT -y
RUN pip install PyJWT --no-cache-dir

# Build Admin UI
RUN chmod +x build_admin_ui.sh && ./build_admin_ui.sh

75 changes: 74 additions & 1 deletion docs/my-website/docs/providers/openai_compatible.md
@@ -1,3 +1,6 @@
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# OpenAI-Compatible Endpoints

To call models hosted behind an openai proxy, make 2 changes:
@@ -39,4 +42,74 @@ response = litellm.embedding(
input=["good morning from litellm"]
)
print(response)
```



## Usage with LiteLLM Proxy Server

Here's how to call an OpenAI-Compatible Endpoint with the LiteLLM Proxy Server

1. Modify the config.yaml

```yaml
model_list:
  - model_name: my-model
    litellm_params:
      model: openai/<your-model-name>  # add openai/ prefix to route as OpenAI provider
      api_base: <model-api-base>       # add api base for OpenAI compatible provider
      api_key: api-key                 # api key for your OpenAI-compatible endpoint
```
2. Start the proxy
```bash
$ litellm --config /path/to/config.yaml
```

3. Send Request to LiteLLM Proxy Server

<Tabs>

<TabItem value="openai" label="OpenAI Python v1.0.0+">

```python
import openai
client = openai.OpenAI(
    api_key="sk-1234",              # pass litellm proxy key, if you're using virtual keys
    base_url="http://0.0.0.0:4000"  # litellm-proxy-base url
)
response = client.chat.completions.create(
    model="my-model",
    messages=[
        {
            "role": "user",
            "content": "what llm are you"
        }
    ],
)
print(response)
```
</TabItem>

<TabItem value="curl" label="curl">

```shell
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{
    "model": "my-model",
    "messages": [
        {
            "role": "user",
            "content": "what llm are you"
        }
    ]
}'
```
</TabItem>

</Tabs>
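The same endpoint can also be called directly through the litellm Python SDK, without the proxy (a minimal sketch; the model name, `api_base`, and key below are placeholders):

```python
import litellm

response = litellm.completion(
    model="openai/my-model",                        # "openai/" prefix routes to the OpenAI-compatible provider
    api_base="https://my-endpoint.example.com/v1",  # placeholder api base for your endpoint
    api_key="my-api-key",                           # placeholder key
    messages=[{"role": "user", "content": "what llm are you"}],
)
print(response)
```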
119 changes: 83 additions & 36 deletions docs/my-website/docs/providers/vertex.md
@@ -23,58 +23,105 @@ litellm.vertex_location = "us-central1" # proj location
response = litellm.completion(model="gemini-pro", messages=[{"role": "user", "content": "write code for saying hi from LiteLLM"}])
```

## Usage with LiteLLM Proxy Server

Here's how to use Vertex AI with the LiteLLM Proxy Server

1. Modify the config.yaml

<Tabs>

<TabItem value="completion_param" label="Different location per model">

Use this when you need to set a different location for each vertex model

```yaml
model_list:
  - model_name: gemini-vision
    litellm_params:
      model: vertex_ai/gemini-1.0-pro-vision-001
      vertex_project: "project-id"
      vertex_location: "us-central1"
  - model_name: gemini-vision
    litellm_params:
      model: vertex_ai/gemini-1.0-pro-vision-001
      vertex_project: "project-id2"
      vertex_location: "us-east"
```
</TabItem>

<TabItem value="litellm_param" label="One location for all vertex models">

Use this when you have one vertex location for all models

```yaml
litellm_settings:
  vertex_project: "hardy-device-38811" # Your Project ID
  vertex_location: "us-central1" # proj location

model_list:
  - model_name: team1-gemini-pro
    litellm_params:
      model: gemini-pro
```
</TabItem>

</Tabs>

2. Start the proxy

```bash
$ litellm --config /path/to/config.yaml
```

3. Send Request to LiteLLM Proxy Server

<Tabs>

<TabItem value="openai" label="OpenAI Python v1.0.0+">

```python
import openai
client = openai.OpenAI(
    api_key="sk-1234",              # pass litellm proxy key, if you're using virtual keys
    base_url="http://0.0.0.0:4000"  # litellm-proxy-base url
)
response = client.chat.completions.create(
    model="team1-gemini-pro",
    messages=[
        {
            "role": "user",
            "content": "what llm are you"
        }
    ],
)
print(response)
```
</TabItem>

<TabItem value="curl" label="curl">

```shell
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{
    "model": "team1-gemini-pro",
    "messages": [
        {
            "role": "user",
            "content": "what llm are you"
        }
    ]
}'
```
</TabItem>

</Tabs>

## Set Vertex Project & Vertex Location
All calls using Vertex AI require the following parameters:
29 changes: 29 additions & 0 deletions docs/my-website/docs/proxy/caching.md
@@ -201,6 +201,35 @@ curl --location 'http://0.0.0.0:4000/embeddings' \
</TabItem>
</Tabs>
## Debugging Caching - `/cache/ping`
LiteLLM Proxy exposes a `/cache/ping` endpoint to verify that the cache is working as expected.
**Usage**
```shell
curl --location 'http://0.0.0.0:4000/cache/ping' -H "Authorization: Bearer sk-1234"
```
**Expected Response - when the cache is healthy**
```shell
{
    "status": "healthy",
    "cache_type": "redis",
    "ping_response": true,
    "set_cache_response": "success",
    "litellm_cache_params": {
        "supported_call_types": "['completion', 'acompletion', 'embedding', 'aembedding', 'atranscription', 'transcription']",
        "type": "redis",
        "namespace": "None"
    },
    "redis_cache_params": {
        "redis_client": "Redis<ConnectionPool<Connection<host=redis-16337.c322.us-east-1-2.ec2.cloud.redislabs.com,port=16337,db=0>>>",
        "redis_kwargs": "{'url': 'redis://:******@redis-16337.c322.us-east-1-2.ec2.cloud.redislabs.com:16337'}",
        "async_redis_conn_pool": "BlockingConnectionPool<Connection<host=redis-16337.c322.us-east-1-2.ec2.cloud.redislabs.com,port=16337,db=0>>",
        "redis_version": "7.2.0"
    }
}
```
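The same health check can be scripted; a minimal sketch using `requests`, with the host and key being the placeholder values from the curl example above:

```python
import requests

# ping the proxy's cache endpoint and fail loudly on a non-2xx response
resp = requests.get(
    "http://0.0.0.0:4000/cache/ping",
    headers={"Authorization": "Bearer sk-1234"},
)
resp.raise_for_status()
print(resp.json()["status"])  # "healthy" when the cache is reachable
```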
## Advanced
### Set Cache Params on config.yaml
```yaml
26 changes: 7 additions & 19 deletions docs/my-website/docs/proxy/configs.md
@@ -246,6 +246,10 @@ $ litellm --config /path/to/config.yaml

## Load Balancing

:::info
For more on this, go to [this page](./load_balancing.md)
:::

Use this to call multiple instances of the same model and configure things like [routing strategy](../routing.md#advanced).

For optimal performance:
@@ -306,25 +310,6 @@ router_settings: # router_settings are optional
redis_port: 1992
```
## Set Azure `base_model` for cost tracking

**Problem**: Azure returns `gpt-4` in the response when `azure/gpt-4-1106-preview` is used. This leads to inaccurate cost tracking.

**Solution** ✅: Set `base_model` on your config so litellm uses the correct model for calculating Azure costs.

Example config with `base_model`
```yaml
model_list:
  - model_name: azure-gpt-3.5
    litellm_params:
      model: azure/chatgpt-v-2
      api_base: os.environ/AZURE_API_BASE
      api_key: os.environ/AZURE_API_KEY
      api_version: "2023-07-01-preview"
    model_info:
      base_model: azure/gpt-4-1106-preview
```

You can view your cost once you set up [Virtual keys](https://docs.litellm.ai/docs/proxy/virtual_keys) or [custom_callbacks](https://docs.litellm.ai/docs/proxy/logging)
## Load API Keys
@@ -605,6 +590,9 @@ general_settings:
"litellm_settings": {}, # ALL (https://github.com/BerriAI/litellm/blob/main/litellm/__init__.py)
"general_settings": {
"completion_model": "string",
"disable_spend_logs": "boolean", # turn off writing each transaction to the db
"disable_reset_budget": "boolean", # turn off reset budget scheduled task
"enable_jwt_auth": "boolean", # allow proxy admin to auth in via jwt tokens with 'litellm_proxy_admin' in claims
"key_management_system": "google_kms", # either google_kms or azure_kms
"master_key": "string",
"database_url": "string",
21 changes: 21 additions & 0 deletions docs/my-website/docs/proxy/cost_tracking.md
@@ -15,4 +15,25 @@ model_list:
base_model: dall-e-3 # 👈 set dall-e-3 as base model
model_info:
mode: image_generation
```
## Chat Completions / Embeddings
**Problem**: Azure returns `gpt-4` in the response when `azure/gpt-4-1106-preview` is used. This leads to inaccurate cost tracking.

**Solution** ✅: Set `base_model` on your config so litellm uses the correct model for calculating Azure costs.

Get the base model name from [here](https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json)

Example config with `base_model`
```yaml
model_list:
  - model_name: azure-gpt-3.5
    litellm_params:
      model: azure/chatgpt-v-2
      api_base: os.environ/AZURE_API_BASE
      api_key: os.environ/AZURE_API_KEY
      api_version: "2023-07-01-preview"
    model_info:
      base_model: azure/gpt-4-1106-preview
```
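For reference, litellm's SDK exposes a cost helper backed by the same price map linked above; a minimal sketch that prices a hypothetical prompt/completion pair against the base model used for cost tracking:

```python
from litellm import completion_cost

# hypothetical strings; in production the proxy computes this from the actual usage
cost = completion_cost(
    model="azure/gpt-4-1106-preview",
    prompt="what llm are you",
    completion="I'm gpt-4-1106-preview",
)
print(f"estimated cost: ${cost:.6f}")
```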
