
feat(genapi): getting ready for public beta #3844

Closed
wants to merge 68 commits into from
Changes from all commits
68 commits
fd154c8
feat(genapi): getting ready for public beta
tgenaitay Oct 16, 2024
c1263e8
Apply suggestions from code review
bene2k1 Oct 16, 2024
99a7f10
feat(ai): reduced due to sliding attention window
tgenaitay Oct 30, 2024
a701da2
fix(aps): update quotas (#3847)
bene2k1 Oct 16, 2024
2bf08d6
chore(dbaas): review docs 2024-10-16 (#3842)
bene2k1 Oct 17, 2024
48c7c98
doc(tem): indicates encryption is mandatory (#3840)
cbouvat Oct 17, 2024
b4285db
fix(ddx): fix migration macro (#3843)
bene2k1 Oct 17, 2024
1ee06d5
feat(tem): update quota information (#3848)
bene2k1 Oct 17, 2024
19c3a8b
chore(gen): review documentation MTA-5152 (#3837)
SamyOubouaziz Oct 17, 2024
e98228d
feat(k8s): add information about routed ip migration completed (#3849)
bene2k1 Oct 17, 2024
c031a64
feat(ins): update create instance (#3850)
bene2k1 Oct 17, 2024
f8af361
fix(tem): fix missing closing tag (#3851)
bene2k1 Oct 17, 2024
9189c88
chore(gen): review iam and labs 2024-10-16 (#3845)
bene2k1 Oct 17, 2024
f7ee3a3
feat(changelog): transactional-email-changed-increase-in-minimum-quot…
ofranc Oct 18, 2024
caa3b33
fix(tem): fix wrong validation date (#3852)
bene2k1 Oct 18, 2024
eaa21f7
fix: clarify partition labels (#3855)
pierreozoux Oct 18, 2024
b03b49c
fix(ins): fix create doc for windows and typos (#3854)
bene2k1 Oct 21, 2024
9ccb191
feat: add warning about limitations of kosmos upgrade (#3857)
pierreozoux Oct 21, 2024
8bc10ce
chore(tuto): review tutorials (#3860)
bene2k1 Oct 21, 2024
0f6f8eb
chore(ddx): review docs compute (#3859)
bene2k1 Oct 21, 2024
c0cb1ba
docs(serverless): add entries in serverless FAQ (#3858)
Bemilie Oct 21, 2024
fafe37f
fix(pgw): add more troubleshooting (#3862)
RoRoJ Oct 22, 2024
21a256b
fix(vpc): review doc (#3861)
RoRoJ Oct 22, 2024
00d36e7
fix(ddx): update ip addresses of ntp servers (#3863)
bene2k1 Oct 22, 2024
e43cbce
docs(SRV): add note on external container registries MTA-5165 (#3866)
SamyOubouaziz Oct 23, 2024
85220f1
feat(changelog): elastic-metal-added-disk-partitioning-configuration-…
ofranc Oct 23, 2024
d497214
docs(cpt): add gpu instanceXCockpit tuto (#3865)
nerda-codes Oct 23, 2024
9fbab81
fix(inference): changed to full names everywhere (#3871)
tgenaitay Oct 23, 2024
37a7390
fix(bil): added guideflow (#3864)
jsudraud Oct 23, 2024
df41c7b
fix(edge): fix duplicate anchors (#3869)
RoRoJ Oct 23, 2024
eb7f7ef
fix(pgw): note recommended solution (#3867)
RoRoJ Oct 23, 2024
fa7ed16
feat(wbh): add low level doc (#3783)
bene2k1 Oct 23, 2024
f1e7375
chore(GEN): documentation review MTA-5176 (#3870)
SamyOubouaziz Oct 23, 2024
4dae4ed
feat(wbh): add tip to howto pages (#3876)
bene2k1 Oct 24, 2024
75f3e60
docs(cpt): mention grafana agent deprecation (#3872)
nerda-codes Oct 24, 2024
39e9b7b
feat(IAM): Add undocumented permission sets (#3878)
crlptl Oct 24, 2024
60f5e99
fix(rdb): reviews 21/10 (#3879)
ldecarvalho-doc Oct 24, 2024
be0a7b6
feat(aps): add troubleshooting apple id (#3877)
bene2k1 Oct 24, 2024
a82ffe1
fix(wbh): fix meta information (#3881)
bene2k1 Oct 24, 2024
b1834d2
feat(wbh): add php version overview (#3883)
bene2k1 Oct 24, 2024
e7dcafa
feat(k8s): add modifying kernel documentation (#3882)
bene2k1 Oct 24, 2024
71c1796
docs(cpt): add tags and category (#3880)
nerda-codes Oct 24, 2024
1bcafaf
fix(edge): correct error troubleshooting (#3884)
RoRoJ Oct 24, 2024
4a6171e
Update index.mdx (#3885)
fpagny Oct 25, 2024
1a885bc
fix(wbh): fix date from future (#3887)
bene2k1 Oct 25, 2024
4cdfb31
Update index.mdx (#3888)
fpagny Oct 25, 2024
f6f3c66
feat(ai): bring support for function calling via tools and tools_choi…
tgenaitay Oct 25, 2024
ad89918
feat(changelog): add new entry (#3892)
ofranc Oct 25, 2024
2b808c2
fix(docs): update ipfs-pinning quickstart (#3894)
christian-vdz Oct 28, 2024
5407ee4
docs(srv): add notes and troubleshooting on encoding secrets and env …
SamyOubouaziz Oct 28, 2024
7f44991
fix(pgw): remove duplicate H2 (#3897)
RoRoJ Oct 28, 2024
cccdb1c
fix(ai): add missing link (#3899)
bene2k1 Oct 28, 2024
ad62179
fix(serverless): faq title level (#3901)
thomas-tacquet Oct 29, 2024
148b1bf
docs(srv): add SEM x Jobs documentation MTA-5199 (#3898)
SamyOubouaziz Oct 29, 2024
b9bd069
feat(serverless): fix somesyntax highlighting (#3895)
thomas-tacquet Oct 29, 2024
38468c2
feat(serverless): advanced cold starts doc (#3891)
thomas-tacquet Oct 29, 2024
1e81d59
docs(S3): update mentions of S3 in documentation MTA-5188 (#3874)
SamyOubouaziz Oct 29, 2024
8196ac6
feat(instances): understanding qga (#3896)
bene2k1 Oct 29, 2024
7c996be
chore(tutorials): review (#3902)
bene2k1 Oct 29, 2024
a59253a
chore(domains): weekly review (#3904)
nerda-codes Oct 29, 2024
be58816
fix(ai): fix double link in navigation (#3906)
bene2k1 Oct 29, 2024
8a150a5
fix(s3): fix rebase error (#3905)
SamyOubouaziz Oct 29, 2024
6ef6cff
Update use-structured-outputs.mdx (#3908)
fpagny Oct 29, 2024
a054c23
fix(ai): missing export (#3910)
tgenaitay Oct 29, 2024
18d1af1
fix(datalab): wording overview page (#3911)
jsudraud Oct 29, 2024
f8644a9
fix(k8s): remove doc temprary (#3912)
bene2k1 Oct 29, 2024
8b839ba
fix(ins): add posted date (#3909)
bene2k1 Oct 30, 2024
d7e9af3
feat(genapi): introducing vision models
tgenaitay Oct 30, 2024
2 changes: 2 additions & 0 deletions ai-data/generative-apis/api-cli/understanding-errors.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,8 @@ Below are usual HTTP error codes:
- 404 - **Route Not Found**: The requested resource could not be found. Check your request is being made to the correct endpoint.
- 422 - **Model Not Found**: The `model` key is present in the request payload, but the corresponding model is not found.
- 422 - **Missing Model**: The `model` key is missing from the request payload.
- 429 - **Too Many Requests**: You are exceeding your current quota for the requested model, calculated in requests per minute. Find rate limits on [this page](/ai-data/generative-apis/reference-content/rate-limits/).
- 429 - **Too Many Tokens**: You are exceeding your current quota for the requested model, calculated in tokens per minute. Find rate limits on [this page](/ai-data/generative-apis/reference-content/rate-limits/).
- 500 - **API error**: An unexpected internal error has occurred within Scaleway's systems. If the issue persists, please [open a support ticket](https://console.scaleway.com/support/tickets/create).

For streaming responses via SSE, 5xx errors may occur after a 200 response has been returned.
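Both 429 codes above are transient and can be handled with a retry-and-backoff wrapper around the API call. A minimal sketch, assuming any callable whose raised exception exposes a `status_code` attribute (the helper name and shape are illustrative, not part of Scaleway's SDK):

```python
import time

def with_backoff(call, max_retries=5, base_delay=1.0):
    """Retry `call` on HTTP 429, doubling the delay after each failed attempt."""
    for attempt in range(max_retries):
        try:
            return call()
        except Exception as exc:
            # Re-raise anything that is not a rate-limit error, or the last attempt.
            if getattr(exc, "status_code", None) != 429 or attempt == max_retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```

You would wrap a chat-completion request, for example `with_backoff(lambda: client.chat.completions.create(...))`.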
14 changes: 8 additions & 6 deletions ai-data/generative-apis/api-cli/using-chat-api.mdx
Expand Up @@ -68,23 +68,25 @@ Our chat API is OpenAI compatible. Use OpenAI’s [API reference](https://platfo
- max_tokens
- stream
- presence_penalty
- response_format
- [response_format](/ai-data/generative-apis/how-to/use-structured-outputs)
- logprobs
- stop
- seed
- [tools](/ai-data/generative-apis/how-to/use-function-calling)
- [tool_choice](/ai-data/generative-apis/how-to/use-function-calling)

### Unsupported parameters

- frequency_penalty
- n
- top_logprobs
- tools
- tool_choice
- logit_bias
- user

If you have a use case requiring one of these unsupported parameters, please [contact us via Slack](https://slack.scaleway.com/) in the #ai channel.

<Message type="note">
Go further with [Python code examples](/ai-data/generative-apis/how-to/query-text-models/#querying-text-models-via-api) to query text models using Scaleway's Chat API.
</Message>
## Going further

1. [Python code examples](/ai-data/generative-apis/how-to/query-text-models/#querying-text-models-via-api) to query text models using Scaleway's Chat API.
2. [How to use structured outputs](/ai-data/generative-apis/how-to/use-structured-outputs) with the `response_format` parameter.
3. [How to use function calling](/ai-data/generative-apis/how-to/use-function-calling) with `tools` and `tool_choice`.
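The `response_format` parameter listed above pairs a request with a JSON schema describing the output you expect. A minimal sketch of building such a payload, assuming the OpenAI-style `json_schema` wrapper (the schema itself is illustrative; check the structured-outputs guide for the exact shape Scaleway accepts):

```python
import json

# Hypothetical JSON schema for a structured answer (illustrative only).
weather_schema = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "temperature_c": {"type": "number"},
    },
    "required": ["city", "temperature_c"],
}

# OpenAI-style response_format payload, passed alongside `messages`.
response_format = {
    "type": "json_schema",
    "json_schema": {"name": "weather", "schema": weather_schema},
}

print(json.dumps(response_format, indent=2))
```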
4 changes: 4 additions & 0 deletions ai-data/generative-apis/concepts.mdx
Expand Up @@ -20,6 +20,10 @@ API rate limits define the maximum number of requests a user can make to the Gen

A context window is the maximum amount of prompt data considered by the model to generate a response. Using models with high context length, you can provide more information to generate relevant responses. The context is measured in tokens.
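Since the context is measured in tokens, it can help to estimate whether a prompt fits before sending it. A rough sketch using the common rule of thumb of ~4 characters per token for English text (actual counts depend on the model's tokenizer, so treat this as an approximation only):

```python
def rough_token_count(text: str) -> int:
    """Rough estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def fits_context(prompt: str, context_window: int, reserved_for_output: int = 512) -> bool:
    """Check whether a prompt plausibly fits, keeping room for the response."""
    return rough_token_count(prompt) + reserved_for_output <= context_window

print(fits_context("Hello, how are you?", context_window=8192))  # → True
```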

## Function calling

Function calling allows a large language model (LLM) to interact with external tools or APIs, executing specific tasks based on user requests. The LLM identifies the appropriate function, extracts the required parameters, and returns the results as structured data, typically in JSON format.
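In practice, the available functions are described to the model as a `tools` array of JSON schemas. A minimal sketch in the OpenAI-style format (the `get_weather` function is a hypothetical example, not a real API):

```python
# Each tool describes one callable function via a JSON schema. The model can
# then return a structured call such as:
#   {"name": "get_weather", "arguments": "{\"city\": \"Paris\"}"}
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # illustrative function name
            "description": "Get the current weather in a given city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

# Passed alongside messages, e.g.:
# client.chat.completions.create(model=..., messages=..., tools=tools, tool_choice="auto")
```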

## Embeddings

Embeddings are numerical representations of text data that capture semantic information in a dense vector format. In Generative APIs, embeddings are essential for tasks such as similarity matching, clustering, and serving as inputs for downstream models. These vectors enable the model to understand and generate text based on the underlying meaning rather than just the surface-level words.
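The similarity matching mentioned above typically compares embedding vectors with cosine similarity. A self-contained sketch with toy 3-dimensional vectors (real embeddings have hundreds of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings": semantically close texts get geometrically close vectors.
cat = [0.9, 0.1, 0.0]
kitten = [0.85, 0.2, 0.05]
car = [0.0, 0.1, 0.95]

print(cosine_similarity(cat, kitten) > cosine_similarity(cat, car))  # → True
```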
Expand Down
@@ -1,35 +1,45 @@
---
meta:
title: How to query text models
description: Learn how to interact with powerful text models using Scaleway's Generative APIs service.
title: How to query language models
description: Learn how to interact with powerful language models using Scaleway's Generative APIs service.
content:
h1: How to query text models
paragraph: Learn how to interact with powerful text models using Scaleway's Generative APIs service.
tags: generative-apis ai-data text-models
h1: How to query language models
paragraph: Learn how to interact with powerful language models using Scaleway's Generative APIs service.
tags: generative-apis ai-data language-models
dates:
validation: 2024-08-28
validation: 2024-09-30
posted: 2024-08-28
---

Scaleway's Generative APIs service allows users to interact with powerful text models hosted on the platform.
Scaleway's Generative APIs service allows users to interact with powerful language models hosted on the platform.

There are several ways to interact with text models:
- The Scaleway [console](https://console.scaleway.com) will soon provide a complete [playground](/ai-data/generative-apis/how-to/query-text-models/#accessing-the-playground), aiming to test models, adapt parameters, and observe how these changes affect the output in real-time.
- Via the [Chat API](/ai-data/generative-apis/how-to/query-text-models/#querying-text-models-via-api)
There are several ways to interact with language models:
- The Scaleway [console](https://console.scaleway.com) provides a complete [playground](/ai-data/generative-apis/how-to/query-language-models/#accessing-the-playground) where you can test models, adapt parameters, and observe how these changes affect the output in real time.
- Via the [Chat API](/ai-data/generative-apis/how-to/query-language-models/#querying-language-models-via-api)

<Macro id="requirements" />

- Access to this service is restricted while in beta. You can request access to the product by filling out a form on Scaleway's [betas page](https://www.scaleway.com/en/betas/#generative-apis).
- A Scaleway account logged into the [console](https://console.scaleway.com)
- [Owner](/identity-and-access-management/iam/concepts/#owner) status or [IAM permissions](/identity-and-access-management/iam/concepts/#permission) allowing you to perform actions in the intended Organization
- A valid [API key](/identity-and-access-management/iam/how-to/create-api-keys/) for API authentication
- Python 3.7+ installed on your system

## Accessing the Playground

Scaleway's Playground is in development, stay tuned!
Scaleway provides a web playground for instruct-based models hosted on Generative APIs.

## Querying text models via API
1. Navigate to Generative APIs under the AI section of the [Scaleway console](https://console.scaleway.com/) side menu. The list of models you can query displays.
2. Click the name of the chat model you want to try. Alternatively, click <Icon name="more" /> next to the chat model, and click **Try model** in the menu.

The web playground displays.

## Using the Playground
1. Enter a prompt at the bottom of the page, or use one of the suggested prompts in the conversation area.
2. Edit the hyperparameters listed in the right column, for example, the default temperature for more or less randomness in the outputs.
3. Switch models at the top of the page to observe the capabilities of chat models offered via Generative APIs.
4. Click **View code** to get code snippets configured according to your settings in the playground.

## Querying language models via API

The [Chat API](/ai-data/generative-apis/api-cli/using-chat-api/) is an OpenAI-compatible REST API for generating and manipulating conversations.

Expand Down
238 changes: 238 additions & 0 deletions ai-data/generative-apis/how-to/query-vision-models.mdx
@@ -0,0 +1,238 @@
---
meta:
title: How to query vision models
description: Learn how to interact with powerful vision models using Scaleway's Generative APIs service.
content:
h1: How to query vision models
paragraph: Learn how to interact with powerful vision models using Scaleway's Generative APIs service.
tags: generative-apis ai-data vision-models
dates:
validation: 2024-09-30
posted: 2024-09-30
---

Scaleway's Generative APIs service allows users to interact with powerful vision models hosted on the platform.

<Message type="note">
Vision models can understand and analyze images, not generate them.
</Message>

There are several ways to interact with vision models:
- The Scaleway [console](https://console.scaleway.com) provides a complete [playground](/ai-data/generative-apis/how-to/query-vision-models/#accessing-the-playground) where you can test models, adapt parameters, and observe how these changes affect the output in real time.
- Via the [Chat API](/ai-data/generative-apis/how-to/query-vision-models/#querying-vision-models-via-api)

<Macro id="requirements" />

- A Scaleway account logged into the [console](https://console.scaleway.com)
- [Owner](/identity-and-access-management/iam/concepts/#owner) status or [IAM permissions](/identity-and-access-management/iam/concepts/#permission) allowing you to perform actions in the intended Organization
- A valid [API key](/identity-and-access-management/iam/how-to/create-api-keys/) for API authentication
- Python 3.7+ installed on your system

## Accessing the Playground

Scaleway provides a web playground for vision models hosted on Generative APIs.

1. Navigate to Generative APIs under the AI section of the [Scaleway console](https://console.scaleway.com/) side menu. The list of models you can query displays.
2. Click the name of the vision model you want to try. Alternatively, click <Icon name="more" /> next to the vision model, and click **Try model** in the menu.

The web playground displays.

## Using the Playground
1. Upload one or multiple images to the prompt area at the bottom of the page. Enter a prompt, for example, to describe the image(s) you attached.
2. Edit the hyperparameters listed in the right column, for example, the default temperature for more or less randomness in the outputs.
3. Switch models at the top of the page to observe the capabilities of chat and vision models offered via Generative APIs.
4. Click **View code** to get code snippets configured according to your settings in the playground.

## Querying vision models via API

The [Chat API](/ai-data/generative-apis/api-cli/using-chat-api/) is an OpenAI-compatible REST API for generating and manipulating conversations.

You can query the vision models programmatically using your favorite tools or languages.
Vision models take both text and images as inputs.

<Message type="tip">
Unlike traditional language models, vision models take a `content` array for the user role, structuring text and images as inputs.
</Message>

In the following example, we will use the OpenAI Python client.

### Installing the OpenAI SDK

Install the OpenAI SDK using pip:

```bash
pip install openai
```

### Initializing the client

Initialize the OpenAI client with your base URL and API key:

```python
from openai import OpenAI

# Initialize the client with your base URL and API key
client = OpenAI(
    base_url="https://api.scaleway.ai/v1",  # Scaleway's Generative APIs service URL
    api_key="<SCW_SECRET_KEY>"  # Your unique API secret key from Scaleway
)
```

### Generating a chat completion

You can now create a chat completion, for example with the `pixtral-12b-2409` model:

```python
# Create a chat completion using the 'pixtral-12b-2409' model
response = client.chat.completions.create(
    model="pixtral-12b-2409",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is this image?"},
                {"type": "image_url", "image_url": {"url": "https://picsum.photos/id/32/512/512"}},
            ]  # Vision models will take a content array with text and image_url objects.
        }
    ],
    temperature=0.7,  # Adjusts creativity
    max_tokens=2048,  # Limits the length of the output
    top_p=0.9  # Controls diversity through nucleus sampling. You usually only need to use temperature.
)

# Print the generated response
print(response.choices[0].message.content)
```

This code sends your messages (prompt and image) to the vision model, which returns an answer based on your input. The `temperature`, `max_tokens`, and `top_p` parameters control the response's creativity, length, and diversity, respectively.

A conversation style may include a default system prompt. You can set this prompt by giving the first message the `system` role. For example:

```python
[
    {
        "role": "system",
        "content": "You are Xavier Niel."
    }
]
```

### Passing images to Pixtral

1. **Image URLs**: If the image is available online, you can just include the image URL in your request as demonstrated above. This approach is simple and does not require any encoding.
2. **Base64 encoded image**: Base64 encoding is a standard way to transform binary data, like images, into a text format, making it easier to transmit over the internet.

The following Python code sample shows you how to encode an image in base64 format and pass it to your request payload.

```python
import base64
from io import BytesIO
from PIL import Image

def encode_image(img):
    buffered = BytesIO()
    img.save(buffered, format="JPEG")
    encoded_string = base64.b64encode(buffered.getvalue()).decode("utf-8")
    return encoded_string

img = Image.open("path_to_your_image.jpg")
base64_img = encode_image(img)

payload = {
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What is this image?"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_img}"
                    }
                }
            ]
        }
    ],
    ... # other parameters
}

```

### Model parameters and their effects

The following parameters will influence the output of the model:

- **`messages`**: A list of message objects that represent the conversation history. Each message should have a `role` (e.g., "system", "user", "assistant") and `content`. The content is an array that can contain text and/or image objects.
- **`temperature`**: Controls the output's randomness. Lower values (e.g., 0.2) make the output more deterministic, while higher values (e.g., 0.8) make it more creative.
- **`max_tokens`**: The maximum number of tokens (words or parts of words) in the generated output.
- **`top_p`**: Recommended for advanced use cases only. You usually only need to use temperature. `top_p` controls the diversity of the output, using nucleus sampling, where the model considers the tokens with top probabilities until the cumulative probability reaches `top_p`.
- **`stop`**: A string or list of strings where the model will stop generating further tokens. This is useful for controlling the end of the output.

<Message type="warning">
If you encounter an error such as "Forbidden 403", refer to the [API documentation](/ai-data/generative-apis/api-cli/understanding-errors) for troubleshooting tips.
</Message>

## Streaming

By default, the outputs are returned to the client only after the generation process is complete. However, a common alternative is to stream the results back to the client as they are generated. This is particularly useful in chat applications, where it allows the client to view the results incrementally as each token is produced.
Following is an example using the chat completions API:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.scaleway.ai/v1",  # Scaleway's Generative APIs service URL
    api_key="<SCW_API_KEY>"  # Your unique API key from Scaleway
)
response = client.chat.completions.create(
    model="pixtral-12b-2409",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is this image?"},
            {"type": "image_url", "image_url": {"url": "https://picsum.photos/id/32/512/512"}},
        ]
    }],
    stream=True,
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```

## Async

The service also supports asynchronous mode for any chat completion.

```python

import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(
    base_url="https://api.scaleway.ai/v1",  # Scaleway's Generative APIs service URL
    api_key="<SCW_API_KEY>"  # Your unique API key from Scaleway
)

async def main():
    stream = await client.chat.completions.create(
        model="pixtral-12b-2409",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "What is this image?"},
                {"type": "image_url", "image_url": {"url": "https://picsum.photos/id/32/512/512"}},
            ]
        }],
        stream=True,
    )
    async for chunk in stream:
        print(chunk.choices[0].delta.content, end="")

asyncio.run(main())
```