Chore/create assistant - enhancements to create assistant for latest architecture and features #63

Merged Jul 9, 2024 · 23 commits

Commits
9c61de4
Refactor to remove global variables in chainlit app and t
dividor Jul 4, 2024
fb95805
Moved assistants create into ui folder as they are tightly aligned. A…
dividor Jul 4, 2024
50f0e25
Moved assistants create into ui folder as they are tightly aligned. A…
dividor Jul 4, 2024
e2ae212
Moved assistants create into ui folder as they are tightly aligned. A…
dividor Jul 4, 2024
dd6e43d
Moved assistants create into ui folder as they are tightly aligned. A…
dividor Jul 4, 2024
4fb755a
Removing unmaintained flow version. If we implement this, would likel…
dividor Jul 4, 2024
af8d857
Removing unmaintained flow version. If we implement this, would likel…
dividor Jul 4, 2024
5de78f7
Removing unmaintained flow version. If we implement this, would likel…
dividor Jul 4, 2024
d57386b
Updated assistant create script to automatically upload files to eith…
dividor Jul 5, 2024
02caa31
Updated assistant create script to automatically upload files to eith…
dividor Jul 5, 2024
9cb63de
Updated assistant create script to automatically upload files to eith…
dividor Jul 5, 2024
18802fb
Updated autogen instructions and reverted team files
dividor Jul 5, 2024
c3a676c
Updated assistant create script to automatically upload files to eith…
dividor Jul 5, 2024
86dd752
Relaxing interrogate
dividor Jul 5, 2024
091e235
Relaxing interrogate
dividor Jul 5, 2024
71de607
Relaxing interrogate
dividor Jul 5, 2024
dfcb644
Actions debug for e2e
dividor Jul 5, 2024
22783f4
Fix to e2e tests to account for ui refactor to not use global variables
dividor Jul 5, 2024
b066ee7
Updated prompt to inform assistant of vector store assets
dividor Jul 5, 2024
1d9a612
Moved groundedness prompt to templates dir
dividor Jul 5, 2024
a685353
Fix for issue found in e2e tests, yay
dividor Jul 5, 2024
ce3bc1b
Fix for issue found in e2e tests, yay
dividor Jul 5, 2024
1d4d6cd
Added vector store e2e test
dividor Jul 6, 2024
4 changes: 2 additions & 2 deletions .github/workflows/e2e_tests.yml
@@ -48,7 +48,7 @@ jobs:
RECIPES_MODEL_MAX_TOKENS: ${{ secrets.RECIPES_MODEL_MAX_TOKENS }}

IMAGE_HOST: ${{ secrets.IMAGE_HOST }}
- RECIPE_SERVER_API: ${{ secrets.RECIPE_SERVER_API }}
+ RECIPE_SERVER_API: ${{ secrets.RECIPE_SERVER_API_FROM_GH_HOST }}

CHAINLIT_AUTH_SECRET: ${{ secrets.CHAINLIT_AUTH_SECRET }}
USER_LOGIN: ${{ secrets.USER_LOGIN }}
@@ -124,7 +124,7 @@ jobs:
#python3 call_assistant.py --chat_history '[{"author": "user","content": "Hi!"}, {"author": "user","content": "What is the total population of Mali"}]'
#python3 call_assistant.py --chat_history '[{"author": "user","content": "plot a line chart of fatalities by month for Chad using HDX data as an image"}]'
# This runs a few, with the script kill, like promptflow, but prints all debug. Good for testing.
- # python3 call_assistant_debug.py
+ python3 call_assistant_debug.py

echo "Starting Promptflow batch run using data.jsonl ..."
pf run create --flow . --data ./data.jsonl --stream --column-mapping query='${data.query}' context='${data.context}' chat_history='${data.chat_history}' --name base_run
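For reference, the Promptflow batch run above reads rows from `data.jsonl` and maps the `query`, `context`, and `chat_history` columns. A minimal sketch of producing one such row; the field names follow the `--column-mapping` flags above, while the values (and the empty `context`) are illustrative assumptions only:

```python
import json

# Illustrative row for data.jsonl: field names follow the --column-mapping above,
# values are placeholders for the sketch.
row = {
    "query": "What is the total population of Mali",
    "chat_history": [{"author": "user", "content": "Hi!"}],
    "context": "",
}

with open("data.jsonl", "a") as f:
    f.write(json.dumps(row) + "\n")
```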
2 changes: 2 additions & 0 deletions .gitignore
@@ -25,4 +25,6 @@ management/work/
server/robocorp/actions_plugins/recipe-server/utils
data
server/fastapi/recipes/
assistants/chat_ui/files/file_search/custom
assistants/chat_ui/files/code_interpreter/custom

2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -18,5 +18,5 @@ repos:
rev: 1.7.0
hooks:
- id: interrogate
- args: [--fail-under=65, --verbose]
+ args: [--fail-under=50, --verbose]
exclude: __init__.py
26 changes: 21 additions & 5 deletions README.md
@@ -47,7 +47,12 @@ This repo contains a docker-compose environment that will run the following comp
- Assistant Settings - Set to your LLM deployment accordingly
2. `cd data && python3 download_demo_data.py && cd ..`
3. `docker compose up -d --build`
- 4. Go to [http://localhost:8000/](http://localhost:8000/)
+ 4. `docker compose exec chat python create_update_assistant.py`
+ 5. Update `.env` file and set ASSISTANTS_ID to the value returned from the previous step
+ 6. `docker compose up -d`
+ 7. Go to [http://localhost:8000/](http://localhost:8000/)




## Using Recipes
@@ -56,9 +61,20 @@ We are in a phase of research to identify and improve recipes, but for now the s

## Additional Features

### Adding your own files for the assistant to analyze

The assistant can be configured to analyze your own files, either by searching them or by using them when analyzing data on-the-fly. To add your own files, place them in one of the following folders:

`./assistants/chat_ui/files/file_search/custom` : The assistant will search these files
`./assistants/chat_ui/files/code_interpreter/custom` : The assistant can use these files when generating and running code

Once you have put your files in the above folders, you can update your assistant by running ...

`docker compose exec chat python create_update_assistant.py`
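
After the update script has run, you can check what was actually uploaded. A minimal sketch using the OpenAI Python client and the same beta vector stores namespace as the script (assumes the OpenAI-hosted case with `ASSISTANTS_API_KEY` from `.env`; for Azure, construct `AzureOpenAI` as `create_update_assistant.py` does):

```python
import os

from dotenv import load_dotenv
from openai import OpenAI

load_dotenv(".env")
client = OpenAI(api_key=os.environ["ASSISTANTS_API_KEY"])

# Files uploaded for the assistant (both file_search and code_interpreter
# uploads in the script use purpose="assistants")
for f in client.files.list(purpose="assistants"):
    print(f.id, f.filename)

# Vector stores created for file_search
for vs in client.beta.vector_stores.list():
    print(vs.id, vs.name, vs.file_counts)
```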

### Analysis on Ingested Data

- To run the ingestion module for ingested datasets, so assistants and plugins can analysis data on-the-fly as an experimental feature:
+ By default, the repo comes with some demo HAPI data. To run the ingestion module yourself, so assistants and plugins can analyze the ingested data on-the-fly (an experimental feature):

1. `docker exec -it haa-ingestion /bin/bash`
2. `python3 ingest.py`
@@ -171,9 +187,9 @@ To activate:

1. Go to [http://localhost:8091/](http://localhost:8091/)
2. Click on 'Build'
- 3. Click 'Skills' on left, top right click '...' and import the skill in `./assets`
- 4. Click 'Agents' on left, top right click '...' and import the skill in `./assets`
- 5. Click 'Workflows' on left, top right click '...' and import the skill in `./assets`
+ 3. Click 'Skills' on left, top right click '...' and import the skill in `./assistants/recipes_agents/autogen_team/`
+ 4. Click 'Agents' on left, top right click '...' and import the agent in `assistants/recipes_agents/autogen_team/`
+ 5. Click 'Workflows' on left, top right click '...' and import the workflow in `assistants/recipes_agents/autogen_team/`
6. Go to playground and start a new session, select the 'Recipes data Analysis' workflow
7. Ask 'What is the total population of Mali?'

284 changes: 284 additions & 0 deletions assistants/chat_ui/create_update_assistant.py
@@ -0,0 +1,284 @@
import asyncio
import datetime
import glob
import json
import os
import sys
import zipfile

import pandas as pd
import requests
from dotenv import load_dotenv
from jinja2 import Environment, FileSystemLoader
from openai import AzureOpenAI, OpenAI

from utils.db import get_data_info

load_dotenv("../../.env")

SUPPORTED_ASSISTANT_FILE_TYPES = ["csv", "json", "xlsx", "txt", "pdf", "docx", "pptx"]

bot_name = os.environ.get("ASSISTANTS_BOT_NAME")
environment = Environment(loader=FileSystemLoader("templates/"))

# Needed to get common fields standard_names
INTEGRATION_CONFIG = "ingestion.config"
SYSTEM_PROMPT = "instructions.txt"


def setup_openai():
"""
Setup the OpenAI client.

Returns:
tuple: A tuple containing the OpenAI client, assistant ID, and model.
"""
api_key = os.environ.get("ASSISTANTS_API_KEY")
assistant_id = os.environ.get("ASSISTANTS_ID")
model = os.environ.get("ASSISTANTS_MODEL")
api_type = os.environ.get("ASSISTANTS_API_TYPE")
api_endpoint = os.environ.get("ASSISTANTS_BASE_URL")
api_version = os.environ.get("ASSISTANTS_API_VERSION")

if api_type == "openai":
print("Using OpenAI API")
client = OpenAI(api_key=api_key)
elif api_type == "azure":
print("Using Azure API")
print(f"Endpoint: {api_endpoint}")
client = AzureOpenAI(
api_key=api_key,
api_version=api_version,
azure_endpoint=api_endpoint,
default_headers={"OpenAI-Beta": "assistants=v2"},
)
else:
print("API type not supported")
sys.exit(1)

return client, assistant_id, model


def get_common_field_standard_names():
"""
Get the standard names of common fields from the integration configuration file.

Returns:
list: A list of standard names of common fields.
"""
with open(INTEGRATION_CONFIG) as f:
print(f"Reading {INTEGRATION_CONFIG}")
config = json.load(f)
return config["standard_names"]


def get_local_files(assistant_file_type):
"""
Get local files of a specific type.

Args:
assistant_file_type (str): The type of file to get, "file_search" or "code_interpreter"

Returns:
list: A list of file paths.
"""

if assistant_file_type not in ["file_search", "code_interpreter"]:
print(f"Assistant file type {assistant_file_type} is not supported")
sys.exit(1)

file_paths = []
for file_type in SUPPORTED_ASSISTANT_FILE_TYPES:
dir = f"./files/{assistant_file_type}"
files = glob.glob(f"{dir}/**/*.{file_type}", recursive=True)
file_paths = file_paths + files

print(
f"Found {len(file_paths)} files - {file_paths} - of type {assistant_file_type}"
)

return file_paths


def upload_files_for_code_interpreter(client):
"""
Uploads files for the code interpreter section of the assistant.

Args:
client (OpenAI): The OpenAI client.

Returns:
list: A list of IDs for the uploaded files.
"""
file_ids = []

file_paths = get_local_files("code_interpreter")

for file_path in file_paths:
print(f"Uploading {file_path} to code interpreter ...")
file = client.files.create(file=open(file_path, "rb"), purpose="assistants")
file_ids.append(file.id)

return file_ids


def upload_files_to_vector_store(vector_store_name, client):
"""
Uploads files to the vector store.

Args:
vector_store_name (str): The name of the vector store.
client (OpenAI): The OpenAI client.

Returns:
str: The ID of the created vector store, or None if no files were found.
"""

file_paths = get_local_files("file_search")

# No files to upload
if len(file_paths) == 0:
print("No files found to upload to vector store")
return None

print(f"Uploading {file_paths} to vector store {vector_store_name} ...")

# Create a vector store with the supplied name
vector_store = client.beta.vector_stores.create(name=vector_store_name)

# Ready the files for upload to OpenAI
file_streams = [open(path, "rb") for path in file_paths]

# Use the upload and poll SDK helper to upload the files, add them to the vector store,
# and poll the status of the file batch for completion.
file_batch = client.beta.vector_stores.file_batches.upload_and_poll(
vector_store_id=vector_store.id, files=file_streams
)

# You can print the status and the file counts of the batch to see the result of this operation.
print(file_batch.status)
print(file_batch.file_counts)

return vector_store.id


def get_manually_defined_functions():
"""
Get a list of manually defined functions.

Returns:
list: A list of dictionaries representing the manually defined functions in openai format
"""
functions = [
{
"type": "function",
"function": {
"name": "call_execute_query_api",
"description": "Execute Query",
"parameters": {
"properties": {
"sql": {
"type": "string",
"title": "SQL",
"description": "The SQL query to be executed. Only read queries are allowed.",
}
},
"type": "object",
"required": ["sql"],
},
},
}
]

return functions


def create_update_assistant():
"""
Creates or updates a humanitarian response assistant.

To force creation of a new assistant, be sure that ASSISTANTS_ID is not set in the .env file.

"""

# Client setup
client, assistant_id, model = setup_openai()

# Information on common names
standard_names = get_common_field_standard_names()

# Get database information for system prompt
data_info = get_data_info()
print(data_info)

# Load code examples
template = environment.get_template("openai_assistant_prompt.jinja2")
instructions = template.render(
data_info=data_info,
country_code_field=standard_names["country_code_field"],
admin1_code_field=standard_names["admin1_code_field"],
admin2_code_field=standard_names["admin2_code_field"],
admin3_code_field=standard_names["admin3_code_field"],
)

# Save for debugging
with open(SYSTEM_PROMPT, "w") as f:
f.write(instructions)

# Upload any local files needed by assistant for file_search (RAG)
vector_store_id = upload_files_to_vector_store("local_files_vectore_store", client)

# Upload any files that will be used for code_interpreter
code_interpreter_file_ids = upload_files_for_code_interpreter(client)

params = {
"name": bot_name,
"instructions": instructions,
"model": model,
"temperature": 0.1,
}

tool_resources = {}

# Set tools
tools = [{"type": "code_interpreter"}]

# Add file search if we have data
if vector_store_id is not None:
tools.append({"type": "file_search"})
params["tool_resources"] = {
"file_search": {"vector_store_ids": [vector_store_id]}
}
tool_resources["file_search"] = {"vector_store_ids": [vector_store_id]}

if len(code_interpreter_file_ids) > 0:
tool_resources["code_interpreter"] = {"file_ids": code_interpreter_file_ids}

# Add manually defined functions
tools = tools + get_manually_defined_functions()

params["tools"] = tools

# Add in tool files as needed
if "code_interpreter" in tool_resources or "file_search" in tool_resources:
params["tool_resources"] = tool_resources

# If we were provided an ID in .env, pass it in to update existing assistant
if assistant_id is not None:
params["assistant_id"] = assistant_id

print(json.dumps(params, indent=4))

if assistant_id is None:
print(
f"Calling assistant API for ID: {assistant_id}, name: {bot_name} and model {model} ..."
)
assistant = client.beta.assistants.create(**params)
print("Assistant created!! Here is the assistant ID:\n")
print(assistant.id)
print("\nNow update ASSISTANTS_ID in your .env file with this ID")
else:
print(
f"Calling assistant API for ID: {assistant_id}, name: {bot_name} and model {model} ..."
)
assistant = client.beta.assistants.update(**params)
print("Assistant updated!!")


if __name__ == "__main__":
create_update_assistant()
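
Note that `call_execute_query_api` in `get_manually_defined_functions` above only declares a tool signature; executing the SQL when the assistant requests it is up to the chat application. A hedged sketch of how such a tool call could be dispatched during an Assistants run; `execute_query` is a placeholder helper introduced here for illustration, not code from this PR:

```python
import json


def execute_query(sql: str) -> str:
    """Placeholder: run the read-only SQL against the ingested data and return the result."""
    raise NotImplementedError


def handle_required_action(client, thread_id: str, run):
    """Dispatch call_execute_query_api tool calls requested by the assistant."""
    tool_outputs = []
    for tool_call in run.required_action.submit_tool_outputs.tool_calls:
        if tool_call.function.name == "call_execute_query_api":
            args = json.loads(tool_call.function.arguments)
            result = execute_query(args["sql"])
            tool_outputs.append({"tool_call_id": tool_call.id, "output": str(result)})

    # Hand the results back so the assistant can continue the run.
    return client.beta.threads.runs.submit_tool_outputs(
        thread_id=thread_id,
        run_id=run.id,
        tool_outputs=tool_outputs,
    )
```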
File renamed without changes.
1 change: 1 addition & 0 deletions assistants/chat_ui/files/code_interpreter/core/test.txt
@@ -0,0 +1 @@
test