Commit 4928df3

Merge branch 'main' into feat/selenium-e2e-3

dividor authored Jul 11, 2024
2 parents 704bba4 + 94cd770 commit 4928df3

Showing 3 changed files with 159 additions and 93 deletions.
51 changes: 27 additions & 24 deletions .env.example
@@ -27,17 +27,20 @@ RECIPE_DB_CONN_STRING=postgresql://${POSTGRES_RECIPE_USER}:${POSTGRES_RECIPE_PAS
#==================================================#
# Recipes AI Settings #
#==================================================#
# You can leave these as-is for quick start
# These control how recipes are retrieved and generated using LLMs.
#
# If you are using Azure OpenAI. Note: in the Azure Playground, you can 'View code' to get these values
#RECIPES_OPENAI_API_TYPE=azure
#RECIPES_OPENAI_API_KEY=
#RECIPES_OPENAI_API_ENDPOINT=
#RECIPES_OPENAI_API_ENDPOINT=<eg https://<YOUR DEPLOYMENT NAME>.openai.azure.com/>
#RECIPES_OPENAI_API_VERSION=2024-02-15-preview
#RECIPES_BASE_URL=
#RECIPES_MODEL=gpt-4-turbo
#RECIPES_MODEL=<The deployment name you created in Azure, eg gpt-4o>
#
# Leave these as-is for quick start
#RECIPES_OPENAI_TEXT_COMPLETION_DEPLOYMENT_NAME=text-embedding-ada-002
#RECIPES_BASE_URL=${RECIPES_OPENAI_API_ENDPOINT}

# gpt-4o only available on OpenAI
# OpenAI example
RECIPES_OPENAI_API_TYPE=openai
RECIPES_OPENAI_API_KEY=
RECIPES_MODEL=gpt-4o
@@ -61,33 +64,33 @@ IMAGE_HOST=http://localhost:3080/images
#==================================================#
# API Settings #
#==================================================#
# To get this, go to https://hapi.humdata.org/docs#/,
# select the encode_identifier endpoint, click the 'Try it out' button,
# enter a name and your email, and click send. The response will have your token.
# Note also, the URL for the API is set in ./ingestion/ingestion.config
# This token is just your encoded email address. To generate it, see the instructions here:
# https://hdx-hapi.readthedocs.io/en/latest/getting-started/
HAPI_API_TOKEN=

#==================================================#
# Assistant Settings #
#==================================================#
# Needed when updating an assistant; see assistants/openai_assistants. Leave blank to create a new one
# Parameters for the AI assistant used in the chat interface, to serve recipes and carry out
# on-the-fly analysis
#
# If you are using Azure OpenAI. Note: in the Azure Playground, you can 'View code' to get these values
#ASSISTANTS_API_TYPE=azure
#ASSISTANTS_API_KEY=
#ASSISTANTS_ID=
#ASSISTANTS_BASE_URL=
#ASSISTANTS_API_KEY=<API Key as found on the Azure OpenAI resource>
#ASSISTANTS_ID=<ID of the assistant you created in OpenAI. Leave blank if you do not have one yet>
#ASSISTANTS_BASE_URL=<eg https://<YOUR DEPLOYMENT NAME>.openai.azure.com/>
#ASSISTANTS_API_VERSION=2024-02-15-preview
#ASSISTANTS_MODEL=gpt4-o
#ASSISTANTS_BOT_NAME="Humanitarian AI Assistant"

#ASSISTANTS_MODEL=<The deployment name of the model you created in Azure which the assistant uses, eg gpt-4o>
#ASSISTANTS_BOT_NAME=<Your assistant name, eg "Humanitarian AI Assistant">

# OPENAI
OPENAI_API_KEY=
# If you are using OpenAI directly (ie not Azure)
ASSISTANTS_API_TYPE=openai
OPENAI_API_KEY=<The API key you created on OpenAI>
ASSISTANTS_API_KEY=${OPENAI_API_KEY}
ASSISTANTS_API_TYPE=openai
ASSISTANTS_ID=
ASSISTANTS_ID=<ID of the assistant you created in OpenAI. Leave blank if you do not have one yet>
ASSISTANTS_MODEL=<The model your assistant uses>
ASSISTANTS_BOT_NAME=<Your assistant name, eg "Humanitarian AI Assistant">
ASSISTANTS_BASE_URL=""
ASSISTANTS_MODEL=gpt-4o
ASSISTANTS_BOT_NAME="Humanitarian AI Assistant"

#==================================================#
# Deployments Settings #
Expand All @@ -112,11 +115,11 @@ CHAT_URL="http://chat:8000/"
#==================================================#
# Chainlit Settings #
#==================================================#
# Used with Literal.ai to get telemetry and voting, can be left blank for quick start.
# Used with Literal.ai to get telemetry and voting, can be left blank if running locally
LITERAL_API_KEY=

# Run "chainlit create-secret" to get this.
# WARNING!!!! You MUST run this to update the defaults below if deploying online
# WARNING!!!! These are test values, OK for a quick start. Do not deploy online with these as-is; regenerate them
CHAINLIT_AUTH_SECRET="1R_FKRaiv0~5bqoQurBx34ctOD8kM%a=YvIx~fVmYLVd>B5vWa>e9rDX?6%^iCOv"
USER_LOGIN=muppet-data-chef
USER_PASSWORD=hB%1b36!!8-v
36 changes: 26 additions & 10 deletions CONTRIBUTION.md
@@ -24,6 +24,10 @@ in the paragraphs below.

The easiest way to develop is to run in the Docker environment; see the [README](./README.md) for more details.

### Resetting your environment

If running locally, you can reset your environment (removing all database data, which means you will need to re-register) by running `./cleanup.sh`.

## Code quality tests

The repo has been set up with black and flake8 pre-commit hooks. These are configured in the `.pre-commit-config.yaml` file and initialized with `pre-commit autoupdate`.
@@ -32,6 +36,8 @@ On a new repo, you must run `pre-commit install` to add pre-commit hooks.

To run code quality tests, you can run `pre-commit run --all-files`

GitHub has an action to run the pre-commit tests to ensure code adheres to standards. See the `.github/workflows` folder for more details.

## Tests

### Unit tests
@@ -49,10 +55,6 @@ the desired feature.

You can use `pytest` to run your tests, no matter which type of test it is.
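For illustration, a minimal unit test might look like the sketch below; the file and function names are hypothetical, not from the repo:

```python
# tests/test_example.py - illustrative sketch only; names are hypothetical
def get_greeting(name: str) -> str:
    return f"Hello, {name}!"


def test_get_greeting():
    assert get_greeting("recipes") == "Hello, recipes!"
```

Running `pytest tests/test_example.py` would then collect and run `test_get_greeting`.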

### Code quality tests

GitHub has an action to run the pre-commit tests to ensure code adheres to standards. See the `.github/workflows` folder for more details.

### End-to-end tests (using Selenium and Promptflow)

End-to-end tests are configured in GitHub Actions and use Promptflow to call a wrapper around the chainlit UI, in order to test when memories/recipes are used as well as when the assistant does some on-the-fly analysis. To do this, the chainlit class is patched heavily, and there are limits to how cleanly this could be done, so it isn't an exact replica of the true application, but it does capture changes to the flow as well as testing the assistant directly. The main body of integration tests tests the recipes server and the assistant independently.
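For orientation, a browser-level check of this kind might look like the following minimal sketch, assuming Chrome, the local chat UI from the quick start, and chainlit's password login; the selectors are illustrative assumptions, not the repo's actual test code:

```python
# Illustrative sketch only - not the repo's actual Selenium tests.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

options = webdriver.ChromeOptions()
options.add_argument("--headless=new")
driver = webdriver.Chrome(options=options)
try:
    driver.get("http://localhost:8000/")
    wait = WebDriverWait(driver, 30)
    # Field names below are assumed for chainlit's password login form
    wait.until(EC.presence_of_element_located((By.NAME, "email")))
    driver.find_element(By.NAME, "email").send_keys("muppet-data-chef")
    driver.find_element(By.NAME, "password").send_keys("hB%1b36!!8-v")
    driver.find_element(By.XPATH, "//button[@type='submit']").click()
finally:
    driver.quit()
```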
@@ -163,13 +165,27 @@ To download demo data ...
2. `cd data && python3 download_demo_data.py && cd ..`
3. `docker compose start datadb`

## Misc.
# Evaluation with Prompt Flow

### Testing connection to actions server
First, you will need to build the environment to include Prompt Flow ...

`docker compose -f docker-compose.yml -f docker-compose-dev.yml up -d --build`

1. `docker exec -it haa-libre-chat /bin/sh`
2. To test the SQL query action, run `curl -X POST -H "Content-Type: application/json" -d '{"query": "select 1"}' "http://actions:8080/api/actions/postgresql-universal-actions/execute-query/run"`
3. To test the get-memory action, run `curl -X POST -H "Content-Type: application/json" -d '{"chat_history": "[]", "user_input":"population of Mali", "generate_intent":"true"}' "http://actions:8080/api/actions/get-data-recipe-memory/get-memory-recipe/run"`
Then ...

1. Install the DevContainers VSCode extension
2. Build data recipes using the `docker compose` command mentioned above
3. Open the command palette in VSCode (CMD + Shift + P on Mac; CTRL + Shift + P on Windows) and select

`Dev Containers: Attach to remote container`.

Select the promptflow container. This opens a new VSCode window; use it for the next steps.
4. Install the Promptflow extension
5. Open folder `/app`
6. Click on `flow.dag.yaml`
7. At the top left of the main pane, click 'Visual editor'
8. On the Groundedness node, select your new connection
9. You can now run the flow by clicking the play icon. See the Promptflow documentation for more details, or see the sketch below for a scripted alternative
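As a scripted alternative to the visual editor, you can also run the flow with the Promptflow Python SDK from inside the container; this is a sketch assuming the `promptflow` package is installed there and the flow lives at `/app`:

```python
# Sketch only: run the flow via the Promptflow SDK instead of the
# VSCode visual editor. Assumes the promptflow package is installed.
from promptflow import PFClient

pf = PFClient()

# /app is where flow.dag.yaml lives in the promptflow container (step 5)
result = pf.test(flow="/app")
print(result)
```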

# Deployment

@@ -193,4 +209,4 @@ Note:

:warning: *This is very much a work in progress, deployment will be automated with fewer compose files soon*

You will need to set key environment variables; see your local `.env` for examples.
165 changes: 106 additions & 59 deletions README.md
@@ -41,18 +41,91 @@ This repo contains a docker-compose environment that will run the following components:

# Quick start

1. Copy `.env.example` to `.env` and set variables according to instructions in the file. Most variables can be left as-is, but at a minimum you will need to set variables in these sections (see `.env.example` for instructions on how to set them):
- API Settings - Needed for ingesting data from data sources
- Recipes AI Settings - Set to your LLM deployment accordingly
- Assistant Settings - Set to your LLM deployment accordingly
2. `cd data && python3 download_demo_data.py && cd ..`
3. `docker compose up -d --build`
4. `docker compose exec chat python create_update_assistant.py`
5. Update `.env` file and set ASSISTANTS_ID to the value returned from the previous step
6. `docker compose up -d`
7. Go to [http://localhost:8000/](http://localhost:8000/)
1. Install Docker if you don't have it already; see [here](https://www.docker.com/products/docker-desktop/)
2. Check out the Data Recipes AI GitHub repo

Go to the [repo](https://github.com/datakind/data-recipes-ai) on GitHub and click the big green '<> Code' button. This provides a few options: you can download a zip file, or check the code out with git. If you have Git installed, a common method would be ...

`git clone https://github.com/datakind/data-recipes-ai.git`

3. Populate your `.env` file with important settings to get started

First, copy `.env.example` in your repo to `.env` in the same location, then adjust the following variables.

If using **Azure OpenAI**, you will need to set these in your `.env` ...

```
RECIPES_OPENAI_API_TYPE=azure
RECIPES_OPENAI_API_KEY=<The API key>
RECIPES_OPENAI_API_ENDPOINT=<eg https://<YOUR DEPLOYMENT NAME>.openai.azure.com/>
RECIPES_OPENAI_API_VERSION=<The API version in your deployment, eg 2024-02-15-preview>
RECIPES_MODEL=<The deployment name you created in Azure, eg gpt-4o>
ASSISTANTS_API_TYPE=azure
ASSISTANTS_API_KEY=<API Key as found on the Azure OpenAI resource>
ASSISTANTS_ID=<ID of the assistant you created in OpenAI. Leave blank if you do not have one yet>
ASSISTANTS_BASE_URL=<eg https://<YOUR DEPLOYMENT NAME>.openai.azure.com/>
ASSISTANTS_API_VERSION=2024-02-15-preview
ASSISTANTS_MODEL=<The deployment name of the model you created in Azure which the assistant uses, eg gpt-4o>
ASSISTANTS_BOT_NAME=<Your assistant name, eg "Humanitarian AI Assistant">
```
Note: In the Azure Playground, you can view code for your assistant, which provides most of the variables above.
If using **OpenAI directly**, you will instead need to set these ...
```
RECIPES_OPENAI_API_TYPE=openai
RECIPES_OPENAI_API_KEY=<The API key you created on OpenAI>
RECIPES_MODEL=<model name, we recommend gpt-4o>
RECIPES_OPENAI_TEXT_COMPLETION_DEPLOYMENT_NAME=text-embedding-ada-002
ASSISTANTS_API_TYPE=openai
OPENAI_API_KEY=<The API key you created on OpenAI>
ASSISTANTS_API_KEY=${OPENAI_API_KEY}
ASSISTANTS_ID=<ID of the assistant you created in OpenAI. Leave blank if you do not have one yet>
ASSISTANTS_MODEL=<The model your assistant uses>
ASSISTANTS_BOT_NAME=<Your assistant name, eg "Humanitarian AI Assistant">
```
Not needed for quick start, but if you want to run ingestion of data with the new HDX API, then you will need to set ...
`HAPI_API_TOKEN=<See https://hdx-hapi.readthedocs.io/en/latest/getting-started/>`
4. Download sample Humanitarian Data Exchange (HDX) API data
For a quick start, we have prepared a sample dataset extracted from the new [HDX API](https://hdx-hapi.readthedocs.io/en/latest/). You can also run the ingestion yourself (see below), but this demo file should get you started quickly.
From [this Google folder](https://drive.google.com/drive/folders/1E4G9HM-QzxdXVNkgP3fQXsuNcABWzdus?usp=sharing), download the file starting with 'datadb' and save it into the 'data' folder of your repo.
Note: If you use Python, you can also download this file by running the following from your checked-out repo's top directory: `pip3 install gdown && cd data && python3 download_demo_data.py && cd ..`
5. Start your environment
`docker compose up -d --build`
6. If you don't have one already, create an AI Assistant on OpenAI (or Azure OpenAI)
Data Recipes AI uses OpenAI-style assistants, which support running code and searching user-supplied data. We have provided a script that does everything for you (a minimal sketch of what it does appears after this list).
In a terminal, navigate to the repo top folder and run `docker compose exec chat python create_update_assistant.py`
Make a note of the assistant ID, then edit your `.env` file and set the variable `ASSISTANTS_ID` to it.
Note: (i) If you rerun `create_update_assistant.py` once `ASSISTANTS_ID` is set, the script will update the assistant rather than create a new one; (ii) You can also add your own data, pdf, docx, csv, xlsx files for the assistant to use, see section 'Adding your own files for the assistant to analyze' below.
7. Restart so the assistant ID is set: `docker compose up -d`
8. Go to [http://localhost:8000/](http://localhost:8000/) and sign-in using the values in your `.env` file for `USER_LOGIN` and `USER_PASSWORD`
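For reference, here is a minimal sketch of what the assistant-creation step (6) does with the OpenAI Python SDK; the real logic lives in `create_update_assistant.py`, and the tool choices below are illustrative assumptions:

```python
# Minimal sketch only - create_update_assistant.py is the real script.
import os

from openai import OpenAI

client = OpenAI(api_key=os.environ["ASSISTANTS_API_KEY"])
assistant = client.beta.assistants.create(
    name=os.environ.get("ASSISTANTS_BOT_NAME", "Humanitarian AI Assistant"),
    model=os.environ.get("ASSISTANTS_MODEL", "gpt-4o"),
    # Tool choices here are assumptions for illustration
    tools=[{"type": "code_interpreter"}, {"type": "file_search"}],
)
print(assistant.id)  # set this value as ASSISTANTS_ID in your .env
```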
## Stopping/Starting the environment
The steps above are mostly one-time. Going forward you only need to stop and start the environment as follows:
- To stop the environment `docker compose stop`
- To start the environment `docker compose up -d`, then go to [http://localhost:8000/](http://localhost:8000/)
- To start with rebuild `docker compose up -d --build` (for more details about development, see [CONTRIBUTION](CONTRIBUTION.md))
## Using Recipes
@@ -63,7 +136,7 @@ We are in a phase of research to identify and improve recipes, but for now the s
### Adding your own files for the assistant to analyze
The assistant can be configured to analyze your own files, either in searching them or using them when analyzing data on-the-fly. To add your won files, place them in one of the following folders:
The assistant can be configured to analyze your own files, either in searching them or using them when analyzing data on-the-fly. To add your own files, place them in one of the following folders:
`./assistants/chat_ui/files/file_search/custom` : The assistant will search these files
`./assistants/chat_ui/files/code_interpreter/custom` : The assistant can use these files when generating and running code
@@ -100,33 +173,15 @@ Then run ingestion in download only mode ...
5. `python ingest.py --skip_processing --skip_uploading`

# To start the environment

You can also access the recipes server monitoring endpoint:

- Recipes server: [http://localhost:4001/](http://localhost:4001/)
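A quick way to confirm the endpoint responds, as a sketch using the `requests` package:

```python
# Sketch: confirm the recipes server monitoring endpoint responds.
import requests

resp = requests.get("http://localhost:4001/", timeout=10)
print(resp.status_code)
```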

## Resetting your environment

If running locally, you can reset your environment (removing all database data, which means you will need to re-register) by running `./cleanup.sh`.

# Development

## Managing recipes
# Managing recipes
The management of recipes is part of the human-in-the-loop approach of this repo. New recipes are created in status 'pending' and only get marked as approved once they have been verified by a recipe manager. Recipe managers can 'check out' recipes from the database into a local development environment such as VS Code to run, debug, and edit the recipes, before checking them back in. To make this process platform-independent, recipes are checked out into a docker container, which can be used as the runtime environment to run the recipes via VSCode.
Recipes are managed using the recipes Command Line Interface (CLI), which allows you to check out recipes, run and refine, the commit them back to the recipes database for use in data recipes AI.

To run the cli, you will need to install some packages ...

`pip3 install typer`
Recipes are managed using the recipes Command Line Interface (CLI), which allows you to check out recipes, run and refine with LLM assistance, then commit them back to the recipes database for use in data recipes AI.
Once this is done, and you have your docker environment running as described above, you start the recipes CLI with ...
To run the CLI, you will need to start the docker environment as described in the 'Quick Start', then
`cd management`
`python cli.py`
`docker compose exec -it manager python cli.py`
When you first log in, you will be asked for your name. This is used when checking in recipes. Once in, you will be presented with a menu like this ...
@@ -135,46 +190,38 @@ When you first log in, you will be asked for your name. This is used when checki
```
Welcome to the recipes management CLI, matt!

Here are the commands you can run:
'checkout': Check out recipes for you to work on
'list': List all recipes that are checked out
'run': Run a recipe, you will be prompted to choose which one
'add': Add a new recipe
'delete': Delete a recipe, you will be prompted to choose which one
'checkin': Check in recipes you have completed
'makemem': Create a memory using recipe sample output
'help': Show a list of commands
'quit': Exit this recipes CLI

'checkout': Check out recipes for you to work on
'list': List all recipes that are checked out
'run': Run a recipe, you'll be prompted, or use 'run 5' to run 5.
'add': Add a new recipe (using LLM)
'edit': Edit a recipe (using LLM). You'll be prompted, or use 'edit 5' to edit 5.
'delete': Delete a recipe, you will be prompted to choose which one
'checkin': Check in recipes you have completed
'makemem': Create a memory using recipe sample output
'rebuild': Removes database data, runs all local recipes and checks them in
'dumpdb': Dump embedding, recipe and memory tables to DB upgrade files so included in build
'help': Show a list of commands
'quit': Exit this recipes CLI

Chat with Data Mode:

'chat': Enter data chat mode to ask questions about the data

Type one of the commands above to do some stuff.


>>
```
The first thing you will want to do is run 'checkout' to get all the recipe code from the database onto your computer so you can run them. Once you have them locally, you can edit them in tools like Visual Studio code.
The first thing you will want to do is run 'checkout' to get all the recipe code from the database onto your computer so you can run the recipes. Once you have them locally, you can edit them in tools like Visual Studio Code. They will appear in the folder `./management/work`.
To run recipes locally you can use the CLI 'run' command. This will run the recipe in the same environment and will save results such as sample outputs, so they can be published back to the database.
You can create new recipes by entering 'add', where you'll be prompted for an intent. This will call an LLM to generate a first pass at your recipe, using the data that's in the data recipes environment.
When ready, you can check in your new and edited recipes with 'checkin'.
### Other approaches

You can also configure VS Code to connect to the recipe-manage container for running recipes ...

1. Install the DevContainers VSCode extension
2. Build data recipes using the `docker compose` command mentioned above
3. Open the command palette in VSCode (CMD + Shift + P on Mac; CTRL + Shift + P on Windows) and select

`Dev Containers: Attach to remote container`.

Select the recipe-manager container. This opens a new VSCode window; use it for the next steps.
4. Open folder `/app`
5. Navigate to your recipe in sub-folder `checked_out`
6. Run `recipe.py` in a terminal, or set up the docker interpreter

# Autogen Studio and autogen agent teams for creating data recipes
![alt text](./assets/autogen-studio-recipes.png)
