Commit 4928df3

Merge branch 'main' into feat/selenium-e2e-3

dividor authored Jul 11, 2024
2 parents 704bba4 + 94cd770 commit 4928df3

Showing 3 changed files with 159 additions and 93 deletions.
51 changes: 27 additions & 24 deletions .env.example
@@ -27,17 +27,20 @@ RECIPE_DB_CONN_STRING=postgresql://${POSTGRES_RECIPE_USER}:${POSTGRES_RECIPE_PAS
#==================================================#
# Recipes AI Settings #
#==================================================#
# You can leave these as-is for quick start
# These control how recipes are retrieved and generated using LLMs.
#
# If you are using Azure OpenAI. Note: in the Azure Playground, you can 'View code' to get these values
#RECIPES_OPENAI_API_TYPE=azure
#RECIPES_OPENAI_API_KEY=
#RECIPES_OPENAI_API_ENDPOINT=
#RECIPES_OPENAI_API_ENDPOINT=<eg https://<YOUR DEPLOYMENT NAME>.openai.azure.com/>
#RECIPES_OPENAI_API_VERSION=2024-02-15-preview
#RECIPES_BASE_URL=
#RECIPES_MODEL=gpt-4-turbo
#RECIPES_MODEL=<The deployment name you created in Azure, eg gpt-4o>
#
# Leave these as-is for quick start
#RECIPES_OPENAI_TEXT_COMPLETION_DEPLOYMENT_NAME=text-embedding-ada-002
#RECIPES_BASE_URL=${RECIPES_OPENAI_API_ENDPOINT}

# gpt-4o only available on OpenAI
# OpenAI example
RECIPES_OPENAI_API_TYPE=openai
RECIPES_OPENAI_API_KEY=
RECIPES_MODEL=gpt-4o
@@ -61,33 +64,33 @@ IMAGE_HOST=http://localhost:3080/images
#==================================================#
# API Settings #
#==================================================#
# To get this, go to https://hapi.humdata.org/docs#/,
# select the encode_identifier endpoint, click the 'Try it out' button,
# enter a name and your email, and click send. The response will have your token.
# Note also, the URL for the API is set in ./ingestion/ingestion.config
# This token is just your encoded email address. To generate it, see the instructions here:
# https://hdx-hapi.readthedocs.io/en/latest/getting-started/
HAPI_API_TOKEN=

#==================================================#
# Assistant Settings #
#==================================================#
# Needed when updating an assistant; see assistants/openai_assistants. Leave blank to create a new one
# Parameters for the AI assistant used in the chat interface, to serve recipes and carry out
# on-the-fly analysis
#
# If you are using Azure OpenAI. Note: in the Azure Playground, you can 'View code' to get these values
#ASSISTANTS_API_TYPE=azure
#ASSISTANTS_API_KEY=
#ASSISTANTS_ID=
#ASSISTANTS_BASE_URL=
#ASSISTANTS_API_KEY=<API Key as found on the Azure OpenAI resource>
#ASSISTANTS_ID=<ID of the assistant you created in OpenAI. Leave blank if you do not have one yet>
#ASSISTANTS_BASE_URL=<eg https://<YOUR DEPLOYMENT NAME>.openai.azure.com/>
#ASSISTANTS_API_VERSION=2024-02-15-preview
#ASSISTANTS_MODEL=gpt4-o
#ASSISTANTS_BOT_NAME="Humanitarian AI Assistant"

#ASSISTANTS_MODEL=<The deployment name of the model you created in Azure which the assistant uses, eg gpt-4o>
#ASSISTANTS_BOT_NAME=<Your assistant name, eg "Humanitarian AI Assistant">

# OPENAI
OPENAI_API_KEY=
# If you are using OpenAI directly (ie not Azure)
ASSISTANTS_API_TYPE=openai
OPENAI_API_KEY=<The API key you created on OpenAI>
ASSISTANTS_API_KEY=${OPENAI_API_KEY}
ASSISTANTS_API_TYPE=openai
ASSISTANTS_ID=
ASSISTANTS_ID=<ID of the assistant you created in OpenAI. Leave blank if you do not have one yet>
ASSISTANTS_MODEL=<The model your assistant uses>
ASSISTANTS_BOT_NAME=<Your assistant name, eg "Humanitarian AI Assistant">
ASSISTANTS_BASE_URL=""
ASSISTANTS_MODEL=gpt-4o
ASSISTANTS_BOT_NAME="Humanitarian AI Assistant"

#==================================================#
# Deployments Settings #
Expand All @@ -112,11 +115,11 @@ CHAT_URL="http://chat:8000/"
#==================================================#
# Chainlit Settings #
#==================================================#
# Used with Literal.ai to get telemetry and voting, can be left blank for quick start.
# Used with Literal.ai to get telemetry and voting, can be left blank if running locally
LITERAL_API_KEY=

# Run "chainlit create-secret" to get this.
# WARNING!!!! You MUST run this to update the defaults below if deploying online
# WARNING!!!! These are test values, OK for a quick start. Do not deploy online with these as-is; regenerate them
CHAINLIT_AUTH_SECRET="1R_FKRaiv0~5bqoQurBx34ctOD8kM%a=YvIx~fVmYLVd>B5vWa>e9rDX?6%^iCOv"
USER_LOGIN=muppet-data-chef
USER_PASSWORD=hB%1b36!!8-v
36 changes: 26 additions & 10 deletions CONTRIBUTION.md
@@ -24,6 +24,10 @@ in the paragraphs below.

The easiest way to develop is to run in the Docker environment; see the [README](./README.md) for more details.

### Resetting your environment

If running locally, you can reset your environment (removing all database data, which means you will need to re-register) by running `./cleanup.sh`.

## Code quality tests

The repo has been set up with black and flake8 pre-commit hooks. These are configured in the `.pre-commit-config.yaml` file and initialized with `pre-commit autoupdate`.
@@ -32,6 +36,8 @@ On a new repo, you must run `pre-commit install` to add pre-commit hooks.

To run code quality tests, you can run `pre-commit run --all-files`

GitHub has an action to run the pre-commit tests to ensure code adheres to standards. See the `.github/workflows` folder for more details.

## Tests

### Unit tests
@@ -49,10 +55,6 @@ the desired feature.

You can use `pytest` to run your tests, no matter which type of test it is.
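For illustration, a minimal unit test might look like the sketch below; the file and function names are hypothetical, not from the repo:

```python
# tests/test_example.py - illustrative sketch only; names are hypothetical
def get_greeting(name: str) -> str:
    return f"Hello, {name}!"


def test_get_greeting():
    assert get_greeting("recipes") == "Hello, recipes!"
```

Running `pytest tests/test_example.py` would then collect and run `test_get_greeting`.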

### Code quality tests

GitHub has an action to run the pre-commit tests to ensure code adheres to standards. See the `.github/workflows` folder for more details.

### End-to-end tests (using Selenium and Promptflow)

End-to-end tests are configured in GitHub Actions and use Promptflow to call a wrapper around the chainlit UI, in order to test when memories/recipes are used as well as when the assistant does some on-the-fly analysis. To do this, the chainlit class is patched heavily, and there are limits to how cleanly this could be done, so it isn't an exact replica of the true application, but it does capture changes to the flow as well as testing the assistant directly. The main body of integration tests tests the recipes server and the assistant independently.
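For orientation, a browser-level check of this kind might look like the following minimal sketch, assuming Chrome, the local chat UI from the quick start, and chainlit's password login; the selectors are illustrative assumptions, not the repo's actual test code:

```python
# Illustrative sketch only - not the repo's actual Selenium tests.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

options = webdriver.ChromeOptions()
options.add_argument("--headless=new")
driver = webdriver.Chrome(options=options)
try:
    driver.get("http://localhost:8000/")
    wait = WebDriverWait(driver, 30)
    # Field names below are assumed for chainlit's password login form
    wait.until(EC.presence_of_element_located((By.NAME, "email")))
    driver.find_element(By.NAME, "email").send_keys("muppet-data-chef")
    driver.find_element(By.NAME, "password").send_keys("hB%1b36!!8-v")
    driver.find_element(By.XPATH, "//button[@type='submit']").click()
finally:
    driver.quit()
```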
@@ -163,13 +165,27 @@ To download demo data ...
2. `cd data && python3 download_demo_data.py && cd ..`
3. `docker compose start datadb`

## Misc.
# Evaluation with Prompt Flow

### Testing connection to actions server
First, you will need to build the environment to include Prompt Flow ...

`docker compose -f docker-compose.yml -f docker-compose-dev.yml up -d --build`

1. `docker exec -it haa-libre-chat /bin/sh`
2. To test the SQL query action, run `curl -X POST -H "Content-Type: application/json" -d '{"query": "select 1"}' "http://actions:8080/api/actions/postgresql-universal-actions/execute-query/run"`
3. To test the get-memory action, run `curl -X POST -H "Content-Type: application/json" -d '{"chat_history": "[]", "user_input":"population of Mali", "generate_intent":"true"}' "http://actions:8080/api/actions/get-data-recipe-memory/get-memory-recipe/run"`
Then ...

1. Install the DevContainers VSCode extension
2. Build data recipes using the `docker compose` command mentioned above
3. Open the command palette in VSCode (CMD + Shift + P on Mac; CTRL + Shift + P on Windows) and select

`Dev Containers: Attach to remote container`.

Select the promptflow container. This opens a new VSCode window; use it for the next steps.
4. Install the Promptflow extension
5. Open folder `/app`
6. Click on `flow.dag.yaml`
7. At the top left of the main pane, click 'Visual editor'
8. On the Groundedness node, select your new connection
9. You can now run the flow by clicking the play icon. See the Promptflow documentation for more details, or see the sketch below for a scripted alternative
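As a scripted alternative to the visual editor, you can also run the flow with the Promptflow Python SDK from inside the container; this is a sketch assuming the `promptflow` package is installed there and the flow lives at `/app`:

```python
# Sketch only: run the flow via the Promptflow SDK instead of the
# VSCode visual editor. Assumes the promptflow package is installed.
from promptflow import PFClient

pf = PFClient()

# /app is where flow.dag.yaml lives in the promptflow container (step 5)
result = pf.test(flow="/app")
print(result)
```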

# Deployment

@@ -193,4 +209,4 @@ Note:

:warning: *This is very much a work in progress, deployment will be automated with fewer compose files soon*

You will need to set key environment variables; see your local `.env` for examples.
165 changes: 106 additions & 59 deletions README.md
@@ -41,18 +41,91 @@ This repo contains a docker-compose environment that will run the following components:

# Quick start

1. Copy `.env.example` to `.env` and set variables according to instructions in the file. Most variables can be left as-is, but at a minimum you will need to set variables in these sections (see `.env.example` for instructions on how to set them):
- API Settings - Needed for ingesting data from data sources
- Recipes AI Settings - Set to your LLM deployment accordingly
- Assistant Settings - Set to your LLM deployment accordingly
2. `cd data && python3 download_demo_data.py && cd ..`
3. `docker compose up -d --build`
4. `docker compose exec chat python create_update_assistant.py`
5. Update `.env` file and set ASSISTANTS_ID to the value returned from the previous step
6. `docker compose up -d`
7. Go to [http://localhost:8000/](http://localhost:8000/)
1. Install Docker if you don't have it already; see [here](https://www.docker.com/products/docker-desktop/)
2. Check out the Data Recipes AI GitHub repo

Go to the [repo](https://github.com/datakind/data-recipes-ai) on GitHub and click the big green '<> Code' button. This provides a few options: you can download a zip file, or check the code out with git. If you have Git installed, a common method would be ...

`git clone https://github.com/datakind/data-recipes-ai.git`

3. Populate your `.env` file with important settings to get started

First, copy `.env.example` in your repo to `.env` in the same location, then adjust the following variables.

If using **Azure OpenAI**, you will need to set these in your `.env` ...

```
RECIPES_OPENAI_API_TYPE=azure
RECIPES_OPENAI_API_KEY=<The API key>
RECIPES_OPENAI_API_ENDPOINT=<eg https://<YOUR DEPLOYMENT NAME>.openai.azure.com/>
RECIPES_OPENAI_API_VERSION=<The API version in your deployment, eg 2024-02-15-preview>
RECIPES_MODEL=<The deployment name you created in Azure, eg gpt-4o>
ASSISTANTS_API_TYPE=azure
ASSISTANTS_API_KEY=<API Key as found on the Azure OpenAI resource>
ASSISTANTS_ID=<ID of the assistant you created in OpenAI. Leave blank if you do not have one yet>
ASSISTANTS_BASE_URL=<eg https://<YOUR DEPLOYMENT NAME>.openai.azure.com/>
ASSISTANTS_API_VERSION=2024-02-15-preview
ASSISTANTS_MODEL=<The deployment name of the model you created in Azure which the assistant uses, eg gpt-4o>
ASSISTANTS_BOT_NAME=<Your assistant name, eg "Humanitarian AI Assistant">
```
Note: In the Azure Playground, you can view code for your assistant, which provides most of the variables above.
If using **OpenAI directly**, you will instead need to set these ...
```
RECIPES_OPENAI_API_TYPE=openai
RECIPES_OPENAI_API_KEY=<The API key you created on OpenAI>
RECIPES_MODEL=<model name, we recommend gpt-4o>
RECIPES_OPENAI_TEXT_COMPLETION_DEPLOYMENT_NAME=text-embedding-ada-002
ASSISTANTS_API_TYPE=openai
OPENAI_API_KEY=<The API key you created on OpenAI>
ASSISTANTS_API_KEY=${OPENAI_API_KEY}
ASSISTANTS_ID=<ID of the assistant you created in OpenAI. Leave blank if you do not have one yet>
ASSISTANTS_MODEL=<The model your assistant uses>
ASSISTANTS_BOT_NAME=<Your assistant name, eg "Humanitarian AI Assistant">
```
Not needed for quick start, but if you want to run ingestion of data with the new HDX API, then you will need to set ...
`HAPI_API_TOKEN=<See https://hdx-hapi.readthedocs.io/en/latest/getting-started/>`
4. Download sample Humanitarian Data Exchange (HDX) API data
For a quick start, we have prepared a sample dataset extracted from the new [HDX API](https://hdx-hapi.readthedocs.io/en/latest/). You can also run the ingestion yourself (see below), but this demo file should get you started quickly.
From [this Google folder](https://drive.google.com/drive/folders/1E4G9HM-QzxdXVNkgP3fQXsuNcABWzdus?usp=sharing), download the file starting with 'datadb' and save it into the 'data' folder of your repo.
Note: If you use Python, you can also download this file by running the following from your checked-out repo's top directory: `pip3 install gdown && cd data && python3 download_demo_data.py && cd ..`
5. Start your environment
`docker compose up -d --build`
6. If you don't have one already, create an AI Assistant on OpenAI (or Azure OpenAI)
Data Recipes AI uses OpenAI-style assistants, which support running code and searching user-supplied data. We have provided a script that does everything for you (a minimal sketch of what it does appears after this list).
In a terminal, navigate to the repo top folder and run `docker compose exec chat python create_update_assistant.py`
Make a note of the assistant ID, then edit your `.env` file and set the variable `ASSISTANTS_ID` to it.
Note: (i) If you rerun `create_update_assistant.py` once `ASSISTANTS_ID` is set, the script will update the assistant rather than create a new one; (ii) You can also add your own data, pdf, docx, csv, xlsx files for the assistant to use, see section 'Adding your own files for the assistant to analyze' below.
7. Restart so the assistant ID is set: `docker compose up -d`
8. Go to [http://localhost:8000/](http://localhost:8000/) and sign-in using the values in your `.env` file for `USER_LOGIN` and `USER_PASSWORD`
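For reference, here is a minimal sketch of what the assistant-creation step (6) does with the OpenAI Python SDK; the real logic lives in `create_update_assistant.py`, and the tool choices below are illustrative assumptions:

```python
# Minimal sketch only - create_update_assistant.py is the real script.
import os

from openai import OpenAI

client = OpenAI(api_key=os.environ["ASSISTANTS_API_KEY"])
assistant = client.beta.assistants.create(
    name=os.environ.get("ASSISTANTS_BOT_NAME", "Humanitarian AI Assistant"),
    model=os.environ.get("ASSISTANTS_MODEL", "gpt-4o"),
    # Tool choices here are assumptions for illustration
    tools=[{"type": "code_interpreter"}, {"type": "file_search"}],
)
print(assistant.id)  # set this value as ASSISTANTS_ID in your .env
```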
## Stopping/Starting the environment
The steps above are mostly one-time. Going forward you only need to stop and start the environment as follows:
- To stop the environment `docker compose stop`
- To start the environment `docker compose up -d`, then go to [http://localhost:8000/](http://localhost:8000/)
- To start with rebuild `docker compose up -d --build` (for more details about development, see [CONTRIBUTION](CONTRIBUTION.md))
## Using Recipes
@@ -63,7 +136,7 @@ We are in a phase of research to identify and improve recipes, but for now the s
### Adding your own files for the assistant to analyze
The assistant can be configured to analyze your own files, either in searching them or using them when analyzing data on-the-fly. To add your won files, place them in one of the following folders:
The assistant can be configured to analyze your own files, either in searching them or using them when analyzing data on-the-fly. To add your own files, place them in one of the following folders:
`./assistants/chat_ui/files/file_search/custom` : The assistant will search these files
`./assistants/chat_ui/files/code_interpreter/custom` : The assistant can use these files when generating and running code
@@ -100,33 +173,15 @@ Then run ingestion in download only mode ...
5. `python ingest.py --skip_processing --skip_uploading`

# To start the environment

You can also access the recipes server monitoring endpoint:

- Recipes server: [http://localhost:4001/](http://localhost:4001/)
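A quick way to confirm the endpoint responds, as a sketch using the `requests` package:

```python
# Sketch: confirm the recipes server monitoring endpoint responds.
import requests

resp = requests.get("http://localhost:4001/", timeout=10)
print(resp.status_code)
```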

## Resetting your environment

If running locally, you can reset your environment (removing all database data, which means you will need to re-register) by running `./cleanup.sh`.

# Development

## Managing recipes
# Managing recipes
The management of recipes is part of the human-in-the-loop approach of this repo. New recipes are created in status 'pending' and only get marked as approved once they have been verified by a recipe manager. Recipe managers can 'check out' recipes from the database into a local development environment such as VS Code to run, debug, and edit the recipes, before checking them back in. To make this process platform-independent, recipes are checked out into a docker container, which can be used as the runtime environment to run the recipes via VSCode.
Recipes are managed using the recipes Command Line Interface (CLI), which allows you to check out recipes, run and refine, the commit them back to the recipes database for use in data recipes AI.

To run the cli, you will need to install some packages ...

`pip3 install typer`
Recipes are managed using the recipes Command Line Interface (CLI), which allows you to check out recipes, run and refine with LLM assistance, then commit them back to the recipes database for use in data recipes AI.
Once this is done, and you have your docker environment running as described above, you start the recipes CLI with ...
To run the CLI, you will need to start the docker environment as described in the 'Quick Start', then
`cd management`
`python cli.py`
`docker compose exec -it manager python cli.py`
When you first log in, you will be asked for your name. This is used when checking in recipes. Once in, you will be presented with a menu like this ...
@@ -135,46 +190,38 @@ When you first log in, you will be asked for your name. This is used when checki
```
Welcome to the recipes management CLI, matt!

Here are the commands you can run:
'checkout': Check out recipes for you to work on
'list': List all recipes that are checked out
'run': Run a recipe, you will be prompted to choose which one
'add': Add a new recipe
'delete': Delete a recipe, you will be prompted to choose which one
'checkin': Check in recipes you have completed
'makemem': Create a memory using recipe sample output
'help': Show a list of commands
'quit': Exit this recipes CLI

'checkout': Check out recipes for you to work on
'list': List all recipes that are checked out
'run': Run a recipe, you'll be prompted, or use 'run 5' to run 5.
'add': Add a new recipe (using LLM)
'edit': Edit a recipe (using LLM). You'll be prompted, or use 'edit 5' to edit 5.
'delete': Delete a recipe, you will be prompted to choose which one
'checkin': Check in recipes you have completed
'makemem': Create a memory using recipe sample output
'rebuild': Removes database data, runs all local recipes and checks them in
'dumpdb': Dump embedding, recipe and memory tables to DB upgrade files so included in build
'help': Show a list of commands
'quit': Exit this recipes CLI

Chat with Data Mode:

'chat': Enter data chat mode to ask questions about the data

Type one of the commands above to do some stuff.


>>
```
The first thing you will want to do is run 'checkout' to get all the recipe code from the database onto your computer so you can run them. Once you have them locally, you can edit them in tools like Visual Studio code.
The first thing you will want to do is run 'checkout' to get all the recipe code from the database onto your computer so you can run the recipes. Once you have them locally, you can edit them in tools like Visual Studio Code. They will appear in the folder `./management/work`.
To run recipes locally you can use the CLI 'run' command. This will run the recipe in the same environment and will save results such as sample outputs, so they can be published back to the database.
You can create new recipes by entering 'add', where you'll be prompted for an intent. This will call an LLM to generate a first pass at your recipe, using the data that's in the data recipes environment.
When ready, you can check in your new and edited recipes with 'checkin'.
### Other approaches

You can also configure VS Code to connect to the recipe-manage container for running recipes ...

1. Install the DevContainers VSCode extension
2. Build data recipes using the `docker compose` command mentioned above
3. Open the command palette in VSCode (CMD + Shift + P on Mac; CTRL + Shift + P on Windows) and select

`Dev Containers: Attach to remote container`.

Select the recipe-manager container. This opens a new VSCode window; use it for the next steps.
4. Open folder `/app`
5. Navigate to your recipe in sub-folder `checked_out`
6. Run `recipe.py` in a terminal, or set up the docker interpreter

# Autogen Studio and autogen agent teams for creating data recipes
![alt text](./assets/autogen-studio-recipes.png)
