Feat/pf tests v2 #60

Merged 20 commits on Jun 29, 2024

Commits
81ef861
Added promptflow standard build
dividor Jun 25, 2024
d43373f
Fixed container name
dividor Jun 26, 2024
66fa688
Interim checkin to make main loop simpler, in prep for self-tests
dividor Jun 27, 2024
f040277
Interim checkin to make main loop simpler, in prep for self-tests
dividor Jun 27, 2024
e47a0e8
Mock test harness, with Mock chainlit so we can use UI code for promp…
dividor Jun 27, 2024
8302bc1
Mock test harness, with Mock chainlit so we can use UI code for promp…
dividor Jun 27, 2024
7260074
Mock test harness, with Mock chainlit so we can use UI code for promp…
dividor Jun 27, 2024
9b97c48
Interim commit, still having thread management issues due to async ch…
dividor Jun 27, 2024
5f82705
Promptflow works partially
dividor Jun 27, 2024
177d350
Implemented workaround for async hanging thread, to call script and k…
dividor Jun 28, 2024
c423f14
Implemented workaround for async hanging thread, to call script and k…
dividor Jun 28, 2024
4c2a457
Implemented workaround for async hanging thread, to call script and k…
dividor Jun 28, 2024
ac99815
Implemented workaround for async hanging thread, to call script and k…
dividor Jun 28, 2024
6fdb85f
Added Promptflow to docker build as a dev option, ie not part of prod
dividor Jun 28, 2024
248ec0f
Adjusted AI judge prompt as part of creating unit tests. We will refi…
dividor Jun 28, 2024
009de78
Fixed bug to back-populate assistant history when in test mode
dividor Jun 28, 2024
f07f853
Fixed bug to back-populate assistant history when in test mode
dividor Jun 28, 2024
5218ed8
Had to add a dockerbuild to be able to install and mock chainlit
dividor Jun 28, 2024
56b3de3
Fixed bug to back-populate assistant history when in test mode
dividor Jun 28, 2024
4405eb0
Fixed bug to back-populate assistant history when in test mode
dividor Jun 28, 2024
23 changes: 23 additions & 0 deletions README.md
@@ -199,6 +199,29 @@ To activate:
6. Go to playground and start a new session, select the 'Recipes data Analysis' workflow
7. Ask 'What is the total population of Mali?'

# Evaluation with Prompt Flow

First, you will need to build the environment to include Prompt Flow ...

`docker compose -f docker-compose.yml -f docker-compose-dev.yml up -d --build`

Then ...

1. Install the DevContainers VSCode extension
2. Build data recipes using the `docker compose` command mentioned above
3. Open the command palette in VSCode (CMD + Shift + P on Mac; CTRL + Shift + P on Windows) and select

`Dev Containers: Attach to remote container`.

Select the promptflow container. This opens a new VSCode window; use it for the next steps.
4. Install the Prompt Flow extension in the new VSCode window
5. Open folder `/app`
6. Click on `flow.dag.yaml`
7. At the top left of the main pane, click 'Visual editor'
8. At the bottom left, under connections, configure an Azure OpenAI connection called 'azure_openai' (see the sketch after this list)
9. On the Groundedness node, select your new connection
10. You can now run the flow by clicking the play icon. See the Prompt Flow documentation for more details
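
For step 8, the connection can also be created programmatically rather than through the visual editor. Below is a minimal sketch using the Prompt Flow Python SDK; the key and endpoint are placeholders you must supply, and it assumes a promptflow release that exposes `PFClient` and `AzureOpenAIConnection`:

```python
# Minimal sketch: register an Azure OpenAI connection named 'azure_openai'.
# Placeholder values (<...>) are assumptions -- replace with your own.
from promptflow import PFClient
from promptflow.entities import AzureOpenAIConnection

pf = PFClient()
connection = AzureOpenAIConnection(
    name="azure_openai",
    api_key="<your-api-key>",
    api_base="https://<your-resource>.openai.azure.com/",
    api_type="azure",
)
pf.connections.create_or_update(connection)
print(pf.connections.get("azure_openai"))  # confirm it was stored
```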

# Deployment

We will add more details here soon; for now, here are some notes on Azure ...
21 changes: 21 additions & 0 deletions docker-compose-dev.yml
@@ -0,0 +1,21 @@
#version: "3.4"

services:
promptflow:
#image: mcr.microsoft.com/azureml/promptflow/promptflow-runtime-stable:latest
build:
context: .
dockerfile: ./flows/chainlit-ui-evaluation/Dockerfile
container_name: recipes-ai-promptflow
env_file:
- .env
volumes:
- ./flows:/app
- ./utils:/app/chainlit-ui-evaluation/utils
- ./templates:/app/chainlit-ui-evaluation/templates
- shared-data:/app/chainlit-ui-evaluation/recipes/public
- ./management/skills.py:/app/chainlit-ui-evaluation/recipes/skills.py
- ./ui/chat-chainlit-assistant/app.py:/app/chainlit-ui-evaluation/app.py
volumes:
pgdata2:
shared-data:
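
A note for orientation (not part of the diff): once built with the compose command in the README, the container should appear as `recipes-ai-promptflow` in `docker ps`, and `docker exec -it recipes-ai-promptflow bash` should open a shell inside it as an alternative to attaching VSCode.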
1 change: 0 additions & 1 deletion docker-compose.yml
@@ -140,7 +140,6 @@ services:
- ./utils:/app/utils
- ./templates:/app/templates
- ./db/recipedb:/app/db

volumes:
pgdata2:
shared-data:
6 changes: 6 additions & 0 deletions flows/chainlit-ui-evaluation/Dockerfile
@@ -0,0 +1,6 @@
FROM mcr.microsoft.com/azureml/promptflow/promptflow-runtime-stable:latest

# No need to copy the app code, we mount via docker-compose-dev.yml

RUN pip3 install --upgrade pip
RUN pip3 install chainlit==1.1.305
38 changes: 38 additions & 0 deletions flows/chainlit-ui-evaluation/aggregate_variant_results.py
@@ -0,0 +1,38 @@
from typing import List

import numpy as np
from promptflow import log_metric, tool


@tool
def aggregate_variants_results(results: List[dict]):
"""
Aggregate the results of multiple variants.

Args:
results (List[dict]): A list of dictionaries containing the results for each variant.

Returns:
dict: A dictionary containing the aggregated results, with the metric names as keys and the aggregated values as values.
"""
aggregate_results = {}
for result in results:
for name, value in result.items():
if name not in aggregate_results:
aggregate_results[name] = []
try:
float_val = float(value)
except Exception:
float_val = np.nan
aggregate_results[name].append(float_val)

for name, value in aggregate_results.items():
metric_name = name
aggregate_results[name] = np.nanmean(value)
if "pass_rate" in metric_name:
metric_name = metric_name + "(%)"
aggregate_results[name] = aggregate_results[name] * 100.0
aggregate_results[name] = round(aggregate_results[name], 2)
log_metric(metric_name, aggregate_results[name])

return aggregate_results
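
As a rough illustration (not part of the diff), calling the tool directly shows the aggregation behavior: each metric is averaged with `np.nanmean`, non-numeric values become NaN and are ignored, and any `pass_rate` metric is scaled to a percentage. The metric names below are made up for the example, and it assumes the `@tool`-decorated function can be invoked as a plain Python function (promptflow's `log_metric` may only warn outside a flow run):

```python
# Hypothetical direct call with illustrative metric names.
from aggregate_variant_results import aggregate_variants_results

results = [
    {"gpt_groundedness": "4", "pass_rate": "1"},
    {"gpt_groundedness": "5", "pass_rate": "0"},
    {"gpt_groundedness": "n/a", "pass_rate": "1"},  # non-numeric -> NaN, skipped by nanmean
]
print(aggregate_variants_results(results))
# {'gpt_groundedness': 4.5, 'pass_rate': 66.67}
```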
6 changes: 6 additions & 0 deletions flows/chainlit-ui-evaluation/azure_openai.yaml
@@ -0,0 +1,6 @@
$schema: https://azuremlschemas.azureedge.net/promptflow/latest/AzureOpenAIConnection.schema.json
name: open_ai_connection
type: azure_open_ai
api_key: "<user-input>"
api_base: "<user-input>"
api_type: "azure"
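
A usage note (not part of the diff): assuming the standard Prompt Flow CLI is available in the container, this template can be registered with `pf connection create -f azure_openai.yaml --set api_key=<key> api_base=<endpoint>`. Note that the template names the connection `open_ai_connection`, while the README steps reference one called `azure_openai`; whichever name is used must match the connection selected on the Groundedness node.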