Feat/pf - End-to-End Github actions tests #61

dividor · 2024-06-29T19:56:10Z

Reopened to respond to reviewer and add GH actions

…er, less steps, also positions for GH actions

… actions

dividor · 2024-06-29T21:44:20Z

Added extras ...

Batch test file data.jsonl (this will change to use Jan's work), with two simple tests for Promptflow
GitHub action to build the environment, run the promptflow batch, and check the output

dividor · 2024-07-03T15:20:17Z

This PR completes the first draft of the end-to-end tests. See below for a summary from CONTRIBUTION.md.

Note, it also includes a new demo data zip file, a script to download it, and documentation, so that assistant analysis tests can be run.

End-to-end tests

End-to-end tests have been configured in GitHub Actions. They use promptflow to call a wrapper around the chainlit UI, in order to test cases where memories/recipes are used as well as cases where the assistant does some on-the-fly analysis. To do this, the chainlit class is patched heavily, and there are limitations in how cleanly this could be done, so it isn't an exact replica of the true application, but it does capture changes to the flow as well as test the assistant directly. The main body of integration tests will test the recipes server and the assistant independently.
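
As a rough illustration of the patching idea described above, here is a minimal sketch. It deliberately does not use chainlit: `FakeMessage` and `assistant_reply` are hypothetical stand-ins, and the real wrapper in flows/chainlit-ui-evaluation patches considerably more than this. The sketch only shows the general technique of replacing a UI "send" method so that output can be captured without a running UI session.

```python
import asyncio
from unittest import mock


class FakeMessage:
    """Hypothetical stand-in for a UI message class; NOT the real chainlit API."""

    def __init__(self, content: str):
        self.content = content

    async def send(self):
        # In a test run there is no browser session to send to.
        raise RuntimeError("no UI session available in tests")


async def assistant_reply(question: str) -> None:
    """Hypothetical application code that sends its answer to the UI."""
    await FakeMessage(content=f"Answer to: {question}").send()


def run_with_patched_ui(question: str) -> list[str]:
    """Run the app code with send() patched to record output instead of sending it."""
    captured: list[str] = []

    async def record(self):
        captured.append(self.content)

    # Patch the method on the class so every instance created by the app uses it.
    with mock.patch.object(FakeMessage, "send", record):
        asyncio.run(assistant_reply(question))
    return captured
```

Calling `run_with_patched_ui("hi")` returns the captured messages as a list, which a test can then compare against expected output or pass to an evaluator.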

Additionally, there were some limitations when implementing in GitHub Actions, where workarounds were implemented until a later date, namely: promptflow is run on the GitHub Actions host rather than in docker, and the promptflow wrapper that calls chainlit has to run as a script, which is then killed based on a STDOUT string. These should be fixed in future.
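
The "kill the script based on a STDOUT string" workaround can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: the function name `run_until_marker` and the marker text are hypothetical, and a production version would likely also want a timeout in case the marker never appears.

```python
import subprocess
import sys


def run_until_marker(cmd: list[str], marker: str) -> list[str]:
    """Run cmd, collect its stdout lines, and kill it once marker appears."""
    lines: list[str] = []
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    try:
        for line in proc.stdout:
            lines.append(line.rstrip("\n"))
            if marker in line:
                break  # the output we need has arrived; stop reading
    finally:
        # The child may never exit on its own, so terminate it explicitly.
        proc.kill()
        proc.wait()
        proc.stdout.close()
    return lines


if __name__ == "__main__":
    # Hypothetical child that prints a result and then hangs, like a UI loop.
    child = "import time; print('working'); print('OUTPUT_DONE'); time.sleep(60)"
    print(run_until_marker([sys.executable, "-u", "-c", child], "OUTPUT_DONE"))
```

The child is started with `-u` so its output is unbuffered; without that, the marker line might sit in the child's buffer and never reach the parent.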

Code for e2e tests can be found in flows/chainlit-ui-evaluation as run by .github/workflows/e2e_tests.yml

The tests work using promptflow evaluation and a call to an LLM to gauge groundedness, because LLM assistants can produce slightly different results when not providing answers from memory/recipes. The promptflow evaluation test data can be found in flows/chainlit-ui-evaluation/data.jsonl.

A useful way to test a new scenario and to get the 'expected' output for data.jsonl is to add it to call_assistant_debug.py.
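
For anyone unfamiliar with the JSONL format, a small helper like the one below could be used to add a new case once the 'expected' output is known. This is only a sketch: the field names used in the usage note (`query`, `expected_output`) are hypothetical, so check the existing entries in data.jsonl for the actual schema before adding cases.

```python
import json
from pathlib import Path


def append_test_case(path: Path, case: dict) -> None:
    """Append one test case to a JSONL file (one JSON object per line)."""
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(case, ensure_ascii=False) + "\n")


def load_test_cases(path: Path) -> list[dict]:
    """Read all test cases from a JSONL file, skipping blank lines."""
    cases: list[dict] = []
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                cases.append(json.loads(line))
    return cases
```

For example, `append_test_case(Path("data.jsonl"), {"query": "...", "expected_output": "..."})` adds a single line to the file, and `load_test_cases` can be used to confirm that every line still parses as JSON.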

TODO, future work:

Add promptflow to docker-compose-github.yml and update the action to use this env (time was short and it wasn't working). This will reduce overhead and complexity
Figure out how to make call_assistant.py exit the async loop so it doesn't have to run in a wrapper that then kills the process
Push docker containers to a registry so the flow doesn't run a build every time
Bug the chainlit folks to see if they can do something more formal around testing, to avoid complex monkey patching
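
On the second item, one possible direction is to cancel whatever background task keeps the event loop alive once the answer has been obtained. The sketch below is an assumption about the shape of the problem, not a diagnosis of call_assistant.py: `background_ui_loop` and `call_assistant` are hypothetical stand-ins.

```python
import asyncio
import contextlib


async def background_ui_loop() -> None:
    """Hypothetical stand-in for a long-running task that never returns."""
    while True:
        await asyncio.sleep(0.01)


async def call_assistant(question: str) -> str:
    """Hypothetical stand-in for the call that produces the answer."""
    await asyncio.sleep(0.05)
    return f"Answer to: {question}"


async def main(question: str) -> str:
    ui_task = asyncio.create_task(background_ui_loop())
    try:
        return await call_assistant(question)
    finally:
        # Cancel the background task so asyncio.run() can return cleanly.
        ui_task.cancel()
        with contextlib.suppress(asyncio.CancelledError):
            await ui_task


if __name__ == "__main__":
    print(asyncio.run(main("hi")))
```

If the real script can be restructured along these lines, it would return on its own and the wrapper that watches STDOUT would no longer be needed.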

JanPeterDatakind

Approving this as the first iteration of promptflow testing/evaluation

dividor added 13 commits June 29, 2024 15:24

Added promptflow connection creates to dockerbuild to make setup easi…

6a4133d

…er, less steps, also positions for GH actions

Added basic data.jsonl batch test file to use for setting up PF in GH…

eeb44a9

… actions

Added e2e tests GH action, with docker compose build

bc74ffb

Added e2e tests GH action, with docker compose build

b17e6d4

Added e2e tests GH action, with docker compose build

fb5753e

Added e2e tests GH action, with docker compose build

334c69c

Added e2e tests GH action, with docker compose build

3b37ef2

Added e2e tests GH action, with docker compose build

c1b3db2

Added e2e tests GH action, with docker compose build

c568abb

Added e2e tests GH action, with docker compose build

1d85a3c

Added e2e tests GH action, with docker compose build

79d130e

Added e2e tests GH action, with docker compose build

9cea28e

Added e2e tests GH action, with docker compose build

c104c98

dividor added 6 commits June 29, 2024 17:51

Added e2e tests GH action, with docker compose build

912a554

Added e2e tests GH action, with docker compose build

e8d3ece

Added e2e tests GH action, with docker compose build

4787ca1

Added e2e tests GH action, with docker compose build

2c8687d

Added e2e tests GH action, with docker compose build

94cd8ba

Added e2e tests GH action, with docker compose build

0536234

dividor had a problem deploying to GitHub Actions 1 June 29, 2024 22:08 — with GitHub Actions Failure

Added e2e tests GH action, with docker compose build

21f897a

dividor had a problem deploying to GitHub Actions 1 June 29, 2024 22:12 — with GitHub Actions Failure

Added e2e tests GH action, with docker compose build

3584f18

dividor had a problem deploying to GitHub Actions 1 June 29, 2024 22:17 — with GitHub Actions Failure

Added e2e tests GH action, with docker compose build

d571931

dividor had a problem deploying to GitHub Actions 1 June 29, 2024 22:18 — with GitHub Actions Failure

Added e2e tests GH action, with docker compose build

b8ee350

dividor had a problem deploying to GitHub Actions 1 June 29, 2024 22:19 — with GitHub Actions Failure

Added e2e tests GH action, with docker compose build

28540a1

Simple script to download and install datadb demo data

08a60c8

dividor temporarily deployed to GitHub Actions 1 July 3, 2024 15:08 — with GitHub Actions Inactive

Simple script to download and install datadb demo data

49fb898

dividor temporarily deployed to GitHub Actions 1 July 3, 2024 15:10 — with GitHub Actions Inactive

Simple script to download and install datadb demo data

d46a20b

dividor temporarily deployed to GitHub Actions 1 July 3, 2024 15:19 — with GitHub Actions Inactive

Simple script to download and install datadb demo data

d5d4ccb

dividor temporarily deployed to GitHub Actions 1 July 3, 2024 15:20 — with GitHub Actions Inactive

dividor requested a review from JanPeterDatakind July 3, 2024 15:26

dividor changed the title ~~Feat/pf tests v2 - Reopened to respond to reviewer and add GH actions~~ Feat/pf - End-to-End Github actions tests Jul 3, 2024

Simple script to download and install datadb demo data

ed74d67

dividor temporarily deployed to GitHub Actions 1 July 3, 2024 15:51 — with GitHub Actions Inactive

dividor added 2 commits July 3, 2024 11:54

Simple script to download and install datadb demo data

60b400e

Simple script to download and install datadb demo data

bab971f

dividor temporarily deployed to GitHub Actions 1 July 3, 2024 15:58 — with GitHub Actions Inactive

Simple script to download and install datadb demo data

77c21fc

dividor temporarily deployed to GitHub Actions 1 July 3, 2024 15:59 — with GitHub Actions Inactive

Simple script to download and install datadb demo data

1bfe54d

dividor had a problem deploying to GitHub Actions 1 July 3, 2024 16:01 — with GitHub Actions Failure

Simple script to download and install datadb demo data

c16107c

dividor temporarily deployed to GitHub Actions 1 July 3, 2024 16:02 — with GitHub Actions Inactive

Simple script to download and install datadb demo data

aa438b7

dividor temporarily deployed to GitHub Actions 1 July 3, 2024 16:03 — with GitHub Actions Inactive

Simple script to download and install datadb demo data

0c17e8e

dividor temporarily deployed to GitHub Actions 1 July 3, 2024 16:08 — with GitHub Actions Inactive

Removing extraneous chainlit folders, not needed in top dir

089e362

dividor temporarily deployed to GitHub Actions 1 July 3, 2024 16:21 — with GitHub Actions Inactive

JanPeterDatakind approved these changes Jul 5, 2024

JanPeterDatakind merged commit 14e04f4 into main Jul 5, 2024
5 checks passed
