Feat/pf - End-to-End Github actions tests #61

Merged
merged 205 commits into from
Jul 5, 2024

dividor
Contributor

@dividor dividor commented Jun 29, 2024

Reopened to respond to reviewer and add GH actions

@dividor
Contributor Author

dividor commented Jun 29, 2024

Added extras ...

  1. Batch tests file data.jsonl (this will change to use Jan's work), has two simple tests for Promptflow
  2. GitHub action to build environment, run promptflow batch, and check output

@dividor dividor had a problem deploying to GitHub Actions 1 June 29, 2024 22:08 — with GitHub Actions Failure
@dividor dividor had a problem deploying to GitHub Actions 1 June 29, 2024 22:12 — with GitHub Actions Failure
@dividor dividor had a problem deploying to GitHub Actions 1 June 29, 2024 22:17 — with GitHub Actions Failure
@dividor dividor had a problem deploying to GitHub Actions 1 June 29, 2024 22:18 — with GitHub Actions Failure
@dividor dividor had a problem deploying to GitHub Actions 1 June 29, 2024 22:19 — with GitHub Actions Failure
@dividor dividor temporarily deployed to GitHub Actions 1 July 3, 2024 15:08 — with GitHub Actions Inactive
@dividor dividor temporarily deployed to GitHub Actions 1 July 3, 2024 15:10 — with GitHub Actions Inactive
@dividor dividor temporarily deployed to GitHub Actions 1 July 3, 2024 15:19 — with GitHub Actions Inactive
@dividor dividor temporarily deployed to GitHub Actions 1 July 3, 2024 15:20 — with GitHub Actions Inactive
@dividor
Contributor Author

dividor commented Jul 3, 2024

This PR finishes off the end-to-end tests first draft. See below for a summary from CONTRIBUTION.md.

Note that it also includes a new demo data zipfile, a script to download it, and documentation, so that the assistant analysis tests can be run.

End-to-end tests

End-to-end tests have been configured in GitHub Actions. They use promptflow to call a wrapper around the chainlit UI, in order to test cases where memories/recipes are used as well as cases where the assistant does some on-the-fly analysis. To do this, the chainlit class is patched heavily, and there are limitations in how cleanly this could be done, so it isn't an exact replica of the true application, but it does capture changes to the flow as well as test the assistant directly. The main body of integration tests will test the recipes server and the assistant independently.

Additionally, there were some limitations when implementing this in GitHub Actions, where workarounds were implemented until a later date, namely: promptflow is run on the GitHub Actions host rather than in docker, and the promptflow wrapper that calls chainlit has to run as a script, which is then killed based on a STDOUT string. These should be fixed in future.
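The "run as a script and kill on a STDOUT string" workaround can be sketched roughly as below. This is a minimal illustration and not the code in this PR: the function name `run_until_marker`, the marker string, and the timeout value are all hypothetical. It starts a child process, reads its output line by line, and kills it as soon as a line contains the marker, with a timer as a safety net in case the marker never appears.

```python
import subprocess
import sys
import threading


def run_until_marker(cmd, marker, timeout_s=60.0):
    """Run cmd, collect output lines until one contains marker, then kill the process.

    Returns (found, lines). Names and defaults here are illustrative assumptions.
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    # Safety net: kill the child if the marker never shows up.
    timer = threading.Timer(timeout_s, proc.kill)
    timer.start()
    lines = []
    found = False
    try:
        for line in proc.stdout:
            lines.append(line.rstrip("\n"))
            if marker in line:
                found = True
                break
    finally:
        timer.cancel()
        if proc.poll() is None:
            proc.kill()
        proc.wait()
        proc.stdout.close()
    return found, lines
```

A caller would pass something like `[sys.executable, "-u", "some_script.py"]` as `cmd` (the `-u` flag keeps the child's output unbuffered so the marker is seen promptly) and treat `found == False` as a test failure.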

Code for e2e tests can be found in flows/chainlit-ui-evaluation as run by .github/workflows/e2e_tests.yml

The tests work using promptflow evaluation and a call to an LLM to gauge groundedness, because LLM assistants can produce slightly different results when they are not providing answers from memory/recipes. The promptflow evaluation test data can be found in flows/chainlit-ui-evaluation/data.jsonl.
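As a rough illustration of the "check output" step, the sketch below parses JSON Lines text and flags any record whose groundedness score falls below a threshold. The field names `query` and `groundedness_score`, and the default threshold of 3, are assumptions made for this example; they are not taken from data.jsonl or from the promptflow output in this PR.

```python
import json


def load_jsonl(text):
    """Parse JSON Lines text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def failing_cases(results, min_score=3):
    """Return records whose (hypothetical) groundedness score is missing or below min_score."""
    failures = []
    for record in results:
        score = record.get("groundedness_score")
        if score is None or score < min_score:
            failures.append(record)
    return failures
```

In a CI step, a non-empty result from `failing_cases` could be turned into a non-zero exit code so that the workflow fails.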

A useful way to test a new scenario, and to get the 'expected' output for data.jsonl, is to add it to call_assistant_debug.py.

TODO, future work:

  • Add promptflow to docker-compose-github.yml and update the action to use this env (time was short and it wasn't working). This will reduce overhead and complexity
  • Figure out how to make call_assistant.py exit the async loop so it doesn't have to run in a wrapper that then kills the process
  • Push docker containers to a registry so flow doesn't run build every time
  • Bug the chainlit folks to see if they can do something more formal around testing, to avoid complex monkey patching

@dividor dividor requested a review from JanPeterDatakind July 3, 2024 15:26
@dividor dividor changed the title from "Feat/pf tests v2 - Reopened to respond to reviewer and add GH actions" to "Feat/pf - End-to-End Github actions tests" Jul 3, 2024
@dividor dividor temporarily deployed to GitHub Actions 1 July 3, 2024 15:51 — with GitHub Actions Inactive
@dividor dividor temporarily deployed to GitHub Actions 1 July 3, 2024 15:58 — with GitHub Actions Inactive
@dividor dividor temporarily deployed to GitHub Actions 1 July 3, 2024 15:59 — with GitHub Actions Inactive
@dividor dividor had a problem deploying to GitHub Actions 1 July 3, 2024 16:01 — with GitHub Actions Failure
@dividor dividor temporarily deployed to GitHub Actions 1 July 3, 2024 16:02 — with GitHub Actions Inactive
@dividor dividor temporarily deployed to GitHub Actions 1 July 3, 2024 16:03 — with GitHub Actions Inactive
@dividor dividor temporarily deployed to GitHub Actions 1 July 3, 2024 16:08 — with GitHub Actions Inactive
@dividor dividor temporarily deployed to GitHub Actions 1 July 3, 2024 16:21 — with GitHub Actions Inactive
Contributor

@JanPeterDatakind JanPeterDatakind left a comment

Approving this as the first iteration of promptflow testing/ evaluation

@JanPeterDatakind JanPeterDatakind merged commit 14e04f4 into main Jul 5, 2024
5 checks passed
2 participants