[Enhancement] Add LLM API key checks to LLM-based evaluators #1989

Conversation

Member

@aybruhm aybruhm commented Aug 14, 2024

Description

This PR enhances the backend by adding checks to ensure that an OpenAI API key is present before LLM-based evaluators run, with clear exception messages when the key is missing. Test coverage has also been expanded to cover scenarios where the OpenAI API key is required, including a new test case for auto_ai_critique.
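
As an illustration only (not the actual backend code), a minimal sketch of this kind of guard might look like the following; the exception class, registry name, and message wording are all hypothetical:

```python
from typing import Optional


class MissingLLMApiKeyError(Exception):
    """Raised when an LLM-based evaluator is invoked without the required provider key."""


# Hypothetical registry of evaluator keys that depend on an LLM provider key.
EVALUATORS_REQUIRING_LLM_KEYS = {"auto_ai_critique"}


def ensure_llm_api_key(evaluator_key: str, openai_api_key: Optional[str]) -> None:
    """Fail fast with a clear message if an LLM-based evaluator has no OpenAI API key."""
    if evaluator_key in EVALUATORS_REQUIRING_LLM_KEYS and not openai_api_key:
        raise MissingLLMApiKeyError(
            f"Evaluator '{evaluator_key}' requires an OpenAI API key. "
            "Set one in the LLM Keys view before running this evaluator."
        )
```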

Related Issue

Closes AGE-532 & AGE-569

Acceptance Tests

Test 1: OpenAI API Key is Required for LLM-Based Evaluators

  • Precondition: Ensure no OpenAI API key is set in the LLM Keys view on the frontend.
  • Action:
    1. Run an LLM-based evaluator (e.g., auto_ai_critique) without an OpenAI API key.
  • Expected Outcome:
    • The evaluator should raise a clear, informative exception indicating that the OpenAI API key is missing.

Test 2: LLM-based Evaluator Runs Successfully with a Valid API Key

  • Precondition: Ensure a valid OpenAI API key is set in the LLM Keys view on the frontend.
  • Action:
    1. Run an LLM-based evaluator (e.g., auto_ai_critique) with the valid OpenAI API key.
  • Expected Outcome:
    • The evaluator should run without errors.
    • The output should be consistent with expected results based on the inputs provided.

Test 3 (automated): Test for AI Critique (LLM as a Judge) API Key Checks

  • Precondition: Ensure the test suite includes scenarios where the OpenAI API key is required.
  • Action:
    1. This will be run automatically by GitHub.
  • Expected Outcome:
    • The test suite should pass, covering scenarios with and without an OpenAI API key.
    • The new test case for auto_ai_critique should be included and verified as part of the coverage (a sketch of such a test follows).
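
For illustration, an automated test along the lines of Test 3 could look like this (pytest, reusing the hypothetical ensure_llm_api_key helper sketched in the description; the module path and names are assumptions, not the repository's actual test code):

```python
import pytest

# Hypothetical import; assumes the ensure_llm_api_key sketch from the
# description is available to the test suite as a module.
from llm_key_checks import MissingLLMApiKeyError, ensure_llm_api_key


def test_auto_ai_critique_requires_openai_api_key():
    """Without an OpenAI API key, the evaluator check should raise a clear error."""
    with pytest.raises(MissingLLMApiKeyError):
        ensure_llm_api_key("auto_ai_critique", openai_api_key=None)


def test_auto_ai_critique_runs_with_openai_api_key():
    """With a (dummy) key present, the check should pass silently."""
    ensure_llm_api_key("auto_ai_critique", openai_api_key="sk-dummy-key")
```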

@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Aug 14, 2024

vercel bot commented Aug 14, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name | Status | Preview | Comments | Updated (UTC)
agenta | ✅ Ready (Inspect) | Visit Preview | 💬 Add feedback | Aug 26, 2024 1:16pm
agenta-documentation | ✅ Ready (Inspect) | Visit Preview | 💬 Add feedback | Aug 26, 2024 1:16pm

@aybruhm aybruhm requested a review from jp-agenta August 14, 2024 07:07
@dosubot dosubot bot added Backend enhancement New feature or request labels Aug 14, 2024
Contributor

@jp-agenta jp-agenta left a comment

Thanks @aybruhm !
There is one thing missing though.
See 👇

success, response = await check_ai_critique_inputs(

We'd need to update that so that we immediately check for API keys for any LLM-based evaluator, not just AI critique. Does it make sense?
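
As a rough illustration of that suggestion (hypothetical names and dict keys; the real helper quoted above is check_ai_critique_inputs), a generalised check could look like:

```python
from typing import Optional, Tuple

# Hypothetical registry of LLM-based evaluator keys; in practice this would
# come from the centralised evaluator configuration.
LLM_BASED_EVALUATORS = {"auto_ai_critique"}


async def check_llm_evaluator_inputs(
    evaluator_key: str,
    lm_providers_keys: Optional[dict],
) -> Tuple[bool, Optional[str]]:
    """Return (success, error_message) for any LLM-based evaluator, not just AI critique."""
    if evaluator_key not in LLM_BASED_EVALUATORS:
        return True, None
    # Using "OPENAI_API_KEY" as the dict key is an assumption for this sketch.
    if not lm_providers_keys or not lm_providers_keys.get("OPENAI_API_KEY"):
        return False, f"Evaluator '{evaluator_key}' requires an OpenAI API key, but none was provided."
    return True, None
```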

- format llm provider keys
- and to ensure required llm keys exist in the provided evaluator configs
- properly format llm provider keys
- and check that the required llm keys exist
Contributor

@jp-agenta jp-agenta left a comment

Left one suggestion to centralise evaluator info.
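
A possible shape for that centralisation, purely as a sketch (evaluator keys and field names are illustrative), so the backend checks and the test fixtures read from a single definition:

```python
import pytest

# Hypothetical single source of truth for evaluator metadata.
EVALUATORS = {
    "auto_ai_critique": {"requires_llm_api_keys": True},
    "auto_exact_match": {"requires_llm_api_keys": False},
    "auto_regex_test": {"requires_llm_api_keys": False},
}


def evaluators_requiring_llm_keys() -> list[str]:
    """Evaluator keys that cannot run without an LLM provider key."""
    return [key for key, meta in EVALUATORS.items() if meta["requires_llm_api_keys"]]


# Illustrative pytest fixture built on the centralised registry.
@pytest.fixture
def llm_key_evaluators():
    return evaluators_requiring_llm_keys()
```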

- configurable setting to evaluators requiring llm api keys
- update fixture to make use of centralized evaluators
@dosubot dosubot bot removed the size:L This PR changes 100-499 lines, ignoring generated files. label Aug 20, 2024
@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Aug 22, 2024
…i-to-playground' into feature/age-532-poc-1e-add-llm-api-key-checks-in-llm-based-evaluators
Contributor

@jp-agenta jp-agenta left a comment

This can't go to QA until we fix the issue whereby the LLM doesn't respond in numeric format anymore.
-- @mmabrouk @aybruhm

Member Author

aybruhm commented Aug 26, 2024

This can't go to QA until we fix the issue whereby the LLM doesn't respond in numeric format anymore. -- @mmabrouk @aybruhm

It works now, and I didn’t change anything. Safe to say the language model was probably hallucinating at the time, right?

[image attachment]

Member Author

aybruhm commented Aug 29, 2024

@aybruhm aybruhm merged commit 9212c7b into feature/age-491-poc-1e-expose-running-evaluators-via-api-to-playground Aug 29, 2024
7 checks passed
@aybruhm aybruhm deleted the feature/age-532-poc-1e-add-llm-api-key-checks-in-llm-based-evaluators branch August 29, 2024 08:22
@zenUnicorn
Contributor

I have QA'd the PR and got the expected result ✅
Works fine!
