[Enhancement]: Handle non-string outputs gracefully in auto_contains_json evaluator #1987

aybruhm · 2024-08-13T14:37:18Z

Description

This PR enhances the auto_contains_json evaluator so that it handles non-string outputs gracefully: any unhandled error now results in a clear and informative failure.

Related Issue

Closes AGE-573

What to QA

Evaluation Run:
- Run an evaluation that uses the contains_json evaluator and ensure it completes successfully.
- Verify that evaluation results are accurate and consistent across multiple runs.

Acceptance Tests

Test 1: Evaluator Handles Non-String Output Gracefully

Precondition: Ensure the auto_contains_json evaluator is set up.
Action:
1. Run the evaluator with a non-string output, e.g., a dictionary or list (the output is the LLM response of the application).
Expected Outcome:
- The evaluator should fail gracefully, providing a clear and informative error message.
- The error message should indicate that the output was not a string and suggest possible resolutions.
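
The behaviour expected in Test 1 could be sketched roughly as follows. This is a minimal illustration, not the code in evaluators_service.py: the helper names `validate_string_output` and `evaluate_safely`, the result shape, and the message wording are all assumptions made for the example.

```python
from typing import Any, Dict


def validate_string_output(evaluator_key: str, output: Any) -> str:
    """Return `output` unchanged if it is a string, otherwise raise TypeError.

    Hypothetical helper: the name and the message wording are assumptions.
    """
    if not isinstance(output, str):
        raise TypeError(
            f"Evaluator '{evaluator_key}' requires the output to be a string, "
            f"but received {type(output).__name__}. "
            "Return the response as text, or serialize it with json.dumps()."
        )
    return output


def evaluate_safely(evaluator_key: str, output: Any) -> Dict[str, Any]:
    """Turn a type mismatch into an error result instead of an unhandled crash."""
    try:
        validate_string_output(evaluator_key, output)
    except TypeError as exc:
        # The message names the offending type and suggests a resolution.
        return {"status": "error", "message": str(exc)}
    return {"status": "ok", "message": ""}
```

Calling `evaluate_safely("auto_contains_json", {"answer": 42})` in this sketch returns an error result whose message names the received type (`dict`) and suggests serializing the value first.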

Test 2: Evaluation Completes Successfully

Precondition: Ensure the auto_contains_json evaluator is set up.
Action:
1. Run a typical evaluation using the auto_contains_json evaluator with a string output (where output is the LLM response of the application).
Expected Outcome:
- The evaluation should complete without errors.
- The evaluation process should not be interrupted, and all expected results should be produced.
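
For the string case in Test 2, one plausible shape for a "contains JSON" check is sketched below. It assumes the check looks for an object between the first `{` and the last `}` of the output; the actual evaluator may use a different strategy, and the function name `contains_json` is chosen here only for illustration.

```python
import json


def contains_json(output: str) -> bool:
    """Return True if the text between the first '{' and the last '}' parses as JSON.

    Assumption: only JSON objects embedded in free text are of interest.
    """
    try:
        start = output.index("{")       # raises ValueError if no '{' is present
        end = output.rindex("}") + 1    # raises ValueError if no '}' is present
        json.loads(output[start:end])   # raises json.JSONDecodeError (a ValueError)
        return True
    except ValueError:
        return False
```

Because the function is pure, repeated runs on the same output give the same result, which is what the consistency check in "What to QA" relies on.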

vercel · 2024-08-13T14:37:21Z

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name	Status	Preview	Comments	Updated (UTC)
agenta	✅ Ready (Inspect)	Visit Preview	💬 Add feedback	Aug 23, 2024 2:06pm
agenta-documentation	✅ Ready (Inspect)	Visit Preview	💬 Add feedback	Aug 23, 2024 2:06pm

agenta-backend/agenta_backend/services/evaluators_service.py

jp-agenta

Thanks @aybruhm, quick question 👇
Shouldn't this apply to all non-RAG evaluators (not just contains_json)?

aybruhm · 2024-08-20T15:07:45Z

> Thanks @aybruhm, quick question 👇 Shouldn't this apply to all non-RAG evaluators (not just contains_json)?

We are already doing that. The contains_json evaluator requires the value of data to be a JSON string (str), not any other type. See comment here.

jp-agenta · 2024-08-23T11:39:33Z

QA'd in oss-local.
QA in cloud-staging pending.

aybruhm added 2 commits August 13, 2024 15:15

refactor (backend): improve error handling for auto_contains_json evaluator

d3f7315

feat (tests): add tests for dictionary-based output handling in contains_json evaluator

08b9e87

dosubot bot added the size:S label (This PR changes 10-29 lines, ignoring generated files.) Aug 13, 2024

aybruhm temporarily deployed to oss August 13, 2024 14:37 — with GitHub Actions Inactive

aybruhm requested a review from jp-agenta August 13, 2024 14:37

dosubot bot added the Backend and enhancement (New feature or request) labels Aug 13, 2024

jp-agenta reviewed Aug 19, 2024

agenta-backend/agenta_backend/services/evaluators_service.py (outdated)

jp-agenta reviewed Aug 19, 2024

agenta-backend/agenta_backend/services/evaluators_service.py (outdated)

jp-agenta reviewed Aug 19, 2024

jp-agenta mentioned this pull request Aug 19, 2024

fix(backend) json-diff evaluator #2001

Merged

aybruhm added 2 commits August 20, 2024 11:17

Merge branch 'feature/age-491-poc-1e-expose-running-evaluators-via-api-to-playground' into feature/age-573-evaluators-fail-gracefully-when-we-send-a-dict-to-a-str-only

f0cc8c6

chore (backend): remove redundant error message

d1fe5aa

aybruhm temporarily deployed to oss August 20, 2024 10:25 — with GitHub Actions Inactive

vercel bot had a problem deploying to Preview – agenta-documentation August 20, 2024 10:25 Failure

vercel bot deployed to Preview – agenta August 20, 2024 10:28 View deployment

Merge branch 'feature/age-491-poc-1e-expose-running-evaluators-via-api-to-playground' into feature/age-573-evaluators-fail-gracefully-when-we-send-a-dict-to-a-str-only

05ae4b5

aybruhm requested a review from jp-agenta August 20, 2024 15:08

aybruhm temporarily deployed to oss August 20, 2024 18:30 — with GitHub Actions Inactive

vercel bot had a problem deploying to Preview – agenta-documentation August 20, 2024 18:30 Failure

vercel bot deployed to Preview – agenta August 20, 2024 18:32 View deployment

aybruhm added 4 commits August 21, 2024 18:17

Merge branch 'feature/age-491-poc-1e-expose-running-evaluators-via-api-to-playground' into feature/age-573-evaluators-fail-gracefully-when-we-send-a-dict-to-a-str-only

ac1ac7e

Merge branch 'feature/age-491-poc-1e-expose-running-evaluators-via-api-to-playground' into feature/age-573-evaluators-fail-gracefully-when-we-send-a-dict-to-a-str-only

9309f43

refactor (backend): centralize validation of string and json output and use functions in evaluators

23be8b6

feat (tests): update parameters for BaseResponse compatibility and reflect changes in test cases
- Added parameters in 'test_auto_json_diff' for BaseResponse compatibility
- Updated parameters in 'test_auto_contains_json' to align with recent changes

b6db4f1

chore (style): format evaluators_service with black@23.12.0

892a351

aybruhm had a problem deploying to oss August 21, 2024 20:12 — with GitHub Actions Failure

aybruhm temporarily deployed to oss August 21, 2024 20:12 — with GitHub Actions Inactive

vercel bot deployed to Preview – agenta-documentation August 21, 2024 20:13 View deployment

vercel bot deployed to Preview – agenta August 21, 2024 20:16 View deployment

aybruhm temporarily deployed to oss August 21, 2024 20:28 — with GitHub Actions Inactive

jp-agenta approved these changes Aug 22, 2024

dosubot bot added the lgtm label (This PR has been approved by a maintainer.) Aug 22, 2024

jp-agenta added 2 commits August 23, 2024 08:57

Merge branch 'main' of github.com:Agenta-AI/agenta

2e76a1c

Enforce in Union[str, Dict[str, Any]] in BaseResponse in SDK

3cad5db

jp-agenta temporarily deployed to oss August 23, 2024 12:31 — with GitHub Actions Inactive

vercel bot deployed to Preview – agenta-documentation August 23, 2024 12:31 View deployment

vercel bot deployed to Preview – agenta August 23, 2024 12:34 View deployment

bekossy and others added 4 commits August 23, 2024 13:36

fix(frontend): Migrate Inter font to use @next/font

35e6fec

Merge pull request #2016 from Agenta-AI/AGE-654/-migrate-Inter-Font-to-@next/font
fix(frontend): Migrate Inter Font to @next/font

1dec2c6

Merge branch 'main' of github.com:Agenta-AI/agenta

6238cd8

Merge branch 'main' into feature/age-573-evaluators-fail-gracefully-when-we-send-a-dict-to-a-str-only

f3546ef

jp-agenta temporarily deployed to oss August 23, 2024 13:35 — with GitHub Actions Inactive

vercel bot deployed to Preview – agenta August 23, 2024 13:38 View deployment

vercel bot deployed to Preview – agenta-documentation August 23, 2024 13:41 View deployment

fix exception message and bump SDK out of pre-release

2402f94

jp-agenta temporarily deployed to oss August 23, 2024 14:04 — with GitHub Actions Inactive

vercel bot deployed to Preview – agenta-documentation August 23, 2024 14:04 View deployment

vercel bot deployed to Preview – agenta August 23, 2024 14:06 View deployment

jp-agenta merged commit 532a4bb into feature/age-491-poc-1e-expose-running-evaluators-via-api-to-playground Aug 23, 2024
11 checks passed

jp-agenta deleted the feature/age-573-evaluators-fail-gracefully-when-we-send-a-dict-to-a-str-only branch August 23, 2024 14:29
