Fixes to JSON evaluators #2105

mmabrouk · 2024-10-07T20:49:23Z

Closes: AGE-1016 and AGE-1017

Neither contains_json nor json_diff was functioning correctly. I refactored the code, removing validate_json (which was unnecessary; more on that below), and fixed two bugs:

We were throwing an exception if the output wasn't JSON, contradicting the evaluator's purpose of scoring based on whether the output is JSON.
We weren't parsing strings as dictionaries for the evaluator.
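
A minimal sketch of the first fix, assuming the evaluator only needs to find one JSON object somewhere in the output. The function body below is hypothetical and is not copied from this PR; it only illustrates returning False rather than raising when the output is not JSON.

```python
import json


def contains_json(output: str) -> bool:
    """Hypothetical sketch: report whether `output` contains a JSON object."""
    try:
        # Take the span from the first "{" to the last "}" and try to parse it.
        start = output.index("{")
        end = output.rindex("}") + 1
        json.loads(output[start:end])
    except ValueError:
        # str.index/rindex raise ValueError when no brace is found, and
        # json.JSONDecodeError is a subclass of ValueError. "Not JSON" is a
        # valid outcome for this evaluator, so it scores False.
        return False
    return True
```

With this shape, a plain-text output such as "no json here" scores False, and the evaluation run continues instead of failing.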

On validate_json:

Methods should do one thing and one thing only. In this case, the function created a JSON string if the input was a dict and validated the input if it was a string.
Methods should have a name that describes what they do. This method did not just validate JSON; it also converted dicts to strings.
Whenever we have a method that takes two possible data types (string or dict), the code is usually not well written.
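
One way to read the three points above is to split the two responsibilities into separate, single-type helpers. The names parse_json_string and dump_json_dict are hypothetical and do not appear in this PR; this is a sketch of the principle, not the refactored code.

```python
import json


def parse_json_string(text: str) -> dict:
    """Parse a JSON string and require a JSON object at the top level."""
    parsed = json.loads(text)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(parsed, dict):
        raise ValueError("expected a JSON object at the top level")
    return parsed


def dump_json_dict(data: dict) -> str:
    """Serialize a dict to a JSON string."""
    return json.dumps(data)
```

Each helper accepts exactly one input type and its name says what it does, so the caller decides which conversion it needs.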

QA

This PR fixes the evaluators contains_json and json_diff. It does not touch the logic of the second, only its inputs.
Please test the following:
-- Whether the two evaluators work both in no-code evaluation and in the evaluator playground. Please look carefully at the results and check whether they make sense in every configuration.
-- Whether the evaluators work with different input mappings (using testset.column for the correct answer, and using the correct answer directly), both from the evaluator playground and from no-code evaluation.

The bug was that when the output was not JSON, we threw an exception instead of returning False. Closes AGE-1017

Closes: AGE-1016

vercel · 2024-10-07T20:49:28Z

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name	Status	Preview	Comments	Updated (UTC)
agenta	✅ Ready (Inspect)	Visit Preview	💬 Add feedback	Oct 7, 2024 9:08pm
agenta-documentation	✅ Ready (Inspect)	Visit Preview	💬 Add feedback	Oct 7, 2024 9:08pm

aybruhm

Thank you for the fix (and the awesome work), @mmabrouk!

mmabrouk added 3 commits October 7, 2024 20:08

refactor(backend): AGE-1016 remove useless condition

905967b

fix(backend): AGE-1017 fix contains json

fc40104

The bug was that when the output was not JSON, we threw an exception instead of returning False. Closes AGE-1017

fix(backend): AGE-1016 fix diff json eval

7252508

Closes: AGE-1016

dosubot bot added the size:M label (This PR changes 30-99 lines, ignoring generated files.) Oct 7, 2024

mmabrouk temporarily deployed to oss October 7, 2024 20:49 — with GitHub Actions Inactive

mmabrouk had a problem deploying to oss October 7, 2024 20:49 — with GitHub Actions Failure

dosubot bot added the bug (Something isn't working) and refactoring labels Oct 7, 2024

chore(backend): AGE-1016 formatting

538310e

mmabrouk had a problem deploying to oss October 7, 2024 20:56 — with GitHub Actions Failure

mmabrouk temporarily deployed to oss October 7, 2024 20:56 — with GitHub Actions Inactive

vercel bot deployed to Preview – agenta October 7, 2024 20:59 View deployment

vercel bot deployed to Preview – agenta-documentation October 7, 2024 20:59 View deployment

mmabrouk added 2 commits October 7, 2024 23:01

test(backend): fix tests contains_json

b42613f

test(backend): AGE-1016 fix code for tests

cf33c49

dosubot bot added the size:L label (This PR changes 100-499 lines, ignoring generated files.) and removed the size:M label (This PR changes 30-99 lines, ignoring generated files.) Oct 7, 2024

mmabrouk had a problem deploying to oss October 7, 2024 21:05 — with GitHub Actions Failure

mmabrouk temporarily deployed to oss October 7, 2024 21:05 — with GitHub Actions Inactive

vercel bot deployed to Preview – agenta-documentation October 7, 2024 21:06 View deployment

vercel bot deployed to Preview – agenta October 7, 2024 21:08 View deployment

mmabrouk had a problem deploying to oss October 7, 2024 21:11 — with GitHub Actions Failure

mmabrouk had a problem deploying to oss October 7, 2024 21:13 — with GitHub Actions Failure

aybruhm approved these changes Oct 8, 2024

dosubot bot added the lgtm label (This PR has been approved by a maintainer) Oct 8, 2024

mmabrouk merged commit 385cb90 into main Oct 8, 2024
9 of 10 checks passed

mmabrouk deleted the mmabrouk/fix/AGE-1016-json-diff-evaluator-fix branch October 8, 2024 13:10
