
Fixes to JSON evaluators #2105

Merged: 6 commits merged on Oct 8, 2024


@mmabrouk (Member) commented on Oct 7, 2024

Closes: AGE-1016 and AGE-1017

Neither `contains_json` nor `json_diff` was working correctly. I refactored the code, removed `validate_json` (it was unnecessary; see below), and fixed two bugs:

  1. We were throwing an exception if the output wasn't JSON, contradicting the evaluator's purpose of scoring based on whether the output is JSON.
  2. We weren't parsing JSON strings into dictionaries before passing them to the evaluator.
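The first fix can be sketched as follows. This is a minimal illustration, not the code merged in this PR: `is_json_output` is a hypothetical name, and the sketch assumes the evaluator checks whether the entire output string parses as JSON (the real `contains_json` may instead look for JSON embedded in surrounding text). The point is that a parse failure becomes a `False` score rather than an exception.

```python
import json


def is_json_output(output: str) -> bool:
    """Score an LLM output: True if it parses as JSON, False otherwise.

    Invalid JSON is an expected outcome that should be scored, so the
    parse error is caught here instead of propagating as an exception.
    """
    try:
        json.loads(output)
    except json.JSONDecodeError:
        return False
    return True


if __name__ == "__main__":
    # One valid and one invalid sample; neither call raises.
    for sample in ['{"answer": 42}', "The answer is 42."]:
        print(repr(sample), "->", is_json_output(sample))
```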

On validate_json:

  • Methods should do one thing and one thing only. validate_json serialized the input to a JSON string if it was a dict, and validated it if it was a string.
  • Methods should have names that describe what they do. This method did more than validate JSON: it also converted dicts to strings.
  • A method that accepts two different input types (string or dict) is usually a sign of poorly structured code.
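To illustrate the single-responsibility point, here is one way the two jobs described above could live in separate functions that each accept a single type. This is a hypothetical sketch: the names `serialize_json` and `deserialize_json` are illustrative and not from the agenta codebase, and the PR itself removed the helper rather than splitting it. `deserialize_json` also shows the kind of string-to-dict parsing the second bug fix refers to.

```python
import json
from typing import Any


def serialize_json(data: dict[str, Any]) -> str:
    """Convert a dict into a JSON string. Accepts dicts only."""
    # sort_keys gives a stable string for identical dicts.
    return json.dumps(data, sort_keys=True)


def deserialize_json(text: str) -> dict[str, Any]:
    """Parse a JSON string into a dict. Accepts strings only.

    Raises json.JSONDecodeError for invalid JSON and ValueError when the
    JSON is valid but not an object (e.g. a list or a number).
    """
    parsed = json.loads(text)
    if not isinstance(parsed, dict):
        raise ValueError("expected a JSON object")
    return parsed
```

With this split, the caller decides once, where it knows the input's source (for example, a testset column that holds a JSON string), which conversion to apply, instead of a single helper branching on the input type.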

QA

  • This PR fixes the contains_json and json_diff evaluators. It does not change the logic of json_diff, only how its inputs are prepared.
  • Please test the following:
    -- Whether both evaluators work in no-code evaluation and in the evaluator playground. Check the results carefully and confirm they make sense for every configuration.
    -- Test the evaluators with different input mappings (mapping the correct answer from testset.column, and providing the correct answer directly), both from the evaluator playground and from no-code evaluation.

bug: when the output was not JSON, we threw an exception instead of returning False

closes AGE-1017

@vercel bot commented on Oct 7, 2024

The latest updates on your projects:

| Name | Status | Updated (UTC) |
|---|---|---|
| agenta | ✅ Ready | Oct 7, 2024 9:08pm |
| agenta-documentation | ✅ Ready | Oct 7, 2024 9:08pm |

@dosubot bot added the size:M label (This PR changes 30-99 lines, ignoring generated files) on Oct 7, 2024
@dosubot bot added the bug (Something isn't working) and refactoring labels on Oct 7, 2024

@aybruhm (Member) left a comment

Thank you for the fix (and the awesome work), @mmabrouk!

@dosubot bot added the lgtm label (This PR has been approved by a maintainer) on Oct 8, 2024
@mmabrouk merged commit 385cb90 into main on Oct 8, 2024
9 of 10 checks passed
@mmabrouk deleted the mmabrouk/fix/AGE-1016-json-diff-evaluator-fix branch on October 8, 2024 13:10
Labels: bug (Something isn't working), lgtm (This PR has been approved by a maintainer), refactoring, size:L (This PR changes 100-499 lines, ignoring generated files)

2 participants