Evaluators can access all columns #1606

aakrem · 2024-05-02T17:42:20Z

Evaluators now have access to any correct-answer column in the test set.

vercel · 2024-05-02T17:42:27Z

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name	Status	Preview	Comments	Updated (UTC)
agenta	✅ Ready (Inspect)	Visit Preview	💬 Add feedback	May 31, 2024 7:46am

mmabrouk

I don’t understand why you specify correct_answer_key as a special argument for all evaluators. This assumes that every evaluator needs access to exactly one column in the test set that holds the correct answer. However, an evaluator might require multiple columns (which need not even be correct answers semantically), or it might need nothing beyond the output of the LLM app (as is the case for a toxicity evaluator, for instance).

Instead, I propose making correct_answer_key part of the evaluators' configuration (i.e. in settings_values). This would simplify the code (fewer arguments in the calls) and make the logic more general.
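To illustrate the proposal, here is a minimal sketch of evaluators that read the column name from their own configuration. The function names, the `data_point` argument, the `banned_words` setting, and the fallback to a `correct_answer` column are assumptions made for illustration; only `correct_answer_key` and `settings_values` come from the discussion, and this is not the actual agenta implementation.

```python
from typing import Any, Dict


def exact_match(
    output: str,
    data_point: Dict[str, Any],
    settings_values: Dict[str, Any],
) -> bool:
    """Hypothetical evaluator comparing the output to one test set column."""
    # The column name lives in the evaluator's configuration, so the caller
    # does not have to pass it as a dedicated argument. The "correct_answer"
    # fallback is an assumption for this sketch.
    key = settings_values.get("correct_answer_key", "correct_answer")
    return output == data_point.get(key)


def toxicity_free(
    output: str,
    data_point: Dict[str, Any],
    settings_values: Dict[str, Any],
) -> bool:
    """Hypothetical evaluator that ignores the test set columns entirely."""
    # "banned_words" is an invented setting, used only to show an evaluator
    # whose configuration has nothing to do with a ground truth column.
    banned = settings_values.get("banned_words", [])
    return not any(word in output.lower() for word in banned)
```

Because both functions share the same signature, the calling code can treat every evaluator uniformly and leave it to each one to decide which columns, if any, it reads.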

My second comment is about the UI: I think that, by default, the evaluators we have should use the correct_answer column. We can hide this option inside a collapse.

agenta-backend/agenta_backend/services/evaluators_service.py

agenta-backend/agenta_backend/tasks/evaluations.py

mmabrouk

Thanks for the PR @aakrem

I reviewed the migration. It makes sense to me from a logical perspective. As for the syntax itself (using .dict() and deleting __dict__), I trust you have tested it.

mmabrouk · 2024-05-31T08:15:01Z

Thanks for the PR, @aakrem! Great job. I have a few concerns (unrelated to the code changes).

I think the title of the PR and its description are a bit misleading. I have two questions:
* If an evaluator has access to all columns, why can we only access a single column in the test set?

This has been fixed. You now configure in the evaluator which columns it has access to, and it can then access them.

* If evaluators have access to any correct answer column from the test set, why are similarity match, JSON field match, webhook test, and Levenshtein distance the only evaluators that do?

It depends on the evaluator. Some of them, like JSON check, which checks whether the LLM output is in JSON format, obviously do not require access to any ground truth.
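The distinction can be sketched as follows: a JSON format check looks only at the output, while a Levenshtein distance evaluator needs a ground truth column to compare against. All function names and signatures here are hypothetical, as is the fallback to a `correct_answer` column; this is a sketch under those assumptions, not the code from this PR.

```python
import json
from typing import Any, Dict


def json_check(output: str) -> bool:
    """Passes when the output parses as JSON; no ground truth is involved."""
    try:
        json.loads(output)
        return True
    except (TypeError, ValueError):
        return False


def levenshtein(a: str, b: str) -> int:
    """Edit distance computed row by row with dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


def levenshtein_evaluator(
    output: str,
    data_point: Dict[str, Any],
    settings_values: Dict[str, Any],
) -> int:
    """Hypothetical evaluator: distance to the configured ground truth column."""
    key = settings_values.get("correct_answer_key", "correct_answer")
    return levenshtein(output, str(data_point[key]))
```

Only the second evaluator reads `correct_answer_key`, which is why exposing that setting on every evaluator would not make sense.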

Responded to review and clarified issues in #1606 (comment)

aakrem added 2 commits May 2, 2024 19:13

access to all columns

7145533

fix correct answer in ai critique

d3d3412

fix tests

e4a481f

vercel bot deployed to Preview May 2, 2024 19:03 View deployment

aakrem added 2 commits May 3, 2024 12:13

add correct answer to evaluators

6c09e6e

get correct answer from evaluator instead of evaluation payload

2a62d29

vercel bot deployed to Preview May 3, 2024 10:15 View deployment

mmabrouk requested changes May 3, 2024

agenta-backend/agenta_backend/services/evaluators_service.py

agenta-backend/agenta_backend/services/evaluators_service.py

agenta-backend/agenta_backend/tasks/evaluations.py

aakrem added 3 commits May 3, 2024 19:09

access correct answer directly in the evaluators and handle passing multiple correct answers

ae58b0e

add default value for correct_answer and small renaming

d54f20d

adjust schema

c9c212a

vercel bot deployed to Preview May 9, 2024 01:24 View deployment

aakrem added 2 commits May 9, 2024 03:24

add as much correct answers columns there are in an ES

be1ee39

ajust correct answer type in frontend

e18c3a4

vercel bot had a problem deploying to Preview May 9, 2024 01:26 Failure

small build fix

ed2d413

vercel bot had a problem deploying to Preview May 9, 2024 01:31 Failure

aakrem added 2 commits May 9, 2024 04:13

fix build

2b131d2

hsndle multiple correct answers

519e99d

vercel bot deployed to Preview May 9, 2024 02:16 View deployment

revert to single ground truth

68f626e

vercel bot deployed to Preview May 9, 2024 09:05 View deployment

toggle correct answer input visibility

ac6bf19

vercel bot deployed to Preview May 9, 2024 12:35 View deployment

aakrem added 2 commits May 9, 2024 15:23

rename correct_answer to value

d2e1a36

migration script

07e0e86

vercel bot deployed to Preview May 9, 2024 13:25 View deployment

added antd collapse to toggle correct_answer input

fc30f52

vercel bot deployed to Preview May 9, 2024 14:02 View deployment

mmabrouk and others added 15 commits May 28, 2024 17:17

Fixed evaluators definition

df16e88

minor fix

87b0c2f

updated pyproject

a0d0420

Added auto similarity

1a6e4ce

Removed langchain and use openai directly

formatting

6b44f5c

updated docker

3fba85d

fix-lenshtein test

111c2e5

t

7b69bec

fix the test

31d7c88

improved tests

ea6919b

remove comment

f5ca784

improved label

c78856a

fixed correct_answer_key payload

7bffe15

cleanup

d285754

Merge pull request #1711 from Agenta-AI/fix-all-columns

2080680

Fix evaluation PR

vercel bot deployed to Preview May 30, 2024 18:10 View deployment

Merge branch 'main' into access-to-all-columns

d6132bc

mmabrouk temporarily deployed to oss May 31, 2024 07:22 — with GitHub Actions Inactive

vercel bot deployed to Preview May 31, 2024 07:24 View deployment

update lock

b9dde52

mmabrouk temporarily deployed to oss May 31, 2024 07:44 — with GitHub Actions Inactive

vercel bot deployed to Preview May 31, 2024 07:46 View deployment

mmabrouk approved these changes May 31, 2024

dosubot bot added the lgtm This PR has been approved by a maintainer label May 31, 2024

aakrem merged commit 3f12a1f into main May 31, 2024
12 checks passed

aakrem deleted the access-to-all-columns branch May 31, 2024 08:56
