
Bugfix/code-evaluators #1750

Merged — 5 commits merged into main from bugfix/code-evaluators on Jun 3, 2024

Conversation

@mmabrouk (Member) commented on Jun 3, 2024

Closes #1749

Issues:

  • We were calling the execute_code_safely function with the wrong parameter (data_point instead of correct_answer).

Solutions:

  • The code should ideally have access to the whole row, not just the correct_answer. However, pre-existing eval functions have a signature that expects correct_answer.
  • To address this, I made the change backward compatible: old evaluators that expect correct_answer keep working, while new evaluators receive the full data_point (see the sketch after this list).
  • I have updated the example snippet to demonstrate the new signature that expects data_point.
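
As a rough sketch of that backward-compatible dispatch (the helper call_evaluator and its keyword arguments are hypothetical; only execute_code_safely, data_point, and correct_answer come from this PR):

```python
import inspect
from typing import Any, Callable, Dict


def call_evaluator(
    evaluate: Callable[..., float],
    data_point: Dict[str, Any],
    output: str,
    correct_answer_key: str = "correct_answer",
) -> float:
    """Dispatch to a user-supplied code evaluator, supporting both signatures."""
    params = inspect.signature(evaluate).parameters
    if "data_point" in params:
        # New-style evaluator: receives the whole test-set row.
        return evaluate(output=output, data_point=data_point)
    # Old-style evaluator: receives only the ground-truth value.
    return evaluate(output=output, correct_answer=data_point.get(correct_answer_key))


# In an actual evaluator snippet both functions would be named `evaluate`;
# they are renamed here only so the two styles can sit side by side.

# Old-style evaluator (still supported):
def evaluate_old(output: str, correct_answer: str) -> float:
    return float(output.strip() == correct_answer.strip())


# New-style evaluator: can use any column of the row.
def evaluate_new(output: str, data_point: Dict[str, Any]) -> float:
    return float(output.strip() == data_point["correct_answer"].strip())
```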

Another issue was that the frontend had no way to know which ground truth to display in the new architecture. To resolve this:

  • I added a correct_answer parameter to the evaluator configuration. This parameter is used solely to tell the frontend which column to display on the results page (a sketch of the configuration follows this list).
  • In the future, this could be extended to display multiple ground-truth columns.
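
For illustration, the configuration could look roughly like this (the surrounding keys evaluator_key and settings_values are assumptions; only the correct_answer setting is described in this PR):

```python
# Hypothetical shape of a code evaluator configuration; the exact schema
# in agenta may differ. Only the correct_answer setting comes from this PR.
evaluator_config = {
    "evaluator_key": "auto_custom_code_run",  # assumed key for the code evaluator
    "settings_values": {
        "code": "def evaluate(output, data_point): ...",  # user-supplied snippet
        "correct_answer": "correct_answer",  # test-set column the frontend displays
    },
}
```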

Quality Assurance:

  • I tested the PR locally using both an old evaluator with the old configuration and a new evaluator with the new signature.
  • I also tested a failing evaluator to verify the changes made to the results display.

Additional Testing Requirements:

  • Test this in the staging environment with both the old and the new evaluator configurations.
  • Ensure the correct_answer column is correctly displayed in the results.
  • Test renaming the correct_answer column, using an evaluator whose code references the new name, and verify that it is displayed correctly in the results.

Note: Migration scripts will need to be updated to modify the code evaluator and add correct_answer.
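
A hedged sketch of what that migration could do (the storage access, the evaluator key, and the config layout are assumptions; only the correct_answer setting comes from this PR):

```python
# Hypothetical migration sketch: backfill the new correct_answer setting on
# existing code evaluator configurations.
from typing import Any, Dict, List

DEFAULT_CORRECT_ANSWER_COLUMN = "correct_answer"


def migrate_code_evaluator_configs(configs: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Add a default correct_answer setting to code evaluator configs that lack one."""
    for config in configs:
        if config.get("evaluator_key") != "auto_custom_code_run":
            continue
        settings = config.setdefault("settings_values", {})
        settings.setdefault("correct_answer", DEFAULT_CORRECT_ANSWER_COLUMN)
    return configs
```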

vercel bot commented Jun 3, 2024

agenta: ✅ Ready — preview updated Jun 3, 2024, 1:02pm (UTC)

@dosubot bot added the label size:M (This PR changes 30-99 lines, ignoring generated files.) — Jun 3, 2024
@dosubot bot added the labels Backend and bug (Something isn't working) — Jun 3, 2024
@mmabrouk requested a review from aakrem — Jun 3, 2024, 09:56
@dosubot bot added the label size:L (This PR changes 100-499 lines, ignoring generated files.) and removed size:M — Jun 3, 2024
@aakrem (Collaborator) left a comment

lgtm thanks for the fix @mmabrouk

@dosubot bot added the label lgtm (This PR has been approved by a maintainer) — Jun 3, 2024
@aakrem merged commit c8a4969 into main on Jun 3, 2024
9 checks passed
@aakrem deleted the bugfix/code-evaluators branch on Jun 3, 2024, 13:23
Labels
Backend · bug (Something isn't working) · lgtm (This PR has been approved by a maintainer) · size:L (This PR changes 100-499 lines, ignoring generated files.)
Development

Successfully merging this pull request may close these issues:

[AGE-291] [Bug] Code Evaluation is not working