
Bugfix/code-evaluators #1750

Merged — 5 commits merged into main from bugfix/code-evaluators on Jun 3, 2024

Conversation

@mmabrouk (Member) commented on Jun 3, 2024

Closes #1749

Issues:

  • We were calling the execute_code_safely function with the wrong parameter (data_point instead of correct_answer).

Solutions:

  • The code should ideally have access to the whole row, not just the correct_answer. However, pre-existing eval functions have a signature that expects correct_answer.
  • To address this, I made the change backward compatible: old evaluators that expect correct_answer keep working, while new evaluators receive the full data_point (see the sketch after this list).
  • I have updated the example snippet to demonstrate the new signature that expects data_point.
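
As a rough sketch of that backward-compatible dispatch (the helper call_evaluator and its keyword arguments are hypothetical; only execute_code_safely, data_point, and correct_answer come from this PR):

```python
import inspect
from typing import Any, Callable, Dict


def call_evaluator(
    evaluate: Callable[..., float],
    data_point: Dict[str, Any],
    output: str,
    correct_answer_key: str = "correct_answer",
) -> float:
    """Dispatch to a user-supplied code evaluator, supporting both signatures."""
    params = inspect.signature(evaluate).parameters
    if "data_point" in params:
        # New-style evaluator: receives the whole test-set row.
        return evaluate(output=output, data_point=data_point)
    # Old-style evaluator: receives only the ground-truth value.
    return evaluate(output=output, correct_answer=data_point.get(correct_answer_key))


# In an actual evaluator snippet both functions would be named `evaluate`;
# they are renamed here only so the two styles can sit side by side.

# Old-style evaluator (still supported):
def evaluate_old(output: str, correct_answer: str) -> float:
    return float(output.strip() == correct_answer.strip())


# New-style evaluator: can use any column of the row.
def evaluate_new(output: str, data_point: Dict[str, Any]) -> float:
    return float(output.strip() == data_point["correct_answer"].strip())
```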

Another issue was that the frontend had no way to know which ground truth to display in the new architecture. To resolve this:

  • I added a correct_answer parameter to the evaluator configuration. This parameter is used solely to tell the frontend which column to display on the results page (a sketch of the configuration follows this list).
  • In the future, this could be extended to display multiple ground-truth columns.
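
For illustration, the configuration could look roughly like this (the surrounding keys evaluator_key and settings_values are assumptions; only the correct_answer setting is described in this PR):

```python
# Hypothetical shape of a code evaluator configuration; the exact schema
# in agenta may differ. Only the correct_answer setting comes from this PR.
evaluator_config = {
    "evaluator_key": "auto_custom_code_run",  # assumed key for the code evaluator
    "settings_values": {
        "code": "def evaluate(output, data_point): ...",  # user-supplied snippet
        "correct_answer": "correct_answer",  # test-set column the frontend displays
    },
}
```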

Quality Assurance:

  • I tested the PR locally using both an old evaluator with the old configuration and a new evaluator with the new signature.
  • I also tested a failing evaluator to verify the changes made to the results display.

Additional Testing Requirements:

  • Test this in the staging environment with both the old and the new evaluator configurations.
  • Ensure the correct_answer column is correctly displayed in the results.
  • Test renaming the correct_answer column, using an evaluator whose code references the new name, and verify that it is displayed correctly in the results.

Note: Migration scripts will need to be updated to modify the code evaluator and add correct_answer.
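
A hedged sketch of what that migration could do (the storage access, the evaluator key, and the config layout are assumptions; only the correct_answer setting comes from this PR):

```python
# Hypothetical migration sketch: backfill the new correct_answer setting on
# existing code evaluator configurations.
from typing import Any, Dict, List

DEFAULT_CORRECT_ANSWER_COLUMN = "correct_answer"


def migrate_code_evaluator_configs(configs: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Add a default correct_answer setting to code evaluator configs that lack one."""
    for config in configs:
        if config.get("evaluator_key") != "auto_custom_code_run":
            continue
        settings = config.setdefault("settings_values", {})
        settings.setdefault("correct_answer", DEFAULT_CORRECT_ANSWER_COLUMN)
    return configs
```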

vercel bot commented Jun 3, 2024

agenta: ✅ Ready — preview updated Jun 3, 2024, 1:02pm (UTC)

@dosubot bot added the label size:M (This PR changes 30-99 lines, ignoring generated files.) — Jun 3, 2024
@dosubot bot added the labels Backend and bug (Something isn't working) — Jun 3, 2024
@mmabrouk requested a review from aakrem — Jun 3, 2024, 09:56
@dosubot bot added the label size:L (This PR changes 100-499 lines, ignoring generated files.) and removed size:M — Jun 3, 2024
@aakrem (Collaborator) left a comment

lgtm thanks for the fix @mmabrouk

@dosubot bot added the label lgtm (This PR has been approved by a maintainer) — Jun 3, 2024
@aakrem merged commit c8a4969 into main on Jun 3, 2024
9 checks passed
@aakrem deleted the bugfix/code-evaluators branch on Jun 3, 2024, 13:23
Labels
Backend · bug (Something isn't working) · lgtm (This PR has been approved by a maintainer) · size:L (This PR changes 100-499 lines, ignoring generated files.)
Development

Successfully merging this pull request may close these issues:

[AGE-291] [Bug] Code Evaluation is not working