Test: automatic evaluation with different correct_answer column #1985

Merged: 6 commits, Sep 9, 2024

ashrafchowdury
Collaborator

Description:
Added a test that checks whether automatic evaluation works when the testset uses a different correct_answer column.

Changes:

  • Added an automatic evaluation test that uses a different correct_answer column.
  • Made the cy.createNewEvaluation() test function dynamic for better usability.
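
One way a test helper can be made "dynamic" in this sense is to accept optional overrides and fall back to defaults. The sketch below is only an illustration under stated assumptions: `EvaluationOptions`, `DEFAULT_OPTIONS`, `resolveEvaluationOptions`, and the `evaluatorName` field are hypothetical names, not the actual signature of cy.createNewEvaluation() in this PR. Only the column names `correct_answer` and `answer` come from the discussion.

```typescript
// Hypothetical option shape for an evaluation test helper.
// These field names are assumptions for illustration only.
interface EvaluationOptions {
  evaluatorName: string;
  correctAnswerColumn: string;
}

// Assumed defaults: the standard column name discussed in this PR.
const DEFAULT_OPTIONS: EvaluationOptions = {
  evaluatorName: "Exact Match", // assumed evaluator name
  correctAnswerColumn: "correct_answer",
};

// Merge caller overrides over the defaults so existing tests can call
// the helper with no arguments while new tests customize a single field.
function resolveEvaluationOptions(
  overrides: Partial<EvaluationOptions> = {},
): EvaluationOptions {
  return { ...DEFAULT_OPTIONS, ...overrides };
}

// A test targeting a testset whose ground-truth column is named "answer".
const custom = resolveEvaluationOptions({ correctAnswerColumn: "answer" });
console.log(custom.correctAnswerColumn);
```

A custom Cypress command could pass its arguments through a function like this before interacting with the UI, so that tests needing the default column require no changes.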

vercel bot commented Aug 13, 2024

The latest updates on your projects:

Name | Status | Updated (UTC)
agenta | ✅ Ready | Sep 3, 2024 1:56pm
agenta-documentation | ✅ Ready | Sep 3, 2024 1:56pm

@dosubot dosubot bot added the size:M (This PR changes 30-99 lines, ignoring generated files.), cypress, and tests labels Aug 13, 2024
Member

@bekossy bekossy left a comment

Let’s also modify the Expected Answer Column in the test to match the ground truth column in the testset.
[Screenshot 2024-09-03 at 11 13 10 AM]

And fix the failing Evaluation scenarios test.

@ashrafchowdury
Collaborator Author

  1. In this test, we are changing the Expected Answer column from 'correct_answer' to just 'answer'.
  2. The test is failing because of the LLM models freezing issue:

[image]

Member

@bekossy bekossy left a comment

Thanks for the PR @ashrafchowdury

LGTM @aakrem

@dosubot dosubot bot added the lgtm (This PR has been approved by a maintainer) label Sep 3, 2024
@aakrem aakrem merged commit 322a89f into main Sep 9, 2024
9 checks passed
@aakrem aakrem deleted the test-automatic-evaluation-with-different-column branch September 9, 2024 08:15