
[feature] Evaluators - Debugging #2071

Merged · 193 commits · Sep 23, 2024

Conversation

@bekossy (Member) commented on Sep 10, 2024

Description

This PR enhances the Evaluation UI for both auto and human evaluations, improving user experience and workflow efficiency.

Related Issue

Closes AGE-587

Related PR

Commons PR #108

Key Changes

  • Auto-Evaluation: Redesigned interface and added filters for easier navigation.
  • Human-Evaluation: Improved flow, feedback, and layout for better usability.
  • Evaluator Management: Full UI overhaul and a new evaluator debug/test feature.

QA Instructions

  • Evaluators:

    • Create New Evaluators: Ensure new evaluators can be created without issues.
    • Test Evaluator Debug Feature: Thoroughly test the debug feature by selecting a test set and a variant, and confirm the variant runs successfully (a hypothetical request sketch follows this list).
    • Customize Evaluator Output: Verify that the advanced settings allow for proper customization of the evaluator output.
    • Use Evaluator Filters: Apply all available filters to ensure you can accurately find the desired evaluator in the suggestions list.
    • Perform CRUD Operations: Test create, read, update, and delete operations on evaluators to confirm full functionality.
  • Auto-Evaluation:

    • Create New Evaluations: Confirm that new auto-evaluations can be created successfully.
    • Status Update Check: Verify that the evaluation status updates correctly and does not remain stuck at 0s.
    • Filters, Sorting, and Editing: Test all filter, sort, and edit functions to ensure they work as expected.
    • Batch Evaluation Creation: Successfully create multiple evaluations at once and confirm they are processed correctly.
    • UI Interaction: Interact with every visible element in the UI and ensure everything functions properly.
  • Human Evaluations:

    • Create New Human Evaluations: Test the successful creation of new human evaluations.
    • Delete Evaluations: Ensure that both single and multiple evaluations can be deleted without errors.
    • UI Interaction: Engage with every visible UI element and confirm all components are working as intended.
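To make the debug-feature check reproducible outside the UI, here is a minimal sketch of what the underlying request could look like. The endpoint path (`/evaluators/{evaluator_key}/run/`), the payload fields (`inputs`, `output`, `settings`), and the `auto_contains_json` key are placeholders assumed for illustration, not the confirmed backend contract.

```typescript
// Hypothetical smoke test for the evaluator debug endpoint.
// URL, payload shape, and evaluator key are assumptions; substitute
// the actual route and fields exposed by the backend.

interface EvaluatorRunPayload {
  inputs: Record<string, string>;    // test-set row values fed to the variant
  output: string;                    // the variant's generated output
  settings: Record<string, unknown>; // evaluator-specific settings (e.g. regex, threshold)
}

async function runEvaluatorDebug(
  baseUrl: string,
  evaluatorKey: string,
  payload: EvaluatorRunPayload,
): Promise<unknown> {
  // Assumed endpoint path; adjust to the real route.
  const res = await fetch(`${baseUrl}/evaluators/${evaluatorKey}/run/`, {
    method: "POST",
    headers: {"Content-Type": "application/json"},
    body: JSON.stringify(payload),
  });
  if (!res.ok) {
    throw new Error(`Evaluator run failed: ${res.status} ${res.statusText}`);
  }
  return res.json();
}

// Example usage with placeholder values:
runEvaluatorDebug("http://localhost/api", "auto_contains_json", {
  inputs: {country: "France"},
  output: '{"capital": "Paris"}',
  settings: {},
}).then((result) => console.log("Evaluator result:", result));
```

If a request along these lines returns an evaluator result for the supplied output, the debug flow is wired end to end; a non-2xx status should surface as an error state in the UI.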

Labels
Backend · Frontend · UI · UX · enhancement (New feature or request) · lgtm (This PR has been approved by a maintainer) · size:XXL (This PR changes 1000+ lines, ignoring generated files)