Evaluators can access all columns #1606

Merged 76 commits on May 31, 2024
76 commits
7145533
access to all columns
aakrem May 2, 2024
d3d3412
fix correct answer in ai critique
aakrem May 2, 2024
e4a481f
fix tests
aakrem May 2, 2024
6c09e6e
add correct answer to evaluators
aakrem May 3, 2024
2a62d29
get correct answer from evaluator instead of evaluation payload
aakrem May 3, 2024
ae58b0e
access correct answer directly in the evaluators and handle passing m…
aakrem May 3, 2024
d54f20d
add default value for correct_answer and small renaming
aakrem May 3, 2024
c9c212a
adjust schema
aakrem May 9, 2024
be1ee39
add as much correct answers columns there are in an ES
aakrem May 9, 2024
e18c3a4
adjust correct answer type in frontend
aakrem May 9, 2024
ed2d413
small build fix
aakrem May 9, 2024
2b131d2
fix build
aakrem May 9, 2024
519e99d
handle multiple correct answers
aakrem May 9, 2024
68f626e
revert to single ground truth
aakrem May 9, 2024
ac6bf19
toggle correct answer input visibility
bekossy May 9, 2024
d2e1a36
rename correct_answer to value
aakrem May 9, 2024
07e0e86
migration script
aakrem May 9, 2024
fc30f52
added antd collapse to toggle correct_answer input
bekossy May 9, 2024
7e24866
fix evaluators tests
aakrem May 9, 2024
bab7582
select ground truth to apply diff in eval scenario view
bekossy May 9, 2024
0f67bd3
display only unique correct_answers
aakrem May 9, 2024
c01f6b2
filtered out duplicate keys from correctAnswer array
bekossy May 10, 2024
cb3b08f
added filtercolumns component and improve table headername display
bekossy May 10, 2024
725cf8d
Merge branch 'main' into access-to-all-columns
bekossy May 10, 2024
00529d0
bug fix
bekossy May 10, 2024
398dfe8
bug fix
bekossy May 10, 2024
89faad9
added dropdown diff and cleanup
bekossy May 11, 2024
14e5eb4
made static onClick prop dynamic and improve diff feature
bekossy May 12, 2024
e152c51
added ground truth column to comparison view and improved diff feature
bekossy May 13, 2024
0495376
fixed correct answer output
bekossy May 13, 2024
d879572
added helper to remove correctAnswer prefix and improved dropdown def…
bekossy May 14, 2024
3ebabb3
rename variable
aakrem May 14, 2024
975b01f
Merge pull request #1645 from Agenta-AI/sub-issue/-improve-eval-compa…
aakrem May 14, 2024
fc8ae62
improved diff button text
bekossy May 15, 2024
90f7380
access to all columns
aakrem May 2, 2024
34f2436
Merge branch 'main' into access-to-all-columns
aakrem May 15, 2024
14c8ee2
small refactor for correct answers logic
aakrem May 15, 2024
7424be3
fix errors type
aakrem May 15, 2024
95b82ef
convert correct_answer_keys to list
aakrem May 16, 2024
1d3d9d8
improve type
aakrem May 16, 2024
87c1d81
access to all columns
aakrem May 2, 2024
a706a69
Merge branch 'main' into access-to-all-columns
aakrem May 17, 2024
b5300df
add default correct answer in case its not provided
aakrem May 17, 2024
1002198
advanced settings in a separate component
aakrem May 17, 2024
7624354
bug fix
bekossy May 18, 2024
f4e2d47
Merge pull request #1665 from Agenta-AI/access-to-all-columns-advance…
aakrem May 19, 2024
f610819
fix backend tests
aakrem May 19, 2024
0aa62e1
create direct_use evaluators with default correct answers
aakrem May 19, 2024
27f8a0a
remove not needed code
aakrem May 19, 2024
dc8ed51
Add condition to evaluator card to show action buttons when direct_us…
bekossy May 19, 2024
ffb3c2d
filtered out evaluators when direct_use is true or settings_template …
bekossy May 20, 2024
5811770
Modify the evaluator definition for correct answer key
mmabrouk May 28, 2024
ca309b5
Refactored the evaluator service to use specific correct_answers
mmabrouk May 28, 2024
1f14f0e
Show the advanced settings under a hidden collapse
mmabrouk May 28, 2024
7e11adc
Made the code more secure by removing the global().get which would al…
mmabrouk May 28, 2024
81f9b12
Improved the logic to use a correct_answer as a ground truth column i…
mmabrouk May 28, 2024
20d54ab
rewrote logic for creating ready to use evaluators
mmabrouk May 28, 2024
e6dcfd3
Allow editing ready to use evaluators
mmabrouk May 28, 2024
9faa400
allow the addition of ready to use evaluators
mmabrouk May 28, 2024
df16e88
Fixed evaluators definition
mmabrouk May 28, 2024
87b0c2f
minor fix
mmabrouk May 28, 2024
a0d0420
updated pyproject
mmabrouk May 28, 2024
1a6e4ce
Added auto similarity
mmabrouk May 28, 2024
6b44f5c
formatting
mmabrouk May 28, 2024
3fba85d
updated docker
mmabrouk May 28, 2024
111c2e5
fix-levenshtein test
mmabrouk May 28, 2024
7b69bec
t
mmabrouk May 28, 2024
31d7c88
fix the test
mmabrouk May 28, 2024
ea6919b
improved tests
mmabrouk May 28, 2024
f5ca784
remove comment
mmabrouk May 29, 2024
c78856a
improved label
mmabrouk May 29, 2024
7bffe15
fixed correct_answer_key payload
bekossy May 29, 2024
d285754
cleanup
bekossy May 29, 2024
2080680
Merge pull request #1711 from Agenta-AI/fix-all-columns
mmabrouk May 30, 2024
d6132bc
Merge branch 'main' into access-to-all-columns
mmabrouk May 31, 2024
b9dde52
update lock
mmabrouk May 31, 2024
@@ -0,0 +1,345 @@
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional

from pydantic import BaseModel, Field
from beanie import Document, Link, PydanticObjectId, iterative_migration


class UserDB(Document):
    uid: str = Field(default="0", unique=True, index=True)
    username: str = Field(default="agenta")
    email: str = Field(default="demo@agenta.ai", unique=True)
    created_at: Optional[datetime] = Field(default=datetime.now(timezone.utc))
    updated_at: Optional[datetime] = Field(default=datetime.now(timezone.utc))

    class Settings:
        name = "users"


class ImageDB(Document):
    """Defines the info needed to get an image and connect it to the app variant"""

    type: Optional[str] = Field(default="image")
    template_uri: Optional[str]
    docker_id: Optional[str] = Field(index=True)
    tags: Optional[str]
    deletable: bool = Field(default=True)
    user: Link[UserDB]
    created_at: Optional[datetime] = Field(default=datetime.now(timezone.utc))
    updated_at: Optional[datetime] = Field(default=datetime.now(timezone.utc))

    class Settings:
        name = "docker_images"


class AppDB(Document):
    app_name: str
    user: Link[UserDB]
    created_at: Optional[datetime] = Field(default=datetime.now(timezone.utc))
    updated_at: Optional[datetime] = Field(default=datetime.now(timezone.utc))

    class Settings:
        name = "app_db"


class DeploymentDB(Document):
    app: Link[AppDB]
    user: Link[UserDB]
    container_name: Optional[str]
    container_id: Optional[str]
    uri: Optional[str]
    status: str
    created_at: Optional[datetime] = Field(default=datetime.now(timezone.utc))
    updated_at: Optional[datetime] = Field(default=datetime.now(timezone.utc))

    class Settings:
        name = "deployments"


class VariantBaseDB(Document):
    app: Link[AppDB]
    user: Link[UserDB]
    base_name: str
    image: Link[ImageDB]
    deployment: Optional[PydanticObjectId]  # Link to deployment
    created_at: Optional[datetime] = Field(default=datetime.now(timezone.utc))
    updated_at: Optional[datetime] = Field(default=datetime.now(timezone.utc))

    class Settings:
        name = "bases"


class ConfigDB(BaseModel):
    config_name: str
    parameters: Dict[str, Any] = Field(default_factory=dict)


class AppVariantDB(Document):
    app: Link[AppDB]
    variant_name: str
    revision: int
    image: Link[ImageDB]
    user: Link[UserDB]
    modified_by: Link[UserDB]
    parameters: Dict[str, Any] = Field(default_factory=dict)  # TODO: deprecated. remove
    previous_variant_name: Optional[str]  # TODO: deprecated. remove
    base_name: Optional[str]
    base: Link[VariantBaseDB]
    config_name: Optional[str]
    config: ConfigDB
    created_at: Optional[datetime] = Field(default=datetime.now(timezone.utc))
    updated_at: Optional[datetime] = Field(default=datetime.now(timezone.utc))

    is_deleted: bool = Field(  # TODO: deprecated. remove
        default=False
    )  # soft deletion for using the template variants

    class Settings:
        name = "app_variants"


class AppVariantRevisionsDB(Document):
    variant: Link[AppVariantDB]
    revision: int
    modified_by: Link[UserDB]
    base: Link[VariantBaseDB]
    config: ConfigDB
    created_at: datetime
    updated_at: Optional[datetime] = Field(default=datetime.now(timezone.utc))

    class Settings:
        name = "app_variant_revisions"


class AppEnvironmentDB(Document):
    app: Link[AppDB]
    name: str
    user: Link[UserDB]
    revision: int
    deployed_app_variant: Optional[PydanticObjectId]
    deployed_app_variant_revision: Optional[Link[AppVariantRevisionsDB]]
    deployment: Optional[PydanticObjectId]  # reference to deployment
    created_at: Optional[datetime] = Field(default=datetime.now(timezone.utc))

    class Settings:
        name = "environments"


class AppEnvironmentRevisionDB(Document):
    environment: Link[AppEnvironmentDB]
    revision: int
    modified_by: Link[UserDB]
    deployed_app_variant_revision: Optional[PydanticObjectId]
    deployment: Optional[PydanticObjectId]  # reference to deployment
    created_at: datetime

    class Settings:
        name = "environments_revisions"


class TemplateDB(Document):
    type: Optional[str] = Field(default="image")
    template_uri: Optional[str]
    tag_id: Optional[int]
    name: str = Field(unique=True)  # tag name of image
    repo_name: Optional[str]
    title: str
    description: str
    size: Optional[int]
    digest: Optional[str]  # sha256 hash of image digest
    last_pushed: Optional[datetime]

    class Settings:
        name = "templates"


class TestSetDB(Document):
    name: str
    app: Link[AppDB]
    csvdata: List[Dict[str, str]]
    user: Link[UserDB]
    created_at: Optional[datetime] = Field(default=datetime.now(timezone.utc))
    updated_at: Optional[datetime] = Field(default=datetime.now(timezone.utc))

    class Settings:
        name = "testsets"


class EvaluatorConfigDB(Document):
    app: Link[AppDB]
    user: Link[UserDB]
    name: str
    evaluator_key: str
    settings_values: Dict[str, Any] = Field(default_factory=dict)
    created_at: datetime = Field(default=datetime.now(timezone.utc))
    updated_at: datetime = Field(default=datetime.now(timezone.utc))

    class Settings:
        name = "evaluators_configs"


class Error(BaseModel):
    message: str
    stacktrace: Optional[str] = None


class Result(BaseModel):
    type: str
    value: Optional[Any] = None
    error: Optional[Error] = None


class InvokationResult(BaseModel):
    result: Result
    cost: Optional[float] = None
    latency: Optional[float] = None


class EvaluationScenarioResult(BaseModel):
    evaluator_config: PydanticObjectId
    result: Result


class AggregatedResult(BaseModel):
    evaluator_config: PydanticObjectId
    result: Result


class EvaluationScenarioInputDB(BaseModel):
    name: str
    type: str
    value: str


class EvaluationScenarioOutputDB(BaseModel):
    result: Result
    cost: Optional[float] = None
    latency: Optional[float] = None


class HumanEvaluationScenarioInput(BaseModel):
    input_name: str
    input_value: str


class HumanEvaluationScenarioOutput(BaseModel):
    variant_id: str
    variant_output: str


class HumanEvaluationDB(Document):
    app: Link[AppDB]
    user: Link[UserDB]
    status: str
    evaluation_type: str
    variants: List[PydanticObjectId]
    variants_revisions: List[PydanticObjectId]
    testset: Link[TestSetDB]
    created_at: Optional[datetime] = Field(default=datetime.now(timezone.utc))
    updated_at: Optional[datetime] = Field(default=datetime.now(timezone.utc))

    class Settings:
        name = "human_evaluations"


class HumanEvaluationScenarioDB(Document):
    user: Link[UserDB]
    evaluation: Link[HumanEvaluationDB]
    inputs: List[HumanEvaluationScenarioInput]
    outputs: List[HumanEvaluationScenarioOutput]
    vote: Optional[str]
    score: Optional[Any]
    correct_answer: Optional[str]
    created_at: Optional[datetime] = Field(default=datetime.now(timezone.utc))
    updated_at: Optional[datetime] = Field(default=datetime.now(timezone.utc))
    is_pinned: Optional[bool]
    note: Optional[str]

    class Settings:
        name = "human_evaluations_scenarios"


class EvaluationDB(Document):
    app: Link[AppDB]
    user: Link[UserDB]
    status: Result
    testset: Link[TestSetDB]
    variant: PydanticObjectId
    variant_revision: PydanticObjectId
    evaluators_configs: List[PydanticObjectId]
    aggregated_results: List[AggregatedResult]
    average_cost: Optional[Result] = None
    total_cost: Optional[Result] = None
    average_latency: Optional[Result] = None
    created_at: datetime = Field(default=datetime.now(timezone.utc))
    updated_at: datetime = Field(default=datetime.now(timezone.utc))

    class Settings:
        name = "new_evaluations"


class CorrectAnswer(BaseModel):
    key: str
    value: str


class EvaluationScenarioDB(Document):
    user: Link[UserDB]
    evaluation: Link[EvaluationDB]
    variant_id: PydanticObjectId
    inputs: List[EvaluationScenarioInputDB]
    outputs: List[EvaluationScenarioOutputDB]
    correct_answers: Optional[List[CorrectAnswer]]
    is_pinned: Optional[bool]
    note: Optional[str]
    evaluators_configs: List[PydanticObjectId]
    results: List[EvaluationScenarioResult]
    latency: Optional[int] = None
    cost: Optional[int] = None
    created_at: datetime = Field(default=datetime.now(timezone.utc))
    updated_at: datetime = Field(default=datetime.now(timezone.utc))

    class Settings:
        name = "new_evaluation_scenarios"


class OldEvaluationScenarioDB(Document):
    user: Link[UserDB]
    evaluation: Link[EvaluationDB]
    variant_id: PydanticObjectId
    inputs: List[EvaluationScenarioInputDB]
    outputs: List[EvaluationScenarioOutputDB]
    correct_answer: Optional[str]
    is_pinned: Optional[bool]
    note: Optional[str]
    evaluators_configs: List[PydanticObjectId]
    results: List[EvaluationScenarioResult]
    latency: Optional[int] = None
    cost: Optional[int] = None
    created_at: datetime = Field(default=datetime.now(timezone.utc))
    updated_at: datetime = Field(default=datetime.now(timezone.utc))

    class Settings:
        name = "new_evaluation_scenarios"


class Forward:
    @iterative_migration()
    async def migrate_correct_answers(
        self,
        input_document: OldEvaluationScenarioDB,
        output_document: EvaluationScenarioDB,
    ):
        # Wrap the legacy single `correct_answer` string in the new list format
        if input_document.correct_answer:
            output_document.correct_answers = [
                CorrectAnswer(key="correct_answer", value=input_document.correct_answer)
            ]
        else:
            # No (or empty) legacy answer: store an empty list
            output_document.correct_answers = []

        # Drop the legacy field from the input document
        if "correct_answer" in input_document.dict():
            del input_document.__dict__["correct_answer"]


class Backward:
    ...
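To make the data transformation in `Forward.migrate_correct_answers` easier to follow, here is a minimal sketch that applies the same rule to plain dictionaries instead of Beanie documents. The `forward` and `backward` function names and the sample field values are assumptions for illustration only. The PR leaves `Backward` empty, so the `backward` function is one hypothetical way to reverse the change, not something the PR implements.

```python
from typing import Any, Dict, List


def forward(old_doc: Dict[str, Any]) -> Dict[str, Any]:
    """Plain-dict stand-in for Forward.migrate_correct_answers."""
    # Copy every field except the legacy one.
    new_doc = {k: v for k, v in old_doc.items() if k != "correct_answer"}
    legacy = old_doc.get("correct_answer")
    # A truthy legacy answer becomes a single key/value entry; otherwise an empty list.
    new_doc["correct_answers"] = (
        [{"key": "correct_answer", "value": legacy}] if legacy else []
    )
    return new_doc


def backward(new_doc: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical inverse (not part of the PR): restore the single field."""
    old_doc = {k: v for k, v in new_doc.items() if k != "correct_answers"}
    answers: List[Dict[str, str]] = new_doc.get("correct_answers") or []
    # Only the entry stored under the "correct_answer" key maps back.
    old_doc["correct_answer"] = next(
        (a["value"] for a in answers if a["key"] == "correct_answer"), None
    )
    return old_doc


migrated = forward({"note": "n1", "correct_answer": "Paris"})
empty = forward({"note": "n2", "correct_answer": None})
```

Note that the round trip is lossy in this sketch: an empty-string `correct_answer` becomes an empty list going forward and `None` coming back, and any entries stored under other keys are dropped by `backward`.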
7 changes: 6 additions & 1 deletion agenta-backend/agenta_backend/models/api/evaluation_model.py
@@ -158,13 +158,18 @@ class HumanEvaluationScenarioUpdate(BaseModel):
     note: Optional[str]


+class CorrectAnswer(BaseModel):
+    key: str
+    value: str
+
+
 class EvaluationScenario(BaseModel):
     id: Optional[str]
     evaluation_id: str
     inputs: List[EvaluationScenarioInput]
     outputs: List[EvaluationScenarioOutput]
     evaluation: Optional[str]
-    correct_answer: Optional[str]
+    correct_answers: Optional[List[CorrectAnswer]]
     is_pinned: Optional[bool]
     note: Optional[str]
     results: List[EvaluationScenarioResult]
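With `correct_answers` now a list of key/value pairs, a consumer has to pick the entry it wants by key. The sketch below shows one way that lookup might work. The `get_correct_answer` helper, its `default` parameter, and the `expected_sql` key are hypothetical and do not appear in the diff; a dataclass stands in for the pydantic `CorrectAnswer` model so the example runs with the standard library only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CorrectAnswer:  # stand-in for the pydantic model added in the diff
    key: str
    value: str


def get_correct_answer(
    answers: Optional[List[CorrectAnswer]], key: str, default: str = ""
) -> str:
    """Hypothetical helper: return the value stored under `key`, else `default`."""
    for answer in answers or []:  # treat None the same as an empty list
        if answer.key == key:
            return answer.value
    return default


answers = [
    CorrectAnswer(key="correct_answer", value="Paris"),
    CorrectAnswer(key="expected_sql", value="SELECT 1"),  # made-up second column
]
```

Here `get_correct_answer(answers, "expected_sql")` returns the second entry's value, and a key that is not present falls back to `default`.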
6 changes: 5 additions & 1 deletion agenta-backend/agenta_backend/models/converters.py
@@ -9,6 +9,7 @@
 from agenta_backend.utils.common import isCloudEE
 from agenta_backend.models.api.user_models import User
 from agenta_backend.models.api.evaluation_model import (
+    CorrectAnswer,
     Evaluation,
     HumanEvaluation,
     EvaluatorConfig,
@@ -253,7 +254,10 @@ def evaluation_scenario_db_to_pydantic(
             EvaluationScenarioOutput(**scenario_output.dict())
             for scenario_output in evaluation_scenario_db.outputs
         ],
-        correct_answer=evaluation_scenario_db.correct_answer,
+        correct_answers=[
+            CorrectAnswer(**correct_answer.dict())
+            for correct_answer in evaluation_scenario_db.correct_answers
+        ],
         is_pinned=evaluation_scenario_db.is_pinned or False,
         note=evaluation_scenario_db.note or "",
         results=evaluation_scenarios_results_to_pydantic(
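The converter above iterates over `evaluation_scenario_db.correct_answers` directly, while the DB model declares that field as `Optional[List[CorrectAnswer]]`. Whether a `None` value can actually reach the converter after the migration is an assumption, but if it can, the comprehension would raise a `TypeError`. The sketch below shows a defensive variant. `convert_correct_answers` and `CorrectAnswerDB` are hypothetical names, and dataclasses stand in for the pydantic models (`asdict` plays the role of `.dict()`).

```python
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class CorrectAnswerDB:  # stand-in for the DB-side model
    key: str
    value: str


@dataclass
class CorrectAnswer:  # stand-in for the API-side model
    key: str
    value: str


def convert_correct_answers(
    db_answers: Optional[List[CorrectAnswerDB]],
) -> List[CorrectAnswer]:
    """Hypothetical None-safe version of the list comprehension in the diff."""
    # `or []` makes a missing list behave like an empty one.
    return [CorrectAnswer(**asdict(answer)) for answer in (db_answers or [])]


converted = convert_correct_answers([CorrectAnswerDB(key="correct_answer", value="42")])
```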