Resolve merge conflict between postgres and main #1854

Merged
merged 47 commits into from
Jul 7, 2024
47 commits
96f26cc
added evaluation error modal component
bekossy May 26, 2024
cc4fe9c
improved error handling in evaluation views
bekossy May 26, 2024
2ec19f5
renamed Evaluation error components
bekossy May 26, 2024
eeea808
removed duplicate code
bekossy May 26, 2024
87d48d8
Merge branch 'main' into AGE-171/-improve-error-handling-in-evaluation
bekossy May 27, 2024
53fbda7
Merge branch 'main' into AGE-171/-improve-error-handling-in-evaluation
bekossy May 28, 2024
af94b57
Merge branch 'main' into AGE-171/-improve-error-handling-in-evaluation
bekossy Jun 6, 2024
58ce224
feat(frontend): improved error handling in evaluation
bekossy Jun 7, 2024
1ba13c1
minor refactor
bekossy Jun 7, 2024
2212f65
Merge branch 'main' into AGE-171/-improve-error-handling-in-evaluation
mmabrouk Jun 27, 2024
b0b86dc
Fix (AGE-285 and AGE-342)
Jul 3, 2024
1afec5f
Merge branch 'main' into fix/infinitely-running-evaluations
Jul 3, 2024
337ba11
Fix (AGE-381)
Jul 3, 2024
33ace66
Fix (AGE-380)
jp-agenta Jul 3, 2024
64a47cb
perf(frontend): added new eval status type and improved component to …
bekossy Jul 3, 2024
784db61
perf(frontend): show error stacktrace in tooltip
bekossy Jul 3, 2024
48eb0f7
EVALUATION_AGGREGATION_FAILED now has a different value than EVALUATI…
jp-agenta Jul 3, 2024
95b64b1
fix(backend): run black formatter
bekossy Jul 3, 2024
0f1dccc
fix(backend): run black formatter
bekossy Jul 3, 2024
dda018a
fix(backend): run black formatter
bekossy Jul 3, 2024
d1aec70
fix(backend): run black formatter
bekossy Jul 3, 2024
f229c86
fix(backend): run black formatter
bekossy Jul 3, 2024
a5aade8
Fix (AGE-382)
jp-agenta Jul 4, 2024
59631af
improved evaluation status label message
bekossy Jul 4, 2024
486ea1e
Merge pull request #1708 from Agenta-AI/AGE-171/-improve-error-handli…
mmabrouk Jul 5, 2024
0bfc70b
fix(sdk): AGE-272 Propagate func errors up in @ag.instrument() wrappers
jp-agenta Jul 5, 2024
32a2486
fix(sdk): Add func error and stacktrace to result
jp-agenta Jul 5, 2024
be92ce3
refactor(backend): Review PR comments
jp-agenta Jul 5, 2024
9ca7955
refactor(backend): Fixes comments from PR
jp-agenta Jul 5, 2024
ae6201b
Merge pull request #1844 from Agenta-AI/fix/unhandled-no-correct-answers
mmabrouk Jul 5, 2024
8053dea
Merge pull request #1842 from Agenta-AI/fix/incomplete-stacktrace
mmabrouk Jul 5, 2024
f53f9c1
Merge pull request #1841 from Agenta-AI/fix/missing-queued-evaluation…
mmabrouk Jul 5, 2024
bacad32
Merge branch 'main' into fix/infinitely-running-evaluations
mmabrouk Jul 5, 2024
450e899
minor changes
bekossy Jul 5, 2024
32f3a99
Merge pull request #1840 from Agenta-AI/fix/infinitely-running-evalua…
mmabrouk Jul 5, 2024
8b5ecfa
docs: update README.md [skip ci]
allcontributors[bot] Jul 5, 2024
96c478d
docs: update .all-contributorsrc [skip ci]
allcontributors[bot] Jul 5, 2024
2560667
Merge pull request #1848 from Agenta-AI/all-contributors/add-jp-agenta
mmabrouk Jul 5, 2024
3c35942
Update pyproject.toml
mmabrouk Jul 5, 2024
2d2ee83
Update pyproject.toml
mmabrouk Jul 5, 2024
3818fbf
Merge pull request #1846 from Agenta-AI/fix/llm-app-error-as-output-m…
mmabrouk Jul 5, 2024
d70c19e
Bump versions
mmabrouk Jul 5, 2024
dd12f19
Merge pull request #1849 from Agenta-AI/bump-versions
mmabrouk Jul 5, 2024
5ae47c6
Merge branch 'main' into postgres
aybruhm Jul 7, 2024
5801ad3
minor refactor (backend): resolve ImportError in evaluators and aggre…
aybruhm Jul 7, 2024
9728818
refactor (backend): resolve 'super' object has no attribute 'coerce' …
aybruhm Jul 7, 2024
48de96a
tests (backend): make use of EVALUATION_INITIALIZED enum and not EVAL…
aybruhm Jul 7, 2024
10 changes: 10 additions & 0 deletions .all-contributorsrc
Original file line number Diff line number Diff line change
Expand Up @@ -437,6 +437,16 @@
"bug",
"code"
]
},
{
"login": "jp-agenta",
"name": "jp-agenta",
"avatar_url": "https://avatars.githubusercontent.com/u/174311389?v=4",
"profile": "https://github.com/jp-agenta",
"contributions": [
"code",
"bug"
]
}
],
"contributorsPerLine": 7,
Expand Down
7 changes: 3 additions & 4 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -171,9 +171,7 @@ Check out our [Contributing Guide](https://docs.agenta.ai/misc/contributing/gett
## Contributors ✨

<!-- ALL-CONTRIBUTORS-BADGE:START - Do not remove or modify this section -->

[![All Contributors](https://img.shields.io/badge/all_contributors-46-orange.svg?style=flat-square)](#contributors-)

[![All Contributors](https://img.shields.io/badge/all_contributors-47-orange.svg?style=flat-square)](#contributors-)
<!-- ALL-CONTRIBUTORS-BADGE:END -->

Thanks goes to these wonderful people ([emoji key](https://allcontributors.org/docs/en/emoji-key)):
Expand All @@ -194,7 +192,7 @@ Thanks goes to these wonderful people ([emoji key](https://allcontributors.org/d
</tr>
<tr>
<td align="center" valign="top" width="14.28%"><a href="https://github.com/Pajko97"><img src="https://avatars.githubusercontent.com/u/25198892?v=4?s=100" width="100px;" alt="Pavle Janjusevic"/><br /><sub><b>Pavle Janjusevic</b></sub></a><br /><a href="#infra-Pajko97" title="Infrastructure (Hosting, Build-Tools, etc)">🚇</a></td>
<td align="center" valign="top" width="14.28%"><a href="http://kaosiso-ezealigo.netlify.app"><img src="https://avatars.githubusercontent.com/u/99529776?v=4?s=100" width="100px;" alt="Kaosiso Ezealigo"/><br /><sub><b>Kaosiso Ezealigo</b></sub></a><br /><a href="https://github.com/Agenta-AI/agenta/issues?q=author%3Abekossy" title="Bug reports">🐛</a> <a href="https://github.com/Agenta-AI/agenta/commits?author=bekossy" title="Code">💻</a></td>
<td align="center" valign="top" width="14.28%"><a href="http://kaosiso-ezealigo.netlify.app"><img src="https://avatars.githubusercontent.com/u/99529776?v=4?s=100" width="100px;" alt="Kaosi Ezealigo"/><br /><sub><b>Kaosi Ezealigo</b></sub></a><br /><a href="https://github.com/Agenta-AI/agenta/issues?q=author%3Abekossy" title="Bug reports">🐛</a> <a href="https://github.com/Agenta-AI/agenta/commits?author=bekossy" title="Code">💻</a></td>
<td align="center" valign="top" width="14.28%"><a href="https://github.com/albnunes"><img src="https://avatars.githubusercontent.com/u/46302915?v=4?s=100" width="100px;" alt="Alberto Nunes"/><br /><sub><b>Alberto Nunes</b></sub></a><br /><a href="https://github.com/Agenta-AI/agenta/issues?q=author%3Aalbnunes" title="Bug reports">🐛</a></td>
<td align="center" valign="top" width="14.28%"><a href="https://www.linkedin.com/in/mohammed-maaz-6290b0116/"><img src="https://avatars.githubusercontent.com/u/17180132?v=4?s=100" width="100px;" alt="Maaz Bin Khawar"/><br /><sub><b>Maaz Bin Khawar</b></sub></a><br /><a href="https://github.com/Agenta-AI/agenta/commits?author=MohammedMaaz" title="Code">💻</a> <a href="https://github.com/Agenta-AI/agenta/pulls?q=is%3Apr+reviewed-by%3AMohammedMaaz" title="Reviewed Pull Requests">👀</a> <a href="#mentoring-MohammedMaaz" title="Mentoring">🧑‍🏫</a></td>
<td align="center" valign="top" width="14.28%"><a href="https://github.com/devgenix"><img src="https://avatars.githubusercontent.com/u/56418363?v=4?s=100" width="100px;" alt="Nehemiah Onyekachukwu Emmanuel"/><br /><sub><b>Nehemiah Onyekachukwu Emmanuel</b></sub></a><br /><a href="https://github.com/Agenta-AI/agenta/commits?author=devgenix" title="Code">💻</a> <a href="#example-devgenix" title="Examples">💡</a> <a href="https://github.com/Agenta-AI/agenta/commits?author=devgenix" title="Documentation">📖</a></td>
Expand Down Expand Up @@ -242,6 +240,7 @@ Thanks goes to these wonderful people ([emoji key](https://allcontributors.org/d
<td align="center" valign="top" width="14.28%"><a href="https://github.com/youcefs21"><img src="https://avatars.githubusercontent.com/u/34604972?v=4?s=100" width="100px;" alt="Youcef Boumar"/><br /><sub><b>Youcef Boumar</b></sub></a><br /><a href="https://github.com/Agenta-AI/agenta/commits?author=youcefs21" title="Documentation">📖</a></td>
<td align="center" valign="top" width="14.28%"><a href="https://github.com/LucasTrg"><img src="https://avatars.githubusercontent.com/u/47852577?v=4?s=100" width="100px;" alt="LucasTrg"/><br /><sub><b>LucasTrg</b></sub></a><br /><a href="https://github.com/Agenta-AI/agenta/commits?author=LucasTrg" title="Code">💻</a> <a href="https://github.com/Agenta-AI/agenta/issues?q=author%3ALucasTrg" title="Bug reports">🐛</a></td>
<td align="center" valign="top" width="14.28%"><a href="https://ashrafchowdury.me"><img src="https://avatars.githubusercontent.com/u/87828904?v=4?s=100" width="100px;" alt="Ashraf Chowdury"/><br /><sub><b>Ashraf Chowdury</b></sub></a><br /><a href="https://github.com/Agenta-AI/agenta/issues?q=author%3Aashrafchowdury" title="Bug reports">🐛</a> <a href="https://github.com/Agenta-AI/agenta/commits?author=ashrafchowdury" title="Code">💻</a></td>
<td align="center" valign="top" width="14.28%"><a href="https://github.com/jp-agenta"><img src="https://avatars.githubusercontent.com/u/174311389?v=4?s=100" width="100px;" alt="jp-agenta"/><br /><sub><b>jp-agenta</b></sub></a><br /><a href="https://github.com/Agenta-AI/agenta/commits?author=jp-agenta" title="Code">💻</a> <a href="https://github.com/Agenta-AI/agenta/issues?q=author%3Ajp-agenta" title="Bug reports">🐛</a></td>
</tr>
</tbody>
</table>
Expand Down
1 change: 1 addition & 0 deletions agenta-backend/agenta_backend/celery_config.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,7 @@
CELERY_ACCEPT_CONTENT = ["json"]
CELERY_RESULT_SERIALIZER = "json"
CELERY_TIMEZONE = "UTC"
CELERY_TASK_TRACK_STARTED = True

CELERY_QUEUES = (
Queue(
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,7 @@ class EvaluationStatusEnum(str, Enum):
EVALUATION_FINISHED = "EVALUATION_FINISHED"
EVALUATION_FINISHED_WITH_ERRORS = "EVALUATION_FINISHED_WITH_ERRORS"
EVALUATION_FAILED = "EVALUATION_FAILED"
EVALUATION_AGGREGATION_FAILED = "EVALUATION_AGGREGATION_FAILED"


class EvaluationScenarioStatusEnum(str, Enum):
Expand Down
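Review note: commit 48eb0f7 says `EVALUATION_AGGREGATION_FAILED` now has a different value than another member. Its message is truncated, so which member it collided with is not recoverable here. The sketch below shows why distinct values matter for a `str`-based `Enum`: two members sharing a value collapse into one, with the second becoming an alias. Both enum classes are hypothetical stand-ins, and `EVALUATION_FAILED` is used as the colliding member purely for illustration.

```python
from enum import Enum


class StatusWithAlias(str, Enum):
    """Hypothetical enum where two members share one value."""

    EVALUATION_FAILED = "EVALUATION_FAILED"
    # Same value as above, so Python treats this name as an alias.
    EVALUATION_AGGREGATION_FAILED = "EVALUATION_FAILED"


class StatusDistinct(str, Enum):
    """Hypothetical enum mirroring the values shown in the hunk above."""

    EVALUATION_FAILED = "EVALUATION_FAILED"
    EVALUATION_AGGREGATION_FAILED = "EVALUATION_AGGREGATION_FAILED"


# With a shared value, both names resolve to the same member object.
aliased = StatusWithAlias.EVALUATION_AGGREGATION_FAILED
same_member = aliased is StatusWithAlias.EVALUATION_FAILED

# With distinct values, the two failure modes stay distinguishable.
distinct_members = list(StatusDistinct)
```

With the aliased definition, a frontend that receives the serialized status cannot tell an aggregation failure from a general evaluation failure; with distinct values it can.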
49 changes: 27 additions & 22 deletions agenta-backend/agenta_backend/services/aggregation_service.py
Original file line number Diff line number Diff line change
Expand Up @@ -15,28 +15,33 @@ def aggregate_ai_critique(results: List[Result]) -> Result:
Result: aggregated result
"""

numeric_scores = []
for result in results:
try:
try:
numeric_scores = []
for result in results:
# Extract the first number found in the result value
match = re.search(r"\d+", result.value) # type: ignore
if not match:
continue

score = int(match.group())
numeric_scores.append(score)
except (TypeError, ValueError):
# Ignore if the extracted value is not an integer or is None
continue

# Calculate the average of numeric scores if any are present
average_value = (
sum(numeric_scores) / len(numeric_scores) if numeric_scores else None
)
return Result(
type="number",
value=average_value,
)
match = re.search(r"\d+", result.value)
if match:
try:
score = int(match.group())
numeric_scores.append(score)
except ValueError:
# Ignore if the extracted value is not an integer
continue

# Calculate the average of numeric scores if any are present
average_value = (
sum(numeric_scores) / len(numeric_scores) if numeric_scores else None
)
return Result(
type="number",
value=average_value,
)
except Exception as exc:
return Result(
type="error",
value=None,
error=Error(message=str(exc), stacktrace=str(traceback.format_exc())),
)


def aggregate_binary(results: List[Result]) -> Result:
Expand Down Expand Up @@ -73,7 +78,7 @@ def aggregate_float(results: List[Result]) -> Result:
return Result(
type="error",
value=None,
error=Error(message="Failed", stacktrace=str(traceback.format_exc())),
error=Error(message=str(exc), stacktrace=str(traceback.format_exc())),
)


Expand Down
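Review note: the `aggregate_ai_critique` hunk above wraps the whole aggregation in a single `try`/`except` and returns an error `Result` instead of raising. A minimal, self-contained sketch of that pattern follows. The `Error` and `Result` dataclasses are simplified stand-ins for the models in `agenta_backend.models.shared_models`, whose real definitions are not shown in this diff.

```python
import re
import traceback
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Error:
    """Stand-in for the backend Error model (assumed shape)."""

    message: str
    stacktrace: Optional[str] = None


@dataclass
class Result:
    """Stand-in for the backend Result model (assumed shape)."""

    type: str
    value: Any = None
    error: Optional[Error] = None


def aggregate_ai_critique(results: List[Result]) -> Result:
    """Average the first integer found in each result value."""
    try:
        numeric_scores = []
        for result in results:
            # Extract the first run of digits from the result value.
            match = re.search(r"\d+", result.value)
            if match:
                numeric_scores.append(int(match.group()))

        # Average the scores, or return None when no score was found.
        average_value = (
            sum(numeric_scores) / len(numeric_scores) if numeric_scores else None
        )
        return Result(type="number", value=average_value)
    except Exception as exc:
        # Any failure (e.g. a None value passed to re.search) becomes an
        # error Result carrying the message and the full traceback.
        return Result(
            type="error",
            value=None,
            error=Error(message=str(exc), stacktrace=traceback.format_exc()),
        )


ok = aggregate_ai_critique([Result("text", "Score: 8"), Result("text", "6/10")])
empty = aggregate_ai_critique([Result("text", "no digits here")])
failed = aggregate_ai_critique([Result("text", None)])
```

In this sketch a `None` value makes `re.search` raise `TypeError`, which the outer handler turns into an error `Result`, so the caller can mark the aggregation as failed rather than crash.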
Original file line number Diff line number Diff line change
Expand Up @@ -436,7 +436,7 @@ async def create_new_evaluation(
user_id=str(app.user_id),
testset=testset,
status=Result(
value=EvaluationStatusEnum.EVALUATION_STARTED, type="status", error=None
value=EvaluationStatusEnum.EVALUATION_INITIALIZED, type="status", error=None
),
variant=variant_id,
variant_revision=str(variant_revision.id),
Expand Down
46 changes: 30 additions & 16 deletions agenta-backend/agenta_backend/services/evaluators_service.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,13 +2,13 @@
import json
import logging
import traceback
from typing import Any, Dict, List, Tuple
from typing import Any, Dict

import httpx
from openai import OpenAI

from agenta_backend.models.shared_models import Error, Result
from agenta_backend.services.security import sandbox
from agenta_backend.models.shared_models import Error, Result

logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
Expand Down Expand Up @@ -80,7 +80,8 @@ def auto_exact_match(
type="error",
value=None,
error=Error(
message="Error during Auto Exact Match evaluation", stacktrace=str(e)
message="Error during Auto Exact Match evaluation",
stacktrace=str(traceback.format_exc()),
),
)

Expand All @@ -104,7 +105,8 @@ def auto_regex_test(
type="error",
value=None,
error=Error(
message="Error during Auto Regex evaluation", stacktrace=str(e)
message="Error during Auto Regex evaluation",
stacktrace=str(traceback.format_exc()),
),
)

Expand Down Expand Up @@ -187,15 +189,16 @@ def auto_webhook_test(
value=None,
error=Error(
message="Error during Auto Webhook evaluation; An HTTP error occurred",
stacktrace=str(e),
stacktrace=str(traceback.format_exc()),
),
)
except Exception as e: # pylint: disable=broad-except
return Result(
type="error",
value=None,
error=Error(
message="Error during Auto Webhook evaluation", stacktrace=str(e)
message="Error during Auto Webhook evaluation",
stacktrace=str(traceback.format_exc()),
),
)

Expand Down Expand Up @@ -225,7 +228,8 @@ def auto_custom_code_run(
type="error",
value=None,
error=Error(
message="Error during Auto Custom Code Evaluation", stacktrace=str(e)
message="Error during Auto Custom Code Evaluation",
stacktrace=str(traceback.format_exc()),
),
)

Expand Down Expand Up @@ -284,7 +288,7 @@ def auto_ai_critique(
value=None,
error=Error(
message="Error during Auto AI Critique",
stacktrace=traceback.format_exc(),
stacktrace=str(traceback.format_exc()),
),
)

Expand Down Expand Up @@ -312,7 +316,8 @@ def auto_starts_with(
type="error",
value=None,
error=Error(
message="Error during Starts With evaluation", stacktrace=str(e)
message="Error during Starts With evaluation",
stacktrace=str(traceback.format_exc()),
),
)

Expand All @@ -339,7 +344,10 @@ def auto_ends_with(
return Result(
type="error",
value=None,
error=Error(message="Error during Ends With evaluation", stacktrace=str(e)),
error=Error(
message="Error during Ends With evaluation",
stacktrace=str(traceback.format_exc()),
),
)


Expand All @@ -365,7 +373,10 @@ def auto_contains(
return Result(
type="error",
value=None,
error=Error(message="Error during Contains evaluation", stacktrace=str(e)),
error=Error(
message="Error during Contains evaluation",
stacktrace=str(traceback.format_exc()),
),
)


Expand Down Expand Up @@ -395,7 +406,8 @@ def auto_contains_any(
type="error",
value=None,
error=Error(
message="Error during Contains Any evaluation", stacktrace=str(e)
message="Error during Contains Any evaluation",
stacktrace=str(traceback.format_exc()),
),
)

Expand Down Expand Up @@ -426,7 +438,8 @@ def auto_contains_all(
type="error",
value=None,
error=Error(
message="Error during Contains All evaluation", stacktrace=str(e)
message="Error during Contains All evaluation",
stacktrace=str(traceback.format_exc()),
),
)

Expand Down Expand Up @@ -456,7 +469,8 @@ def auto_contains_json(
type="error",
value=None,
error=Error(
message="Error during Contains JSON evaluation", stacktrace=str(e)
message="Error during Contains JSON evaluation",
stacktrace=str(traceback.format_exc()),
),
)

Expand Down Expand Up @@ -515,7 +529,7 @@ def auto_levenshtein_distance(
value=None,
error=Error(
message="Error during Levenshtein threshold evaluation",
stacktrace=str(e),
stacktrace=str(traceback.format_exc()),
),
)

Expand Down Expand Up @@ -556,7 +570,7 @@ def auto_similarity_match(
value=None,
error=Error(
message="Error during Auto Similarity Match evaluation",
stacktrace=str(e),
stacktrace=str(traceback.format_exc()),
),
)

Expand Down
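Review note: most hunks in `evaluators_service.py` replace `stacktrace=str(e)` with `stacktrace=str(traceback.format_exc())`. The sketch below shows the difference between the two. `capture_both` is a hypothetical helper written only for this comparison. `traceback.format_exc()` already returns a `str`, so the extra `str(...)` wrapper in the diff is harmless but redundant.

```python
import traceback
from typing import Tuple


def capture_both() -> Tuple[str, str]:
    """Return (str(e), traceback.format_exc()) for the same exception."""
    try:
        int("not a number")  # raises ValueError
    except Exception as e:
        # str(e) holds only the exception message; format_exc() holds the
        # full traceback, including the exception type and call frames.
        return str(e), traceback.format_exc()
    return "", ""


short, full = capture_both()
```

Here `short` carries only the message text, while `full` begins with the `Traceback (most recent call last):` header and ends with the exception type and message. That is what lets the frontend show a usable stacktrace in the error tooltip.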