docs: test llm lock to version 1 evals orb #8999

Merged
merged 3 commits on Oct 3, 2024

@@ -13,6 +13,11 @@ contentTags:

This page describes common methods for testing applications powered by large language models (LLMs) through evaluations.

[NOTE]
====
This documentation is only applicable to link:https://circleci.com/developer/orbs/orb/circleci/evals?version=1.0.8[CircleCI Evals Orb versions 1.x.x].
====
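
For reference, a minimal sketch of how the orb can be pinned to a 1.x release in `.circleci/config.yml`. The version shown mirrors the release linked above; any 1.x.x tag keeps this documentation applicable.

[source,yaml]
----
version: 2.1

orbs:
  # Pin the evals orb to a 1.x release so the behaviour described on this page applies
  evals: circleci/evals@1.0.8
----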

== Evaluations overview

Evaluations, also known as evals, are a methodology for assessing the quality of AI software.
@@ -41,7 +46,7 @@ Evaluations can cover many aspects of a model's performance, including:

Evaluations can be expressed as classic software tests, typically characterised by the "input, expected output, assertion" format, and as such they can be automated into CircleCI pipelines.
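
As a rough sketch of that automation (not the orb's own syntax), an evaluation suite can run as an ordinary pipeline job; the `run_evals.py` script, its flags, and the workflow name below are hypothetical placeholders.

[source,yaml]
----
version: 2.1

jobs:
  run-evals:
    docker:
      - image: cimg/python:3.12
    steps:
      - checkout
      - run:
          name: Run evaluation suite
          # Each eval follows the "input, expected output, assertion" pattern
          command: python run_evals.py --output results/evals.json
      - store_artifacts:
          path: results/evals.json

workflows:
  evals:
    jobs:
      - run-evals
----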

There are two important differences between evals and classic software tests to keep in mind:
Two important differences between evals and classic software tests to keep in mind:

* LLMs are predominantly non-deterministic, leading to flaky evaluations, unlike deterministic software tests.
* Evaluation results are subjective. Small regressions in a metric might not necessarily be a cause for concern, unlike failing tests in regular software testing.
@@ -65,7 +70,7 @@ Given the volatile nature of evaluations, evaluations orchestrated by the Circle

Instead, a summary of the evaluation results is created and presented:

* As a comment on the corresponding GitHub pull request (currently available only for projects integrated with xref:github-integration#[Github OAuth]):
* As a comment on the corresponding GitHub pull request (currently available only for projects integrated with xref:github-integration#[GitHub OAuth]):
+
image::/docs/assets/img/docs/llmops/github-pr-comment.png[Jobs overview]
