From b3ae97593b1ec1ff7e0f44940b371c8ff470c6f9 Mon Sep 17 00:00:00 2001
From: stiyyagura0901
Date: Wed, 2 Oct 2024 23:52:25 -0400
Subject: [PATCH 1/3] docs: test llm lock to version 1 evals orb

---
 ...testing-llm-enabled-applications-through-evaluations.adoc | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/jekyll/_cci2/testing-llm-enabled-applications-through-evaluations.adoc b/jekyll/_cci2/testing-llm-enabled-applications-through-evaluations.adoc
index 60940da0804..067fdda3c44 100644
--- a/jekyll/_cci2/testing-llm-enabled-applications-through-evaluations.adoc
+++ b/jekyll/_cci2/testing-llm-enabled-applications-through-evaluations.adoc
@@ -13,6 +13,11 @@ contentTags:
 
 This page describes common methods for testing applications powered by large language models (LLMs) through evaluations.
 
+[NOTE]
+====
+This documentation is only applicable for link:https://circleci.com/developer/orbs/orb/circleci/evals?version=1.0.8[CircleCI Evals Orb versions 1.x.x].
+====
+
 == Evaluations overview
 
 Evaluations, also known as evals, are a methodology for assessing the quality of AI software.

From be6003104692e3a1896ca7cb87c3f9116fe0147a Mon Sep 17 00:00:00 2001
From: stiyyagura0901
Date: Wed, 2 Oct 2024 23:58:11 -0400
Subject: [PATCH 2/3] fix lint errors

---
 .../testing-llm-enabled-applications-through-evaluations.adoc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/jekyll/_cci2/testing-llm-enabled-applications-through-evaluations.adoc b/jekyll/_cci2/testing-llm-enabled-applications-through-evaluations.adoc
index 067fdda3c44..e2b78eb7ddd 100644
--- a/jekyll/_cci2/testing-llm-enabled-applications-through-evaluations.adoc
+++ b/jekyll/_cci2/testing-llm-enabled-applications-through-evaluations.adoc
@@ -46,7 +46,7 @@ Evaluations can cover many aspects of a model performance, including:
 
 Evaluations can be expressed as classic software tests, typically characterised by the "input, expected output, assertion" format, and as such they can be automated into CircleCI pipelines.
 
-There are two important differences between evals and classic software tests to keep in mind:
+There is two important differences between evals and classic software tests to keep in mind:
 
 * LLMs are predominantly non-deterministic, leading to flaky evaluations, unlike deterministic software tests.
 * Evaluation results are subjective. Small regressions in a metric might not necessarily be a cause for concern, unlike failing tests in regular software testing.
@@ -70,7 +70,7 @@ Given the volatile nature of evaluations, evaluations orchestrated by the Circle
 
 Instead, a summary of the evaluation results is created and presented:
 
-* As a comment on the corresponding GitHub pull request (currently available only for projects integrated with xref:github-integration#[Github OAuth]):
+* As a comment on the corresponding GitHub pull request (currently available only for projects integrated with xref:github-integration#[GitHub OAuth]):
 +
 image::/docs/assets/img/docs/llmops/github-pr-comment.png[Jobs overview]

From 31dfce80fa5e3ebb50ac041408f6492ad1c11d28 Mon Sep 17 00:00:00 2001
From: stiyyagura0901
Date: Thu, 3 Oct 2024 00:01:19 -0400
Subject: [PATCH 3/3] fix lint errors

---
 .../testing-llm-enabled-applications-through-evaluations.adoc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/jekyll/_cci2/testing-llm-enabled-applications-through-evaluations.adoc b/jekyll/_cci2/testing-llm-enabled-applications-through-evaluations.adoc
index e2b78eb7ddd..a4f1ade4f5c 100644
--- a/jekyll/_cci2/testing-llm-enabled-applications-through-evaluations.adoc
+++ b/jekyll/_cci2/testing-llm-enabled-applications-through-evaluations.adoc
@@ -46,7 +46,7 @@ Evaluations can cover many aspects of a model performance, including:
 
 Evaluations can be expressed as classic software tests, typically characterised by the "input, expected output, assertion" format, and as such they can be automated into CircleCI pipelines.
 
-There is two important differences between evals and classic software tests to keep in mind:
+Two important differences between evals and classic software tests to keep in mind:
 
 * LLMs are predominantly non-deterministic, leading to flaky evaluations, unlike deterministic software tests.
 * Evaluation results are subjective. Small regressions in a metric might not necessarily be a cause for concern, unlike failing tests in regular software testing.
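The NOTE introduced in PATCH 1/3 ties the page to the 1.x line of the evals orb. For context, here is a minimal sketch of what that pin looks like in a project's `.circleci/config.yml`: the orb name `circleci/evals` and the `1.0.8` version come from the link in the NOTE, while the remaining keys are standard CircleCI 2.1 config with project-specific jobs and workflows omitted.

[source,yaml]
----
version: 2.1

orbs:
  # Pin to a 1.x.x release of the evals orb, the version line the page covers.
  # 1.0.8 is the release referenced in the NOTE's link; any 1.x.x tag applies.
  evals: circleci/evals@1.0.8

# Jobs and workflows that invoke the orb are project-specific and not shown here.
----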