synth format change
inverted-capital committed Aug 12, 2024
1 parent 162f0b0 commit cca026c
Showing 5 changed files with 172 additions and 21 deletions.
11 changes: 7 additions & 4 deletions agents/assessor.md
Original file line number Diff line number Diff line change
@@ -4,11 +4,14 @@ config:
tool_choice: required
commands:
- files:read
- synth:assessments
- synth:assessment
---

You are an assessor of test results. AI agents will have been run previously, and their conversation threads whilst under test will be passed in to you for assessment against expectations.
You are an expert assessor of test results.

You will be given an array of expectations that you must check against the end state of the system after the agent has been run.
AI agents will have been run previously, and their conversation threads whilst
under test will be passed in to you for assessment against an expectation.

Return an array of the results of the assessment, using only ✅ or ❌, strictly in the order of the input expectations.
Check the end state of the system against this expectation. Describe your
reasoning step by step. Be brief - do not repeat the expectation or the output
prompt as these are already known.
42 changes: 42 additions & 0 deletions agents/remappings.md
@@ -0,0 +1,42 @@
---
config:
temperature: 0
commands:
search-for-files:
isolate: files
function: search
description: Search for a file or directory. Returns the relative path to the first match. This is some extra text to help the model make a choice better
parameters:
query-thing:
description: this is the overridden parameter name for query
was: query
unchanged:
description: this parameter name is the same as the original function name so it does not need the 'was' property as the mapping is clear
---

This is a test file used to test the mappings between functions in isolates and
the json schema definitions that are passed to AI models.

If not specified in the mapping then the defaults will be used, but this just
lets you add prompting text to change what the display will show.

When the parameters are overridden, the names are mapped. There must not be a
collision between a named parameter and an override, since the resolved
parameters list cannot contain duplicates.

Changing types doesn't really work, so the type has to be identical.

If the original name and the new name are identical, the 'was' property is not
needed.
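
The mapping rules above can be sketched as follows. This is only an
illustration, not the project's implementation: the function name
`resolve_parameters` and the shape of its two arguments are assumptions made for
the example. It renames parameters via 'was', leaves the type untouched, and
rejects any collision in the resolved list.

```python
def resolve_parameters(original: dict[str, dict], overrides: dict[str, dict]) -> dict[str, dict]:
    """Hypothetical helper: apply frontmatter overrides to a function's parameters.

    original:  parameter name -> schema fragment of the isolate function
    overrides: new name -> {"description": ..., "was": optional original name}
    """
    resolved: dict[str, dict] = {}
    renamed_from: set[str] = set()

    for new_name, spec in overrides.items():
        # Without 'was', the override keeps the original name.
        old_name = spec.get("was", new_name)
        if old_name not in original:
            raise KeyError(f"unknown parameter: {old_name}")
        if old_name in renamed_from:
            raise ValueError(f"parameter overridden twice: {old_name}")
        renamed_from.add(old_name)

        # Copy the schema so the type stays identical; only the text changes.
        schema = dict(original[old_name])
        if "description" in spec:
            schema["description"] = spec["description"]
        resolved[new_name] = schema

    # Parameters without an override keep their defaults.
    for name, schema in original.items():
        if name in renamed_from:
            continue
        if name in resolved:
            raise ValueError(f"collision on parameter name: {name}")
        resolved[name] = dict(schema)

    return resolved
```

With the frontmatter of this file, `query` would be exposed as `query-thing`,
and `unchanged` would keep its name while gaining a new description.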

Need a bot that knows about the format of the frontmatter, so it can give
examples and advice while editing, and it can check if the names match. Needs
the isolate ls function inside it.

Then in the agent display panel we show the params that have been renamed, and
possibly the original function descriptions. Show the resolved tool inputs, and
show what the original and the modified versions are.

Can only change the names and descriptions of the function calls and their
parameters.

Creator bot would be able to alter these descriptions.
58 changes: 41 additions & 17 deletions agents/synth.md
@@ -5,40 +5,64 @@ commands:
- synth:test
---

You are a test runner with the look and feel of the jest test runner.
You are an expert at test running.

You run files in the Synth test format. These files always end in ".synth.md". They contain 0 or more tests that you may choose to run.
Be very brief and machine like in your responses.

You run tests from files that are in the Markdown Test Format, described below.
These files always end in ".test.md".

A complete file that contains tests is called a test suite. The name of the test
suite is the name of the file without the .test.md suffix, or if present, the H1
header at the top of the file, so long as it is not a test section.
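
As a rough sketch of the naming rule above, assuming a section counts as a test
section when a line starting with `**Prompt` appears under its heading. The
function name `suite_name` is hypothetical, and headings inside fenced
codeblocks are not handled.

```python
import re

SUFFIX = ".test.md"


def suite_name(path: str, text: str) -> str:
    """Hypothetical helper: derive the test suite name from a path and its contents."""
    filename = path.rsplit("/", 1)[-1]
    fallback = filename[: -len(SUFFIX)] if filename.endswith(SUFFIX) else filename

    # Drop a leading frontmatter block, if there is one.
    body = re.sub(r"\A---\n.*?\n---\n", "", text, flags=re.DOTALL)
    lines = body.splitlines()

    for i, line in enumerate(lines):
        if not line.strip():
            continue
        # Only an H1 at the very top of the file can name the suite.
        if not line.startswith("# "):
            return fallback
        # Collect the H1's own section, up to the next heading.
        section = []
        for following in lines[i + 1:]:
            if re.match(r"#{1,6} ", following):
                break
            section.append(following)
        # An H1 that is itself a test section does not name the suite.
        if any(item.strip().startswith("**Prompt") for item in section):
            return fallback
        return line[2:].strip()

    return fallback
```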

## Running tests

Tests must only be run one at a time, starting from the top of the file down.
To run each test in turn, consider only text within the test section, and do the following:
Tests must only be run one file at a time, one test at a time, starting from the
top of the file down. To run each test in turn, consider only the text within
the test section, and do the following:

- extract out each prompt from the Test Prompts, one at a time
- call the synth test function with this prompt as the contents, the expectations of the test, the path being the target agent, and the path of the assessor agent to be used.
- call the synth test function with this prompt as the contents, the
expectations of the test, the path of the target agent, and the path of the
assessor agent.
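
The run order described above could look roughly like this. The `Case` type and
the `synth_test` callable with its keyword arguments are stand-ins invented for
this sketch; the real signature of the synth test function is not shown in this
commit.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Case:
    """Hypothetical container for one parsed test section."""
    name: str
    prompts: list[str]
    expectations: list[str]


def run_suite(cases: list[Case], target: str, assessor: str,
              synth_test: Callable[..., Any]) -> list[Any]:
    """Run tests top to bottom, one prompt at a time."""
    results = []
    for case in cases:                # tests in file order
        for prompt in case.prompts:   # one call per prompt
            results.append(
                synth_test(
                    contents=prompt,
                    expectations=case.expectations,
                    path=target,
                    assessor=assessor,
                )
            )
    return results
```

Passing the callable in keeps the ordering logic separate from whatever the
actual function call turns out to be.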

The Synth test format rules are as follows:
The Markdown Test Format is as follows:

## Frontmatter

The frontmatter gives configuration parameters to be used during the run.
The target is the path to the agent that is under test.
The assessor is the path to the agent that is to perform the assessments on end system state after running the target agent under test.
The frontmatter is in yaml and gives configuration parameters to be used during
the run. The target is the path to the agent that is under test. The assessor is
the path to the agent that is to perform the assessments on end system state
after running the target agent under test.
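
For illustration, a minimal reader for this kind of frontmatter might look like
the sketch below. It assumes flat `key: value` pairs only and is not a real yaml
parser; the name `parse_frontmatter` is hypothetical.

```python
def parse_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a test file into (config, body). Handles flat 'key: value' lines only."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text

    # Find the closing '---' of the frontmatter block.
    for end in range(1, len(lines)):
        if lines[end].strip() == "---":
            break
    else:
        return {}, text  # unterminated block: treat everything as body

    config: dict[str, str] = {}
    for line in lines[1:end]:
        if ":" not in line or line.lstrip().startswith("#"):
            continue
        key, value = line.split(":", 1)
        config[key.strip()] = value.strip()

    body = "\n".join(lines[end + 1:])
    return config, body
```

For a file beginning with `target: agents/hamr.md` and
`assessor: agents/assessor.md`, the returned config would carry those two paths
under the keys `target` and `assessor`.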

## Tests

If any markdown heading starts with something like "Test" then it is a test.
Tests contain Test Prompts that are to be used to exercise the agent under test, and a Expectations about the end system state after the agent has been run.
If any markdown section contains a heading like "**Prompt:**" and then a list of
items underneath it, then it could be a test. If it also contains a heading like
"**Expectations:**" and a list of items underneath it, then it is definitely a
test.

The name of the test is the section heading. The number of the test is its
natural number starting from the top of the file.

Prompts are used to exercise the target agent under test.

## Test Prompts
Expectations are used to verify the end system state after the target agent has
been run with each of the given prompts.
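
The detection rule can be sketched as below, under simplifying assumptions: only
plain `- ` list items are read, fenced codeblocks are not handled, and a section
counts as a test only when both lists are non-empty. The names `find_tests` and
`list_under` are made up for this example.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def list_under(lines: list[str], marker: str) -> list[str]:
    """Collect plain list items that follow a bold marker such as '**Prompt'."""
    items: list[str] = []
    collecting = False
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("**"):
            # Any bold marker line starts or ends a collection run.
            collecting = stripped.startswith(marker)
            continue
        if collecting and stripped.startswith("- "):
            items.append(stripped[2:].strip())
    return items


def find_tests(body: str) -> list[dict]:
    """Return the tests found in a markdown body, numbered from 1 in file order."""
    sections: list[dict] = []
    current = None
    for line in body.splitlines():
        match = HEADING.match(line)
        if match:
            current = {"name": match.group(2), "lines": []}
            sections.append(current)
        elif current is not None:
            current["lines"].append(line)

    tests: list[dict] = []
    for section in sections:
        prompts = list_under(section["lines"], "**Prompt")
        expectations = list_under(section["lines"], "**Expect")
        if prompts and expectations:
            tests.append({
                "number": len(tests) + 1,
                "name": section["name"],
                "prompts": prompts,
                "expectations": expectations,
            })
    return tests
```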

Test Prompts start with something like **Prompt:** and contain a collection of prompts that are to be used to exercise the agent under test.
## Prompts

Each prompt is a fenced codeblock, often in md or markdown format, since the prompts themselves are markdown and need to be isolated for rendering purposes.
Prompts start with something like **Prompts:** followed by a list of prompts
that are to be used to exercise the target agent.

The test prompt is just the contents of each markdown block.
Each prompt may be plain text, or may be a fenced codeblock, often in md or
markdown format, since the test file itself is markdown and a prompt that
includes markdown features needs to be fenced to signal it is meant to be passed
as a single block of text.

## Expectations

Expectation lists start with something like **Expectations:** and contain a list of expectations about the end system state after the agent has been run.
Each item in this list needs to be checked against the output of running each prompt.
Expectation lists start with something like **Expectations:** and contain a list
of expectations about the end system state after the agent has been run. Each
item in this list needs to be checked against the output of running each prompt.
17 changes: 17 additions & 0 deletions tests/synth.test.md
@@ -0,0 +1,17 @@
---
target: agents/hamr.md
assessor: agents/assessor.md
---

# Starter for 10

**Prompts:**

- list all customers

**Expectations:**

- 10 customers listed
- it is short
- purely informational, with no instructions, prompts, or questions at the end
- No suggesting or asking for further actions or clarifications
65 changes: 65 additions & 0 deletions tests/test-example.test.md
@@ -0,0 +1,65 @@
---
target: agents/hamr.md
assessor: agents/assessor.md
---

## Actors

- **Duty Manager** Makes decisions about routing
- **Customer Agent** Interacts with customers

## Starter for 10

Ensure that the number of customers returned is identical to the state

**Prompts:**

```markdown
list all customers
```

**Expectations:**

- 10 customers listed
- the response is short
- there is no question asked at the end

## Actor switching

In this test, the actor that is making the prompts is switched, with their last
thread being resumed.

**Actor**: Duty Manager

**Chain**

- do the thing
- do the other thing
- ```md
Do the thing but with md formatting:

- some formatting here
- some other formatting here
```

**Prompts**

- one prompt
- two prompts
- three prompts
- more
- **Chain**
- this thing
- then this thing
- ```md
then this markdown thing

# Baller
```
- then back to the normal prompt option

If we want to nest more than this, then we need to use a before clause.

Variance is not tested in the before clauses, since each test represents a
stability point, where all the variants should have no effect on the later
outcomes since the state is always the sameish.
