synth format change

dreamcatcher-tech · Aug 12, 2024 · cca026c
1 parent 162f0b0
commit cca026c

Showing 5 changed files with 172 additions and 21 deletions.
diff --git a/agents/assessor.md b/agents/assessor.md
@@ -4,11 +4,14 @@ config:
   tool_choice: required
 commands:
   - files:read
-  - synth:assessments
+  - synth:assessment
 ---
 
-You are an assessor of test results. AI agents will have been run previously, and their conversation threads whilst under test will be passed in to you for assessment against expectations.
+You are an expert assessor of test results.
 
-You will be given an array of expectations that you must check against the end state of the system after the agent has been run.
+AI agents will have been run previously, and their conversation threads whilst
+under test will be passed in to you for assessment against an expectation.
 
-Return an array of the results of the assessment, using only ✅ or ❌, strictly in the order of the input expectations.
+Check the end state of the system against this expectation. Describe your
+reasoning step by step. Be brief - do not repeat the expectation or the output
+prompt as these are already known.
diff --git a/agents/remappings.md b/agents/remappings.md
@@ -0,0 +1,42 @@
+---
+config:
+  temperature: 0
+commands:
+  search-for-files:
+    isolate: files
+    function: search
+    description: Search for a file or directory.  Returns the relative path to the first match. This is some extra text to help the model make a choice better
+    parameters:
+      query-thing:
+        description: this is the overridden parameter name for query
+        was: query
+      unchanged:
+        description: this parameter name is the same as the original function name so it does not need the 'was' property as the mapping is clear
+---
+
+This is a test file used to test the mappings between functions in isolates and
+the json schema definitions that are passed to AI models.
+
+If not specified in the mapping then the defaults will be used, but this just
+lets you add prompting text to change what the display will show.
+
+When the parameters are overridden, the names are mapped. There must not be a
+collision between a named parameter and an override, as the resolved parameters
+list cannot contain duplicates.
+
+Changing types doesn't really work, so the type has to be identical.
+
+If the original name and the new name are identical, the 'was' is not needed.
+
+Need a bot that knows about the format of the frontmatter, so it can give
+examples and advice while editing, and it can check if the names match. Needs
+the isolate ls function inside it.
+
+Then in the agent display panel we show the params that have been renamed, and
+possibly the original function descriptions. Show the resolved tools inputs, and
+show what the original and the modified versions are.
+
+Can only change the names and descriptions of the function calls and their
+parameters.
+
+Creator bot would be able to alter these descriptions.
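
The remapping rules described in agents/remappings.md can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the function name `resolve_parameters`, the shape of the `original` properties dict, and the example schema are all assumptions made for this sketch. It renames parameters via `was`, overrides descriptions, leaves types untouched, and rejects collisions in the resolved list.

```python
from copy import deepcopy


def resolve_parameters(original: dict, overrides: dict) -> dict:
    """Return a remapped copy of `original` (hypothetical helper).

    `original` maps parameter names to JSON-schema-like dicts.
    `overrides` maps new names to {"description": ..., "was": ...}.
    """
    resolved = deepcopy(original)
    for new_name, override in overrides.items():
        # Without 'was', the override applies to the parameter of the same name.
        old_name = override.get("was", new_name)
        if old_name not in resolved:
            raise KeyError(f"unknown or already remapped parameter: {old_name}")
        if new_name != old_name and new_name in resolved:
            raise ValueError(f"collision on parameter name: {new_name}")
        schema = resolved.pop(old_name)
        # Only names and descriptions may change; the type is left as it was.
        if "description" in override:
            schema["description"] = override["description"]
        resolved[new_name] = schema
    return resolved


# Assumed schema for the 'search' function, for illustration only.
original = {
    "query": {"type": "string", "description": "original description"},
    "unchanged": {"type": "string", "description": "original description"},
}
overrides = {
    "query-thing": {"description": "overridden name for query", "was": "query"},
    "unchanged": {"description": "same name, new description"},
}
resolved = resolve_parameters(original, overrides)
```

With these inputs, `resolved` has the keys `query-thing` and `unchanged`, both still of type `string`, and `original` is not modified.
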
diff --git a/agents/synth.md b/agents/synth.md
@@ -5,40 +5,64 @@ commands:
   - synth:test
 ---
 
-You are a test runner with the look and feel of the jest test runner.
+You are an expert at test running.
 
-You run files in the Synth test format. These files always end in ".synth.md". They contain 0 or more tests that you may choose to run.
+Be very brief and machine-like in your responses.
+
+You run tests from files that are in the Markdown Test Format, described below.
+These files always end in ".test.md".
+
+A complete file that contains tests is called a test suite. The name of the test
+suite is the name of the file without the .test.md suffix, or if present, the H1
+header at the top of the file so long as it is not a test section.
 
 ## Running tests
 
-Tests must only be run one at a time, starting from the top of the file down.
-To run each test in turn, consider only text within the test section, and do the following:
+Tests must only be run one file at a time, one test at a time, starting from the
+top of the file down. To run each test in turn, consider only the text within
+the test section, and do the following:
 
 - extract out each prompt from the Test Prompts, one at a time
-- call the synth test function with this prompt as the contents, the expectations of the test, the path being the target agent, and the path of the assessor agent to be used.
+- call the synth test function with this prompt as the contents, the
+  expectations of the test, the path of the target agent, and the path of the
+  assessor agent.
 
-The Synth test format rules are as follows:
+The Markdown Test Format is as follows:
 
 ## Frontmatter
 
-The frontmatter gives configuration parameters to be used during the run.
-The target is the path to the agent that is under test.
-The assessor is the path to the agent that is to perform the assessments on end system state after running the target agent under test.
+The frontmatter is in yaml and gives configuration parameters to be used during
+the run. The target is the path to the agent that is under test. The assessor is
+the path to the agent that is to perform the assessments on end system state
+after running the target agent under test.
 
 ## Tests
 
-If any markdown heading starts with something like "Test" then it is a test.
-Tests contain Test Prompts that are to be used to exercise the agent under test, and a Expectations about the end system state after the agent has been run.
+If any markdown section contains a heading like "**Prompt:**" and then a list
+of items underneath it, then it could be a test. If it also contains a heading
+like "**Expectations:**" and a list of items underneath it, then it is
+definitely a test.
+
+The name of the test is the section heading. The number of the test is its
+natural number starting from the top of the file.
+
+Prompts are used to exercise the target agent under test.
 
-## Test Prompts
+Expectations are used to verify the end system state after the target agent has
+been run with each of the given prompts.
 
-Test Prompts start with something like **Prompt:** and contain a collection of prompts that are to be used to exercise the agent under test.
+## Prompts
 
-Each prompt is a fenced codeblock, often in md or markdown format, since the prompts themselves are markdown and need to be isolated for rendering purposes.
+Prompts start with something like **Prompts:** followed by a list of prompts
+that are to be used to exercise the target agent.
 
-The test prompt is just the contents of each markdown block.
+Each prompt may be plain text, or may be a fenced codeblock, often in md or
+markdown format, since the test file itself is markdown and a prompt that
+includes markdown features needs to be fenced to signal it is meant to be passed
+as a single block of text.
 
 ## Expectations
 
-Expectation lists start with something like **Expectations:** and contain a list of expectations about the end system state after the agent has been run.
-Each item in this list needs to be checked against the output of running each prompt.
+Expectation lists start with something like **Expectations:** and contain a list
+of expectations about the end system state after the agent has been run. Each
+item in this list needs to be checked against the output of running each prompt.
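
The test detection rules in agents/synth.md could be approximated in code along these lines. In the commit itself the rules are prompt text interpreted by a model, so this parser is only a rough sketch under stated assumptions: the name `parse_tests`, the returned dict shape, and the decision to ignore nested list items are all choices made for this illustration. A section counts as a test only when it has both a prompts list and an expectations list; fenced prompts are kept as a single block of text.

```python
import re
import textwrap

FENCE = "`" * 3  # a markdown code fence marker
HEADING = re.compile(r"^#{1,6}\s+(.*\S)\s*$")
LABEL = re.compile(r"^\*\*(\w+):?\*\*:?\s*$")
ITEM = re.compile(r"^-\s+(.*\S)\s*$")  # top-level list items only
LISTS = {"prompt": "prompts", "prompts": "prompts", "expectations": "expectations"}


def parse_tests(markdown: str) -> list[dict]:
    """Return numbered tests found in a Markdown Test Format file (sketch)."""
    lines = markdown.splitlines()
    # Skip the yaml frontmatter so it is never read as markdown.
    if lines and lines[0].strip() == "---":
        end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), 0)
        lines = lines[end + 1:]
    sections, section, current, fence = [], None, None, None
    for line in lines:
        stripped = line.strip()
        body = stripped[2:].lstrip() if stripped.startswith("- ") else stripped
        if fence is not None:
            if stripped.startswith(FENCE):
                # A fenced block becomes one entry in the active list, if any.
                if current is not None:
                    current.append(textwrap.dedent("\n".join(fence)).strip())
                fence = None
            else:
                fence.append(line)
            continue
        if body.startswith(FENCE):
            fence = []
            continue
        heading, label, item = HEADING.match(line), LABEL.match(stripped), ITEM.match(line)
        if heading:
            section = {"name": heading.group(1), "prompts": [], "expectations": []}
            sections.append(section)
            current = None
        elif label:
            key = LISTS.get(label.group(1).lower())
            current = section[key] if section is not None and key else None
        elif item and current is not None:
            current.append(item.group(1))
    # Only sections with both lists are definitely tests; number them from 1.
    tests = [s for s in sections if s["prompts"] and s["expectations"]]
    return [{"number": n, **t} for n, t in enumerate(tests, start=1)]
```

Calling `parse_tests` on the text of a `.test.md` file returns one dict per test, in file order, with `number`, `name`, `prompts` and `expectations` keys.
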
diff --git a/tests/synth.test.md b/tests/synth.test.md
@@ -0,0 +1,17 @@
+---
+target: agents/hamr.md
+assessor: agents/assessor.md
+---
+
+# Starter for 10
+
+**Prompts:**
+
+- list all customers
+
+**Expectations:**
+
+- 10 customers listed
+- it is short
+- purely informational, with no instructions, prompts, or questions at the end
+- No suggesting or asking for further actions or clarifications
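
For a suite like tests/synth.test.md, the runner instructions in agents/synth.md imply one call to the synth test function per prompt, in file order. The loop below is a sketch of that flow only. The `synth_test` callable and its keyword names (`contents`, `expectations`, `path`, `assessor`) are assumptions standing in for the real `synth:test` command, whose signature is not shown in this commit.

```python
from typing import Callable


def run_suite(frontmatter: dict, tests: list[dict],
              synth_test: Callable[..., object]) -> list[dict]:
    """Run each prompt of each test, one at a time, top of file down (sketch)."""
    results = []
    for test in tests:
        for prompt in test["prompts"]:
            outcome = synth_test(
                contents=prompt,                    # the prompt text itself
                expectations=test["expectations"],  # checked after the run
                path=frontmatter["target"],         # agent under test
                assessor=frontmatter["assessor"],   # agent doing the assessment
            )
            results.append({"test": test["name"], "prompt": prompt, "outcome": outcome})
    return results


# Hypothetical stand-in that records its calls instead of running an agent.
calls = []


def fake_synth_test(**kwargs):
    calls.append(kwargs)
    return "recorded"


frontmatter = {"target": "agents/hamr.md", "assessor": "agents/assessor.md"}
tests = [{
    "name": "Starter for 10",
    "prompts": ["list all customers"],
    "expectations": ["10 customers listed", "it is short"],
}]
results = run_suite(frontmatter, tests, fake_synth_test)
```

Passing the callable in keeps the loop independent of how the agent is actually invoked, which also makes it easy to exercise with a stand-in like `fake_synth_test`.
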
diff --git a/tests/test-example.test.md b/tests/test-example.test.md
@@ -0,0 +1,65 @@
+---
+target: agents/hamr.md
+assessor: agents/assessor.md
+---
+
+## Actors
+
+- **Duty Manager** Makes decisions about routing
+- **Customer Agent** Interacts with customers
+
+## Starter for 10
+
+Ensure that the number of customers returned is identical to the state.
+
+**Prompts:**
+
+```markdown
+list all customers
+```
+
+**Expectations:**
+
+- 10 customers listed
+- the response is short
+- there is no question asked at the end
+
+## Actor switching
+
+In this test, the actor that is making the prompts is switched, with their last
+thread being resumed.
+
+**Actor**: Duty Manager
+
+**Chain**
+
+- do the thing
+- do the other thing
+- ```md
+  Do the thing but with md formatting:
+
+  - some formatting here
+  - some other formatting here
+  ```
+
+**Prompts**
+
+- one prompt
+- two prompts
+- three prompts
+- more
+- **Chain**
+  - this thing
+  - then this thing
+  - ```md
+    then this markdown thing
+
+    # Baller
+    ```
+- then back to the normal prompt option
+
+If we want to nest more than this, then we need to use a before clause.
+
+Variance is not tested in the before clauses, since each test represents a
+stability point, where all the variants should have no effect on the later
+outcomes since the state is always the sameish.