Updates on existing solvers and bugged tool eval #1506
Merged
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
@JunShern will review this
Wrap solvers with completion functions for compatibility with pre-solver Evals. This means you can execute all evals using solvers. 49fd9ef
Add context length information about gpt-4-turbo-preview and gpt-4-0125-preview. 9a0ab1c
Move oai and together solvers into providers / subdir 063bf4f
Update the default task descriptions for bugged tools. We added more information when using gemini + OS models, since they got confused. 0523dd4
Modified the default solver chain-of-thought prompt, as well as other custom chain-of-thought prompts used in some evals. The default CoTSolver prompts were a bit misleading in some cases; we observed GeminiSolver working too hard to arrive at a final answer for the whole eval when it's in fact supposed to give just a response for the next turn. 287f3cf