Is your feature request related to a problem? Please describe.
LLMs were trained on a much larger corpus of Python code than of JSON, so the quality of their structured output tends to be better when they produce Python.
Describe the solution you'd like
Here is how it works: instead of asking the LLM to produce a valid JSON response, you ask it to generate Python code that builds the response, then execute that code and extract the results (see the sketch below).
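A minimal sketch of the idea, not a concrete API proposal. The prompt wording and the `call_llm` helper are hypothetical placeholders for however the framework already talks to the model; the point is only that the "structured output" comes back as executable Python rather than JSON:

```python
# Sketch: ask the model for Python code instead of JSON, run the code, and
# read the result from a well-known variable. `call_llm` is a hypothetical
# function that returns the model's raw text completion.

PROMPT = """\
Write Python code that extracts the person's name and age from the text below
and assigns them as a dict to a variable called `result`.
Return only the code, with no explanations.

Text: "Alice is a 34-year-old engineer from Berlin."
"""

def extract_with_python(call_llm) -> dict:
    code = call_llm(PROMPT)       # e.g. "result = {'name': 'Alice', 'age': 34}"
    namespace: dict = {}
    exec(code, namespace)         # in practice this must run in a sandbox, see below
    return namespace["result"]    # structured output, no JSON parsing or repair needed
```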
Describe alternatives you've considered
JSON-Mode
Additional context
This request is inspired by HuggingFace SmolAgents.
For this, you'll need a safe Python execution environment such as https://github.com/e2b-dev/E2B
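To avoid guessing E2B's exact SDK calls, here is only a stand-in for the isolation idea: run the generated code in a separate interpreter with a timeout and capture the JSON it prints. This is not a real security boundary; a production setup would hand the code to a proper sandbox service such as E2B instead.

```python
# Stand-in for sandboxed execution (NOT a real security boundary): run the
# generated code in a separate Python process with a timeout and read its
# stdout. A real deployment would delegate this step to a sandbox like E2B.
import json
import subprocess
import sys

def run_generated_code(code: str, timeout: float = 5.0) -> dict:
    # The generated code is expected to print a single JSON document on stdout.
    proc = subprocess.run(
        [sys.executable, "-I", "-c", code],  # -I: isolated mode, ignores env vars and user site-packages
        capture_output=True, text=True, timeout=timeout,
    )
    proc.check_returncode()
    return json.loads(proc.stdout)

print(run_generated_code(
    "import json; print(json.dumps({'name': 'Alice', 'age': 34}))"
))
```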
JSON vs. Python benchmarking
Subjectively, the JSON-oriented benchmarks fluctuate around 88% accuracy, while the Python code benchmarks sit around 91%.
JSON Benchmarks: Tool Use (BFCL), MMLU (including tool use and function calling), MT-Bench
Python Benchmarks: Code (HumanEval-python), HumanEval+, HumanEval-Pro, LiveCodeBench (Pass@1-COT), Codeforces (Percentile/Rating), SWE Verified (Resolved), Aider-Polyglot (Acc.)
https://blog.langchain.dev/extraction-benchmarking/
https://www.promptlayer.com/blog/llm-benchmarks-a-comprehensive-guide-to-ai-model-evaluation
https://llm-stats.com/
https://paperswithcode.com/sota/code-generation-on-mbpp
https://paperswithcode.com/sota/code-generation-on-humaneval
https://huggingface.co/spaces/bigcode/bigcode-models-leaderboard
https://anonymous.4open.science/r/PythonSaga/README.md
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/
https://huggingface.co/spaces/lmarena-ai/chatbot-arena-leaderboard
https://github.com/OpenBMB/ToolBench?tab=readme-ov-file#-model-experiments-results
https://gorilla.cs.berkeley.edu/leaderboard.html#leaderboard
https://answers111.github.io/evalpro.github.io/leaderboard.html
https://evalplus.github.io/repoqa.html
https://evalplus.github.io/evalperf.html
https://github.com/svilupp/Julia-LLM-Leaderboard?tab=readme-ov-file#julia-llm-leaderboard
https://huggingface.co/spaces/qiantong-xu/toolbench-leaderboard
https://scale.com/leaderboard/coding
https://huggingface.co/spaces/opencompass/opencompass-llm-leaderboard
https://aider.chat/docs/leaderboards/
https://www.swebench.com/#verified
https://codeforces.com/profile/Leaderboard?graphType=all
https://livecodebench.github.io/leaderboard.html
https://www.llmcodearena.com/top-models