[Issue]: Workflow terminates after 2 tokens when using AutogenStudio with LM Studio #2445
Comments
I have the exact same issue with LM Studio and AutoGen Studio.
I just encountered the same issue. My LM Studio logs are essentially identical to what you have shared.
Rolling back to v0.0.53 works.
If anyone is interested in a "work-around" until the fix, you can create a new model (exactly like your previous one); see the sketch below.
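For anyone driving AutoGen from Python rather than the Studio UI, here is a minimal sketch of an equivalent model setup. The assumptions here are mine, not confirmed in this thread: LM Studio serving on its default local endpoint (http://localhost:1234/v1), pyautogen's standard config_list format, and an explicit max_tokens as the thing that distinguishes the new model from the old one (the failing requests in the server log below send max_tokens: null).

```python
# Minimal sketch of a pyautogen config pointing at LM Studio.
# Assumptions: default LM Studio endpoint; explicit max_tokens
# instead of the null value seen in the failing requests.
import autogen

config_list = [
    {
        "model": "lmstudio-ai/gemma-2b-it-GGUF/gemma-2b-it-q8_0.gguf",
        "base_url": "http://localhost:1234/v1",  # LM Studio's OpenAI-compatible server
        "api_key": "lm-studio",                  # LM Studio ignores the key, but one must be set
    }
]

llm_config = {
    "config_list": config_list,
    "max_tokens": 1024,   # explicit cap rather than null
    "temperature": 0.1,
}

assistant = autogen.AssistantAgent(name="assistant", llm_config=llm_config)
```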
cc @victordibia for awareness
@ALL, thanks for flagging this. This has been fixed in the current dev branch for AGS (PR here). It should be merged into main soon.
@victordibia Thanks!
@brustulim Thanks for the idea. I was going to try something like that, but I was not sure if it was the right solution.
You seem to have mentioned having issues with
@victordibia Thank you for providing the v0.0.56rc3 version. It fixes my problem with the two-token limit when using LM Studio. However, I found that Python code generated by primary-assistant cannot be run. Could you kindly help me figure out what happened? I have included the multi-turn conversation between userproxy and primary-assistant. Thank you very much.
@victordibia Thank you. The issue is solved.
I'm using .NET code with LMStudio and it always fails on the second call. It works as expected when
Describe the issue
If I create a model in AutoGen Studio that points to the LM Studio endpoint, add that model to an agent, then a workflow, and so on, the workflow terminates after 2 tokens when I run it. Everything works perfectly with Ollama, but every time I try LM Studio I hit this problem, regardless of which model I load. This is the output from the LM Studio server log (a reproduction sketch follows the log):
[2024-04-19 16:25:21.195] [INFO] [LM STUDIO SERVER] Processing queued request...
[2024-04-19 16:25:21.197] [INFO] Received POST request to /v1/chat/completions with body: {
"messages": [
{
"content": "You are a helpful AI assistant",
"role": "system"
},
{
"content": "what is the federal capital of australia",
"role": "user"
}
],
"model": "gemma",
"max_tokens": null,
"stream": false,
"temperature": 0.1
}
[2024-04-19 16:25:21.198] [INFO] [LM STUDIO SERVER] Context Overflow Policy is: Rolling Window
[2024-04-19 16:25:21.201] [INFO] [LM STUDIO SERVER] Last message: { role: 'user', content: 'what is the federal capital of australia' } (total messages = 2)
[2024-04-19 16:25:21.696] [INFO] [LM STUDIO SERVER] Accumulating tokens ... (stream = false)
[2024-04-19 16:25:21.698] [INFO] [lmstudio-ai/gemma-2b-it-GGUF/gemma-2b-it-q8_0.gguf] Accumulated 1 tokens: The
[2024-04-19 16:25:21.799] [INFO] [lmstudio-ai/gemma-2b-it-GGUF/gemma-2b-it-q8_0.gguf] Accumulated 2 tokens: The Federal
[2024-04-19 16:25:21.911] [INFO] [LM STUDIO SERVER] [lmstudio-ai/gemma-2b-it-GGUF/gemma-2b-it-q8_0.gguf] Generated prediction: {
"id": "chatcmpl-zbpbxj8o70qatvsd8fz9ri",
"object": "chat.completion",
"created": 1713507921,
"model": "lmstudio-ai/gemma-2b-it-GGUF/gemma-2b-it-q8_0.gguf",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "The Federal"
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 2,
"completion_tokens": 2,
"total_tokens": 4
}
}
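Note that the request body above carries "max_tokens": null and the reply stops after exactly 2 completion tokens, which points at the request payload rather than the model. Below is a minimal sketch for probing this outside AutoGen Studio, assuming LM Studio's default local endpoint and the openai Python client; the concrete max_tokens value is illustrative.

```python
# Replays the request from the log above against LM Studio's
# OpenAI-compatible server (default local endpoint assumed).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

resp = client.chat.completions.create(
    model="lmstudio-ai/gemma-2b-it-GGUF/gemma-2b-it-q8_0.gguf",
    messages=[
        {"role": "system", "content": "You are a helpful AI assistant"},
        {"role": "user", "content": "what is the federal capital of australia"},
    ],
    temperature=0.1,
    max_tokens=1024,  # the failing request sent max_tokens: null
)
print(resp.choices[0].message.content)
```

If the call completes normally with an explicit max_tokens but truncates when max_tokens is null, the problem lies in the payload AutoGen Studio builds, not in LM Studio or the model itself.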
Steps to reproduce
This reproduces consistently on my setup. I have the latest LM Studio and Ollama. Any time I create a model that runs on the LM Studio server I hit this problem, but never with the Ollama server.
Screenshots and logs
No response
Additional Information
autogenstudio v0.0.56
LM Studio 0.2.19 with lmstudio-ai/gemma-2b-it-GGUF/gemma-2b-it-q8_0.gguf
Ollama 0.1.32 with llama3:latest (8B)