
[Issue]: Workflow terminates after 2 tokens when using AutogenStudio with LM Studio #2445

Open
Freffles opened this issue Apr 19, 2024 · 12 comments
Labels
0.2 (issues filed before the re-arch to 0.4) · needs-triage · proj-studio (related to AutoGen Studio)

Comments

@Freffles

Describe the issue

If I create a model in AutoGen Studio that points to the LM Studio endpoint, add the model to an agent, and then add the agent to a workflow, the workflow terminates after 2 tokens when I run it. The same setup works perfectly with Ollama, but with LM Studio the problem occurs every time, regardless of which model I use. This is the output from the LM Studio server log.

[2024-04-19 16:25:21.195] [INFO] [LM STUDIO SERVER] Processing queued request...
[2024-04-19 16:25:21.197] [INFO] Received POST request to /v1/chat/completions with body: {
  "messages": [
    {
      "content": "You are a helpful AI assistant",
      "role": "system"
    },
    {
      "content": "what is the federal capital of australia",
      "role": "user"
    }
  ],
  "model": "gemma",
  "max_tokens": null,
  "stream": false,
  "temperature": 0.1
}
[2024-04-19 16:25:21.198] [INFO] [LM STUDIO SERVER] Context Overflow Policy is: Rolling Window
[2024-04-19 16:25:21.201] [INFO] [LM STUDIO SERVER] Last message: { role: 'user', content: 'what is the federal capital of australia' } (total messages = 2)
[2024-04-19 16:25:21.696] [INFO] [LM STUDIO SERVER] Accumulating tokens ... (stream = false)
[2024-04-19 16:25:21.698] [INFO] [lmstudio-ai/gemma-2b-it-GGUF/gemma-2b-it-q8_0.gguf] Accumulated 1 tokens: The
[2024-04-19 16:25:21.799] [INFO] [lmstudio-ai/gemma-2b-it-GGUF/gemma-2b-it-q8_0.gguf] Accumulated 2 tokens: The Federal
[2024-04-19 16:25:21.911] [INFO] [LM STUDIO SERVER] [lmstudio-ai/gemma-2b-it-GGUF/gemma-2b-it-q8_0.gguf] Generated prediction: {
  "id": "chatcmpl-zbpbxj8o70qatvsd8fz9ri",
  "object": "chat.completion",
  "created": 1713507921,
  "model": "lmstudio-ai/gemma-2b-it-GGUF/gemma-2b-it-q8_0.gguf",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The Federal"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 2,
    "completion_tokens": 2,
    "total_tokens": 4
  }
}

Steps to reproduce

I can reproduce this consistently on my setup, which runs the latest LM Studio and Ollama. Any model served through the LM Studio server exhibits the problem; the same models served through Ollama do not.

Screenshots and logs

No response

Additional Information

autogenstudio v0.0.56
LM Studio 0.2.19 with lmstudio-ai/gemma-2b-it-GGUF/gemma-2b-it-q8_0.gguf
Ollama 0.1.32 with llama3:latest (8B)

@victordibia added the proj-studio label on Apr 19, 2024
@stigtk commented Apr 21, 2024

I have the exact same issue with LM Studio and AutoGen Studio.

@castlenthesky

I just encountered the same issue as well. My LM Studio logs are essentially identical to what you have shared.

@flynntmaverick commented Apr 23, 2024

Same issue here. I'm able to get agents communicating and writing Python, but AutoGen Studio gets stuck after 2 tokens. Rolling back to v0.0.53 works.

@brustulim

If anyone is interested in a work-around until the fix lands, you can update the file
autogenstudio\datamodel.py
by replacing line 95 from:
max_tokens: Optional[int] = None
to:
max_tokens: Optional[int] = 3000

Then create a new model (configured exactly like your previous one) and replace the old one in your agents, workflows, etc. A sketch of the patched field in context follows below.

@ekzhu (Collaborator) commented Apr 25, 2024

cc @victordibia for awareness

@victordibia (Collaborator) commented Apr 25, 2024

@ALL, thanks for flagging this.

This has been fixed in the current dev branch for AGS (pr here).
In addition to setting a default value for max_tokens, we now expose the max_tokens parameter in the UI so users can specify it themselves. You can try this branch out today by running pip install autogenstudio==0.0.56rc3.

Should be merged into main soon.

[Screenshot: the max_tokens field exposed in the AutoGen Studio model configuration UI]

@Freffles (Author)

@victordibia Thanks!

@Freffles (Author) commented Apr 25, 2024

"replacing line 95 from: max_tokens: Optional[int] = None to: max_tokens: Optional[int] = 3000"

@brustulim Thanks for the idea. I was going to try something like that, but I wasn't sure it was the right solution.

@victordibia (Collaborator)

@godblessme

You seem to have mentioned having issues with v0.0.56rc3 and LM Studio in a deleted comment ... can you verify this is the case?

@GodBlessingMe

"You seem to have mentioned having issues with v0.0.56rc3 and LM Studio in a deleted comment ... can you verify this is the case?"

@victordibia Thank you for providing the v0.0.56rc3 version. It fixes my problem with the two-token limit when using LM Studio. However, the Python code generated by the primary-assistant cannot be run. Could you kindly help me figure out what happened? I have attached the full conversation between the userproxy and the primary-assistant. Thank you very much.
[Screenshots: eight screenshots of the userproxy / primary-assistant conversation showing the generated code failing to run]

@GodBlessingMe

@victordibia Thank you. The issue was solved.

@MaxAkbar

I'm using .NET code with LM Studio and it always fails on the second call. It works as expected when maxTokens is specified during agent creation.
