
[Feature Request]: Adding OpenAI Structured Outputs (json_schema) for agents #3442

Open
AhmedOmarYounusShahhat opened this issue Aug 28, 2024 · 7 comments
Labels
enhancement New feature or request

Comments

@AhmedOmarYounusShahhat

Is your feature request related to a problem? Please describe.

With complex JSON output, you cannot always be sure of the output's structure.

Describe the solution you'd like

OpenAI has a new feature, available only for the newer models, called Structured Outputs, which uses JSON Schema: you pass the JSON schema of the output you want in the API call.
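
For reference, this is roughly how the feature is called directly through the OpenAI Python SDK (adapted from OpenAI's Structured Outputs guide; assumes openai>=1.40 and a supporting model such as gpt-4o-2024-08-06):

```python
from pydantic import BaseModel
from openai import OpenAI


class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str]


client = OpenAI()

# parse() sends the Pydantic model as a JSON Schema via response_format and
# returns the reply already validated against it.
completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "Extract the event information."},
        {"role": "user", "content": "Alice and Bob are going to a science fair on Friday."},
    ],
    response_format=CalendarEvent,
)

event = completion.choices[0].message.parsed  # a CalendarEvent instance
```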

Additional context

No response

@AhmedOmarYounusShahhat AhmedOmarYounusShahhat added the enhancement New feature or request label Aug 28, 2024
@colaso96

+1 this would be an awesome feature to add

@r4881t
Copy link
Collaborator

r4881t commented Sep 5, 2024

This needs to be added for Gemini as well, since it also supports structured output.

@r4881t
Collaborator

r4881t commented Sep 5, 2024

This is how one might implement it:

  1. Update initiate_chat, a_initiate_chat, generate_reply, and a_generate_reply to accept an additional parameter called output_schema. output_schema should be an instance of a Pydantic BaseModel.
  2. Inside initiate_chat, _summarize_chat will be called, which may call _reflection_with_llm_as_summary, so output_schema needs to be passed to _reflection_with_llm_as_summary as well. That in turn passes it on to _reflection_with_llm.
  3. Finally, _generate_oai_reply_from_client will need to receive this parameter.

However, there may be a lot of back-and-forth between the agents, so how do we ensure that only the last message has structured output?
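
A rough sketch of what the caller-facing side of this could look like (output_schema is the proposed parameter and does not exist in autogen today; the rest follows the current API):

```python
from pydantic import BaseModel
from autogen import AssistantAgent, UserProxyAgent


class TicketSummary(BaseModel):
    title: str
    severity: str
    next_steps: list[str]


llm_config = {"config_list": [{"model": "gpt-4o-2024-08-06"}]}
assistant = AssistantAgent("assistant", llm_config=llm_config)
user = UserProxyAgent("user", human_input_mode="NEVER", code_execution_config=False)

# Proposed: output_schema would be threaded through _summarize_chat ->
# _reflection_with_llm_as_summary -> _reflection_with_llm ->
# _generate_oai_reply_from_client.
result = user.initiate_chat(
    assistant,
    message="Summarise the open incidents from the last sprint.",
    output_schema=TicketSummary,  # hypothetical parameter, does not exist yet
)
```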

@marklysze
Collaborator

Hey @AhmedOmarYounusShahhat and @r4881t, thanks for requesting this feature and for the thoughts on implementation.

@r4881t, I think your implementation idea is a good start, here are some thoughts:

  • I think we could leave generate_reply/a_generate_reply untouched and focus on initiate_chat/a_initiate_chat. initiate_chat is more typically used as the main entry and exit point for a conversation.
  • I'm wondering whether we can leave the summarisation as is and instead provide a further, optional step, which is to format the output. This could be triggered if an output_format parameter is set, which could be a Pydantic BaseModel class (as you noted) or a callable if you wanted to customise the output to something else (such as just a number, etc.).
  • ChatResult, which initiate_chat returns, could contain another attribute, such as formatted_output, which holds the formatted result.
  • How to create the structured output? I'm thinking this should be client-specific (as different providers/LLMs will handle this differently, or not at all and we'll have to convert it ourselves), so for OpenAI the OpenAI client class (OpenAIClient in client.py) would be used and we can follow their guide on how to use it. Perhaps we could add another function to the client class protocol that's specifically for generating a structured output format (or pass in an additional parameter to the create function). For non-OpenAI client classes we could attempt to do it using prompting and new code in the helper class, client_utils.py.
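
To make the above concrete, a minimal sketch of such a formatting end step (the format_chat_output helper and the output_format handling are illustrative only, not an existing autogen API):

```python
from typing import Callable, Type, Union

from openai import OpenAI
from pydantic import BaseModel


def format_chat_output(
    summary: str,
    output_format: Union[Type[BaseModel], Callable[[str], object]],
    client: OpenAI,
):
    """Convert the plain-text chat summary into the requested format."""
    if isinstance(output_format, type) and issubclass(output_format, BaseModel):
        # Client-specific path: ask the model to re-emit the summary as JSON
        # matching the schema (here via OpenAI's structured output support).
        completion = client.beta.chat.completions.parse(
            model="gpt-4o-2024-08-06",
            messages=[
                {"role": "system", "content": "Restate the text in the requested structure."},
                {"role": "user", "content": summary},
            ],
            response_format=output_format,
        )
        return completion.choices[0].message.parsed
    # Callable path: the caller supplies their own converter
    # (e.g. lambda s: float(s) to get just a number).
    return output_format(summary)
```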

Thoughts?

cc @Hk669

@r4881t
Collaborator

r4881t commented Sep 7, 2024

@marklysze - Great thoughts and I agree with most of them.

  1. I also included generate_reply/a_generate_reply because we use them in my use case as well, and I'd be interested in having structured output from those calls too.
  2. My question about creating structured output was more about the multiple rounds of communication. In a typical two-agent chat, I've noticed the flow is: LLM call -> tools picked up and executed -> answer generated -> some back-and-forth -> final answer. If we apply the output_format to every exchange, that may not always be what we want. So some prompting technique may be required so that only the final answer is in the output_format, while intermediate conversations between agents remain regular strings. This is easier to see in the context of Group Chat: the conversation between the various agents can happen internally as str, but the final response from initiate_chat should be structured output (see the sketch after this list).
  3. Currently Gemini and OpenAI support structured output, so we need to add it in both places.
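
To illustrate point 2, a hedged sketch of the group-chat case (output_format and ChatResult.formatted_output are the hypothetical additions discussed above; the rest follows the current autogen group-chat API):

```python
from pydantic import BaseModel
from autogen import AssistantAgent, GroupChat, GroupChatManager, UserProxyAgent


class ReleaseSummary(BaseModel):
    version: str
    highlights: list[str]


llm_config = {"config_list": [{"model": "gpt-4o-2024-08-06"}]}
researcher = AssistantAgent("researcher", llm_config=llm_config)
writer = AssistantAgent("writer", llm_config=llm_config)
user = UserProxyAgent("user", human_input_mode="NEVER", code_execution_config=False)

groupchat = GroupChat(agents=[user, researcher, writer], messages=[], max_round=8)
manager = GroupChatManager(groupchat=groupchat, llm_config=llm_config)

# Agents exchange ordinary strings internally; only the final result returned
# by initiate_chat would be coerced into the schema.
result = user.initiate_chat(
    manager,
    message="Draft a release summary for v0.3.",
    output_format=ReleaseSummary,  # hypothetical parameter, final summary only
)
print(result.formatted_output)    # hypothetical ChatResult attribute
```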

@marklysze
Collaborator

Thanks @r4881t, noted on #1.

For #2, I think we're on the same page - an end step with the formatted output makes sense, and changing the internal communications would require a lot more work. With the end step, my thought was to leave summarisation as is, take that summarisation output, and then format it, rather than trying to do it in there.

For #3, for the LLM-level structured output, agreed, both places.

@r4881t
Collaborator

r4881t commented Sep 9, 2024

For 2, I think it's wise to add an extra step to convert into the specific structure - it reduces the complexity. I will start working on a PR for this. @marklysze
