Support for Pydantic Model as Signature InputField type #1904

Open
rohitgarud opened this issue Dec 8, 2024 · 9 comments

Comments

@rohitgarud

Currently, only a Signature OutputField can have a Pydantic model as its type. Adding one to an InputField is not reflected in the system prompt.

@okhat
Collaborator

okhat commented Dec 8, 2024

Hey @rohitgarud, it works though, right? It just doesn't describe the schema. We don't need to describe the schema of the input, since we show the input value to the model.

@rohitgarud
Author

rohitgarud commented Dec 8, 2024

Yes, @okhat, it works. I was wondering whether adding the field descriptions from the input Pydantic model could help the model make better use of the provided information. Was the decision to leave out the InputField schema based on experimentation, or on the judgment that the data itself is already available to the model? If you have not tried it before, I will add the schema to the input field using a custom adapter and check.

@okhat
Collaborator

okhat commented Dec 8, 2024

It was an analytic decision. Why should we show the schema to the model, if it sees the actual structure + values anyway?

However, I'm interested in whether there are edge cases where showing the input schema is important.
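
One candidate edge case is a model whose field names are terse while the meaning lives in the field descriptions. The sketch below is not DSPy code; it only uses Pydantic v2, and the `Ticket` model is a hypothetical example. It shows that the serialized value carries names and values only, while the JSON schema also carries the descriptions.

```python
from pydantic import BaseModel, Field


class Ticket(BaseModel):
    # Hypothetical model, used only for illustration.
    sev: int = Field(description="Severity from 1 (critical) to 4 (cosmetic)")
    body: str = Field(description="Raw text of the customer report")


ticket = Ticket(sev=1, body="Checkout page returns HTTP 500")

# What is available when only the input value is rendered into the prompt.
value_json = ticket.model_dump_json()

# What the JSON schema would add on top: the per-field descriptions.
schema = Ticket.model_json_schema()
sev_description = schema["properties"]["sev"]["description"]

print(value_json)
print(sev_description)
```

Here the value alone does not say whether `sev=1` is the most or the least severe level; the description does.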

@rohitgarud
Author

rohitgarud commented Dec 8, 2024

What would you prefer:

  1. multiple (say 5-6) InputFields with descriptions
  2. single InputField with Pydantic model and show the schema using a custom adapter

@okhat
Collaborator

okhat commented Dec 8, 2024

Of course the first one :D

@rohitgarud
Author

In that case, what happens if the data for an InputField is optional and missing? I will be testing this, but I wanted to get your intuition on it.
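
One way to frame the question is to decide explicitly how a missing optional input gets rendered. The sketch below is a standalone illustration, not how DSPy formats inputs; `render_inputs` and the `(not provided)` marker are hypothetical names chosen for this example.

```python
from typing import Any, Mapping, Optional

# Hypothetical placeholder text for an optional input that has no value.
MISSING_MARKER = "(not provided)"


def render_inputs(values: Mapping[str, Optional[Any]]) -> str:
    """Render each input on its own line, marking None values explicitly."""
    lines = []
    for name, value in values.items():
        shown = MISSING_MARKER if value is None else str(value)
        lines.append(f"{name}: {shown}")
    return "\n".join(lines)


rendered = render_inputs({"title": "Refund request", "order_id": None})
print(rendered)
```

Rendering an explicit marker keeps the set of fields stable across calls; dropping the field entirely is the other obvious option and would need the same kind of test.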

@rohitgarud
Author

rohitgarud commented Dec 9, 2024

What would you prefer:

  1. multiple (say 5-6) InputFields with descriptions
  2. single InputField with Pydantic model and show the schema using a custom adapter

Going against your suggestion, I went with option 2 😄 by doing this in the custom adapter (a little hacky, but I will improve it later):

import enum
import inspect
from typing import Literal, get_args, get_origin

from pydantic import TypeAdapter

# ProcessSchema (the custom schema processor) and get_dspy_field_type
# are defined elsewhere and are not shown in this comment.


def field_metadata(field_name, field_info):
    """Build the "{name}  # note: ..." placeholder line for one signature field."""
    type_ = field_info.annotation

    if type_ is str:
        desc = ""
    elif type_ is bool:
        desc = "must be True or False"
    elif type_ in (int, float):
        desc = f"must be a single {type_.__name__} value"
    elif inspect.isclass(type_) and issubclass(type_, enum.Enum):
        desc = f"must be one of: {'; '.join(type_.__members__)}"
    elif get_origin(type_) is Literal:
        desc = f"must be one of: {'; '.join(str(x) for x in get_args(type_))}"
    else:
        # Any other type (e.g. a Pydantic model): describe it by its processed JSON schema.
        processed_schema = ProcessSchema(
            schema=TypeAdapter(type_).json_schema()
        ).transform_schema()
        desc = (
            "must be parseable according to the following JSON schema: "
            + processed_schema
        )

    if desc:
        desc = (" " * 8) + f"# note: the value you produce {desc}"

    # Inputs are described rather than constrained, so reword the note for them.
    if get_dspy_field_type(field_info) == "input":
        desc = desc.replace(
            "# note: the value you produce must be", "#note: input will be"
        )
        desc = desc.replace("parseable according to the", "having")

    return f"{{{field_name}}}{desc}"

Rationale:

  • Fields and descriptions of inputs and outputs stay at a single location inside models, keeping the program cleaner
  • Can use nested Pydantic models for input
  • Because I process the schema before injecting it into the system prompt through the custom adapter, the increase in the number of tokens is manageable
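
The `ProcessSchema` class itself is not shown in this thread. As a rough idea of what "processing the schema" to save tokens could look like, here is a minimal sketch that flattens a JSON schema dict into one short line per field. `compact_schema` is a hypothetical stand-in, not the actual implementation, and it assumes a non-recursive schema whose references all point into `$defs`.

```python
from typing import Any, Dict, List


def _resolve(node: Dict[str, Any], defs: Dict[str, Any]) -> Dict[str, Any]:
    """Follow a local "#/$defs/Name" reference, if the node has one."""
    ref = node.get("$ref")
    if ref is None:
        return node
    return defs[ref.rsplit("/", 1)[-1]]


def _walk(node: Dict[str, Any], defs: Dict[str, Any], indent: int) -> List[str]:
    lines = []
    required = set(node.get("required", []))
    for name, prop in node.get("properties", {}).items():
        target = _resolve(prop, defs)
        type_name = target.get("type", "any")
        flag = "" if name in required else ", optional"
        # Prefer the description on the field, then the one on the referenced type.
        desc = prop.get("description") or target.get("description", "")
        line = f"{' ' * indent}{name} ({type_name}{flag})"
        if desc:
            line += f": {desc}"
        lines.append(line)
        if type_name == "object":
            # Nested models are listed underneath, indented by two spaces.
            lines.extend(_walk(target, defs, indent + 2))
    return lines


def compact_schema(schema: Dict[str, Any]) -> List[str]:
    """Return one short line per field instead of the full JSON schema."""
    return _walk(schema, schema.get("$defs", {}), indent=0)


example_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "description": "Full name"},
        "address": {"$ref": "#/$defs/Address"},
    },
    "required": ["name"],
    "$defs": {
        "Address": {
            "type": "object",
            "description": "Postal address",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        }
    },
}

lines = compact_schema(example_schema)
print("\n".join(lines))
```

A flattened form like this drops `$defs`, `title`, and most JSON punctuation, which is where much of the token overhead of a raw schema comes from.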

Please let me know your thoughts on this

@thomasahle
Collaborator

Sometimes a codebase may already have a bunch of pydantic types it uses. It's inconvenient to "unpack" them into a signature, rather than just using them directly. So maybe some automatic "unpacking formatter" is not a bad idea.
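
A minimal sketch of the unpacking step, assuming Pydantic v2: it reads the name, annotation, and description of each field from an existing model, which is the information a formatter would need to build one signature field per model field. `unpack_fields` and the `Customer` model are hypothetical names; wiring the result into a DSPy signature is not shown.

```python
from typing import List, Optional, Tuple, Type

from pydantic import BaseModel, Field


class Customer(BaseModel):
    # Hypothetical pre-existing model from a codebase.
    name: str = Field(description="Full legal name")
    tier: Optional[str] = Field(default=None, description="Subscription tier, if any")


def unpack_fields(model_cls: Type[BaseModel]) -> List[Tuple[str, object, str]]:
    """Return (field name, type annotation, description) for each model field."""
    return [
        (name, info.annotation, info.description or "")
        for name, info in model_cls.model_fields.items()
    ]


fields = unpack_fields(Customer)
for name, annotation, description in fields:
    print(name, annotation, description)
```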

@rohitgarud
Author

Thank you @thomasahle, I would really appreciate it if you could review this approach for an unpacking formatter. It works really well, even with smaller models. I think we can still use the Structured Output feature from the providers. Usage is shown above in ProcessSchema
