Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature-request: Pydantic validators+serializers to be able to round-trip all supported types #558

Open
charles-dyfis-net opened this issue Jun 27, 2024 · 11 comments

Comments

@charles-dyfis-net
Copy link

Right now, pydantic can't instantiate a TypeAdapter for anthropic.types.MessageParam on account of the support for file-like objects (which, by nature, can't be serialized to JSON) in the data types used for image support. Attempting to instantiate a pydantic.TypeAdapter(anthropic.types.MessageParam) will throw a PydanticSchemaGenerationError, because typing.IO[bytes] can't be represented as a pydantic_core schema.

If instead of using TypedDict the various classes were implemented as dataclasses (ideally using pydantic.dataclasses.dataclass), or were implemented using subclasses of pydantic.BaseModel, these classes could define custom serializers to convert into JSON-representable data -- for example, serializing a file-like object by actually reading its content into memory.

@rattrayalex
Copy link
Collaborator

rattrayalex commented Jun 27, 2024

Attempting to instantiate a pydantic.TypeAdapter(anthropic.types.MessageParam)

Can you share more about your use-case? What are you trying to do?

cc @RobertCraigie

@charles-dyfis-net
Copy link
Author

Attempting to instantiate a pydantic.TypeAdapter(anthropic.types.MessageParam)

Can you share more about your use-case? What are you trying to do?

Sure -- I'm trying to serialize pending LLM requests to JSON to put them in a work queue, with a consumer for each backend able to execute them (one per Bedrock region with the appropriate model, one for Anthropic first-party, &c) and then deserialize and run those requests.

@lingster
Copy link

lingster commented Jul 4, 2024

Rather than serialising the entire object, if it's a file could you not store in an s3 or R2 bucket and serialize the url and just add that to your queue.

@charles-dyfis-net
Copy link
Author

charles-dyfis-net commented Jul 4, 2024

Rather than serialising the entire object, if it's a file could you not store in an s3 or R2 bucket and serialize the url and just add that to your queue.

I don't actually need to serialize file-like objects.

Thing is, that doesn't matter: Because file-like objects are possible in a MessageParam, I can't instantiate a pydantic.TypeAdapter for MessageParam instances; Pydantic wants to be able to build a JSONSchema description of the type, so as long as there's something in the union that can't be represented in JSONSchema, the TypeAdapter instantiation fails during introspection before ever looking at the individual instance and what values are or aren't present.

That's the point of adding a serializer that replaces those objects with their content: the act of doing so will make messages serializable in practice even if they don't use the option to have a file handle attached, and it'll do so losslessly (in a way that lets folks use the Anthropic API and Pydantic together in a way that's natural to each and adds no extra configuration or dependencies); perhaps a bit inefficient compared to S3 or R2, but someone who cares about that inefficiency and is willing to add new service dependencies can add their own code to store content out-of-band as they see fit.

@rattrayalex
Copy link
Collaborator

@charles-dyfis-net can you share a full example of the code you'd like to be able to write, and what you have to do today?

@rattrayalex
Copy link
Collaborator

Have you looked at our .to_json() helpers? Do they help at all?

@charles-dyfis-net
Copy link
Author

Have you looked at our .to_json() helpers? Do they help at all?

I haven't; if there exist corresponding from_json() helpers to be able to round-trip back to an object, that would be exactly what I need.

@rattrayalex
Copy link
Collaborator

rattrayalex commented Jul 6, 2024

mmm, I think something like Message.from_json('{"foo": …}') could make sense!

FWIW, I'd expect this to internally look roughy like this:

data = json.loads(…)
return Message.build(**data)

care to give that a try and see how it goes for you?

@charles-dyfis-net
Copy link
Author

Thank you -- I'll do that, hopefully within the next few days. (I'd still prefer to see Pydantic's (de)serialization work out-of-the-box, so folks don't need to implement logic specific to the Anthropic SDK, but if this does in fact work as advertised that reduces the priority / pain level significantly).

@rattrayalex
Copy link
Collaborator

Great, let me know what you find!

@RobertCraigie
Copy link
Collaborator

Hi @charles-dyfis-net, in the next release you'll be able to use MessageParam with TypeAdapters :)

Image params won't serialise properly yet as we haven't defined a custom serialiser to handle file inputs, will have more to share on that front soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants