
Proposal for mapping Python types to CombinedValidator #1337

Open
adriangb opened this issue Jun 23, 2024 · 17 comments

@adriangb
Member

adriangb commented Jun 23, 2024

Currently, because of configs, it is not possible to map a type to a SchemaValidator or CoreSchema. Consider:

class Inner(BaseModel):
    x: dict[str, list[float]]

    model_config = ConfigDict(allow_inf_nan=False)

class Outer(BaseModel):
    inner: Inner
    x: dict[str, list[float]]

    model_config = ConfigDict(allow_inf_nan=True)

Edit by @Viicos: allow_inf_nan actually doesn't seem to alter the core schema of x. One example that does, however, is use_enum_values.

Because of this we can’t just have a type_cache: dict[type, CoreSchema] when we build a schema from types in Pydantic: although dict[str, list[float]] shows up in two places the actual schema differs because of the model config.
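A toy sketch of why a bare `type -> schema` cache is unsound today (`build_schema` is hypothetical, not the real Pydantic schema builder; per the edit above, allow_inf_nan may not actually alter the core schema in practice, so treat it as a stand-in for a setting like use_enum_values that does):

```python
def build_schema(annotation, config):
    """Toy schema builder: config settings leak into the resulting schema."""
    schema = {'type': annotation}
    if 'allow_inf_nan' in config:
        # The enclosing model's config changes the schema for the same type.
        schema['allow_inf_nan'] = config['allow_inf_nan']
    return schema

inner = build_schema('float', {'allow_inf_nan': False})  # Inner's config
outer = build_schema('float', {'allow_inf_nan': True})   # Outer's config
# Same annotation, different schemas: a cache keyed on type alone is wrong.
assert inner != outer
```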

To get around this I propose that we establish a rule: “every validator must be derivable from the type and only the type”. That means that all of the things coming from configs have to live elsewhere. For this I propose we (1) move all config things to ValidationState and (2) introduce a validator that mutates ValidationState to set the current config.

For example, we'd have {'type': 'config', 'config': {'allow_inf_nan': True}, 'schema': {...}}. At runtime this would mutate the validation context, and FloatValidator would pull that configuration from the context instead of storing it on the struct. Now both CoreSchema::Float and FloatValidator are immutable with respect to the type, and thus we can map from a type to a CoreSchema or SchemaValidator.
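A minimal Python sketch of the idea (ValidationState, ConfigValidator, and FloatValidator here are hypothetical stand-ins, not the actual pydantic-core API): the 'config' node scopes a config onto the validation state before delegating, and the inner validator reads settings from the state rather than from its own fields, so the same validator instance can be shared across configs.

```python
import math


class ValidationState:
    """Stand-in for pydantic-core's validation state; carries the current config."""
    def __init__(self, config=None):
        self.config = config or {}


class FloatValidator:
    """Reads allow_inf_nan from the state, not from fields on the validator."""
    def validate(self, value, state):
        value = float(value)
        if not state.config.get('allow_inf_nan', True) and not math.isfinite(value):
            raise ValueError('infinity / NaN not allowed')
        return value


class ConfigValidator:
    """Corresponds to {'type': 'config', 'config': ..., 'schema': ...}."""
    def __init__(self, config, inner):
        self.config = config
        self.inner = inner

    def validate(self, value, state):
        previous = state.config
        state.config = {**previous, **self.config}  # scoped mutation of the state
        try:
            return self.inner.validate(value, state)
        finally:
            state.config = previous  # restore on the way out


# One FloatValidator, derivable from the type alone, shared across configs:
shared_float = FloatValidator()
strict_inf = ConfigValidator({'allow_inf_nan': False}, shared_float)
```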

I expect this will slightly simplify some code (it essentially merges runtime parameters and compile-time parameters into one code path, where "compile time" just means a compile-time-defined mutation of runtime parameters) and have a minimal performance impact (we can still convert all configs to Rust structs ahead of time; it's just a struct being passed in through context instead of hardcoded on the validator).

I think we might as well also establish (or at least document the current behavior of) rules for config merging and interaction with runtime parameters.

@davidhewitt @sydney-runkle wdyt?

@samuelcolvin
Member

In principle, sounds good to me.

My concern is performance - there may be some config options that result in significantly different validators being built, I've no idea if those switches are widely used, but if they are, it could be a pain.

@adriangb
Member Author

Good point. I know we do that with float validators. Do you think that will have a big performance impact? I'm hoping we can solve startup performance and feature requests like a "replace types" config with this sort of thing.

@samuelcolvin
Member

samuelcolvin commented Jun 24, 2024

My guess is the impact will be small, especially if we do something like

if let Some(config) = opt_config {
  if let Some(first_thing) = config.first_thing {
     ...
  }
  if let Some(other_thing) = config.other_thing {
     ...
  }
}

Instead of just:

if let Some(first_thing) = config.first_thing {
   ...
}
if let Some(other_thing) = config.other_thing {
   ...
}

So for the common case there is only one branch point. We could perhaps also make some validators generic, compile multiple config variants, and then choose between them based on the config?

@adriangb
Member Author

Yeah seems like something we can work to optimize if needed.

@davidhewitt
Contributor

Agreed on all of the above. I think this will make things like #993 much easier to reason about.

@sydney-runkle
Member

This seems like a good idea to me.

I have some concerns about how complex our pydantic-core logic is when it comes to managing serialization preferences, especially differentiating settings that are pushed down to nested models (like runtime flags) from config settings, for which we generally don't do this. For example, consider these structures:

We have a SerializationState struct, which has a SerializationConfig field; that SerializationConfig is constructed from individual settings passed to SerializationState, like inf_nan_mode. SerializationState also has an extra method that returns an Extra struct instance built from information pulled out of the SerializationState instance. That Extra is where we find a lot of the config settings like by_alias, etc., but it also includes a reference to &self.config, which is that SerializationConfig instance.

If you're not already confused, it gets worse...

All of this to say, I think that the changes you're proposing sound good, and we might be able to refactor some of the above in the process.

@adriangb
Member Author

I started an attempt in #1341. If we can keep this backwards compatible that would be a huge win. But probably still worth doing as much as reasonable in 1 PR just to fully understand the impact of the change.

@adriangb
Member Author

adriangb commented Jun 24, 2024

One thing to think about: this change means that anything with a config always pushes it down. E.g. right now you can have strict on a list but not its items (I'm not sure whether that's possible via the public Python or Pydantic APIs, but it certainly is possible from the pydantic-core Rust API's point of view). This change would make that strict get pushed down to the items. Maybe this is fine, but I worry there are other cases where it's not.

One way around this would be to add a ResetConfig validator that resets the config to the default (or to some previous config?), so that we can do ApplyConfig -> List -> ResetConfig(PreviousConfig) when we encounter {'type': 'list', 'strict': True, ...}.
I'm guessing we generate {'type': 'list', 'strict': True, ...} from Annotated[list[...], Field(strict=True)] or similar (as opposed to x: list[...] on a model with strict=True). So we need to make sure that with this change we can map Annotated[list[...], Field(strict=True)] -> ApplyConfig -> List -> Items and list[...] -> List -> Items for things to work nicely. But then the above proposal of inserting a ResetConfig between List and Items wouldn't really work. We could say strict is part of the type, but that seems dangerous conceptually.
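A toy sketch of the two mappings discussed above (all validator names here are hypothetical): the strict-annotated list needs a ResetConfig between List and Items to keep strict off the items, while the bare annotation must map to a chain derived from the type alone.

```python
def chain_for(annotation):
    """Map an annotation string to a hypothetical validator chain."""
    if annotation == 'Annotated[list[T], Field(strict=True)]':
        # strict applies to the list only, so the previous config has to be
        # restored before the item validators run.
        return ['ApplyConfig(strict=True)', 'List', 'ResetConfig', 'Items(T)']
    if annotation == 'list[T]':
        # Bare type: no config nodes, so the type alone determines the chain.
        return ['List', 'Items(T)']
    raise NotImplementedError(annotation)
```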

@sydney-runkle
Member

One thing to think about: this change means that anything with a config always pushes it down.

I'm concerned about making a change like this in a minor version - I fear this would be a breaking change for some folks...

@adriangb
Member Author

Any ideas, then, on what we do about things whose values can be derived from configs that don't get pushed down? Are those a separate case?

@sydney-runkle
Member

I'm a bit stumped here. Perhaps a good conversation for our OSS meeting tomorrow.

Just a concrete example for the current strict behavior:

from pydantic import BaseModel, ConfigDict


# note, strict defaults to false
class NotStrictModel(BaseModel):
    b: int

class StrictModel(BaseModel):
    a: int
    not_strict_model: NotStrictModel

    model_config = ConfigDict(strict=True)


# currently works, strict isn't pushed down to submodels
sm = StrictModel(**{'a': 1, 'not_strict_model': {'b': '2'}})
print(repr(sm))
#> StrictModel(a=1, not_strict_model=NotStrictModel(b=2))

@adriangb
Member Author

I think that example would be fine. Models / dataclasses always stop configs from being pushed down. They'd continue to do that by (1) always having a default config in Pydantic and thus (2) always inserting an ApplyConfig thing.
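A hedged sketch of why models would act as config boundaries under the proposal (`model_scope_config` is hypothetical): every model always applies a complete config of its own, filled in from defaults, rather than merging with the enclosing model's config, so StrictModel's strict=True never reaches NotStrictModel.

```python
# Default values a model's config falls back to when nothing is set explicitly.
DEFAULT_CONFIG = {'strict': False}


def model_scope_config(model_config=None):
    """The config a model's ApplyConfig node would install for its own scope.

    It is built from the model's own config plus defaults, never from the
    parent model's config, which is why pushes stop at model boundaries.
    """
    return {**DEFAULT_CONFIG, **(model_config or {})}


# StrictModel sets strict=True for its own scope...
assert model_scope_config({'strict': True})['strict'] is True
# ...while NotStrictModel, with no explicit config, falls back to defaults.
assert model_scope_config()['strict'] is False
```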

@sydney-runkle
Member

Any ideas then what we do about things that can have values derived from configs that don't get pushed down? Are those a separate case?
We could say strict is part of the type but that seems dangerous conceptually.

What feels dangerous conceptually here?

@sydney-runkle
Member

This would fix pydantic/pydantic#8326

@adriangb
Member Author

Any ideas then what we do about things that can have values derived from configs that don't get pushed down? Are those a separate case?

We could say strict is part of the type but that seems dangerous conceptually.

What feels dangerous conceptually here?

Just that it creates special cases, and says that strict is part of the type when it's not really part of the type system.

@adriangb
Member Author

The easier change, though I fear it would be breaking, would be to say that if you do Strict[list[T]], strict gets applied to T unless you do Strict[list[Lax[T]]] or something?
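A toy illustration of that rule (note: pydantic does ship a Strict annotation, but Lax here is purely hypothetical): an explicit marker on the item type wins; otherwise the list's strictness is pushed down to the items.

```python
def item_is_strict(list_is_strict, item_marker=None):
    """Strictness of list items under the proposed rule.

    item_marker is 'strict', 'lax', or None (no explicit wrapper on T).
    """
    if item_marker == 'lax':
        return False          # Strict[list[Lax[T]]]: items stay lax
    if item_marker == 'strict':
        return True           # an explicit Strict[T] always wins
    return list_is_strict     # proposed new behavior: inherit from the list
```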

@sydney-runkle
Member

The easier change, though I fear it would be breaking, would be to say that if you do Strict[list[T]], strict gets applied to T unless you do Strict[list[Lax[T]]] or something?

I think I'm in favor of this change. I'd like to pick this work up soon and move forward with that approach, maybe enhancing later with the ResetConfig.
