Proposal for mapping Python types to CombinedValidator #1337

adriangb · 2024-06-23T19:47:02Z

Currently because of configs it is not possible to map a type to a SchemaValidator or CoreSchema. Consider:

class Inner(BaseModel):
    x: dict[str, list[float]]

    model_config = ConfigDict(allow_inf_nan=False)

class Outer(BaseModel):
    inner: Inner
    x: dict[str, list[float]]

    model_config = ConfigDict(allow_inf_nan=True)

Edit by @Viicos: allow_inf_nan actually doesn't seem to alter the core schema of x. One example that does however is use_enum_values.

Because of this we can’t just have a type_cache: dict[type, CoreSchema] when we build a schema from types in Pydantic: although dict[str, list[float]] shows up in two places the actual schema differs because of the model config.

To get around this I propose that we establish a rule: “every validator must be derivable from the type and only the type”. That means that all of the things coming from configs have to live elsewhere. For this I propose we (1) move all config things to ValidationState and (2) introduce a validator that mutates ValidationState to set the current config.

For example, we’d have {‘type’: ‘config’, ‘config’: {‘allow_inf_nan’: True}, ‘schema’: {…}}. At runtime this would mutate the validation context. And FloatValidator would pull that configuration from the context instead of storing it on the struct. Now both CoreSchema::Float and FloatValidator are immutable respect to the type and thus we can map from a type to a CoreSchema or SchemaValidator.

I expect this will slightly simplify some code (it essentially merges runtime parameters and compile time parameters into one code path where compile time just means a compile time defined mutation of runtime parameters) and have a minimal performance impact (we can still convert all configs to rust structs ahead of time, it’s just a struct being passed in through context vs hardcoded on the validator).

I think we might as well also establish or consider current behavior wrt rules for config merging and interaction with runtime parameters.

@davidhewitt @sydney-runkle wdyt?

The text was updated successfully, but these errors were encountered:

samuelcolvin · 2024-06-24T07:35:35Z

In principle, sounds good to me.

My concern is performance - there may be some config options that result in significantly different validators being built, I've no idea if those switches are widely used, but if they are, it could be a pain.

adriangb · 2024-06-24T11:53:19Z

Good point. I know we do that with float validators. Do you think that will be a big performance impact? I'm hoping we can solve startup performance and feature requests like a replace types config with this sort of thing.

samuelcolvin · 2024-06-24T11:57:14Z

My guess is the impact will be small, especially if we do something like

if let Some(config) = opt_config {
  if let Some(first_thing) = config.first_thing {
     ...
  }
  if let Some(other_thing) = config.other_thing {
     ...
  }
}

Instead of just

  if let Some(first_thing) = config.first_thing {
     ...
  }
  if let Some(other_thing) = config.other_thing {
     ...
  }

So for the common case we only have one branch point. We could perhaps also make some validators generic, and thereby compile multiple configs, then choose them based on config?

adriangb · 2024-06-24T12:19:06Z

Yeah seems like something we can work to optimize if needed.

davidhewitt · 2024-06-24T14:40:24Z

Agreed on all of the above, I think this will make things like #993 much easier to reason about

sydney-runkle · 2024-06-24T14:41:14Z

This seems like a good idea to me.

I have some concerns about how complex our pydantic-core logic is when it comes to managing serialization preferences, especially when it comes to differentiating settings that are pushed down to nested models (like runtime flags) vs config settings, for which we generally don't do this. For example, consider these structures:

So we have a SerializationState struct, which has a SerializationConfig field. That struct is constructed via individual settings passed to SerializationState, like inf_nan_mode. SerializationState has an extra method that returns an Extra struct instance based on information pulled from the SerializationState instance, which is where we find a lot of the config settings like by_alias, etc, but this also includes a reference to &self.config, which is that SerializationConfig instance.

If you're not already confused, it gets worse...

All of this to say, I think that the changes you're proposing sound good, and we might be able to refactor some of the above in the process.

adriangb · 2024-06-24T15:13:00Z

I started an attempt in #1341. If we can keep this backwards compatible that would be a huge win. But probably still worth doing as much as reasonable in 1 PR just to fully understand the impact of the change.

adriangb · 2024-06-24T15:16:31Z

One thing to think about: this change means that anything with a config always pushes it down. E.g. right now you can have strict on a list but not it's items (I'm not sure if that's possible or happens via public Python or pydantic APIs but it certainly is possible from the pydantic-core Rust APIs point of view). This change would make that strict get pushed down to the items. Maybe this is fine, but I worry there are other cases where it's not fine.

One way around this would be to add a ResetConfig validator that resets it to default (or to some previous config?) so we can do ApplyConfig -> List -> ResetConfig(PreviousConfig) if we encounter {'type': 'list', 'strict': True, ...}.
I'm guessing we generate {'type': 'list', 'strict': True, ...} from Annotated[list[...], Field(strict=True)] or similar (as opposed to x: list[...] on a model with strict=True). So we need to make sure that with this change we can map Annotated[list[...], Field(strict=True)] -> ApplyConfig -> List -> Items and list[...] -> List -> Items for things to work nicely. But then the above proposal of inserting a ResetConfig between List and Items wouldn't really work. We could say strict is part of the type but that seems dangerous conceptually.

sydney-runkle · 2024-06-24T17:52:54Z

One thing to think about: this change means that anything with a config always pushes it down.

I'm concerned about making a change like this in a minor version - I fear this would be a breaking change for some folks...

adriangb · 2024-06-24T17:54:10Z

Any ideas then what we do about things that can have values derived from configs that don't get pushed down? Are those a separate case?

sydney-runkle · 2024-06-24T18:02:42Z

I'm a bit stumped here. Perhaps a good convo for our oss meeting tomorrow.

Just a concrete example for the current strict behavior:

from pydantic import BaseModel, ConfigDict


# note, strict defaults to false
class NotStrictModel(BaseModel):
    b: int

class StrictModel(BaseModel):
    a: int
    not_strict_model: NotStrictModel

    model_config = ConfigDict(strict=True)


# currently works, strict isn't pushed down to submodels
sm = StrictModel(**{'a': 1, 'not_strict_model': {'b': '2'}})
print(repr(sm))
#> StrictModel(a=1, not_strict_model=NotStrictModel(b=2))

adriangb · 2024-06-24T18:04:36Z

I think that example would be fine. Models / dataclasses always stop configs from being pushed down. They'd continue to do that by (1) always having a default config in Pydntic and thus (2) always inserting an ApplyConfig thing.

sydney-runkle · 2024-07-30T15:04:24Z

Any ideas then what we do about things that can have values derived from configs that don't get pushed down? Are those a separate case?
We could say strict is part of the type but that seems dangerous conceptually.

What feels dangerous conceptually here?

sydney-runkle · 2024-09-16T18:04:35Z

This would fix pydantic/pydantic#8326

adriangb · 2024-09-17T09:37:22Z

Any ideas then what we do about things that can have values derived from configs that don't get pushed down? Are those a separate case?

We could say strict is part of the type but that seems dangerous conceptually.

What feels dangerous conceptually here?

Just making special cases. And saying that strict is part of the type when it's not really part of the type system.

adriangb · 2024-09-17T09:42:14Z

The easier change, but I fear would be breaking, would be to say that if you do Strict[list[T]] strict gets applied to T unless you do Strict[list[Lax[T]] or something?

sydney-runkle · 2024-09-17T14:22:42Z

The easier change, but I fear would be breaking, would be to say that if you do Strict[list[T]] strict gets applied to T unless you do Strict[list[Lax[T]] or something?

I think I'm in favor of this change. I'd like to pick this work up soon and move forward with that approach, maybe enhancing later with the ResetConfig.

adriangb mentioned this issue Jun 24, 2024

Make config wrap validators #1341

Draft

sydney-runkle added the refactor label Aug 16, 2024

sydney-runkle self-assigned this Sep 17, 2024

sydney-runkle mentioned this issue Sep 17, 2024

pydantic work plan for Sydney Runkle, September 16th - 20th, 2024 pydantic/pydantic#10415

Closed

11 tasks

Viicos mentioned this issue Nov 5, 2024

Weird issue when using use_enum_values in Pydantic v2 pydantic/pydantic#10570

Open

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Proposal for mapping Python types to CombinedValidator #1337

Proposal for mapping Python types to CombinedValidator #1337

adriangb commented Jun 23, 2024 •

edited by Viicos

Loading

samuelcolvin commented Jun 24, 2024

adriangb commented Jun 24, 2024

samuelcolvin commented Jun 24, 2024 •

edited

Loading

adriangb commented Jun 24, 2024

davidhewitt commented Jun 24, 2024

sydney-runkle commented Jun 24, 2024

adriangb commented Jun 24, 2024

adriangb commented Jun 24, 2024 •

edited

Loading

sydney-runkle commented Jun 24, 2024

adriangb commented Jun 24, 2024

sydney-runkle commented Jun 24, 2024

adriangb commented Jun 24, 2024

sydney-runkle commented Jul 30, 2024

sydney-runkle commented Sep 16, 2024

adriangb commented Sep 17, 2024

adriangb commented Sep 17, 2024

sydney-runkle commented Sep 17, 2024

Proposal for mapping Python types to CombinedValidator #1337

Proposal for mapping Python types to CombinedValidator #1337

Comments

adriangb commented Jun 23, 2024 • edited by Viicos Loading

samuelcolvin commented Jun 24, 2024

adriangb commented Jun 24, 2024

samuelcolvin commented Jun 24, 2024 • edited Loading

adriangb commented Jun 24, 2024

davidhewitt commented Jun 24, 2024

sydney-runkle commented Jun 24, 2024

adriangb commented Jun 24, 2024

adriangb commented Jun 24, 2024 • edited Loading

sydney-runkle commented Jun 24, 2024

adriangb commented Jun 24, 2024

sydney-runkle commented Jun 24, 2024

adriangb commented Jun 24, 2024

sydney-runkle commented Jul 30, 2024

sydney-runkle commented Sep 16, 2024

adriangb commented Sep 17, 2024

adriangb commented Sep 17, 2024

sydney-runkle commented Sep 17, 2024

adriangb commented Jun 23, 2024 •

edited by Viicos

Loading

samuelcolvin commented Jun 24, 2024 •

edited

Loading

adriangb commented Jun 24, 2024 •

edited

Loading