Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix sequence like validator with strict True #8977

Merged
merged 10 commits into from
Mar 26, 2024

Conversation

andresliszt
Copy link
Contributor

@andresliszt andresliszt commented Mar 8, 2024

Fix #8930

Change Summary

This PR aims to solve #8930. Until now, when strict=True is passed in the config dict, only the type of the elements of the sequence fields is forced. Now with this change, both the type of the sequence and the items will be validated strictly. This validation is done perfectly in pydantic-core, the problem was that this information was not sent through the json schema to pydantic-core for sequence like types. However, the model_validate_json method, from what I understand, needs to be fixed in pydantic-core. An example for the last is shown below and also there is test marked with xfail in this PR

  class LaxModel(BaseModel):
      x: List[str]
      model_config = ConfigDict(strict=False)
      
 >>> LaxModel.model_validate_json(json.dumps({'x': ('a', 'b', 'c')}), strict=True)
 # No error. 

The example has no error, strict=True in model_validate_json should overwrite config dict strict=False (To be honest, I make this statement looking at the numerous examples in the tests, where it is seen that it overwrites)

Related issue number

#8930

Checklist

  • The pull request title is a good summary of the changes - it will be used in the changelog
  • Unit tests for the changes exist
  • Tests pass on CI
  • Documentation reflects the changes where applicable
  • My PR is ready to review, please add a comment including the phrase "please review" to assign reviewers

Selected Reviewer: @dmontagu

Copy link

codspeed-hq bot commented Mar 8, 2024

CodSpeed Performance Report

Merging #8977 will degrade performances by 19.35%

Comparing andresliszt:fix/strict-iterable-like (cd2d802) with main (e58134b)

Summary

⚡ 5 improvements
❌ 3 (👁 3) regressions
✅ 2 untouched benchmarks

Benchmarks breakdown

Benchmark main andresliszt:fix/strict-iterable-like Change
test_fastapi_startup_perf 3.6 s 1.7 s ×2.2
test_fastapi_startup_perf 1,272 ms 193.4 ms ×6.6
test_north_star_json_dumps 179.6 ms 123.1 ms +45.9%
👁 test_north_star_json_loads 98.3 ms 116.9 ms -15.9%
test_north_star_validate_json 296.6 ms 258.1 ms +14.91%
test_north_star_validate_json_strict 293.1 ms 257.4 ms +13.88%
👁 test_north_star_validate_python 169.2 ms 204.8 ms -17.36%
👁 test_north_star_validate_python_strict 109.3 ms 135.6 ms -19.35%

@andresliszt
Copy link
Contributor Author

please review

@sydney-runkle
Copy link
Member

@andresliszt,

Ugh, looks like this is a reflection of the issues we currently have with sequence type validation / schema building. It's odd that we don't even use the _set_schema() logic for this (something I need to fix when I refactor the sequence validators.

I'd prefer something like this, as a patch fix for now:

diff --git a/pydantic/_internal/_std_types_schema.py b/pydantic/_internal/_std_types_schema.py
index 5c61d8f0..77d5fc37 100644
--- a/pydantic/_internal/_std_types_schema.py
+++ b/pydantic/_internal/_std_types_schema.py
@@ -406,9 +406,21 @@ def sequence_like_prepare_pydantic_annotations(
     item_source_type = args[0]
 
     metadata, remaining_annotations = _known_annotated_metadata.collect_known_metadata(annotations)
-    _known_annotated_metadata.check_metadata(metadata, _known_annotated_metadata.SEQUENCE_CONSTRAINTS, source_type)
+    _known_annotated_metadata.check_metadata(
+        metadata, [*_known_annotated_metadata.SEQUENCE_CONSTRAINTS, *_known_annotated_metadata.STRICT], source_type
+    )
 
-    return (source_type, [SequenceValidator(mapped_origin, item_source_type, **metadata), *remaining_annotations])
+    strict = metadata.pop('strict', _config.get('strict', False))
+
+    return (
+        source_type,
+        [
+            SequenceValidator(
+                mapped_origin=mapped_origin, item_source_type=item_source_type, strict=strict, **metadata
+            ),
+            *remaining_annotations,
+        ],
+    )

Even if we resolve this with this patch (or something similar), I'd still like to keep an eye on this issue as an example of what's wrong with our sequence schema building! Thanks for your help!

Also, re your xfail test, I think it is desired to have model config take precedence over runtime serialization flags, so I think the behavior that you're seeing is expected.

@sydney-runkle
Copy link
Member

Ah actually, here's a more simple fix:

diff --git a/pydantic/_internal/_std_types_schema.py b/pydantic/_internal/_std_types_schema.py
index 5c61d8f0..90ed7b6d 100644
--- a/pydantic/_internal/_std_types_schema.py
+++ b/pydantic/_internal/_std_types_schema.py
@@ -288,7 +288,7 @@ class SequenceValidator:
     item_source_type: type[Any]
     min_length: int | None = None
     max_length: int | None = None
-    strict: bool = False
+    strict: bool | None = None
 
     def serialize_sequence_via_list(
         self, v: Any, handler: core_schema.SerializerFunctionWrapHandler, info: core_schema.SerializationInfo

@sydney-runkle
Copy link
Member

Another issue here that I stumbled upon during my testing is that you can't apply strict directly to a Set[int] field (which you should be able to), but that can be addressed separately.

@sydney-runkle
Copy link
Member

@andresliszt,

Any chance you'll be able to update this in the next few days? We'll be doing a new release soon, and I want to make sure we can get your fix in :).

@andresliszt
Copy link
Contributor Author

andresliszt commented Mar 21, 2024

@sydney-runkle hey hello!, sure, are you requesting this change strict = metadata.pop('strict', _config.get('strict', False)) avoiding strict duplicates (if it's present in both, metadata and config), right?. Because my change is
SequenceValidator(mapped_origin, item_source_type, **metadata, strict=_config.get('strict', False)), so if metadata includes strict key, it's going to send it twice

@sydney-runkle
Copy link
Member

@andresliszt,

I think you should be able to just implement this 1 line diff to fix the issue (but definitely keep your tests)!

diff --git a/pydantic/_internal/_std_types_schema.py b/pydantic/_internal/_std_types_schema.py
index 5c61d8f0..90ed7b6d 100644
--- a/pydantic/_internal/_std_types_schema.py
+++ b/pydantic/_internal/_std_types_schema.py
@@ -288,7 +288,7 @@ class SequenceValidator:
     item_source_type: type[Any]
     min_length: int | None = None
     max_length: int | None = None
-    strict: bool = False
+    strict: bool | None = None
 
     def serialize_sequence_via_list(
         self, v: Any, handler: core_schema.SerializerFunctionWrapHandler, info: core_schema.SerializationInfo

@andresliszt
Copy link
Contributor Author

Nice i will review it once i finish working!. Im super curious, why that line makes the magic

@sydney-runkle
Copy link
Member

@andresliszt,

The issue here is that if you have a model with strict=True and don't specify strict for some given sequence, it was defaulting to False. We don't want to default for specific types like this, because that messes up the behavior we should get by respecting the model's strict specification.

Thus, defaulting to None here fixes that issue, and follows the pattern we see with other settings for the sequence validator, like min_length and max_length :).

@andresliszt andresliszt force-pushed the fix/strict-iterable-like branch from ad435e1 to b9dfaf2 Compare March 21, 2024 20:14
@andresliszt
Copy link
Contributor Author

@sydney-runkle i pushed the one line change and it's failing, maybe i missed something? I still not understanding how the SequenceValidator gets the True value if it's never passed, in my original fix after debugging i noticed that strict=True was never passed to this validator, it was never included in metadata, and the current code was not getting it from config

@sydney-runkle
Copy link
Member

@andresliszt,

Hmm, that's odd. I thought I had it working with just the one liner. I'll pull down the code tomorrow morning and take a closer work. Thanks for your work on this!!

@andresliszt
Copy link
Contributor Author

It is failing on a test i added, so probably your fix is ok! Will review it tomorrow as well

@andresliszt
Copy link
Contributor Author

It's failing on an existing test.

def test_deque_generic_success_strict(cls, value: Any, result):

I think is because deque and Counter are being validated using list_schema here
constrained_schema = core_schema.list_schema(items_schema, **metadata)

So with the fix, passing deque to the test model fails because it is expecting a list in strict mode. So I think currently deque and Counter are not supported to be strict. Now i remember that i added an overwrite metadata['strict'] = False in my original commit, in that part of the code to avoid forcing strict mode for those types. I don't know if that is expected anyways, if that fixes the issue i can add that line again!

@sydney-runkle
Copy link
Member

@andresliszt,

Currently debugging. I can't reproduce the test failure locally, for some reason...

Comment on lines +2380 to +2401
@pytest.mark.xfail(
reason='strict=True in model_validate_json does not overwrite strict=False given in ConfigDict'
'See issue: https://github.com/pydantic/pydantic/issues/8930'
)
def test_model_validate_list_strict() -> None:
# FIXME: This change must be implemented in pydantic-core. The argument strict=True
# in model_validate_json method is not overwriting the one set with ConfigDict(strict=False)
# for sequence like types. See: https://github.com/pydantic/pydantic/issues/8930

class LaxModel(BaseModel):
x: List[str]
model_config = ConfigDict(strict=False)

assert LaxModel.model_validate_json(json.dumps({'x': ('a', 'b', 'c')}), strict=None) == LaxModel(x=('a', 'b', 'c'))
assert LaxModel.model_validate_json(json.dumps({'x': ('a', 'b', 'c')}), strict=False) == LaxModel(x=('a', 'b', 'c'))
with pytest.raises(ValidationError) as exc_info:
LaxModel.model_validate_json(json.dumps({'x': ('a', 'b', 'c')}), strict=True)
assert exc_info.value.errors(include_url=False) == [
{'type': 'list_type', 'loc': ('x',), 'msg': 'Input should be a valid list', 'input': ('a', 'b', 'c')}
]


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davidhewitt, interesting rust bug here!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going to chat with @davidhewitt about this PR tomorrow and will get back to you :).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AHHHHH I can reproduce now. Dumb error on my end :(

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is actually an issue with the deque schema generation, not with the solution we've implemented :).

@sydney-runkle
Copy link
Member

You're right, we have to add that strict override back for the list, but I don't think we need the other change to the initialization of SequenceValidator. Thanks again for your great work on this!

Copy link
Member

@sydney-runkle sydney-runkle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good now, great work!

@sydney-runkle
Copy link
Member

Acknowledged benchmarks don't truly represent a regression (this is just bc this was opened before we moved the benchmarks to 3.12)

@sydney-runkle sydney-runkle enabled auto-merge (squash) March 26, 2024 01:50
@sydney-runkle sydney-runkle merged commit af3d335 into pydantic:main Mar 26, 2024
52 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

validate_call makes an unexpected conversion
3 participants