Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate bytes based on ser_json_bytes #1308

Merged
merged 14 commits into from
Aug 1, 2024

Conversation

josh-newman
Copy link
Contributor

@josh-newman josh-newman commented May 31, 2024

Change Summary

ser_json_bytes transforms values (with base64 encoding) during serialization. But validation doesn't do a complementary base64 decode, so a serialization round-trip into the same model type yields an unequal object.

Related issue number

None for this directly. Other users have mentioned base64 decoding, though: pydantic/pydantic#7000 (comment)

Checklist

  • Unit tests for the changes exist
  • Documentation reflects the changes where applicable
  • Pydantic tests pass with this pydantic-core (except for expected changes)
  • My PR is ready to review

Selected Reviewer: @davidhewitt

@josh-newman
Copy link
Contributor Author

please review

@josh-newman
Copy link
Contributor Author

CC @jcharum

Copy link

codecov bot commented May 31, 2024

Codecov Report

Attention: Patch coverage is 85.13514% with 11 lines in your changes missing coverage. Please review.

Project coverage is 89.61%. Comparing base (ab503cb) to head (428f479).
Report is 126 commits behind head on main.

Files Patch % Lines
src/input/input_string.rs 0.00% 9 Missing ⚠️
src/input/input_json.rs 93.75% 1 Missing ⚠️
src/input/input_python.rs 87.50% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1308      +/-   ##
==========================================
- Coverage   90.21%   89.61%   -0.60%     
==========================================
  Files         106      109       +3     
  Lines       16339    17304     +965     
  Branches       36       41       +5     
==========================================
+ Hits        14740    15507     +767     
- Misses       1592     1777     +185     
- Partials        7       20      +13     
Files Coverage Δ
python/pydantic_core/core_schema.py 94.71% <100.00%> (-0.06%) ⬇️
src/errors/types.rs 99.44% <100.00%> (+<0.01%) ⬆️
src/input/input_abstract.rs 42.85% <ø> (-27.39%) ⬇️
src/lib.rs 97.72% <100.00%> (+10.58%) ⬆️
src/serializers/config.rs 94.38% <100.00%> (-0.07%) ⬇️
src/serializers/mod.rs 100.00% <ø> (ø)
src/validators/bytes.rs 100.00% <100.00%> (ø)
src/validators/config.rs 100.00% <100.00%> (ø)
src/validators/json.rs 98.64% <100.00%> (+0.21%) ⬆️
src/validators/mod.rs 96.05% <ø> (+0.02%) ⬆️
... and 4 more

... and 33 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 40b8a94...428f479. Read the comment docs.

@josh-newman
Copy link
Contributor Author

I'm also interested in accepting both standard and URL-safe base64 encoding. If this PR is acceptable, I plan to send a separate one for that: d8e-ai/pydantic-core@validate-base64...d8e-ai:pydantic-core:validate-base64-any

Copy link

codspeed-hq bot commented May 31, 2024

CodSpeed Performance Report

Merging #1308 will not alter performance

Comparing d8e-ai:validate-base64 (428f479) with main (40b8a94)

Summary

✅ 155 untouched benchmarks

@davidhewitt
Copy link
Contributor

Thanks for the PR! I fully support the use case and think this makes sense.

That said, I worry about silently breaking user code by changing the meaning of the existing option in this way. How about adding a new flag, e.g. json_bytes_encoding='base64' and preferring that over the existing flag?

Or we could add val_json_bytes so validation and serialization are controlled separately.

@josh-newman
Copy link
Contributor Author

Either option works for me. I don't currently know of a use case where someone would want base64 encoding only and not decoding, so I'd lean towards the new bidirectional encoding flag. But maybe that use case exists, or maybe there's a pattern in other Pydantic config to follow (that I'm not familiar with)?

I'm happy to make changes for whichever you recommend! (And please feel free to make changes yourself, too, if you prefer.)

@davidhewitt
Copy link
Contributor

Having discussed with @sydney-runkle and @samuelcolvin, I think we would prefer to go with a new val_json_bytes option so that users who do have a use-case where they don't want to change validation can continue to use ser_json_bytes.

I agree that the bidirectional encoding flag is probably what most people actually need, so if you strongly want that I'd probably accept it; with the individual flags it should be an easy layer on top which just sets both of them (and we can error if it is set as well as either individual flag).

It's a shame to have the complexity of both forms, but hey, it's where we've ended up. It's possible we could consider deprecating the individual flags in V3.

@josh-newman
Copy link
Contributor Author

Sounds good, I've switched to a new val_json_bytes. This should be ok for us; we're setting ser_json_bytes in a common place for many of our models so now we'll just set both.

I'm guessing there's some more work to do with the new config key (docs, Pydantic's ConfigDict). If this looks good I can open a PR on the other repo for that (maybe after the next release?). Or let me know if you normally handle these things differently.

Copy link
Contributor

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks good! I'd like to see a couple of changes for consistency with the rest of the library...

As for updating the Python code in pydantic, yes that's a good idea. I'd suggest opening a PR already, you can test it all locally and then add pytest.xfail markers in the PR which we can then remove when this support gets released in pydantic-core.

tests/test_json.py Outdated Show resolved Hide resolved
src/validators/config.rs Outdated Show resolved Hide resolved
josh-newman added a commit to d8e-ai/pydantic that referenced this pull request Jun 26, 2024
@josh-newman
Copy link
Contributor Author

I've created pydantic/pydantic#9770. I'm guessing it needs to wait until after the pydantic-core upgrade, otherwise a user may see this new option and be confused about why it doesn't work?

@josh-newman
Copy link
Contributor Author

I've created pydantic/pydantic#9772 to address the test-pydantic-integration failure.

@josh-newman
Copy link
Contributor Author

Well, I think I'm stuck 😅. pydantic/pydantic#9772 has test failures saying the new documented error type isn't in pydantic-core, and this PR has a CI failure saying the new error type isn't documented there.

Copy link
Contributor Author

@josh-newman josh-newman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I forgot to publish these comments earlier.)

tests/test_json.py Outdated Show resolved Hide resolved
src/validators/config.rs Outdated Show resolved Hide resolved
Copy link
Member

@sydney-runkle sydney-runkle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great, just a few questions / comments :).

I'm back from vacation now, so can iterate with you on this more quickly so we can get this across the line for v2.9 🚀

@@ -16,7 +16,7 @@ pub use shared::CombinedSerializer;
use shared::{to_json_bytes, BuildSerializer, TypeSerializer};

mod computed_fields;
mod config;
pub(crate) mod config;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ValBytesMode wraps this so it needs to be visible. I've narrowed the pub(crate) exposure to just BytesMode; is that ok?

I'm new to Rust so I don't know if there's a better way than wrapping. I want to make sure ValBytesMode defines the same variants as BytesMode and this seems to be the only way (other than maybe with macros).

Copy link
Contributor

@davidhewitt davidhewitt Jul 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess my thought of a common module would solve this #1308 (comment), just move BytesMode in there and have it know both serialization and deserialization.

src/validators/json.rs Show resolved Hide resolved
src/validators/mod.rs Outdated Show resolved Hide resolved
tests/test_errors.py Outdated Show resolved Hide resolved
@sydney-runkle
Copy link
Member

This looks good to me. We should get @davidhewitt's feedback on #1308 (comment) before we merge 👍

Copy link
Contributor

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Functionality looks great to me, maybe one implementation question...

src/validators/config.rs Show resolved Hide resolved
Copy link
Contributor

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updating to main should fix lint job, and then I am happy with this 👍

@sydney-runkle
Copy link
Member

@josh-newman,

Once you merge, I can create a new release of pydantic-core so that we can move your pydantic PR across the line as well :).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants