Add additional parameters to Schema for validation and annotation #311
Conversation
Codecov Report
```
@@           Coverage Diff            @@
##   json-schema-extra    #311   +/-  ##
========================================
  Coverage       100%    100%
========================================
  Files            13      14      +1
  Lines          1708    1770     +62
  Branches        323     338     +15
========================================
+ Hits           1708    1770     +62
```
pydantic/fields.py (outdated)

```
:param gt: only applies to numbers, requires the field to be "greater than". The schema
will have an ``exclusiveMinimum`` validation keyword
:param ge: only applies to numbers, requires the field to be "greater than or equal to". The
schema will have a ``minimum`` validation keyword
```
subsequent lines need to be indented as above.
pydantic/fields.py (outdated)

```
schema will have a ``maxLength`` validation keyword
:param regex: only applies to strings, requires the field match against a regular expression
pattern string. The schema will have a ``pattern`` validation keyword
:param **extra: any additional keyword arguments will be added as is to the schema
```
is this still true? Is it valid in JSON Schema?
TL;DR: Yes, it's still valid JSON Schema. Extensibility by adding extra keywords is part of the standard.

Additionally, there are some keywords that are part of the standard but are not declared as parameters of the `Schema` class. So, passing them as part of `**extra` is currently the only option to include them.

For example, the `examples` keyword could have a list of JSON "objects" (`dict`s) with examples of valid JSON "documents". ...I might create a new PR later with auto-generation of an example for each generated schema. But for now, they could be declared by hand by developers / Pydantic users using `**extra`.

Also, I didn't add all the possible extra keywords as parameters to the `Schema` class, because some keywords would apply a validation that doesn't have an equivalent in Pydantic yet. For example, the max and min size of an array (`list`). So, to not confuse users, I only added as parameters of `Schema` the things that actually implement a validation on data, plus what I think are the most common extra informational (non-validation) fields: `title` and `description`.
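A minimal sketch of the idea described above (plain dicts, no pydantic imports; the field name and example values are hypothetical): extra keywords passed via `**extra` get merged as-is into the generated JSON Schema for the field.

```python
def merge_extra_keywords(field_schema, **extra):
    """Merge arbitrary extra keywords into a field's JSON Schema dict, as-is."""
    merged = dict(field_schema)
    merged.update(extra)  # extra keywords land in the schema unchanged
    return merged

# Base schema for a hypothetical string field
base = {'type': 'string', 'maxLength': 10}

# 'examples' is a standard informational (non-validation) JSON Schema keyword
result = merge_extra_keywords(base, examples=['foo', 'bar'])
print(result)
```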
Ok
pydantic/fields.py (outdated)

```
@@ -36,16 +51,117 @@ class Validator(NamedTuple):


class Schema:
```
Maybe best to move this into `schema.py`?
Given that pydantic now requires dataclasses, why don't we make `Schema` a dataclass? It's just what they're made for.
Sure, I'll move it to `schema.py`. Makes sense.

I would like to use dataclasses, and avoid all that code / parameter duplication. The problem is that editors don't seem to support them very well yet, so completion and type checks still depend on / expect an `__init__`.

I guess at some point PyCharm, Jedi and the Python Language Server will end up supporting them and we'll be able to use them.

But for now, I would say it's better to keep the `__init__`. Nevertheless, if you want me to, I can add the decorator to make it explicit.
Let's use dataclasses.
Done. I made it a dataclass with the decorator.

It still has the `__init__`, let me know if you want me to remove it (although that would limit some of the editors' support 😨).

So, I tried moving `Schema` to `schema.py`, and I ran into several issues with circular imports. I tried importing the whole module (as it seems to work with some type declarations) and other "tricks", but I didn't find another way to solve it.

I'm putting the `Schema` class in its own `schema_class.py` file that is then imported by the rest, to reduce the noise in `fields.py` while avoiding circular imports. It is still exported by `schema.py`, but it is actually declared in `schema_class.py` and imported from there. I would like to put it in `schema.py`, as that seems like the place it should live, but I don't know how to avoid the circular imports.

Let me know if that sounds OK, if you want me to put it back into `fields.py`... or what we should do.
Leave it here and I'll move it.

Oops, I had already moved it to `schema_class.py`, so I'm not sure what you refer to with "here". Let me know if you want me to move it somewhere else.
Yes, please remove `__init__`, that's the whole point of dataclasses. PyCharm does lots of things badly; that shouldn't constrain what we do with Python.
So, I was removing the `__init__` and leaving it as a pure dataclass, and now I see that we can't declare `**kwargs` using pure dataclasses. We need an `__init__` for that. How do you want me to proceed?
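A small illustration of the limitation being discussed (the class and field names here are hypothetical stand-ins, not the PR's code): a pure dataclass rejects unknown keyword arguments, so a catch-all `**extra` needs a hand-written `__init__`, which `@dataclass` leaves in place when the class defines one.

```python
from dataclasses import dataclass, field

@dataclass
class Plain:
    title: str = None

# A pure dataclass has no way to declare **kwargs: unknown keywords raise TypeError
try:
    Plain(title='x', examples=['a'])
except TypeError:
    rejected = True

@dataclass
class WithExtra:
    title: str = None
    extra: dict = field(default_factory=dict)

    # @dataclass does not overwrite an __init__ defined in the class body,
    # so this hand-written one can accept arbitrary **extra
    def __init__(self, title=None, **extra):
        self.title = title
        self.extra = extra

s = WithExtra(title='x', examples=['a'])
print(s.extra)
```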
We (PyCharm) have more dataclasses work scheduled for next release. Let me know some specifics (separate ticket?) and I can get the right person involved.
That sounds great! Where should we have the conversation?
In summary:

```python
from dataclasses import dataclass

@dataclass
class A:
    a: str
    b: int

my_a = A()
```

When calling `my_a = A()` we should see autocompletion and type checking for the arguments passed to `A()`, as if the class was declared like:

```python
class A:
    def __init__(self, a: str, b: int):
        self.a = a
        self.b = b
```
@tiangolo Can we move it to another ticket in this repo? Then, if needed, I'll make a ticket in PyCharm repo and get Seymon (the one owning dataclasses) involved.
> @tiangolo Can we move it to another ticket in this repo? Then, if needed, I'll make a ticket in PyCharm repo and get Seymon (the one owning dataclasses) involved.
Yeah, I guess. Although it wouldn't be specifically related to this repo... @samuelcolvin would that be ok?
Or email me directly, paul dot everitt at jetbrains dot com
@pauleveritt thanks for the interest.
I was writing and documenting everything with screenshots, etc. But during the process, I realized that I had misconfigured the PyCharm environment in which I did the previous tests, and that PyCharm actually does support dataclasses. All I was asking for is already implemented.

So, as of now, PyCharm actually has the best dataclass support; the rest of the editors are the ones still behind.
pydantic/fields.py (outdated)

```
def get_annotation_from_schema(annotation, schema):
    """Get an annotation with validation implemented for numbers and strings based on the schema.
```
(sorry to be a pedant) please move the first line of the docstring onto its own line.
No worries. Done.
pydantic/fields.py (outdated)

```
isinstance(annotation, type)
and issubclass(annotation, (str,) + _numeric_types)
and not issubclass(annotation, bool)
and not any([issubclass(annotation, c) for c in _blacklist])
```
Surely add `bool` to `_blacklist`. Also, you don't need square brackets here; indeed, the square brackets make this line slower, by meaning `issubclass` has to be evaluated for every item in `_blacklist` even if the first is `True`.
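A quick demonstration of the short-circuit difference (a hypothetical predicate with a call counter, not pydantic code): with a list comprehension every element is evaluated before `any()` runs, while a bare generator expression lets `any()` stop at the first `True`.

```python
calls = {'list': 0, 'gen': 0}

def check(x, counter_key):
    calls[counter_key] += 1
    return x > 0  # the first item already satisfies the predicate

items = [1, 2, 3, 4]

# List comprehension: the whole list is built first -> check() runs 4 times
any([check(x, 'list') for x in items])

# Generator expression: any() short-circuits after the first True -> 1 call
any(check(x, 'gen') for x in items)

print(calls)
```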
pydantic/schema.py (outdated)

```
def get_field_schema_validations(field):
    """Get the JSON Schema validation keywords for a ``field`` with an annotation of
```
again, docstrings in the format

```
"""
this is the first line.
"""
```
pydantic/schema.py (outdated)

```
    a Pydantic ``Schema`` with validation arguments.
    """
    f_schema = {}
    if isinstance(field.type_, type) and issubclass(field.type_, (str, bytes)):
```
we do this a lot, should we have a method `lenient_issubclass`?
Agreed.

I added it to `utils.py`.

Added a couple of tests for it.

Updated the code that used that pattern to use the new function.
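A sketch of what such a helper could look like (pydantic's `lenient_issubclass` lives in its utils module; the exact signature shown here is an assumption): it behaves like `issubclass`, but returns `False` instead of raising `TypeError` when the first argument isn't a class at all.

```python
def lenient_issubclass(cls, class_or_tuple):
    """Like issubclass(), but tolerates a first argument that is not a class."""
    return isinstance(cls, type) and issubclass(cls, class_or_tuple)

print(lenient_issubclass(str, (str, bytes)))  # a real class: behaves like issubclass
print(lenient_issubclass('hello', str))       # not a class: False instead of TypeError
```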
```
and not issubclass(annotation, bool)
and not any([issubclass(annotation, c) for c in _blacklist])
):
    if issubclass(annotation, str):
```
`(str, bytes)`?
Yeah, but as I'm calling `constr` to generate the final type, it would always output `str` instead of `bytes`.

I guess one option would be adding a `conbytes`, or using the validators directly, like `anystr_length_validator`, but I wanted to use internally the same functions that already exist.

What do you think we should do about that?
pydantic/schema.py (outdated)

```
@@ -140,6 +146,38 @@ def field_schema(
    return s, f_definitions


numeric_types = (int, float, Decimal)
_str_types_attrs = {
    'max_length': (numeric_types, 'maxLength'),
```
maybe these should be 3 element tuples?
You're right. I was initially planning on using it differently, but now it doesn't make sense to have it as a `dict`.
pydantic/schema.py (outdated)

```
f_schema = {}
if isinstance(field.type_, type) and issubclass(field.type_, (str, bytes)):
    for attr, (t, keyword) in _str_types_attrs.items():
        if getattr(field._schema, attr) and isinstance(getattr(field._schema, attr), t):
```
call `getattr` once, not 3 times; also, surely `isinstance(None, str) == False` anyway?
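A minimal sketch of that refactor (a standalone mock of the schema object; the names mirror the snippet above, but the surrounding objects are stand-ins): bind the attribute once, and rely on `isinstance` alone, since `isinstance(None, t)` is already `False`.

```python
from types import SimpleNamespace

# Stand-in for the mapping in the snippet: attr name -> (accepted types, JSON Schema keyword)
_str_types_attrs = {
    'max_length': ((int, float), 'maxLength'),
    'min_length': ((int, float), 'minLength'),
}

def field_schema_validations(schema):
    f_schema = {}
    for attr, (t, keyword) in _str_types_attrs.items():
        value = getattr(schema, attr, None)  # single getattr call
        if isinstance(value, t):             # isinstance(None, t) is False, so no extra check needed
            f_schema[keyword] = value
    return f_schema

schema = SimpleNamespace(max_length=10, min_length=None)
print(field_schema_validations(schema))
```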
Thanks for the review! I'll come back, check and fix in a couple of days.
pydantic/fields.py (outdated)

```
def get_annotation_from_schema(annotation, schema):
    """Get an annotation with validation implemented for numbers and strings based on the schema.

    :param annotation: an annotation from a field specification, as ``str``, ``ConstrainedStr`` or the
```
no need for "or the return value of `constr(max_length=10)`" since that is a `ConstrainedStr`
Sure.

I had it that way as I understand the return of `constr` would be of type `ConstrainedStrValue`, which inherits from `ConstrainedStr`, because of the:

```python
return type('ConstrainedStrValue', (ConstrainedStr,), namespace)
```

But I also think that it sounds difficult to understand and would be an overcomplication there in the docstrings. So, if it's OK for you to just have up to the `ConstrainedStr`, then I'm going to remove that last part.
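The dynamic-subclass pattern referenced here can be shown standalone (the base class below is a simplified stand-in, not pydantic's actual `ConstrainedStr`): `type()` builds a fresh subclass whose namespace carries the constraint values.

```python
# Simplified stand-in for pydantic's ConstrainedStr base class
class ConstrainedStr(str):
    max_length = None

def constr(max_length=None):
    """Build a new str subclass carrying the constraint, in the style of pydantic's constr."""
    namespace = {'max_length': max_length}
    return type('ConstrainedStrValue', (ConstrainedStr,), namespace)

t = constr(max_length=10)
print(t.__name__)                    # the dynamically created subclass
print(issubclass(t, ConstrainedStr)) # it inherits from the base, as discussed above
```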
ok
I've changed the base for this PR so we can merge it without it going into master. Fix the stuff that makes sense above, then I'll work on anything I want to change. Easier than going back and forth forever.

Great. Do you want me to mark as "resolved" the comments in the review that are already done? I'm leaving them there so you can come back, check and confirm, before marking them as resolved. But let me know if you want me to mark the ones that are done and don't have a pending discussion.
* Add additional parameters to Schema for validation and annotation (#311)
* Add tests for validation declared via Schema class in defaults
* Add validations to field from declarations in Schema
* Add annotations in generated JSON Schema from validations via Schema
* Augment tests for schema generation
* Simplify validations from Schema in fields, from coverage hints
* Update schema test to use the spec plural "examples"
* Add docs and simple example of the additional parameters for Schema
* Update history
* Fix number of PR in HISTORY
* Refactor check for numeric types, remove break to make coverage happy
* Fix typo in docs, I confused gt with maximum
* Finish docstring for Schema (I had forgotten about it)
* Implement code review requests and lenient_issubclass with tests
* Move Schema to its own file to extract it from fields.py but avoid circular imports
* Control coverage
* Schema fixes (#318)
  * rearrange code
  * cleanup get_annotation_from_schema
  * fix typo
  * rename _schema to schema
Change Summary
Add additional parameters to Schema for validation and annotation, in a single declaration.
Integrates and uses `constr`, `conint`, `confloat` and `condecimal`. Doesn't change any of the current functionality.

As a result, a model whose fields are currently constrained with `constr(...)` and friends can also be created by passing the same constraints to `Schema`.
Being able to declare `a` as a pure `str` makes it simpler for some editors to get the hint that the property `a` is a `str` and provide support for it (autocomplete, type error checking, etc).

Note: the generated JSON Schema will output the validation keywords `minLength` and `maxLength` in this case. For number validations like `gt` it outputs `exclusiveMinimum`.
Another option for the parameters would be to use the literal names from JSON Schema, like `exclusiveMinimum`, but that wouldn't be very "Pythonic". Yet another option would be to use the JSON Schema names converted to snake case, like `exclusive_minimum`. But I imagine the preferred option is to keep the same parameter names used in the rest of the code, as in `conint`, etc. Otherwise, let me know and I'll change it.

Specific parameters added to `Schema`:

* `description`
* `gt`
* `ge`
* `lt`
* `le`
* `min_length`
* `max_length`
* `regex`
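A rough sketch of how these parameters correspond to JSON Schema keywords (a standalone illustration, not the PR's actual code; the `gt`/`ge` and string mappings follow the docstrings quoted in the review, and the `lt`/`le` entries are the symmetric assumption):

```python
# Schema parameter -> JSON Schema validation keyword
PARAM_TO_KEYWORD = {
    'gt': 'exclusiveMinimum',
    'ge': 'minimum',
    'lt': 'exclusiveMaximum',
    'le': 'maximum',
    'min_length': 'minLength',
    'max_length': 'maxLength',
    'regex': 'pattern',
}

def schema_keywords(**params):
    """Translate Schema-style keyword arguments into JSON Schema keywords."""
    return {PARAM_TO_KEYWORD[name]: value for name, value in params.items()}

# Hypothetical field constrained like: Schema(..., min_length=2, max_length=10)
print(schema_keywords(min_length=2, max_length=10))
```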
Related issue number

I didn't create a specific issue for this, but the changes were discussed here: #291 (comment)
Performance Changes

pydantic cares about performance; if there's any risk performance changed on this PR, please run `make benchmark-pydantic` before and after the change.

Checklist
* `HISTORY.rst` has been updated, referencing the issue/PR `#<number>` and your GitHub username `@whatever`