Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add additional parameters to Schema for validation and annotation #311

Merged

Conversation

tiangolo
Copy link
Collaborator

@tiangolo tiangolo commented Nov 22, 2018

Change Summary

Add additional parameters to Schema for validation and annotation, in a single declaration.

Integrates and uses constr, conint, confloat and condecimal. Doesn't change any of the current functionality.

As a result, a current model created as:

from pydantic import BaseModel, constr, Schema

class Foo(BaseModel):
    a: constr(min_length=2, max_length=10) = Schema('bar')

can also be created as:

from pydantic import BaseModel, Schema

class Foo(BaseModel):
    a: str = Schema('bar', min_length=2, max_length=10)

by being able to declare a as a pure str makes it simpler for some editors to get the hint that the property a is a str and provide support for it (autocomplete, type error checking, etc).

Note

The generated JSON Schema will output validation keywords of minLength and maxLength in this case.

For numbers validations like gt it outputs exclusiveMinimum.

Other option for the parameters would be to use the literal name used in JSON Schema, as exclusiveMinimum, but it wouldn't be very "Pythonic".

And other option would be using JSON Schema names and change them to snake-case, as exclusive_minimum.

But I imagine the preferred option is to keep the same parameter names used in the rest of the code, as in conint, etc. Otherwise, let me know and I'll change it.

Specific parameters added to Schema

  • description
  • gt
  • ge
  • lt
  • le
  • min_length
  • max_length
  • regex

Related issue number

I didn't create a specific issue for this. But the changes where discussed here: #291 (comment)

Performance Changes

pydantic cares about performance, if there's any risk performance changed on this PR,
please run make benchmark-pydantic before and after the change:

  • before:
                                pydantic time=0.775s, success=49.95%
                                pydantic time=0.794s, success=49.95%
                                pydantic time=0.792s, success=49.95%
                                pydantic time=0.773s, success=49.95%
                                pydantic time=0.790s, success=49.95%
                                pydantic best=0.773s, avg=0.785s, stdev=0.010s

                                pydantic best=25.770μs/iter avg=26.155μs/iter stdev=0.332μs/iter
  • after:
                                pydantic time=0.794s, success=49.95%
                                pydantic time=0.801s, success=49.95%
                                pydantic time=0.809s, success=49.95%
                                pydantic time=0.765s, success=49.95%
                                pydantic time=0.797s, success=49.95%
                                pydantic best=0.765s, avg=0.793s, stdev=0.017s

                                pydantic best=25.507μs/iter avg=26.441μs/iter stdev=0.554μs/iter

Checklist

  • Unit tests for the changes exist
  • Tests pass on CI and coverage remains at 100%
  • Documentation reflects the changes
  • No performance deterioration (if applicable)
  • HISTORY.rst has been updated
    • if this is the first change since a release, please add a new section
    • include the issue number or this pull request number #<number>
    • if you're not a regular contributer please include your github username @whatever

@codecov
Copy link

codecov bot commented Nov 22, 2018

Codecov Report

Merging #311 into json-schema-extra will not change coverage.
The diff coverage is 100%.

@@               Coverage Diff                @@
##           json-schema-extra   #311   +/-   ##
================================================
  Coverage                100%   100%           
================================================
  Files                     13     14    +1     
  Lines                   1708   1770   +62     
  Branches                 323    338   +15     
================================================
+ Hits                    1708   1770   +62

:param gt: only applies to numbers, requires the field to be "greater than". The schema
will have an ``exclusiveMinimum`` validation keyword
:param ge: only applies to numbers, requires the field to be "greater than or equal to". The
schema will have a ``minimum`` validation keyword
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

subsequent lines need to be indented as above.

schema will have a ``maxLength`` validation keyword
:param regex: only applies to strings, requires the field match agains a regular expression
pattern string. The schema will have a ``pattern`` validation keyword
:param **extra: any additional keyword arguments will be added as is to the schema
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this still true, is it valid in json-schema?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TL;DR: Yes, it's still valid JSON Schema. Extensibility by adding extra keywords is part of the standard.


Additionally, there are some keywords that are part of the standard and are not declared as parameters of the Schema class. So, passing them as part of the **extra is the current only option to include them.

For example, the examples keyword could have a list of JSON "objects" (dicts) with examples of valid JSON "documents". ...I might create a new PR later with auto-generation of an example for each generated schema. But for now, they could be declared by hand by developers / Pydantic users using the **extra.

Also, I didn't add all the possible extra keywords as parameters to the Schema class because there are some keywords that would apply a validation that doesn't have an equivalent in Pydantic yet. For example, max and min size of an array (list). So, to not confuse users, I just added as parameters of Schema the things that actually implement a validation on data. And what I think are the most common extra informational (non-validation) fields: title and description.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok

@@ -36,16 +51,117 @@ class Validator(NamedTuple):

class Schema:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe best to move this into schema.py?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that pydantic now requires dataclasses, why don't we make Schema a dataclass. It's just what they're made for.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll move it to schema.py. Makes sense.


I would like to use dataclasses, and avoid all that code / parameter duplication. The problem is that editors don't seem to support them very well yet. So, completion and type checks still depend on / expect a __init__.

I guess soon, at some point PyCharm, Jedi and Python Language Server will end up supporting them and we'll be able to use them.

But for now, I would say it's better to keep the __init__. Nevertheless, if you want me to, I can add the decorator to make it explicit.

Copy link
Member

@samuelcolvin samuelcolvin Nov 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use dataclasses.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I made it a dataclass with the decorator.

It still has the __init__, let me know if you want me to remove it (although that would limit some of the editor's support 😨 ).


So, I tried moving Schema to schema.py, and I ran into several issues with circular imports. I tried importing the whole module (as it seems to work with some type declarations) and other "tricks". But I didn't find another way to solve it.

I'm putting the Schema class in its own schema_class.py file that is then imported by the rest, to reduce the noise in fields.py while avoiding circular imports. It is still exported by schema.py, but it is actually declared in schema_class.py and imported from there. I would like to put it in schema.py as it seems like the place it should live, but I don't know how to avoid the circular imports.

Let me know if that sounds OK, if you want me to put it back into fields.py ...or what should we do.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@samuelcolvin

Leave it here and I'll move it.

Oops, I had already moved it to schema_class.py, I'm not sure what you refer to with here. Let me know if you want me to move it somewhere else.

Yes, please remove __init__, that's the whole point in dataclasses. PyCharm does lots of things badly, that shouldn't constrain what we do with python.

So, I was removing the __init__ and leaving it as a pure dataclass, and now I see that we can't declare **kwargs using pure dataclasses. We need __init__ for that. How do you want me to proceed?


@pauleveritt

We (PyCharm) have more dataclasses work scheduled for next release. Let me know some specifics (separate ticket?) and I can get the right person involved.

That sounds great! Where should we have the conversation?

In summary:

from dataclasses import dataclass

@dataclass
class A:
    a: str
    b: int

my_a = A()

When calling my_a = A() we should see autocompletion and type checking for the arguments passed to A() as if the class was declared like:

class A:
    def __init__(self, a: str, b: int):
        self.a = a
        self.b = b

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tiangolo Can we move it to another ticket in this repo? Then, if needed, I'll make a ticket in PyCharm repo and get Seymon (the one owning dataclasses) involved.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tiangolo Can we move it to another ticket in this repo? Then, if needed, I'll make a ticket in PyCharm repo and get Seymon (the one owning dataclasses) involved.

Yeah, I guess. Although it wouldn't be specifically related to this repo... @samuelcolvin would that be ok?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or email me directly, paul dot everitt at jetbrains dot com

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pauleveritt thanks for the interest.

I was writing and documenting everything with screenshots, etc. But during the process, I realized that I had misconfigured the PyCharm environment in which I did the previous tests and that PyCharm actually does support dataclasses. All I was asking is already implemented.

So, up to now, PyCharm actually has the best dataclass support, the rest of the editors are the ones still behind.



def get_annotation_from_schema(annotation, schema):
"""Get an annotation with validation implemented for numbers and strings based on the schema.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(sorry to be a pedant) please move the first line of the docstring onto it's own line.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries. Done.

isinstance(annotation, type)
and issubclass(annotation, (str,) + _numeric_types)
and not issubclass(annotation, bool)
and not any([issubclass(annotation, c) for c in _blacklist])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

surely add bool to _blacklist, also you don't need square brackets here, indeed square brackets make this line lower by meaning issubclass has to be evaluated for each item in _blacklist even if the first is True



def get_field_schema_validations(field):
"""Get the JSON Schema validation keywords for a ``field`` with an annotation of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again docstrings in the format

    """
    this is the first line.
    """

a Pydantic ``Schema`` with validation arguments.
"""
f_schema = {}
if isinstance(field.type_, type) and issubclass(field.type_, (str, bytes)):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we do this a lot, should we have a method lenient_issubclass?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.
I added it to utils.py.
Added a couple of tests for it.
Update the code that used that to use the new function.

and not issubclass(annotation, bool)
and not any([issubclass(annotation, c) for c in _blacklist])
):
if issubclass(annotation, str):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(str, bytes)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but as I'm calling constr to generate the final type, it would always output str, instead of bytes.

I guess one option would be adding a conbytes, or using the validators directly, like anystr_length_validator, but I wanted to use internally the same functions that already exist.

What do you think we should do about that?

@@ -140,6 +146,38 @@ def field_schema(
return s, f_definitions


numeric_types = (int, float, Decimal)
_str_types_attrs = {
'max_length': (numeric_types, 'maxLength'),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe these should be 3 element tuples?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. I was initially planning on using it differently. But now it doesn't make sense to have it as a dict.

f_schema = {}
if isinstance(field.type_, type) and issubclass(field.type_, (str, bytes)):
for attr, (t, keyword) in _str_types_attrs.items():
if getattr(field._schema, attr) and isinstance(getattr(field._schema, attr), t):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

call getattr once not 3 times, also surely isinstance(None, str) == False anyway?

@tiangolo
Copy link
Collaborator Author

Thanks for the review! I'll come, check and fix in a couple days.

def get_annotation_from_schema(annotation, schema):
"""Get an annotation with validation implemented for numbers and strings based on the schema.

:param annotation: an annotation from a field specification, as ``str``, ``ConstrainedStr`` or the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need for or the return value of constr(max_length=10) since that is a ConstrainedStr

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

I had it that ways as I understand the return of constr would be of type ConstrainedStrValue, that inherits from ConstranedStr, because of the:

return type('ConstrainedStrValue', (ConstrainedStr,), namespace)

But I also think that it sounds difficult to understand and would be an overcomplication there in the docstings. So, if it's ok for you to just have up to the ConstrainedStr, then I'm going to remove that last part.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@samuelcolvin samuelcolvin changed the base branch from master to json-schema-extra November 28, 2018 13:49
@samuelcolvin
Copy link
Member

I've changed the base for this PR so we can merge it without it going into master.

Fix the stuff that makes sense above, then I'll work on anything I want to change. Easier than going back and fourth for ever.

@tiangolo
Copy link
Collaborator Author

I've changed the base for this PR so we can merge it without it going into master.

Fix the stuff that makes sense above, then I'll work on anything I want to change. Easier than going back and fourth for ever.

Great.


Do you want me to mark as "resolved" the comments in the review that are already done?

I'm leaving them there so you can come back, check and confirm, before marking them as resolved. But let me know if you want me to mark the ones that are done and don't have a pending discussion.

@samuelcolvin samuelcolvin mentioned this pull request Nov 30, 2018
@samuelcolvin samuelcolvin merged commit 38fba73 into pydantic:json-schema-extra Dec 3, 2018
samuelcolvin added a commit that referenced this pull request Dec 3, 2018
* Add additional parameters to Schema for validation and annotation (#311)

* Add tests for validation declared via Schema class in defaults

* Add validations to field from declarations in Schema

* Add annotations in generated JSON Schema from validations via Schema

* Augment tests for schema generation

* Simplify validations from Schema in fields, from coverage hints

* Update schema test to use the spec plural "examples"

* Add docs and simple example of the additional parameters for Schema

* Update history

* Fix number of PR in HISTORY

* Refactor check for numeric types, remove break to make coverage happy

* Fix typo in docs, I confused gt with maximum

* Finish docstring for Schema (I had forgotten about it)

* Implement code review requests and lenient_issubclass with tests

* Move Schema to its now file to extract from fields.py but avoid circular imports

* Control coverage

* Schema fixes (#318)

* rearrange code

* cleanup get_annotation_from_schema

* fix typo

* rename _schema to schema
alexdrydew pushed a commit to alexdrydew/pydantic that referenced this pull request Dec 23, 2023
* 3.11 windows builds, fix pydantic#260

* install 3.11 on windows

* reconfiguring build matrix

* tweak build matrix

* another try

* fix ci yaml

* fix manylinux alt

* remove 3.7 on aarch macos

* try simplifying build

* add more 3.11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants