Add additional parameters to Schema for validation and annotation #311
Conversation
Codecov Report
```
@@           Coverage Diff            @@
##   json-schema-extra    #311   +/-  ##
========================================
  Coverage       100%    100%
========================================
  Files            13      14      +1
  Lines          1708    1770     +62
  Branches        323     338     +15
========================================
+ Hits           1708    1770     +62
```
pydantic/fields.py (outdated)

```
:param gt: only applies to numbers, requires the field to be "greater than". The schema
will have an ``exclusiveMinimum`` validation keyword
:param ge: only applies to numbers, requires the field to be "greater than or equal to". The
schema will have a ``minimum`` validation keyword
```
subsequent lines need to be indented as above.
pydantic/fields.py (outdated)

```
schema will have a ``maxLength`` validation keyword
:param regex: only applies to strings, requires the field match against a regular expression
pattern string. The schema will have a ``pattern`` validation keyword
:param **extra: any additional keyword arguments will be added as is to the schema
```
is this still true? Is it valid in JSON Schema?
TL;DR: Yes, it's still valid JSON Schema. Extensibility by adding extra keywords is part of the standard.

Additionally, there are some keywords that are part of the standard but are not declared as parameters of the `Schema` class. So, passing them as part of `**extra` is currently the only option to include them.

For example, the `examples` keyword could have a list of JSON "objects" (`dict`s) with examples of valid JSON "documents". ...I might create a new PR later with auto-generation of an example for each generated schema. But for now, they could be declared by hand by developers / Pydantic users using `**extra`.

Also, I didn't add all the possible extra keywords as parameters to the `Schema` class, because some keywords would apply a validation that doesn't have an equivalent in Pydantic yet. For example, the max and min size of an array (`list`). So, to not confuse users, I only added as parameters of `Schema` the things that actually implement a validation on data, plus what I think are the most common extra informational (non-validation) fields: `title` and `description`.
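A minimal sketch of the idea described above (plain dicts, no pydantic imports; the field name and example values are hypothetical): extra keywords passed via `**extra` get merged as-is into the generated JSON Schema for the field.

```python
def merge_extra_keywords(field_schema, **extra):
    """Merge arbitrary extra keywords into a field's JSON Schema dict, as-is."""
    merged = dict(field_schema)
    merged.update(extra)  # extra keywords land in the schema unchanged
    return merged

# Base schema for a hypothetical string field
base = {'type': 'string', 'maxLength': 10}

# 'examples' is a standard informational (non-validation) JSON Schema keyword
result = merge_extra_keywords(base, examples=['foo', 'bar'])
print(result)
```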
Ok
pydantic/fields.py (outdated)

```
@@ -36,16 +51,117 @@ class Validator(NamedTuple):


class Schema:
```
Maybe best to move this into `schema.py`?
Given that pydantic now requires dataclasses, why don't we make `Schema` a dataclass? It's just what they're made for.
Sure, I'll move it to `schema.py`. Makes sense.

I would like to use dataclasses, and avoid all that code / parameter duplication. The problem is that editors don't seem to support them very well yet, so completion and type checks still depend on / expect an `__init__`.

I guess at some point PyCharm, Jedi and the Python Language Server will end up supporting them and we'll be able to use them.

But for now, I would say it's better to keep the `__init__`. Nevertheless, if you want me to, I can add the decorator to make it explicit.
Let's use dataclasses.
Done. I made it a dataclass with the decorator.

It still has the `__init__`, let me know if you want me to remove it (although that would limit some of the editors' support 😨).

So, I tried moving `Schema` to `schema.py`, and I ran into several issues with circular imports. I tried importing the whole module (as it seems to work with some type declarations) and other "tricks", but I didn't find another way to solve it.

I'm putting the `Schema` class in its own `schema_class.py` file that is then imported by the rest, to reduce the noise in `fields.py` while avoiding circular imports. It is still exported by `schema.py`, but it is actually declared in `schema_class.py` and imported from there. I would like to put it in `schema.py`, as that seems like the place it should live, but I don't know how to avoid the circular imports.

Let me know if that sounds OK, if you want me to put it back into `fields.py`... or what we should do.
Leave it here and I'll move it.

Oops, I had already moved it to `schema_class.py`, so I'm not sure what you refer to with "here". Let me know if you want me to move it somewhere else.
Yes, please remove `__init__`, that's the whole point of dataclasses. PyCharm does lots of things badly; that shouldn't constrain what we do with Python.
So, I was removing the `__init__` and leaving it as a pure dataclass, and now I see that we can't declare `**kwargs` using pure dataclasses. We need an `__init__` for that. How do you want me to proceed?
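A small illustration of the limitation being discussed (the class and field names here are hypothetical stand-ins, not the PR's code): a pure dataclass rejects unknown keyword arguments, so a catch-all `**extra` needs a hand-written `__init__`, which `@dataclass` leaves in place when the class defines one.

```python
from dataclasses import dataclass, field

@dataclass
class Plain:
    title: str = None

# A pure dataclass has no way to declare **kwargs: unknown keywords raise TypeError
try:
    Plain(title='x', examples=['a'])
except TypeError:
    rejected = True

@dataclass
class WithExtra:
    title: str = None
    extra: dict = field(default_factory=dict)

    # @dataclass does not overwrite an __init__ defined in the class body,
    # so this hand-written one can accept arbitrary **extra
    def __init__(self, title=None, **extra):
        self.title = title
        self.extra = extra

s = WithExtra(title='x', examples=['a'])
print(s.extra)
```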
We (PyCharm) have more dataclasses work scheduled for next release. Let me know some specifics (separate ticket?) and I can get the right person involved.
That sounds great! Where should we have the conversation?
In summary:

```python
from dataclasses import dataclass

@dataclass
class A:
    a: str
    b: int

my_a = A()
```

When calling `my_a = A()` we should see autocompletion and type checking for the arguments passed to `A()`, as if the class was declared like:

```python
class A:
    def __init__(self, a: str, b: int):
        self.a = a
        self.b = b
```
@tiangolo Can we move it to another ticket in this repo? Then, if needed, I'll make a ticket in PyCharm repo and get Seymon (the one owning dataclasses) involved.
> @tiangolo Can we move it to another ticket in this repo? Then, if needed, I'll make a ticket in PyCharm repo and get Seymon (the one owning dataclasses) involved.
Yeah, I guess. Although it wouldn't be specifically related to this repo... @samuelcolvin would that be ok?
Or email me directly, paul dot everitt at jetbrains dot com
@pauleveritt thanks for the interest.
I was writing and documenting everything with screenshots, etc. But during the process, I realized that I had misconfigured the PyCharm environment in which I did the previous tests, and that PyCharm actually does support dataclasses. All I was asking for is already implemented.

So, as of now, PyCharm actually has the best dataclass support; the rest of the editors are the ones still behind.
pydantic/fields.py (outdated)

```
def get_annotation_from_schema(annotation, schema):
    """Get an annotation with validation implemented for numbers and strings based on the schema.
```
(sorry to be a pedant) please move the first line of the docstring onto its own line.
No worries. Done.
pydantic/fields.py (outdated)

```
isinstance(annotation, type)
and issubclass(annotation, (str,) + _numeric_types)
and not issubclass(annotation, bool)
and not any([issubclass(annotation, c) for c in _blacklist])
```
Surely add `bool` to `_blacklist`. Also, you don't need square brackets here; indeed, the square brackets make this line slower, by meaning `issubclass` has to be evaluated for every item in `_blacklist` even if the first is `True`.
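A quick demonstration of the short-circuit difference (a hypothetical predicate with a call counter, not pydantic code): with a list comprehension every element is evaluated before `any()` runs, while a bare generator expression lets `any()` stop at the first `True`.

```python
calls = {'list': 0, 'gen': 0}

def check(x, counter_key):
    calls[counter_key] += 1
    return x > 0  # the first item already satisfies the predicate

items = [1, 2, 3, 4]

# List comprehension: the whole list is built first -> check() runs 4 times
any([check(x, 'list') for x in items])

# Generator expression: any() short-circuits after the first True -> 1 call
any(check(x, 'gen') for x in items)

print(calls)
```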
pydantic/schema.py (outdated)

```
def get_field_schema_validations(field):
    """Get the JSON Schema validation keywords for a ``field`` with an annotation of
```
again, docstrings in the format

```
"""
this is the first line.
"""
```
pydantic/schema.py (outdated)

```
    a Pydantic ``Schema`` with validation arguments.
    """
    f_schema = {}
    if isinstance(field.type_, type) and issubclass(field.type_, (str, bytes)):
```
we do this a lot, should we have a method `lenient_issubclass`?
Agreed.

I added it to `utils.py`.

Added a couple of tests for it.

Updated the code that used that pattern to use the new function.
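A sketch of what such a helper could look like (pydantic's `lenient_issubclass` lives in its utils module; the exact signature shown here is an assumption): it behaves like `issubclass`, but returns `False` instead of raising `TypeError` when the first argument isn't a class at all.

```python
def lenient_issubclass(cls, class_or_tuple):
    """Like issubclass(), but tolerates a first argument that is not a class."""
    return isinstance(cls, type) and issubclass(cls, class_or_tuple)

print(lenient_issubclass(str, (str, bytes)))  # a real class: behaves like issubclass
print(lenient_issubclass('hello', str))       # not a class: False instead of TypeError
```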
```
and not issubclass(annotation, bool)
and not any([issubclass(annotation, c) for c in _blacklist])
):
    if issubclass(annotation, str):
```
`(str, bytes)`?
Yeah, but as I'm calling `constr` to generate the final type, it would always output `str` instead of `bytes`.

I guess one option would be adding a `conbytes`, or using the validators directly, like `anystr_length_validator`, but I wanted to use internally the same functions that already exist.

What do you think we should do about that?
pydantic/schema.py (outdated)

```
@@ -140,6 +146,38 @@ def field_schema(
    return s, f_definitions


numeric_types = (int, float, Decimal)
_str_types_attrs = {
    'max_length': (numeric_types, 'maxLength'),
```
maybe these should be 3 element tuples?
You're right. I was initially planning on using it differently, but now it doesn't make sense to have it as a `dict`.
pydantic/schema.py (outdated)

```
f_schema = {}
if isinstance(field.type_, type) and issubclass(field.type_, (str, bytes)):
    for attr, (t, keyword) in _str_types_attrs.items():
        if getattr(field._schema, attr) and isinstance(getattr(field._schema, attr), t):
```
call `getattr` once, not 3 times; also, surely `isinstance(None, str) == False` anyway?
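A minimal sketch of that refactor (a standalone mock of the schema object; the names mirror the snippet above, but the surrounding objects are stand-ins): bind the attribute once, and rely on `isinstance` alone, since `isinstance(None, t)` is already `False`.

```python
from types import SimpleNamespace

# Stand-in for the mapping in the snippet: attr name -> (accepted types, JSON Schema keyword)
_str_types_attrs = {
    'max_length': ((int, float), 'maxLength'),
    'min_length': ((int, float), 'minLength'),
}

def field_schema_validations(schema):
    f_schema = {}
    for attr, (t, keyword) in _str_types_attrs.items():
        value = getattr(schema, attr, None)  # single getattr call
        if isinstance(value, t):             # isinstance(None, t) is False, so no extra check needed
            f_schema[keyword] = value
    return f_schema

schema = SimpleNamespace(max_length=10, min_length=None)
print(field_schema_validations(schema))
```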
Thanks for the review! I'll come back, check and fix in a couple of days.
pydantic/fields.py (outdated)

```
def get_annotation_from_schema(annotation, schema):
    """Get an annotation with validation implemented for numbers and strings based on the schema.

    :param annotation: an annotation from a field specification, as ``str``, ``ConstrainedStr`` or the
```
no need for "or the return value of `constr(max_length=10)`" since that is a `ConstrainedStr`
Sure.

I had it that way as I understand the return of `constr` would be of type `ConstrainedStrValue`, which inherits from `ConstrainedStr`, because of the:

```python
return type('ConstrainedStrValue', (ConstrainedStr,), namespace)
```

But I also think that it sounds difficult to understand and would be an overcomplication there in the docstrings. So, if it's OK for you to just have up to the `ConstrainedStr`, then I'm going to remove that last part.
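The dynamic-subclass pattern referenced here can be shown standalone (the base class below is a simplified stand-in, not pydantic's actual `ConstrainedStr`): `type()` builds a fresh subclass whose namespace carries the constraint values.

```python
# Simplified stand-in for pydantic's ConstrainedStr base class
class ConstrainedStr(str):
    max_length = None

def constr(max_length=None):
    """Build a new str subclass carrying the constraint, in the style of pydantic's constr."""
    namespace = {'max_length': max_length}
    return type('ConstrainedStrValue', (ConstrainedStr,), namespace)

t = constr(max_length=10)
print(t.__name__)                    # the dynamically created subclass
print(issubclass(t, ConstrainedStr)) # it inherits from the base, as discussed above
```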
ok
I've changed the base for this PR so we can merge it without it going into master. Fix the stuff that makes sense above, then I'll work on anything I want to change. Easier than going back and forth forever.

Great. Do you want me to mark as "resolved" the comments in the review that are already done? I'm leaving them there so you can come back, check and confirm, before marking them as resolved. But let me know if you want me to mark the ones that are done and don't have a pending discussion.
* Add additional parameters to Schema for validation and annotation (#311)
* Add tests for validation declared via Schema class in defaults
* Add validations to field from declarations in Schema
* Add annotations in generated JSON Schema from validations via Schema
* Augment tests for schema generation
* Simplify validations from Schema in fields, from coverage hints
* Update schema test to use the spec plural "examples"
* Add docs and simple example of the additional parameters for Schema
* Update history
* Fix number of PR in HISTORY
* Refactor check for numeric types, remove break to make coverage happy
* Fix typo in docs, I confused gt with maximum
* Finish docstring for Schema (I had forgotten about it)
* Implement code review requests and lenient_issubclass with tests
* Move Schema to its own file to extract it from fields.py but avoid circular imports
* Control coverage
* Schema fixes (#318)
  * rearrange code
  * cleanup get_annotation_from_schema
  * fix typo
  * rename _schema to schema
Change Summary
Add additional parameters to Schema for validation and annotation, in a single declaration.
Integrates and uses `constr`, `conint`, `confloat` and `condecimal`. Doesn't change any of the current functionality.

As a result, a model whose fields are currently constrained with `constr(...)` and friends can also be created by passing the same constraints to `Schema`.
Being able to declare `a` as a pure `str` makes it simpler for some editors to get the hint that the property `a` is a `str` and provide support for it (autocomplete, type error checking, etc).

Note: the generated JSON Schema will output the validation keywords `minLength` and `maxLength` in this case. For number validations like `gt` it outputs `exclusiveMinimum`.
Another option for the parameters would be to use the literal names from JSON Schema, like `exclusiveMinimum`, but that wouldn't be very "Pythonic". Yet another option would be to use the JSON Schema names converted to snake case, like `exclusive_minimum`. But I imagine the preferred option is to keep the same parameter names used in the rest of the code, as in `conint`, etc. Otherwise, let me know and I'll change it.

Specific parameters added to `Schema`:

* `description`
* `gt`
* `ge`
* `lt`
* `le`
* `min_length`
* `max_length`
* `regex`
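A rough sketch of how these parameters correspond to JSON Schema keywords (a standalone illustration, not the PR's actual code; the `gt`/`ge` and string mappings follow the docstrings quoted in the review, and the `lt`/`le` entries are the symmetric assumption):

```python
# Schema parameter -> JSON Schema validation keyword
PARAM_TO_KEYWORD = {
    'gt': 'exclusiveMinimum',
    'ge': 'minimum',
    'lt': 'exclusiveMaximum',
    'le': 'maximum',
    'min_length': 'minLength',
    'max_length': 'maxLength',
    'regex': 'pattern',
}

def schema_keywords(**params):
    """Translate Schema-style keyword arguments into JSON Schema keywords."""
    return {PARAM_TO_KEYWORD[name]: value for name, value in params.items()}

# Hypothetical field constrained like: Schema(..., min_length=2, max_length=10)
print(schema_keywords(min_length=2, max_length=10))
```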
Related issue number

I didn't create a specific issue for this, but the changes were discussed here: #291 (comment)
Performance Changes

pydantic cares about performance; if there's any risk performance changed on this PR, please run `make benchmark-pydantic` before and after the change.

Checklist
* `HISTORY.rst` has been updated, referencing the issue/PR `#<number>` and your GitHub username `@whatever`