fix: convert python tuples into lists #312

blacha · 2021-11-01T22:15:32Z

We use tuples for geographic coordinates and bounding boxes which doesn't seem to have a direct mapping in JSON Schema.

This pull requests converts the tuple into a list to fix ValueError: Unsupported type: 'tuple' when validating a object that contains a tuple.

{ "bbox": (1,2,3,4) }

bindings/python/src/ser.rs

bindings/python/src/types.rs

codecov · 2021-11-01T22:27:21Z

Codecov Report

Merging #312 (0ada8f3) into master (80c5af9) will decrease coverage by 0.30%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master     #312      +/-   ##
==========================================
- Coverage   81.88%   81.58%   -0.31%     
==========================================
  Files          57       57              
  Lines        5494     5513      +19     
==========================================
- Hits         4499     4498       -1     
- Misses        995     1015      +20

Impacted Files	Coverage Δ
jsonschema/src/output.rs	`72.16% <0.00%> (-17.58%)`	⬇️
jsonschema/src/error.rs	`62.56% <0.00%> (-0.28%)`	⬇️
jsonschema/src/lib.rs	`100.00% <0.00%> (ø)`
jsonschema/src/compilation/mod.rs	`85.92% <0.00%> (ø)`

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 80c5af9...0ada8f3. Read the comment docs.

Stranger6667 · 2021-11-02T12:49:48Z

Hi @blacha

Thanks for opening a PR! Though I clearly see the value of this change for your use case, I think it might be a problematic change for the library as a whole. Unfortunately, I didn't share the main design points anywhere, so I'll try to put it here (and will later put it in the docs).

As you pointed out, tuple doesn't have a direct equivalent in JSON, though it feels like a list in a general sense. The core thing here is to avoid implicit type conversions, so it is more conservative by default to avoid surprising behavior.
Compatibility with jsonschema in its Python-specific idiosyncrasies. jsonschema ignores tuples - if somebody will switch from jsonschema to jsonschema_rs it might be confusing as tuples will be converted to lists & validation will be performed. It is not necessarily a bad thing, as it will uncover that non-JSONSchema types were used.

So, my main intention is to keep the library conservative to its inputs by default so it is less surprising and more explicit about deviations from the JSON Schema spec.

However, I think that we still can add this change to the library but put it behind a feature flag, similarly to what orjson is doing (feature flags for supporting subclasses, out-of-range int values, etc).

So, the end-user API might look like this:

jsonschema_rs.validate({"items": {"type": "integer"}}, [1, "2"], option=jsonschema_rs.OPT_TUPLE_AS_LIST)

compiled = jsonschema_rs.JSONSchema({"items": {"type": "integer"}}, option=jsonschema_rs.OPT_TUPLE_AS_LIST)

What do you think?

blacha · 2021-11-02T20:29:13Z

Hi @blacha

Thanks for opening a PR! Though I clearly see the value of this change for your use case, I think it might be a problematic change for the library as a whole. Unfortunately, I didn't share the main design points anywhere, so I'll try to put it here (and will later put it in the docs).

As you pointed out, tuple doesn't have a direct equivalent in JSON, though it feels like a list in a general sense. The core thing here is to avoid implicit type conversions, so it is more conservative by default to avoid surprising behavior.

Compatibility with jsonschema in its Python-specific idiosyncrasies. jsonschema ignores tuples - if somebody will switch from jsonschema to jsonschema_rs it might be confusing as tuples will be converted to lists & validation will be performed. It is not necessarily a bad thing, as it will uncover that non-JSONSchema types were used.

So, my main intention is to keep the library conservative to its inputs by default so it is less surprising and more explicit about deviations from the JSON Schema spec.

However, I think that we still can add this change to the library but put it behind a feature flag, similarly to what orjson is doing (feature flags for supporting subclasses, out-of-range int values, etc).

So, the end-user API might look like this:
jsonschema_rs.validate({"items": {"type": "integer"}}, [1, "2"], option=jsonschema_rs.OPT_TUPLE_AS_LIST)

compiled = jsonschema_rs.JSONSchema({"items": {"type": "integer"}}, option=jsonschema_rs.OPT_TUPLE_AS_LIST)
What do you think?

We are coming from fastjsonschema which does this conversion automatically horejsek/python-fastjsonschema#87

I do see that jsonschema does follow a similar pattern to the configuration options you are talking about python-jsonschema/jsonschema#148 it does have a much more complicated type setting option

I am for the automatic conversion of tuples to arrays, as there is no concept of tuples in JSON. If I was to take a tuple dict and seralize it to JSON using json.dumps then parse it as JSON json.loads then validate it would work, as the serialisation process automatically converts tuples into arrays

import json
json.dumps({"foo": (1,2,3)})
# '{"foo": [1, 2, 3]}'

validate(json.loads(json.dumps({"foo": (1,2,3)})), schema) # Works

It looks somewhat easy to plumb the configuration option into the serialiser via the ser::to_value function.

Interested in your thoughts if its worth the effort to create a config for this

Stranger6667 · 2021-11-03T14:34:55Z

Generally, I am somewhat hesitant to follow Python's json behavior as it serializes Inf / -Inf / nan which do not exist in JSON (which leads to data that can't be parsed by other languages). But specifically, in this case, I do agree that it would be convenient to have this kind of automatic conversion, especially as the scope is limited to Python bindings.

It looks somewhat easy to plumb the configuration option into the serialiser via the ser::to_value function.
Interested in your thoughts if its worth the effort to create a config for this

I think so, yes. The to_value function might accept u8 (a bitmask for enabled features) and then decide during the serialization process whether to convert tuples or not. As it is a private implementation detail, we can use u8 for now as we don't have many config options and we'll always can extend it to u16 in the future if needed.

On the public API side, I think it could be the orjson approach with a single options argument (which is also similar to Python's re config options).

Indeed, it should not be a big deal to extend the current public API to accept a new argument, process it, and forward the value to to_value.

Let me know if I can help with the implementation!

bindings/python/src/ser.rs

bindings/python/tests-py/test_jsonschema.py

blacha · 2021-11-03T19:49:28Z

Generally, I am somewhat hesitant to follow Python's json behavior as it serializes Inf / -Inf / nan which do not exist in JSON (which leads to data that can't be parsed by other languages).

I had no idea that python's json.dumps does that! that is really poor.

But specifically, in this case, I do agree that it would be convenient to have this kind of automatic conversion, especially as the scope is limited to Python bindings.

I really think at least having this the default it would save a lot of issues in the long run, maybe a way to turn it off via options?

Yesterday I found this library using the dumps/loads to work around the tuple problem: https://github.com/stac-utils/pystac/blob/main/pystac/validation/stac_validator.py#L100

I think so, yes. The to_value function might accept u8 (a bitmask for enabled features) and then decide during the serialization process whether to convert tuples or not. As it is a private implementation detail, we can use u8 for now as we don't have many config options and we'll always can extend it to u16 in the future if needed.

Ill have a go at making it an optional bitmask today.

blacha · 2021-11-03T20:22:52Z

@Stranger6667 Playing with orjson it automaticallys convert tuples by default

import orjson

print(orjson.dumps({"foo": (1,2,3)}))
# b'{"foo":[1,2,3]}'

print(orjson.dumps({"foo": [1,2,3]}))
# b'{"foo":[1,2,3]}'

It only avoids the named tuple type

from collections import namedtuple
Person = namedtuple('Person', 'first_name last_name')
person_a = Person('Joe', 'Smith')

print(orjson.dumps(person_a)) # Dies with TypeError: Type is not JSON serializable: Person

Which makes me think it should just by default convert, I'll add a test for named tuples.

bindings/python/tests-py/test_jsonschema.py

Stranger6667 · 2021-11-04T10:59:35Z

@blacha

Good point! I didn't know that orjson does it by default. Lets make it the default behavior then :)

Stranger6667 · 2021-11-04T11:04:15Z

Also, we might get into the bitmask options later :)

blacha · 2021-11-04T11:08:14Z

Also, we might get into the bitmask options later :)

haha, I enjoy reverse engineering old games, so I love it when I get a chance to put a bitmask somewhere :)

however if we just go for the default for tuples as lists I'm not really seeing a place where we need to turn that off, so to keep this PR somewhat small maybe the options logic could be added later?

Stranger6667 · 2021-11-04T11:11:52Z

haha, I enjoy reverse engineering old games, so I love it when I get a chance to put a bitmask somewhere :)
however if we just go for the default for tuples as lists I'm not really seeing a place where we need to turn that off, so to keep this PR somewhat small maybe the options logic could be added later?

Yep, exactly :) It would be better to implement the tuple conversion for now and then if it will be needed, we can always get back into implementing those bitmask options later :)

Stranger6667 · 2021-11-04T11:29:42Z

Thanks! I'll make a new release soon

fix: convert tuples into lists

a5cc8b6

blacha commented Nov 1, 2021

View reviewed changes

bindings/python/src/ser.rs Show resolved Hide resolved

blacha commented Nov 1, 2021

View reviewed changes

bindings/python/src/types.rs Outdated Show resolved Hide resolved

refactor: avoid serialization logic duplication

a71d037

Stranger6667 reviewed Nov 3, 2021

View reviewed changes

bindings/python/src/ser.rs Outdated Show resolved Hide resolved

bindings/python/tests-py/test_jsonschema.py Outdated Show resolved Hide resolved

blacha added 2 commits November 4, 2021 08:52

refactor: convert tuple size to 0

d042801

refactor: expand tests around arrays and tuples

8df27d1

test: validate that named tuples error

c757370

blacha commented Nov 3, 2021

View reviewed changes

bindings/python/tests-py/test_jsonschema.py Show resolved Hide resolved

refactor: split list and tuple serialization

0ada8f3

Stranger6667 approved these changes Nov 4, 2021

View reviewed changes

Stranger6667 merged commit 9867b38 into Stranger6667:master Nov 4, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: convert python tuples into lists #312

fix: convert python tuples into lists #312

blacha commented Nov 1, 2021

codecov bot commented Nov 1, 2021 •

edited

Loading

Stranger6667 commented Nov 2, 2021

blacha commented Nov 2, 2021

Stranger6667 commented Nov 3, 2021

blacha commented Nov 3, 2021

blacha commented Nov 3, 2021 •

edited

Loading

Stranger6667 commented Nov 4, 2021

Stranger6667 commented Nov 4, 2021

blacha commented Nov 4, 2021

Stranger6667 commented Nov 4, 2021

Stranger6667 commented Nov 4, 2021

fix: convert python tuples into lists #312

fix: convert python tuples into lists #312

Conversation

blacha commented Nov 1, 2021

codecov bot commented Nov 1, 2021 • edited Loading

Codecov Report

Stranger6667 commented Nov 2, 2021

blacha commented Nov 2, 2021

Stranger6667 commented Nov 3, 2021

blacha commented Nov 3, 2021

blacha commented Nov 3, 2021 • edited Loading

Stranger6667 commented Nov 4, 2021

Stranger6667 commented Nov 4, 2021

blacha commented Nov 4, 2021

Stranger6667 commented Nov 4, 2021

Stranger6667 commented Nov 4, 2021

codecov bot commented Nov 1, 2021 •

edited

Loading

blacha commented Nov 3, 2021 •

edited

Loading