Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: convert python tuples into lists #312

Merged
merged 6 commits into from
Nov 4, 2021

Conversation

blacha
Copy link
Contributor

@blacha blacha commented Nov 1, 2021

We use tuples for geographic coordinates and bounding boxes which doesn't seem to have a direct mapping in JSON Schema.

This pull requests converts the tuple into a list to fix ValueError: Unsupported type: 'tuple' when validating a object that contains a tuple.

{ "bbox": (1,2,3,4) }

bindings/python/src/types.rs Outdated Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Nov 1, 2021

Codecov Report

Merging #312 (0ada8f3) into master (80c5af9) will decrease coverage by 0.30%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #312      +/-   ##
==========================================
- Coverage   81.88%   81.58%   -0.31%     
==========================================
  Files          57       57              
  Lines        5494     5513      +19     
==========================================
- Hits         4499     4498       -1     
- Misses        995     1015      +20     
Impacted Files Coverage Δ
jsonschema/src/output.rs 72.16% <0.00%> (-17.58%) ⬇️
jsonschema/src/error.rs 62.56% <0.00%> (-0.28%) ⬇️
jsonschema/src/lib.rs 100.00% <0.00%> (ø)
jsonschema/src/compilation/mod.rs 85.92% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 80c5af9...0ada8f3. Read the comment docs.

@Stranger6667
Copy link
Owner

Hi @blacha

Thanks for opening a PR! Though I clearly see the value of this change for your use case, I think it might be a problematic change for the library as a whole. Unfortunately, I didn't share the main design points anywhere, so I'll try to put it here (and will later put it in the docs).

  1. As you pointed out, tuple doesn't have a direct equivalent in JSON, though it feels like a list in a general sense. The core thing here is to avoid implicit type conversions, so it is more conservative by default to avoid surprising behavior.
  2. Compatibility with jsonschema in its Python-specific idiosyncrasies. jsonschema ignores tuples - if somebody will switch from jsonschema to jsonschema_rs it might be confusing as tuples will be converted to lists & validation will be performed. It is not necessarily a bad thing, as it will uncover that non-JSONSchema types were used.

So, my main intention is to keep the library conservative to its inputs by default so it is less surprising and more explicit about deviations from the JSON Schema spec.

However, I think that we still can add this change to the library but put it behind a feature flag, similarly to what orjson is doing (feature flags for supporting subclasses, out-of-range int values, etc).

So, the end-user API might look like this:

jsonschema_rs.validate({"items": {"type": "integer"}}, [1, "2"], option=jsonschema_rs.OPT_TUPLE_AS_LIST)

compiled = jsonschema_rs.JSONSchema({"items": {"type": "integer"}}, option=jsonschema_rs.OPT_TUPLE_AS_LIST)

What do you think?

@blacha
Copy link
Contributor Author

blacha commented Nov 2, 2021

Hi @blacha

Thanks for opening a PR! Though I clearly see the value of this change for your use case, I think it might be a problematic change for the library as a whole. Unfortunately, I didn't share the main design points anywhere, so I'll try to put it here (and will later put it in the docs).

  1. As you pointed out, tuple doesn't have a direct equivalent in JSON, though it feels like a list in a general sense. The core thing here is to avoid implicit type conversions, so it is more conservative by default to avoid surprising behavior.
  2. Compatibility with jsonschema in its Python-specific idiosyncrasies. jsonschema ignores tuples - if somebody will switch from jsonschema to jsonschema_rs it might be confusing as tuples will be converted to lists & validation will be performed. It is not necessarily a bad thing, as it will uncover that non-JSONSchema types were used.

So, my main intention is to keep the library conservative to its inputs by default so it is less surprising and more explicit about deviations from the JSON Schema spec.

However, I think that we still can add this change to the library but put it behind a feature flag, similarly to what orjson is doing (feature flags for supporting subclasses, out-of-range int values, etc).

So, the end-user API might look like this:

jsonschema_rs.validate({"items": {"type": "integer"}}, [1, "2"], option=jsonschema_rs.OPT_TUPLE_AS_LIST)

compiled = jsonschema_rs.JSONSchema({"items": {"type": "integer"}}, option=jsonschema_rs.OPT_TUPLE_AS_LIST)

What do you think?

We are coming from fastjsonschema which does this conversion automatically horejsek/python-fastjsonschema#87

I do see that jsonschema does follow a similar pattern to the configuration options you are talking about python-jsonschema/jsonschema#148 it does have a much more complicated type setting option

I am for the automatic conversion of tuples to arrays, as there is no concept of tuples in JSON. If I was to take a tuple dict and seralize it to JSON using json.dumps then parse it as JSON json.loads then validate it would work, as the serialisation process automatically converts tuples into arrays

import json
json.dumps({"foo": (1,2,3)})
# '{"foo": [1, 2, 3]}'

validate(json.loads(json.dumps({"foo": (1,2,3)})), schema) # Works

It looks somewhat easy to plumb the configuration option into the serialiser via the ser::to_value function.

Interested in your thoughts if its worth the effort to create a config for this

@Stranger6667
Copy link
Owner

Generally, I am somewhat hesitant to follow Python's json behavior as it serializes Inf / -Inf / nan which do not exist in JSON (which leads to data that can't be parsed by other languages). But specifically, in this case, I do agree that it would be convenient to have this kind of automatic conversion, especially as the scope is limited to Python bindings.

It looks somewhat easy to plumb the configuration option into the serialiser via the ser::to_value function.
Interested in your thoughts if its worth the effort to create a config for this

I think so, yes. The to_value function might accept u8 (a bitmask for enabled features) and then decide during the serialization process whether to convert tuples or not. As it is a private implementation detail, we can use u8 for now as we don't have many config options and we'll always can extend it to u16 in the future if needed.

On the public API side, I think it could be the orjson approach with a single options argument (which is also similar to Python's re config options).

Indeed, it should not be a big deal to extend the current public API to accept a new argument, process it, and forward the value to to_value.

Let me know if I can help with the implementation!

bindings/python/src/ser.rs Outdated Show resolved Hide resolved
bindings/python/tests-py/test_jsonschema.py Outdated Show resolved Hide resolved
@blacha
Copy link
Contributor Author

blacha commented Nov 3, 2021

Generally, I am somewhat hesitant to follow Python's json behavior as it serializes Inf / -Inf / nan which do not exist in JSON (which leads to data that can't be parsed by other languages).

I had no idea that python's json.dumps does that! that is really poor.

But specifically, in this case, I do agree that it would be convenient to have this kind of automatic conversion, especially as the scope is limited to Python bindings.

I really think at least having this the default it would save a lot of issues in the long run, maybe a way to turn it off via options?

Yesterday I found this library using the dumps/loads to work around the tuple problem: https://github.com/stac-utils/pystac/blob/main/pystac/validation/stac_validator.py#L100

I think so, yes. The to_value function might accept u8 (a bitmask for enabled features) and then decide during the serialization process whether to convert tuples or not. As it is a private implementation detail, we can use u8 for now as we don't have many config options and we'll always can extend it to u16 in the future if needed.

Ill have a go at making it an optional bitmask today.

@blacha
Copy link
Contributor Author

blacha commented Nov 3, 2021

@Stranger6667 Playing with orjson it automaticallys convert tuples by default

import orjson

print(orjson.dumps({"foo": (1,2,3)}))
# b'{"foo":[1,2,3]}'

print(orjson.dumps({"foo": [1,2,3]}))
# b'{"foo":[1,2,3]}'

It only avoids the named tuple type

from collections import namedtuple
Person = namedtuple('Person', 'first_name last_name')
person_a = Person('Joe', 'Smith')

print(orjson.dumps(person_a)) # Dies with TypeError: Type is not JSON serializable: Person

Which makes me think it should just by default convert, I'll add a test for named tuples.

@Stranger6667
Copy link
Owner

@blacha

Good point! I didn't know that orjson does it by default. Lets make it the default behavior then :)

@Stranger6667
Copy link
Owner

Also, we might get into the bitmask options later :)

@blacha
Copy link
Contributor Author

blacha commented Nov 4, 2021

Also, we might get into the bitmask options later :)

haha, I enjoy reverse engineering old games, so I love it when I get a chance to put a bitmask somewhere :)

however if we just go for the default for tuples as lists I'm not really seeing a place where we need to turn that off, so to keep this PR somewhat small maybe the options logic could be added later?

@Stranger6667
Copy link
Owner

haha, I enjoy reverse engineering old games, so I love it when I get a chance to put a bitmask somewhere :)
however if we just go for the default for tuples as lists I'm not really seeing a place where we need to turn that off, so to keep this PR somewhat small maybe the options logic could be added later?

Yep, exactly :) It would be better to implement the tuple conversion for now and then if it will be needed, we can always get back into implementing those bitmask options later :)

@Stranger6667
Copy link
Owner

Thanks! I'll make a new release soon

@Stranger6667 Stranger6667 merged commit 9867b38 into Stranger6667:master Nov 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants