Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

loads does not handle subclasses of str #445

Closed
emontnemery opened this issue Dec 8, 2023 · 7 comments
Closed

loads does not handle subclasses of str #445

emontnemery opened this issue Dec 8, 2023 · 7 comments

Comments

@emontnemery
Copy link

emontnemery commented Dec 8, 2023

Expected outcome:
Subclasses of str should be handled

Actual outcome:
orjson.loads blows up

Step by step reproduction:

>>> import orjson
>>> class MyStr(str):
...     pass
...
>>> obj = MyStr('"abc"')
>>> orjson.loads(obj)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
orjson.JSONDecodeError: Input must be bytes, bytearray, memoryview, or str: line 1 column 1 (char 0)
@FrnchFrgg
Copy link

FrnchFrgg commented Dec 8, 2023

Note that this also fails (even when trying to catch MyStr instances in a default handler):

>>> import orjson
>>> class MyStr(str):
...     pass
...
>>> obj = MyStr('key')
>>> orjson.dumps({obj: "value"})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Dict key must be str

EDIT: I see that this may be normal according to https://github.com/ijl/orjson#opt_non_str_keys

@emontnemery
Copy link
Author

I don't agree these fall under "non str keys", that option is to allow keys which are int, float etc. In Python it's always assumed that if a class is handled, so are subclasses of it. Note that there's no way to write a type annotation in Python where a base class is allowed, but subclasses aren't.

@ijl
Copy link
Owner

ijl commented Dec 8, 2023

This is a sort of feature request if you squint, but not in a useful way, and I don't think it would be productive to engage. Someone else is welcome to do it better.

@ijl ijl closed this as completed Dec 8, 2023
@FrnchFrgg
Copy link

@ijl I think I agree with the reasoning not to accept by default subclasses of str since these can mess up with hashing and thus be accepted as duplicate keys in a dict, which will result in problematic JSON output. That was what #446 was about and I understand why it has been closed (especially since you provide a way out with OPT_NON_STR_KEYS).

But this bug is not to be conflated with #446: I understand that you may want to handle bytes, bytearray, memoryview and str with a specific, faster path, but here the hashing issue does not exist. It is expected that a python function accepting str input should handle its subclasses (without any of the added bells and whistles it may have): this is one of the fundamental promises of object-oriented programming.

(BTW, kudos for having the courage to tackle the nonsense that is CPython interfaced from other languages I understand that having to handle subclasses is cumbersome and that you don't really know if someone broke some fundamental promise of vanilla str)

@FrnchFrgg
Copy link

I found that you already merged #438 so I suppose you could just fallback to the same str() call for unknown loads inputs ?

@timkpaine
Copy link

@FrnchFrgg fyi that was closed as stale, not merged. Rebased in #454

@timkpaine
Copy link

Looks like the stale bot is very aggressive 😅, i’m going to incorporate this change in my fork xorjson

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants