Transform from dataclasses_json to DataClassJSONMixin #129
Comments
Hi @hhcs9527
So that I can help you, could you provide more details about how you did it? I just replaced
Source code
from dataclasses import dataclass
from mashumaro.mixins.json import DataClassJSONMixin
from timeit import timeit
from dataclasses_json import DataClassJsonMixin


@dataclass
class StockPosition1(DataClassJSONMixin):
    ticker: str
    name: str
    balance: int


@dataclass
class StockPosition2(DataClassJsonMixin):
    ticker: str
    name: str
    balance: int


t1 = timeit(
    "loader(data)",
    globals={
        "loader": StockPosition1.from_json,
        "data": '{"ticker": "AAPL", "name": "Apple", "balance": 42}',
    },
)
print('mashumaro from_json', t1)

t2 = timeit(
    "loader(data)",
    globals={
        "loader": StockPosition2.from_json,
        "data": '{"ticker": "AAPL", "name": "Apple", "balance": 42}',
    },
)
print('dataclasses_json from_json', t2)

obj1 = StockPosition1("AAPL", "Apple", 42)
obj2 = StockPosition2("AAPL", "Apple", 42)

t1 = timeit("dumper()", globals={"dumper": obj1.to_json})
print('mashumaro to_json', t1)

t2 = timeit("dumper()", globals={"dumper": obj2.to_json})
print('dataclasses_json to_json', t2)
AFAIK, dataclasses-json doesn't compile a loader / dumper for a specific schema the way mashumaro does, which is what makes it so slow. |
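To illustrate the difference between a generic, reflection-style loader and one compiled for a specific schema, here is a minimal stdlib-only sketch. It is not how either library is implemented; `generic_from_dict` and `compile_from_dict` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class StockPosition:
    ticker: str
    name: str
    balance: int


def generic_from_dict(cls, data):
    """Reflection-style loading: inspect the dataclass fields on every call."""
    return cls(**{f.name: data[f.name] for f in fields(cls)})


def compile_from_dict(cls):
    """Generate source for a loader specialised to cls and compile it once."""
    args = ", ".join(f"{f.name}=data[{f.name!r}]" for f in fields(cls))
    source = f"def from_dict(data):\n    return cls({args})\n"
    namespace = {"cls": cls}
    exec(source, namespace)  # the generated function looks up cls in namespace
    return namespace["from_dict"]


# The field inspection happens once here, not on every load.
compiled_from_dict = compile_from_dict(StockPosition)

data = {"ticker": "AAPL", "name": "Apple", "balance": 42}
assert generic_from_dict(StockPosition, data) == compiled_from_dict(data)
```

Both loaders return the same object; the compiled one simply moves the per-field work out of the hot path, which is the general idea behind the speed difference described above.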
@Fatal1ty Thanks for the help. I have an object built with @dataclass and DataClassJsonMixin. However, because of the speed issue, we would like to convert such an object to DataClassJSONMixin. But I have no idea how to move the data from DataClassJsonMixin to DataClassJSONMixin => which I believe is the main reason that DataClassJSONMixin is fast. I've tried several ways
Here is my example for way 2.
import json
from enum import Enum
from typing import List, Any, Union, Generic, TypeVar, Tuple, cast
from dataclasses import dataclass, field
from dataclasses_json import dataclass_json, DataClassJsonMixin
from mashumaro.mixins.json import DataClassJSONMixin
from mashumaro.types import SerializationStrategy

T = TypeVar("T")


def timeDiff(obj):
    import time

    start = time.time()
    json_str = obj.to_json()
    end = time.time()
    print("to_json cost:", (end - start)*10000)

    start = time.time()
    type(obj).from_json(json_str)
    end = time.time()
    print("from_json cost:", (end - start)*10000)
    print()


class Currency(Enum):
    USD = "USD"
    EUR = "EUR"


@dataclass
class CurrencyPosition(DataClassJSONMixin):
    currency: Currency
    balance: float


@dataclass
class StockPosition(DataClassJSONMixin):
    ticker: str
    name: str
    balance: int


@dataclass
class Portfolio(DataClassJSONMixin):
    currencies: List[CurrencyPosition]
    stocks: List[StockPosition]


my_portfolio = Portfolio(
    currencies=[
        CurrencyPosition(Currency.USD, 238.67),
        CurrencyPosition(Currency.EUR, 361.84),
    ],
    stocks=[
        StockPosition("AAPL", "Apple", 10),
        StockPosition("AMZN", "Amazon", 10),
    ]
)

json_string = my_portfolio.to_json()
# print("my_portfolio og: ", json_string)
# print(json_string == Portfolio.from_json(my_portfolio.to_json()).to_json())


@dataclass_json
@dataclass
class CurrencyPosition2:
    currency: Currency
    balance: float


@dataclass_json
@dataclass
class Portfolio2():
    currencies: List[CurrencyPosition]
    stocks: List[StockPosition]


my_portfolio2 = Portfolio2(
    currencies=[
        CurrencyPosition(Currency.USD, 238.67),
        CurrencyPosition(Currency.EUR, 361.84),
    ],
    stocks=[
        StockPosition("AAPL", "Apple", 10),
        StockPosition("AMZN", "Amazon", 10),
    ]
)

timeDiff(my_portfolio)
timeDiff(my_portfolio2)

my_portfolio3 = Portfolio2(
    currencies=[
        CurrencyPosition(Currency.USD, 238.67),
        CurrencyPosition(Currency.EUR, 361.84),
    ],
    stocks=[
        StockPosition("AAPL", "Apple", 10),
        StockPosition("AMZN", "Amazon", 10),
    ]
)

Portfolio2.to_dict = DataClassJSONMixin.to_dict
Portfolio2.from_dict = DataClassJSONMixin.from_dict
Portfolio2.to_json = DataClassJSONMixin.to_json
Portfolio2.from_json = DataClassJSONMixin.from_json

timeDiff(cast(DataClassJsonMixin, my_portfolio3))
Hope this helps, thanks! |
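A side note on measuring: `timeDiff` above times a single call with `time.time()`, so each number includes one-off costs and timer noise. A sketch of a steadier measurement with the standard `timeit` module follows. `best_time_per_call` is a hypothetical helper, and plain `json` stands in for the `to_json` / `from_json` methods so the snippet runs without either library installed.

```python
import json
from timeit import repeat

payload = {"ticker": "AAPL", "name": "Apple", "balance": 42}


def best_time_per_call(func, number=10_000, rounds=5):
    """Return the fastest average seconds-per-call over several rounds."""
    timings = repeat(func, number=number, repeat=rounds)
    return min(timings) / number


encoded = json.dumps(payload)
dump_cost = best_time_per_call(lambda: json.dumps(payload))
load_cost = best_time_per_call(lambda: json.loads(encoded))

print(f"json.dumps: {dump_cost * 1e6:.2f} us per call")
print(f"json.loads: {load_cost * 1e6:.2f} us per call")
```

Taking the minimum over several rounds reduces the influence of background load; to compare the two libraries, the lambdas would wrap `obj.to_json` and `type(obj).from_json` instead.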
Yes, thank you, this example shed some light here, but you made a few mistakes. Let me explain.
You can't simply replace class attributes to the ones from
The second point: you're mixing Portfolio2 with CurrencyPosition and StockPosition, not with CurrencyPosition2 and StockPosition2. This works for dataclasses_json because it fortunately assumes your dataclasses have methods with the same names that mashumaro introduces. But if you're going to mix the libraries, it will definitely be error-prone.
I strongly believe this is a typo and you actually didn't get this result. I ran your code on my laptop with Python 3.11 and got the following numbers:
So, mashumaro is roughly 18x faster on
Based on all the above I'd like to ask you, why don't you just remove |
Just for the reference, it's not documented and subject to change, but specific
Portfolio2.to_dict = DataClassJSONMixin.to_dict
Portfolio2.from_dict = DataClassJSONMixin.from_dict
Portfolio2.to_json = DataClassJSONMixin.to_json
Portfolio2.from_json = DataClassJSONMixin.from_json
You'd be better off doing this:
import json

from mashumaro.core.meta.code.builder import CodeBuilder

cb = CodeBuilder(Portfolio2)  # if you need to call from_dict / to_dict in your code
cb.add_pack_method()
cb.add_unpack_method()

cb = CodeBuilder(
    Portfolio2, format_name="json", encoder=json.dumps, decoder=json.loads
)  # if you need to call from_json / to_json in your code
cb.add_pack_method()
cb.add_unpack_method()
It will give |
Hi @Fatal1ty , Not sure what you mean by the last paragraph. |
Can you share more details about your performance issues?
P.S. I found your recent changes in flytekit and I haven't delved deeply into the internals of flytekit, but as far as I understand,
Anyway, I'm still struggling to understand why you're trying to mix dataclasses_json with mashumaro in this weird way? To help me better understand your motivation, I will quote my question:
I found chapter "Using Custom Python Objects" in flytekit documentation that says that you support dataclasses with dataclass_json mixed functionality. So, why not extend dataclasses support either to dataclasses with |
You're right, we should start at the bottom to replace the DataClassJsonMixin with DataClassJSONMixin. I think this issue is resolved, thanks. |
By the way, can you point out the path where you implement the mashumaro_from_json for any type? Thanks for the clarification. |
To be more accurate, method
The moment when compilation occurs depends on how the library is used. Let me elaborate on this.
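As a rough picture of what "when compilation occurs" can mean, the sketch below contrasts building a loader at class definition time with building it on first use. It uses only the standard library; `build_loader`, `eager`, and `lazy` are hypothetical names and do not reflect mashumaro's internals.

```python
from dataclasses import dataclass, fields

build_log = []  # records when a loader gets built, for illustration only


def build_loader(cls):
    """Stand-in for an expensive, schema-specific compilation step."""
    build_log.append(cls.__name__)
    names = [f.name for f in fields(cls)]

    def loader(data):
        return cls(**{name: data[name] for name in names})

    return loader


def eager(cls):
    """Build the loader as soon as the class is defined."""
    cls.from_dict = staticmethod(build_loader(cls))
    return cls


def lazy(cls):
    """Build the loader on the first from_dict call, then reuse it."""

    def from_dict(data):
        loader = build_loader(cls)
        cls.from_dict = staticmethod(loader)  # later calls skip the build
        return loader(data)

    cls.from_dict = staticmethod(from_dict)
    return cls


@eager
@dataclass
class A:
    x: int


@lazy
@dataclass
class B:
    x: int


assert build_log == ["A"]  # only the eager class has been built so far
B.from_dict({"x": 1})
B.from_dict({"x": 2})
assert build_log == ["A", "B"]  # the lazy class was built once, on first use
```

The eager variant pays the cost at import time; the lazy variant pays it on the first call, which matters when many classes are defined but only a few are ever serialized.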
Also, you may be interested in this issue:
While it's not done yet you can add support for dataclasses without any mixins as follows:
import json
from dataclasses import dataclass
from typing import Type, TypeVar

from mashumaro.core.meta.code.builder import CodeBuilder

T = TypeVar("T")


class JSONDecoder:
    def __init__(self):
        self._cache = {}

    def from_json(self, type_: Type[T], data: str) -> T:
        # Look up a previously compiled loader in the decoder's own cache.
        type_id = id(type_)
        method = self._cache.get(type_id)
        if method:
            return method(data)
        cb = CodeBuilder(type_, format_name="json", decoder=json.loads)
        cb.add_unpack_method()
        method = getattr(type_, "__mashumaro_from_json__")
        self._cache[type_id] = method
        return method(data)

    def from_json2(self, type_: Type[T], data: str) -> T:
        # Use the class attribute as the cache; the None default avoids an
        # AttributeError when the loader has not been compiled yet.
        method = getattr(type_, "__mashumaro_from_json__", None)
        if method:
            return method(data)
        cb = CodeBuilder(type_, format_name="json", decoder=json.loads)
        cb.add_unpack_method()
        method = getattr(type_, "__mashumaro_from_json__")
        return method(data)
Source code
import json
from dataclasses import dataclass
from typing import Type, TypeVar
from timeit import timeit

from mashumaro.core.meta.code.builder import CodeBuilder
from dataclasses_json import DataClassJsonMixin
from mashumaro.mixins.json import DataClassJSONMixin

T = TypeVar("T")


class JSONDecoder:
    def __init__(self):
        self._cache = {}

    def from_json(self, type_: Type[T], data: str) -> T:
        type_id = id(type_)
        method = self._cache.get(type_id)
        if method:
            return method(data)
        cb = CodeBuilder(type_, format_name="json", decoder=json.loads)
        cb.add_unpack_method()
        method = getattr(type_, "__mashumaro_from_json__")
        self._cache[type_id] = method
        return method(data)

    def from_json2(self, type_: Type[T], data: str) -> T:
        # The None default avoids an AttributeError on a class that has
        # not been compiled yet.
        method = getattr(type_, "__mashumaro_from_json__", None)
        if method:
            return method(data)
        cb = CodeBuilder(type_, format_name="json", decoder=json.loads)
        cb.add_unpack_method()
        method = getattr(type_, "__mashumaro_from_json__")
        return method(data)


@dataclass
class MyClass1:
    x: int


@dataclass
class MyClass2(DataClassJSONMixin):
    x: int


@dataclass
class MyClass3(DataClassJsonMixin):
    x: int


json_decoder = JSONDecoder()
data = '{"x": "42"}'

json_decoder.from_json(MyClass1, data)  # warmup

print(
    "without mixins (1)",
    timeit(
        "decoder(cls, data)",
        globals={
            "decoder": json_decoder.from_json,
            "data": data,
            "cls": MyClass1,
        },
    ),
)
print(
    "without mixins (2)",
    timeit(
        "decoder(cls, data)",
        globals={
            "decoder": json_decoder.from_json2,
            "data": data,
            "cls": MyClass1,
        },
    ),
)
print(
    "DataClassJSONMixin",
    timeit(
        "decoder(data)",
        globals={
            "decoder": MyClass2.from_json,
            "data": data,
        },
    ),
)
print(
    "DataClassJsonMixin",
    timeit(
        "decoder(data)",
        globals={
            "decoder": MyClass3.from_json,
            "data": data,
        },
    ),
)
I hope this will help you make the right decision. P.S. You are free to close this issue if everything was answered. |
Thanks for the explanation. |
If you mean dataclass code generation from JSON schema, then the answer is no. It's a job for another library, as I see it. |
Do you know which library? Thanks. |
Take a look at datamodel-code-generator |
Is your feature request related to a problem? Please describe.
Currently, our project is trying to migrate serialization from dataclasses_json to DataClassJSONMixin.
If we just replace the to_json / from_json functions with the DataClassJSONMixin ones, the speed remains the same. I believe the reason is the difference in how dataclasses_json implements to_dict / from_dict.
Describe the solution you'd like
It would be great to have a function to convert a dataclasses_json class to DataClassJSONMixin.
Describe alternatives you've considered
Or provide a general way to accept the init data.