Encoding and decoding objects such as ProvenanceDoc-s via e.g. as_dict() and from_dict() #615

Open
sgbaird opened this issue Jun 4, 2022 · 7 comments

sgbaird commented Jun 4, 2022

I've been having a hard time trying to save a DataFrame to a JSON file (or a jsonpickle JSON file) when it includes ProvenanceDoc objects. My workaround right now is just to extract some minimal data from each document, such as references and material_id. Wondering if you have any suggestions.

I'm trying to follow the style of Matbench/Matminer in having my own benchmark dataset stored on figshare and encoding/decoding it. Maybe I'm too hung up on saving a ProvenanceDoc and should stick with extracting what I can easily/manually.
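
For concreteness, that "extract minimal data" workaround might look roughly like the sketch below. The "provenance" column name and the helper are illustrative, not the actual mp-time-split code, and they assume ProvenanceDoc exposes the material_id and references attributes mentioned above.

def slim_provenance(doc) -> dict:
    # Keep only JSON-friendly pieces of a ProvenanceDoc; cast to plain strings
    # to be safe, since references may be BibTeX strings or nested models.
    return {
        "material_id": str(doc.material_id),
        "references": [str(ref) for ref in doc.references],
    }


# df is assumed to already have a "provenance" column of ProvenanceDoc objects
df["provenance"] = df["provenance"].map(slim_provenance)
df.to_json("dataset.json")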

sgbaird (Author) commented Jun 4, 2022

Object of type CrystalSystem is not JSON serializable
  File "C:\Users\sterg\miniconda3\envs\mp-time-split\Lib\site-packages\monty\json.py", line 321, in default
    d = o.as_dict()

During handling of the above exception, another exception occurred:

  File "C:\Users\sterg\miniconda3\envs\mp-time-split\Lib\json\encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
  File "C:\Users\sterg\miniconda3\envs\mp-time-split\Lib\site-packages\monty\json.py", line 336, in default
    return json.JSONEncoder.default(self, o)
  File "C:\Users\sterg\miniconda3\envs\mp-time-split\Lib\json\encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "C:\Users\sterg\miniconda3\envs\mp-time-split\Lib\json\encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "C:\Users\sterg\miniconda3\envs\mp-time-split\Lib\site-packages\pandas\io\json\_json.py", line 172, in write
    return dumps(
  File "C:\Users\sterg\miniconda3\envs\mp-time-split\Lib\site-packages\pandas\io\json\_json.py", line 110, in to_json
    s = writer(
  File "C:\Users\sterg\miniconda3\envs\mp-time-split\Lib\site-packages\pandas\core\generic.py", line 2621, in to_json
    return json.to_json(
  File "C:\Users\sterg\miniconda3\envs\mp-time-split\Lib\site-packages\monty\json.py", line 301, in default
    "data": o.to_json(default_handler=MontyEncoder().encode),
  File "C:\Users\sterg\miniconda3\envs\mp-time-split\Lib\json\encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "C:\Users\sterg\miniconda3\envs\mp-time-split\Lib\json\encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "C:\Users\sterg\miniconda3\envs\mp-time-split\Lib\json\__init__.py", line 234, in dumps
    return cls(
  File "C:\Users\sterg\Documents\GitHub\sparks-baird\mp-time-split\scripts\data_snapshot.py", line 20, in <module>
    json.dumps(dummy_expt_df, cls=MontyEncoder)
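
For anyone trying to reproduce this, the call at data_snapshot.py line 20 presumably boils down to something like the sketch below. The CrystalSystem import path and the toy DataFrame are assumptions inferred from the traceback, not the actual contents of data_snapshot.py.

import json

import pandas as pd
from emmet.core.symmetry import CrystalSystem  # enum named in the error above (import path assumed)
from monty.json import MontyEncoder

# Toy stand-in for dummy_expt_df: any cell holding a CrystalSystem (or a
# ProvenanceDoc that contains one) triggers the same TypeError.
dummy_expt_df = pd.DataFrame({"crystal_system": [CrystalSystem.cubic]})

json.dumps(dummy_expt_df, cls=MontyEncoder)  # raises the TypeError shown in the traceback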

munrojm (Member) commented Jun 4, 2022

This is something I should be able to fix on my end in emmet-core. I'll report back once I've made the fix and released a patch.

janosh (Member) commented Aug 13, 2022

@sgbaird You may know this already, but what I tend to do in this case is pass a custom handler to df.to_json():

from __future__ import annotations

from typing import Any

from emmet.core.provenance import ProvenanceDoc


def as_dict_handler(obj: object) -> dict[str, Any] | None:
    """Pass as the default_handler kwarg to DataFrame.to_json(), or as default= to json.dump()."""
    try:
        return obj.as_dict()  # all MSONable objects implement as_dict()
    except AttributeError:
        if isinstance(obj, ProvenanceDoc):
            needed_attrs = ("foo", "bar", ...)  # replace with the ProvenanceDoc fields you need
            return {k: getattr(obj, k) for k in needed_attrs}  # pydantic models aren't subscriptable

        return None  # replace unhandled objects with None in serialized data


df.to_json("some-data.json.gz", default_handler=as_dict_handler)
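
One caveat worth noting: the handler only affects writing. Reading the file back with pd.read_json returns plain dicts in that column rather than ProvenanceDoc objects, so they would need to be re-hydrated manually, roughly like this (the "provenance" column name is hypothetical):

import pandas as pd
from emmet.core.provenance import ProvenanceDoc

df = pd.read_json("some-data.json.gz")
# Rebuilding full ProvenanceDoc objects only works if every required field was
# kept when writing; otherwise leave the entries as plain dicts.
df["provenance"] = df["provenance"].map(
    lambda d: ProvenanceDoc(**d) if isinstance(d, dict) else d
)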

sgbaird (Author) commented Aug 13, 2022

@janosh, interesting. That's new to me. Thanks for the tip!

mkhorton (Member) commented:

@munrojm just wondering if there was an update on this issue?

I believe monty's dumpfn/loadfn can serialize and deserialize both pandas DataFrames and pydantic models, but I haven't actually verified both simultaneously. It seems like it would be a common use case, however.
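
A minimal sketch of such a check, untested and not a verified result; whether MontyEncoder/MontyDecoder actually handle pydantic models nested inside a DataFrame depends on the installed monty and emmet-core versions. ToyDoc is a small stand-in for ProvenanceDoc so the example stays self-contained.

import pandas as pd
from monty.serialization import dumpfn, loadfn
from pydantic import BaseModel


class ToyDoc(BaseModel):
    # Stand-in for ProvenanceDoc with a couple of representative fields.
    material_id: str
    references: list = []


df = pd.DataFrame({"doc": [ToyDoc(material_id="mp-1", references=["@article{...}"])]})

dumpfn(df, "roundtrip.json")       # DataFrame containing a pydantic model -> JSON on disk
loaded = loadfn("roundtrip.json")  # does it come back as a DataFrame of models?
print(type(loaded), loaded)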

tschaume (Member) commented:

@sgbaird Is this still an issue with the latest version of mp-api? If so, could you post a short snippet for me to reproduce? Thanks!
