
Pickle w/o Statics #1458

Closed
ax3l opened this issue Jun 6, 2023 · 3 comments · Fixed by #1633
Labels
frontend: Python3 · third party (third party libraries that are shipped and/or linked)

Comments

ax3l (Member) commented Jun 6, 2023

For the first implementation of multi-process (multi-node) Dask, we pickle objects such as Record and RecordComponent together with their Series.

The Series is unpickled into a static function member to avoid:

  • double opening/parsing for each object
  • having to express the lifetime of the Series on the unpickled side with respect to the pickled object.

This works as a hack until you need to work with two series at a time.

https://github.com/openPMD/openPMD-api/blob/0.15.1/include/openPMD/binding/python/Pickle.hpp#L73-L77
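
For illustration, a minimal Python round trip of the kind this enables might look as follows; the file name, iteration number, and record names are placeholders, not taken from the issue:

    import pickle
    import openpmd_api as io

    # Placeholder series and record component.
    series = io.Series("simData_%T.h5", io.Access.read_only)
    rc = series.iterations[100].meshes["E"]["x"]

    # Pickling a record component stores the file name plus the in-file path.
    blob = pickle.dumps(rc)

    # In real use these bytes travel to a Dask worker process; unpickling
    # re-opens the file through the static Series linked above and navigates
    # back to the same record component.
    rc_again = pickle.loads(blob)
    print(rc_again.shape, rc_again.dtype)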

ax3l added the third party and frontend: Python3 labels on Jun 6, 2023

pordyna (Contributor) commented Jun 22, 2023

@ax3l I tried using dask.delayed with openpmd-api to parallelize over iterations. I couldn't get it to work, I think because Dask was not able to pickle the Series object. Would it make sense to add add_pickle to Series and Iteration as well?
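
For context, a rough sketch of the pattern described above; the record names, the file name, the per-iteration analysis, and the use of series_flush are assumptions. With a distributed or multiprocessing scheduler, Dask has to pickle everything passed into the delayed tasks, which is exactly where an unpicklable Series or Iteration fails:

    import dask
    import openpmd_api as io

    @dask.delayed
    def process(iteration):
        # Hypothetical per-iteration analysis; the Iteration handle would
        # have to be picklable for this to run on a remote Dask worker.
        E_x = iteration.meshes["E"]["x"]
        data = E_x.load_chunk()
        iteration.series_flush()  # assumes a recent openPMD-api version
        return data.mean()

    series = io.Series("simData_%T.h5", io.Access.read_only)
    # Placeholder iteration numbers.
    results = dask.compute(*[process(series.iterations[i]) for i in (100, 200, 300)])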

pordyna (Contributor) commented Jun 12, 2024

@ax3l @franzpoeschel This also does not seem to work when using only one series at a time but more than one within the same kernel instance. I don't understand why: I am using Dask to iterate over multiple simulations, and the data keeps being loaded from the first series! Deleting the series in between or restarting the Dask workers does not help. The only thing that worked for me in Jupyter was restarting the kernel between running the code for different series.

Are you aware of any workaround?
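
A minimal sketch of the symptom described above, with placeholder file, iteration, and record names: because the unpickling side caches a single static Series, the second round trip silently resolves against the file of the first one.

    import pickle
    import openpmd_api as io

    def roundtrip(filename):
        s = io.Series(filename, io.Access.read_only)
        rc = s.iterations[100].meshes["E"]["x"]
        # pickle.loads() is where the static Series is created (or reused).
        return pickle.loads(pickle.dumps(rc))

    rc_a = roundtrip("simulation_A/simData_%T.h5")  # opens the static Series for A
    rc_b = roundtrip("simulation_B/simData_%T.h5")  # reuses the static Series of A
    # rc_b now resolves against simulation A's file, not simulation B's.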

franzpoeschel (Contributor) commented Jun 12, 2024

That's precisely what Axel means above by:

This works as a hack until you need to work with two series at a time.

Unpickling e.g. a single RecordComponent does not work trivially with our memory model in openPMD, since a RecordComponent becomes invalid once its Series is deleted, but the Pickle API gives us no way to store the Series anywhere.
Ideally, we should change our C++ API to a model where any handle keeps the entire Series alive; that would also solve this issue. This should be possible, but it would be a slightly larger change (I actually have a PR open with an internal remodeling that might help here).
For now, this is what we do:

            // Create a new openPMD Series and keep it alive.
            // This is a big hack for now, but it works for our use
            // case, which is spinning up remote serial read series
            // for DASK.
            static auto series = openPMD::Series(filename, Access::READ_ONLY);

... which leads exactly to the behavior that you see.

I do have an idea for how we could fix this short-term, though; let me see.
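
In the meantime, one pattern that sidesteps the static Series entirely (an illustration only, not the fix referenced in this thread) is to pass nothing but plain Python data to the Dask workers and open a fresh Series inside each task; file names and the record path are placeholders:

    import dask
    import openpmd_api as io

    @dask.delayed
    def analyze(filename, iteration_index):
        # Only plain strings and ints cross the pickle boundary, so no static
        # Series is involved; each task owns its own Series.
        series = io.Series(filename, io.Access.read_only)
        rc = series.iterations[iteration_index].meshes["E"]["x"]
        data = rc.load_chunk()
        series.flush()
        return data.mean()

    tasks = [analyze("simulation_A/simData_%T.h5", 100),
             analyze("simulation_B/simData_%T.h5", 100)]
    print(dask.compute(*tasks))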
