Investigate alternatives to `xarray` to handle `ProcessedVariable` computations #3913

agriyakhetarpal · 2024-03-20T17:47:23Z

Recently, #3892 highlighted that pandas was being installed as an implicit required dependency for PyBaMM, because it was a required dependency for one of our required dependencies (xarray). pandas was otherwise listed as an optional dependency with the [pandas] extra and is currently used only for handling CSV files.

This dependence on xarray is particularly concerning, because:

1. If pandas acts upon PDEP-10 with v3 and makes pyarrow a required dependency, it would drastically increase the download size for PyBaMM (pyarrow wheels across platforms are 120+ megabytes in size at a minimum).
2. This would cause complications if things like Pyodide support are considered, where running PyBaMM in the browser would require excess bandwidth and slow down usage workflows. It would also slightly affect regular users on Google Colab, where Python virtual environments and dependencies are not saved or cached.

Prior to the use of xarray (see #2366) as a backend for the ProcessedVariable and the ProcessedVariableComputed classes, the scipy.interpolate module was being used – which could be an option to return to.
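
As a rough illustration of what a scipy-only path could look like, here is a minimal sketch of gridded interpolation with `scipy.interpolate.RegularGridInterpolator`. The grid, the sample data, and the `evaluate` helper are hypothetical and are not taken from the `ProcessedVariable` implementation.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Hypothetical data: a variable sampled on a (time, space) grid.
t = np.linspace(0.0, 1.0, 5)
x = np.linspace(0.0, 2.0, 3)
values = t[:, None] + 10.0 * x[None, :]  # shape (5, 3)

# Linear interpolation on the grid; points outside the grid give NaN
# instead of raising an error.
interpolator = RegularGridInterpolator(
    (t, x), values, method="linear", bounds_error=False, fill_value=np.nan
)


def evaluate(t_query, x_query):
    """Evaluate the (hypothetical) variable at a single (t, x) point."""
    return float(interpolator([[t_query, x_query]])[0])


inside = evaluate(0.5, 1.0)
outside = evaluate(2.0, 1.0)  # t = 2.0 lies outside the sampled range
```

This only needs numpy and scipy, so pandas would not be pulled in through this code path.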

There is time until pandas decides on this and also until we release v24.5, so we can take into account some of the developments around this area as they arise (as discussed in the technical roadmap meeting on 18/03/2024).

kratman · 2024-03-20T20:28:20Z

What is pyodide being used for if it is an issue?

I have used pyarrow and pandas in a lot of web-based apps without issue. Both pandas and pyarrow are pretty common in data science, so I know these get used in web/notebook applications on a regular basis.

agriyakhetarpal · 2024-03-20T20:49:58Z

> What is pyodide being used for if it is an issue?

It's not being used by us currently, but as a part of my work assignment I am extending support for it across a lot of PyData projects and across the Scientific Python ecosystem (please see Quansight-Labs/czi-scientific-python-mgmt#18 and Quansight-Labs/czi-scientific-python-mgmt#19). PyBaMM isn't quite there yet, because we have CasADi as a dependency and it is tricky to compile it to WASM; if it becomes optional, we could move things forward on that (see #3826). The best and most stable example of Pyodide in use currently is any of the usage examples in the scikit-learn documentation, where the interactive docs run in client-side JupyterLite notebooks.

> I have used pyarrow and pandas in a lot of web based apps without issue. Both pandas and pyarrow are pretty common in data science, so I know these get used in web/notebook applications on a regular basis

There's no issue as such if you do so locally for any data science workflows, because the pyarrow backend is extremely fast, but 1. those with unstable connections can have issues running such notebooks online, and 2. having a heavy (required) dependency graph in general isn't good for any library (packaging and distribution being one of the affected areas). But this is a smaller part of the picture; some of the responses on pandas-dev/pandas#54466 are quite insightful in this regard.

kratman · 2024-03-20T21:34:49Z

Yeah, if we are going to drop xarray then using scipy or numpy native features would be good. However, it looks like we use pandas directly in a bunch of files, so it is not just due to xarray. I think if you want to make pandas optional, then you would need to remove pandas from a bunch of places (notebooks, tests, etc.) and not just remove xarray.

Pandas can be useful for analysis and plotting, so we should probably think about whether it is useful on the whole to include it, and make sure this is actually a concern for our users. Realistically, optional dependencies just make things more complicated. Unless we have fully optional modules, we should try to just remove problematic libraries altogether.

agriyakhetarpal · 2024-03-20T23:31:26Z

We did have pandas as an optional dependency before #3892, didn't we? I imagine it should not be a lot of work to make it fully optional again with the import_optional_dependency wrapper. Or are we using it in a notebook where we haven't installed it in the introductory code cell?
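
For context, here is a minimal sketch of what such a wrapper can look like. It is an assumption for illustration only, not PyBaMM's actual helper: the signature, the `extra` parameter, and the error message are hypothetical.

```python
import importlib


def import_optional_dependency(module_name, extra=None):
    """Import an optional module lazily, with a helpful error if it is missing.

    Hypothetical sketch: `extra` is the name of the pip extra to suggest.
    """
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as error:
        hint = f' Install it with: pip install "pybamm[{extra}]".' if extra else ""
        raise ModuleNotFoundError(
            f"Optional dependency '{module_name}' is not available.{hint}"
        ) from error
```

A function that reads CSV files would then call `pd = import_optional_dependency("pandas", extra="pandas")` inside its body, so that importing the package itself never requires pandas.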

A lot of the plotting features (for example matplotlib) were set as optional so that you were not forced to use it, and therefore you could use libraries like holoviz, pyvista, altair, seaborn, or any others of your choice offering a plotting backend and a graphics module. It is still optional at this time but in PyBaMM's history before v23.5 it was one of the "truly" optional dependencies (but we didn't have a list of optional dependencies back then).

agriyakhetarpal added the `difficulty: medium` (will take a few days) and `priority: medium` (to be resolved if time allows) labels on Mar 20, 2024

agriyakhetarpal changed the title ~~Investigate alternatives for xarray to handle ProcessedVariable computations~~ Investigate alternatives to xarray to handle ProcessedVariable computations Mar 21, 2024

agriyakhetarpal mentioned this issue Apr 9, 2024

[Bug]: BaseSolver.solve() multiprocessing is broken on system's defaulting to spawn context #3974

Closed
