Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Seaborn plots do not work with latest pandas settings (copy-on-write) #3331

Closed
connortann opened this issue Apr 19, 2023 · 9 comments
Closed

Comments

@connortann
Copy link

connortann commented Apr 19, 2023

Summary

Some seaborn plots throw a ValueError depending on the users's global pandas configuration.

Pandas 1.5 introduced a feature called Copy-On-Write (CoW), which aims to lead to more predictable and performant behaviour. CoW is currently an optional setting, which can be controlled with pd.set_option (see example below).

However, CoW it is expected to become the default behaviour in the next version of pandas, and as of the recent launch of pandas v2 it is prominently recommended in the pandas docs . So, I think it is important for seaborn plots work with this setting enabled.

Minimal example

# Minimal example
import pandas as pd
import numpy as np
import seaborn as sns

# Enable CoW
pd.set_option("mode.copy_on_write", True)

# Seaborn displot
data = np.array([1,2,3])
sns.displot(data)

Results in error:

File ~/seaborn/distributions.py:497, in _DistributionPlotter.plot_univariate_histogram(self, multiple, element, fill, common_norm, common_bins, shrink, kde, kde_kws, color, legend, line_kws, estimate_kws, **plot_kws)
    495 # Pack the histogram data and metadata together
    496 edges = edges + (1 - shrink) / 2 * widths
--> 497 widths *= shrink
    498 index = pd.MultiIndex.from_arrays([
    499     pd.Index(edges, name="edges"),
    500     pd.Index(widths, name="widths"),
    501 ])
    502 hist = pd.Series(heights, index=index, name="heights")

ValueError: output array is read-only

Versions

  • seaborn 0.12.2
  • matplotlib 3.7.1
  • pandas 2.0.0
  • numpy 1.23.5

[EDIT: updated example to give full error output]

@connortann connortann changed the title Seaborn plots do not work with modern pandas copy-on-write Seaborn plots do not work with latest pandas settings (copy-on-write) Apr 20, 2023
@mwaskom
Copy link
Owner

mwaskom commented Apr 25, 2023

To be clear, by "next version of pandas" you mean next major version, i.e. 3.0, right?

@connortann
Copy link
Author

connortann commented Apr 26, 2023

Yes that's right, sorry for the ambiguity. Pandas 3.0 is probably a long way off; my point was just that CoW seems to be the broad direction of travel for pandas, and so users are increasingly likely to have this setting enabled globally.

@mwaskom
Copy link
Owner

mwaskom commented Apr 26, 2023

Yeah I'm fine with that, although it looks like right now the story about pandas <> numpy with CoW is a little bit unsettled (numpy/numpy#23625) and I might wait for the dust to settle there...

@mwaskom
Copy link
Owner

mwaskom commented Jan 21, 2024

I can no longer replicate the bug here with

import pandas as pd
pd.set_option("mode.copy_on_write", True)

import numpy as np
import seaborn as sns

print(np.__version__)
print(pd.__version__)
print(sns.__version__)

data = np.array([1,2,3])
sns.displot(data)
1.26.2
2.2.0
0.14.0.dev0

@mwaskom
Copy link
Owner

mwaskom commented Jan 21, 2024

I don't recall intentionally changing anything in seaborn to support copy-on-write. Therefore I expect the relevant change was in numpy. There may not be anything needed to change in seaborn to support this new behavior in pandas. When I run the test suite now I see only one failing test and the failure is inside pandas, I suspect it may be a bug on their end but having trouble reproducing externally.

@mwaskom
Copy link
Owner

mwaskom commented Jan 23, 2024

I think it's safe to close this given the above. Also interesting that nobody else has chimed in with other issues. Please continue testing seaborn with copy-on-write enabled and open a new issue if you encounter any problems!

@mwaskom mwaskom closed this as completed Jan 23, 2024
@jorisvandenbossche
Copy link

My guess is that this was indirectly fixed by #3440

The widths is a column of a DataFrame converted to numpy:

widths = res["space"].to_numpy()

And then there are some transformations done on it:

# Convert edges back to original units for plotting
ax = self._get_axes(sub_vars)
_, inv = _get_transform_functions(ax, self.data_variable)
widths = inv(edges + widths) - inv(edges)
edges = inv(edges)
# Pack the histogram data and metadata together
edges = edges + (1 - shrink) / 2 * widths
widths *= shrink

Before the aforementioned PR (and in case of no log scale), the first edit to widths was the last line above (widths *= shrink). That is an inplace multiplication, which started failing because the to_numpy() call returns a read-only array with Copy-on-Write enabled.
Now, widths is already modified (widths = inv(edges + widths) - inv(edges), which will make it a writeable array) before we get to that inplace operator.

@jorisvandenbossche
Copy link

On the actual topic of testing seaborn with CoW: I did a test run locally and only got one error, but which happened without enabling CoW as well, which is a bug in pandas (pandas-dev/pandas#57276).

So it seems that it is currently all good (at least based on the test suite ;)). There are also no warnings about behaviour changes (eg chained assignment will stop working).
Given that CoW is a rather big change and will become the default in a few months, I would personally recommend testing it on CI in one of the builds, though (I can do a PR to do that, if you want)

@mwaskom
Copy link
Owner

mwaskom commented Feb 6, 2024

That sounds like the same failure I saw, glad my intuitions were correct :)

Having a build in the test matrix that uses copy-on-write makes sense, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants