
PX scatter, bar behaviour when only one of x or y is specified #2332

Closed
nicolaskruchten opened this issue Mar 30, 2020 · 9 comments · Fixed by #2336

@nicolaskruchten
Contributor

Plotly.js assumes that x or y are increasing integers when unspecified, but this is a bad fit for PX. Consider the following case, where coloring the markers shifts their positions, because PX draws one trace per color and each trace numbers its own points from 0 (note the x-axis range!):

[Screenshot not preserved]
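A minimal pure-Python sketch of why the points move, assuming PX splits the rows into one trace per color value and each trace then receives the plotly.js implicit x of 0, 1, 2, ... (`x0=0`, `dx=1`). `implicit_x_per_trace` is a hypothetical helper for illustration, not part of PX.

```python
from collections import defaultdict


def implicit_x_per_trace(y, color):
    """Group y values by color, then number each group's points from 0,
    mimicking the per-trace default x of plotly.js."""
    traces = defaultdict(list)
    for value, c in zip(y, color):
        traces[c].append(value)
    # Each trace restarts its own implicit x at 0.
    return {c: list(enumerate(vals)) for c, vals in traces.items()}


y = [10, 3, 5, 7]
# Without color, the points sit at x = 0..3.
print(implicit_x_per_trace(y, ["a"] * 4))
# With color, the y=5 point moves from x=2 to x=1 and the x range shrinks to 0..1.
print(implicit_x_per_trace(y, ["a", "b", "a", "b"]))
```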

This case is even scarier, with individual bars having different heights depending on their position in the data frame:

[Screenshot not preserved]

For bar, line, scatter, and area, we should probably disallow specifying only one of x and y. Possibly in the 3D cases also? Anywhere else? Bar-like traces?

@nicolaskruchten
Contributor Author

Note that we can already pass df.index to either of these if we really do want sequential integers and that's what the index contains, and we'll soon be able to pass y=px.Constant(1) for the second example.

@emmanuelle
Contributor

Hmm, here I don't think we should make it more complicated than what is possible with go. I think users see px as a simplification of go, so ideally it should not be more verbose. Increasing indices are a sensible default for x and y, one that is also shared by other libraries such as matplotlib.
I'm not shocked by the scatter example. The bar one is very weird indeed (and trying to reproduce it as a go example is even weirder! I don't really understand what is happening!).
I would rather document this behaviour than change it.

@nicolaskruchten
Contributor Author

Oh, I'm not proposing to make anything more complicated: I'm saying we should raise an exception if only one of the two is None.
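A minimal sketch of the proposed guard, assuming the check happens before any trace is built. `check_xy` is a hypothetical name, not an existing PX function.

```python
def check_xy(x=None, y=None):
    """Hypothetical guard: reject calls that give exactly one of x and y."""
    if (x is None) != (y is None):
        missing = "x" if x is None else "y"
        raise ValueError(f"{missing!r} must also be specified")
    return x, y


try:
    check_xy(y=[10, 3, 5])
except ValueError as err:
    print(err)  # 'x' must also be specified
```

With this guard, a y-only call raises instead of silently numbering the points, which is exactly the extra verbosity the next comment objects to.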

@nicolaskruchten
Contributor Author

(This is also what Seaborn does, BTW.)

@emmanuelle
Contributor

Well, having to pass an additional parameter is more complicated :-), when all you want to do is

px.line(y=[10, 3, 5])

@nicolaskruchten
Contributor Author

OK, so here is a heuristic we could use that gives reasonable results for scatter/line/area/bar:

  • Define a trivial new class, px.Range, that stands in for a column and evaluates to range(len(df)).
  • For scatter-like traces, if only y is provided, x should default to px.Range() (and vice versa). This would give results similar to what px.line(y=[10,3,5]) does today, except that adding color=['a','b','a'] wouldn't change the position of any point, which is important IMO.
  • For bar-like traces, if only y is provided, x should default to px.Constant(1) (and vice versa). This is basically what we do right now for box/violin/strip via an x0/y0 hack.
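The heuristic in these bullets can be sketched in plain Python. `Range`, `Constant`, `fill_missing_axis`, and `resolve` are hypothetical stand-ins written for this sketch, not the real PX classes; the key assumption is that placeholders are resolved against the whole data frame before rows are split by color.

```python
class Range:
    """Hypothetical placeholder: resolves to range(len(df)) for the whole frame."""


class Constant:
    """Hypothetical placeholder: resolves to one repeated value."""

    def __init__(self, value):
        self.value = value


SCATTER_LIKE = {"scatter", "line", "area"}
BAR_LIKE = {"bar"}


def fill_missing_axis(kind, x=None, y=None):
    """Return (x, y), replacing a single missing axis with a placeholder."""
    if x is not None and y is not None:
        return x, y
    if x is None and y is None:
        raise ValueError("at least one of x and y is required")
    if kind in SCATTER_LIKE:
        filler = Range()
    elif kind in BAR_LIKE:
        filler = Constant(1)
    else:
        raise ValueError(f"no default defined for {kind!r}")
    return (filler, y) if x is None else (x, filler)


def resolve(column, n_rows):
    """Expand a placeholder into concrete values for all n_rows rows."""
    if isinstance(column, Range):
        return list(range(n_rows))
    if isinstance(column, Constant):
        return [column.value] * n_rows
    return list(column)


# Resolve against the full data *before* splitting by color,
# so coloring cannot renumber the points.
x, y = fill_missing_axis("line", y=[10, 3, 5])
xs = resolve(x, len(y))
color = ["a", "b", "a"]
points_a = [(xi, yi) for xi, yi, c in zip(xs, y, color) if c == "a"]
print(points_a)  # [(0, 10), (2, 5)]
```

Because the "a" points keep x = 0 and 2, adding color leaves every point where it was, unlike the per-trace numbering.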

@nicolaskruchten
Contributor Author

The Constant(1) approach looks like this in various permutations:

[Three screenshots not preserved]

@nicolaskruchten
Contributor Author

This is what px.Range() looks like:

[Screenshot not preserved]

@nicolaskruchten
Contributor Author

OK, so I've actually implemented a more complicated option: if a data frame is provided, we use its index; otherwise we use px.Range(). The exception is the bar case: if the provided column is categorical, we use px.Constant(1) instead.
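A rough sketch of the implemented rule, for readers skimming the thread. It assumes the bar/categorical exception takes precedence over the data frame index, and it uses a crude hypothetical `is_categorical` test (any string value); the real PX code may decide both points differently.

```python
def is_categorical(values):
    """Crude stand-in: call a column categorical if any value is a string."""
    return any(isinstance(v, str) for v in values)


def default_for_missing(kind, provided, index=None):
    """Values for the axis that was not specified.

    kind     -- trace type, e.g. "bar" or "scatter"
    provided -- values of the axis that *was* specified
    index    -- the data frame's index labels, or None if no data frame was given
    """
    if kind == "bar" and is_categorical(provided):
        return [1] * len(provided)           # like px.Constant(1)
    if index is not None:
        return list(index)                   # the data frame's index
    return list(range(len(provided)))        # like px.Range()


print(default_for_missing("scatter", [10, 3, 5]))              # [0, 1, 2]
print(default_for_missing("scatter", [10, 3, 5], [7, 8, 9]))   # [7, 8, 9]
print(default_for_missing("bar", ["a", "b", "a"], [7, 8, 9]))  # [1, 1, 1]
```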

nicolaskruchten added this to the 4.8.0 milestone May 1, 2020
2 participants