Plotly express should check if input data is tidy #141
Comments
How to minimally reproduce:
The root cause is an ill-typed dataframe column name: when the dataframe is created like this, the column name is an int rather than a string. It would be good to verify inputs up front rather than fail deep inside the library. I fail to see how this would negatively impact rendering time, and it would certainly avoid developer usability issues. The verifications could also be made conditional, so that there is no performance penalty once user code is ready. This would make plotly express much more usable and boost productivity for users. The solution in this particular case is to explicitly set the column name as a string, something like so:
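A minimal sketch of that workaround, assuming the dataframe was built from a plain list so that pandas assigned the default integer column label `0` (the variable names `df` and `named` and the column name `"value"` are illustrative, not taken from the original report):

```python
import pandas as pd

# Built from a plain list, the frame gets the default integer column label 0.
df = pd.DataFrame([1, 2, 3])
print(df.columns.tolist())

# Option 1: convert every existing column label to a string.
df.columns = df.columns.astype(str)

# Option 2: name the column explicitly at construction time.
named = pd.DataFrame({"value": [1, 2, 3]})
```

Either frame can then be charted by referring to the column by its string name, for example `x="0"` or `x="value"`.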
|
Speaking of tidy, isn't the tidy manifesto you link from the docs talking about column names being values and not names? |
Thank you @matanster, this case should indeed either fail gracefully or be handled correctly. We're in the process of accepting a larger variety of input arguments in |
Indeed, erroring out here is not the right thing. I believe that code like the following should just result in an empty plot because nothing is being mapped:
import plotly.express as px
import pandas as pd
px.histogram(pd.DataFrame([1,2,3]))
|
I'm not sure what this means :) |
Thanks for your positive attitude! It seems that, in defiance of Python tradition, an input-safe library would be even more awesome than what plotly/express already is. Charts are typically developed ad hoc, by people who don't recall every caveat or limitation of the API at the moment they come to create, iterate on, and choose between visualizations. |
Here's where one of the current top results for plotly express on google mentions a concept they call tidy, or a tidy dataframe: And here's the link for the "tidy dataframes manifesto" as much as it matters. It says in there that column names should be values and not names, which is conceptually quite opposite to limiting the type allowed for column names in dataframe input to plotly express. Since it's a long write-up there, here's a screenshot where they say and exemplify that: Sorry for the repetitive nature of this follow-up, but I hope it clarifies about where I got tidy from ... and how it is mildly related ... |
Mmmmm. sorry about that. |
Fair enough. In any case, we are working on more flexible input types, although the "tidy" philosophy will remain, i.e. you'll still need to concatenate your vectors to do this sort of thing: https://stackoverflow.com/questions/57988604, you just won't be required to stick them into a data frame first :) |
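A rough sketch of what "concatenating your vectors" into tidy form can look like with pandas. The data and the column names `value` and `series` are made up for illustration:

```python
import pandas as pd

# Two separate vectors of different lengths (illustrative data).
a = [1, 2, 3]
b = [4, 5]

# Stack them into one long frame: one row per observation,
# with a second column recording which vector each value came from.
tidy = pd.concat(
    [
        pd.DataFrame({"value": a, "series": "a"}),
        pd.DataFrame({"value": b, "series": "b"}),
    ],
    ignore_index=True,
)
print(tidy)
```

A frame in this shape can be passed along the lines of `px.histogram(tidy, x="value", color="series")`.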
Today, either of these will work:
import pandas as pd
import plotly.express as px
lengths = pd.DataFrame(list(range(1000)))
fig = px.histogram(lengths, x=0)
fig.show()

fig = px.histogram(x=range(1000))
fig.show()
|
Update: Plotly Express now accepts non-"tidy" data :) |
Currently, it's easy to run into internal errors like the following:
It would be better to check that the input dataframe is valid before charting and to issue a pinpointed error message, rather than fail internally in the dirty Python tradition.
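As an illustration of the kind of check requested here, the sketch below validates column labels before any plotting happens. The function name `check_column_names` and the `validate` flag are hypothetical and not part of the Plotly Express API; the flag only shows how such a check could be switched off once user code is stable.

```python
import pandas as pd


def check_column_names(df: pd.DataFrame, validate: bool = True) -> None:
    """Raise a pinpointed error if any column label is not a string.

    Hypothetical helper, not part of plotly.express.
    """
    if not validate:
        return  # skip the check entirely, e.g. once the code is known to be correct
    bad = [col for col in df.columns if not isinstance(col, str)]
    if bad:
        raise TypeError(
            f"Column labels must be strings; got non-string labels {bad!r}. "
            "Rename them, e.g. df.columns = df.columns.astype(str)."
        )


# A frame built from a plain list has the integer column label 0,
# so the check fails early with a readable message.
try:
    check_column_names(pd.DataFrame([1, 2, 3]))
except TypeError as err:
    message = str(err)
    print(message)
```

The check touches only the column labels, not the data, so its cost does not grow with the number of rows.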