-
-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bar base and timeline #2626
bar base and timeline #2626
Conversation
7bcc167
to
22fb60e
Compare
@emmanuelle ready for review |
040c8fd
to
43493cb
Compare
@@ -357,6 +358,53 @@ def bar( | |||
bar.__doc__ = make_docstring(bar, append_dict=_cartesian_append_dict) | |||
|
|||
|
|||
def timeline( | |||
data_frame=None, | |||
x_start=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if these arguments are required they should not be keywords arguments but positional ones I think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could think of defaults like providing a global start
or end
parameter which would be the one for all. But I think the easiest solution is to make the arguments positional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would really like to keep this function consistent with all the others, which have the data_frame
argument come first. I agree that in general required arguments should come first and be positional, but in this case I prefer consistency. px.pie()
sort of works the same way: you need to provide either values
or names
otherwise you get an empty figure.
To align with something like pie
I could just have x
be unbound if x_start
and x_end
aren't provided instead of raising (i.e. we would render an empty figure just like px.pie(df)
) but I think this would be less helpful for users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your point but preferring px consistency over python usage (when I see a keyword argument I don't expect the function to raise with its default value, and in particular None usually means optional) is not 100% satisfying. You are right that pie has the same problem, and that if you use all the default values for the other functions, the result is an empty figure so not very interesting. I'd be happy with a modification of the docstring for x_start
and x_end
mentioning that they are compulsory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To align with something like
pie
I could just havex
be unbound ifx_start
andx_end
aren't provided instead of raising (i.e. we would render an empty figure just likepx.pie(df)
) but I think this would be less helpful for users.
but this would be consistent :-)
"Both x_start and x_end must refer to data convertible to datetimes." | ||
) | ||
|
||
# note that we are not adding any columns to the data frame here, so no risk of overwrite |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure I understand here; there's no risk of overwrite because the dataframe is copied?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is saying that we're not adding a new column to the DF, we're just overwriting an existing known key that we control. If we were to add a new column to the DF called x
or base
we would run the risk of clobbering an existing column that was mapped to something like color
and things would break.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although... as I write this, I realized that actually this exact situation breaks in this case: if someone maps color="x_end"
we will end up clobbering their column... same problem as we encounter in sunburst/treemap when we mess with the DF :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the explanations. It's indeed a similar problem as the sunburst / treemap case.
Just a comment about the arguments |
For the record about positional vs keyword arguments, since px functions have only keyword arguments, here is the list of functions which raise when called without any argument (the other return an empty figure) import plotly.express as px
for func_name in px.__all__[:-8]:
try:
fig = getattr(px, func_name)()
except:
print(func_name, "error") which results in
|
Ah that's a good list to have, thanks for raising this issue. Raising with defaults isn't the best thing, I agree, but at least we should ensure that there is a clear error message in each case. The problem the way I see it is that in some cases (not timeline) we can't make arguments required because it's like "either A or B is required" so we need to give them default None, but if neither is specified we need to either render an empty figure or raise. I don't know that empty figure is strictly better than raising because it's less helpful to the user. |
Follow-up: all these functions have reasonable error messages except for |
|
||
try: | ||
x_start = pd.to_datetime(args["data_frame"][args["x_start"]]) | ||
x_end = pd.to_datetime(args["data_frame"][args["x_end"]]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is some (for me ) unexpected behavior when the start and/or end times are plain numbers (parseable by pandas to_datetime). The default in pandas is to denominate in nano-seconds, whereas the subsequent time delta in process_dataframe_timeline is converted to mili-seconds.
If you try to plot something small such as [1, 2, 10], timeline will happily convert it and end up with all zeros and an empty plot since 1ns, 2ns and 10ns are all less than 1ms.
Should I create a bug report / enhancement for this issue?
Closes #2585
modified existing tests.
new tutorial notebook (please see the doc checklist as well).