Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bar base and timeline #2626

Merged
merged 3 commits into from
Jul 14, 2020
Merged

bar base and timeline #2626

merged 3 commits into from
Jul 14, 2020

Conversation

nicolaskruchten
Copy link
Contributor

@nicolaskruchten nicolaskruchten commented Jul 8, 2020

Closes #2585

  • I have added tests (if submitting a new feature or correcting a bug) or
    modified existing tests.
  • For a new feature, I have added documentation examples in an existing or
    new tutorial notebook (please see the doc checklist as well).
  • I have added a CHANGELOG entry if fixing/changing/adding anything substantial.

@nicolaskruchten nicolaskruchten mentioned this pull request Jul 8, 2020
4 tasks
@nicolaskruchten nicolaskruchten added this to the 4.9.0 milestone Jul 8, 2020
@nicolaskruchten nicolaskruchten force-pushed the timeline branch 4 times, most recently from 7bcc167 to 22fb60e Compare July 9, 2020 17:36
@nicolaskruchten
Copy link
Contributor Author

@emmanuelle ready for review

@@ -357,6 +358,53 @@ def bar(
bar.__doc__ = make_docstring(bar, append_dict=_cartesian_append_dict)


def timeline(
data_frame=None,
x_start=None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if these arguments are required they should not be keywords arguments but positional ones I think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could think of defaults like providing a global start or end parameter which would be the one for all. But I think the easiest solution is to make the arguments positional.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would really like to keep this function consistent with all the others, which have the data_frame argument come first. I agree that in general required arguments should come first and be positional, but in this case I prefer consistency. px.pie() sort of works the same way: you need to provide either values or names otherwise you get an empty figure.

To align with something like pie I could just have x be unbound if x_start and x_end aren't provided instead of raising (i.e. we would render an empty figure just like px.pie(df)) but I think this would be less helpful for users.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point but preferring px consistency over python usage (when I see a keyword argument I don't expect the function to raise with its default value, and in particular None usually means optional) is not 100% satisfying. You are right that pie has the same problem, and that if you use all the default values for the other functions, the result is an empty figure so not very interesting. I'd be happy with a modification of the docstring for x_start and x_end mentioning that they are compulsory.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To align with something like pie I could just have x be unbound if x_start and x_end aren't provided instead of raising (i.e. we would render an empty figure just like px.pie(df)) but I think this would be less helpful for users.

but this would be consistent :-)

"Both x_start and x_end must refer to data convertible to datetimes."
)

# note that we are not adding any columns to the data frame here, so no risk of overwrite
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure I understand here; there's no risk of overwrite because the dataframe is copied?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is saying that we're not adding a new column to the DF, we're just overwriting an existing known key that we control. If we were to add a new column to the DF called x or base we would run the risk of clobbering an existing column that was mapped to something like color and things would break.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although... as I write this, I realized that actually this exact situation breaks in this case: if someone maps color="x_end" we will end up clobbering their column... same problem as we encounter in sunburst/treemap when we mess with the DF :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the explanations. It's indeed a similar problem as the sunburst / treemap case.

@emmanuelle
Copy link
Contributor

Just a comment about the arguments x_start and x_end, which should be positional IMO, otherwise ready to go!

@emmanuelle
Copy link
Contributor

For the record about positional vs keyword arguments, since px functions have only keyword arguments, here is the list of functions which raise when called without any argument (the other return an empty figure)

import plotly.express as px
for func_name in px.__all__[:-8]:
    try:
        fig = getattr(px, func_name)()
    except:
        print(func_name, "error")

which results in

scatter_mapbox error
scatter_matrix error
density_mapbox error
line_mapbox error
parallel_coordinates error
parallel_categories error
timeline error
imshow error

@nicolaskruchten
Copy link
Contributor Author

Ah that's a good list to have, thanks for raising this issue. Raising with defaults isn't the best thing, I agree, but at least we should ensure that there is a clear error message in each case.

The problem the way I see it is that in some cases (not timeline) we can't make arguments required because it's like "either A or B is required" so we need to give them default None, but if neither is specified we need to either render an empty figure or raise. I don't know that empty figure is strictly better than raising because it's less helpful to the user.

@nicolaskruchten
Copy link
Contributor Author

Follow-up: all these functions have reasonable error messages except for line_mapbox and scatter_mapbox which blindly read lat/lon

@nicolaskruchten nicolaskruchten merged commit 8f32b7d into master Jul 14, 2020
@nicolaskruchten nicolaskruchten deleted the timeline branch July 20, 2020 14:35

try:
x_start = pd.to_datetime(args["data_frame"][args["x_start"]])
x_end = pd.to_datetime(args["data_frame"][args["x_end"]])

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is some (for me ) unexpected behavior when the start and/or end times are plain numbers (parseable by pandas to_datetime). The default in pandas is to denominate in nano-seconds, whereas the subsequent time delta in process_dataframe_timeline is converted to mili-seconds.

If you try to plot something small such as [1, 2, 10], timeline will happily convert it and end up with all zeros and an empty plot since 1ns, 2ns and 10ns are all less than 1ms.

Should I create a bug report / enhancement for this issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add base to px.bar
3 participants