Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Add ECDF plot to figure factory #2765

Closed
wants to merge 3 commits into from

Conversation

benlindsay
Copy link

Documentation PR

  • I've seen the doc/README.md file
  • This change runs in the current version of Plotly on PyPI and targets the doc-prod branch OR it targets the master branch
  • If this PR modifies the first example in a page or adds a new one, it is a px example if at all possible
  • Every new/modified example has a descriptive title and motivating sentence or paragraph
  • Every new/modified example is independently runnable
  • Every new/modified example is optimized for short line count and focuses on the Plotly/visualization-related aspects of the example rather than the computation required to produce the data being visualized
  • Meaningful/relatable datasets are used for all new examples instead of randomly-generated data where possible
  • The random seed is set if using randomly-generated data in new/modified examples
  • New/modified remote datasets are loaded from https://plotly.github.io/datasets and added to https://github.com/plotly/datasets
  • Large computations are avoided in the new/modified examples in favour of loading remote datasets that represent the output of such computations
  • Imports are plotly.graph_objects as go / plotly.express as px / plotly.io as pio
  • Data frames are always called df
  • fig = <something> call is high up in each new/modified example (either px.<something> or make_subplots or go.Figure)
  • Liberal use is made of fig.add_* and fig.update_* rather than go.Figure(data=..., layout=...) in every new/modified example
  • Specific adders and updaters like fig.add_shape and fig.update_xaxes are used instead of big fig.update_layout calls in every new/modified example
  • fig.show() is at the end of each new/modified example
  • plotly.plot() and plotly.iplot() are not used in any new/modified example
  • Hex codes for colors are not used in any new/modified example in favour of these nice ones

Code PR

  • I have read through the contributing notes and understand the structure of the package. In particular, if my PR modifies code of plotly.graph_objects, my modifications concern the codegen files and not generated files.
  • I have added tests (if submitting a new feature or correcting a bug) or
    modified existing tests.
  • For a new feature, I have added documentation examples in an existing or
    new tutorial notebook (please see the doc checklist as well).
  • I have added a CHANGELOG entry if fixing/changing/adding anything substantial.

@benlindsay
Copy link
Author

Hi @nicolaskruchten, I decided to go ahead and give this a shot, as discussed here. In its current state pretty much all the functionality that I would want is here, but I haven't yet added tests or docs beyond some examples in the docstring. I wanted to get some quick feedback before sinking too much more time into this so I don't go down a wrong path for too long.

One thing I want to run by you is that right now I have this set up to require a pandas dataframe input. With some extra work I could enable passing arrays/lists in instead of a pandas dataframe, since this is basically a small wrapper around the plotly.express.line function. Would you recommend adding that functionality in, or do most users just use dataframes anyway?

Also should I add some additional docs beyond examples in the docstring? I'll definitely add tests.

Any other substance or style feedback would be appreciated.

Thanks in advance!

@nicolaskruchten
Copy link
Contributor

Great to see this PR, thank you! I'll try to review it today or tomorrow and give you feedback :)

The next release of Plotly.py should come by the end of the month 🤞 so if we can get this merged by then it can be part of version 4.11!

@nicolaskruchten
Copy link
Contributor

One thing I want to run by you is that right now I have this set up to require a pandas dataframe input. With some extra work I could enable passing arrays/lists in instead of a pandas dataframe, since this is basically a small wrapper around the plotly.express.line function. Would you recommend adding that functionality in, or do most users just use dataframes anyway?

It's your call, I'll accept the PR with or without non-pandas support, but it'll likely get more usage with a broader regime of input formats

Also should I add some additional docs beyond examples in the docstring? I'll definitely add tests.

Tests are definitely needed, and ideally we'd put together a special documentation page for this figure factory, much like the one that was made for the #2559 PR. This entails creating a Markdown file in the doc/python directory, using Jupytext. I can handle the special front-matter directives to get this page into the right location in our documentation, but I'm imagining essentially a new tile here https://plotly.com/python/statistical-charts/ which would lead to a new page with a URL like https://plotly.com/python/empirical-distribution-function-plots/ that has a little paragraph about the context of such a plot and a few examples. This will then get automatically wired into our search system and will help people discover this functionality.

Any other substance or style feedback would be appreciated.

I'll dig into things a bit more but off the cuff: Plotly.py still supports older versions of Python (i.e. 2.7!) so we can't use cool new features like type annotations and f-strings etc unfortunately.

@benlindsay
Copy link
Author

benlindsay commented Sep 14, 2020

@nicolaskruchten Thanks for the feedback! In the latest commit I got create_ecdf() to accept array-likes for the main variables like plotly.express, rather than forcing use of dataframes to pass in the data. I feel like the code is not as cleanly encapsulated as I would like. If you have recommendations to clean it up I'd be happy to entertain those as well as all other feedback. I also got rid of all the f-strings and type annotations.

Next up I'll add some tests and try to figure out and fix the failing checks. After that, whenever that is, I'll work on getting some examples together for the Jupytext stuff.

EDIT: Actually I can't figure out how to see the circleci logs...clicking on Details takes me to a login page to try to set up a new config. I'd appreciate any pointers. Haven't used circleci before.

@benlindsay
Copy link
Author

OK, I can see the CircleCI logs now, and it lookw like both the python-2.7-optional and python-3.5-optional tests are failing because of this:

    from plotly.figure_factory._ecdf import create_ecdf
E     File "/home/circleci/project/packages/python/plotly/plotly/figure_factory/_ecdf.py", line 22
E       **kwargs,
E               ^
E   SyntaxError: invalid syntax

I thought **kwargs was valid even in 2.7 and 3.5, so I'm not sure what the error is but I'll try installing a couple other python versions and seeing if I can pinpoint the problem.

@nicolaskruchten
Copy link
Contributor

That's weird indeed! **kwargs should work! Is it something like the trailing comma or the outdent?

@nicolaskruchten nicolaskruchten mentioned this pull request Jan 8, 2021
@nicolaskruchten
Copy link
Contributor

Hi @benlindsay and happy new year!

After looking at this some more, and thinking about how to do better than create_distplot() to do what Seaborn does with kdeplot(), I've decided to add an ECDF function to PX itself rather than as a figure factory. The PR isn't quite finished yet (notably the KDE part!) but it's here if you want to take a look: #3011

@benlindsay
Copy link
Author

That's great! I haven't had time to keep working on this lately, so I'm glad to see you'll continue on with it and that it will be part of plotly.express proper :) I keep using my minimal plotly-ecdf and wishing I had time to come back to this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants