diff --git a/CHANGELOG.md b/CHANGELOG.md index 599942e04c..26b34bbec8 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -2,6 +2,28 @@ All notable changes to this project will be documented in this file. This project adheres to [Semantic Versioning](http://semver.org/). +## [4.8.0] - not yet released + +### Added + +- `plotly` now provides a Plotly Express-backed Pandas-compatible plotting backend, which can be activated via `pandas.options.plotting.backend = "plotly"`. Note that it is not intended to implement every Pandas plotting function, nor is it intended to replicate the behaviour of every argument, although per the changes below, `x` and `y` should behave similarly. ([#2336](https://github.com/plotly/plotly.py/pull/2336)) +- New datasets have been added to `plotly.express.data`: `stocks`, `experiment`, `medals_wide` and `medals_long`. ([#2336](https://github.com/plotly/plotly.py/pull/2336)) +- plotly `go.Figure` and `go.FigureWidget` now have `_repr_html_` and `_repr_mimebundle_` methods, which are [standard hooks for integration in systems based on IPython](https://ipython.readthedocs.io/en/stable/config/integrating.html). In particular, with `_repr_html_` plotly figures can now be used within [sphinx-gallery](https://sphinx-gallery.github.io/stable/index.html) without any scraper. These additions should not change the way plotly figures are displayed in notebook environments, since the `_ipython_display_` method (already present in earlier versions) takes precedence over the new methods. + +### Updated + +- The behaviour of the `x`, `y`, `orientation`, `histfunc`, `violinmode`, `boxmode` and `stripmode` arguments for 2d-cartesian functions in Plotly Express (i.e.
`scatter`, `line`, `area`, `bar`, `histogram`, `violin`, `box`, `strip`, `funnel`, `density_heatmap` and `density_contour`) has been refined ([#2336](https://github.com/plotly/plotly.py/pull/2336)): + - if `x` or `y` is missing, it is inferred to be the index of `data_frame` if `data_frame` is provided, otherwise a stable index of integers starting at 0. In the case of `px.bar`, if the provided value is not continuous, the missing value is treated as a column of 1s named "count", so as to behave more like `px.histogram` and to avoid sizing the resulting bars differently based on their position in the column. Previously, missing values defaulted to integers starting at 0 *per trace*, which was potentially inconsistent or misleading. + - if `x` (`y`) is missing, `orientation` now defaults to `v` (`h`). Previously it always defaulted to `v` but this is not considered a breaking change, as the cases in which it now defaults to `h` caused unreadable output if set to `v`. + - if both `x` and `y` are provided and one of them does not contain continuous values, `orientation` defaults to the value perpendicular to that axis. Previously it always defaulted to `v` but this is not considered a breaking change, as the cases in which it now defaults to `h` caused unreadable output if set to `v`. + - either `x` or `y` (but not both) may now be provided as a list of column references into `data_frame` or columns of data, in which case the imputed data frame will be treated as "wide" data and `melt()`ed internally before applying the usual mapping rules, with function-specific defaults. + - if neither `x` nor `y` is provided but `data_frame` is, the data frame will be treated as "wide" with defaults depending on the value of `orientation` (and `orientation` has accordingly been added to `scatter`, `line`, `density_heatmap`, and `density_contour` for this purpose). Previously this would have resulted in an empty figure.
+ - if both `x` and `y` are provided to `histogram`, and if `x`, `y` and `z` are provided to `density_heatmap` or `density_contour`, then `histfunc` now defaults to `sum` so as to avoid ignoring the provided data, and to cause `histogram` and `bar` to behave more similarly. + - `violinmode`, `boxmode` and `stripmode` now default to `overlay` if `x` (`y`) in `v` (`h`) orientation is also mapped to `color`, to avoid strange spacing issues with the previous default of `group` in all cases. +- The Plotly Express arguments `color_discrete_map`, `symbol_map` and `line_dash_map` now accept the string `"identity"`, which causes the corresponding input data to be used as-is rather than mapped into `color_discrete_sequence`, `symbol_sequence` or `line_dash_sequence`, respectively. ([#2336](https://github.com/plotly/plotly.py/pull/2336)) +- Plotly Express now accepts `px.Constant` or `px.Range` objects in place of column references so as to express constant or increasing integer values. ([#2336](https://github.com/plotly/plotly.py/pull/2336)) + + ## [4.7.1] - 2020-05-08 ### Fixed diff --git a/binder/requirements.txt b/binder/requirements.txt index 9e5793b32e..0721954403 100644 --- a/binder/requirements.txt +++ b/binder/requirements.txt @@ -2,7 +2,7 @@ jupytext plotly==4.7.0 jupyter notebook -pandas +pandas==1.0.3 statsmodels==0.10.1 scipy patsy==0.5.1 diff --git a/doc/python/2D-Histogram.md b/doc/python/2D-Histogram.md index 8150ab2936..8f52f3d246 100644 --- a/doc/python/2D-Histogram.md +++ b/doc/python/2D-Histogram.md @@ -42,7 +42,7 @@ A 2D histogram, also known as a density heatmap, is the 2-dimensional generaliza ## Density Heatmaps with Plotly Express -[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on "tidy" data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). The Plotly Express function `density_heatmap()` can be used to produce density heatmaps.
+[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on a variety of types of data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). The Plotly Express function `density_heatmap()` can be used to produce density heatmaps. ```python import plotly.express as px diff --git a/doc/python/2d-histogram-contour.md b/doc/python/2d-histogram-contour.md index 3595591148..50221b18f9 100644 --- a/doc/python/2d-histogram-contour.md +++ b/doc/python/2d-histogram-contour.md @@ -40,7 +40,7 @@ A 2D histogram contour plot, also known as a density contour plot, is a 2-dimens ## Density Contours with Plotly Express -[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on "tidy" data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). The Plotly Express function `density_contour()` can be used to produce density contours. +[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on a variety of types of data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). The Plotly Express function `density_contour()` can be used to produce density contours. ```python import plotly.express as px diff --git a/doc/python/3d-scatter-plots.md b/doc/python/3d-scatter-plots.md index eedc437f71..2140e4ce90 100644 --- a/doc/python/3d-scatter-plots.md +++ b/doc/python/3d-scatter-plots.md @@ -35,7 +35,7 @@ jupyter: ## 3D scatter plot with Plotly Express -[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on "tidy" data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). 
+[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on a variety of types of data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). Like the [2D scatter plot](https://plotly.com/python/line-and-scatter/) `px.scatter`, the 3D function `px.scatter_3d` plots individual data in three-dimensional space. diff --git a/doc/python/bar-charts.md b/doc/python/bar-charts.md index c137c0b213..e63dc20d61 100644 --- a/doc/python/bar-charts.md +++ b/doc/python/bar-charts.md @@ -35,7 +35,7 @@ jupyter: ### Bar chart with Plotly Express -[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on "tidy" data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). +[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on a variety of types of data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). With `px.bar`, each row of the DataFrame is represented as a rectangular mark. diff --git a/doc/python/box-plots.md b/doc/python/box-plots.md index 55952b6727..694198ee7c 100644 --- a/doc/python/box-plots.md +++ b/doc/python/box-plots.md @@ -40,7 +40,7 @@ A [box plot](https://en.wikipedia.org/wiki/Box_plot) is a statistical representa ## Box Plot with `plotly.express` -[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on "tidy" data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). +[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on a variety of types of data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). 
In a box plot created by `px.box`, the distribution of the column given as `y` argument is represented. diff --git a/doc/python/bubble-charts.md b/doc/python/bubble-charts.md index 348825851d..5d3309e7cc 100644 --- a/doc/python/bubble-charts.md +++ b/doc/python/bubble-charts.md @@ -38,7 +38,7 @@ jupyter: A [bubble chart](https://en.wikipedia.org/wiki/Bubble_chart) is a scatter plot in which a third dimension of the data is shown through the size of markers. For other types of scatter plot, see the [line and scatter page](https://plotly.com/python/line-and-scatter/). -We first show a bubble chart example using Plotly Express. [Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on "tidy" data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). The size of markers is set from the dataframe column given as the `size` parameter. +We first show a bubble chart example using Plotly Express. [Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on a variety of types of data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). The size of markers is set from the dataframe column given as the `size` parameter. ```python import plotly.express as px diff --git a/doc/python/bubble-maps.md b/doc/python/bubble-maps.md index d9c1b78a20..6c72e2f197 100644 --- a/doc/python/bubble-maps.md +++ b/doc/python/bubble-maps.md @@ -39,7 +39,7 @@ Plotly figures made with `px.scatter_geo`, `px.line_geo` or `px.choropleth` func ### Bubble map with Plotly Express -[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on "tidy" data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). With `px.scatter_geo`, each line of the dataframe is represented as a marker point. 
The column set as the `size` argument gives the size of markers. +[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on a variety of types of data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). With `px.scatter_geo`, each line of the dataframe is represented as a marker point. The column set as the `size` argument gives the size of markers. ```python import plotly.express as px diff --git a/doc/python/choropleth-maps.md b/doc/python/choropleth-maps.md index bd3535e2e6..67ccb580f6 100644 --- a/doc/python/choropleth-maps.md +++ b/doc/python/choropleth-maps.md @@ -56,7 +56,7 @@ The GeoJSON data is passed to the `geojson` argument, and the data is passed int ### Choropleth Map with plotly.express -[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on "tidy" data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). +[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on a variety of types of data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). #### GeoJSON with `feature.id` @@ -208,7 +208,6 @@ fig.show() ```python import plotly.graph_objects as go -# Load data frame and tidy it. 
import pandas as pd df = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/2011_us_ag_exports.csv') diff --git a/doc/python/cufflinks.md b/doc/python/cufflinks.md deleted file mode 100644 index 5683bfcd42..0000000000 --- a/doc/python/cufflinks.md +++ /dev/null @@ -1,166 +0,0 @@ ---- -jupyter: - jupytext: - notebook_metadata_filter: all - text_representation: - extension: .md - format_name: markdown - format_version: "1.2" - jupytext_version: 1.3.1 - kernelspec: - display_name: Python 3 - language: python - name: python3 - language_info: - codemirror_mode: - name: ipython - version: 3 - file_extension: .py - mimetype: text/x-python - name: python - nbconvert_exporter: python - pygments_lexer: ipython3 - version: 3.6.8 - plotly: - description: - Cufflinks is a third-party wrapper library around Plotly, inspired by the Pandas .plot() API. - display_as: file_settings - language: python - layout: base - name: Cufflinks - order: 31 - permalink: python/cufflinks/ - thumbnail: thumbnail/plotly-express.png ---- - -### Introduction - -[Cufflinks](https://github.com/santosjorge/cufflinks) is a third-party wrapper library around Plotly, maintained by [Santos Jorge](https://github.com/santosjorge). - -When you import cufflinks, all [Pandas](https://pandas.pydata.org/) data frames and series objects have a new method attached to them called `.iplot()` which has a similar API to Pandas' built-in `.plot()` method. - -By passing the `asFigure=True` argument to `.iplot()`, Cufflinks works similarly to [Plotly Express](/python/plotly-express/), by returning [customizable `go.Figure` objects](/python/styling-plotly-express/) which are compatible with [Dash](https://dash.plot.ly)'s [`dcc.Graph` component](https://dash.plotly.com/dash-core-components/graph). Cufflinks also adds a `.figure()` method which has the same signature as `.iplot()` except that it has `asFigure=True` set as the default. 
- -This page shows some high-level examples of how to use Cufflinks, and more examples and documentation are available in the [Cufflinks Github repository](https://github.com/santosjorge/cufflinks). - -> Issues and questions regarding Cufflinks should be [raised in the Cufflinks repository](https://github.com/santosjorge/cufflinks/issues/new). - -```python -import cufflinks as cf -import pandas as pd -import numpy as np - -df = pd.DataFrame(np.random.randn(1000, 2), columns=['A', 'B']).cumsum() -fig = df.iplot(asFigure=True, xTitle="The X Axis", - yTitle="The Y Axis", title="The Figure Title") -fig.show() -``` - -Cufflinks has a `datagen` module for generating demo data. - -```python -import cufflinks as cf - -df = cf.datagen.lines() -fig = df.iplot(asFigure=True) -fig.show() -df.head() -``` - -### Scatter Plots - -```python -import cufflinks as cf -import pandas as pd -import numpy as np - -df = pd.DataFrame(np.random.randn(1000, 2), columns=['A', 'B']).cumsum() -fig = df.iplot(asFigure=True, x='A', y='B', mode='markers') -fig.show() -``` - -### Bar Charts - -```python -import cufflinks as cf -import pandas as pd -df = pd.DataFrame(np.random.rand(10, 4), columns=['A', 'B', 'C', 'D']) -fig = df.iplot(asFigure=True, kind="bar") -fig.show() -``` - -### Histograms - -```python -import cufflinks as cf -import pandas as pd -df = pd.DataFrame({'a': np.random.randn(1000) + 1, - 'b': np.random.randn(1000), - 'c': np.random.randn(1000) - 1}) - -fig = df.iplot(asFigure=True, kind="histogram") -fig.show() -``` - -### Box Plots - -```python -import cufflinks as cf -import pandas as pd -df = pd.DataFrame({'a': np.random.randn(1000) + 1, - 'b': np.random.randn(1000), - 'c': np.random.randn(1000) - 1}) - -fig = df.iplot(asFigure=True, kind="box") -fig.show() -``` - -### Subplots - -```python -import cufflinks as cf - -df=cf.datagen.lines(4) -fig = df.iplot(asFigure=True, subplots=True, shape=(4,1), shared_xaxes=True, fill=True) -fig.show() -``` - -```python -import cufflinks as 
cf - -df=cf.datagen.lines(4) -fig = df.iplot(asFigure=True, subplots=True, subplot_titles=True, legend=False) -fig.show() -``` - -### Line and Box Annotations - -```python -import cufflinks as cf - -df=cf.datagen.lines(4) -fig = df.iplot(asFigure=True, hline=[2,4], vline=['2015-02-10']) -fig.show() -``` - -```python -import cufflinks as cf - -df=cf.datagen.lines(4) -fig = df.iplot(asFigure=True, hspan=[(-1,1),(2,5)]) -fig.show() -``` - -```python -import cufflinks as cf - -df=cf.datagen.lines(4) -fig = df.iplot(asFigure=True, - vspan={'x0':'2015-02-15','x1':'2015-03-15', - 'color':'rgba(30,30,30,0.3)','fill':True,'opacity':.4}) -fig.show() -``` - -### More Examples - -More documentation and examples for Cufflinks can be found in its [Github repository](https://github.com/santosjorge/cufflinks). diff --git a/doc/python/discrete-color.md b/doc/python/discrete-color.md index 1afd6b2fbc..5177bfb5be 100644 --- a/doc/python/discrete-color.md +++ b/doc/python/discrete-color.md @@ -6,7 +6,7 @@ jupyter: extension: .md format_name: markdown format_version: '1.2' - jupytext_version: 1.3.1 + jupytext_version: 1.4.2 kernelspec: display_name: Python 3 language: python @@ -20,7 +20,7 @@ jupyter: name: python nbconvert_exporter: python pygments_lexer: ipython3 - version: 3.6.8 + version: 3.7.7 plotly: description: How to use and configure discrete color sequences, also known as categorical or qualitative color scales. 
@@ -178,6 +178,15 @@ fig = px.bar(df, y="continent", x="pop", color="continent", orientation="h", hov fig.show() ``` +If your data set already contains valid CSS colors which you wish to use directly, you can pass the special value `"identity"` to `color_discrete_map`, in which case the legend is hidden by default, and the color does not appear in the hover label: + +```python +import plotly.express as px + +fig = px.bar(x=["a","b","c"], y=[1,3,2], color=["red", "goldenrod", "#00D"], color_discrete_map="identity") +fig.show() +``` + ### Controlling Discrete Color Order Plotly Express lets you specify an ordering over categorical variables with `category_orders`, which will apply to colors and legends as well as symbols, [axes](/python/axes/) and [facets](/python/facet-plots/). This can be used with either `color_discrete_sequence` or `color_discrete_map`. diff --git a/doc/python/distplot.md b/doc/python/distplot.md index 9e58ebc804..aba7cfe9bd 100644 --- a/doc/python/distplot.md +++ b/doc/python/distplot.md @@ -37,7 +37,7 @@ jupyter: Several representations of statistical distributions are available in plotly, such as [histograms](https://plotly.com/python/histograms/), [violin plots](https://plotly.com/python/violin/), [box plots](https://plotly.com/python/box-plots/) (see [the complete list here](https://plotly.com/python/statistical-charts/)). It is also possible to combine several representations in the same plot. -For example, the `plotly.express` function `px.histogram` can add a subplot with a different statistical representation than the histogram, given by the parameter `marginal`. [Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on "tidy" data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). 
+For example, the `plotly.express` function `px.histogram` can add a subplot with a different statistical representation than the histogram, given by the parameter `marginal`. [Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on a variety of types of data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). ```python import plotly.express as px diff --git a/doc/python/dot-plots.md b/doc/python/dot-plots.md index bb212eb406..70d1b773de 100644 --- a/doc/python/dot-plots.md +++ b/doc/python/dot-plots.md @@ -37,9 +37,9 @@ jupyter: Dot plots (also known as [Cleveland dot plots]()) show changes between two (or more) points in time or between two (or more) conditions. Compared to a [bar chart](/python/bar-charts/), dot plots can be less cluttered and allow for an easier comparison between conditions. -For the same data, we show below how to create a dot plot using either `px.scatter` (for a tidy pandas DataFrame) or `go.Scatter`. +For the same data, we show below how to create a dot plot using either `px.scatter` or `go.Scatter`. -[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on "tidy" data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). +[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on a variety of types of data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). 
```python import plotly.express as px diff --git a/doc/python/error-bars.md b/doc/python/error-bars.md index bd3cf91b6a..4dcfb10145 100644 --- a/doc/python/error-bars.md +++ b/doc/python/error-bars.md @@ -35,7 +35,7 @@ jupyter: ### Error Bars with Plotly Express -[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on "tidy" data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). For functions representing 2D data points such as [`px.scatter`](https://plotly.com/python/line-and-scatter/), [`px.line`](https://plotly.com/python/line-charts/), [`px.bar`](https://plotly.com/python/bar-charts/) etc., error bars are given as a column name which is the value of the `error_x` (for the error on x position) and `error_y` (for the error on y position). +[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on a variety of types of data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). For functions representing 2D data points such as [`px.scatter`](https://plotly.com/python/line-and-scatter/), [`px.line`](https://plotly.com/python/line-charts/), [`px.bar`](https://plotly.com/python/bar-charts/) etc., error bars are given as a column name which is the value of the `error_x` (for the error on x position) and `error_y` (for the error on y position). ```python import plotly.express as px diff --git a/doc/python/filled-area-plots.md b/doc/python/filled-area-plots.md index 347fbeb1ef..63f578c741 100644 --- a/doc/python/filled-area-plots.md +++ b/doc/python/filled-area-plots.md @@ -37,7 +37,7 @@ This example shows how to fill the area enclosed by traces. 
## Filled area plot with plotly.express -[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on "tidy" data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). +[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on a variety of types of data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). `px.area` creates a stacked area plot. Each filled area corresponds to one value of the column given by the `line_group` parameter. diff --git a/doc/python/funnel-charts.md b/doc/python/funnel-charts.md index cf9c6898f9..ad6e5b6766 100644 --- a/doc/python/funnel-charts.md +++ b/doc/python/funnel-charts.md @@ -30,7 +30,7 @@ Funnel charts are often used to represent data in different stages of a business ### Basic Funnel Plot with plotly.express -[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on "tidy" data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). +[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on a variety of types of data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). With `px.funnel`, each row of the DataFrame is represented as a stage of the funnel. diff --git a/doc/python/heatmaps.md b/doc/python/heatmaps.md index a4d401698f..a92eb30a8b 100644 --- a/doc/python/heatmaps.md +++ b/doc/python/heatmaps.md @@ -36,7 +36,7 @@ jupyter: ### Heatmap with `plotly.express` and `px.imshow` -[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on "tidy" data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). 
With `px.imshow`, each value of the input array is represented as a heatmap pixel. +[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on a variety of types of data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). With `px.imshow`, each value of the input array is represented as a heatmap pixel. For more examples using `px.imshow`, see the [tutorial on displaying image data with plotly](/python/imshow). diff --git a/doc/python/histograms.md b/doc/python/histograms.md index ea60960f3a..b40bf2f266 100644 --- a/doc/python/histograms.md +++ b/doc/python/histograms.md @@ -41,7 +41,7 @@ bar, go to the [Bar Chart tutorial](/python/bar-charts/). ## Histogram with Plotly Express -[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on "tidy" data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). +[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on a variety of types of data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). ```python import plotly.express as px diff --git a/doc/python/horizontal-bar-charts.md b/doc/python/horizontal-bar-charts.md index d80525f3c5..2578a5c7b9 100644 --- a/doc/python/horizontal-bar-charts.md +++ b/doc/python/horizontal-bar-charts.md @@ -37,7 +37,7 @@ See more examples of bar charts (including vertical bar charts) and styling opti ### Horizontal Bar Chart with Plotly Express -[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on "tidy" data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). For a horizontal bar char, use the `px.bar` function with `orientation='h'`. 
+[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on a variety of types of data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). For a horizontal bar chart, use the `px.bar` function with `orientation='h'`. #### Basic Horizontal Bar Chart with Plotly Express @@ -64,7 +64,7 @@ fig.show() ### Horizontal Bar Chart with go.Bar -When data are not available as a tidy dataframe, you can use the more generic function `go.Bar` from `plotly.graph_objects`. All the options of `go.Bar` are documented in the reference https://plotly.com/python/reference/#bar +You can also use the more generic function `go.Bar` from `plotly.graph_objects`. All the options of `go.Bar` are documented in the reference https://plotly.com/python/reference/#bar #### Basic Horizontal Bar Chart diff --git a/doc/python/line-and-scatter.md b/doc/python/line-and-scatter.md index 1bd9fe4fdc..28cfbe4cf4 100644 --- a/doc/python/line-and-scatter.md +++ b/doc/python/line-and-scatter.md @@ -36,7 +36,7 @@ jupyter: ## Scatter plot with Plotly Express -[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on "tidy" data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). +[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on a variety of types of data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). With `px.scatter`, each data point is represented as a marker point, whose location is given by the `x` and `y` columns.
diff --git a/doc/python/line-charts.md b/doc/python/line-charts.md index 31c11a8dfa..dcc9c83150 100644 --- a/doc/python/line-charts.md +++ b/doc/python/line-charts.md @@ -37,7 +37,7 @@ jupyter: ### Line Plot with plotly.express -[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on "tidy" data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). With `px.line`, each data point is represented as a vertex (which location is given by the `x` and `y` columns) of a **polyline mark** in 2D space. +[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on a variety of types of data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). With `px.line`, each data point is represented as a vertex (whose location is given by the `x` and `y` columns) of a **polyline mark** in 2D space. For more examples of line plots, see the [line and scatter notebook](https://plotly.com/python/line-and-scatter/). diff --git a/doc/python/linear-fits.md b/doc/python/linear-fits.md index 7d1c970efc..75e27c93e5 100644 --- a/doc/python/linear-fits.md +++ b/doc/python/linear-fits.md @@ -37,7 +37,7 @@ jupyter: ### Linear fit trendlines with Plotly Express -[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on "tidy" data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). +[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on a variety of types of data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). Plotly Express allows you to add [Ordinary Least](https://en.wikipedia.org/wiki/Ordinary_least_squares) Squares regression trendline to scatterplots with the `trendline` argument.
In order to do so, you will need to install `statsmodels` and its dependencies. Hovering over the trendline will show the equation of the line and its R-squared value. diff --git a/doc/python/lines-on-maps.md b/doc/python/lines-on-maps.md index f224d37e77..5f00ee0a95 100644 --- a/doc/python/lines-on-maps.md +++ b/doc/python/lines-on-maps.md @@ -41,7 +41,7 @@ Plotly figures made with `px.scatter_geo`, `px.line_geo` or `px.choropleth` func ## Lines on Maps with Plotly Express -[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on "tidy" data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). +[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on a variety of types of data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). ```python import plotly.express as px @@ -112,11 +112,11 @@ fig.update_layout( fig.show() ``` ### Performance improvement: put many lines in the same trace -For very large amounts (>1000) of lines, performance may become critcal. If you can relinquish setting individual line styles (e.g. opacity), you can put multiple paths into one trace. This makes the map render faster and reduces the script execution time and memory consumption. +For very large numbers (>1000) of lines, performance may become critical. If you can relinquish setting individual line styles (e.g. opacity), you can put multiple paths into one trace. This makes the map render faster and reduces the script execution time and memory consumption. -Use ```None``` between path coordinates to create a break in the otherwise connected paths.
-```python +```python import plotly.graph_objects as go import pandas as pd @@ -164,7 +164,7 @@ fig.add_trace( lat = lats, mode = 'lines', line = dict(width = 1,color = 'red'), - opacity = 0.5 + opacity = 0.5 ) ) diff --git a/doc/python/mapbox-county-choropleth.md b/doc/python/mapbox-county-choropleth.md index a6d73afe76..d6484a9d8a 100644 --- a/doc/python/mapbox-county-choropleth.md +++ b/doc/python/mapbox-county-choropleth.md @@ -80,7 +80,7 @@ df.head() ### Choropleth map using plotly.express and carto base map (no token needed) -[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on "tidy" data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). +[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on a variety of types of data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). With `px.choropleth_mapbox`, each row of the DataFrame is represented as a region of the choropleth. diff --git a/doc/python/mapbox-density-heatmaps.md b/doc/python/mapbox-density-heatmaps.md index ab96b9fd3c..52e32eb46a 100644 --- a/doc/python/mapbox-density-heatmaps.md +++ b/doc/python/mapbox-density-heatmaps.md @@ -39,7 +39,7 @@ To plot on Mapbox maps with Plotly you _may_ need a Mapbox account and a public ### Stamen Terrain base map (no token needed): density mapbox with `plotly.express` -[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on "tidy" data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). +[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on a variety of types of data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). 
With `px.density_mapbox`, each row of the DataFrame is represented as a point smoothed with a given radius of influence. diff --git a/doc/python/pandas-backend.md b/doc/python/pandas-backend.md new file mode 100644 index 0000000000..17eb84a711 --- /dev/null +++ b/doc/python/pandas-backend.md @@ -0,0 +1,201 @@ +--- +jupyter: + jupytext: + notebook_metadata_filter: all + text_representation: + extension: .md + format_name: markdown + format_version: '1.2' + jupytext_version: 1.4.2 + kernelspec: + display_name: Python 3 + language: python + name: python3 + language_info: + codemirror_mode: + name: ipython + version: 3 + file_extension: .py + mimetype: text/x-python + name: python + nbconvert_exporter: python + pygments_lexer: ipython3 + version: 3.7.7 + plotly: + description: Cufflinks is a third-party wrapper library around Plotly, inspired + by the Pandas .plot() API. + display_as: file_settings + language: python + layout: base + name: Pandas Plotting Backend + order: 31 + permalink: python/pandas-backend/ + redirect_from: python/cufflinks/ + thumbnail: thumbnail/plotly-express.png +--- + +### Introduction + +The popular [Pandas](https://pandas.pydata.org/) data analysis and manipulation tool provides plotting functions on its `DataFrame` and `Series` objects, which have historically produced `matplotlib` plots. Since version 0.25, Pandas has provided a mechanism to use different backends, and as of version 4.8 of `plotly`, you can now use a [Plotly Express-powered](/python/plotly-express/) backend for Pandas plotting. 
+ +To activate it, you just need to set `pd.options.plotting.backend` to `"plotly"` and call `.plot()` to get a `plotly.graph_objects.Figure` object back, just as if you had called Plotly Express directly: + +```python +import pandas as pd +pd.options.plotting.backend = "plotly" + +df = pd.DataFrame(dict(a=[1,3,2], b=[3,2,1])) +fig = df.plot() +fig.show() +``` + +This functionality wraps [Plotly Express](/python/plotly-express/), so you can use any of the [styling options available to Plotly Express methods](/python/styling-plotly-express/). Since what you get back is a regular `Figure` object, you can use any of the update mechanisms supported by these objects to apply [templates](/python/templates/) or further customize [axes](/python/axes/), [colors](/python/colorscales/), [legends](/python/legend/), [fonts](/python/figure-labels/), [hover labels](/python/hover-text-and-formatting/) etc. [Faceting](/python/facet-plots/) is also supported. + +```python +import pandas as pd +pd.options.plotting.backend = "plotly" + +df = pd.DataFrame(dict(a=[1,3,2], b=[3,2,1])) +fig = df.plot(title="Pandas Backend Example", template="simple_white", + labels=dict(index="time", value="money", variable="option")) +fig.update_yaxes(tickprefix="$") +fig.show() +``` + +### A Note on API Compatibility + +> The Plotly plotting backend for Pandas is *not intended* to be a drop-in replacement for the default; it does not implement all or even most of the same keyword arguments, such as `subplots=True` etc. + +The Plotly plotting backend for Pandas is a more convenient way to invoke certain [Plotly Express](/python/plotly-express/) functions by chaining a `.plot()` call without having to import Plotly Express directly. As of version 4.8, with [wide-form data support](/python/wide-form/), Plotly Express implements behaviour for the `x` and `y` keywords that is very similar to that of the `matplotlib` backend.
+ +In practice, this means that the following two ways of making a chart are identical and support the same additional arguments, because they call the same underlying code: + +```python +import pandas as pd +pd.options.plotting.backend = "plotly" +df = pd.DataFrame(dict(a=[1,3,2], b=[3,2,1])) + +# using Plotly Express via the Pandas backend +fig1 = df.plot.bar() +fig1.show() + +# using Plotly Express directly +import plotly.express as px +fig2 = px.bar(df) +fig2.show() +``` + +To achieve a similar effect to `subplots=True`, the [Plotly Express `facet_row` and `facet_col` options](/python/facet-plots/) can be used, the same way as they work when directly calling [Plotly Express with wide-form data](/python/wide-form/): + +```python +import pandas as pd +pd.options.plotting.backend = "plotly" +df = pd.DataFrame(dict(a=[1,3,2], b=[3,2,1])) + +fig = df.plot.bar(facet_row="variable") +fig.show() +``` + +### Supported Methods + +The Plotly backend supports the following `kind`s of Pandas plots: `scatter`, `line`, `area`, `bar`, `barh`, `hist` and `box`, via the call pattern `df.plot(kind='scatter')` or `df.plot.scatter()`.
+ +```python +import pandas as pd +import numpy as np +pd.options.plotting.backend = "plotly" +np.random.seed(1) + +df = pd.DataFrame(dict( + a=np.random.normal(loc=1, scale=2, size=100), + b=np.random.normal(loc=2, scale=1, size=100) +)) +fig = df.plot.scatter(x="a", y="b") +fig.show() +``` + +```python +import pandas as pd +pd.options.plotting.backend = "plotly" + +df = pd.DataFrame(dict(a=[1,3,2], b=[3,2,1])) +fig = df.plot.line() +fig.show() +``` + +```python +import pandas as pd +pd.options.plotting.backend = "plotly" + +df = pd.DataFrame(dict(a=[1,3,2], b=[3,2,1])) +fig = df.plot.area() +fig.show() +``` + +```python +import pandas as pd +pd.options.plotting.backend = "plotly" + +df = pd.DataFrame(dict(a=[1,3,2], b=[3,2,1])) +fig = df.plot.bar() +fig.show() +``` + +```python +import pandas as pd +pd.options.plotting.backend = "plotly" + +df = pd.DataFrame(dict(a=[1,3,2], b=[3,2,1])) +fig = df.plot.barh() +fig.show() +``` + +```python +import pandas as pd +import numpy as np +pd.options.plotting.backend = "plotly" +np.random.seed(1) + +df = pd.DataFrame(dict( + a=np.random.normal(loc=1, scale=2, size=100), + b=np.random.normal(loc=2, scale=1, size=100) +)) +fig = df.plot.hist() +fig.show() +``` + +```python +import pandas as pd +import numpy as np +pd.options.plotting.backend = "plotly" +np.random.seed(1) + +df = pd.DataFrame(dict( + a=np.random.normal(loc=1, scale=2, size=100), + b=np.random.normal(loc=2, scale=1, size=100) +)) +fig = df.plot.box() +fig.show() +``` + +### `Series` and `DataFrame` functions: `hist` and `boxplot` + +The Pandas plotting API also exposes `.hist()` on `DataFrame`s and `Series` objects, and `.boxplot()` on `DataFrames`, which can also be used with the Plotly backend. 
+ +```python +import pandas as pd +import numpy as np +pd.options.plotting.backend = "plotly" +np.random.seed(1) + +df = pd.DataFrame(dict( + a=np.random.normal(loc=1, scale=2, size=100), + b=np.random.normal(loc=2, scale=1, size=100) +)) +fig = df.boxplot() +fig.show() +``` + +### What about Cufflinks? + +There also exists an independent third-party wrapper library around Plotly called [Cufflinks](https://github.com/santosjorge/cufflinks), which provides similar functionality (with an API closer to that of Pandas' default `matplotlib` backend) by adding a `.iplot()` method to Pandas dataframes, as it was developed before Pandas supported configurable backends. Issues and questions regarding Cufflinks should be [raised in the Cufflinks repository](https://github.com/santosjorge/cufflinks/issues/new). diff --git a/doc/python/parallel-coordinates-plot.md b/doc/python/parallel-coordinates-plot.md index 034f372e0f..327f9702d0 100644 --- a/doc/python/parallel-coordinates-plot.md +++ b/doc/python/parallel-coordinates-plot.md @@ -37,7 +37,7 @@ jupyter: ## Parallel Coordinates plot with Plotly Express -[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on "tidy" data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). In a parallel coordinates plot with `px.parallel_coordinates`, each row of the DataFrame is represented by a polyline mark which traverses a set of parallel axes, one for each of the dimensions. For other representations of multivariate data, also see [parallel categories](/python/parallel-categories-diagram/), [radar charts](/python/radar-chart/) and [scatterplot matrix (SPLOM)](/python/splom/). +[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on a variety of types of data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). 
In a parallel coordinates plot with `px.parallel_coordinates`, each row of the DataFrame is represented by a polyline mark which traverses a set of parallel axes, one for each of the dimensions. For other representations of multivariate data, also see [parallel categories](/python/parallel-categories-diagram/), [radar charts](/python/radar-chart/) and [scatterplot matrix (SPLOM)](/python/splom/). ```python import plotly.express as px diff --git a/doc/python/pie-charts.md b/doc/python/pie-charts.md index dce6a8dea1..cd9ffbf348 100644 --- a/doc/python/pie-charts.md +++ b/doc/python/pie-charts.md @@ -40,7 +40,7 @@ If you're looking instead for a multilevel hierarchical pie-like chart, go to th ### Pie chart with plotly express -[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on "tidy" data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). +[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on a variety of types of data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). In `px.pie`, data visualized by the sectors of the pie is set in `values`. The sector labels are set in `names`. diff --git a/doc/python/plotly-express.md b/doc/python/plotly-express.md index e0396d9c21..d46a2c3733 100644 --- a/doc/python/plotly-express.md +++ b/doc/python/plotly-express.md @@ -37,7 +37,7 @@ jupyter: ### Plotly Express -Plotly Express is the easy-to-use, high-level interface to Plotly, which [operates on "tidy" data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). Every Plotly Express function returns a `graph_objects.Figure` object whose `data` and `layout` has been pre-populated according to the provided arguments. 
+Plotly Express is the easy-to-use, high-level interface to Plotly, which [operates on a variety of types of data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). Every Plotly Express function returns a `graph_objects.Figure` object whose `data` and `layout` has been pre-populated according to the provided arguments. > **Note**: Plotly Express was previously its own separately-installed `plotly_express` package but is now part of `plotly` and importable via `import plotly.express as px`. diff --git a/doc/python/polar-chart.md b/doc/python/polar-chart.md index ad8b8671ea..3430fe6b51 100644 --- a/doc/python/polar-chart.md +++ b/doc/python/polar-chart.md @@ -37,7 +37,7 @@ jupyter: A polar chart represents data along radial and angular axes. With Plotly Express, it is possible to represent polar data as scatter markers with `px.scatter_polar`, and as lines with `px.line_polar`. -[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on "tidy" data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). +[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on a variety of types of data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). For other types of arguments, see the section below using `go.Scatterpolar`. 
diff --git a/doc/python/px-arguments.md b/doc/python/px-arguments.md index 9cb7827738..4b140759ab 100644 --- a/doc/python/px-arguments.md +++ b/doc/python/px-arguments.md @@ -5,8 +5,8 @@ jupyter: text_representation: extension: .md format_name: markdown - format_version: "1.1" - jupytext_version: 1.1.1 + format_version: '1.2' + jupytext_version: 1.4.2 kernelspec: display_name: Python 3 language: python @@ -20,9 +20,9 @@ jupyter: name: python nbconvert_exporter: python pygments_lexer: ipython3 - version: 3.6.8 + version: 3.7.7 plotly: - description: Arguments accepted by Plotly Express functions + description: Input data arguments accepted by Plotly Express functions display_as: file_settings language: python layout: base @@ -33,41 +33,73 @@ jupyter: thumbnail: thumbnail/plotly-express.png --- -### Tidy Data +### Column-oriented, Matrix or Geographic Data -[Plotly Express](/python/plotly-express) operates on "tidy" or "long" data rather than "wide" data. You may pass data in either as a Pandas `DataFrame` objects or as individual array-like objects which `px` will assemble into a data frame internally, such as lists, `numpy` arrays or Pandas `Series` objects. +Plotly Express provides functions to visualize a variety of types of data. Most functions such as `px.bar` or `px.scatter` expect to operate on column-oriented data of the type you might store in a Pandas `DataFrame` (in either "long" or "wide" format, see below). [`px.imshow` operates on matrix-like data](/python/imshow/) you might store in a `numpy` or `xarray` array and functions like [`px.choropleth` and `px.choropleth_mapbox` can operate on geographic data](/python/maps/) of the kind you might store in a GeoPandas `GeoDataFrame`. This page details how to provide column-oriented data to most Plotly Express functions. 
-What follows is a very short example of the difference between wide and tidy/long data, and the excellent [Tidy Data in Python blog post](https://www.jeannicholashould.com/tidy-data-in-python.html) contains much more information about the tidy approach to structuring data. + + +### Long-, Wide-, and Mixed-Form Data + +*Until version 4.8, Plotly Express only operated on long-form (previously called "tidy") data, but [now accepts wide-form and mixed-form data](/python/wide-form/) as well.* + +There are three common conventions for storing column-oriented data, usually in a data frame with column names: + +* **long-form data** is suitable for storing multivariate data (i.e. more than two dimensions), with one row per observation, and one column per variable. +* **wide-form data** is suitable for storing 2-dimensional data, with one row per value of the first variable, and one column per value of the second variable. +* **mixed-form data** is a hybrid of long-form and wide-form data, with one row per value of one variable, and some columns representing values of another, and some columns representing more variables (see our [wide-form documentation](/python/wide-form/) for examples of how to use Plotly Express to visualize this kind of data). + +All Plotly Express functions can operate on long-form data, and the following 2D-Cartesian functions can operate on wide-form data as well: `px.scatter`, `px.line`, `px.area`, `px.bar`, `px.histogram`, `px.violin`, `px.box`, `px.strip`, `px.funnel`, `px.density_heatmap` and `px.density_contour`. Read on for a short example of the differences between these forms, or check out our [detailed documentation about wide-form support](/python/wide-form/).
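To make the relationship between the forms concrete without any Plotly-specific code, here is a minimal pandas sketch using made-up numbers (not the `px.data.medals_*` datasets). `melt()` converts wide-form to long-form, the same kind of conversion Plotly Express performs internally for wide-form input, and `pivot()` converts back.

```python
import pandas as pd

# Illustrative wide-form data (made-up values): one row per nation,
# one column per medal type
wide_df = pd.DataFrame({
    "nation": ["Nation A", "Nation B", "Nation C"],
    "gold": [5, 3, 1],
    "silver": [2, 4, 3],
    "bronze": [1, 2, 6],
})

# melt() stacks the medal columns into long-form:
# one row per (nation, medal) observation, one column per variable
long_df = wide_df.melt(id_vars="nation", var_name="medal", value_name="count")
print(long_df)

# pivot() reverses the operation, back to one column per medal type
round_trip = long_df.pivot(index="nation", columns="medal", values="count")
print(round_trip)
```

Passing `var_name` and `value_name` here plays the same role as relabelling "variable" and "value" after Plotly Express has melted the data itself.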
+ +By way of example, here is the same data, represented in long-form first, and then in wide-form: ```python -import pandas as pd -print("This is 'wide' data, unsuitable as-is for Plotly Express:") -wide_df = pd.DataFrame(dict(Month=["Jan", "Feb", "Mar"], London=[1,2,3], Paris=[3,1,2])) +import plotly.express as px +long_df = px.data.medals_long() +long_df +``` + +```python +import plotly.express as px +wide_df = px.data.medals_wide() wide_df ``` +Plotly Express can produce the same plot from either form: + ```python -import pandas as pd -print("This is the same data in 'long' format, ready for Plotly Express:") -wide_df = pd.DataFrame(dict(Month=["Jan", "Feb", "Mar"], London=[1,2,3], Paris=[3,1,2])) -tidy_df = wide_df.melt(id_vars="Month") -tidy_df +import plotly.express as px +long_df = px.data.medals_long() + +fig = px.bar(long_df, x="nation", y="count", color="medal", title="Long-Form Input") +fig.show() ``` ```python import plotly.express as px -import pandas as pd +wide_df = px.data.medals_wide() + +fig = px.bar(wide_df, x="nation", y=["gold", "silver", "bronze"], title="Wide-Form Input") +fig.show() +``` + +You might notice that the y-axis and legend labels are slightly different for the second plot: they are "value" and "variable", respectively, and this is also reflected in the hover label text. This is because Plotly Express performed an [internal Pandas `melt()` operation](https://pandas.pydata.org/docs/reference/api/pandas.melt.html) to convert the wide-form data into long-form for plotting, and used the Pandas convention for assigning column names to the intermediate long-form data. Note that the labels "medal" and "count" do not appear in the wide-form data frame, so in this case you must either supply these yourself or [use a data frame with named row- and column-indexes](/python/wide-form/). You can [rename these labels with the `labels` argument](/python/styling-plotly-express/):
You can [rename these labels with the `labels` argument](/python/styling-plotly-express/): -wide_df = pd.DataFrame(dict(Month=["Jan", "Feb", "Mar"], London=[1,2,3], Paris=[3,1,2])) -tidy_df = wide_df.melt(id_vars="Month") +```python +import plotly.express as px +wide_df = px.data.medals_wide() -fig = px.bar(tidy_df, x="Month", y="value", color="variable", barmode="group") +fig = px.bar(wide_df, x="nation", y=["gold", "silver", "bronze"], title="Wide-Form Input, relabelled", + labels={"value": "count", "variable": "medal"}) fig.show() ``` -### pandas DataFrame input data +Many more examples of wide-form and messy data input can be found in our [detailed wide-form support documentation](/python/wide-form/). -`px` functions supports natively pandas DataFrame. Arguments can either be passed as dataframe columns, or as column names if the `data_frame` argument is provided. + +### Input Data as Pandas `DataFrame`s + +As shown above, `px` functions supports natively pandas DataFrame. Arguments can either be passed as dataframe columns, or as column names if the `data_frame` argument is provided. #### Passing columns as arguments @@ -101,7 +133,7 @@ fig = px.scatter(df, x=df.sepal_length, y=df.sepal_width, size=df.petal_length, fig.show() ``` -### Columns not in the data_frame argument +### Columns not in the `data_frame` argument In the addition to columns from the `data_frame` argument, one may also pass columns from a different DataFrame, _as long as all columns have the same length_. It is also possible to pass columns without passing the `data_frame` argument. @@ -132,9 +164,9 @@ fig = px.bar(df, x='year', y=gdp, color='continent', labels={'y':'gdp'}, fig.show() ``` -### Using array-like arguments: NumPy arrays, lists... +### Input Data as array-like columns: NumPy arrays, lists... -`px` arguments can also be array-like objects such as lists, NumPy arrays. 
+`px` arguments can also be array-like objects such as lists or NumPy arrays, in either long-form or (for certain functions) wide-form. ```python import plotly.express as px @@ -144,6 +176,16 @@ fig = px.line(x=[1, 2, 3, 4], y=[3, 5, 4, 8]) fig.show() ``` +```python +import plotly.express as px + +# List arguments in wide form +series1 = [3, 5, 4, 8] +series2 = [5, 4, 8, 3] +fig = px.line(x=[1, 2, 3, 4], y=[series1, series2]) +fig.show() +``` + ```python import numpy as np import plotly.express as px diff --git a/doc/python/radar-chart.md b/doc/python/radar-chart.md index 22e021a231..eb753aec3b 100644 --- a/doc/python/radar-chart.md +++ b/doc/python/radar-chart.md @@ -35,11 +35,11 @@ jupyter: A [Radar Chart](https://en.wikipedia.org/wiki/Radar_chart) (also known as a spider plot or star plot) displays multivariate data in the form of a two-dimensional chart of quantitative variables represented on axes originating from the center. The relative position and angle of the axes is typically uninformative. It is equivalent to a [parallel coordinates plot](/python/parallel-coordinates-plot/) with the axes arranged radially. -For a Radar Chart, use a [polar chart](/python/polar-chart/) with categorical angular variables, with `px.line_polar` for data available as a tidy pandas DataFrame, or with `go.Scatterpolar` in the general case. See more examples of [polar charts here](/python/polar-chart/). +For a Radar Chart, use a [polar chart](/python/polar-chart/) with categorical angular variables, with `px.line_polar`, or with `go.Scatterpolar`. See [more examples of polar charts](/python/polar-chart/). #### Radar Chart with Plotly Express -[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on "tidy" data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/).
+[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on a variety of types of data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). Use `line_close=True` for closed lines. diff --git a/doc/python/scatter-plots-on-maps.md b/doc/python/scatter-plots-on-maps.md index 6fa0a7bde1..c2130aa74b 100644 --- a/doc/python/scatter-plots-on-maps.md +++ b/doc/python/scatter-plots-on-maps.md @@ -43,7 +43,7 @@ Plotly figures made with `px.scatter_geo`, `px.line_geo` or `px.choropleth` func Here we show the [Plotly Express](/python/plotly-express/) function `px.scatter_geo` for a geographical scatter plot. The `size` argument is used to set the size of markers from a given column of the DataFrame. -[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on "tidy" data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). +[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on a variety of types of data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). ```python import plotly.express as px diff --git a/doc/python/scattermapbox.md b/doc/python/scattermapbox.md index e6eff5d373..03789fba8f 100644 --- a/doc/python/scattermapbox.md +++ b/doc/python/scattermapbox.md @@ -41,7 +41,7 @@ To plot on Mapbox maps with Plotly you _may_ need a Mapbox account and a public Here we show the [Plotly Express](/python/plotly-express/) function `px.scatter_mapbox` for a scatter plot on a tile map. -[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on "tidy" data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). 
+[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on a variety of types of data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). ```python import plotly.express as px diff --git a/doc/python/splom.md b/doc/python/splom.md index 0a35089ac3..4176719b1e 100644 --- a/doc/python/splom.md +++ b/doc/python/splom.md @@ -42,7 +42,7 @@ A scatterplot matrix is a matrix associated to n numerical arrays (data variable Here we show the Plotly Express function `px.scatter_matrix` to plot the scatter matrix for the columns of the dataframe. By default, all columns are considered. -[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on "tidy" data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). +[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on a variety of types of data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). ```python import plotly.express as px diff --git a/doc/python/styling-plotly-express.md b/doc/python/styling-plotly-express.md index 3151b09f15..c4e98f0cb1 100644 --- a/doc/python/styling-plotly-express.md +++ b/doc/python/styling-plotly-express.md @@ -35,7 +35,7 @@ jupyter: ### Styling Figures made with Plotly Express -[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on "tidy" data](/python/px-arguments/). Every Plotly Express function returns a `graph_objects.Figure` object whose `data` and `layout` has been pre-populated according to the provided arguments. +[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on a variety of types of data](/python/px-arguments/). 
Every Plotly Express function returns a `graph_objects.Figure` object whose `data` and `layout` has been pre-populated according to the provided arguments. > You can style and customize figures made with Plotly Express _in all the same ways_ as you can style figures made more manually by explicitly assembling `graph_objects` into a figure. diff --git a/doc/python/sunburst-charts.md b/doc/python/sunburst-charts.md index cef5dbafac..a6ec9f5980 100644 --- a/doc/python/sunburst-charts.md +++ b/doc/python/sunburst-charts.md @@ -43,7 +43,7 @@ Main arguments: ### Basic Sunburst Plot with plotly.express -[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on "tidy" data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). +[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on a variety of types of data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). With `px.sunburst`, each row of the DataFrame is represented as a sector of the sunburst. diff --git a/doc/python/ternary-plots.md b/doc/python/ternary-plots.md index efd01136dc..66f7e7f928 100644 --- a/doc/python/ternary-plots.md +++ b/doc/python/ternary-plots.md @@ -39,7 +39,7 @@ A ternary plot depicts the ratios of three variables as positions in an equilate ## Ternary scatter plot with Plotly Express -[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on "tidy" data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). +[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on a variety of types of data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). 
Here we use `px.scatter_ternary` to visualize thre three-way split between the three major candidates in a municipal election. @@ -55,7 +55,7 @@ We can scale and color the markers to produce a ternary bubble chart. ```python import plotly.express as px df = px.data.election() -fig = px.scatter_ternary(df, a="Joly", b="Coderre", c="Bergeron", hover_name="district", +fig = px.scatter_ternary(df, a="Joly", b="Coderre", c="Bergeron", hover_name="district", color="winner", size="total", size_max=15, color_discrete_map = {"Joly": "blue", "Bergeron": "green", "Coderre":"red"} ) fig.show() diff --git a/doc/python/treemaps.md b/doc/python/treemaps.md index 2d73b895bb..33f7df7bf9 100644 --- a/doc/python/treemaps.md +++ b/doc/python/treemaps.md @@ -37,7 +37,7 @@ jupyter: ### Basic Treemap with plotly.express -[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on "tidy" data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). +[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on a variety of types of data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). With `px.treemap`, each row of the DataFrame is represented as a sector of the treemap. diff --git a/doc/python/violin.md b/doc/python/violin.md index 7f6be074e9..f9e791be93 100644 --- a/doc/python/violin.md +++ b/doc/python/violin.md @@ -42,7 +42,7 @@ See also the [list of other statistical charts](https://plotly.com/python/statis ### Basic Violin Plot with Plotly Express -[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on "tidy" data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). 
+[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on a variety of types of data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). ```python import plotly.express as px diff --git a/doc/python/wide-form.md b/doc/python/wide-form.md new file mode 100644 index 0000000000..9097b4cb6e --- /dev/null +++ b/doc/python/wide-form.md @@ -0,0 +1,305 @@ +--- +jupyter: + jupytext: + notebook_metadata_filter: all + text_representation: + extension: .md + format_name: markdown + format_version: '1.2' + jupytext_version: 1.4.2 + kernelspec: + display_name: Python 3 + language: python + name: python3 + language_info: + codemirror_mode: + name: ipython + version: 3 + file_extension: .py + mimetype: text/x-python + name: python + nbconvert_exporter: python + pygments_lexer: ipython3 + version: 3.7.7 + plotly: + description: Plotly Express' 2D-Cartesian functions accept data in long-, wide-, + and mixed-form. + display_as: file_settings + language: python + layout: base + name: Plotly Express Wide-Form Support + order: 33 + page_type: u-guide + permalink: python/wide-form/ + thumbnail: thumbnail/plotly-express.png +--- + +### Plotly Express with Column-oriented, Matrix or Geographic Data + +Plotly Express provides functions to visualize a variety of types of data. Most functions such as `px.bar` or `px.scatter` expect to operate on column-oriented data of the type you might store in a Pandas `DataFrame` (in either "long" or "wide" format, see below). [`px.imshow` operates on matrix-like data](/python/imshow/) you might store in a `numpy` or `xarray` array and functions like [`px.choropleth` and `px.choropleth_mapbox` can operate on geographic data](/python/maps/) of the kind you might store in a GeoPandas `GeoDataFrame`. 
This page details how to provide a specific form of column-oriented data to 2D-Cartesian Plotly Express functions, but you can also check out our [detailed column-input-format documentation](/python/px-arguments/). + +### Plotly Express with Long-, Wide-, and Mixed-Form Data + +*Until version 4.8, Plotly Express only operated on long-form (previously called "tidy") data, but now accepts wide-form and mixed-form data as well.* + +There are three common conventions for storing column-oriented data, usually in a data frame with column names: + +* **long-form data** is suitable for storing multivariate data (i.e. more than 2 dimensions), with one row per observation, and one column per variable. +* **wide-form data** is suitable for storing 2-dimensional data, with one row per value of the first variable, and one column per value of the second variable. +* **mixed-form data** is a hybrid of long-form and wide-form data, with one row per value of one variable, some columns representing values of another variable, and some columns representing further variables. + +All Plotly Express functions other than `imshow` can operate on long-form data, and in addition, the following 2D-Cartesian functions can operate on wide-form and mixed-form data: `px.scatter`, `px.line`, `px.area`, `px.bar`, `px.histogram`, `px.violin`, `px.box`, `px.strip`, `px.funnel`, `px.density_heatmap` and `px.density_contour`. + +By way of example, here is the same data, represented in long-form first, and then in wide-form: + +```python +import plotly.express as px +long_df = px.data.medals_long() +long_df +``` + +```python +import plotly.express as px +wide_df = px.data.medals_wide() +wide_df +``` + +Plotly Express can produce **the same plot from either form**. For the long-form input, `x` and `y` are set to the respective column names.
+ +```python +import plotly.express as px +long_df = px.data.medals_long() + +fig = px.bar(long_df, x="nation", y="count", color="medal", title="Long-Form Input") +fig.show() +``` + +For the wide-form input, we **pass in a list of column names as `y`**, which is enough to trigger the wide-form processing mode. Wide-form mode is also the default if neither `x` nor `y` is specified; see the Wide-Form Defaults section at the bottom of this page. + +```python +import plotly.express as px +wide_df = px.data.medals_wide() + +fig = px.bar(wide_df, x="nation", y=["gold", "silver", "bronze"], title="Wide-Form Input") +fig.show() +``` + +### Labeling axes, legends and hover text + +You might notice that the y-axis and legend labels are slightly different for the second plot: they are "value" and "variable", respectively, and this is also reflected in the hoverlabel text. This is because Plotly Express performed an [internal Pandas `melt()` operation](https://pandas.pydata.org/docs/reference/api/pandas.melt.html) to convert the wide-form data into long-form for plotting, and used the Pandas convention for assigning column names to the intermediate long-form data. Note that the labels "medal" and "count" do not appear in the wide-form data frame, so in this case you must supply these yourself (or see below regarding using a data frame with named row- and column-indexes).
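To make the naming concrete, here is a minimal pure-Python sketch of the reshaping step, assuming rows are plain dicts. The helper `melt_wide` is hypothetical: it only mimics the default output column names of `pandas.melt()` (`variable` and `value`) and its column-by-column stacking order, and it is not Plotly's or Pandas' actual code. The numbers are toy values, not the contents of `px.data.medals_wide()`.

```python
def melt_wide(rows, id_col, value_cols, var_name="variable", value_name="value"):
    """Hypothetical helper: turn wide rows into long rows, one per (row, column) pair.

    Mirrors the default naming of pandas.melt(): former column names go into
    `var_name`, and the cell contents go into `value_name`.
    """
    long_rows = []
    for col in value_cols:  # stack column by column, like pandas.melt
        for row in rows:
            long_rows.append({id_col: row[id_col], var_name: col, value_name: row[col]})
    return long_rows


# Toy wide-form table (illustrative values only).
wide = [
    {"nation": "X", "gold": 3, "silver": 1, "bronze": 2},
    {"nation": "Y", "gold": 0, "silver": 4, "bronze": 1},
]
long_form = melt_wide(wide, "nation", ["gold", "silver", "bronze"])
for r in long_form:
    print(r)
```

Passing `var_name="medal"` and `value_name="count"` to the sketch yields the same column names as the long-form data frame, which corresponds to the relabelling that the `labels` argument applies to the figure.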
You can [rename these labels with the `labels` argument](/python/styling-plotly-express/): + +```python +import plotly.express as px +wide_df = px.data.medals_wide() + +fig = px.bar(wide_df, x="nation", y=["gold", "silver", "bronze"], title="Wide-Form Input, relabelled", + labels={"value": "count", "variable": "medal"}) +fig.show() +``` + +Plotly Express figures created using wide-form data can be [styled just like any other Plotly Express figure](/python/styling-plotly-express/): + +```python +import plotly.express as px +wide_df = px.data.medals_wide() + +fig = px.bar(wide_df, x="nation", y=["gold", "silver", "bronze"], + title="Wide-Form Input, styled", + labels={"value": "Medal Count", "variable": "Medal", "nation": "Olympic Nation"}, + color_discrete_map={"gold": "gold", "silver": "silver", "bronze": "#c96"}, + template="simple_white" + ) +fig.update_layout(font_family="Rockwell", showlegend=False) +fig.show() +``` + +### Data Frames with Named Indexes + +Pandas `DataFrames` support not only column names and "row names" via the value of `index`, but the indexes themselves can be named. Here is how to assign one column of the wide sample data frame above as the index, and to name the column index. 
The resulting "indexed" sample data frame can also be obtained by calling `px.data.medals_wide(indexed=True)`. + +```python +import plotly.express as px +wide_df = px.data.medals_wide() +wide_df = wide_df.set_index("nation") +wide_df.columns.name = "medal" +wide_df +``` + +When working with a data frame like the one above, you can pass the index references directly as arguments, to benefit from automatic labelling for everything except the y-axis label, which will default to "value", but this can be overridden with the `labels` argument as above: + +```python +import plotly.express as px +wide_df = px.data.medals_wide(indexed=True) + +fig = px.bar(wide_df, x=wide_df.index, y=wide_df.columns) +fig.show() +``` + +If you transpose `x` and `y`, thereby assigning the columns to `x`, the orientation will be switched to horizontal: + +```python +import plotly.express as px +wide_df = px.data.medals_wide(indexed=True) + +fig = px.bar(wide_df, x=wide_df.columns, y=wide_df.index) +fig.show() +``` + +### Assigning Inferred Columns to Non-Default Arguments + + +In the examples above, the columns of the wide data frame are assigned by default as an "inferred" column named `variable` to the `color` argument (see section below for documentation of the default behaviours), but this is not a hard constraint. The `variable` column can be assigned to any Plotly Express argument, for example to accomplish faceting, and `color` can be reassigned to any other value. More generally, when plotting with a data frame without named indexes, you can reassign the inferred columns named `variable` and `value` to any argument: + +```python +import plotly.express as px +wide_df = px.data.medals_wide(indexed=False) + +fig = px.bar(wide_df, x="nation", y=["gold", "silver", "bronze"], facet_col="variable", color="nation") +fig.show() +``` + +If using a data frame's named indexes, either explicitly or relying on the defaults, the row-index references (i.e. 
`df.index`) or column-index names (i.e.
the value of `df.columns.name`) must be used: + +```python +import plotly.express as px +wide_df = px.data.medals_wide(indexed=True) + +fig = px.bar(wide_df, facet_col="medal", color=wide_df.index) +fig.show() +``` + +### Mixed-Form Data + +In some cases, a data frame is neither clearly long-form nor wide-form, and we can call this "mixed-form". For example, in the data frame below, if it contained only the `experiment` columns, the data could be described as wide-form, and if it contained only `gender` and `group`, it could be described as long-form, but it contains both, so it is best described as mixed-form data: + +```python +import plotly.express as px +mixed_df = px.data.experiment(indexed=True) +mixed_df.head() +``` + +We can visualize just the wide-form portion of the data frame easily with a [violin chart](/python/violin/). As a special note, we'll assign the index, which is the participant ID, to `hover_data`, so that hovering over outlier points will identify their row. + +```python +import plotly.express as px +mixed_df = px.data.experiment(indexed=True) + +fig = px.violin(mixed_df, y=["experiment_1", "experiment_2", "experiment_3"], hover_data=[mixed_df.index]) +fig.show() +``` + + + + +We are not limited to visualizing only the wide-form portion of the data, however. We can also leverage the long-form portion of the data frame, for example to color by participant `gender` and facet by participant `group`, all without having to manipulate the data frame: + +```python +import plotly.express as px +mixed_df = px.data.experiment(indexed=True) + +fig = px.violin(mixed_df, y=["experiment_1", "experiment_2", "experiment_3"], + color="gender", facet_col="group", hover_data=[mixed_df.index]) +fig.show() +``` + +In the plots above, the column names provided to `y` are internally mapped to a long-form column called `variable`, as is apparent in the x-axis labels.
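Conceptually, the mixed-form case is a melt in which only the columns passed to `y` are stacked while other columns ride along on every resulting row, which is why `gender` and `group` stay available for `color`, `facet_col` or `x`. The pure-Python sketch below illustrates that idea with a hypothetical helper `melt_mixed` and made-up values; it is not Plotly's internal code, which is built on Pandas and may differ, for example in which extra columns it carries along.

```python
def melt_mixed(rows, value_cols, var_name="variable", value_name="value"):
    """Hypothetical helper: stack only `value_cols`; copy all other keys onto each long row."""
    long_rows = []
    for col in value_cols:  # one block of rows per stacked column
        for row in rows:
            # Carry the long-form portion (e.g. gender, group) along unchanged.
            new_row = {k: v for k, v in row.items() if k not in value_cols}
            new_row[var_name] = col  # former column name, e.g. "experiment_1"
            new_row[value_name] = row[col]
            long_rows.append(new_row)
    return long_rows


# Toy mixed-form rows (illustrative values only).
mixed = [
    {"experiment_1": 1.0, "experiment_2": 2.0, "gender": "male", "group": "control"},
    {"experiment_1": 3.0, "experiment_2": 4.0, "gender": "female", "group": "treatment"},
]
long_form = melt_mixed(mixed, ["experiment_1", "experiment_2"])
for r in long_form:
    print(r)
```

Because `gender` and `group` survive the reshape untouched, any of them, or the new `variable` column, can be mapped to a visual argument afterwards.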
We can reassign `variable` to another argument as well: in this case we'll assign it to `facet_col` and reassign `group` to the `x` axis. We'll switch to a [box plot](/python/box-plots/) for variety. + +```python +import plotly.express as px +mixed_df = px.data.experiment(indexed=True) + +fig = px.box(mixed_df, x="group", y=["experiment_1", "experiment_2", "experiment_3"], + color="gender", facet_col="variable", hover_data=[mixed_df.index]) +fig.show() +``` + +One interesting thing about a mixed-form data frame like this is that it remains easy to plot, say, one experiment against another, which would require some preliminary data wrangling if it were represented as a pure long-form dataset: + +```python +import plotly.express as px +mixed_df = px.data.experiment(indexed=True) + +fig = px.scatter(mixed_df, x="experiment_1", y="experiment_2", + color="group", facet_col="gender", hover_data=[mixed_df.index]) +fig.show() +``` + +In fact, we can even visualize the results of every experiment against every other, using a [scatterplot matrix](/python/splom/): + +```python +import plotly.express as px +mixed_df = px.data.experiment(indexed=True) + +fig = px.scatter_matrix(mixed_df, dimensions=["experiment_1", "experiment_2", "experiment_3"], color="gender") +fig.show() +``` + +### Wide-Form Defaults + +For bar, scatter, line and area charts, the pattern of assigning `x=df.index`, `y=df.columns`, `color="variable"` is so common that if you provide neither `x` nor `y`, this is the default behaviour. An exception is made for bar charts when the values are not continuous variables, in which case the default is similar to histograms, with `x=df.columns`, `color="variable"` and `y=`. + +For violin and box plots, the default is to assign `x="variable"`, `y=df.columns`, and for histograms the default is `x=df.columns`, `color="variable"`. + +These defaults are also filled in if you specify only `y` (`x` for histograms) as a list-of-columns. See below for orientation control.
+ +```python +import plotly.express as px +wide_df = px.data.medals_wide(indexed=True) + +fig = px.bar(wide_df) +fig.show() + +fig = px.area(wide_df) +fig.show() + +fig = px.line(wide_df) +fig.show() + +fig = px.scatter(wide_df) +fig.show() +``` + +```python +import plotly.express as px + +mixed_df = px.data.experiment(indexed=True) +wide_df = mixed_df[["experiment_1", "experiment_2", "experiment_3"]] + +fig = px.histogram(wide_df) +fig.show() + +fig = px.violin(wide_df) +fig.show() + +fig = px.box(wide_df) +fig.show() +``` + +### Orientation Control When Using Defaults + +If you specify neither `x` nor `y`, you can swap the default behaviour of `x` and `y` by setting `orientation="h"`. + +If you specify only `x` as a list-of-columns (`y` in the case of histograms), then the defaults are filled in as if `orientation="h"`. + +```python +import plotly.express as px +wide_df = px.data.medals_wide(indexed=True) + +fig = px.bar(wide_df, orientation="h") +fig.show() + +fig = px.area(wide_df, x=wide_df.columns) +fig.show() + +mixed_df = px.data.experiment(indexed=True) +wide_df = mixed_df[["experiment_1", "experiment_2", "experiment_3"]] + +fig = px.histogram(wide_df, orientation="h") +fig.show() + +fig = px.violin(wide_df, orientation="h") +fig.show() + +fig = px.box(wide_df, orientation="h") +fig.show() +``` diff --git a/doc/python/wind-rose-charts.md b/doc/python/wind-rose-charts.md index 1ed4fda77b..ed1c98deef 100644 --- a/doc/python/wind-rose-charts.md +++ b/doc/python/wind-rose-charts.md @@ -39,7 +39,7 @@ jupyter: A [wind rose chart](https://en.wikipedia.org/wiki/Wind_rose) (also known as a polar bar chart) is a graphical tool used to visualize how wind speed and direction are typically distributed at a given location. You can use the `px.bar_polar` function from Plotly Express as below, otherwise use `go.Barpolar` as explained in the next section.
-[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on "tidy" data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). +[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on a variety of types of data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/). ```python import plotly.express as px diff --git a/doc/requirements.txt b/doc/requirements.txt index d28ddc9c15..62874967f1 100644 --- a/doc/requirements.txt +++ b/doc/requirements.txt @@ -2,7 +2,7 @@ plotly==4.7.1 jupytext==1.1.1 jupyter notebook -pandas==0.23.0 +pandas==1.0.3 statsmodels==0.10.1 scipy==1.3.1 patsy==0.5.1 diff --git a/packages/python/plotly/plotly/__init__.py b/packages/python/plotly/plotly/__init__.py index b6e35378c4..118c00dce4 100644 --- a/packages/python/plotly/plotly/__init__.py +++ b/packages/python/plotly/plotly/__init__.py @@ -75,3 +75,75 @@ ], [".version.__version__"], ) + + +def plot(data_frame, kind, **kwargs): + """ + Pandas plotting backend function, not meant to be called directly. 
+ To activate, set pandas.options.plotting.backend="plotly" + See https://github.com/pandas-dev/pandas/blob/master/pandas/plotting/__init__.py + """ + from .express import scatter, line, area, bar, box, histogram + + if kind == "scatter": + new_kwargs = {k: kwargs[k] for k in kwargs if k not in ["s", "c"]} + return scatter(data_frame, **new_kwargs) + if kind == "line": + return line(data_frame, **kwargs) + if kind == "area": + return area(data_frame, **kwargs) + if kind == "bar": + return bar(data_frame, **kwargs) + if kind == "barh": + return bar(data_frame, orientation="h", **kwargs) + if kind == "box": + new_kwargs = {k: kwargs[k] for k in kwargs if k not in ["by"]} + return box(data_frame, **new_kwargs) + if kind == "hist": + new_kwargs = {k: kwargs[k] for k in kwargs if k not in ["by", "bins"]} + return histogram(data_frame, **new_kwargs) + raise NotImplementedError( + "kind='%s' not yet supported for plotting.backend='plotly'" % kind + ) + + +def boxplot_frame(data_frame, **kwargs): + """ + Pandas plotting backend function, not meant to be called directly.
+ To activate, set pandas.options.plotting.backend="plotly" + See https://github.com/pandas-dev/pandas/blob/master/pandas/plotting/__init__.py + """ + from .express import histogram + + skip = ["column", "by", "grid", "xlabelsize", "xrot", "ylabelsize", "yrot"] + skip += ["ax", "sharex", "sharey", "figsize", "layout", "bins"] + new_kwargs = {k: kwargs[k] for k in kwargs if k not in skip} + return histogram(data_frame, **new_kwargs) + + +def hist_series(data_frame, **kwargs): + """ + Pandas plotting backend function, not meant to be called directly. + To activate, set pandas.options.plotting.backend="plotly" + See https://github.com/pandas-dev/pandas/blob/master/pandas/plotting/__init__.py + """ + from .express import histogram + + skip = ["by", "grid", "xlabelsize", "xrot", "ylabelsize", "yrot", "ax"] + skip += ["figsize", "bins"] + new_kwargs = {k: kwargs[k] for k in kwargs if k not in skip} + return histogram(data_frame, **new_kwargs) diff --git a/packages/python/plotly/plotly/data/__init__.py b/packages/python/plotly/plotly/data/__init__.py index dfedcfd28a..f64e326c9d 100644 --- a/packages/python/plotly/plotly/data/__init__.py +++ b/packages/python/plotly/plotly/data/__init__.py @@ -93,7 +93,7 @@ def election_geojson(): def carshare(): """ Each row represents the availability of car-sharing services near the centroid of a zone -in Montreal. +in Montreal over a month-long period. Returns: A `pandas.DataFrame` with 249 rows and the following columns: @@ -102,6 +102,76 @@ def carshare(): return _get_dataset("carshare") +def stocks(indexed=False): + """ +Each row in this wide dataset represents closing prices from 6 tech stocks in 2018/2019. + +Returns: + A `pandas.DataFrame` with 100 rows and the following columns: + `['date', 'GOOG', 'AAPL', 'AMZN', 'FB', 'NFLX', 'MSFT']`. 
+ If `indexed` is True, the 'date' column is used as the index and the column index + is named 'company' +""" + df = _get_dataset("stocks") + if indexed: + df = df.set_index("date") + df.columns.name = "company" + return df + + +def experiment(indexed=False): + """ +Each row in this wide dataset represents the results of 100 simulated participants +on three hypothetical experiments, along with their gender and control/treatment group. + + +Returns: + A `pandas.DataFrame` with 100 rows and the following columns: + `['experiment_1', 'experiment_2', 'experiment_3', 'gender', 'group']`. + If `indexed` is True, the data frame index is named "participant" +""" + df = _get_dataset("experiment") + if indexed: + df.index.name = "participant" + return df + + +def medals_wide(indexed=False): + """ +This dataset represents the medal table for Olympic Short Track Speed Skating for the +top three nations as of 2020. + +Returns: + A `pandas.DataFrame` with 3 rows and the following columns: + `['nation', 'gold', 'silver', 'bronze']`. + If `indexed` is True, the 'nation' column is used as the index and the column index + is named 'medal' +""" + df = _get_dataset("medals") + if indexed: + df = df.set_index("nation") + df.columns.name = "medal" + return df + + +def medals_long(indexed=False): + """ +This dataset represents the medal table for Olympic Short Track Speed Skating for the +top three nations as of 2020. + +Returns: + A `pandas.DataFrame` with 9 rows and the following columns: + `['nation', 'medal', 'count']`. + If `indexed` is True, the 'nation' column is used as the index. 
+""" + df = _get_dataset("medals").melt( + id_vars=["nation"], value_name="count", var_name="medal" + ) + if indexed: + df = df.set_index("nation") + return df + + def _get_dataset(d): import pandas import os diff --git a/packages/python/plotly/plotly/express/__init__.py b/packages/python/plotly/plotly/express/__init__.py index fb334c1b97..72d0b44554 100644 --- a/packages/python/plotly/plotly/express/__init__.py +++ b/packages/python/plotly/plotly/express/__init__.py @@ -55,6 +55,8 @@ get_trendline_results, ) +from ._special_inputs import IdentityMap, Constant, Range # noqa: F401 + from . import data, colors # noqa: F401 __all__ = [ @@ -95,4 +97,7 @@ "colors", "set_mapbox_access_token", "get_trendline_results", + "IdentityMap", + "Constant", + "Range", ] diff --git a/packages/python/plotly/plotly/express/_chart_types.py b/packages/python/plotly/plotly/express/_chart_types.py index f7e7da8cbf..2d41c40590 100644 --- a/packages/python/plotly/plotly/express/_chart_types.py +++ b/packages/python/plotly/plotly/express/_chart_types.py @@ -2,6 +2,12 @@ from ._doc import make_docstring import plotly.graph_objs as go +_wide_mode_xy_append = [ + "Either `x` or `y` can optionally be a list of column references or array_likes, ", + "in which case the data will be treated as if it were 'wide' rather than 'long'.", +] +_cartesian_append_dict = dict(x=_wide_mode_xy_append, y=_wide_mode_xy_append) + def scatter( data_frame=None, @@ -25,6 +31,7 @@ def scatter( animation_group=None, category_orders={}, labels={}, + orientation=None, color_discrete_sequence=None, color_discrete_map={}, color_continuous_scale=None, @@ -55,7 +62,7 @@ def scatter( return make_figure(args=locals(), constructor=go.Scatter) -scatter.__doc__ = make_docstring(scatter) +scatter.__doc__ = make_docstring(scatter, append_dict=_cartesian_append_dict) def density_contour( @@ -73,6 +80,7 @@ def density_contour( animation_group=None, category_orders={}, labels={}, + orientation=None, color_discrete_sequence=None, 
color_discrete_map={}, marginal_x=None, @@ -112,7 +120,17 @@ def density_contour( ) -density_contour.__doc__ = make_docstring(density_contour) +density_contour.__doc__ = make_docstring( + density_contour, + append_dict=dict( + x=_wide_mode_xy_append, + y=_wide_mode_xy_append, + z=[ + "For `density_heatmap` and `density_contour` these values are used as the inputs to `histfunc`.", + ], + histfunc=["The arguments to this function are the values of `z`."], + ), +) def density_heatmap( @@ -129,6 +147,7 @@ def density_heatmap( animation_group=None, category_orders={}, labels={}, + orientation=None, color_continuous_scale=None, range_color=None, color_continuous_midpoint=None, @@ -167,7 +186,17 @@ def density_heatmap( ) -density_heatmap.__doc__ = make_docstring(density_heatmap) +density_heatmap.__doc__ = make_docstring( + density_heatmap, + append_dict=dict( + x=_wide_mode_xy_append, + y=_wide_mode_xy_append, + z=[ + "For `density_heatmap` and `density_contour` these values are used as the inputs to `histfunc`.", + ], + histfunc=["The arguments to this function are the values of `z`.",], + ), +) def line( @@ -192,6 +221,7 @@ def line( animation_group=None, category_orders={}, labels={}, + orientation=None, color_discrete_sequence=None, color_discrete_map={}, line_dash_sequence=None, @@ -214,7 +244,7 @@ def line( return make_figure(args=locals(), constructor=go.Scatter) -line.__doc__ = make_docstring(line) +line.__doc__ = make_docstring(line, append_dict=_cartesian_append_dict) def area( @@ -236,7 +266,7 @@ def area( labels={}, color_discrete_sequence=None, color_discrete_map={}, - orientation="v", + orientation=None, groupnorm=None, log_x=False, log_y=False, @@ -256,13 +286,11 @@ def area( return make_figure( args=locals(), constructor=go.Scatter, - trace_patch=dict( - stackgroup=1, mode="lines", orientation=orientation, groupnorm=groupnorm - ), + trace_patch=dict(stackgroup=1, mode="lines", groupnorm=groupnorm), ) -area.__doc__ = make_docstring(area) +area.__doc__ = 
make_docstring(area, append_dict=_cartesian_append_dict) def bar( @@ -291,7 +319,7 @@ def bar( range_color=None, color_continuous_midpoint=None, opacity=None, - orientation="v", + orientation=None, barmode="relative", log_x=False, log_y=False, @@ -309,12 +337,12 @@ def bar( return make_figure( args=locals(), constructor=go.Bar, - trace_patch=dict(orientation=orientation, textposition="auto"), + trace_patch=dict(textposition="auto"), layout_patch=dict(barmode=barmode), ) -bar.__doc__ = make_docstring(bar) +bar.__doc__ = make_docstring(bar, append_dict=_cartesian_append_dict) def histogram( @@ -335,7 +363,7 @@ def histogram( color_discrete_map={}, marginal=None, opacity=None, - orientation="v", + orientation=None, barmode="relative", barnorm=None, histnorm=None, @@ -361,19 +389,24 @@ def histogram( args=locals(), constructor=go.Histogram, trace_patch=dict( - orientation=orientation, - histnorm=histnorm, - histfunc=histfunc, - nbinsx=nbins if orientation == "v" else None, - nbinsy=None if orientation == "v" else nbins, - cumulative=dict(enabled=cumulative), - bingroup="x" if orientation == "v" else "y", + histnorm=histnorm, histfunc=histfunc, cumulative=dict(enabled=cumulative), ), layout_patch=dict(barmode=barmode, barnorm=barnorm), ) -histogram.__doc__ = make_docstring(histogram) +histogram.__doc__ = make_docstring( + histogram, + append_dict=dict( + x=["If `orientation` is `'h'`, these values are used as inputs to `histfunc`."] + + _wide_mode_xy_append, + y=["If `orientation` is `'v'`, these values are used as inputs to `histfunc`."] + + _wide_mode_xy_append, + histfunc=[ + "The arguments to this function are the values of `y`(`x`) if `orientation` is `'v'`(`'h'`).", + ], + ), +) def violin( @@ -393,8 +426,8 @@ def violin( labels={}, color_discrete_sequence=None, color_discrete_map={}, - orientation="v", - violinmode="group", + orientation=None, + violinmode=None, log_x=False, log_y=False, range_x=None, @@ -414,18 +447,13 @@ def violin( args=locals(), 
constructor=go.Violin, trace_patch=dict( - orientation=orientation, - points=points, - box=dict(visible=box), - scalegroup=True, - x0=" ", - y0=" ", + points=points, box=dict(visible=box), scalegroup=True, x0=" ", y0=" ", ), layout_patch=dict(violinmode=violinmode), ) -violin.__doc__ = make_docstring(violin) +violin.__doc__ = make_docstring(violin, append_dict=_cartesian_append_dict) def box( @@ -445,8 +473,8 @@ def box( labels={}, color_discrete_sequence=None, color_discrete_map={}, - orientation="v", - boxmode="group", + orientation=None, + boxmode=None, log_x=False, log_y=False, range_x=None, @@ -470,14 +498,12 @@ def box( return make_figure( args=locals(), constructor=go.Box, - trace_patch=dict( - orientation=orientation, boxpoints=points, notched=notched, x0=" ", y0=" " - ), + trace_patch=dict(boxpoints=points, notched=notched, x0=" ", y0=" "), layout_patch=dict(boxmode=boxmode), ) -box.__doc__ = make_docstring(box) +box.__doc__ = make_docstring(box, append_dict=_cartesian_append_dict) def strip( @@ -497,8 +523,8 @@ def strip( labels={}, color_discrete_sequence=None, color_discrete_map={}, - orientation="v", - stripmode="group", + orientation=None, + stripmode=None, log_x=False, log_y=False, range_x=None, @@ -516,7 +542,6 @@ def strip( args=locals(), constructor=go.Box, trace_patch=dict( - orientation=orientation, boxpoints="all", pointpos=0, hoveron="points", @@ -529,7 +554,7 @@ def strip( ) -strip.__doc__ = make_docstring(strip) +strip.__doc__ = make_docstring(strip, append_dict=_cartesian_append_dict) def scatter_3d( @@ -1384,7 +1409,7 @@ def funnel( color_discrete_sequence=None, color_discrete_map={}, opacity=None, - orientation="h", + orientation=None, log_x=False, log_y=False, range_x=None, @@ -1398,12 +1423,10 @@ def funnel( In a funnel plot, each row of `data_frame` is represented as a rectangular sector of a funnel. 
""" - return make_figure( - args=locals(), constructor=go.Funnel, trace_patch=dict(orientation=orientation), - ) + return make_figure(args=locals(), constructor=go.Funnel) -funnel.__doc__ = make_docstring(funnel) +funnel.__doc__ = make_docstring(funnel, append_dict=_cartesian_append_dict) def funnel_area( diff --git a/packages/python/plotly/plotly/express/_core.py b/packages/python/plotly/plotly/express/_core.py index ae11d40f2b..216b16716c 100644 --- a/packages/python/plotly/plotly/express/_core.py +++ b/packages/python/plotly/plotly/express/_core.py @@ -1,6 +1,7 @@ import plotly.graph_objs as go import plotly.io as pio from collections import namedtuple, OrderedDict +from ._special_inputs import IdentityMap, Constant, Range from _plotly_utils.basevalidators import ColorscaleValidator from .colors import qualitative, sequential @@ -15,6 +16,28 @@ ) +# Declare all supported attributes, across all plot types +direct_attrables = ( + ["x", "y", "z", "a", "b", "c", "r", "theta", "size"] + + ["hover_name", "text", "names", "values", "parents", "wide_cross"] + + ["ids", "error_x", "error_x_minus", "error_y", "error_y_minus", "error_z"] + + ["error_z_minus", "lat", "lon", "locations", "animation_group"] +) +array_attrables = ["dimensions", "custom_data", "hover_data", "path", "wide_variable"] +group_attrables = ["animation_frame", "facet_row", "facet_col", "line_group"] +renameable_group_attrables = [ + "color", # renamed to marker.color or line.color in infer_config + "symbol", # renamed to marker.symbol in infer_config + "line_dash", # renamed to line.dash in infer_config +] +all_attrables = ( + direct_attrables + array_attrables + group_attrables + renameable_group_attrables +) + +cartesians = [go.Scatter, go.Scattergl, go.Bar, go.Funnel, go.Box, go.Violin] +cartesians += [go.Histogram, go.Histogram2d, go.Histogram2dContour] + + class PxDefaults(object): __slots__ = [ "template", @@ -41,6 +64,7 @@ def __init__(self): defaults = PxDefaults() del PxDefaults + 
MAPBOX_TOKEN = None @@ -92,6 +116,10 @@ def get_label(args, column): return column +def _is_continuous(df, col_name): + return df[col_name].dtype.kind in "ifc" + + def get_decorated_label(args, column, role): label = get_label(args, column) if "histfunc" in args and ( @@ -137,11 +165,15 @@ def make_mapping(args, variable): if variable == "dash": arg_name = "line_dash" vprefix = "line_dash" + if args[vprefix + "_map"] == "identity": + val_map = IdentityMap() + else: + val_map = args[vprefix + "_map"].copy() return Mapping( show_in_trace_name=True, variable=variable, grouper=args[arg_name], - val_map=args[vprefix + "_map"].copy(), + val_map=val_map, sequence=args[vprefix + "_sequence"], updater=lambda trace, v: trace.update({parent: {variable: v}}), facet=None, @@ -188,7 +220,7 @@ def make_trace_kwargs(args, trace_spec, trace_data, mapping_labels, sizeref): if ((not attr_value) or (name in attr_value)) and ( trace_spec.constructor != go.Parcoords - or args["data_frame"][name].dtype.kind in "ifc" + or _is_continuous(args["data_frame"], name) ) and ( trace_spec.constructor != go.Parcats @@ -418,14 +450,6 @@ def make_trace_kwargs(args, trace_spec, trace_data, mapping_labels, sizeref): def configure_axes(args, constructor, fig, orders): configurators = { - go.Scatter: configure_cartesian_axes, - go.Scattergl: configure_cartesian_axes, - go.Bar: configure_cartesian_axes, - go.Box: configure_cartesian_axes, - go.Violin: configure_cartesian_axes, - go.Histogram: configure_cartesian_axes, - go.Histogram2dContour: configure_cartesian_axes, - go.Histogram2d: configure_cartesian_axes, go.Scatter3d: configure_3d_axes, go.Scatterternary: configure_ternary_axes, go.Scatterpolar: configure_polar_axes, @@ -437,6 +461,8 @@ def configure_axes(args, constructor, fig, orders): go.Scattergeo: configure_geo, go.Choropleth: configure_geo, } + for c in cartesians: + configurators[c] = configure_cartesian_axes if constructor in configurators: configurators[constructor](args, fig, orders) @@ 
-702,6 +728,21 @@ def frame_args(duration): def make_trace_spec(args, constructor, attrs, trace_patch): + if constructor in [go.Scatter, go.Scatterpolar]: + if "render_mode" in args and ( + args["render_mode"] == "webgl" + or ( + args["render_mode"] == "auto" + and len(args["data_frame"]) > 1000 + and args["animation_frame"] is None + ) + ): + if constructor == go.Scatter: + constructor = go.Scattergl + if "orientation" in trace_patch: + del trace_patch["orientation"] + else: + constructor = go.Scatterpolargl # Create base trace specification result = [TraceSpec(constructor, attrs, trace_patch, None)] @@ -858,12 +899,12 @@ def _check_name_not_reserved(field_name, reserved_names): return field_name else: raise NameError( - "A name conflict was encountered for argument %s. " - "A column with name %s is already used." % (field_name, field_name) + "A name conflict was encountered for argument '%s'. " + "A column or index with name '%s' is ambiguous." % (field_name, field_name) ) -def _get_reserved_col_names(args, attrables, array_attrables): +def _get_reserved_col_names(args): """ This function builds a list of columns of the data_frame argument used as arguments, either as str/int arguments or given as columns @@ -872,7 +913,7 @@ def _get_reserved_col_names(args, attrables, array_attrables): df = args["data_frame"] reserved_names = set() for field in args: - if field not in attrables: + if field not in all_attrables: continue names = args[field] if field in array_attrables else [args[field]] if names is None: @@ -888,10 +929,37 @@ def _get_reserved_col_names(args, attrables, array_attrables): in_df = arg is df[arg_name] if in_df: reserved_names.add(arg_name) + elif arg is df.index and arg.name is not None: + reserved_names.add(arg.name) return reserved_names +def _is_col_list(df_input, arg): + """Returns True if arg looks like it's a list of columns or references to columns + in df_input, and False otherwise (in which case it's assumed to be a single column + or 
reference to a column). + """ + if arg is None or isinstance(arg, str) or isinstance(arg, int): + return False + if isinstance(arg, pd.MultiIndex): + return False # just to keep existing behaviour for now + try: + iter(arg) + except TypeError: + return False # not iterable + for c in arg: + if isinstance(c, str) or isinstance(c, int): + if df_input is None or c not in df_input.columns: + return False + else: + try: + iter(c) + except TypeError: + return False # not iterable + return True + + def _isinstance_listlike(x): """Returns True if x is an iterable which can be transformed into a pandas Series, False for the other types of possible values of a `hover_data` dict. @@ -908,46 +976,29 @@ def _isinstance_listlike(x): return True -def build_dataframe(args, attrables, array_attrables): - """ - Constructs a dataframe and modifies `args` in-place. +def _escape_col_name(df_input, col_name, extra): + while df_input is not None and (col_name in df_input.columns or col_name in extra): + col_name = "_" + col_name + return col_name - The argument values in `args` can be either strings corresponding to - existing columns of a dataframe, or data arrays (lists, numpy arrays, - pandas columns, series). - Parameters - ---------- - args : OrderedDict - arguments passed to the px function and subsequently modified - attrables : list - list of keys into `args`, all of whose corresponding values are - converted into columns of a dataframe. - array_attrables : list - argument names corresponding to iterables, such as `hover_data`, ... +def process_args_into_dataframe(args, wide_mode, var_name, value_name): """ - for field in args: - if field in array_attrables and args[field] is not None: - args[field] = ( - OrderedDict(args[field]) - if isinstance(args[field], dict) - else list(args[field]) - ) - # Cast data_frame argument to DataFrame (it could be a numpy array, dict etc.) 
- df_provided = args["data_frame"] is not None - if df_provided and not isinstance(args["data_frame"], pd.DataFrame): - args["data_frame"] = pd.DataFrame(args["data_frame"]) + After this function runs, the `all_attrables` keys of `args` all contain only + references to columns of `df_output`. This function handles the extraction of data + from `args["attrable"]` and column-name-generation as appropriate, and adds the + data to `df_output` and then replaces `args["attrable"]` with the appropriate + reference. + """ + df_input = args["data_frame"] + df_provided = df_input is not None - # We start from an empty DataFrame df_output = pd.DataFrame() - - # Initialize set of column names - # These are reserved names - if df_provided: - reserved_names = _get_reserved_col_names(args, attrables, array_attrables) - else: - reserved_names = set() + constants = dict() + ranges = list() + wide_id_vars = set() + reserved_names = _get_reserved_col_names(args) if df_provided else set() # Case of functions with a "dimensions" kw: scatter_matrix, parcats, parcoords if "dimensions" in args and args["dimensions"] is None: @@ -971,8 +1022,13 @@ def build_dataframe(args, attrables, array_attrables): args["hover_data"][k] = (True, args["hover_data"][k]) if not isinstance(args["hover_data"][k], tuple): args["hover_data"][k] = (args["hover_data"][k], None) + if df_provided and args["hover_data"][k][1] is not None and k in df_input: + raise ValueError( + "Ambiguous input: values for '%s' appear both in hover_data and data_frame" + % k + ) # Loop over possible arguments - for field_name in attrables: + for field_name in all_attrables: # Massaging variables argument_list = ( [args.get(field_name)] @@ -995,6 +1051,7 @@ def build_dataframe(args, attrables, array_attrables): length = len(df_output) if argument is None: continue + col_name = None # Case of multiindex if isinstance(argument, pd.MultiIndex): raise TypeError( @@ -1002,8 +1059,18 @@ def build_dataframe(args, attrables, 
array_attrables): "pandas MultiIndex is not supported by plotly express " "at the moment." % field ) + # ----------------- argument is a special value ---------------------- + if isinstance(argument, Constant) or isinstance(argument, Range): + col_name = _check_name_not_reserved( + str(argument.label) if argument.label is not None else field, + reserved_names, + ) + if isinstance(argument, Constant): + constants[col_name] = argument.value + else: + ranges.append(col_name) # ----------------- argument is a col name ---------------------- - if isinstance(argument, str) or isinstance( + elif isinstance(argument, str) or isinstance( argument, int ): # just a column name given as str or int @@ -1012,11 +1079,28 @@ def build_dataframe(args, attrables, array_attrables): and hover_data_is_dict and args["hover_data"][str(argument)][1] is not None ): + # hover_data has onboard data + # previously-checked to have no name-conflict with data_frame col_name = str(argument) - df_output[col_name] = args["hover_data"][col_name][1] - continue - - if not df_provided: + real_argument = args["hover_data"][col_name][1] + + if length and len(real_argument) != length: + raise ValueError( + "All arguments should have the same length. " + "The length of hover_data key `%s` is %d, whereas the " + "length of previously-processed arguments %s is %d" + % ( + argument, + len(real_argument), + str(list(df_output.columns)), + length, + ) + ) + if hasattr(real_argument, "values"): + df_output[col_name] = real_argument.values + else: + df_output[col_name] = np.array(real_argument) + elif not df_provided: raise ValueError( "String or int arguments are only possible when a " "DataFrame or an array is provided in the `data_frame` " @@ -1024,22 +1108,23 @@ def build_dataframe(args, attrables, array_attrables): "'%s' is of type str or int." % field ) # Check validity of column name - if argument not in df_input.columns: - err_msg = ( - "Value of '%s' is not the name of a column in 'data_frame'. 
" - "Expected one of %s but received: %s" - % (field, str(list(df_input.columns)), argument) - ) - if argument == "index": - err_msg += ( - "\n To use the index, pass it in directly as `df.index`." + elif argument not in df_input.columns: + if wide_mode and argument in (value_name, var_name): + continue + else: + err_msg = ( + "Value of '%s' is not the name of a column in 'data_frame'. " + "Expected one of %s but received: %s" + % (field, str(list(df_input.columns)), argument) ) - raise ValueError(err_msg) - if length and len(df_input[argument]) != length: + if argument == "index": + err_msg += "\n To use the index, pass it in directly as `df.index`." + raise ValueError(err_msg) + elif length and len(df_input[argument]) != length: raise ValueError( "All arguments should have the same length. " "The length of column argument `df[%s]` is %d, whereas the " - "length of previous arguments %s is %d" + "length of previously-processed arguments %s is %d" % ( field, len(df_input[argument]), @@ -1047,40 +1132,35 @@ def build_dataframe(args, attrables, array_attrables): length, ) ) - col_name = str(argument) - df_output[col_name] = df_input[argument].values + else: + col_name = str(argument) + df_output[col_name] = df_input[argument].values # ----------------- argument is a column / array / list.... 
------- else: - is_index = isinstance(argument, pd.RangeIndex) - # First pandas - # pandas series have a name but it's None - if ( - hasattr(argument, "name") and argument.name is not None - ) or is_index: - col_name = argument.name # pandas df - if col_name is None and is_index: - col_name = "index" - if not df_provided: - col_name = field - else: - if is_index: - keep_name = df_provided and argument is df_input.index + if df_provided and hasattr(argument, "name"): + if argument is df_input.index: + if argument.name is None or argument.name in df_input: + col_name = "index" else: - keep_name = ( - col_name in df_input and argument is df_input[col_name] - ) - col_name = ( - col_name - if keep_name - else _check_name_not_reserved(field, reserved_names) + col_name = argument.name + col_name = _escape_col_name( + df_input, col_name, [var_name, value_name] ) - else: # numpy array, list... + else: + if ( + argument.name is not None + and argument.name in df_input + and argument is df_input[argument.name] + ): + col_name = argument.name + if col_name is None: # numpy array, list... col_name = _check_name_not_reserved(field, reserved_names) + if length and len(argument) != length: raise ValueError( "All arguments should have the same length. " "The length of argument `%s` is %d, whereas the " - "length of previous arguments %s is %d" + "length of previously-processed arguments %s is %d" % (field, len(argument), str(list(df_output.columns)), length) ) if hasattr(argument, "values"): @@ -1089,12 +1169,234 @@ def build_dataframe(args, attrables, array_attrables): df_output[str(col_name)] = np.array(argument) # Finally, update argument with column name now that column exists + assert col_name is not None, ( + "Data-frame processing failure, likely due to an internal bug. " + "Please report this to " + "https://github.com/plotly/plotly.py/issues/new and we will try to " + "replicate and fix it."
+ ) if field_name not in array_attrables: args[field_name] = str(col_name) elif isinstance(args[field_name], dict): pass else: args[field_name][i] = str(col_name) + if field_name != "wide_variable": + wide_id_vars.add(str(col_name)) + + for col_name in ranges: + df_output[col_name] = range(len(df_output)) + + for col_name in constants: + df_output[col_name] = constants[col_name] + + return df_output, wide_id_vars + + +def build_dataframe(args, constructor): + """ + Constructs a dataframe and modifies `args` in-place. + + The argument values in `args` can be either strings corresponding to + existing columns of a dataframe, or data arrays (lists, numpy arrays, + pandas columns, series). + + Parameters + ---------- + args : OrderedDict + arguments passed to the px function and subsequently modified + constructor : graph_object trace class + the trace type selected for this figure + """ + + # make copies of all the fields via dict() and list() + for field in args: + if field in array_attrables and args[field] is not None: + args[field] = ( + dict(args[field]) + if isinstance(args[field], dict) + else list(args[field]) + ) + + # Cast data_frame argument to DataFrame (it could be a numpy array, dict etc.) 
+ df_provided = args["data_frame"] is not None + if df_provided and not isinstance(args["data_frame"], pd.DataFrame): + args["data_frame"] = pd.DataFrame(args["data_frame"]) + df_input = args["data_frame"] + + # now we handle special cases like wide-mode or x-xor-y specification + # by rearranging args to tee things up for process_args_into_dataframe to work + no_x = args.get("x", None) is None + no_y = args.get("y", None) is None + wide_x = False if no_x else _is_col_list(df_input, args["x"]) + wide_y = False if no_y else _is_col_list(df_input, args["y"]) + + wide_mode = False + var_name = None # will likely be "variable" in wide_mode + wide_cross_name = None # will likely be "index" in wide_mode + value_name = None # will likely be "value" in wide_mode + hist2d_types = [go.Histogram2d, go.Histogram2dContour] + if constructor in cartesians: + if wide_x and wide_y: + raise ValueError( + "Cannot accept list of column references or list of columns for both `x` and `y`." + ) + if df_provided and no_x and no_y: + wide_mode = True + if isinstance(df_input.columns, pd.MultiIndex): + raise TypeError( + "Data frame columns is a pandas MultiIndex. " + "pandas MultiIndex is not supported by plotly express " + "at the moment." 
+ ) + args["wide_variable"] = list(df_input.columns) + var_name = df_input.columns.name + if var_name in [None, "value", "index"] or var_name in df_input: + var_name = "variable" + if constructor == go.Funnel: + wide_orientation = args.get("orientation", None) or "h" + else: + wide_orientation = args.get("orientation", None) or "v" + args["orientation"] = wide_orientation + args["wide_cross"] = None + elif wide_x != wide_y: + wide_mode = True + args["wide_variable"] = args["y"] if wide_y else args["x"] + if df_provided and args["wide_variable"] is df_input.columns: + var_name = df_input.columns.name + if isinstance(args["wide_variable"], pd.Index): + args["wide_variable"] = list(args["wide_variable"]) + if var_name in [None, "value", "index"] or ( + df_provided and var_name in df_input + ): + var_name = "variable" + if constructor == go.Histogram: + wide_orientation = "v" if wide_x else "h" + else: + wide_orientation = "v" if wide_y else "h" + args["y" if wide_y else "x"] = None + args["wide_cross"] = None + if not no_x and not no_y: + wide_cross_name = "__x__" if wide_y else "__y__" + + if wide_mode: + value_name = _escape_col_name(df_input, "value", []) + var_name = _escape_col_name(df_input, var_name, []) + + missing_bar_dim = None + if constructor in [go.Scatter, go.Bar, go.Funnel] + hist2d_types: + if not wide_mode and (no_x != no_y): + for ax in ["x", "y"]: + if args.get(ax, None) is None: + args[ax] = df_input.index if df_provided else Range() + if constructor == go.Bar: + missing_bar_dim = ax + else: + if args["orientation"] is None: + args["orientation"] = "v" if ax == "x" else "h" + if wide_mode and wide_cross_name is None: + if no_x != no_y and args["orientation"] is None: + args["orientation"] = "v" if no_x else "h" + if df_provided: + if isinstance(df_input.index, pd.MultiIndex): + raise TypeError( + "Data frame index is a pandas MultiIndex. " + "pandas MultiIndex is not supported by plotly express " + "at the moment." 
+ ) + args["wide_cross"] = df_input.index + else: + args["wide_cross"] = Range( + label=_escape_col_name(df_input, "index", [var_name, value_name]) + ) + + # now that things have been prepped, we do the systematic rewriting of `args` + + df_output, wide_id_vars = process_args_into_dataframe( + args, wide_mode, var_name, value_name + ) + + # now that `df_output` exists and `args` contains only references, we complete + # the special-case and wide-mode handling by further rewriting args and/or mutating + # df_output + + count_name = _escape_col_name(df_output, "count", [var_name, value_name]) + if not wide_mode and missing_bar_dim and constructor == go.Bar: + # now that we've populated df_output, we check to see if the non-missing + # dimension is categorical: if so, then setting the missing dimension to a + # constant 1 is a less-insane thing to do than setting it to the index by + # default and we let the normal auto-orientation-code do its thing later + other_dim = "x" if missing_bar_dim == "y" else "y" + if not _is_continuous(df_output, args[other_dim]): + args[missing_bar_dim] = count_name + df_output[count_name] = 1 + else: + # on the other hand, if the non-missing dimension is continuous, then we + # can use this information to override the normal auto-orientation code + if args["orientation"] is None: + args["orientation"] = "v" if missing_bar_dim == "x" else "h" + + if constructor in hist2d_types: + del args["orientation"] + + if wide_mode: + # at this point, `df_output` is semi-long/semi-wide, but we know which columns + # are which, so we melt it and reassign `args` to refer to the newly-tidy + # columns, keeping track of various names and manglings set up above + wide_value_vars = [c for c in args["wide_variable"] if c not in wide_id_vars] + del args["wide_variable"] + if wide_cross_name == "__x__": + wide_cross_name = args["x"] + elif wide_cross_name == "__y__": + wide_cross_name = args["y"] + else: + wide_cross_name = args["wide_cross"] + del 
args["wide_cross"] + dtype = None + for v in wide_value_vars: + if dtype is None: + dtype = df_output[v].dtype + elif dtype != df_output[v].dtype: + raise ValueError( + "Plotly Express cannot process wide-form data with columns of different types." + ) + df_output = df_output.melt( + id_vars=wide_id_vars, + value_vars=wide_value_vars, + var_name=var_name, + value_name=value_name, + ) + assert len(df_output.columns) == len(set(df_output.columns)), ( + "Wide-mode name-inference failure, likely due to an internal bug. " + "Please report this to " + "https://github.com/plotly/plotly.py/issues/new and we will try to " + "replicate and fix it." + ) + df_output[var_name] = df_output[var_name].astype(str) + orient_v = wide_orientation == "v" + + if constructor in [go.Scatter, go.Funnel] + hist2d_types: + args["x" if orient_v else "y"] = wide_cross_name + args["y" if orient_v else "x"] = value_name + if constructor != go.Histogram2d: + args["color"] = args["color"] or var_name + if constructor == go.Bar: + if _is_continuous(df_output, value_name): + args["x" if orient_v else "y"] = wide_cross_name + args["y" if orient_v else "x"] = value_name + args["color"] = args["color"] or var_name + else: + args["x" if orient_v else "y"] = value_name + args["y" if orient_v else "x"] = count_name + df_output[count_name] = 1 + args["color"] = args["color"] or var_name + if constructor in [go.Violin, go.Box]: + args["x" if orient_v else "y"] = wide_cross_name or var_name + args["y" if orient_v else "x"] = value_name + if constructor == go.Histogram: + args["x" if orient_v else "y"] = value_name + args["y" if orient_v else "x"] = wide_cross_name + args["color"] = args["color"] or var_name args["data_frame"] = df_output return args @@ -1188,7 +1490,7 @@ def aggfunc_discrete(x): agg_f[count_colname] = "sum" if args["color"]: - if df[args["color"]].dtype.kind not in "ifc": + if not _is_continuous(df, args["color"]): aggfunc_color = aggfunc_discrete discrete_color = True elif not aggfunc_color:
@@ -1254,29 +1556,8 @@ def aggfunc_continuous(x): return args -def infer_config(args, constructor, trace_patch): - # Declare all supported attributes, across all plot types - attrables = ( - ["x", "y", "z", "a", "b", "c", "r", "theta", "size", "dimensions"] - + ["custom_data", "hover_name", "hover_data", "text"] - + ["names", "values", "parents", "ids"] - + ["error_x", "error_x_minus"] - + ["error_y", "error_y_minus", "error_z", "error_z_minus"] - + ["lat", "lon", "locations", "animation_group", "path"] - ) - array_attrables = ["dimensions", "custom_data", "hover_data", "path"] - group_attrables = ["animation_frame", "facet_row", "facet_col", "line_group"] - all_attrables = attrables + group_attrables + ["color"] - group_attrs = ["symbol", "line_dash"] - for group_attr in group_attrs: - if group_attr in args: - all_attrables += [group_attr] - - args = build_dataframe(args, all_attrables, array_attrables) - if constructor in [go.Treemap, go.Sunburst] and args["path"] is not None: - args = process_dataframe_hierarchy(args) - - attrs = [k for k in attrables if k in args] +def infer_config(args, constructor, trace_patch, layout_patch): + attrs = [k for k in direct_attrables + array_attrables if k in args] grouped_attrs = [] # Compute sizeref @@ -1290,10 +1571,7 @@ def infer_config(args, constructor, trace_patch): if "color_discrete_sequence" not in args: attrs.append("color") else: - if ( - args["color"] - and args["data_frame"][args["color"]].dtype.kind in "ifc" - ): + if args["color"] and _is_continuous(args["data_frame"], args["color"]): attrs.append("color") args["color_is_continuous"] = True elif constructor in [go.Sunburst, go.Treemap]: @@ -1332,8 +1610,55 @@ def infer_config(args, constructor, trace_patch): if "symbol" in args: grouped_attrs.append("marker.symbol") - # Compute final trace patch - trace_patch = trace_patch.copy() + if "orientation" in args: + has_x = args["x"] is not None + has_y = args["y"] is not None + if args["orientation"] is None: + if 
constructor in [go.Histogram, go.Scatter]: + if has_y and not has_x: + args["orientation"] = "h" + elif constructor in [go.Violin, go.Box, go.Bar, go.Funnel]: + if has_x and not has_y: + args["orientation"] = "h" + + if args["orientation"] is None and has_x and has_y: + x_is_continuous = _is_continuous(args["data_frame"], args["x"]) + y_is_continuous = _is_continuous(args["data_frame"], args["y"]) + if x_is_continuous and not y_is_continuous: + args["orientation"] = "h" + if y_is_continuous and not x_is_continuous: + args["orientation"] = "v" + + if args["orientation"] is None: + args["orientation"] = "v" + + if constructor == go.Histogram: + if has_x and has_y and args["histfunc"] is None: + args["histfunc"] = trace_patch["histfunc"] = "sum" + + orientation = args["orientation"] + nbins = args["nbins"] + trace_patch["nbinsx"] = nbins if orientation == "v" else None + trace_patch["nbinsy"] = None if orientation == "v" else nbins + trace_patch["bingroup"] = "x" if orientation == "v" else "y" + trace_patch["orientation"] = args["orientation"] + + if constructor in [go.Violin, go.Box]: + mode = "boxmode" if constructor == go.Box else "violinmode" + if layout_patch[mode] is None and args["color"] is not None: + if args["y"] == args["color"] and args["orientation"] == "h": + layout_patch[mode] = "overlay" + elif args["x"] == args["color"] and args["orientation"] == "v": + layout_patch[mode] = "overlay" + if layout_patch[mode] is None: + layout_patch[mode] = "group" + + if ( + constructor == go.Histogram2d + and args["z"] is not None + and args["histfunc"] is None + ): + args["histfunc"] = trace_patch["histfunc"] = "sum" if constructor in [go.Histogram2d, go.Densitymapbox]: show_colorbar = True @@ -1381,7 +1706,7 @@ def infer_config(args, constructor, trace_patch): # Create trace specs trace_specs = make_trace_spec(args, constructor, attrs, trace_patch) - return args, trace_specs, grouped_mappings, sizeref, show_colorbar + return trace_specs, grouped_mappings, sizeref, 
show_colorbar def get_orderings(args, grouper, grouped): @@ -1425,11 +1750,17 @@ def get_orderings(args, grouper, grouped): return orders, group_names, group_values -def make_figure(args, constructor, trace_patch={}, layout_patch={}): +def make_figure(args, constructor, trace_patch=None, layout_patch=None): + trace_patch = trace_patch or {} + layout_patch = layout_patch or {} apply_default_cascade(args) - args, trace_specs, grouped_mappings, sizeref, show_colorbar = infer_config( - args, constructor, trace_patch + args = build_dataframe(args, constructor) + if constructor in [go.Treemap, go.Sunburst] and args["path"] is not None: + args = process_dataframe_hierarchy(args) + + trace_specs, grouped_mappings, sizeref, show_colorbar = infer_config( + args, constructor, trace_patch, layout_patch ) grouper = [x.grouper or one_group for x in grouped_mappings] or [one_group] grouped = args["data_frame"].groupby(grouper, sort=False) @@ -1468,9 +1799,10 @@ def make_figure(args, constructor, trace_patch={}, layout_patch={}): for col, val, m in zip(grouper, group_name, grouped_mappings): if col != one_group: key = get_label(args, col) - mapping_labels[key] = str(val) - if m.show_in_trace_name: - trace_name_labels[key] = str(val) + if not isinstance(m.val_map, IdentityMap): + mapping_labels[key] = str(val) + if m.show_in_trace_name: + trace_name_labels[key] = str(val) if m.variable == "animation_frame": frame_name = val trace_name = ", ".join(trace_name_labels.values()) @@ -1479,23 +1811,8 @@ def make_figure(args, constructor, trace_patch={}, layout_patch={}): trace_names = trace_names_by_frame[frame_name] for trace_spec in trace_specs: - constructor_to_use = trace_spec.constructor - if constructor_to_use in [go.Scatter, go.Scatterpolar]: - if "render_mode" in args and ( - args["render_mode"] == "webgl" - or ( - args["render_mode"] == "auto" - and len(args["data_frame"]) > 1000 - and args["animation_frame"] is None - ) - ): - constructor_to_use = ( - go.Scattergl - if 
constructor_to_use == go.Scatter - else go.Scatterpolargl - ) # Create the trace - trace = constructor_to_use(name=trace_name) + trace = trace_spec.constructor(name=trace_name) if trace_spec.constructor not in [ go.Parcats, go.Parcoords, @@ -1605,7 +1922,7 @@ def make_figure(args, constructor, trace_patch={}, layout_patch={}): frame_list = sorted( frame_list, key=lambda f: orders[args["animation_frame"]].index(f["name"]) ) - layout_patch = layout_patch.copy() + if show_colorbar: colorvar = "z" if constructor in [go.Histogram2d, go.Densitymapbox] else "color" range_color = args["range_color"] or [None, None] diff --git a/packages/python/plotly/plotly/express/_doc.py b/packages/python/plotly/plotly/express/_doc.py index 05a5b214ca..4c7b591f78 100644 --- a/packages/python/plotly/plotly/express/_doc.py +++ b/packages/python/plotly/plotly/express/_doc.py @@ -25,19 +25,16 @@ colref_type, colref_desc, "Values from this column or array_like are used to position marks along the x axis in cartesian coordinates.", - "For horizontal histograms, these values are used as inputs to `histfunc`.", ], y=[ colref_type, colref_desc, "Values from this column or array_like are used to position marks along the y axis in cartesian coordinates.", - "For vertical histograms, these values are used as inputs to `histfunc`.", ], z=[ colref_type, colref_desc, "Values from this column or array_like are used to position marks along the z axis in cartesian coordinates.", - "For `density_heatmap` and `density_contour` these values are used as the inputs to `histfunc`.", ], a=[ colref_type, @@ -173,7 +170,7 @@ colref_desc, "Values from this column or array_like are used to assign mark sizes.", ], - radius=["int (default is 30)", "Sets the radius of influence of each point.",], + radius=["int (default is 30)", "Sets the radius of influence of each point."], hover_name=[ colref_type, colref_desc, @@ -247,12 +244,14 @@ "String values should define plotly.js symbols", "Used to override `symbol_sequence` 
to assign a specific symbols to marks corresponding with specific values.", "Keys in `symbol_map` should be values in the column denoted by `symbol`.", + "Alternatively, if the values of `symbol` are valid symbol names, the string `'identity'` may be passed to cause them to be used directly.", ], line_dash_map=[ "dict with str keys and str values (default `{}`)", "Strings values define plotly.js dash-patterns.", "Used to override `line_dash_sequences` to assign a specific dash-patterns to lines corresponding with specific values.", "Keys in `line_dash_map` should be values in the column denoted by `line_dash`.", + "Alternatively, if the values of `line_dash` are valid line-dash names, the string `'identity'` may be passed to cause them to be used directly.", ], line_dash_sequence=[ "list of str", @@ -270,6 +269,7 @@ "String values should define valid CSS-colors", "Used to override `color_discrete_sequence` to assign a specific colors to marks corresponding with specific values.", "Keys in `color_discrete_map` should be values in the column denoted by `color`.", + "Alternatively, if the values of `color` are valid colors, the string `'identity'` may be passed to cause them to be used directly.", ], color_continuous_scale=[ "list of str", @@ -386,12 +386,9 @@ "Sets start angle for the angular axis, with 0 being due east and 90 being due north.", ], histfunc=[ - "str (default `'count'`)", + "str (default `'count'` if no arguments are provided, else `'sum'`)", "One of `'count'`, `'sum'`, `'avg'`, `'min'`, or `'max'`." 
"Function used to aggregate values for summarization (note: can be normalized with `histnorm`).", - "The arguments to this function for `histogram` are the values of `y` if `orientation` is `'v'`,", - "otherwise the arguements are the values of `x`.", - "The arguments to this function for `density_heatmap` and `density_contour` are the values of `z`.", ], histnorm=[ "str (default `None`)", @@ -443,8 +440,10 @@ ], zoom=["int (default `8`)", "Between 0 and 20.", "Sets map zoom level."], orientation=[ - "str (default `'v'`)", - "One of `'h'` for horizontal or `'v'` for vertical)", + "str, one of `'h'` for horizontal or `'v'` for vertical. ", + "(default `'v'` if `x` and `y` are provided and both continuous or both categorical, ", + "otherwise `'v'`(`'h'`) if `x`(`y`) is categorical and `y`(`x`) is continuous, ", + "otherwise `'v'`(`'h'`) if only `x`(`y`) is provided) ", ], line_close=[ "boolean (default `False`)", @@ -518,14 +517,16 @@ ) -def make_docstring(fn, override_dict={}): +def make_docstring(fn, override_dict={}, append_dict={}): tw = TextWrapper(width=75, initial_indent=" ", subsequent_indent=" ") result = (fn.__doc__ or "") + "\nParameters\n----------\n" for param in getfullargspec(fn)[0]: if override_dict.get(param): - param_doc = override_dict[param] + param_doc = list(override_dict[param]) else: - param_doc = docs[param] + param_doc = list(docs[param]) + if append_dict.get(param): + param_doc += append_dict[param] param_desc_list = param_doc[1:] param_desc = ( tw.fill(" ".join(param_desc_list or "")) diff --git a/packages/python/plotly/plotly/express/_special_inputs.py b/packages/python/plotly/plotly/express/_special_inputs.py new file mode 100644 index 0000000000..c1b3d4d102 --- /dev/null +++ b/packages/python/plotly/plotly/express/_special_inputs.py @@ -0,0 +1,40 @@ +class IdentityMap(object): + """ + `dict`-like object which acts as if the value for any key is the key itself.
Objects + of this class can be passed in to arguments like `color_discrete_map` to + use the provided data values as colors, rather than mapping them to colors cycled + from `color_discrete_sequence`. This works for any `_map` argument to Plotly Express + functions, such as `line_dash_map` and `symbol_map`. + """ + + def __getitem__(self, key): + return key + + def __contains__(self, key): + return True + + def copy(self): + return self + + +class Constant(object): + """ + Objects of this class can be passed to Plotly Express functions that expect column + identifiers or list-like objects to indicate that this attribute should take on a + constant value. An optional label can be provided. + """ + + def __init__(self, value, label=None): + self.value = value + self.label = label + + +class Range(object): + """ + Objects of this class can be passed to Plotly Express functions that expect column + identifiers or list-like objects to indicate that this attribute should be mapped + onto integers starting at 0. An optional label can be provided. 
+ """ + + def __init__(self, label=None): + self.label = label diff --git a/packages/python/plotly/plotly/package_data/datasets/experiment.csv.gz b/packages/python/plotly/plotly/package_data/datasets/experiment.csv.gz new file mode 100644 index 0000000000..427e540990 Binary files /dev/null and b/packages/python/plotly/plotly/package_data/datasets/experiment.csv.gz differ diff --git a/packages/python/plotly/plotly/package_data/datasets/medals.csv.gz b/packages/python/plotly/plotly/package_data/datasets/medals.csv.gz new file mode 100644 index 0000000000..ae2e1dd90f Binary files /dev/null and b/packages/python/plotly/plotly/package_data/datasets/medals.csv.gz differ diff --git a/packages/python/plotly/plotly/package_data/datasets/stocks.csv.gz b/packages/python/plotly/plotly/package_data/datasets/stocks.csv.gz new file mode 100644 index 0000000000..4f178a4d49 Binary files /dev/null and b/packages/python/plotly/plotly/package_data/datasets/stocks.csv.gz differ diff --git a/packages/python/plotly/plotly/tests/test_core/test_px/__init__.py b/packages/python/plotly/plotly/tests/test_core/test_px/__init__.py new file mode 100644 index 0000000000..e69de29bb2 diff --git a/packages/python/plotly/plotly/tests/test_core/test_px/test_pandas_backend.py b/packages/python/plotly/plotly/tests/test_core/test_px/test_pandas_backend.py new file mode 100644 index 0000000000..3121341014 --- /dev/null +++ b/packages/python/plotly/plotly/tests/test_core/test_px/test_pandas_backend.py @@ -0,0 +1,30 @@ +import plotly.express as px +import numpy as np +import pandas as pd +import pytest + + +@pytest.mark.skipif( + not hasattr(pd.options.plotting, "backend"), + reason="Currently installed pandas doesn't support plotting backends.", +) +@pytest.mark.parametrize( + "pandas_fn,px_fn", + [ + (lambda df: df.plot(), px.line), + (lambda df: df.plot.scatter("A", "B"), lambda df: px.scatter(df, "A", "B"),), + (lambda df: df.plot.line(), px.line), + (lambda df: df.plot.area(), px.area), + (lambda df: 
df.plot.bar(), px.bar), + (lambda df: df.plot.barh(), lambda df: px.bar(df, orientation="h")), + (lambda df: df.plot.box(), px.box), + (lambda df: df.plot.hist(), px.histogram), + (lambda df: df.boxplot(), px.box), + (lambda df: df.hist(), px.histogram), + (lambda df: df["A"].hist(), lambda df: px.histogram(df["A"])), + ], +) +def test_pandas_equiv(pandas_fn, px_fn): + pd.options.plotting.backend = "plotly" + df = pd.DataFrame(np.random.randn(100, 4), columns=list("ABCD")).cumsum() + assert pandas_fn(df) == px_fn(df) diff --git a/packages/python/plotly/plotly/tests/test_core/test_px/test_px.py b/packages/python/plotly/plotly/tests/test_core/test_px/test_px.py index 60699e6e21..21b8b0ba0b 100644 --- a/packages/python/plotly/plotly/tests/test_core/test_px/test_px.py +++ b/packages/python/plotly/plotly/tests/test_core/test_px/test_px.py @@ -1,6 +1,7 @@ import plotly.express as px import numpy as np import pytest +from itertools import permutations def test_scatter(): @@ -185,44 +186,40 @@ def test_px_templates(): assert fig.layout.yaxis3.showgrid -def test_orthogonal_orderings(): - from itertools import permutations - - df = px.data.tips() - +def assert_orderings(days_order, days_check, times_order, times_check): symbol_sequence = ["circle", "diamond", "square", "cross"] color_sequence = ["red", "blue"] + fig = px.scatter( + px.data.tips(), + x="total_bill", + y="tip", + facet_row="time", + facet_col="day", + color="time", + symbol="day", + symbol_sequence=symbol_sequence, + color_discrete_sequence=color_sequence, + category_orders=dict(day=days_order, time=times_order), + ) + + for col in range(len(days_check)): + for trace in fig.select_traces(col=col + 1): + assert days_check[col] in trace.hovertemplate - def assert_orderings(days_order, days_check, times_order, times_check): - fig = px.scatter( - df, - x="total_bill", - y="tip", - facet_row="time", - facet_col="day", - color="time", - symbol="day", - symbol_sequence=symbol_sequence, - 
color_discrete_sequence=color_sequence, - category_orders=dict(day=days_order, time=times_order), - ) - - for col in range(len(days_check)): - for trace in fig.select_traces(col=col + 1): - assert days_check[col] in trace.hovertemplate - - for row in range(len(times_check)): - for trace in fig.select_traces(row=2 - row): - assert times_check[row] in trace.hovertemplate - - for trace in fig.data: - for i, day in enumerate(days_check): - if day in trace.name: - assert trace.marker.symbol == symbol_sequence[i] - for i, time in enumerate(times_check): - if time in trace.name: - assert trace.marker.color == color_sequence[i] + for row in range(len(times_check)): + for trace in fig.select_traces(row=2 - row): + assert times_check[row] in trace.hovertemplate + for trace in fig.data: + for i, day in enumerate(days_check): + if day in trace.name: + assert trace.marker.symbol == symbol_sequence[i] + for i, time in enumerate(times_check): + if time in trace.name: + assert trace.marker.color == color_sequence[i] + + +def test_noisy_orthogonal_orderings(): assert_orderings( ["x", "Sun", "Sat", "y", "Fri", "z"], # add extra noise, missing Thur ["Sun", "Sat", "Fri", "Thur"], # Thur is at the back @@ -230,9 +227,11 @@ def assert_orderings(days_order, days_check, times_order, times_check): ["Lunch", "Dinner"], # Dinner is at the back ) - for days in permutations(df["day"].unique()): - for times in permutations(df["time"].unique()): - assert_orderings(days, days, times, times) + +@pytest.mark.parametrize("days", permutations(["Sun", "Sat", "Fri", "Thur"])) +@pytest.mark.parametrize("times", permutations(["Lunch", "Dinner"])) +def test_orthogonal_orderings(days, times): + assert_orderings(days, days, times, times) def test_permissive_defaults(): diff --git a/packages/python/plotly/plotly/tests/test_core/test_px/test_px_hover.py b/packages/python/plotly/plotly/tests/test_core/test_px/test_px_hover.py index 509f48d199..f63696e8de 100644 --- 
a/packages/python/plotly/plotly/tests/test_core/test_px/test_px_hover.py +++ b/packages/python/plotly/plotly/tests/test_core/test_px/test_px_hover.py @@ -2,7 +2,6 @@ import numpy as np import pandas as pd import pytest -import plotly.graph_objects as go from collections import OrderedDict # an OrderedDict is needed for Python 2 @@ -74,24 +73,69 @@ def test_newdatain_hover_data(): def test_fail_wrong_column(): - with pytest.raises(ValueError): - fig = px.scatter( + with pytest.raises(ValueError) as err_msg: + px.scatter( {"a": [1, 2], "b": [3, 4], "c": [2, 1]}, x="a", y="b", hover_data={"d": True}, ) - with pytest.raises(ValueError): - fig = px.scatter( + assert ( + "Value of 'hover_data_0' is not the name of a column in 'data_frame'." + in str(err_msg.value) + ) + with pytest.raises(ValueError) as err_msg: + px.scatter( {"a": [1, 2], "b": [3, 4], "c": [2, 1]}, x="a", y="b", hover_data={"d": ":.1f"}, ) - with pytest.raises(ValueError): - fig = px.scatter( + assert ( + "Value of 'hover_data_0' is not the name of a column in 'data_frame'." + in str(err_msg.value) + ) + with pytest.raises(ValueError) as err_msg: + px.scatter( + {"a": [1, 2], "b": [3, 4], "c": [2, 1]}, + x="a", + y="b", + hover_data={"d": [3, 4, 5]}, # d is too long + ) + assert ( + "All arguments should have the same length. The length of hover_data key `d` is 3" + in str(err_msg.value) + ) + with pytest.raises(ValueError) as err_msg: + px.scatter( + {"a": [1, 2], "b": [3, 4], "c": [2, 1]}, + x="a", + y="b", + hover_data={"d": (True, [3, 4, 5])}, # d is too long + ) + assert ( + "All arguments should have the same length. 
The length of hover_data key `d` is 3" + in str(err_msg.value) + ) + with pytest.raises(ValueError) as err_msg: + px.scatter( + {"a": [1, 2], "b": [3, 4], "c": [2, 1]}, + x="a", + y="b", + hover_data={"c": [3, 4]}, + ) + assert ( + "Ambiguous input: values for 'c' appear both in hover_data and data_frame" + in str(err_msg.value) + ) + with pytest.raises(ValueError) as err_msg: + px.scatter( {"a": [1, 2], "b": [3, 4], "c": [2, 1]}, x="a", y="b", - hover_data={"d": (True, [3, 4, 5])}, + hover_data={"c": (True, [3, 4])}, ) + assert ( + "Ambiguous input: values for 'c' appear both in hover_data and data_frame" + in str(err_msg.value) + ) diff --git a/packages/python/plotly/plotly/tests/test_core/test_px/test_px_input.py b/packages/python/plotly/plotly/tests/test_core/test_px/test_px_input.py index e3786f6af9..e440fefbe8 100644 --- a/packages/python/plotly/plotly/tests/test_core/test_px/test_px_input.py +++ b/packages/python/plotly/plotly/tests/test_core/test_px/test_px_input.py @@ -1,21 +1,10 @@ import plotly.express as px +import plotly.graph_objects as go import numpy as np import pandas as pd import pytest from plotly.express._core import build_dataframe -from pandas.util.testing import assert_frame_equal - -attrables = ( - ["x", "y", "z", "a", "b", "c", "r", "theta", "size", "dimensions"] - + ["custom_data", "hover_name", "hover_data", "text"] - + ["error_x", "error_x_minus"] - + ["error_y", "error_y_minus", "error_z", "error_z_minus"] - + ["lat", "lon", "locations", "animation_group"] -) -array_attrables = ["dimensions", "custom_data", "hover_data"] -group_attrables = ["animation_frame", "facet_row", "facet_col", "line_group"] - -all_attrables = attrables + group_attrables + ["color"] +from pandas.testing import assert_frame_equal def test_numpy(): @@ -49,9 +38,7 @@ def test_with_index(): # We do not allow "x=index" with pytest.raises(ValueError) as err_msg: fig = px.scatter(tips, x="index", y="total_bill") - assert "To use the index, pass it in directly as 
`df.index`." in str( - err_msg.value - ) + assert "To use the index, pass it in directly as `df.index`." in str(err_msg.value) tips = px.data.tips() tips.index.name = "item" fig = px.scatter(tips, x=tips.index, y="total_bill") @@ -86,10 +73,10 @@ def test_several_dataframes(): # Name conflict with pytest.raises(NameError) as err_msg: fig = px.scatter(df, x="z", y=df2.money, size="y") - assert "A name conflict was encountered for argument y" in str(err_msg.value) + assert "A name conflict was encountered for argument 'y'" in str(err_msg.value) with pytest.raises(NameError) as err_msg: fig = px.scatter(df, x="z", y=df2.money, size=df.y) - assert "A name conflict was encountered for argument y" in str(err_msg.value) + assert "A name conflict was encountered for argument 'y'" in str(err_msg.value) # No conflict when the dataframe is not given, fields are used df = pd.DataFrame(dict(x=[0, 1], y=[3, 4])) @@ -167,42 +154,42 @@ def test_arrayattrable_numpy(): def test_wrong_column_name(): with pytest.raises(ValueError) as err_msg: - fig = px.scatter(px.data.tips(), x="bla", y="wrong") - assert "Value of 'x' is not the name of a column in 'data_frame'" in str( - err_msg.value - ) + px.scatter(px.data.tips(), x="bla", y="wrong") + assert "Value of 'x' is not the name of a column in 'data_frame'" in str( + err_msg.value + ) def test_missing_data_frame(): with pytest.raises(ValueError) as err_msg: - fig = px.scatter(x="arg1", y="arg2") - assert "String or int arguments are only possible" in str(err_msg.value) + px.scatter(x="arg1", y="arg2") + assert "String or int arguments are only possible" in str(err_msg.value) def test_wrong_dimensions_of_array(): with pytest.raises(ValueError) as err_msg: - fig = px.scatter(x=[1, 2, 3], y=[2, 3, 4, 5]) - assert "All arguments should have the same length." in str(err_msg.value) + px.scatter(x=[1, 2, 3], y=[2, 3, 4, 5]) + assert "All arguments should have the same length." 
in str(err_msg.value) def test_wrong_dimensions_mixed_case(): with pytest.raises(ValueError) as err_msg: df = pd.DataFrame(dict(time=[1, 2, 3], temperature=[20, 30, 25])) - fig = px.scatter(df, x="time", y="temperature", color=[1, 3, 9, 5]) - assert "All arguments should have the same length." in str(err_msg.value) + px.scatter(df, x="time", y="temperature", color=[1, 3, 9, 5]) + assert "All arguments should have the same length." in str(err_msg.value) def test_wrong_dimensions(): with pytest.raises(ValueError) as err_msg: - fig = px.scatter(px.data.tips(), x="tip", y=[1, 2, 3]) - assert "All arguments should have the same length." in str(err_msg.value) + px.scatter(px.data.tips(), x="tip", y=[1, 2, 3]) + assert "All arguments should have the same length." in str(err_msg.value) # the order matters with pytest.raises(ValueError) as err_msg: - fig = px.scatter(px.data.tips(), x=[1, 2, 3], y="tip") - assert "All arguments should have the same length." in str(err_msg.value) + px.scatter(px.data.tips(), x=[1, 2, 3], y="tip") + assert "All arguments should have the same length." in str(err_msg.value) with pytest.raises(ValueError): - fig = px.scatter(px.data.tips(), x=px.data.iris().index, y="tip") - # assert "All arguments should have the same length." in str(err_msg.value) + px.scatter(px.data.tips(), x=px.data.iris().index, y="tip") + assert "All arguments should have the same length." 
in str(err_msg.value) def test_multiindex_raise_error(): @@ -211,12 +198,10 @@ def test_multiindex_raise_error(): ) df = pd.DataFrame(np.random.random((6, 3)), index=index, columns=["A", "B", "C"]) # This is ok - fig = px.scatter(df, x="A", y="B") + px.scatter(df, x="A", y="B") with pytest.raises(TypeError) as err_msg: - fig = px.scatter(df, x=df.index, y="B") - assert "pandas MultiIndex is not supported by plotly express" in str( - err_msg.value - ) + px.scatter(df, x=df.index, y="B") + assert "pandas MultiIndex is not supported by plotly express" in str(err_msg.value) def test_build_df_from_lists(): @@ -225,7 +210,7 @@ def test_build_df_from_lists(): output = {key: key for key in args} df = pd.DataFrame(args) args["data_frame"] = None - out = build_dataframe(args, all_attrables, array_attrables) + out = build_dataframe(args, go.Scatter) assert_frame_equal(df.sort_index(axis=1), out["data_frame"].sort_index(axis=1)) out.pop("data_frame") assert out == output @@ -235,7 +220,7 @@ def test_build_df_from_lists(): output = {key: key for key in args} df = pd.DataFrame(args) args["data_frame"] = None - out = build_dataframe(args, all_attrables, array_attrables) + out = build_dataframe(args, go.Scatter) assert_frame_equal(df.sort_index(axis=1), out["data_frame"].sort_index(axis=1)) out.pop("data_frame") assert out == output @@ -244,25 +229,27 @@ def test_build_df_from_lists(): def test_build_df_with_index(): tips = px.data.tips() args = dict(data_frame=tips, x=tips.index, y="total_bill") - out = build_dataframe(args, all_attrables, array_attrables) + out = build_dataframe(args, go.Scatter) assert_frame_equal(tips.reset_index()[out["data_frame"].columns], out["data_frame"]) def test_non_matching_index(): df = pd.DataFrame(dict(y=[1, 2, 3]), index=["a", "b", "c"]) - expected = pd.DataFrame(dict(x=["a", "b", "c"], y=[1, 2, 3])) + expected = pd.DataFrame(dict(index=["a", "b", "c"], y=[1, 2, 3])) args = dict(data_frame=df, x=df.index, y="y") - out = build_dataframe(args, 
all_attrables, array_attrables) + out = build_dataframe(args, go.Scatter) assert_frame_equal(expected, out["data_frame"]) + expected = pd.DataFrame(dict(x=["a", "b", "c"], y=[1, 2, 3])) + args = dict(data_frame=None, x=df.index, y=df.y) - out = build_dataframe(args, all_attrables, array_attrables) + out = build_dataframe(args, go.Scatter) assert_frame_equal(expected, out["data_frame"]) args = dict(data_frame=None, x=["a", "b", "c"], y=df.y) - out = build_dataframe(args, all_attrables, array_attrables) + out = build_dataframe(args, go.Scatter) assert_frame_equal(expected, out["data_frame"]) @@ -299,7 +286,7 @@ def test_arguments_not_modified(): iris = px.data.iris() petal_length = iris.petal_length hover_data = [iris.sepal_length] - fig = px.scatter(iris, x=petal_length, y="petal_width", hover_data=hover_data) + px.scatter(iris, x=petal_length, y="petal_width", hover_data=hover_data) assert iris.petal_length.equals(petal_length) assert iris.sepal_length.equals(hover_data[0]) @@ -323,3 +310,214 @@ def test_size_column(): df = px.data.tips() fig = px.scatter(df, x=df["size"], y=df.tip) assert fig.data[0].hovertemplate == "size=%{x}<br>tip=%{y}" + + +def test_identity_map(): + fig = px.scatter( + x=[1, 2], + y=[1, 2], + symbol=["a", "b"], + color=["red", "blue"], + color_discrete_map=px.IdentityMap(), + ) + assert fig.data[0].marker.color == "red" + assert fig.data[1].marker.color == "blue" + assert "color=" not in fig.data[0].hovertemplate + assert "symbol=" in fig.data[0].hovertemplate + assert fig.layout.legend.title.text == "symbol" + + fig = px.scatter( + x=[1, 2], + y=[1, 2], + symbol=["a", "b"], + color=["red", "blue"], + color_discrete_map="identity", + ) + assert fig.data[0].marker.color == "red" + assert fig.data[1].marker.color == "blue" + assert "color=" not in fig.data[0].hovertemplate + assert "symbol=" in fig.data[0].hovertemplate + assert fig.layout.legend.title.text == "symbol" + + +def test_constants(): + fig = px.scatter(x=px.Constant(1), y=[1, 2]) + assert fig.data[0].x[0] == 1 + 
assert fig.data[0].x[1] == 1 + assert "x=" in fig.data[0].hovertemplate + + fig = px.scatter(x=px.Constant(1, label="time"), y=[1, 2]) + assert fig.data[0].x[0] == 1 + assert fig.data[0].x[1] == 1 + assert "x=" not in fig.data[0].hovertemplate + assert "time=" in fig.data[0].hovertemplate + + fig = px.scatter( + x=[1, 2], + y=[1, 2], + symbol=["a", "b"], + color=px.Constant("red", label="the_identity_label"), + hover_data=[px.Constant("data", label="the_data")], + color_discrete_map=px.IdentityMap(), + ) + assert fig.data[0].marker.color == "red" + assert fig.data[0].customdata[0][0] == "data" + assert fig.data[1].marker.color == "red" + assert "color=" not in fig.data[0].hovertemplate + assert "the_identity_label=" not in fig.data[0].hovertemplate + assert "symbol=" in fig.data[0].hovertemplate + assert "the_data=" in fig.data[0].hovertemplate + assert fig.layout.legend.title.text == "symbol" + + +def test_ranges(): + fig = px.scatter(x=px.Range(), y=[1, 2], hover_data=[px.Range()]) + assert fig.data[0].x[0] == 0 + assert fig.data[0].x[1] == 1 + assert fig.data[0].customdata[0][0] == 0 + assert
fig.data[0].customdata[1][0] == 1 + assert "x=" in fig.data[0].hovertemplate + + fig = px.scatter(x=px.Range(label="time"), y=[1, 2]) + assert fig.data[0].x[0] == 0 + assert fig.data[0].x[1] == 1 + assert "x=" not in fig.data[0].hovertemplate + assert "time=" in fig.data[0].hovertemplate + + +@pytest.mark.parametrize( + "fn", + [px.scatter, px.line, px.area, px.violin, px.box, px.strip] + + [px.bar, px.funnel, px.histogram], +) +@pytest.mark.parametrize( + "x,y,result", + [ + ("numerical", "categorical", "h"), + ("categorical", "numerical", "v"), + ("categorical", "categorical", "v"), + ("numerical", "numerical", "v"), + ("numerical", "none", "h"), + ("categorical", "none", "h"), + ("none", "categorical", "v"), + ("none", "numerical", "v"), + ], +) +def test_auto_orient_x_and_y(fn, x, y, result): + series = dict(categorical=["a", "a", "b", "b"], numerical=[1, 2, 3, 4], none=None) + + if "none" not in [x, y]: + assert fn(x=series[x], y=series[y]).data[0].orientation == result + else: + if fn == px.histogram or (fn == px.bar and "categorical" in [x, y]): + assert fn(x=series[x], y=series[y]).data[0].orientation != result + else: + assert fn(x=series[x], y=series[y]).data[0].orientation == result + + +def test_histogram_auto_orient(): + numerical = [1, 2, 3, 4] + assert px.histogram(x=numerical, nbins=5).data[0].nbinsx == 5 + assert px.histogram(y=numerical, nbins=5).data[0].nbinsy == 5 + assert px.histogram(x=numerical, y=numerical, nbins=5).data[0].nbinsx == 5 + + +def test_auto_histfunc(): + a = [1, 2] + assert px.histogram(x=a).data[0].histfunc is None + assert px.histogram(y=a).data[0].histfunc is None + assert px.histogram(x=a, y=a).data[0].histfunc == "sum" + assert px.histogram(x=a, y=a, histfunc="avg").data[0].histfunc == "avg" + + assert px.density_heatmap(x=a, y=a).data[0].histfunc is None + assert px.density_heatmap(x=a, y=a, z=a).data[0].histfunc == "sum" + assert px.density_heatmap(x=a, y=a, z=a, histfunc="avg").data[0].histfunc == "avg" + + 
+@pytest.mark.parametrize( + "fn,mode", [(px.violin, "violinmode"), (px.box, "boxmode"), (px.strip, "boxmode")] +) +@pytest.mark.parametrize( + "x,y,color,result", + [ + ("categorical1", "numerical", None, "group"), + ("categorical1", "numerical", "categorical2", "group"), + ("categorical1", "numerical", "categorical1", "overlay"), + ("numerical", "categorical1", None, "group"), + ("numerical", "categorical1", "categorical2", "group"), + ("numerical", "categorical1", "categorical1", "overlay"), + ], +) +def test_auto_boxlike_overlay(fn, mode, x, y, color, result): + df = pd.DataFrame( + dict( + categorical1=["a", "a", "b", "b"], + categorical2=["a", "a", "b", "b"], + numerical=[1, 2, 3, 4], + ) + ) + assert fn(df, x=x, y=y, color=color).layout[mode] == result + + +@pytest.mark.parametrize("fn", [px.scatter, px.line, px.area, px.bar]) +def test_x_or_y(fn): + categorical = ["a", "a", "b", "b"] + numerical = [1, 2, 3, 4] + constant = [1, 1, 1, 1] + range_4 = [0, 1, 2, 3] + index = [11, 12, 13, 14] + numerical_df = pd.DataFrame(dict(col=numerical), index=index) + categorical_df = pd.DataFrame(dict(col=categorical), index=index) + + fig = fn(x=numerical) + assert list(fig.data[0].x) == numerical + assert list(fig.data[0].y) == range_4 + assert fig.data[0].orientation == "h" + fig = fn(y=numerical) + assert list(fig.data[0].x) == range_4 + assert list(fig.data[0].y) == numerical + assert fig.data[0].orientation == "v" + fig = fn(numerical_df, x="col") + assert list(fig.data[0].x) == numerical + assert list(fig.data[0].y) == index + assert fig.data[0].orientation == "h" + fig = fn(numerical_df, y="col") + assert list(fig.data[0].x) == index + assert list(fig.data[0].y) == numerical + assert fig.data[0].orientation == "v" + + if fn != px.bar: + fig = fn(x=categorical) + assert list(fig.data[0].x) == categorical + assert list(fig.data[0].y) == range_4 + assert fig.data[0].orientation == "h" + fig = fn(y=categorical) + assert list(fig.data[0].x) == range_4 + assert 
list(fig.data[0].y) == categorical + assert fig.data[0].orientation == "v" + fig = fn(categorical_df, x="col") + assert list(fig.data[0].x) == categorical + assert list(fig.data[0].y) == index + assert fig.data[0].orientation == "h" + fig = fn(categorical_df, y="col") + assert list(fig.data[0].x) == index + assert list(fig.data[0].y) == categorical + assert fig.data[0].orientation == "v" + + else: + fig = fn(x=categorical) + assert list(fig.data[0].x) == categorical + assert list(fig.data[0].y) == constant + assert fig.data[0].orientation == "v" + fig = fn(y=categorical) + assert list(fig.data[0].x) == constant + assert list(fig.data[0].y) == categorical + assert fig.data[0].orientation == "h" + fig = fn(categorical_df, x="col") + assert list(fig.data[0].x) == categorical + assert list(fig.data[0].y) == constant + assert fig.data[0].orientation == "v" + fig = fn(categorical_df, y="col") + assert list(fig.data[0].x) == constant + assert list(fig.data[0].y) == categorical + assert fig.data[0].orientation == "h" diff --git a/packages/python/plotly/plotly/tests/test_core/test_px/test_px_wide.py b/packages/python/plotly/plotly/tests/test_core/test_px/test_px_wide.py new file mode 100644 index 0000000000..2c49b4bb63 --- /dev/null +++ b/packages/python/plotly/plotly/tests/test_core/test_px/test_px_wide.py @@ -0,0 +1,744 @@ +import plotly.express as px +import plotly.graph_objects as go +import pandas as pd +from plotly.express._core import build_dataframe, _is_col_list +from pandas.testing import assert_frame_equal +import pytest + + +def test_is_col_list(): + df_input = pd.DataFrame(dict(a=[1, 2], b=[1, 2])) + assert _is_col_list(df_input, ["a"]) + assert _is_col_list(df_input, ["a", "b"]) + assert _is_col_list(df_input, [[3, 4]]) + assert _is_col_list(df_input, [[3, 4], [3, 4]]) + assert not _is_col_list(df_input, pytest) + assert not _is_col_list(df_input, False) + assert not _is_col_list(df_input, ["a", 1]) + assert not _is_col_list(df_input, "a") + assert not 
_is_col_list(df_input, 1) + assert not _is_col_list(df_input, ["a", "b", "c"]) + assert not _is_col_list(df_input, [1, 2]) + df_input = pd.DataFrame([[1, 2], [1, 2]]) + assert _is_col_list(df_input, [0]) + assert _is_col_list(df_input, [0, 1]) + assert _is_col_list(df_input, [[3, 4]]) + assert _is_col_list(df_input, [[3, 4], [3, 4]]) + assert not _is_col_list(df_input, pytest) + assert not _is_col_list(df_input, False) + assert not _is_col_list(df_input, ["a", 1]) + assert not _is_col_list(df_input, "a") + assert not _is_col_list(df_input, 1) + assert not _is_col_list(df_input, [0, 1, 2]) + assert not _is_col_list(df_input, ["a", "b"]) + df_input = None + assert _is_col_list(df_input, [[3, 4]]) + assert _is_col_list(df_input, [[3, 4], [3, 4]]) + assert not _is_col_list(df_input, [0]) + assert not _is_col_list(df_input, [0, 1]) + assert not _is_col_list(df_input, pytest) + assert not _is_col_list(df_input, False) + assert not _is_col_list(df_input, ["a", 1]) + assert not _is_col_list(df_input, "a") + assert not _is_col_list(df_input, 1) + assert not _is_col_list(df_input, [0, 1, 2]) + assert not _is_col_list(df_input, ["a", "b"]) + + +@pytest.mark.parametrize( + "px_fn", + [px.scatter, px.line, px.area, px.bar, px.violin, px.box, px.strip] + + [px.histogram, px.funnel, px.density_contour, px.density_heatmap], +) +@pytest.mark.parametrize("orientation", [None, "v", "h"]) +@pytest.mark.parametrize("style", ["implicit", "explicit"]) +def test_wide_mode_external(px_fn, orientation, style): + # here we test this feature "black box" style by calling actual PX functions and + # inspecting the figure... 
this is important but clunky, and is mostly a smoke test + # allowing us to do more "white box" testing below + + if px_fn != px.funnel: + x, y = ("y", "x") if orientation == "h" else ("x", "y") + else: + x, y = ("y", "x") if orientation != "v" else ("x", "y") + xaxis, yaxis = x + "axis", y + "axis" + + df = pd.DataFrame(dict(a=[1, 2, 3], b=[4, 5, 6], c=[7, 8, 9]), index=[11, 12, 13]) + if style == "implicit": + fig = px_fn(df, orientation=orientation) + + if px_fn in [px.scatter, px.line, px.area, px.bar, px.funnel, px.density_contour]: + if style == "explicit": + fig = px_fn(**{"data_frame": df, y: list(df.columns), x: df.index}) + assert len(fig.data) == 3 + assert list(fig.data[0][x]) == [11, 12, 13] + assert list(fig.data[0][y]) == [1, 2, 3] + assert list(fig.data[1][x]) == [11, 12, 13] + assert list(fig.data[1][y]) == [4, 5, 6] + assert fig.layout[xaxis].title.text == "index" + assert fig.layout[yaxis].title.text == "value" + assert fig.layout.legend.title.text == "variable" + if px_fn in [px.density_heatmap]: + if style == "explicit": + fig = px_fn(**{"data_frame": df, y: list(df.columns), x: df.index}) + assert len(fig.data) == 1 + assert list(fig.data[0][x]) == [11, 12, 13, 11, 12, 13, 11, 12, 13] + assert list(fig.data[0][y]) == [1, 2, 3, 4, 5, 6, 7, 8, 9] + assert fig.layout[xaxis].title.text == "index" + assert fig.layout[yaxis].title.text == "value" + if px_fn in [px.violin, px.box, px.strip]: + if style == "explicit": + fig = px_fn(**{"data_frame": df, y: list(df.columns)}) + assert len(fig.data) == 1 + assert list(fig.data[0][x]) == ["a"] * 3 + ["b"] * 3 + ["c"] * 3 + assert list(fig.data[0][y]) == list(range(1, 10)) + assert fig.layout[yaxis].title.text == "value" + assert fig.layout[xaxis].title.text == "variable" + if px_fn in [px.histogram]: + if style == "explicit": + fig = px_fn(**{"data_frame": df, x: list(df.columns)}) + assert len(fig.data) == 3 + assert list(fig.data[1][x]) == [4, 5, 6] + assert fig.layout.legend.title.text == "variable" + 
assert fig.layout[xaxis].title.text == "value" + + +def test_wide_mode_labels_external(): + # here we prove that the _uglylabels_ can be renamed using the usual labels kwarg + df = pd.DataFrame(dict(a=[1, 2, 3], b=[4, 5, 6], c=[7, 8, 9]), index=[11, 12, 13]) + fig = px.bar(df) + assert fig.layout.xaxis.title.text == "index" + assert fig.layout.yaxis.title.text == "value" + assert fig.layout.legend.title.text == "variable" + labels = dict(index="my index", value="my value", variable="my column") + fig = px.bar(df, labels=labels) + assert fig.layout.xaxis.title.text == "my index" + assert fig.layout.yaxis.title.text == "my value" + assert fig.layout.legend.title.text == "my column" + df.index.name = "my index" + df.columns.name = "my column" + fig = px.bar(df) + assert fig.layout.xaxis.title.text == "my index" + assert fig.layout.yaxis.title.text == "value" + assert fig.layout.legend.title.text == "my column" + + +# here we do basic exhaustive testing of the various graph_object permutations +# via build_dataframe directly, which leads to more compact test code: +# we pass in args (which includes df) and look at how build_dataframe mutates +# both args and the df, and assume that since the rest of the downstream PX +# machinery has no wide-mode-specific code, and the tests above pass, that this is +# enough to prove things work +@pytest.mark.parametrize( + "trace_type,x,y,color", + [ + (go.Scatter, "index", "value", "variable"), + (go.Histogram2dContour, "index", "value", "variable"), + (go.Histogram2d, "index", "value", None), + (go.Bar, "index", "value", "variable"), + (go.Funnel, "index", "value", "variable"), + (go.Box, "variable", "value", None), + (go.Violin, "variable", "value", None), + (go.Histogram, "value", None, "variable"), + ], +) +@pytest.mark.parametrize("orientation", [None, "v", "h"]) +def test_wide_mode_internal(trace_type, x, y, color, orientation): + df_in = pd.DataFrame(dict(a=[1, 2, 3], b=[4, 5, 6]), index=[11, 12, 13]) + args_in = 
dict(data_frame=df_in, color=None, orientation=orientation) + args_out = build_dataframe(args_in, trace_type) + df_out = args_out.pop("data_frame") + expected = dict(variable=["a", "a", "a", "b", "b", "b"], value=[1, 2, 3, 4, 5, 6],) + if x == "index": + expected["index"] = [11, 12, 13, 11, 12, 13] + assert_frame_equal( + df_out.sort_index(axis=1), pd.DataFrame(expected).sort_index(axis=1), + ) + if trace_type in [go.Histogram2dContour, go.Histogram2d]: + if orientation is None or orientation == "v": + assert args_out == dict(x=x, y=y, color=color) + else: + assert args_out == dict(x=y, y=x, color=color) + else: + if (orientation is None and trace_type != go.Funnel) or orientation == "v": + assert args_out == dict(x=x, y=y, color=color, orientation="v") + else: + assert args_out == dict(x=y, y=x, color=color, orientation="h") + + +cases = [] +for transpose in [True, False]: + for tt in [go.Scatter, go.Bar, go.Funnel, go.Histogram2dContour, go.Histogram2d]: + color = None if tt == go.Histogram2d else "variable" + df_in = dict(a=[1, 2], b=[3, 4]) + args = dict(x=None, y=["a", "b"], color=None, orientation=None) + df_exp = dict( + variable=["a", "a", "b", "b"], value=[1, 2, 3, 4], index=[0, 1, 0, 1], + ) + cases.append((tt, df_in, args, "index", "value", color, df_exp, transpose)) + + df_in = dict(a=[1, 2], b=[3, 4], c=[5, 6]) + args = dict(x="c", y=["a", "b"], color=None, orientation=None) + df_exp = dict( + variable=["a", "a", "b", "b"], value=[1, 2, 3, 4], c=[5, 6, 5, 6], + ) + cases.append((tt, df_in, args, "c", "value", color, df_exp, transpose)) + + args = dict(x=None, y=[[1, 2], [3, 4]], color=None, orientation=None) + df_exp = dict( + variable=[ + "wide_variable_0", + "wide_variable_0", + "wide_variable_1", + "wide_variable_1", + ], + value=[1, 2, 3, 4], + index=[0, 1, 0, 1], + ) + cases.append((tt, None, args, "index", "value", color, df_exp, transpose)) + + for tt in [go.Bar]: # bar categorical exception + df_in = dict(a=["q", "r"], b=["s", "t"]) + args = 
dict(x=None, y=["a", "b"], color=None, orientation=None) + df_exp = dict( + variable=["a", "a", "b", "b"], + value=["q", "r", "s", "t"], + index=[0, 1, 0, 1], + count=[1, 1, 1, 1], + ) + cases.append((tt, df_in, args, "value", "count", "variable", df_exp, transpose)) + + for tt in [go.Violin, go.Box]: + df_in = dict(a=[1, 2], b=[3, 4]) + args = dict(x=None, y=["a", "b"], color=None, orientation=None) + df_exp = dict(variable=["a", "a", "b", "b"], value=[1, 2, 3, 4],) + cases.append((tt, df_in, args, "variable", "value", None, df_exp, transpose)) + + df_in = dict(a=[1, 2], b=[3, 4], c=[5, 6]) + args = dict(x="c", y=["a", "b"], color=None, orientation=None) + df_exp = dict( + variable=["a", "a", "b", "b"], value=[1, 2, 3, 4], c=[5, 6, 5, 6], + ) + cases.append((tt, df_in, args, "c", "value", None, df_exp, transpose)) + + args = dict(x=None, y=[[1, 2], [3, 4]], color=None, orientation=None) + df_exp = dict( + variable=[ + "wide_variable_0", + "wide_variable_0", + "wide_variable_1", + "wide_variable_1", + ], + value=[1, 2, 3, 4], + ) + cases.append((tt, None, args, "variable", "value", None, df_exp, transpose)) + + for tt in [go.Histogram]: + df_in = dict(a=[1, 2], b=[3, 4]) + args = dict(x=None, y=["a", "b"], color=None, orientation=None) + df_exp = dict(variable=["a", "a", "b", "b"], value=[1, 2, 3, 4],) + cases.append((tt, df_in, args, None, "value", "variable", df_exp, transpose)) + + df_in = dict(a=[1, 2], b=[3, 4], c=[5, 6]) + args = dict(x="c", y=["a", "b"], color=None, orientation=None) + df_exp = dict( + variable=["a", "a", "b", "b"], value=[1, 2, 3, 4], c=[5, 6, 5, 6], + ) + cases.append((tt, df_in, args, "c", "value", "variable", df_exp, transpose)) + + args = dict(x=None, y=[[1, 2], [3, 4]], color=None, orientation=None) + df_exp = dict( + variable=[ + "wide_variable_0", + "wide_variable_0", + "wide_variable_1", + "wide_variable_1", + ], + value=[1, 2, 3, 4], + ) + cases.append((tt, None, args, None, "value", "variable", df_exp, transpose)) + + 
+@pytest.mark.parametrize("tt,df_in,args_in,x,y,color,df_out_exp,transpose", cases) +def test_wide_x_or_y(tt, df_in, args_in, x, y, color, df_out_exp, transpose): + if transpose: + args_in["y"], args_in["x"] = args_in["x"], args_in["y"] + args_in["data_frame"] = df_in + args_out = build_dataframe(args_in, tt) + df_out = args_out.pop("data_frame").sort_index(axis=1) + assert_frame_equal(df_out, pd.DataFrame(df_out_exp).sort_index(axis=1)) + if transpose: + args_exp = dict(x=y, y=x, color=color) + else: + args_exp = dict(x=x, y=y, color=color) + if tt not in [go.Histogram2dContour, go.Histogram2d]: + orientation_exp = args_in["orientation"] + if (args_in["x"] is None) != (args_in["y"] is None) and tt != go.Histogram: + orientation_exp = "h" if transpose else "v" + args_exp["orientation"] = orientation_exp + assert args_out == args_exp + + +@pytest.mark.parametrize("orientation", [None, "v", "h"]) +def test_wide_mode_internal_bar_exception(orientation): + df_in = pd.DataFrame(dict(a=["q", "r", "s"], b=["t", "u", "v"]), index=[11, 12, 13]) + args_in = dict(data_frame=df_in, color=None, orientation=orientation) + args_out = build_dataframe(args_in, go.Bar) + df_out = args_out.pop("data_frame") + assert_frame_equal( + df_out.sort_index(axis=1), + pd.DataFrame( + dict( + index=[11, 12, 13, 11, 12, 13], + variable=["a", "a", "a", "b", "b", "b"], + value=["q", "r", "s", "t", "u", "v"], + count=[1, 1, 1, 1, 1, 1], + ) + ).sort_index(axis=1), + ) + if orientation is None or orientation == "v": + assert args_out == dict(x="value", y="count", color="variable", orientation="v") + else: + assert args_out == dict(x="count", y="value", color="variable", orientation="h") + + +# given all of the above tests, and given that the melt() code is not sensitive +# to the trace type, we can do all sorts of special-case testing just by focusing +# on build_dataframe(args, go.Scatter) for various values of args, and looking at +# how args and df get mutated +special_cases = [] + + +def 
append_special_case(df_in, args_in, args_expect, df_expect): + special_cases.append((df_in, args_in, args_expect, df_expect)) + + +# input is single bare array: column comes out as string "0" +append_special_case( + df_in=[1, 2, 3], + args_in=dict(x=None, y=None, color=None), + args_expect=dict(x="index", y="value", color="variable", orientation="v"), + df_expect=pd.DataFrame( + dict(index=[0, 1, 2], value=[1, 2, 3], variable=["0", "0", "0"]) + ), +) + +# input is single bare Series: column comes out as string "0" +append_special_case( + df_in=pd.Series([1, 2, 3]), + args_in=dict(x=None, y=None, color=None), + args_expect=dict(x="index", y="value", color="variable", orientation="v"), + df_expect=pd.DataFrame( + dict(index=[0, 1, 2], value=[1, 2, 3], variable=["0", "0", "0"]) + ), +) + +# input is a Series from a DF: we pick up the name and index values automatically +df = pd.DataFrame(dict(my_col=[1, 2, 3]), index=["a", "b", "c"]) +append_special_case( + df_in=df["my_col"], + args_in=dict(x=None, y=None, color=None), + args_expect=dict(x="index", y="value", color="variable", orientation="v"), + df_expect=pd.DataFrame( + dict( + index=["a", "b", "c"], + value=[1, 2, 3], + variable=["my_col", "my_col", "my_col"], + ) + ), +) + +# input is an index from a DF: treated like a Series basically +df = pd.DataFrame(dict(my_col=[1, 2, 3]), index=["a", "b", "c"]) +df.index.name = "my_index" +append_special_case( + df_in=df.index, + args_in=dict(x=None, y=None, color=None), + args_expect=dict(x="index", y="value", color="variable", orientation="v"), + df_expect=pd.DataFrame( + dict( + index=[0, 1, 2], + value=["a", "b", "c"], + variable=["my_index", "my_index", "my_index"], + ) + ), +) + +# input is a data frame with named row and col indices: we grab those +df = pd.DataFrame(dict(my_col=[1, 2, 3]), index=["a", "b", "c"]) +df.index.name = "my_index" +df.columns.name = "my_col_name" +append_special_case( + df_in=df, + args_in=dict(x=None, y=None, color=None), + 
args_expect=dict(x="my_index", y="value", color="my_col_name", orientation="v"), + df_expect=pd.DataFrame( + dict( + my_index=["a", "b", "c"], + value=[1, 2, 3], + my_col_name=["my_col", "my_col", "my_col"], + ) + ), +) + +# input is array of arrays: treated as rows, columns come out as string "0", "1" +append_special_case( + df_in=[[1, 2], [4, 5]], + args_in=dict(x=None, y=None, color=None), + args_expect=dict(x="index", y="value", color="variable", orientation="v"), + df_expect=pd.DataFrame( + dict(index=[0, 1, 0, 1], value=[1, 4, 2, 5], variable=["0", "0", "1", "1"],) + ), +) + +# partial-melting by assigning symbol: we pick up that column and don't melt it +append_special_case( + df_in=pd.DataFrame(dict(a=[1, 2], b=[3, 4], symbol_col=["q", "r"])), + args_in=dict(x=None, y=None, color=None, symbol="symbol_col"), + args_expect=dict( + x="index", y="value", color="variable", symbol="symbol_col", orientation="v", + ), + df_expect=pd.DataFrame( + dict( + index=[0, 1, 0, 1], + value=[1, 2, 3, 4], + variable=["a", "a", "b", "b"], + symbol_col=["q", "r", "q", "r"], + ) + ), +) + +# partial-melting by assigning the same column twice: we pick it up once +append_special_case( + df_in=pd.DataFrame(dict(a=[1, 2], b=[3, 4], symbol_col=["q", "r"])), + args_in=dict( + x=None, y=None, color=None, symbol="symbol_col", custom_data=["symbol_col"], + ), + args_expect=dict( + x="index", + y="value", + color="variable", + symbol="symbol_col", + custom_data=["symbol_col"], + orientation="v", + ), + df_expect=pd.DataFrame( + dict( + index=[0, 1, 0, 1], + value=[1, 2, 3, 4], + variable=["a", "a", "b", "b"], + symbol_col=["q", "r", "q", "r"], + ) + ), +) + +# partial-melting by assigning more than one column: we pick them both up +append_special_case( + df_in=pd.DataFrame( + dict(a=[1, 2], b=[3, 4], symbol_col=["q", "r"], data_col=["i", "j"]) + ), + args_in=dict( + x=None, y=None, color=None, symbol="symbol_col", custom_data=["data_col"], + ), + args_expect=dict( + x="index", + 
y="value", + color="variable", + symbol="symbol_col", + custom_data=["data_col"], + orientation="v", + ), + df_expect=pd.DataFrame( + dict( + index=[0, 1, 0, 1], + value=[1, 2, 3, 4], + variable=["a", "a", "b", "b"], + symbol_col=["q", "r", "q", "r"], + data_col=["i", "j", "i", "j"], + ) + ), +) + +# partial-melting by assigning symbol to a bare array: we pick it up with the attr name +append_special_case( + df_in=pd.DataFrame(dict(a=[1, 2], b=[3, 4])), + args_in=dict(x=None, y=None, color=None, symbol=["q", "r"]), + args_expect=dict( + x="index", y="value", color="variable", symbol="symbol", orientation="v" + ), + df_expect=pd.DataFrame( + dict( + index=[0, 1, 0, 1], + value=[1, 2, 3, 4], + variable=["a", "a", "b", "b"], + symbol=["q", "r", "q", "r"], + ) + ), +) + +# assigning color to variable explicitly: just works +append_special_case( + df_in=pd.DataFrame(dict(a=[1, 2], b=[3, 4])), + args_in=dict(x=None, y=None, color="variable"), + args_expect=dict(x="index", y="value", color="variable", orientation="v"), + df_expect=pd.DataFrame( + dict(index=[0, 1, 0, 1], value=[1, 2, 3, 4], variable=["a", "a", "b", "b"]) + ), +) + +# assigning color to a different column: variable drops out of args +append_special_case( + df_in=pd.DataFrame(dict(a=[1, 2], b=[3, 4], color_col=["q", "r"])), + args_in=dict(x=None, y=None, color="color_col"), + args_expect=dict(x="index", y="value", color="color_col", orientation="v"), + df_expect=pd.DataFrame( + dict( + index=[0, 1, 0, 1], + value=[1, 2, 3, 4], + variable=["a", "a", "b", "b"], + color_col=["q", "r", "q", "r"], + ) + ), +) + +# assigning variable to something else: just works +append_special_case( + df_in=pd.DataFrame(dict(a=[1, 2], b=[3, 4])), + args_in=dict(x=None, y=None, color=None, symbol="variable"), + args_expect=dict( + x="index", y="value", color="variable", symbol="variable", orientation="v" + ), + df_expect=pd.DataFrame( + dict(index=[0, 1, 0, 1], value=[1, 2, 3, 4], variable=["a", "a", "b", "b"],) + ), +) + +# 
swapping symbol and color: just works +append_special_case( + df_in=pd.DataFrame(dict(a=[1, 2], b=[3, 4], color_col=["q", "r"])), + args_in=dict(x=None, y=None, color="color_col", symbol="variable"), + args_expect=dict( + x="index", y="value", color="color_col", symbol="variable", orientation="v", + ), + df_expect=pd.DataFrame( + dict( + index=[0, 1, 0, 1], + value=[1, 2, 3, 4], + variable=["a", "a", "b", "b"], + color_col=["q", "r", "q", "r"], + ) + ), +) + +# a DF with a named column index: have to use that instead of variable +df = pd.DataFrame(dict(a=[1, 2], b=[3, 4])) +df.columns.name = "my_col_name" +append_special_case( + df_in=df, + args_in=dict(x=None, y=None, color=None, facet_row="my_col_name"), + args_expect=dict( + x="index", + y="value", + color="my_col_name", + facet_row="my_col_name", + orientation="v", + ), + df_expect=pd.DataFrame( + dict(index=[0, 1, 0, 1], value=[1, 2, 3, 4], my_col_name=["a", "a", "b", "b"],) + ), +) + +# passing the DF index into some other attr: works +df = pd.DataFrame(dict(a=[1, 2], b=[3, 4])) +df.columns.name = "my_col_name" +df.index.name = "my_index_name" +append_special_case( + df_in=df, + args_in=dict(x=None, y=None, color=None, hover_name=df.index), + args_expect=dict( + x="my_index_name", + y="value", + color="my_col_name", + hover_name="my_index_name", + orientation="v", + ), + df_expect=pd.DataFrame( + dict( + my_index_name=[0, 1, 0, 1], + value=[1, 2, 3, 4], + my_col_name=["a", "a", "b", "b"], + ) + ), +) + +# assigning value to something: works +df = pd.DataFrame(dict(a=[1, 2], b=[3, 4])) +df.columns.name = "my_col_name" +df.index.name = "my_index_name" +append_special_case( + df_in=df, + args_in=dict(x=None, y=None, color=None, hover_name="value"), + args_expect=dict( + x="my_index_name", + y="value", + color="my_col_name", + hover_name="value", + orientation="v", + ), + df_expect=pd.DataFrame( + dict( + my_index_name=[0, 1, 0, 1], + value=[1, 2, 3, 4], + my_col_name=["a", "a", "b", "b"], + ) + ), +) + +# 
assigning a px.Constant: works +df = pd.DataFrame(dict(a=[1, 2], b=[3, 4])) +df.columns.name = "my_col_name" +df.index.name = "my_index_name" +append_special_case( + df_in=df, + args_in=dict(x=None, y=None, color=None, symbol=px.Constant(1)), + args_expect=dict( + x="my_index_name", + y="value", + color="my_col_name", + symbol="symbol", + orientation="v", + ), + df_expect=pd.DataFrame( + dict( + my_index_name=[0, 1, 0, 1], + value=[1, 2, 3, 4], + my_col_name=["a", "a", "b", "b"], + symbol=[1, 1, 1, 1], + ) + ), +) + +# df has columns named after every special string +df = pd.DataFrame(dict(index=[1, 2], value=[3, 4], variable=[5, 6]), index=[7, 8]) +append_special_case( + df_in=df, + args_in=dict(x=None, y=None, color=None), + args_expect=dict(x="_index", y="_value", color="_variable", orientation="v",), + df_expect=pd.DataFrame( + dict( + _index=[7, 8, 7, 8, 7, 8], + _value=[1, 2, 3, 4, 5, 6], + _variable=["index", "index", "value", "value", "variable", "variable"], + ) + ), +) + +# df has columns with name collisions with indexes +df = pd.DataFrame(dict(a=[1, 2], b=[3, 4]), index=[7, 8]) +df.index.name = "a" +df.columns.name = "b" +append_special_case( + df_in=df, + args_in=dict(x=None, y=None, color=None), + args_expect=dict(x="index", y="value", color="variable", orientation="v",), + df_expect=pd.DataFrame( + dict(index=[7, 8, 7, 8], value=[1, 2, 3, 4], variable=["a", "a", "b", "b"],) + ), +) + +# everything is called value, OMG +df = pd.DataFrame(dict(b=[1, 2], value=[3, 4]), index=[7, 8]) +df.index.name = "value" +df.columns.name = "value" +append_special_case( + df_in=df, + args_in=dict(x=None, y=None, color=None), + args_expect=dict(x="index", y="_value", color="variable", orientation="v",), + df_expect=pd.DataFrame( + dict( + index=[7, 8, 7, 8], + _value=[1, 2, 3, 4], + variable=["b", "b", "value", "value"], + ) + ), +) + +# y = columns +df = pd.DataFrame(dict(a=[1, 2], b=[3, 4]), index=[7, 8]) +df.index.name = "c" +df.columns.name = "d" 
+append_special_case( + df_in=df, + args_in=dict(x=df.index, y=df.columns, color=None), + args_expect=dict(x="c", y="value", color="d"), + df_expect=pd.DataFrame( + dict(c=[7, 8, 7, 8], d=["a", "a", "b", "b"], value=[1, 2, 3, 4]) + ), +) + +# y = columns subset +df = pd.DataFrame(dict(a=[1, 2], b=[3, 4]), index=[7, 8]) +df.index.name = "c" +df.columns.name = "d" +append_special_case( + df_in=df, + args_in=dict(x=df.index, y=df.columns[:1], color=None), + args_expect=dict(x="c", y="value", color="variable"), + df_expect=pd.DataFrame(dict(c=[7, 8], variable=["a", "a"], value=[1, 2])), +) + +# list-like hover_data +df = pd.DataFrame(dict(a=[1, 2], b=[3, 4]), index=[7, 8]) +df.index.name = "c" +df.columns.name = "d" +append_special_case( + df_in=df, + args_in=dict(x=None, y=None, color=None, hover_data=dict(new=[5, 6])), + args_expect=dict( + x="c", + y="value", + color="d", + orientation="v", + hover_data=dict(new=(True, [5, 6])), + ), + df_expect=pd.DataFrame( + dict( + c=[7, 8, 7, 8], d=["a", "a", "b", "b"], new=[5, 6, 5, 6], value=[1, 2, 3, 4] + ) + ), +) + + +@pytest.mark.parametrize("df_in, args_in, args_expect, df_expect", special_cases) +def test_wide_mode_internal_special_cases(df_in, args_in, args_expect, df_expect): + args_in["data_frame"] = df_in + args_out = build_dataframe(args_in, go.Scatter) + df_out = args_out.pop("data_frame") + assert args_out == args_expect + assert_frame_equal( + df_out.sort_index(axis=1), df_expect.sort_index(axis=1), + ) + + +def test_multi_index(): + df = pd.DataFrame([[1, 2, 3, 4], [3, 4, 5, 6], [1, 2, 3, 4], [3, 4, 5, 6]]) + df.index = [["a", "a", "b", "b"], ["c", "d", "c", "d"]] + with pytest.raises(TypeError) as err_msg: + px.scatter(df) + assert "pandas MultiIndex is not supported by plotly express" in str(err_msg.value) + + df = pd.DataFrame([[1, 2, 3, 4], [3, 4, 5, 6], [1, 2, 3, 4], [3, 4, 5, 6]]) + df.columns = [["e", "e", "f", "f"], ["g", "h", "g", "h"]] + with pytest.raises(TypeError) as err_msg: + px.scatter(df) + 
assert "pandas MultiIndex is not supported by plotly express" in str(err_msg.value) + + +@pytest.mark.parametrize("df", [px.data.stocks(), dict(a=[1, 2], b=["1", "2"])]) +def test_mixed_input_error(df): + with pytest.raises(ValueError) as err_msg: + px.line(df) + assert ( + "Plotly Express cannot process wide-form data with columns of different type" + in str(err_msg.value) + )