Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PX ECDF #3330

Merged
merged 12 commits into from
Aug 13, 2021
Merged

PX ECDF #3330

Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@ This project adheres to [Semantic Versioning](http://semver.org/).
- `px.scatter` and `px.density_contours` now support new `trendline` types `'rolling'`, `'expanding'` and `'ewm'` [#2997](https://github.com/plotly/plotly.py/pull/2997)
- `px.scatter` and `px.density_contours` now support new `trendline_options` argument to parameterize trendlines, with support for constant control and log-scaling in `'ols'` and specification of the fraction used for `'lowess'`, as well as pass-through to Pandas for `'rolling'`, `'expanding'` and `'ewm'` [#2997](https://github.com/plotly/plotly.py/pull/2997)
- `px.scatter` and `px.density_contours` now support new `trendline_scope` argument that accepts the value `'overall'` to request a single trendline for all traces, including across facets and animation frames [#2997](https://github.com/plotly/plotly.py/pull/2997)
- A new `px.ecdf()` function for Empirical Cumulative Distribution Functions [#3330](https://github.com/plotly/plotly.py/pull/3330)

### Fixed
- Fixed regression introduced in version 5.0.0 where pandas/numpy arrays with `dtype` of Object were being converted to `list` values when added to a Figure ([#3292](https://github.com/plotly/plotly.py/issues/3292), [#3293](https://github.com/plotly/plotly.py/pull/3293))
Expand Down
11 changes: 8 additions & 3 deletions doc/python/box-plots.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@ jupyter:
extension: .md
format_name: markdown
format_version: '1.2'
jupytext_version: 1.6.0
jupytext_version: 1.4.2
kernelspec:
display_name: Python 3
language: python
Expand All @@ -20,7 +20,7 @@ jupyter:
name: python
nbconvert_exporter: python
pygments_lexer: ipython3
version: 3.7.6
version: 3.7.7
plotly:
description: How to make Box Plots in Python with Plotly.
display_as: statistical
Expand All @@ -36,13 +36,18 @@ jupyter:
thumbnail: thumbnail/box.jpg
---

A [box plot](https://en.wikipedia.org/wiki/Box_plot) is a statistical representation of numerical data through their quartiles. The ends of the box represent the lower and upper quartiles, while the median (second quartile) is marked by a line inside the box. For other statistical representations of numerical data, see [other statistical charts](https://plotly.com/python/statistical-charts/).
<!-- #region -->
A [box plot](https://en.wikipedia.org/wiki/Box_plot) is a statistical representation of the distribution of a variable through its quartiles. The ends of the box represent the lower and upper quartiles, while the median (second quartile) is marked by a line inside the box. For other statistical representations of numerical data, see [other statistical charts](https://plotly.com/python/statistical-charts/).


Alternatives to box plots for visualizing distributions include [histograms](https://plotly.com/python/histograms/), [violin plots](https://plotly.com/python/violin/), [ECDF plots](https://plotly.com/python/ecdf-plots/) and [strip charts](https://plotly.com/python/strip-charts/).

## Box Plot with `plotly.express`

[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on a variety of types of data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/).

In a box plot created by `px.box`, the distribution of the column given as `y` argument is represented.
<!-- #endregion -->

```python
import plotly.express as px
Expand Down
182 changes: 182 additions & 0 deletions doc/python/ecdf-plots.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,182 @@
---
jupyter:
jupytext:
notebook_metadata_filter: all
text_representation:
extension: .md
format_name: markdown
format_version: '1.2'
jupytext_version: 1.4.2
kernelspec:
display_name: Python 3
language: python
name: python3
language_info:
codemirror_mode:
name: ipython
version: 3
file_extension: .py
mimetype: text/x-python
name: python
nbconvert_exporter: python
pygments_lexer: ipython3
version: 3.7.7
plotly:
description: How to add empirical cumulative distribution function (ECDF) plots.
display_as: statistical
language: python
layout: base
name: Empirical Cumulative Distribution Plots
order: 16
page_type: u-guide
permalink: python/ecdf-plots/
thumbnail: thumbnail/figure-labels.png
---

### Overview

[Empirical cumulative distribution function plots](https://en.wikipedia.org/wiki/Empirical_distribution_function) are a way to visualize the distribution of a variable, and Plotly Express has a built-in function, `px.ecdf()` to generate such plots. [Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on a variety of types of data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/).

Alternatives to ECDF plots for visualizing distributions include [histograms](https://plotly.com/python/histograms/), [violin plots](https://plotly.com/python/violin/), [box plots](https://plotly.com/python/box-plots/) and [strip charts](https://plotly.com/python/strip-charts/).

### Simple ECDF Plots

Providing a single column to the `x` variable yields a basic ECDF plot.

```python
import plotly.express as px
df = px.data.tips()
fig = px.ecdf(df, x="total_bill")
fig.show()
```

Providing multiple columns leverage's Plotly Express' [wide-form data support](https://plotly.com/python/wide-form/) to show multiple variables on the same plot.

```python
import plotly.express as px
df = px.data.tips()
fig = px.ecdf(df, x=["total_bill", "tip"])
fig.show()
```

It is also possible to map another variable to the color dimension of a plot.

```python
import plotly.express as px
df = px.data.tips()
fig = px.ecdf(df, x="total_bill", color="sex")
fig.show()
```

### Configuring the Y axis

By default, the Y axis shows probability, but it is also possible to show raw counts by setting the `ecdfnorm` argument to `None` or to show percentages by setting it to `percent`.

```python
import plotly.express as px
df = px.data.tips()
fig = px.ecdf(df, x="total_bill", color="sex", ecdfnorm=None)
fig.show()
```

If a `y` value is provided, the Y axis is set to the sum of `y` rather than counts.

```python
import plotly.express as px
df = px.data.tips()
fig = px.ecdf(df, x="total_bill", y="tip", color="sex", ecdfnorm=None)
fig.show()
```

### Reversed and Complementary CDF plots

By default, the Y value represents the fraction of the data that is *at or below* the value on on the X axis. Setting `ecdfmode` to `"reversed"` reverses this, with the Y axis representing the fraction of the data *at or above* the X value. Setting `ecdfmode` to `"complementary"` plots `1-ECDF`, meaning that the Y values represent the fraction of the data *above* the X value.

In `standard` mode (the default), the right-most point is at 1 (or the total count/sum, depending on `ecdfnorm`) and the right-most point is above 0.

```python
import plotly.express as px
fig = px.ecdf(df, x=[1,2,3,4], markers=True, ecdfmode="standard",
title="ecdfmode='standard' (Y=fraction at or below X value, this the default)")
fig.show()
```

In `reversed` mode, the right-most point is at 1 (or the total count/sum, depending on `ecdfnorm`) and the left-most point is above 0.

```python
import plotly.express as px
fig = px.ecdf(df, x=[1,2,3,4], markers=True, ecdfmode="reversed",
title="ecdfmode='reversed' (Y=fraction at or above X value)")
fig.show()
```

In `complementary` mode, the right-most point is at 0 and no points are at 1 (or the total count/sum) per the definition of the CCDF as 1-ECDF, which has no point at 0.

```python
import plotly.express as px
fig = px.ecdf(df, x=[1,2,3,4], markers=True, ecdfmode="complementary",
title="ecdfmode='complementary' (Y=fraction above X value)")
fig.show()
```

### Orientation

By default, plots are oriented vertically (i.e. the variable is on the X axis and counted/summed upwards), but this can be overridden with the `orientation` argument.

```python
import plotly.express as px
df = px.data.tips()
fig = px.ecdf(df, x="total_bill", y="tip", color="sex", ecdfnorm=None, orientation="h")
fig.show()
```

### Markers and/or Lines

ECDF Plots can be configured to show lines and/or markers.

```python
import plotly.express as px
df = px.data.tips()
fig = px.ecdf(df, x="total_bill", color="sex", markers=True)
fig.show()
```

```python
import plotly.express as px
df = px.data.tips()
fig = px.ecdf(df, x="total_bill", color="sex", markers=True, lines=False)
fig.show()
```

### Marginal Plots

ECDF plots also support [marginal plots](https://plotly.com/python/marginal-plots/)

```python
import plotly.express as px
df = px.data.tips()
fig = px.ecdf(df, x="total_bill", color="sex", markers=True, lines=False, marginal="histogram")
fig.show()
```

```python
import plotly.express as px
df = px.data.tips()
fig = px.ecdf(df, x="total_bill", color="sex", marginal="rug")
fig.show()
```

### Facets

ECDF Plots also support [faceting](https://plotly.com/python/facet-plots/)

```python
import plotly.express as px
df = px.data.tips()
fig = px.ecdf(df, x="total_bill", color="sex", facet_row="time", facet_col="day")
fig.show()
```

```python

```
24 changes: 20 additions & 4 deletions doc/python/graph-objects.md
Original file line number Diff line number Diff line change
Expand Up @@ -56,13 +56,19 @@ Graph objects have several benefits compared to plain Python dictionaries:
5. Graph object constructors and update methods accept "magic underscores" (e.g. `go.Figure(layout_title_text="The Title")` rather than `dict(layout=dict(title=dict(text="The Title")))`) for more compact code.
6. Graph objects support attached rendering (`.show()`) and exporting functions (`.write_image()`) that automatically invoke the appropriate functions from [the `plotly.io` module](https://plotly.com/python-api-reference/plotly.io.html).

### When to use Graph Objects Directly
### When to use Graph Objects vs Plotly Express

The recommended way to create figures is using the [functions in the plotly.express module](https://plotly.com/python-api-reference/), [collectively known as Plotly Express](/python/plotly-express/), which all return instances of `plotly.graph_objects.Figure`, so every figure produced with the plotly library, actually uses graph objects under the hood, unless manually constructed out of dictionaries.
The recommended way to create figures is using the [functions in the plotly.express module](https://plotly.com/python-api-reference/), [collectively known as Plotly Express](/python/plotly-express/), which all return instances of `plotly.graph_objects.Figure`, so every figure produced with the `plotly` library actually uses graph objects under the hood, unless manually constructed out of dictionaries.

That said, certain kinds of figures are not yet possible to create with Plotly Express, such as figures that use certain 3D trace-types like [`mesh`](/python/3d-mesh/) or [`isosurface`](/python/3d-isosurface-plots/). In addition, certain figures are cumbersome to create by starting from a figure created with Plotly Express, for example figures with [subplots of different types](/python/mixed-subplots/), [dual-axis plots](/python/multiple-axes/), or [faceted plots](/python/facet-plots/) with multiple different types of traces. To construct such figures, it can be easier to start from an empty `plotly.graph_objects.Figure` object (or one configured with subplots via the [make_subplots() function](/python/subplots/)) and progressively add traces and update attributes as above. Every `plotly` documentation page lists the Plotly Express option at the top if a Plotly Express function exists to make the kind of chart in question, and then the graph objects version below.

Note that the figures produced by Plotly Express **in a single function-call** are [easy to customize at creation-time](/python/styling-plotly-express/), and to [manipulate after creation](/python/creating-and-updating-figures/) using the `update_*` and `add_*` methods. The figures produced by Plotly Express can always be built from the ground up using graph objects, but this approach typically takes **5-100 lines of code rather than 1**. Here is a simple example of how to produce the same figure object from the same data, once with Plotly Express and once without. The data in this example is in "long form" but [Plotly Express also accepts data in "wide form"](/python/wide-form/) and the line-count savings from Plotly Express over graph objects are comparable. More complex figures such as [sunbursts](/python/sunburst-charts/), [parallel coordinates](/python/parallel-coordinates-plot/), [facet plots](/python/facet-plots/) or [animations](/python/animations/) require many more lines of figure-specific graph objects code, whereas switching from one representation to another with Plotly Express usually involves changing just a few characters.
Note that the figures produced by Plotly Express **in a single function-call** are [easy to customize at creation-time](/python/styling-plotly-express/), and to [manipulate after creation](/python/creating-and-updating-figures/) using the `update_*` and `add_*` methods.

### Comparing Graph Objects and Plotly Express

The figures produced by Plotly Express can always be built from the ground up using graph objects, but this approach typically takes **5-100 lines of code rather than 1**.

Here is a simple example of how to produce the same figure object from the same data, once with Plotly Express and once without. The data in this example is in "long form" but [Plotly Express also accepts data in "wide form"](/python/wide-form/) and the line-count savings from Plotly Express over graph objects are comparable. More complex figures such as [sunbursts](/python/sunburst-charts/), [parallel coordinates](/python/parallel-coordinates-plot/), [facet plots](/python/facet-plots/) or [animations](/python/animations/) require many more lines of figure-specific graph objects code, whereas switching from one representation to another with Plotly Express usually involves changing just a few characters.

```python
import pandas as pd
Expand All @@ -73,11 +79,17 @@ df = pd.DataFrame({
"Number Eaten": [2, 1, 3, 1, 3, 2],
})


# Plotly Express

import plotly.express as px

fig = px.bar(df, x="Fruit", y="Number Eaten", color="Contestant", barmode="group")
fig.show()


# Graph Objects

import plotly.graph_objects as go

fig = go.Figure()
Expand All @@ -88,4 +100,8 @@ fig.update_layout(legend_title_text = "Contestant")
fig.update_xaxes(title_text="Fruit")
fig.update_yaxes(title_text="Number Eaten")
fig.show()
```
```

```python

```
13 changes: 9 additions & 4 deletions doc/python/histograms.md
Original file line number Diff line number Diff line change
Expand Up @@ -36,14 +36,19 @@ jupyter:
thumbnail: thumbnail/histogram.jpg
---

In statistics, a [histogram](https://en.wikipedia.org/wiki/Histogram) is representation of the distribution of numerical data, where the data are binned and the count for each bin is represented. More generally, in plotly a histogram is an aggregated bar chart, with several possible aggregation functions (e.g. sum, average, count...).
<!-- #region -->
In statistics, a [histogram](https://en.wikipedia.org/wiki/Histogram) is representation of the distribution of numerical data, where the data are binned and the count for each bin is represented. More generally, in Plotly a histogram is an aggregated bar chart, with several possible aggregation functions (e.g. sum, average, count...) which can be used to visualize data on categorical and date axes as well as linear axes.

If you're looking instead for bar charts, i.e. representing *raw, unaggregated* data with rectangular

Alternatives to violin plots for visualizing distributions include [violin plots](https://plotly.com/python/violin/), [box plots](https://plotly.com/python/box-plots/), [ECDF plots](https://plotly.com/python/ecdf-plots/) and [strip charts](https://plotly.com/python/strip-charts/).

> If you're looking instead for bar charts, i.e. representing *raw, unaggregated* data with rectangular
bar, go to the [Bar Chart tutorial](/python/bar-charts/).

## Histograms with Plotly Express

[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on a variety of types of data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/).
<!-- #endregion -->

```python
import plotly.express as px
Expand Down Expand Up @@ -160,7 +165,7 @@ fig = px.histogram(df, x="total_bill", color="sex")
fig.show()
```

#### Using histfunc
#### Aggregating with other functions than `count`

For each bin of `x`, one can compute a function of data using `histfunc`. The argument of `histfunc` is the dataframe column given as the `y` argument. Below the plot shows that the average tip increases with the total bill.

Expand Down Expand Up @@ -193,7 +198,7 @@ fig.show()

#### Visualizing the distribution

With the `marginal` keyword, a subplot is drawn alongside the histogram, visualizing the distribution. See [the distplot page](https://plotly.com/python/distplot/)for more examples of combined statistical representations.
With the `marginal` keyword, a [marginal](https://plotly.com/python/marginal-plots/) is drawn alongside the histogram, visualizing the distribution. See [the distplot page](https://plotly.com/python/distplot/) for more examples of combined statistical representations.

```python
import plotly.express as px
Expand Down
56 changes: 56 additions & 0 deletions doc/python/line-and-scatter.md
Original file line number Diff line number Diff line change
Expand Up @@ -67,6 +67,15 @@ fig = px.scatter(df, x="sepal_width", y="sepal_length", color="species",
fig.show()
```

Color can be [continuous](https://plotly.com/python/colorscales/) as follows, or [discrete/categorical](https://plotly.com/python/discrete-color/) as above.

```python
import plotly.express as px
df = px.data.iris()
fig = px.scatter(df, x="sepal_width", y="sepal_length", color='petal_length')
fig.show()
```

The `symbol` argument can be mapped to a column as well. A [wide variety of symbols](https://plotly.com/python/marker-style/) are available.

```python
Expand Down Expand Up @@ -104,6 +113,53 @@ fig.update_traces(marker_size=10)
fig.show()
```

### Error Bars

Scatter plots support [error bars](https://plotly.com/python/error-bars/).

```python
import plotly.express as px
df = px.data.iris()
df["e"] = df["sepal_width"]/100
fig = px.scatter(df, x="sepal_width", y="sepal_length", color="species",
error_x="e", error_y="e")
fig.show()
```

### Marginal Distribution Plots

Scatter plots support [marginal distribution plots](https://plotly.com/python/marginal-plots/)

```python
import plotly.express as px
df = px.data.iris()
fig = px.scatter(df, x="sepal_length", y="sepal_width", marginal_x="histogram", marginal_y="rug")
fig.show()
```

### Facetting

Scatter plots support [faceting](https://plotly.com/python/facet-plots/).

```python
import plotly.express as px
df = px.data.tips()
fig = px.scatter(df, x="total_bill", y="tip", color="smoker", facet_col="sex", facet_row="time")
fig.show()
```

### Linear Regression and Other Trendlines

Scatter plots support [linear and non-linear trendlines](https://plotly.com/python/linear-fits/).

```python
import plotly.express as px

df = px.data.tips()
fig = px.scatter(df, x="total_bill", y="tip", trendline="ols")
fig.show()
```

## Line plots with Plotly Express

```python
Expand Down
Loading