Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trendline options #2997

Merged
merged 24 commits into from
Aug 13, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,9 @@ This project adheres to [Semantic Versioning](http://semver.org/).
### Added
- Extra flags were added to the `gapminder` and `stocks` dataset to facilitate testing, documentation and demos [#3305](https://github.com/plotly/plotly.py/issues/3305)
- All line-like Plotly Express functions now accept `markers` argument to display markers, and all but `line_mapbox` accept `symbol` to map a field to the symbol attribute, similar to scatter-like functions [#3326](https://github.com/plotly/plotly.py/issues/3326)
- `px.scatter` and `px.density_contours` now support new `trendline` types `'rolling'`, `'expanding'` and `'ewm'` [#2997](https://github.com/plotly/plotly.py/pull/2997)
- `px.scatter` and `px.density_contours` now support new `trendline_options` argument to parameterize trendlines, with support for constant control and log-scaling in `'ols'` and specification of the fraction used for `'lowess'`, as well as pass-through to Pandas for `'rolling'`, `'expanding'` and `'ewm'` [#2997](https://github.com/plotly/plotly.py/pull/2997)
- `px.scatter` and `px.density_contours` now support new `trendline_scope` argument that accepts the value `'overall'` to request a single trendline for all traces, including across facets and animation frames [#2997](https://github.com/plotly/plotly.py/pull/2997)

### Fixed
- Fixed regression introduced in version 5.0.0 where pandas/numpy arrays with `dtype` of Object were being converted to `list` values when added to a Figure ([#3292](https://github.com/plotly/plotly.py/issues/3292), [#3293](https://github.com/plotly/plotly.py/pull/3293))
Expand Down
3 changes: 3 additions & 0 deletions doc/apidoc/plotly.express.rst
Original file line number Diff line number Diff line change
Expand Up @@ -49,6 +49,8 @@ plotly's high-level API for rapid figure generation. ::
density_heatmap
density_mapbox
imshow
set_mapbox_access_token
get_trendline_results


`plotly.express` subpackages
Expand All @@ -60,3 +62,4 @@ plotly's high-level API for rapid figure generation. ::

generated/plotly.express.data.rst
generated/plotly.express.colors.rst
generated/plotly.express.trendline_functions.rst
165 changes: 156 additions & 9 deletions doc/python/linear-fits.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,8 +5,8 @@ jupyter:
text_representation:
extension: .md
format_name: markdown
format_version: '1.1'
jupytext_version: 1.1.1
format_version: '1.2'
jupytext_version: 1.4.2
kernelspec:
display_name: Python 3
language: python
Expand All @@ -20,11 +20,12 @@ jupyter:
name: python
nbconvert_exporter: python
pygments_lexer: ipython3
version: 3.6.8
version: 3.7.7
plotly:
description: Add linear Ordinary Least Squares (OLS) regression trendlines or
non-linear Locally Weighted Scatterplot Smoothing (LOWESS) trendlines to scatterplots
in Python.
in Python. Options for moving averages (rolling means) as well as exponentially-weighted
and expanding functions.
display_as: statistical
language: python
layout: base
Expand All @@ -39,7 +40,7 @@ jupyter:

[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on a variety of types of data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/).

Plotly Express allows you to add [Ordinary Least](https://en.wikipedia.org/wiki/Ordinary_least_squares) Squares regression trendline to scatterplots with the `trendline` argument. In order to do so, you will need to install `statsmodels` and its dependencies. Hovering over the trendline will show the equation of the line and its R-squared value.
Plotly Express allows you to add [Ordinary Least Squares](https://en.wikipedia.org/wiki/Ordinary_least_squares) regression trendline to scatterplots with the `trendline` argument. In order to do so, you will need to [install `statsmodels` and its dependencies](https://www.statsmodels.org/stable/install.html). Hovering over the trendline will show the equation of the line and its R-squared value.

```python
import plotly.express as px
Expand All @@ -66,14 +67,160 @@ print(results)
results.query("sex == 'Male' and smoker == 'Yes'").px_fit_results.iloc[0].summary()
```

### Non-Linear Trendlines
### Displaying a single trendline with multiple traces

Plotly Express also supports non-linear [LOWESS](https://en.wikipedia.org/wiki/Local_regression) trendlines.
_new in v5.2_

To display a single trendline using the entire dataset, set the `trendline_scope` argument to `"overall"`. The same trendline will be overlaid on all facets and animation frames. The trendline color can be overridden with `trendline_color_override`.

```python
import plotly.express as px

df = px.data.tips()
fig = px.scatter(df, x="total_bill", y="tip", symbol="smoker", color="sex", trendline="ols", trendline_scope="overall")
fig.show()
```

```python
import plotly.express as px

df = px.data.tips()
fig = px.scatter(df, x="total_bill", y="tip", facet_col="smoker", color="sex",
trendline="ols", trendline_scope="overall", trendline_color_override="black")
fig.show()
```

### OLS Parameters

_new in v5.2_

OLS trendlines can be fit with log transformations to both X or Y data using the `trendline_options` argument, independently of whether or not the plot has [logarithmic axes](https://plotly.com/python/log-plot/).

```python
import plotly.express as px

df = px.data.gapminder(year=2007)
fig = px.scatter(df, x="gdpPercap", y="lifeExp",
trendline="ols", trendline_options=dict(log_x=True),
title="Log-transformed fit on linear axes")
fig.show()
```

```python
import plotly.express as px

df = px.data.gapminder(year=2007)
fig = px.scatter(df, x="gdpPercap", y="lifeExp", log_x=True,
trendline="ols", trendline_options=dict(log_x=True),
title="Log-scaled X axis and log-transformed fit")
fig.show()
```

### Locally WEighted Scatterplot Smoothing (LOWESS)

Plotly Express also supports non-linear [LOWESS](https://en.wikipedia.org/wiki/Local_regression) trendlines. In order use this feature, you will need to [install `statsmodels` and its dependencies](https://www.statsmodels.org/stable/install.html).

```python
import plotly.express as px

df = px.data.stocks(datetimes=True)
fig = px.scatter(df, x="date", y="GOOG", trendline="lowess")
fig.show()
```

_new in v5.2_

The level of smoothing can be controlled via the `frac` trendline option, which indicates the fraction of the data that the LOWESS smoother should include. The default is a fairly smooth line with `frac=0.6666` and lowering this fraction will give a line that more closely follows the data.

```python
import plotly.express as px

df = px.data.stocks(datetimes=True)
fig = px.scatter(df, x="date", y="GOOG", trendline="lowess", trendline_options=dict(frac=0.1))
fig.show()
```

### Moving Averages

_new in v5.2_

Plotly Express can leverage Pandas' [`rolling`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.rolling.html), [`ewm`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.ewm.html) and [`expanding`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.expanding.html) functions in trendlines as well, for example to display moving averages. Values passed to `trendline_options` are passed directly to the underlying Pandas function (with the exception of the `function` and `function_options` keys, see below).

```python
import plotly.express as px

df = px.data.stocks(datetimes=True)
fig = px.scatter(df, x="date", y="GOOG", trendline="rolling", trendline_options=dict(window=5),
title="5-point moving average")
fig.show()
```

```python
import plotly.express as px

df = px.data.stocks(datetimes=True)
fig = px.scatter(df, x="date", y="GOOG", trendline="ewm", trendline_options=dict(halflife=2),
title="Exponentially-weighted moving average (halflife of 2 points)")
fig.show()
```

```python
import plotly.express as px

df = px.data.stocks(datetimes=True)
fig = px.scatter(df, x="date", y="GOOG", trendline="expanding", title="Expanding mean")
fig.show()
```

### Other Functions

The `rolling`, `expanding` and `ewm` trendlines support other functions than the default `mean`, enabling, for example, a moving-median trendline, or an expanding-max trendline.

```python
import plotly.express as px

df = px.data.gapminder().query("year == 2007")
fig = px.scatter(df, x="gdpPercap", y="lifeExp", color="continent", trendline="lowess")
df = px.data.stocks(datetimes=True)
fig = px.scatter(df, x="date", y="GOOG", trendline="rolling", trendline_options=dict(function="median", window=5),
title="Rolling Median")
fig.show()
```

```python
import plotly.express as px

df = px.data.stocks(datetimes=True)
fig = px.scatter(df, x="date", y="GOOG", trendline="expanding", trendline_options=dict(function="max"),
title="Expanding Maximum")
fig.show()
```

In some cases, it is necessary to pass options into the underying Pandas function, for example the `std` parameter must be provided if the `win_type` argument to `rolling` is `"gaussian"`. This is possible with the `function_args` trendline option.

```python
import plotly.express as px

df = px.data.stocks(datetimes=True)
fig = px.scatter(df, x="date", y="GOOG", trendline="rolling",
trendline_options=dict(window=5, win_type="gaussian", function_args=dict(std=2)),
title="Rolling Mean with Gaussian Window")
fig.show()
```

### Displaying only the trendlines

In some cases, it may be desirable to show only the trendlines, by removing the scatter points.

```python
import plotly.express as px

df = px.data.stocks(indexed=True, datetimes=True)
fig = px.scatter(df, trendline="rolling", trendline_options=dict(window=5),
title="5-point moving average")
fig.data = [t for t in fig.data if t.mode == "lines"]
fig.update_traces(showlegend=True) #trendlines have showlegend=False by default
fig.show()
```

```python

```
3 changes: 2 additions & 1 deletion packages/python/plotly/plotly/express/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -60,7 +60,7 @@

from ._special_inputs import IdentityMap, Constant, Range # noqa: F401

from . import data, colors # noqa: F401
from . import data, colors, trendline_functions # noqa: F401

__all__ = [
"scatter",
Expand Down Expand Up @@ -100,6 +100,7 @@
"imshow",
"data",
"colors",
"trendline_functions",
"set_mapbox_access_token",
"get_trendline_results",
"IdentityMap",
Expand Down
4 changes: 4 additions & 0 deletions packages/python/plotly/plotly/express/_chart_types.py
Original file line number Diff line number Diff line change
Expand Up @@ -46,7 +46,9 @@ def scatter(
marginal_x=None,
marginal_y=None,
trendline=None,
trendline_options=None,
trendline_color_override=None,
trendline_scope="trace",
log_x=False,
log_y=False,
range_x=None,
Expand Down Expand Up @@ -90,7 +92,9 @@ def density_contour(
marginal_x=None,
marginal_y=None,
trendline=None,
trendline_options=None,
trendline_color_override=None,
trendline_scope="trace",
log_x=False,
log_y=False,
range_x=None,
Expand Down
Loading