
Add plot_cap (Plot Conditional Adjusted Predictions) #517

Merged: 11 commits merged into bambinos:main on Jun 6, 2022

Conversation

tomicapretto
Collaborator

This PR adds a new sub-package called plots. For now it contains a single function, plot_cap(), which is versatile and powerful. It is heavily inspired by the plot_cap() function in the R package {marginaleffects}.

I'll let the examples speak for themselves:

import arviz as az
import bambi as bmb
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

from bambi.plots import plot_cap

data = pd.read_csv("mtcars.csv")
data["cyl"] = data["cyl"].replace({4: "low", 6: "medium", 8: "high"})
data["gear"] = data["gear"].replace({3: "A", 4: "B", 5: "C"})
data["cyl"] = pd.Categorical(data["cyl"], categories=["low", "medium", "high"], ordered=True)

model = bmb.Model("mpg ~ 0 + hp * wt + cyl + gear", data)
idata = model.fit(draws=1000, target_accept=0.95, random_seed=1234)
  • One numerical covariate.
fig, ax = plt.subplots(figsize=(7, 5), dpi=120)
plot_cap(model, idata, "hp", ax=ax);


  • Two numerical covariates (the second is interpreted as a grouping variable; quantiles are used)
fig, ax = plt.subplots(figsize=(7, 5), dpi=120)
plot_cap(model, idata, ["hp", "wt"], ax=ax);


  • Main numerical covariate with a categorical grouping
fig, ax = plt.subplots(figsize=(7, 5), dpi=120)
plot_cap(model, idata, ["hp", "cyl"], ax=ax);


  • Main categorical covariate
fig, ax = plt.subplots(figsize=(7, 5), dpi=120)
plot_cap(model, idata, ["gear"], ax=ax);


  • Main categorical covariate with a categorical grouping
fig, ax = plt.subplots(figsize=(7, 5), dpi=120)
plot_cap(model, idata, ["gear", "cyl"], ax=ax);


  • Main categorical covariate with a numerical grouping
fig, ax = plt.subplots(figsize=(7, 5), dpi=120, tight_layout=True)
plot_cap(model, idata, ["gear", "wt"], ax=ax);



Now let's see another example, using logistic regression. This one is also borrowed from the {marginaleffects} documentation.

data = pd.read_csv("https://vincentarelbundock.github.io/Rdatasets/csv/ggplot2movies/movies.csv")

data["style"] = "Other"
data.loc[data["Action"] == 1, "style"] = "Action"
data.loc[data["Comedy"] == 1, "style"] = "Comedy"
data.loc[data["Drama"] == 1, "style"] = "Drama"
data["certified_fresh"] = (data["rating"] >= 8) * 1
data = data[data["length"] < 240]

priors = {"style": bmb.Prior("Normal", mu=0, sigma=2)}
model = bmb.Model("certified_fresh ~ 0 + length * style", data=data, priors=priors, family="bernoulli")
model
Formula: certified_fresh ~ 0 + length * style
Family name: Bernoulli
Link: logit
Observations: 58662
Priors:
  Common-level effects
    length ~ Normal(mu: 0.0, sigma: 0.0708)
    style ~ Normal(mu: 0, sigma: 2)
    length:style ~ Normal(mu: [0. 0. 0.], sigma: [0.0702 0.0509 0.0611])
idata = model.fit(random_seed=1234, target_accept=0.9, init="adapt_diag")
fig, ax = plt.subplots(figsize=(7, 5), dpi=120, tight_layout=True)
plot_cap(model, idata, "length", ax=ax)


fig, ax = plt.subplots(figsize=(9, 5), dpi=120, tight_layout=True)
plot_cap(model, idata, ["length", "style"], ax=ax)


Extra point: this model is an excellent example of how the default adapt_diag+jitter initialization sometimes isn't good. If we use adapt_diag+jitter, sampling never finishes, chains don't mix, and all sorts of problems appear. That's why the fit above uses init="adapt_diag".


I honestly think this is a very cool addition. I'd like to know your thoughts @aloctavodia @canyon289


@codecov-commenter

codecov-commenter commented Jun 4, 2022

Codecov Report

Merging #517 (5cf4958) into main (4886452) will decrease coverage by 3.98%.
The diff coverage is 0.00%.

❗ Current head 5cf4958 differs from pull request most recent head c886a16. Consider uploading reports for the commit c886a16 to get more accurate results

@@            Coverage Diff             @@
##             main     #517      +/-   ##
==========================================
- Coverage   90.86%   86.88%   -3.99%     
==========================================
  Files          29       32       +3     
  Lines        2442     2562     +120     
==========================================
+ Hits         2219     2226       +7     
- Misses        223      336     +113     
Impacted Files Coverage Δ
bambi/plots/__init__.py 0.00% <0.00%> (ø)
bambi/plots/plot_cap.py 0.00% <0.00%> (ø)
bambi/plots/utils.py 0.00% <0.00%> (ø)
bambi/backend/terms.py 96.22% <0.00%> (ø)
bambi/tests/test_built_models.py 98.91% <0.00%> (+<0.01%) ⬆️
bambi/backend/pymc.py 80.28% <0.00%> (+0.28%) ⬆️
bambi/backend/utils.py 90.00% <0.00%> (+1.11%) ⬆️
bambi/models.py 88.65% <0.00%> (+3.58%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@aloctavodia
Collaborator

Definitely a cool addition! I already want to try it! I left a couple of comments. I think this PR is good as is; a future addition could be one or more kwargs so users are able to fine-tune the plots.

@tomicapretto
Collaborator Author

Definitely a cool addition! I already want to try it! I left a couple of comments. I think this PR is good as is; a future addition could be one or more kwargs so users are able to fine-tune the plots.

Thanks for the prompt review. I agree this function should incorporate optional arguments in the future so users can tune more things. For example, it would be great to have another dimension mapped to the axes, so you can create plots with multiple panels. And we could let users choose whether a dimension is mapped to the color or to the axes.

One of the things I don't really like is functions with extremely long signatures... but that seems to be how we are used to working with Matplotlib, and I don't think we can do much to change it.

I think we could merge this as is (maybe after adding some tests? I'm not sure how to test plotting functions, by the way) and then iterate to refine how it works.

@canyon289
Collaborator

I'll review in a couple of hours; at a glance this already looks pretty cool.

@tomicapretto
Collaborator Author

Matplotlib is not in our dependencies, but it's an indirect dependency through ArviZ. Do you think we should change anything in our requirements.txt file?

@aloctavodia
Collaborator

I don't think we need to change our requirements.txt.

Comment on lines +157 to +162
lower_bound = round((1 - hdi_prob) / 2, 4)
upper_bound = 1 - lower_bound

y_hat = idata.posterior[f"{model.response.name}_mean"]
y_hat_mean = y_hat.mean(("chain", "draw"))
y_hat_bounds = y_hat.quantile(q=(lower_bound, upper_bound), dim=("chain", "draw"))
Collaborator

@aloctavodia aloctavodia Jun 5, 2022

Use az.hdi and pass hdi_prob directly to it. Additionally, we could offer the option to use quantiles or HDI, but HDI should probably be the default.
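For context, the difference between the two interval types can be sketched with plain NumPy. This is a minimal illustration, not Bambi's code; compute_hdi is a hypothetical helper written out only to show what az.hdi computes (in practice you would call az.hdi directly):

```python
import numpy as np

def compute_hdi(draws, hdi_prob=0.94):
    """Shortest interval containing `hdi_prob` of the draws.
    Hypothetical helper for illustration; ArviZ's az.hdi does this for real use."""
    sorted_draws = np.sort(draws)
    n = len(sorted_draws)
    window = int(np.floor(hdi_prob * n))
    # Width of every candidate interval that contains `window` consecutive draws
    widths = sorted_draws[window:] - sorted_draws[: n - window]
    i = np.argmin(widths)
    return sorted_draws[i], sorted_draws[i + window]

rng = np.random.default_rng(1234)
draws = rng.gamma(2.0, size=10_000)  # a skewed "posterior"

# Equal-tailed interval (what the quantile-based code in this PR computes)
lower, upper = np.quantile(draws, [0.03, 0.97])
# Highest density interval (shortest interval with the same probability mass)
hdi_lower, hdi_upper = compute_hdi(draws, hdi_prob=0.94)
```

For a skewed unimodal distribution like this one, the HDI shifts toward the mode and is narrower than the equal-tailed interval, which is the usual argument for preferring it as a default.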

Collaborator

Agree with this

Collaborator Author

What are the advantages of HDI? I recall this discussion arviz-devs/arviz#2021 where HDI can produce unexpected results.

Collaborator

Consistency: we use HDI everywhere in ArviZ. Having both is probably the better option.

@canyon289
Collaborator

canyon289 commented Jun 5, 2022

I think we could merge this as it is (maybe after adding some tests? I'm not sure how to test plotting functions btw) and then iterate to refine how it works.
Testing plotting functions is challenging. One level is just testing that it runs with reasonable parameters, which I suggest doing. In ArviZ, another (hacky) way we do it is by having the plotting functions save their output when a keyword is passed, and inspecting it manually.

For extra kwargs, I agree with Tomas: it's hard to anticipate everything without adding a ton of kwargs. Maybe we can just merge as is and add flexibility as needed in future PRs, once we run into cases in actual usage.

@aloctavodia aloctavodia merged commit 62fed83 into bambinos:main Jun 6, 2022
@tomicapretto tomicapretto deleted the plots branch June 7, 2022 11:42
lower_bound = round((1 - hdi_prob) / 2, 4)
upper_bound = 1 - lower_bound

y_hat = idata.posterior[f"{model.response.name}_mean"]

@vishalthatsme

Love plot_cap so far! One potential issue here: after using plot_cap once, I realized the function adds 200 (grid_n) new entries to the idata's posterior data variables. The issue is that running az.summary(idata) after plot_cap will include all of those y_hat_mean values (as would any other ArviZ function, like plot_trace).
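The inplace contract under discussion can be illustrated with a minimal stand-in. This is plain Python, not Bambi's actual predict; the dict plays the role of idata and its "posterior" entry, purely to show the behavior inplace=False is supposed to guarantee:

```python
import copy

def predict(idata, inplace=True):
    """Toy stand-in for Model.predict: adds a fitted-mean variable.
    With inplace=False the caller's object must stay untouched."""
    target = idata if inplace else copy.deepcopy(idata)
    target["posterior"]["y_mean"] = [0.5, 0.6, 0.7]
    return None if inplace else target

idata = {"posterior": {"y": [1, 2, 3]}}

# inplace=False should return a modified copy and leave the original clean
new_idata = predict(idata, inplace=False)
assert "y_mean" not in idata["posterior"]   # original untouched
assert "y_mean" in new_idata["posterior"]   # copy got the new variable
```

The bug report above is consistent with the copy step being skipped, so the new variables leak into the shared idata and show up in az.summary.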

Collaborator Author

Oh! That should be a problem with the .predict() method ignoring the inplace=False argument. Could you please open an issue with a minimal reproducible example that shows the problem? Thanks @vishalthatsme!
