Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[715] Adding Mr P docs #716

Merged
merged 7 commits into from
Sep 19, 2023
Merged

[715] Adding Mr P docs #716

merged 7 commits into from
Sep 19, 2023

Conversation

NathanielF
Copy link
Contributor

@NathanielF NathanielF commented Sep 3, 2023

Mister P

  • This is a PR trying to explain and illustrate the technique and value of Multilevel regression and post-stratification techniques using bambi. It is related to issue 715.

The structure of the notebook is seperated in two parts: (a) a refresher on how to think about the properties of the regression modelling as a method for generating an appropriately weighted average across the conditional strata in your in X matrix. We demonstrate this in some detail because MrP can be viewed as a correction to the effects of regression weighting when regression is applied to non-representative data.

We then replicate the kind of analysis achieved with R here

Signed-off-by: Nathaniel <NathanielF@users.noreply.github.com>
@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@NathanielF NathanielF mentioned this pull request Sep 3, 2023
@NathanielF
Copy link
Contributor Author

Quarto site seems to render ok too:
image

@NathanielF
Copy link
Contributor Author

Issue with new contrast function:
image

@NathanielF
Copy link
Contributor Author

issue with new predictions function:
image

@tomicapretto
Copy link
Collaborator

@NathanielF thanks a lot for proactively writing this notebook! I'm adding some minor comments.

docs/notebooks/mister_p.ipynb Show resolved Hide resolved
docs/notebooks/mister_p.ipynb Show resolved Hide resolved
docs/notebooks/mister_p.ipynb Show resolved Hide resolved
@tomicapretto
Copy link
Collaborator

issue with new predictions function: image

I'll try to review this ASAP

Signed-off-by: Nathaniel <NathanielF@users.noreply.github.com>
Signed-off-by: Nathaniel <NathanielF@users.noreply.github.com>
@NathanielF NathanielF marked this pull request as ready for review September 9, 2023 22:07
@NathanielF
Copy link
Contributor Author

I've added some usages of the new plot comparisons and comparisons functionality to this notebook. I specified a base model without the (1 | edu) syntax and demonstrated using this new functionality how the variables of race and education interact within states and used this to justify the complexity of our final model. I think i'm happy enough with the notebook as it stands so i'm marking this as ready for review.

@GStechschulte
Copy link
Collaborator

issue with new predictions function: image

I'll try to review this ASAP

I am back from holidays and should be able to review this issue this week.

@NathanielF
Copy link
Contributor Author

Sorry @GStechschulte I must have deleted my earlier comment. Let me know if you have any questions about the errors i found. They only seemed to occur in my "complex" model where i had allot of group specfic intercept terms (1 | group).

@tomicapretto
Copy link
Collaborator

issue with new predictions function: image

I'll try to review this ASAP

I am back from holidays and should be able to review this issue this week.

@GStechschulte I did a dirty and quick check (translation: added a print before .predict() is called inside predictions()) and saw the dataframe that is passed to predict does not contain the variable state.

image
image

@GStechschulte
Copy link
Collaborator

GStechschulte commented Sep 13, 2023

The problem is that no default values are being computed for model terms specified as group level effects. This is because group level effects do not have a components attribute, and thus never satisfy the logic in the set_default_values() function of ../interpret/utils.py.

For example, the output of the for loop in utils.py with print statements added:

# ../interpret/utils.py
    for term in terms.values():
        print(f"term variable names: {term.var_names}")
        if hasattr(term, "components"):
            print(f"component variable names: {term.var_names}")
            for component in term.components:
                # If the component is a function call, use the argument names
                if isinstance(component, Call):
                    names = [arg.name for arg in component.call.args]
                else:
                    names = [component.name]
                for name in names:
                    if name not in data_dict:
                        # For numeric predictors, select the mean.
                        if component.kind == "numeric":
                            data_dict[name] = np.mean(model.data[name])
                        # For categoric predictors, select the most frequent level.
                        elif component.kind == "categoric":
                            data_dict[name] = mode(model.data[name])
term variable names: set()
term variable names: {'male'}
component variable names: {'male'}
term variable names: {'repvote'}
component variable names: {'repvote'}
term variable names: {'state'}
term variable names: {'eth'}
term variable names: {'edu'}
term variable names: {'male', 'eth'}
term variable names: {'age', 'edu'}
term variable names: {'edu', 'eth'}

The created cap_data indeed includes the user provided predictors and repvote and male. However, no default values are computed for terms that do not have the attribute component.

Group specific effects do have attributes such as expr, factor, and groups that could be used to identify such terms. However, this is still problematic since it is not a component, the kind attribute (used to determine the data type) is intercept or slope.

Furthermore, it is possible to specify interaction terms within the group level effect. This would require us to find the unique set of term names defined by the modeller to ensure default values are not computed more than once.

I am currently attempting to find a solution.

@GStechschulte
Copy link
Collaborator

GStechschulte commented Sep 15, 2023

predictions, comparisons, and slopes now works for models with common and group specific effects as the bug for computing default values for group specific effects is resolved.

Now, to compute default values, all unique common and group specific effects variable names are identified and put into a list. From there, the data type is determined by identifying the data types used to fit the model.

However, with group specific effects, when the user (or default values are computed) passes values, it is possible these values are new unseen data. Thus, I leverage PR #693 and add the arg sample_new_groups to the functions predictions, comparisons, and slopes (and the associated plotting functions) to allow plotting of new groups.

The code and plots below are for the hierarchical model with group specific effects.

bmb.interpret.plot_predictions(
    model=model_hierarchical,
    idata=result_hierarchical,
    covariates=["edu", "eth"],
    sample_new_groups=True,
    pps=False,
    legend=False,
    fig_kwargs={"figsize": (12, 5), "sharey": True}
)

image

fig, ax = bmb.interpret.plot_comparisons(
    model=model_hierarchical,
    idata=result_hierarchical,
    contrast={"eth": ["Black", "White"]},
    conditional=["age", "edu"],
    comparison_type="diff",
    sample_new_groups=True,
    subplot_kwargs={"main": "age", "group": "edu"},
    fig_kwargs={"figsize": (7, 3), "sharey": True},
    legend=True,
)

image

@NathanielF I will create a separate PR today, and then you can pull these changes into your branch 👍🏼

@NathanielF
Copy link
Contributor Author

Great!

@GStechschulte
Copy link
Collaborator

Great!

Changes are now up in PR #721

Signed-off-by: Nathaniel <NathanielF@users.noreply.github.com>
@NathanielF
Copy link
Contributor Author

I've tightened some of the writing and updated some of the plots. The broad thrust of the article is where I want it to be.

Moving from a discussion of regression as a means for automating stratum specific effect modification into a discussion of MrP as a technique for recovering the correct modification effects.

Throughout, I make use of the new model interepretability functions to emphasise the variation of response across different strata. I think the themes of the piece tie together well with a demonstration of these new functions.

On reflection I think the way I've plotted the output of the final model doesn't need the new plotting functions because in a sense the plots I need require I adjust the model, not plot the predictions as they are.

I'm happy enough with it now as it stands, but would like a review of the content - maybe in particular with an emphasis on what aspects of the new functionality I could add to tie things together better... @tomicapretto , @GStechschulte

@tomicapretto
Copy link
Collaborator

The error in the test is because the 1.26 release of NumPy. See pymc-devs/pytensor#441

@@ -0,0 +1,8615 @@
{
Copy link
Collaborator

@tomicapretto tomicapretto Sep 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: warnings is part of the standard library, can you import like this?

import warnings

import arviz as az

import bambi as bmb

import matplotlib.pyplot as plt

import numpy as np

import pandas as pd

import pymc as pm


Reply via ReviewNB

@@ -0,0 +1,8615 @@
{
Copy link
Collaborator

@tomicapretto tomicapretto Sep 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can it explain the coding of the data? For example, what is the meaning of Risk_Strata when it's 0 or 1. Similar for Outcome and Treatment (this last one is easier to infer)


Reply via ReviewNB

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added above.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great!

@@ -0,0 +1,8615 @@
{
Copy link
Collaborator

@tomicapretto tomicapretto Sep 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

others may have the same question as me: where does the "0.25" come from?


Reply via ReviewNB

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Separated out the calculation of mean outcome from weighted average calc to make this clearer

@tomicapretto
Copy link
Collaborator

@NathanielF thanks a lot for this great example. Couldn't finish the review now but I'll try to do it later today!

@@ -0,0 +1,8615 @@
{
Copy link
Collaborator

@tomicapretto tomicapretto Sep 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can I ask for one or two sentences explaining what you're computing here and what's the information in the table? I can follow it because I'm familiar with what's going on, but I think it would be better if it was more digested for users.


Reply via ReviewNB

@@ -0,0 +1,8615 @@
{
Copy link
Collaborator

@tomicapretto tomicapretto Sep 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove the %%capture so users see how much time it took?


Reply via ReviewNB

docs/notebooks/mister_p.ipynb Show resolved Hide resolved
docs/notebooks/mister_p.ipynb Show resolved Hide resolved
docs/notebooks/mister_p.ipynb Show resolved Hide resolved
@tomicapretto
Copy link
Collaborator

@NathanielF the notebook is in great shape. It's a lot of great work. I'm still requesting some changes. Many of them are minor. In most cases I'm asking to expand a bit on the explanations/interpretations. Once that is done this will be ready to be merged. Thanks a lot!

@NathanielF
Copy link
Contributor Author

Great to hear. Will adjust today

Signed-off-by: Nathaniel <NathanielF@users.noreply.github.com>
Signed-off-by: Nathaniel <NathanielF@users.noreply.github.com>
@tomicapretto
Copy link
Collaborator

@NathanielF thanks for all the updates! If you confirm you're not modifying it anymore, I can merge.

@NathanielF
Copy link
Contributor Author

NathanielF commented Sep 19, 2023

Thank you! Confirmed

Signed-off-by: Nathaniel <NathanielF@users.noreply.github.com>
@tomicapretto tomicapretto merged commit bbc2cb1 into bambinos:main Sep 19, 2023
1 of 4 checks passed
GStechschulte pushed a commit to GStechschulte/bambi that referenced this pull request Oct 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants