
Fix PRB intercept and CI sampling #27

Merged: dfsnow merged 5 commits from dfsnow/fix-prb-formula into main on Nov 27, 2024

Conversation

dfsnow (Member) commented Nov 27, 2024

This PR fixes two bugs, one longstanding and the other introduced by #24.

  1. The prb() formula specified in statsmodels OLS was missing an intercept term. statsmodels bizarrely doesn't add an intercept by default, unlike R's lm().
  2. The sample size (n) specified in boot_ci() was incorrectly based on df.size, rather than len(df).

Further, I screwed up the CI test results by mis-specifying the alpha values. I updated the CI tests to use (mostly) the test results from assessr.

@@ -60,13 +61,13 @@ def boot_ci(
raise ValueError("'nboot' must be a positive integer greater than 0.")
check_inputs(estimate, sale_price)
df = pd.DataFrame({"estimate": estimate, "sale_price": sale_price})
- n: int = df.size
+ n: int = len(df)
dfsnow (Member Author):

df.size is the total number of elements in the DataFrame (rows × columns), not the number of rows 🤦
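A minimal illustration of the difference (made-up values, not from the package):

import pandas as pd

# Three rows, two columns: df.size counts every element, len(df) counts rows
df = pd.DataFrame({"estimate": [1, 2, 3], "sale_price": [4, 5, 6]})
print(df.size)   # 6 (3 rows * 2 columns)
print(len(df))   # 3 (number of rows)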

jeancochrane:

My bad for missing this as well!

@@ -122,6 +123,6 @@ def prb_ci(
:func:`boot_ci`
"""
prb_model = _calculate_prb(estimate, sale_price)
- prb_ci = prb_model.conf_int(alpha=alpha)[0].tolist()
+ prb_ci = prb_model.conf_int(alpha=alpha)[1].tolist()
dfsnow (Member Author):

This index also needs to change since 0 now specifies the intercept.

jeancochrane commented Nov 27, 2024:

[Question, non-blocking] I'm a little bit confused by this change. I get why we need to adjust the index into prb_model.params in metrics.prb(), since it makes sense for params to include the intercept, but why is the intercept the first element of the return value for conf_int()? Is there a situation in which it makes sense for a confidence interval to include the intercept of the model? Or is it just that we've manually added a constant to the model, so it gets propagated to all return values for the model's results?

dfsnow (Member Author):

You can have a confidence interval on $b_0$ as well as $b_1$; it makes perfect sense that statsmodels would return it.
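As a rough sketch on synthetic data (not the package's inputs): conf_int() returns one [lower, upper] row per fitted parameter, intercept first, so the slope's interval is row 1.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 + 0.5 * x + rng.normal(scale=0.1, size=100)

model = sm.OLS(endog=y, exog=sm.add_constant(x)).fit()
ci = model.conf_int(alpha=0.05)  # shape (2, 2): one row per parameter
print(ci[0])  # CI for the intercept (b0)
print(ci[1])  # CI for the slope (b1)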

@@ -8,33 +8,33 @@ class TestCI:
def metric(self, request):
return request.param

- @pt.fixture(params=[0.80, 0.90, 0.95])
+ @pt.fixture(params=[0.50, 0.20, 0.10, 0.05])
dfsnow (Member Author):

I'm big dumb and specified the alpha values as 1 - alpha, thinking "This is the range of the CI I want i.e. 95%." I fixed the values and pulled the test results used in assessr.
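For reference, alpha here is the excluded tail mass, not the coverage, so the new fixture values map to 50%, 80%, 90%, and 95% intervals respectively. For example:

prb_model.conf_int(alpha=0.05)  # 95% confidence interval
prb_model.conf_int(alpha=0.50)  # 50% confidence interval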

@@ -18,7 +18,7 @@ def test_metric_value_is_correct(self, metric, metric_val):
expected = {
"cod": 17.81456901196891,
"prd": 1.0484192615223522,
"prb": 0.0009470721642262903,
"prb": 0.0024757,
dfsnow (Member Author):

This is the output value in assessr, so now both packages match.

Comment on lines +135 to +137
prb_model = sm.OLS(
endog=lhs.to_numpy(), exog=sm.tools.tools.add_constant(rhs.to_numpy())
).fit(method="qr")
dfsnow (Member Author) commented Nov 27, 2024:

This was the crux of the PRB model issue. statsmodels doesn't add an intercept by default (unlike R's lm()). The method change to "qr" was just to match R, but doesn't seem to actually make a difference.
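A minimal sketch of the difference on synthetic data (not the package's _calculate_prb inputs):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.normal(size=50)
y = 3.0 + 0.25 * x + rng.normal(scale=0.1, size=50)

# Without add_constant, statsmodels fits y = b1 * x (no intercept),
# the equivalent of R's lm(y ~ x - 1) rather than lm(y ~ x)
no_intercept = sm.OLS(endog=y, exog=x).fit()

# With add_constant, the fit matches R's lm(y ~ x):
# params[0] is the intercept, params[1] is the slope
with_intercept = sm.OLS(endog=y, exog=sm.add_constant(x)).fit()
print(no_intercept.params)    # one coefficient (slope only)
print(with_intercept.params)  # two coefficients: [intercept, slope]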

jeancochrane:

[Praise] Nice catch! This is a super annoying interface.

dfsnow self-assigned this Nov 27, 2024
dfsnow requested a review from jeancochrane November 27, 2024 17:33
dfsnow marked this pull request as ready for review November 27, 2024 17:33
dfsnow requested a review from wrridgeway as a code owner November 27, 2024 17:33
jeancochrane left a comment:

Some very tricky problems here, thanks for fixing 😅 I'm a little bit confused why we need to change the way we index into the return value from prb_model.conf_int(), but if you feel confident in that change, feel free to merge!


@@ -178,7 +180,7 @@ def prb(
ap.prb(ap.ccao_sample().estimate, ap.ccao_sample().sale_price)
"""
prb_model = _calculate_prb(estimate, sale_price)
- prb = float(prb_model.params[0])
+ prb = float(prb_model.params[1])

jeancochrane:

[Suggestion, non-blocking] Since the order of coefficients has proven a little bit tricky, perhaps we can persist a comment explaining our choice?

Suggested change:
- prb = float(prb_model.params[1])
+ # Get the coefficient from the OLS model.
+ # We select element 1, since element 0 is the intercept
+ prb = float(prb_model.params[1])

dfsnow (Member Author):

Done in 065dfcf!

wrridgeway (Member) left a comment:

Looks great to me, @jeancochrane already asked the interesting q's. Thank you both @dfsnow and @Damonamajor for catching this.

dfsnow merged commit 002ce64 into main Nov 27, 2024
14 checks passed
dfsnow deleted the dfsnow/fix-prb-formula branch November 27, 2024 18:50