linear regression with confidence bands #945
A few remarks after minimal testing. (I can explore variants in a PR to the PR; I'm just raising them as comments for now.)
Okay, I think I’ve done everything you asked! How does it look now? 😄
3734eb8 to ca8cbf8
What should happen with p = 0? The error message is inconsistent.
In https://observablehq.com/@fil/plot-regression-945 I've added a quick comparison with ggplot2: p = 0.05 (the current default) corresponds to ggplot2's level = 0.90, and p = 0.025 corresponds to level = 0.95 (the ggplot2 default). I suggest changing the option name and default so that we use the same definitions as ggplot2. That would also make the mark easier to document, because we wouldn't have to go into details about Student's t-test, cumulative distributions, etc. It seems easier to write "the band within which the linear relation lies with 95% confidence".
I've added a bit of documentation, but I don't know how to describe the band simply in terms of p.
I’ve replaced the p option with the more understandable ci option, representing the confidence level in [0, 1). This corresponds to ggplot2’s level option. (I think “ci” is more self-describing than “level”.) Also, I think there is a bug in Torben’s notebook: a confidence level of ci = 0.95 corresponds to the old p = 0.025, not 0.05. I’ve confirmed this by visually comparing against ggplot2’s behavior, using the mtcars notebook based on a blog post by Thomas Neitmann.
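Assuming, as the ggplot2 comparison in this thread indicates, that the old p was the probability left in each tail of the interval, the two options are related by p = (1 − ci) / 2. Here is a small Python sketch of that conversion; the function names are hypothetical and not part of Plot. It also shows why ci is restricted to [0, 1): ci = 1 would require p = 0, which means an infinitely wide band.

```python
# Hypothetical helpers for illustration; not part of Plot's API.


def ci_to_tail_probability(ci: float) -> float:
    """Probability left in *each* tail by a two-sided interval with confidence level ci."""
    if not 0.0 <= ci < 1.0:
        # ci = 1 would need p = 0, i.e. an infinite critical value and an unbounded band.
        raise ValueError(f"ci must be in [0, 1), got {ci!r}")
    return (1.0 - ci) / 2.0


def tail_probability_to_ci(p: float) -> float:
    """Inverse mapping: per-tail probability p in (0, 0.5] back to a confidence level."""
    if not 0.0 < p <= 0.5:
        raise ValueError(f"p must be in (0, 0.5], got {p!r}")
    return 1.0 - 2.0 * p


print(round(ci_to_tail_probability(0.95), 10))  # 0.025
print(round(tail_probability_to_ci(0.05), 10))  # 0.9
```

So the old default p = 0.05 maps to ci = 0.90, and ggplot2's default level = 0.95 maps to p = 0.025, matching the comparison above.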
Thanks for implementing confidence intervals, really fantastic work! As a note, to clear up any confusion (I recognize you may have already figured this out, but in case anyone else comes across this): the confidence level C, which often defaults to 95%, corresponds to p = .05. Above the regression line you have half of the interval, which leaves p = .025 in that tail, and likewise below the line. I suspect that's the source of the uncertainty about .025 mentioned in some places. And FWIW, I think using "ci" instead of "p" was a good idea. If it's useful for your documentation, or for anyone else reading, there is an excellent paper with several recommendations for simple but accurate ways to describe the CI:
Inspired by https://observablehq.com/@toja/linear-regression-with-confidence-bands, using jStat for probability functions. Fixes #168.
TODO