Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add type_density() function #284

Merged
merged 13 commits into from
Jan 9, 2025
Merged

Add type_density() function #284

merged 13 commits into from
Jan 9, 2025

Conversation

grantmcdermott
Copy link
Owner

@grantmcdermott grantmcdermott commented Dec 27, 2024

Closes #194
Closes #243

We might need some back and forth to get this over the line. I'm about to head out for a cabin break without my laptop, although I will have intermittent cell signal. Hopefully I can respond to general review questions.

Some high-level notes:

  1. I've overhauled (simplified) the joint bandwidth logic, following on from our discussion in type_density() using the new type_data/type_draw system #243 (comment). The tl;dr version is that I've gone for option (B) re-using the same bandwidth that we would obtain from the full dataset. I'm not wedded to this approach, but it was the simplest and I wanted to get the ball rolling for discussion/review on this PR. It does end up breaking a lot of tests, though. (Sorry if that adds to the review noise.)

  2. I decided to keep the tinyplot.density method, but only for simple cases, i.e., no groups or facets. Passing either groups or facets through this method now triggers an error with a prompt to rather use the dedicated type = type_density() / type = "density" approach.

  3. There are a couple of TODO / FIXME notes still in the code, which I believe we can resolve once this type_density() refactoring goes through. One is for the tinyplot_add() call here (@vincentarelbundock I think this is you; please confirm if you can).

@grantmcdermott grantmcdermott requested review from vincentarelbundock and zeileis and removed request for vincentarelbundock December 27, 2024 19:35
@vincentarelbundock
Copy link
Collaborator

I left a few minor comments but don't have much to add. This looks really nice. Feels simpler and cleaner.

I don't have a view on default bandwidth selection (global vs. individual), but having the argument to control this is nice.

@grantmcdermott
Copy link
Owner Author

grantmcdermott commented Jan 7, 2025

Hi @zeileis just a gentle push about the joint bandwith issue above, since I see you're back from vacation ;-) Again, I've laid out four options here:

  • (A) Don't do any joint bandwidth calculation (i.e., just use the individual bandwidths calculated independently for each sub group).

  • (B) Re-use the same bandwidth that we would obtain from the full dataset.

  • (C) Take the mean of the individual bandwidths.

  • (D) Take the weighted mean of the individual bandwidths (where the weights indicate relative number of observations in each sub group).

For this PR, I've gone with option (B). But users can also opt into option (A) if they've like.

- add joint.bw arg for override
- reword other sections while we're at it
@grantmcdermott
Copy link
Owner Author

Thanks both. I've incorporated your suggestions and will merge once the checks pass.

One quick example: Based on our discussion in #243, the default for grouped densities is now to use individual bandwidths (also consistent with ggplot2). But users can override and select a joint bandwidth using either the full dataset or one computed as the observation-weighted mean.

tinyplot(~Sepal.Length | Species, data = iris, type = "density") # "none" (default)
tinyplot_add(type = type_density(joint.bw = "full"), lty = 2)    # full dataset
tinyplot_add(type = type_density(joint.bw = "owm"), lty = 3)     # obs-weighted mean
legend("topright", c("None", "Full", "OWM"), lty = 1:3, title = "Joint BW")

@grantmcdermott grantmcdermott merged commit c376b59 into main Jan 9, 2025
3 checks passed
@grantmcdermott grantmcdermott deleted the type_density_data branch January 22, 2025 17:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants