Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ggtrace_inspect_(before/after)_(stat/geom/scale) functions? #97

Closed
JoFrhwld opened this issue Jan 23, 2023 · 6 comments
Closed

ggtrace_inspect_(before/after)_(stat/geom/scale) functions? #97

JoFrhwld opened this issue Jan 23, 2023 · 6 comments
Labels
enhancement New feature or request

Comments

@JoFrhwld
Copy link

In line with your rstudio::conf talk, functions named for (before/after)_(stat/scale) could add even one more beginning step to onboarding interested users to the {ggplot2} internals without needing to remember whether it’s ggtrace_inspect_args() or ggtrace_inspect_return() they ought to use in combination with which internals. If I followed, it they would be something like :

  • ggtrace_inspect_before_stat() = ggtrace_inspect_args(method = ggplot2::: Layer$compute_statistic, …)$data
  • ggtrace_inspect_after_stat() = ggtrace_inspect_return(method = ggplot2:::Layer$compute statistic, …)$data
  • ggtrace_inspect_before_geom() = ggtrace_inspect_args(method = ggplot2:::Layer$compute_geom_1, …)$data
  • ggtrace_inspect_after_scale() = ggtrace_inspect_return(method = ggplot2:::Layer$compute_geom_2, …)$data

In the interest of eventually moving people to ggtrace_inspect_(args/return), maybe these functions could print a message about which {ggplot2} internal method was just inspected? The relationship between before/after and args/return seems clear enough.

@yjunechoe
Copy link
Owner

Thanks for the suggestion! Something like this is definitely planned for the future - part of the reason for the delay was that I wanted to make sure that specifically those four stages are sufficiently motivated before I "export those concepts", so to speak.

I think there are a few considerations to iron out, although in hindsight they're pretty trivial. I'm writing some down here for my own future reference (but feel free to also comment with your thoughts):

  • Should these have shorter names? ggtrace_inspect_before_stat() vs. inspect_before_stat(), etc.
  • Should they also come with a x = ggplot2::last_plot() default for convenience?
  • How should unexported ggplot objects like Layer be called? We could either store the full ggtrace_inspect call as an expression and simply eval the whole thing in the parent frame when the function is called, or do a work around and "export" it via asNamespace() or something (which ggtrace already does in a few places)
  • Does this also call for shorthand forms of other workflow functions, e.g. ggtrace_highjack_before_stat(value=)? (But I can come back to this later)

Anyways, I'm convinced that these kinds of convenience features are appropriate at this point of ggtrace's lifecycle (in fact I just added 2 such functions in the dev version), so I'll close this issue open when I add them.

@yjunechoe yjunechoe added the enhancement New feature or request label Jan 24, 2023
@aphalo
Copy link

aphalo commented Jan 24, 2023

I have a question: how fragile is your approach to changes in the code of 'ggplot2'?

Over the years, there have been a couple of big overhauls to 'ggplot2' code, the most recent one replacing the object system used and much of the internal code. Years back someone developed a nice interactive graphical display of the structure of 'gg' objects, that I still miss, but it stopped working with an update to 'ggplot2' and was abandoned soon.

When I developed 'gginnards' I tried to avoid depending on 'ggplot2' internals. Functionality of my package is quite limited, but until now maintenance has required very little of my time. As someone with long experience as a PhD supervisor, and as maintainer of several R packages, what I have learnt is that one needs to be extremely careful not to publish code that is fragile to what others may decide to do. Once a package is in CRAN, if it breaks because of another package's update, one needs to fix the problem sometimes in a couple of weeks for the package not to be archived. If your package becomes popular, as maintainer busy with something else, problems like this can cause quite a bit of stress.

Most likely you already know all this, but just in case, I thought best to share my experince.

By the way, I think the shorter names are better. One can always use ggtrace::inspect_before_stat() if needed.

@yjunechoe
Copy link
Owner

Thank you for sharing your thoughts - this is a hard question but it was indeed a big consideration in the design of ggtrace.

I unfortunately only have time for a shallow answer, which is that ggtrace remains largely agnostic to what happens where in the internals by making the user specify those details (for example, via the method argument of workflow functions). That of course means the user needs to be able to "speak the language of the internals" to some extent (namely, know which ggproto method does what), but that's an intentional design. It reflects my intent to make ggtrace a primarily pedagogical tool to help experiences users develop at least a conceptual understanding of the internals, without having to reason too much about the implementational details of the internals like ggproto, base R functions, vctrs, cli, and so on. I originally planned for ggtrace to be heavy on vignettes teaching ggplot2 internals, but I (fortunately?) got too occupied with talk prep and paper writing for ggtrace instead.

More experienced users can of course build on ggtrace functionalities to "extend" ggplot in unconventional ways, but it'd actually be on them to make sure their ggtrace code works across changes to the ggplot2 codebase. For example, one person wrote a function wrapping around ggtrace::with_ggtrace() to crop polar coordinate plots, and they do so by injecting a custom expression after the 4th line of the Layout$render() method when it's called while rendering the plot. That's extremely fragile, but it's a risk that users take on if they want to use ggtrace to hack the internals. Crucially, the ggtrace "meta-package" itself doesn't rely on the internals being in any particular state. In fact, it doesn't even rely on the internals being implemented in ggproto - ggtrace work for any functions/function-like objects (not just ggproto methods, despite the argument name):

# Example from `?stat_summary`
p <- ggplot(mtcars, aes(mpg, factor(cyl))) +
  geom_point() +
  stat_summary(fun.data = "mean_cl_boot", colour = "red", linewidth = 2, size = 3)
  
ggtrace_inspect_return(p, mean_cl_boot)
#>          y     ymin     ymax
#> 1 26.66364 24.29091 29.28273

This brings me back to the specific issue at hand, of exporting convenience functions like inspect_before_stat() that hardcodes method = Layer$compute_statistic. This goes against everything I said above about how I've designed ggtrace to be agnostic to the state of the codebase, but I'm comfortable making exceptions for these since they're a small, well-defined set, and because I hope to make ggtrace a bit more accessible than currently is. Of course I can't guarantee that I'll always maintain these four functions, but that's the plan (sorry, you've probably heard this a billion times from other devs). I'd be more cautious myself if I was exporting something that worked backwards from the ggplot grob to recover some value generated by a random internal function, but I think that the combination of targeting a Layer ggproto method (which is arguably the most "inert" of all ggproto methods) and surgically extracting its input/outputat the precise moment it's called should be sufficiently robust for at least minor version changes in the foreseeable future. I would love to hear more of your experiences on this point, though, as I did not live through the v2.0.0 ggproto overhaul and such in my R career so I don't know to what extent I should be aware of lurking major breaking changes.

Lastly on the note of CRAN, I currently don't have plans to until I'm convinced otherwise (especially given #92). Funnily, I think the fact that ggtrace is powered by base::trace() is a bigger concern to it's dangerousness and fragility than the fact that it can hack ggplot2 internals.

@aphalo
Copy link

aphalo commented Jan 25, 2023

Thanks for the detailed explanation! Your approach makes very good sense to me. I should play more with 'ggtrace' as I am sure I will find it useful for debugging. My experience with 'ggplot2' has been good, as Thomas Pedersen is very careful with testing and helpful when one needs a hand to solve problems caused by updates (or otherwise). I have had more difficulties with changes to tidyverse packages like 'lubridate', 'vctrs', 'tidyr', 'readr' and 'dplyr' as the maintainers have regularly replaced some of the functions and methods and/or modified interfaces and defaults. In a couple of cases functions in my packages continued "working" but started returning wrong results. It becomes time-consuming to maintain one's packages if one needs to test what version of an imported package is loaded and compute things differently. In most cases I ended just making my packages dependent on the newer version of the dependency.

@yjunechoe
Copy link
Owner

That makes sense and glad to hear that ggplot2 is at least more stable than others in the tidyverse (that was my impression as well). I haven't ran into version-dependency issues with ggtrace so far in the past year or so since its inception, so fingers crossed for the future! Thanks again for sharing your perspectives as an extension dev :)

@yjunechoe
Copy link
Owner

I implemented them in the sublayer-stages branch in layer_*() forms (ex: layer_before_stat()), to make their interface consistent with ggplot2::layer_data(). As such they take the identically-named arguments plot (=x) and i (=cond), along with some other stuff passed down to ggtrace_inspect*() calls.

They also gain a default of plot = last_plot() and i = 1L (same as layer_data()) so you can call them easily in interactive contexts. For convenience they return the layer dataframe as tibble and they also print the corresponding ggtrace code in a message.

If you have thoughts on the specific design, we can continue the discussion in a new issue!


A short demo:

library(ggplot2)
library(ggtrace)

p1 <- ggplot(mtcars, aes(as.factor(cyl))) +
  stat_count() +
  stat_count(
    geom = "label", size = 5,
    aes(label = after_stat(count)),
    position = position_nudge(y = 1)
  )
p1

layer_before_stat() # layer_before_stat(plot = last_plot(), i = 1)
#> ✔ Executed `ggtrace_inspect_args(last_plot(), ggplot2:::Layer$compute_statistic)$data`
#> # A tibble: 32 × 3
#>    x          PANEL group
#>    <mppd_dsc> <fct> <int>
#>  1 2          1         2
#>  2 2          1         2
#>  3 1          1         1
#>  4 2          1         2
#>  5 3          1         3
#>  6 2          1         2
#>  7 3          1         3
#>  8 1          1         1
#>  9 1          1         1
#> 10 2          1         2
#> # … with 22 more rows
layer_after_stat() # layer_after_stat(plot = last_plot(), i = 1)
#> ✔ Executed `ggtrace_inspect_return(last_plot(), ggplot2:::Layer$compute_statistic)`
#> # A tibble: 3 × 7
#>   count  prop x          width flipped_aes PANEL group
#>   <dbl> <dbl> <mppd_dsc> <dbl> <lgl>       <fct> <int>
#> 1    11     1 1            0.9 FALSE       1         1
#> 2     7     1 2            0.9 FALSE       1         2
#> 3    14     1 3            0.9 FALSE       1         3

identical(layer_after_stat(i = 1), layer_after_stat(i = 2))
#> ✔ Executed `ggtrace_inspect_return(last_plot(), ggplot2:::Layer$compute_statistic)`
#> ✔ Executed `ggtrace_inspect_return(last_plot(), ggplot2:::Layer$compute_statistic, cond = 2)`
#> [1] TRUE
waldo::compare(layer_before_geom(i = 1), layer_before_geom(i = 2))
#> ✔ Executed `ggtrace_inspect_args(last_plot(), ggplot2:::Layer$compute_geom_1)$data`
#> ✔ Executed `ggtrace_inspect_args(last_plot(), ggplot2:::Layer$compute_geom_1, cond = 2)$data`
#> `old` is length 8
#> `new` is length 9
#> 
#> `names(old)[1:3]`:         "y" "count" "prop"
#> `names(new)[1:4]`: "label" "y" "count" "prop"
#> 
#> `old$label` is absent
#> `new$label` is a double vector (11, 7, 14)
p2 <- ggplot(mpg, aes(class, hwy)) +
  geom_boxplot(aes(fill = stage(class, after_scale = alpha(fill, .5))))
p2

table( layer_before_stat()$fill )
#> ✔ Executed `ggtrace_inspect_args(last_plot(), ggplot2:::Layer$compute_statistic)$data`
#>    2seater    compact    midsize    minivan     pickup subcompact        suv 
#>          5         47         41         11         33         35         62 
layer_after_stat()$fill
#> ✔ Executed `ggtrace_inspect_return(last_plot(), ggplot2:::Layer$compute_statistic)`
#> [1] "2seater"    "compact"    "midsize"    "minivan"    "pickup"    
#> [6] "subcompact" "suv"
layer_before_geom()$fill
#> ✔ Executed `ggtrace_inspect_args(last_plot(), ggplot2:::Layer$compute_geom_1)$data`
#> [1] "2seater"    "compact"    "midsize"    "minivan"    "pickup"    
#> [6] "subcompact" "suv"
layer_after_scale()$fill
#> ✔ Executed `ggtrace_inspect_return(last_plot(), ggplot2:::Layer$compute_geom_2)`
#> [1] "#F8766D80" "#C49A0080" "#53B40080" "#00C09480" "#00B6EB80" "#A58AFF80"
#> [7] "#FB61D780"

identical(
  layer_after_scale(p2),
  layer_data(p2) |> tibble::as_tibble()
)
#> ✔ Executed `ggtrace_inspect_return(p2, ggplot2:::Layer$compute_geom_2)`
#> [1] TRUE

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants