pseudo_log_trans could provide better default breaks #219

Closed
bjedwards opened this issue Aug 29, 2019 · 4 comments

@bjedwards

By default, at least for base=10, pseudo_log_trans doesn't provide great default breaks.

Example:

library(tidyverse)
library(scales)

# Random Log Normal
set.seed(8675309)
data <- tibble(x = rlnorm(1000, 0, 2))

ggplot(data, aes(x = x)) +
  geom_histogram() + 
  scale_x_continuous(trans = pseudo_log_trans(0.001, 10))

Which results in:
image

Setting them to something more reasonable:

ggplot(data, aes(x = x)) +
  geom_histogram() + 
  scale_x_continuous(trans = pseudo_log_trans(0.001, 10),
                     breaks=c(0, 0.01, 0.1, 1, 10, 100))

provides a slightly better set of breaks:

image

I thought that trans_new would be the solution and tried:

plt <- function(sigma = 1, base = exp(1)) {
  trans_new(
    "pseudo_log",
    function(x) asinh(x / (2 * sigma)) / log(base),
    function(x) 2 * sigma * sinh(x * log(base)),
    breaks = log_breaks(n=5, base=base)
  )
}

ggplot(data, aes(x = x)) +
  geom_histogram() + 
  scale_x_continuous(trans = plt(0.001, 10))

But this raises the following error:

Error in if (max == min) return(base^min) : missing value where TRUE/FALSE needed

The error is raised inside log_breaks.

Not sure what the best path forward is, or whether this is actually expected behavior.
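
A likely cause, offered as an assumption rather than something confirmed in this thread: the breaks function receives limits in data space, and once the default expansion is applied the lower limit can fall below zero. log_breaks takes a logarithm of those limits, so a negative value produces NaN, and the `max == min` comparison then yields NA. The base R sketch below reproduces only that mechanism without calling scales; the limits `c(-0.5, 100)` are made up for illustration.

```r
# Hypothetical data-space limits whose lower end has dipped below zero.
limits <- c(-0.5, 100)

# Taking a log of a negative number gives NaN (normally with a warning).
rng <- suppressWarnings(log(limits, base = 10))
lo <- floor(rng[1])    # NaN
hi <- ceiling(rng[2])  # 2

# Comparing with NaN yields NA, and `if (NA)` is an error in R.
outcome <- tryCatch(
  if (hi == lo) "equal" else "different",
  error = function(e) e
)
print(outcome)
```

If this is what happens, any breaks function that takes a plain logarithm of the limits would need to guard against non-positive values first.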

@hadley
Member

hadley commented Oct 25, 2019

Unfortunately this is pretty low priority for us; I'd definitely review a PR that implemented better breaks but due to the very specialised nature of the transformation it's unlikely to be something we'd work on.

@hadley hadley closed this as completed Oct 25, 2019
@jarodmeng

log_breaks would work if you turn off expand.

ggplot(data, aes(x = x)) +
  geom_histogram() + 
  scale_x_continuous(trans = pseudo_log_trans(0.001, 10),
                     breaks = log_breaks(base = 10),
                     expand = c(0, 0))

However, log_breaks still wouldn't work if the range of x covers negative values, so using log_breaks with expand = c(0, 0) only mitigates the reported issue; it does not fully solve it.
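
For ranges that do include negative values, one possible direction is a breaks function that mirrors powers of the base around zero. The sketch below is a hypothetical helper (`signed_log_breaks` is not part of scales), written in base R under the assumption that breaks smaller in magnitude than sigma are not useful because the pseudo-log scale is roughly linear there.

```r
# Hypothetical helper, not part of scales: powers of `base` mirrored around 0.
signed_log_breaks <- function(base = 10, sigma = 1) {
  force(base)
  force(sigma)
  function(x, ...) {
    rng <- suppressWarnings(range(x, na.rm = TRUE))
    if (any(!is.finite(rng))) return(numeric())

    max_abs <- max(abs(rng))
    # Smallest and largest exponents to use; the tiny tolerance guards
    # against floating-point error when a limit is an exact power of base.
    lo <- ceiling(log(sigma, base) - 1e-9)
    hi <- floor(log(max_abs, base) + 1e-9)
    mags <- if (hi < lo) numeric() else base^seq(lo, hi)

    # Mirror the magnitudes, add zero, and keep only breaks inside the range.
    candidates <- c(-rev(mags), 0, mags)
    candidates[candidates >= rng[1] & candidates <= rng[2]]
  }
}

b <- signed_log_breaks(base = 10, sigma = 1)(c(-150, 2500))
print(b)
```

It could be passed as `breaks = signed_log_breaks(10, 0.001)` alongside `trans = pseudo_log_trans(0.001, 10)`, although with a small sigma the breaks near zero may get crowded and need thinning.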

@jmpanfil

Can you clarify (or point me to the source code that shows) how trans and breaks work with each other? Is the data x sent through each independently, or does something like trans(x) become the input to the breaks function?
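
As a partial answer, here is a minimal sketch of the flow as I understand it; it is an assumption about how ggplot2 uses the transformation object, not a quote from its source. The transformation carries three functions: `transform`, `inverse`, and `breaks`. The breaks function appears to be called on limits in data space (the scale's transformed limits passed back through `inverse`), and the resulting breaks are then passed through `transform` to position them on the axis. The limits below are invented for illustration.

```r
library(scales)

tr <- pseudo_log_trans(sigma = 0.001, base = 10)

# Hypothetical limits in data space.
limits <- c(0, 100)

# Step 1: breaks are chosen from the data-space limits.
brks <- tr$breaks(limits)

# Step 2: the breaks are transformed to find their positions on the axis.
positions <- tr$transform(brks)

# transform and inverse round-trip, so positions map back to the breaks.
round_trip <- tr$inverse(positions)
print(data.frame(brks, positions, round_trip))
```

Under this reading, a breaks function never sees transformed values, which would explain why a log-based breaks function fails as soon as the data-space limits include zero or negative numbers.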

@rseiter

rseiter commented Nov 5, 2024

How about just modifying log_breaks to create something like this?

# Helper defined inside scales: forces evaluation of its arguments
force_all <- function(...) list(...)

# Pseudo-log transform: asinh-based, so it is defined for zero and negative values
pseudo_log <- function(val, sigma = 1, base = exp(1)) {
  asinh(val / (2 * sigma)) / log(base)
}

#' Pseudo log breaks (integer breaks on pseudo log-transformed scales).
#'
#' @param n desired number of breaks
#' @param sigma scale factor of pseudo logarithm to use
#' @param base base of pseudo logarithm to use
#' @export
pseudo_log_breaks <- function(n = 5, sigma = 1, base = 10) {
  force_all(n, sigma, base)
  n_default <- n
  function(x, n = n_default) {
    rng <- pseudo_log(range(x, na.rm = TRUE), sigma = sigma, base = base)
    min <- floor(rng[1])
    max <- ceiling(rng[2])

    if (max == min) return(base^min)

    by <- floor((max - min) / n) + 1
    breaks <- base^seq(min, max, by = by)
    relevant_breaks <- base^rng[1] <= breaks & breaks <= base^rng[2]
    if (sum(relevant_breaks) >= (n - 2)) return(breaks)

    # the easy solution to get more breaks is to decrease 'by'
    while (by > 1) {
      by <- by - 1
      breaks <- base^seq(min, max, by = by)
      relevant_breaks <- base^rng[1] <= breaks & breaks <= base^rng[2]
      if (sum(relevant_breaks) >= (n - 2)) return(breaks)
    }
    # log_sub_breaks is internal to scales (not exported)
    log_sub_breaks(rng, n = n, base = base)
  }
}

plt <- function(sigma = 1, base = exp(1)) {
  trans_new(
    "pseudo_log",
    function(x) asinh(x / (2 * sigma)) / log(base),
    function(x) 2 * sigma * sinh(x * log(base)),
    breaks = pseudo_log_breaks(n=5, sigma=sigma, base=base)
  )
}

ggplot(data, aes(x = x)) +
  geom_histogram() + 
  scale_x_continuous(trans = plt(0.001, 10))

Rplot_pseudo_log_breaks
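
One thing that may be worth checking in the function above (my own observation, not something raised in the thread): for values well above sigma, asinh(x / (2 * sigma)) is close to log(x / sigma), so the pseudo-log of x is roughly log(x / sigma, base) rather than log(x, base). Exponents taken from the pseudo-log range and fed to `base^...` could therefore sit a factor of about 1 / sigma away from the data. The snippet below is a small numeric check using the same `pseudo_log` definition; the sample values are arbitrary.

```r
# Same definition as in the comment above.
pseudo_log <- function(val, sigma = 1, base = exp(1)) {
  asinh(val / (2 * sigma)) / log(base)
}

sigma <- 0.001
x <- c(1, 10, 100)  # arbitrary values, all much larger than sigma

pl <- pseudo_log(x, sigma = sigma, base = 10)
shifted_log <- log10(x / sigma)

# The two columns should agree closely; plain log10(x) is offset by log10(1 / sigma).
print(data.frame(x, pl, shifted_log, plain_log = log10(x)))
```

If the offset matters for a given sigma, mapping the integer exponents back through the inverse transformation, 2 * sigma * sinh(k * log(base)), instead of `base^k` might be one way to keep the breaks aligned with the data.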
