Boxplot outliers are shown in black using ggplotly #1114

Open
dfrail24 opened this issue Sep 8, 2017 · 17 comments

@dfrail24

dfrail24 commented Sep 8, 2017

If I create a boxplot in ggplot2 and convert it using the ggplotly command, the outliers are outlined in black.
Here is a simple example:

library(ggplot2)
library(plotly)

p <- ggplot(mpg, aes(class, hwy))
g <- p + geom_boxplot(aes(colour = "red"))
ggplotly(g)

ggplot would show this chart:
image

whereas plotly would show this chart:
image

Is this something that can be fixed?

@jonocarroll

This also persists when the outliers should be discarded, including in the examples:

library(plotly)
set.seed(123)

df <- diamonds[sample(1:nrow(diamonds), size = 1000),]

p <- ggplot(df, aes(cut, price, fill = cut)) + 
  geom_boxplot(outlier.shape = NA) + 
  ggtitle("Ignore outliers in ggplot2")

# Need to modify the plotly object and make outlier points have opacity equal to 0
p <- plotly_build(p)

p$data <- lapply(p$data, FUN = function(x){
  x$marker = list(opacity = 0)
  return(x)
})

p

screen shot 2017-11-17 at 6 54 06 pm

@petehilljnr

petehilljnr commented Apr 30, 2018

I managed to set the opacity property of the outliers using the code below. This seems to work for the faceted charts I have tried so far also.

library(ggplot2)
library(plotly)

set.seed(123)

df <- diamonds[sample(1:nrow(diamonds), size = 1000),]

p <- ggplot(df, aes(cut, price, fill = cut)) + 
  geom_boxplot(outlier.shape = NA) + 
  ggtitle("Ignore outliers in ggplot2")

# Need to modify the plotly object and make outlier points have opacity equal to 0
p <- plotly_build(p)

for(i in 1:length(p$x$data)) {
  p$x$data[[i]]$marker$opacity = 0
}

p

image

@jonocarroll

The replacement lapply code is then

p$x$data <- lapply(p$x$data, FUN = function(x){
  x$marker = list(opacity = 0)
  return(x)
})

(note p$x$data rather than p$data). I'm happy to PR this to the documentation if someone can point to the source.
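
For anyone wondering where p$x$data comes from, here is a minimal sketch of how to inspect the built object. It assumes ggplot2 and plotly are installed; the variable names g and b are only illustrative, and the exact fields inside each trace may differ between plotly versions.

```r
library(ggplot2)
library(plotly)

g <- ggplot(mpg, aes(class, hwy)) + geom_boxplot()
b <- plotly_build(g)

# The built object is an htmlwidget; its traces are stored under b$x$data.
names(b)
length(b$x$data)

# Inspect the type and the marker settings of the first trace.
b$x$data[[1]]$type
str(b$x$data[[1]]$marker)
```

Printing str() of a trace like this is a quick way to see which marker fields are available to override.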

@dmattek

dmattek commented Jul 10, 2018

The problem is that when you also have geom_jitter in the plot (in addition to geom_boxplot), the lapply part will remove all the points. Is there a way to selectively remove outliers that belong to geom_boxplot only?

@ningjingzhiyuan507

p$x$data <- lapply(p$x$data, FUN = function(x){
  x$marker$line$width = 0
  return(x)
})

You can also modify marker$line$color.

@brshallo

brshallo commented Feb 25, 2019

The problem is that when you also have geom_jitter in the plot (in addition to geom_boxplot), the lapply part will remove all the points. Is there a way to selectively remove outliers that belong to geom_boxplot only?

You can use the code above and just index to the layer you want to remove, e.g. say the boxplot outliers are on the first layer.

p$x$data[1] <- lapply(p$x$data[1], FUN = function(x){
  x$marker = list(opacity = 0)
  return(x)
})
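
If it is not obvious which layer holds the boxplot, the trace types can be listed first. This is a minimal sketch, assuming ggplot2 and plotly are installed and that ggplotly stores the boxplot layer as traces of type "box" (as other comments in this thread rely on); trace_types and box_idx are illustrative names.

```r
library(ggplot2)
library(plotly)

g <- ggplot(mpg, aes(class, hwy)) +
  geom_boxplot() +
  geom_jitter(width = 0.2)
fig <- plotly_build(g)

# One entry per trace; NA if a trace carries no type field.
trace_types <- vapply(
  fig$x$data,
  function(tr) if (is.null(tr$type)) NA_character_ else tr$type,
  character(1)
)
box_idx <- which(trace_types == "box")

# Hide the outlier markers on the box traces only; other traces are untouched.
for (i in box_idx) {
  fig$x$data[[i]]$marker$opacity <- 0
}
fig
```

Setting only marker$opacity, rather than replacing the whole marker list, keeps the other marker settings of the trace intact.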

@kojisposts

Hi! Just wanted to bring this issue to your attention again, as none of the workarounds mentioned above seem to be working (and aren't working in the documentation either)!
(There's also an interesting phenomenon where, for coloured barplots, the most extreme outliers are coloured with black outlines, but closer to the barplot, they're black with coloured outlines, i.e. the reverse.)

@cpsievert
Collaborator

There's a WIP here #1514 that fixes this issue, feel free to test it out and let me know if you run into problems.

@jcoronel25

jcoronel25 commented Dec 12, 2019

I didn't see the solution mentioned in #1514 in the latest release. I was able to get the visual I wanted by altering the lapply function to change only the layers whose type == "box":

p$x$data <- lapply(p$x$data, FUN = function(x){

  if (x$type == "box") {
    x$marker = list(opacity = 0)
  }
  return(x)
})

@isaaczhao23

This will do the trick for the original question about coloring outliers! Plotly differentiates outliers from extreme outliers. We go under the hood and override all outlier colors manually.

library(ggplot2)
library(plotly)

p <- ggplot(mpg, aes(class, hwy)) + geom_boxplot(color="red")
output = ggplotly(p)

# overrides black outline of outliers
output$x$data[[1]]$marker$line$color = "red"
# overrides black extreme outlier color
output$x$data[[1]]$marker$outliercolor = "red"
# overrides black not as extreme outlier color
output$x$data[[1]]$marker$color = "red"

output

image

@meldarionqeusse

Using this code:

set.seed(123)

df <- diamonds[sample(1:nrow(diamonds), size = 1000),]
df <- subset(df,df$cut=="Fair")


p <- ggplot(df, aes(cut, price, fill = cut)) + 
  geom_boxplot() 

# Need to modify the plotly object and make outlier points have opacity equal to 0
fig <- plotly_build(p)

fig$x$data <- lapply(fig$x$data, FUN = function(x){
  x$marker$line = list(opacity = 0)
  x$marker$line$color = list(opacity = 0)
  x$marker$outliercolor = list(opacity = 0)
  x$marker$color = list(opacity = 0)
  return(x)
})
fig

I was able to get to this plot:
Rplot

As you can see, the outliers are different. I am aiming to get a transparent outlier in this case, but the extreme outlier remains filled. From the output of fig$x$data I cannot work out which parameter affects the extreme outlier. Looking at the HTML styling output, I can say that the two points are definitely getting different rgb values and fill-opacity values.
image

[[1]]$marker
[[1]]$marker$opacity
[1] NA

[[1]]$marker$outliercolor
[[1]]$marker$outliercolor$opacity
[1] 0


[[1]]$marker$line
[[1]]$marker$line$opacity
[1] 0

[[1]]$marker$line$color
[[1]]$marker$line$color$opacity
[1] 0



[[1]]$marker$size
[1] 5.669291

[[1]]$marker$color
[[1]]$marker$color$opacity
[1] 0



[[1]]$line
[[1]]$line$color
[1] "rgba(51,51,51,1)"

[[1]]$line$width
[1] 1.889764


[[1]]$name
[1] "Fair"

[[1]]$legendgroup
[1] "Fair"

[[1]]$showlegend
[1] TRUE

[[1]]$xaxis
[1] "x"

[[1]]$yaxis
[1] "y"

[[1]]$frame
[1] NA
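
One thing that may be worth trying here: plotly colour attributes expect a colour string, so a value such as list(opacity = 0) is not a valid colour. The sketch below passes a fully transparent rgba string to the box marker colour fields instead, including marker$line$outliercolor. It assumes ggplot2 and plotly are installed; the variable transparent is illustrative, and whether this clears the extreme outlier in every plotly version is an assumption, not something confirmed in this thread.

```r
library(ggplot2)
library(plotly)

set.seed(123)
df <- diamonds[sample(seq_len(nrow(diamonds)), size = 1000), ]
df <- subset(df, cut == "Fair")

fig <- plotly_build(ggplot(df, aes(cut, price, fill = cut)) + geom_boxplot())

# A colour string rather than list(opacity = 0).
transparent <- "rgba(0,0,0,0)"

fig$x$data <- lapply(fig$x$data, function(x) {
  # Leave any non-box traces as they are.
  if (!identical(x$type, "box")) return(x)
  x$marker$color <- transparent              # fill of ordinary outliers
  x$marker$outliercolor <- transparent       # fill of extreme outliers
  x$marker$line$color <- transparent         # outline of ordinary outliers
  x$marker$line$outliercolor <- transparent  # outline of extreme outliers
  x
})
fig
```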

@meldarionqeusse

meldarionqeusse commented Mar 6, 2021

While I have not solved the issue above, I would like to add a different, more complete solution than the one provided by @isaaczhao23, for cases where there are boxplots with different colors.

library(ggplot2)
library(plotly)

df <- diamonds[sample(1:nrow(diamonds), size = 1000),]

p <- ggplot(df, aes(cut, price, color = cut)) + 
  geom_boxplot() 

# Modify the plotly object so that the outlier points take the color of their box.
# If plot p maps fill = cut instead of color = cut, use x$fillcolor in place of x$line$color.
fig <- plotly_build(p)

fig$x$data <- lapply(fig$x$data, FUN = function(x){
  x$marker$outliercolor = x$line$color
  x$marker$color = x$line$color
  x$marker$line$color = x$line$color
  return(x)
})
fig

Will produce this:

Rplot01

@lucazav

lucazav commented Aug 24, 2021

I'm still having the issue of outliers being shown after applying ggplotly(), even if outlier.shape = NA is passed to geom_boxplot(). I'm using Plotly 4.9.4.1. Any chance to see this issue fixed? Thanks.

@lucazav

lucazav commented Aug 24, 2021

In the meantime I solved the issue of hiding outliers using the following code:

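library(purrr)

hideOutliers <- function(x) {
  # identical() avoids an error if a trace has no hoverinfo field
  if (identical(x$hoverinfo, 'y')) {
    x$marker = list(opacity = 0)
    x$hoverinfo = NA
  }
  return(x)
}

# p is the plotly object, e.g. the result of ggplotly()
p[["x"]][["data"]] <- map(p[["x"]][["data"]], hideOutliers)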

It also works with facets.

@swati-mahapatra

The problem is that when you also have geom_jitter in the plot (in addition to geom_boxplot), the lapply part will remove all the points. Is there a way to selectively remove outliers that belong to geom_boxplot only?

You can use the code above and just index to the layer you want to remove, e.g. say the boxplot outliers are on the first layer.

p$x$data[1] <- lapply(p$x$data[1], FUN = function(x){
  x$marker = list(opacity = 0)
  return(x)
})

Thank you so much for this solution, @brshallo ! This fixed my problem after hours of looking for a fix!

@Tb0nes

Tb0nes commented Aug 29, 2023

The problem is that when you also have geom_jitter in the plot (in addition to geom_boxplot), the lapply part will remove all the points. Is there a way to selectively remove outliers that belong to geom_boxplot only?

You can use the code above and just index to the layer you want to remove, e.g. say the boxplot outliers are on the first layer.

p$x$data[1] <- lapply(p$x$data[1], FUN = function(x){
  x$marker = list(opacity = 0)
  return(x)
})

As a novice, I'm having a hard time understanding what this code means. Would you or anyone else mind using this in a reproducible example? Thank you.
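
For readers with the same question, here is a minimal sketch of what the indexing does, using made-up stand-in data rather than a real plotly object. The list traces is hypothetical and only mimics the shape of p$x$data (one list element per trace).

```r
# Mock stand-ins for p$x$data: a list with one element per trace.
traces <- list(
  list(type = "box",     marker = list(opacity = 1, size = 5)),
  list(type = "scatter", marker = list(opacity = 1, size = 3))
)

# traces[1] is a one-element list holding the first trace, so lapply()
# applies the function to that trace only and the result is assigned back.
traces[1] <- lapply(traces[1], function(x) {
  x$marker <- list(opacity = 0)  # replaces the whole marker list
  x
})

traces[[1]]$marker$opacity  # 0: the first trace's markers are now invisible
traces[[2]]$marker$opacity  # 1: the second trace is untouched
```

Because the whole marker list is replaced, other marker fields of the first trace (size in this mock) are dropped; assigning x$marker$opacity <- 0 instead would keep them.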

@LDSamson

LDSamson commented Nov 9, 2023

The problem is that when you also have geom_jitter in the plot (in addition to geom_boxplot), the lapply part will remove all the points. Is there a way to selectively remove outliers that belong to geom_boxplot only?

You can use the code above and just index to the layer you want to remove, e.g. say the boxplot outliers are on the first layer.

p$x$data[1] <- lapply(p$x$data[1], FUN = function(x){
  x$marker = list(opacity = 0)
  return(x)
})

As a novice, I'm having a hard time understanding what this code means. Would you or anyone else mind using this in a reproducible example? Thank you.

I used the information in this thread to create the following function, combining techniques described earlier for the specific use case that you mention. It might be helpful for others:

remove_boxplot_outliers <- function(fig){
  stopifnot("plotly" %in% class(fig))
  fig$x$data <- lapply(
    fig$x$data,
    \(i){
      if(i$type != "box") return(i)
      i$marker = list(opacity = 0)
      i$hoverinfo = "none"
      i
    }
  )
  fig
}

It only removes markers if type == "box". It also removes hoverinfo; otherwise you would still be able to see the hoverinfo of the outliers. It also works in faceted plots. Other layers, for example those created with geom_jitter(), are left untouched. See below for an example:

library(ggplot2)
library(plotly)
fig <- ggplotly({
  ggplot(iris, aes(factor(1), Petal.Width)) + geom_boxplot(outlier.shape = NA) + 
    geom_jitter(col = "red") + 
    facet_wrap(~Species)
})

Figure with outliers:

fig

image

Figure with outliers removed:

remove_boxplot_outliers(fig)

image
