problem with ggplot histograms #702

Closed
edublancas opened this issue Jun 30, 2023 · 6 comments · Fixed by #711

@edublancas

edublancas commented Jun 30, 2023

there's some error in the ggplot histograms when using the fill option:

[Image: Screenshot 2023-06-30 at 1:38:40 p.m.]

From what I understand, fill should display one histogram on top of another with different colors. Since these are histograms and the value on the Y axis is a count, a bar should not have vertical color breaks where, reading from the bottom, it starts with color A, switches to color B, and then returns to color A. See: https://stackoverflow.com/questions/22216350/overlay-histograms-in-r

[Image: hist]

edublancas added the "stash" (label used to categorize issues that will be worked on next) and "med complexity" labels on Jun 30, 2023
@edublancas

The example code is here: https://github.com/ploomber/doc/blob/example/examples/quickstart/notebook.ipynb. You can run the notebook end-to-end, then click on the "plot" button that appears at the end.

@edublancas

@yafimvo: if you can share your thoughts on why we have this issue, please do. it'll help @bbeat2782

@yafimvo

yafimvo commented Jul 2, 2023

@edublancas This happens for two reasons:

  1. Using fill attribute
  2. Width of the bar

The fill attribute creates a stacked bar, which means each bar is divided into segments that represent the subcategories of the overall data. In these cases, the subcategories are island (3 segments), species (3 segments), and sex (2 segments).

The width of each bar is dynamic. In these cases, the bars are wide and overlap each other. Since we don't have a border, it looks a bit weird.
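To make the stacking described above concrete, here is a minimal pure-Python sketch. It is not JupySQL's implementation: the helper names `stacked_histogram` and `stack_bottoms` and the sample data are hypothetical. It assumes all groups share the same bin edges and that each group's segment starts where the previous group's segment ends, so each group appears at most once per bar.

```python
from collections import defaultdict


def stacked_histogram(values, groups, bins):
    """Count values per (group, bin) using bin edges shared by all groups."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal
    counts = defaultdict(lambda: [0] * bins)
    for value, group in zip(values, groups):
        # The maximum value would land in bin index `bins`; clamp it to the last bin.
        index = min(int((value - lo) / width), bins - 1)
        counts[group][index] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, dict(counts)


def stack_bottoms(counts, bins):
    """Return where each group's segment starts in every bin, plus bar totals."""
    running = [0] * bins
    bottoms = {}
    for group in sorted(counts):
        bottoms[group] = list(running)
        running = [r + h for r, h in zip(running, counts[group])]
    return bottoms, running


# Hypothetical data, for illustration only.
values = [1, 2, 2, 3, 7, 8, 9, 10]
groups = ["a", "a", "b", "b", "a", "b", "b", "a"]

edges, counts = stacked_histogram(values, groups, bins=3)
bottoms, totals = stack_bottoms(counts, bins=3)
```

With shared edges and cumulative bottoms, the total height of each bar equals the plain histogram count for that bin, which is the property a stacked histogram is expected to preserve.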

I added a border to the bars

[image]

I changed the width of the bars
[image]

There is an option to show 2 histograms on one chart by replacing this:

p = (ggplot("plotdata", with_="plotdata", mapping=aes(x=column))
     + geom_histogram(bins=20, fill=fill))

with this:

p = (ggplot("plotdata", with_="plotdata", mapping=aes(x=[column, fill], fill=["green", "magenta"]))
     + geom_histogram(bins=20))

The problem is that the X values of these two fields differ greatly in range, which makes the graph difficult to read.

[image]
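The range problem mentioned above can be reproduced without any plotting library. The sketch below is only an illustration: `shared_bin_counts`, the column names, and the values are all hypothetical. It assumes both columns are binned on one shared X axis that spans their combined range.

```python
def shared_bin_counts(columns, bins):
    """Bin several numeric columns on one shared X axis."""
    all_values = [v for values in columns.values() for v in values]
    lo, hi = min(all_values), max(all_values)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal
    result = {}
    for name, values in columns.items():
        counts = [0] * bins
        for v in values:
            # Clamp the maximum value into the last bin.
            counts[min(int((v - lo) / width), bins - 1)] += 1
        result[name] = counts
    return result


# Hypothetical values, chosen only so the two columns have very different ranges.
columns = {
    "small_range": [35.0, 40.0, 45.0, 50.0],
    "large_range": [3000.0, 4000.0, 5000.0, 6000.0],
}
shared_counts = shared_bin_counts(columns, bins=20)
```

Here every value of `small_range` collapses into the first of the 20 bins, while `large_range` is spread across the axis, so the shape of the narrower column is lost.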

@edublancas

thanks for sharing your feedback @yafimvo.

@bbeat2782 the best way to validate this is to create the same plots using the same data in ggplot (in R); the interface should be pretty similar.

@bbeat2782

> create the same plots using the same data in ggplot (in R)

Oh, never thought of this way to validate the plots. Thanks.

@bbeat2782

Acceptance Criteria

  1. Fix code to make sure the vertical breaks in histograms do not show up
  2. Change histogram test images
  3. Validate the new test images with ggplot (in R)
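As one possible way to check the first criterion automatically, the sketch below flags a bar whose fill value reappears after a different one. It is an assumption, not part of the project's test suite: `has_vertical_breaks` is a hypothetical helper, and it assumes each bar can be described as a list of fill labels ordered from bottom to top.

```python
def has_vertical_breaks(bar_segments):
    """Return True if a fill label reappears after a different label in one bar.

    bar_segments: fill labels of a single bar, ordered from bottom to top
    (a hypothetical representation chosen for this sketch).
    """
    seen = set()
    previous = None
    for label in bar_segments:
        if label != previous:
            if label in seen:
                # The label was already used lower in the bar: a color break.
                return True
            seen.add(label)
        previous = label
    return False


broken_bar = ["A", "B", "A"]   # color A, then B, then A again
stacked_bar = ["A", "A", "B"]  # each color forms one contiguous run
```

A bar like `broken_bar` matches the pattern reported in this issue, while `stacked_bar` is what a correctly stacked histogram should produce.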
