doug add to ggplot #16

Open
wants to merge 1 commit into
base: master
161 changes: 157 additions & 4 deletions PlottingInR/inst/extdata/presRaw/ggplot2.Rmd
@@ -26,6 +26,7 @@ knitr::opts_chunk$set(echo = TRUE, tidy = T)
if(params$isSlides == "yes"){AsSlides=T}else{AsSlides=F}
```


```{r, results='asis',include=TRUE,echo=FALSE}
if(params$isSlides != "yes"){
cat("# Plotting in R with ggplot2
@@ -474,6 +475,18 @@ pcPlot_violin <- pcPlot+geom_violin()
pcPlot_violin
```

---
## Combining geoms in the same plot
```{r, mult_geom_ggplot2, fig.height=4.5, fig.width=8}

pcPlot <- ggplot(data=patients_clean,
mapping=aes(x=Sex,y=Height))

pcPlot_combine <- pcPlot+ geom_boxplot() + geom_jitter(width = 0.25)

pcPlot_combine
```
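
When jittered points are layered over a boxplot, observations that the boxplot flags as outliers are drawn twice: once by geom_boxplot() and once by geom_jitter(). The sketch below is one way to avoid that, not part of the original slides. It uses a made-up data frame called `toy` as a hypothetical stand-in for patients_clean and hides the boxplot's own outlier markers with outlier.shape = NA.

```r
library(ggplot2)

# Hypothetical stand-in for patients_clean (values are simulated)
set.seed(1)
toy <- data.frame(
  Sex = rep(c("Female", "Male"), each = 20),
  Height = c(rnorm(20, mean = 165, sd = 6), rnorm(20, mean = 178, sd = 7))
)

toyPlot <- ggplot(data = toy, mapping = aes(x = Sex, y = Height))

# Layers are drawn in the order they are added, so the points sit on top of the boxes.
# outlier.shape = NA stops geom_boxplot() drawing outliers that geom_jitter() already shows.
toyPlot_combine <- toyPlot +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(width = 0.25)

toyPlot_combine
```

Swapping the order of the two geoms would draw the boxes over the points instead.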

---
## There is a world of geoms
An overview of geoms and their arguments can be found in the ggplot2 documentation or within the ggplot2 quick reference guides.
@@ -867,7 +880,7 @@ pcPlot + geom_point() + facet_grid(Smokes~Sex)+

---
## Discrete axes scales
Similary control over discrete scales is shown below.
Similarly, control over discrete scales is shown below.

```{r scaleDiscrete_ggplot2, facet_grid_smokesBySex_scaleDisceteX, fig.height=5, fig.width=9}
pcPlot <- ggplot(data=patients_clean,aes(x=Sex,y=Height))
@@ -1107,6 +1120,21 @@ pcPlot + geom_point(size=4,alpha=0.8)+
name="Body Mass Index")
```

---

## Conditional scales and colors

We can also use nested ifelse() calls inside aes() to colour data points by numeric cutoffs, creating groups that are not represented by a categorical variable in the data set.

```{r, conditional_colors_scales, fig.height=4, fig.width=9}
pcPlot <- ggplot(data=patients_clean,
aes(x=Height,y=Weight,shape=Sex, colour=ifelse(BMI > 30, "high",
ifelse(BMI < 25, "low", "middle"))))
pcPlot + geom_point(size=4,alpha=0.8) +
scale_shape_discrete(name="Gender") +
scale_color_manual(name = "BMI category", values=c("red", "blue", "grey"))
```
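
With unnamed values, scale_color_manual() assigns colours to the categories in their sorted order ("high", "low", "middle"), so it is easy to pair a colour with the wrong group. A possible alternative, sketched here with a made-up data frame `toy` (a hypothetical stand-in for patients_clean, with BMI values chosen to sit on the cutoffs), stores the category in its own column and passes a named colour vector. The column name `BMI_category` and the vector `bmi_colours` are illustrative names, not part of the original slides.

```r
library(ggplot2)

# Hypothetical stand-in for patients_clean; BMI values are made up to hit the cutoffs
toy <- data.frame(
  Height = c(160, 170, 175, 180),
  Weight = c(56, 72, 92, 100),
  BMI    = c(22, 25, 30, 31)
)

# Same rule as the nested ifelse() above: > 30 is "high", < 25 is "low", the rest "middle"
toy$BMI_category <- ifelse(toy$BMI > 30, "high",
                           ifelse(toy$BMI < 25, "low", "middle"))

# Named values are matched to categories by name rather than by position
bmi_colours <- c(high = "red", low = "blue", middle = "grey")

toyPlot <- ggplot(data = toy,
                  mapping = aes(x = Height, y = Weight, colour = BMI_category)) +
  geom_point(size = 4, alpha = 0.8) +
  scale_color_manual(name = "BMI category", values = bmi_colours)

toyPlot
```

Note that values of exactly 25 or 30 fall into "middle" under this rule.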

---
```{r, results='asis',include=TRUE,echo=FALSE}
if(params$isSlides == "yes"){
@@ -1155,6 +1183,7 @@ By default a "loess" smooth line is plotted by stat_smooth. Other methods availa
pcPlot <- ggplot(data=patients_clean,
mapping=aes(x=Weight,y=Height))
pcPlot+geom_point()+stat_smooth(method="lm")

```

---
@@ -1166,7 +1195,7 @@ If color by Sex is an aesthetic mapping then two smooth lines are drawn, one for
```{r, stat_smoothlmgroups_ggplot2, fig.height=4, fig.width=9}
pcPlot <- ggplot(data=patients_clean,
mapping=aes(x=Weight,y=Height,colour=Sex))
pcPlot+geom_point()+stat_smooth(method="lm")
pcPlot+geom_point()+stat_smooth(aes(x=Weight,y=Height), method="lm")
```

---
@@ -1176,11 +1205,38 @@ This behavior can be overridden by specifying an aes within the stat_smooth() fu
```{r, stat_smoothlmgroupsOverridden_ggplot2, fig.height=4, fig.width=9}
pcPlot <- ggplot(data=patients_clean,
mapping=aes(x=Weight,y=Height,colour=Sex))
pcPlot+geom_point()+stat_smooth(aes(x=Weight,y=Height),method="lm",
pcPlot+geom_point()+
stat_smooth(aes(x=Weight,y=Height),method="lm",
inherit.aes = F)
```

---

## Marginal plots with ggExtra
Plots for either the X or Y axis variables can easily be added to the margins of the plot to display the distributions of the variables. By default this is a density curve. The groupColour and groupFill arguments carry over the colour aesthetic of the main plot.

```{r, marginal_ggplot, fig.height=4, fig.width=9, message = F}
library(ggExtra)
pcPlot <- ggplot(data=patients_clean,
mapping=aes(x=Weight,y=Height,colour=Sex)) + geom_point()
ggMarginal(pcPlot, groupColour = TRUE, groupFill = TRUE)
```

---

## Marginal plots with ggExtra
We can easily turn this into a histogram with the type argument and display it on only the X or Y axis with the margins argument.

```{r, marginal_ggplot2, fig.height=4, fig.width=9, message = F}

pcPlot <- ggplot(data=patients_clean,
mapping=aes(x=Weight,y=Height,colour=Sex)) + geom_point()
ggMarginal(pcPlot, groupColour = TRUE, groupFill = TRUE, type = "histogram", margins = "x")
```


---

## Summary statistics
Another useful method is stat_summary(), which allows a custom statistical function to be applied and then visualized.

@@ -1189,10 +1245,107 @@ The fun parameter specifies a function to apply to the y variables for every val
```{r, stat_summary_ggplot2, fig.height=3.5, fig.width=9}
pcPlot <- ggplot(data=patients_clean,
mapping=aes(x=Sex,y=Height)) + geom_jitter()
pcPlot + stat_summary(fun=quantile, geom="point",
pcPlot +
stat_summary(fun=quantile, geom="point",
colour="purple", size=8)
```
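
quantile returns several values per group, so the chunk above draws one point for each quantile. As a contrasting sketch (not part of the original slides), a function that returns a single value, such as mean, gives one point per group, and fun.data accepts a function returning y, ymin and ymax, such as ggplot2's mean_se. The data frame `toy` is a simulated, hypothetical stand-in for patients_clean.

```r
library(ggplot2)

# Hypothetical stand-in for patients_clean (values are simulated)
set.seed(1)
toy <- data.frame(
  Sex = rep(c("Female", "Male"), each = 20),
  Height = c(rnorm(20, mean = 165, sd = 6), rnorm(20, mean = 178, sd = 7))
)

toyPlot <- ggplot(data = toy, mapping = aes(x = Sex, y = Height)) +
  geom_jitter(width = 0.2, colour = "grey50")

toyPlot_summary <- toyPlot +
  # fun = mean gives a single summary point per value of x
  stat_summary(fun = mean, geom = "point", colour = "purple", size = 4) +
  # fun.data = mean_se gives the mean plus/minus one standard error as y, ymin, ymax
  stat_summary(fun.data = mean_se, geom = "errorbar",
               colour = "purple", width = 0.2)

toyPlot_summary
```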

---

## Displaying fitted line statistics
The ggpubr package has many handy functions for making publication-quality graphics that display statistics. Here we add the equation and the R squared value for the line of best fit.


```{r, line_eqn, fig.height=3, fig.width=8}
library(ggpubr)

pcPlot <- ggplot(data=patients_clean,
mapping=aes(x=Weight,y=Height)) +
geom_point() +
stat_smooth(method="lm", formula = y ~ x)
pcPlot +
stat_regline_equation(label.y = 185, aes(label = paste("Eqn: ", after_stat(eq.label))), formula = y ~ x) +
stat_regline_equation(label.y = 183, aes(label = after_stat(rr.label)), formula = y ~ x)
```
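
To sanity-check the numbers printed on the plot, the same quantities can be computed directly with base R. This is a minimal sketch that assumes stat_regline_equation() fits an ordinary least-squares model with the supplied formula; it uses a simulated data frame `toy` as a hypothetical stand-in for patients_clean.

```r
# Hypothetical stand-in for patients_clean (values are simulated)
set.seed(1)
toy <- data.frame(Weight = seq(50, 100, by = 5))
toy$Height <- 120 + 0.6 * toy$Weight + rnorm(nrow(toy), sd = 3)

# Fit Height as a linear function of Weight, the y ~ x model in the plot
fit <- lm(Height ~ Weight, data = toy)

# Intercept and slope of the fitted line
coefs <- coef(fit)

# Proportion of variance in Height explained by the fit
r_squared <- summary(fit)$r.squared

# Assemble a plain-text version of the equation
eqn <- sprintf("y = %.2f + %.2f x", coefs[["(Intercept)"]], coefs[["Weight"]])
eqn
r_squared
```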

---

## Displaying fitted line statistics
By passing subsets of the data to the stat_regline_equation() function, we can display statistics for each group that has its own line of best fit.

```{r, line_eqn_groups, fig.height=3, fig.width=8}
pcPlot <- ggplot(data=patients_clean,
mapping=aes(x=Weight,y=Height,colour=Sex)) +geom_point()+ stat_smooth(aes(x=Weight,y=Height), method="lm", formula = y ~ x)
pcPlot +
stat_regline_equation(data = patients_clean[patients_clean$Sex == "Male", ],
label.y = 183, aes(label = after_stat(rr.label)), formula = y ~ x) +
stat_regline_equation(data = patients_clean[patients_clean$Sex == "Female", ],
label.x = 80, label.y = 160, aes(label = after_stat(rr.label)), formula = y ~ x)

```
---

## Displaying stats on the plot
The ggpubr package also has useful functions that display p-values on plots when combined with the rstatix package.

Here we use rstatix to create a data frame with the relevant statistics for our desired comparison, and then we add x and y position information. [Check out](https://rpkgs.datanovia.com/rstatix/) the many other functions rstatix has for adding information to this table (adjusted p-values, other statistical tests, etc.).

```{r, add_p1, message = F}
library(rstatix)
# https://rpkgs.datanovia.com/rstatix/

stat.test <- t_test(patients_clean, Height ~ Sex)
stat.test <- add_xy_position(stat.test, x = "Sex", dodge = 0.8)

data.frame(stat.test)
```
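
For comparison, the base R equivalent of this comparison is sketched below. It assumes that t_test() is left at its default of not pooling variances, which corresponds to the Welch test that t.test() also runs by default. The data frame `toy` is a simulated, hypothetical stand-in for patients_clean.

```r
# Hypothetical stand-in for patients_clean (values are simulated)
set.seed(1)
toy <- data.frame(
  Sex = rep(c("Female", "Male"), each = 20),
  Height = c(rnorm(20, mean = 165, sd = 6), rnorm(20, mean = 178, sd = 7))
)

# Two-sample t-test of Height between the two levels of Sex
welch <- t.test(Height ~ Sex, data = toy)

welch$method   # which variant of the test was run
welch$p.value  # the p-value that would be drawn on the plot
```

The base R result is a list rather than a data frame, which is why the rstatix version is more convenient to hand to plotting functions.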
---

## Displaying stats on the plot
This data frame can then be used in the stat_pvalue_manual() function from ggpubr to add the p-value.

```{r, add_p2, fig.height=4, fig.width=8}

pcPlot <- ggplot(data=patients_clean,
mapping=aes(x=Sex,y=Height)) +
geom_boxplot()

# inherit.aes = FALSE stops this layer inheriting the plot's x and y aesthetics,
# which do not exist as columns in stat.test
pcPlot + stat_pvalue_manual(stat.test, label = "p", inherit.aes = FALSE) + scale_y_continuous(expand = expansion(mult = 0.1))

```

---

## Displaying stats for grouped data

By grouping the data frame, we can look for differences between smokers and non-smokers within each sex. We also add an adjusted p-value.
```{r, grouped_p1}

grouped_data <- group_by(patients_clean, Sex)
stat.test <- t_test(grouped_data, formula = Height ~ Smokes)
stat.test <- adjust_pvalue(stat.test, method = "BH")
stat.test <- add_xy_position(stat.test, x = "Sex", dodge = 0.8)

data.frame(stat.test)
```
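
The Benjamini-Hochberg ("BH") correction can be illustrated on its own with base R's p.adjust(). This is a sketch on made-up p-values, assuming adjust_pvalue() applies the same correction to the p column of the table.

```r
# Made-up raw p-values from three hypothetical tests
raw_p <- c(0.01, 0.04, 0.03)

# BH scales each p-value by (number of tests / rank) and keeps the
# adjusted values non-decreasing in the order of the raw p-values
adj_p <- p.adjust(raw_p, method = "BH")

adj_p
# 0.03 0.04 0.04
```

Adjusted p-values are never smaller than the raw ones, so a comparison can lose significance after correction but cannot gain it.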

---

## Displaying stats for grouped data

We can also customise the p-value label: a column name from the stat.test data frame placed inside curly brackets {} within the label string is replaced by that column's value.
```{r, grouped_p2, fig.height=4, fig.width=8}

pcPlot <- ggplot(data=patients_clean,
mapping=aes(x=Sex ,y=Height, fill = Smokes)) +
geom_boxplot()

# inherit.aes = FALSE stops this layer inheriting the plot's x, y and fill aesthetics,
# which do not exist as columns in stat.test
pcPlot + stat_pvalue_manual(stat.test, label = "p = {p.adj}", inherit.aes = FALSE) + scale_y_continuous(expand = expansion(mult = 0.1))
```


---
```{r, results='asis',include=TRUE,echo=FALSE}
if(params$isSlides == "yes"){