
Segmentation Tutorial Part 5

vrrenske edited this page Oct 24, 2019 · 4 revisions

Using ggplot2 to compare segmentations

Finally, we are there: the analysis. Is there a difference in segmentation between programs? And between people?

Below, I’ll show how to use ggplot2 and basic R tools to get answers. If you're just here to learn BactMAP, you can skip ahead to the next section, where we will visualize the segmentation results side by side.

Distribution of cell length

To plot the distribution of cell length, I can quickly make a box plot or a violin plot. For this, I use ggplot2, a plotting system where you first pass the basics (dataset, variables) to ggplot() and then add layer upon layer to refine the plot.

Note that I load ggplot2 first, because I’m going to use a lot of ggplot2 functions below.

library(ggplot2)

ggplot(allFrames$finalframe, aes(x=condition, y=max_um)) + geom_violin()

You can see that there is quite a difference in the distributions! It is so large, in fact, that I would like more information on the individual data points. You can plot those using geom_dotplot() or geom_jitter() instead of geom_violin().

However, there’s one issue. Because the mesh dataset contains many x/y coordinate points for the outline of each cell, each cell is represented many times in the dataset. It is therefore better to first reduce the dataset to the variables that are constant per cell before plotting anything else. I do this using the command unique(). As a bonus, a smaller dataset is also faster to work with.

Below, I take the unique data points of the columns "cell", "frame", "condition", "max_um", "maxwum" and "area" of allFrames$finalframe:

onePerCell <- unique(allFrames$finalframe[,c("cell", "frame", "condition", "max_um", "maxwum", "area")])
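
To see why unique() reduces the data to one row per cell, here is a minimal sketch on a made-up data frame. The names toyMesh and toyPerCell and all values are hypothetical stand-ins, not BactMAP output; only the idea (drop the per-point coordinate columns, then remove duplicate rows) mirrors the command above.

```r
# Hypothetical mesh-style data: two cells, three outline points each.
toyMesh <- data.frame(
  cell      = c(1, 1, 1, 2, 2, 2),
  frame     = c(1, 1, 1, 1, 1, 1),
  condition = "toy_segmentation",
  X         = c(0.1, 0.5, 0.9, 1.2, 1.6, 2.0),  # outline x coordinate, differs per row
  Y         = c(0.2, 0.4, 0.2, 0.3, 0.5, 0.3),  # outline y coordinate, differs per row
  max_um    = c(2.1, 2.1, 2.1, 3.4, 3.4, 3.4)   # cell length, constant within a cell
)

# Keep only the columns that are constant per cell, then drop duplicate rows.
toyPerCell <- unique(toyMesh[, c("cell", "frame", "condition", "max_um")])

nrow(toyMesh)     # 6 rows: one per outline point
nrow(toyPerCell)  # 2 rows: one per cell

# Sanity check: 0 means no cell/frame/condition combination occurs twice.
anyDuplicated(toyPerCell[, c("cell", "frame", "condition")])
```

If the sanity check returns a non-zero value on real data, one of the selected columns probably still varies within a cell, and unique() will keep more than one row for that cell.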

Now that I have a dataset with only one row per cell, I can plot the individual data points. To give you an idea of the options available, I also add a box plot on top to show the distribution, use theme_minimal() to get a white background, and change the y axis label using ylab(). Check the documentation on geom_dotplot and geom_boxplot for more information on their layout options.

ggplot(onePerCell, aes(x=condition, y=max_um)) + #base plot
  geom_dotplot(binaxis="y", stackdir="center", binwidth=0.05, color=NA, fill="grey") + #dotplot over y axis, centered, grey fill
  geom_boxplot(fill="white", size=1, width=0.1, outlier.color=NA) + # small (width=0.1) boxplot, removed outliers
  theme_minimal() + #black/white simple layout
  ylab("cell length (micron)") #y axis label
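
The geom_jitter() alternative mentioned earlier could look roughly like the sketch below. It runs on simulated data: toyCells, its two conditions and the random lengths are made up for illustration, and the styling values (jitter width, transparency) are arbitrary choices rather than recommendations from the tutorial.

```r
library(ggplot2)

# Hypothetical stand-in for onePerCell: 50 simulated cells per condition.
set.seed(1)
toyCells <- data.frame(
  condition = rep(c("A", "B"), each = 50),
  max_um    = c(rnorm(50, mean = 2.0, sd = 0.3),
                rnorm(50, mean = 2.4, sd = 0.3))
)

jitterPlot <- ggplot(toyCells, aes(x = condition, y = max_um)) +
  # spread points horizontally only, so the plotted lengths stay exact
  geom_jitter(width = 0.2, height = 0, color = "grey50", alpha = 0.6) +
  # narrow box plot on top; outlier.shape = NA hides its own outlier points
  geom_boxplot(fill = NA, width = 0.1, outlier.shape = NA) +
  theme_minimal() +
  ylab("cell length (micron)")

jitterPlot
```

Setting height = 0 matters here: by default geom_jitter() also adds vertical noise, which would shift the plotted cell lengths away from their measured values.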

I like to see the individual data points because they give a better view of the outliers. In the last section of this tutorial I’ll list a few extra options for plotting nice graphs.

Number of cells per segmentation

When looking at the previous graph, it looks like the number of cells found per segmentation is pretty similar. Let’s check to be sure. We can do that by looking at the frequency table of the condition column, which table() produces:

table(onePerCell$condition)

##        Clement_Oufti            Jun_Oufti      Jun_SuperSegger
##                  844                  952                  867
##       Lance_MicrobeJ Renske_Morphometrics
##                  806                  802

Alternatively, we can plot a bar graph:

ggplot(onePerCell, aes(x=condition)) + geom_bar()
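
Both approaches count rows per condition, which is only a cell count because the dataset has one row per cell. A small sketch with made-up data (toyPerCell, its condition labels and the column name n_cells are hypothetical) shows the counting step and how to turn the table into a data frame for further use:

```r
# Hypothetical one-row-per-cell data with two segmentations.
toyPerCell <- data.frame(
  cell      = c(1, 2, 3, 1, 2),
  condition = c("A", "A", "A", "B", "B")
)

# Frequency table: number of rows (cells) per condition.
counts <- table(toyPerCell$condition)
counts

# Convert to a data frame, e.g. to merge with other per-condition summaries.
countsFrame <- as.data.frame(counts, stringsAsFactors = FALSE)
names(countsFrame) <- c("condition", "n_cells")
countsFrame
```

Running the same count on the full mesh data instead would count outline points rather than cells, so the comparison only makes sense after the unique() step.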

Apart from cell length and number, we can also look at other cell dimensions, such as width or area. I will show these at the end of the tutorial.

⬅️ Segmentation Tutorial part 4: Using combineDataframes Segmentation Tutorial part 6: Compare segmentations visually ➡️