R tips

This repository contains R programming tips covering topics across data cleaning, data visualisation, machine learning, statistical theory and data productionisation.

Many kudos to Dr Chuanxin Liu, my former PhD student and code editor, for teaching me how to code in R in my past life as an immunologist.

Content summary

Legend	Category
📚	Data cleaning
🎨	Data visualisation
🔮	Machine learning
🔨	Productionisation
🔢	Statistical theory

Tutorials

🎨 Data visualisation

An introduction to ggplot2 using volcano plots (Updated)
Using DiagrammeR to draw flow charts (Updated)

📚 Data cleaning

🔨 Productionisation

🔮 Machine learning

Working with dummy variables and factors

🔢 Statistical theory

Tutorial style guide

Inconsistent code style is a painful form of technical debt. This repository now follows the file naming and code style rules below.

Folders are no longer ordered with a numerical prefix and names are no longer case sensitive e.g. r_tips\tutorials\... and r_tips\figures\...
Tutorial subtopics share the same prefix e.g. r_tips\tutorials\dv-... and r_tips\tutorials\st-...
File names use - to separate file name prefixes and _ in place of white space e.g. r_tips\figures\dv-using_diagrammer-simple_flowchart.svg
Comments are styled according to the tidyverse style guide:
- The first comment explains the purpose of the code chunk and is styled differently for enhanced readability e.g. # Code as header --------
- Comments are written in sentence case and only end with a full stop if they contain at least two sentences
- Short comments explaining a function argument do not have to be written on a new line
- Comments should not be followed by a blank line, unless the comment is a stand-alone paragraph containing in-depth rationale or an alternative solution
R code chunks are styled as follows:
- Each R chunk should be named with a short unique description written in the active voice e.g. create basic plot and modify plot labels
- Arguments inside code chunks should not contain white space and boolean argument options should be written in capitals e.g. {r load libraries, message=FALSE, warning=FALSE}
- To render the GitHub document, results are generally suppressed using results='hide' and manually entered on a new line beneath the code
- To render the GitHub document, figures are generally output using fig.show='hold' and figure outputs can then be suppressed at the local chunk level using fig.show='hide'
Set a margin of 80 characters in RStudio through Tools --> Global Options --> Code --> Display --> Show margin and use this margin as the cut-off for the length of code and comment lines
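
The comment rules above can be sketched with a short example. This is only an illustration of the style, not code taken from any tutorial in this repository: the data frame, the column names and the cut-off of 1 are all made up for the purpose.

```r
# Create example data ---------------------------------------------------------
# Values are hypothetical and only illustrate the comment style
gene_expression <- data.frame(
  gene = c("gene_a", "gene_b", "gene_c"), # made-up gene names
  log_fc = c(1.8, -0.4, -2.1),
  stringsAsFactors = FALSE
)

# Flag genes with a large change in expression
gene_expression$is_large_change <- abs(gene_expression$log_fc) > 1 # arbitrary

# An alternative is to store the cut-off in a named variable. This makes the
# threshold easier to change when the same rule is reused across chunks.

large_change_genes <- gene_expression$gene[gene_expression$is_large_change]
```

In an R Markdown file, code like this would sit inside a chunk named in the active voice, for example {r flag large changes, message=FALSE, warning=FALSE}, where the chunk name is again only a placeholder.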

Citations

Hadley Wickham (2017). tidyverse: Easily Install and Load the 'Tidyverse'. R package version 1.2.1. https://CRAN.R-project.org/package=tidyverse
Matt Dowle and Arun Srinivasan (2019). data.table: Extension of data.frame. R package version 1.12.6. https://CRAN.R-project.org/package=data.table
Hadley Wickham (2019). stringr: Simple, Consistent Wrappers for Common String Operations. R package version 1.4.0. https://CRAN.R-project.org/package=stringr
Max Kuhn (2019). caret: Classification and Regression Training. R package version 6.0-84. https://CRAN.R-project.org/package=caret
- Contributions from Jed Wing, Steve Weston, Andre Williams, Chris Keefer, Allan Engelhardt, Tony Cooper, Zachary Mayer, Brenton Kenkel, the R Core Team, Michael Benesty, Reynald Lescarbeau, Andrew Ziem, Luca Scrucca, Yuan Tang, Can Candan and Tyler Hunt.
Jacob Kaplan (2020). fastDummies: Fast Creation of Dummy (Binary) Columns and Rows from Categorical Variables. R package version 1.6.1. https://CRAN.R-project.org/package=fastDummies
Kirill Müller (2017). here: A Simpler Way to Find Your Files. R package version 0.1. https://CRAN.R-project.org/package=here
Paul Murrell (2015). compare: Comparing Objects for Differences. R package version 0.2-6. https://CRAN.R-project.org/package=compare
A. Liaw and M. Wiener (2002). Classification and Regression by randomForest. R News 2(3), 18--22.
Tianqi Chen, Tong He, Michael Benesty, Vadim Khotilovich, Yuan Tang, Hyunsu Cho, Kailong Chen, Rory Mitchell, Ignacio Cano, Tianyi Zhou, Mu Li, Junyuan Xie, Min Lin, Yifeng Geng and Yutian Li (2020). xgboost: Extreme Gradient Boosting. R package version 1.0.0.2. https://CRAN.R-project.org/package=xgboost
Alexandros Karatzoglou, Alex Smola, Kurt Hornik, Achim Zeileis (2004). kernlab - An S4 Package for Kernel Methods in R. Journal of Statistical Software 11(9), 1-20. URL http://www.jstatsoft.org/v11/i09/
Microsoft Corporation and Steve Weston (2019). doParallel: Foreach Parallel Adaptor for the parallel Package. R package version 1.0.15. https://CRAN.R-project.org/package=doParallel
Richard Iannone (2020). DiagrammeR: Graph/Network Visualization. R package version 1.0.6.1. https://CRAN.R-project.org/package=DiagrammeR

CoreyOdonis/r_tips

Folders and files

Latest commit

History

Repository files navigation

R tips

Content summary

Tutorials

🎨 Data visualisation

📚 Data cleaning

🔨 Productionisation

🔮 Machine learning

🔢 Statistical theory

Tutorial style guide

Citations

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages