Tutorial 1: How to use Git with R and RStudio

This tutorial, part of the Reproducible Research Workshop, walks you through the first steps of using Git with R and RStudio.

This repository contains the step-by-step tutorial you are reading right now, and it is also the repository you work with to create your first Git project in R.

Objectives of this tutorial:

  • Install and set up Git
  • Set up Git in RStudio
  • Create a new Git project in RStudio
  • Clone/fork an existing project from GitHub
  • Make some commits to your own project

Motivation

R in combination with the distributed version control system Git provides a convenient setup to make your research project reproducible. Git allows you to track and share your code and analysis.

Some reasons to use version control are:

  • It makes sharing your projects easy (once it is set up; you'll get there).
  • It facilitates collaboration. People can contribute to your project and vice versa. You can also report errors (bugs) or suggest new additions (features) to projects.
  • You can revert to a previous version if you find errors or accidentally deleted something.
  • You can see what changed between different versions of your code, analysis or written text!
  • In R it makes sharing your packages easy, and you can install development packages of others with two lines of code: install.packages("devtools"); devtools::install_github("username/packagename") (Developing R packages is a more advanced topic, but it is a well-structured way to keep your projects tidy, see: R Packages by Hadley Wickham)

GitHub is a user-friendly web service that allows you to store your project repository remotely. Alternatives are GitLab and Bitbucket.

RStudio integrates support for Git and SVN, hence we are going to use the widely used combination R + Git + RStudio.

Part 1: Installation and setup

1. Installation: To get started you need the following software installed on your computer: Git and, if you are new to R, also R and RStudio. Additionally you will need a GitHub account.

  1. Git (Download Git): Download and install Git. Optional Git clients: SourceTree or GitHub Desktop.
  2. R (Download R): Download and install R (if not already installed).
  3. RStudio (Download RStudio Desktop): Download and install RStudio (if not already installed).
  4. GitHub account: Create a free account on GitHub. If you are new to Git, follow the 15 min TryGit Tutorial to get a quick introduction to Git.

2. Setup Git in RStudio: Tell RStudio where to find the Git installation (see Figure 1).

  1. Open RStudio, go to Tools > Global Options... and click on Git/SVN
  2. Check Enable version control interface for RStudio projects
  3. Set the path to the Git executable that you just installed. If you don't know where Git is installed, open a shell:
    Windows: type where git and hit enter. The path should be something like: C:/Program Files (x86)/Git/bin/git.exe
    Linux/OS X: type which git and hit enter. The path should be something like: /usr/bin/git
  4. Restart RStudio. If it worked, you will find the Git icon on the top toolbar (see Figure 1).
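
The lookup in step 3 can also be done from a script. The following is a minimal sketch for a POSIX shell (Linux/OS X, or Git Bash on Windows); it uses `command -v` as a portable stand-in for `which`, and the variable name `git_path` is only illustrative.

```shell
# Look up the Git executable on the PATH and report what was found.
if git_path=$(command -v git); then
  echo "Git executable: $git_path"
  git --version
else
  git_path=""
  echo "Git was not found on the PATH. Install it first (see step 1)." >&2
fi
```

The printed path is the value to enter in the Git executable field of the RStudio options.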

Figure 1: RStudio: Global Options for Git/SVN

3. Setup Git: Configure Git and set your user name and email (the email address you used to register on GitHub). You can open the Git prompt directly from within RStudio: go to Tools > Shell to open the Git shell and tell Git your username and GitHub email. User name and email need to be set only once.

git config --global user.name 'yourGitHubusername'
git config --global user.email 'name@provider.com'

Figure 2: RStudio: Git Shell
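
To check that the two settings were stored, you can read them back in the same shell. This is a small sketch that only reads the global configuration and prints a placeholder when a value is missing; it does not change anything.

```shell
# Print the globally configured identity, or a hint if a value is missing.
for key in user.name user.email; do
  value=$(git config --global --get "$key" || true)
  echo "$key = ${value:-<not set>}"
done
```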

Part 2: Create a new RStudio project with Git

There are three ways to put an RStudio project under version control.

a) Create a new project and create a local Git repository: Select File > New Project.., create a project from a New Directory and check the option Create a git repository. To push to a remote repository later on, you add that remote repository using the Git shell. If you already know which online repository you want to use for your project, option c) is more convenient.

b) Create a new project from a folder under version control: In this case you only need to create a new RStudio project for that directory and version control is automatically enabled. Go to File > New Project, select Existing Directory and create the project.

c) Create a new project based on a remote Git repository: Select File > New Project.., choose Version Control, then Git, provide the url of the repository you want to clone (use the HTTPS link if you want to avoid all the SSH trouble) and create the project.

In this tutorial we create a project based on a remote GitHub repository (c). Hence we first create a new repository on GitHub and then create our RStudio project from that repository.
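
For option a), linking the local repository to a remote one in the Git shell could look like the sketch below. To keep it runnable without a GitHub account, a local bare repository stands in for the remote; with GitHub you would pass the repository HTTPS url to `git remote add` instead. The directory names and the identity values are placeholders.

```shell
set -e
work=$(mktemp -d)                        # scratch directory for this sketch

git init -q --bare "$work/remote.git"    # stand-in for an empty GitHub repository
git init -q "$work/myproject"            # what "Create a git repository" sets up locally
cd "$work/myproject"
git config user.name "yourGitHubusername"    # placeholder identity, local to this repository
git config user.email "name@provider.com"

echo "# My project" > README.md
git add README.md
git commit -q -m "Initial commit"

# Register the remote under the conventional name "origin" and push the
# current branch to it; -u remembers the remote branch for later pushes/pulls.
git remote add origin "$work/remote.git"
git push -q -u origin HEAD
git remote -v
```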

1. Create a new GitHub repository: Log in to your GitHub account and create a new GitHub repository. Give your new repository a short and memorable name, e.g. rstudio-git-test, check the option to initialize this repository with a README and create the repository (see Figure 3).

Figure 3: GitHub: Create a new repository

2. Copy the repository HTTPS url: To create a new Git based project in RStudio we need the repository url. You find the repository HTTPS url on the just created GitHub project page. There press the button Clone or download and copy the HTTPS link of the project (see Figure 4). The link will be something like https://github.com/yourusername/rstudio-git-test.git.

Figure 4: GitHub: Copy repository HTTPS link

3. Create a new RStudio project with Git version control: Now everything is ready to create a new project with Git version control in RStudio. In RStudio select File > New Project.., select Version Control, choose Git, then provide the repository HTTPS link, select the R workspace folder and create the project. RStudio now copies (clones, in Git terms) the content of the repository to your project folder. The content of the GitHub repository should now appear in the Files pane of RStudio, including the README.md created on GitHub.

Figure 5: RStudio: Create a new Git project
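
Behind the scenes this step is a `git clone`. The sketch below shows the same operation in a shell. It assumes a small local repository as a stand-in for the GitHub repository so that it runs offline; with GitHub the clone source would be the HTTPS link from step 2. All directory names are illustrative.

```shell
set -e
demo=$(mktemp -d)

# Stand-in for the GitHub repository that was initialised with a README
git init -q "$demo/github-side"
cd "$demo/github-side"
echo "# rstudio-git-test" > README.md
git add README.md
git -c user.name="Demo" -c user.email="demo@example.com" commit -q -m "Initial commit"

# The clone itself: copies the files and the full history into a new folder
cd "$demo"
git clone -q "$demo/github-side" rstudio-git-test
ls rstudio-git-test                 # lists README.md
git -C rstudio-git-test remote -v   # the clone source is registered as "origin"
```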

Part 3: Make local changes, commit and push to GitHub

1. Make local changes: Open the README.md file, edit it (for example with the content below) and save it.

# RR project in RStudio
RR workshop RStudio + Git repository

My first commit to GitHub with R

2. Commit the changes: Now we commit the local changes to the local Git repository.
In RStudio press the Git icon and select Commit.. from the Git menu (Ctrl+Alt+M) to open the commit window and review the changes in the repository. In the Staged column, check the checkbox of each file you want to commit. The lower pane shows the edits to the file in green and red (see Figure 6). Enter a commit message that indicates what has changed in this commit, e.g. Readme update, and press the Commit button.

Figure 6: RStudio: Commit window

3. Push to the remote repository: To push the changes to the remote GitHub repository, press the Push button in the upper right corner of the commit window. You will be prompted for the username and password of your GitHub account. Note that GitHub no longer accepts the account password for Git operations over HTTPS, so enter a personal access token in place of the password. Then check on the GitHub page whether the changes were pushed to your online repository.
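
The three steps above correspond to a handful of Git commands. The following sketch walks through them in a shell, assuming a local bare repository as a stand-in for GitHub so that no login is needed; the paths and identity values are placeholders, and the first commit only exists to give the later edit something to change.

```shell
set -e
demo=$(mktemp -d)
git init -q --bare "$demo/remote.git"              # stand-in for the GitHub repository
git clone -q "$demo/remote.git" "$demo/project"    # Git warns that the repository is empty
cd "$demo/project"
git config user.name "yourGitHubusername"          # placeholder identity
git config user.email "name@provider.com"

# Starting point: a README that is already under version control
echo "# RR project in RStudio" > README.md
git add README.md
git commit -q -m "Add README"

# 1. Make a local change
echo "My first commit to GitHub with R" >> README.md
git status --short      # lists README.md as modified
git diff                # the edits that the commit window shows in green and red

# 2. Stage the file (the checkbox in the Staged column) and commit it
git add README.md
git commit -q -m "Readme update"

# 3. Push the current branch to the remote
git push -q origin HEAD
git log --oneline
```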

Now that you have successfully pushed your first edits to a remote repository, repeat the above steps with another file or R script that you create and edit, for instance the one below.

# Simple R file
# R example data.frame "cars"
str(cars)     # show the structure
summary(cars) # summary of the variables
plot(cars)    # plot speed against distance

Part 4: Fork a repository

Forking a project allows you to clone a repository on the server side and make it the starting point of your own project. A fork creates a personal copy of another repository. (See also the GitHub Forking guide)

1. Fork a repository on GitHub: Open https://github.com/314a/rr-r-publication and press the fork icon (in the upper right of the project page) to fork this project to your own GitHub account. The forked project should then appear in the list of your repositories on your GitHub page https://github.com/username.
Figure 7: GitHub: Fork a repository

2. Copy the HTTPS repository url: Copy https://github.com/yourusername/rr-r-publication.git from your forked repository (see Part 2, step 2).

3. Create a new RStudio project with Git: Proceed as you have already done in Part 2, step 3 of this tutorial.

Advanced: Link to a (different) remote repository

You may already have a local repository, such as the rstudio-git-test repository from this tutorial, or you may want to fork your own project (which, interestingly, GitHub does not offer an option for). In this case you need to link your local repository with a (new) remote repository (and remove the old remote repository).

1. Create a new GitHub repository and copy the HTTPS url of the new repository e.g. https://github.com/yourusername/rstudio-git-test2.git.

2. Set the new remote repository in the Git shell: Open the Git shell from RStudio with Tools > Shell.. and type the following commands to set the new remote repository.
Type git remote -v show first to list the current remote repositories and verify their location.

git remote set-url origin https://github.com/yourusername/rstudio-git-test2.git
git remote add upstream https://github.com/yourusername/rstudio-git-test2.git
git push origin master
git push --all

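Note: In case push/pull is greyed out in RStudio (see stackoverflow), use git push -f origin master and then git push -u origin master. Be aware that -f (force) overwrites the history on the remote, so only use it on a remote whose content you can afford to lose.

Note 2: Use git remote rm origin and git remote rm upstream if you want to remove the remote location from the current Git folder. (origin is a conventional name for a remote, not a command)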
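
The remote commands from this section can be tried safely in a throwaway repository, since `git remote` only edits the local configuration and never contacts the server. A minimal sketch, using the placeholder urls from this tutorial:

```shell
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Register a remote named "origin" and list it
git remote add origin https://github.com/yourusername/rstudio-git-test.git
git remote -v

# Point "origin" at a different repository
git remote set-url origin https://github.com/yourusername/rstudio-git-test2.git
git remote -v

# Remove the remote again; the list is now empty
git remote rm origin
git remote -v
```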

Tutorial 2: Writing publications with R (Work in progress!)

This tutorial, part of the Reproducible Research Workshop, walks you through the first steps of writing publications in R.

Objectives of this tutorial:

  • Install MiKTeX etc.
  • Load a template project into RStudio (or fork it from GitHub, see Part 4 of the previous tutorial)
  • Generate an example report as an HTML, Word or LaTeX document
  • Prepare a publication for use in Overleaf

Report generation

R - with the help of some tools - enables you to automatically generate reports from your analysis.

Figure: R report generation


  • RMarkdown: Convenient for producing reproducible documents. It lets you combine your text content and code in one single file. Good for version control.
  • Markdown: Simple markup language, fast to write and easy to read, lacks fancy formatting options (needed?)
  • knitr: R package for dynamic report generation in R
  • Pandoc: Universal document converter. Pandoc is your Swiss army knife for rendering documents from one markup language into another.


Report generation workflow

Statistical report creation with RMarkdown + knitr + pandoc in various formats

  1. Create an .Rmd file (Markdown with R code snippets)
  2. Write your report and include your data, code, analysis and text
  3. Use knitr to create a markdown file
  4. Convert the files with pandoc to generate html (can be self-contained), doc, Word documents

In RStudio the Knit button combines steps 3 and 4 behind the scenes to compile the documents from the RMarkdown file.
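
To make the workflow concrete, the sketch below writes a minimal .Rmd file and then runs steps 3 and 4 by hand from a shell. The file name report.Rmd and its content are made up for illustration. The rendering part is skipped unless both Rscript and pandoc are on the PATH, and it additionally assumes that the knitr package is installed in R.

```shell
cd "$(mktemp -d)"

# Three backticks, built indirectly so they do not clash with this listing.
fence=$(printf '\140\140\140')

# Steps 1 and 2: an .Rmd file with a YAML header, Markdown text and one R chunk
cat > report.Rmd <<EOF
---
title: "Example report"
output: html_document
---

Summary of the built-in cars data set:

${fence}{r}
summary(cars)
${fence}
EOF

if command -v Rscript >/dev/null 2>&1 && command -v pandoc >/dev/null 2>&1; then
  # Step 3: knitr runs the R chunks and writes report.md
  # Step 4: pandoc converts the Markdown file to a standalone HTML page
  Rscript -e 'knitr::knit("report.Rmd")' &&
    pandoc report.md -s -o report.html ||
    echo "Rendering failed; check that knitr is installed."
else
  echo "Rscript or pandoc not found; skipping steps 3 and 4."
fi
```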


RMarkdown + RStudio

RStudio:
Simply go to File -> New File -> R Markdown...
and select in the opening interface "Document".

Note: For PDF generation you need an installed LaTeX environment (e.g. MiKTeX)

R Markdown File

An RMarkdown file consists of a YAML metadata block, Markdown text elements and R code chunks.

Create and structure an R project


  • Folder structure: What structure might be useful?
  • Data organisation: How do you organise your (raw/derived) data?
  • Documentation: What do you document? Will you be reusing the data? Difficult parts, will you remember how you did it?
  • Scope: What's the scope of the project?
  • Report: Report structure
  • Reusability: Which functionalities will you reuse?
  • Extendable: What if the project becomes larger?


Example folder structure


  • R: R folder storing all the .r code files
  • data: Data folder with the raw and the derived data (e.g. data.csv, data.RData)
  • figures: Figure folder (e.g. pictures, logo etc.)
  • myproject.RProj: RStudio project file
  • ProjectReport.Rmd: RMarkdown storing the report text and R analysis code
  • ProjectReport.pdf: Generated report from the RMarkdown file
  • Readme.txt: Information about the project (good practice)
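
The folder skeleton can be created in one go from a shell. This is a sketch using the example names above, with myproject as a placeholder project name; the .RProj file is left out because RStudio creates it when you create the project, and the PDF is generated later from the .Rmd file.

```shell
set -e
cd "$(mktemp -d)"   # try it in a scratch directory first

# Folders for code, data and figures
mkdir -p myproject/R myproject/data myproject/figures

# Empty placeholders for the report source and the project notes
touch myproject/ProjectReport.Rmd myproject/Readme.txt

# Show the resulting layout
find myproject | sort
```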



Workflow

Structure your project into the following steps:

  1. Data collection
  2. Preprocessing
  3. Analysis
  4. Presentation

Practical tips

  1. Create an RStudio project for every project in a separate folder
  2. Document everything; your documents should be understandable by someone other than you
  3. Plan your project, organise and store your data, code and reports
  4. Start small, with a subset of your data
  5. Link your workflow (e.g. data files as an input to your analysis files)

Cheatsheets

Further reading

[1] C. Brunsdon and L. Comber, An Introduction to R for Spatial Analysis & Mapping. London: Sage Publications Ltd, 2015.
[2] J. Paulson, “Version Control with Git and SVN,” 2016. [Online]. Available: https://support.rstudio.com/hc/en-us/articles/200532077-Version-Control-with-Git-and-SVN.
[3] H. Wickham, “Git and GitHub,” R packages, 2015. [Online]. Available: http://r-pkgs.had.co.nz/git.html.
[4] www.codeschool.com, “tryGit Tutorial.” [Online]. Available: https://try.github.io.
[5] K. Broman, “git/github guide.” [Online]. Available: http://kbroman.org/github_tutorial/.