-
Notifications
You must be signed in to change notification settings - Fork 1
/
Github_intro.Rmd
875 lines (502 loc) · 47 KB
/
Github_intro.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
---
title: "Version control with Git and GitHub"
output:
html_document:
css: custom.css
toc: yes
toc_depth: 3
---
\
This short(ish!) tutorial will introduce you to the basics of using a version control system to keep track of all your important PhD documents and facilitate collaboration with colleagues and the wider world. The tutorial will focus on using the software 'Git' in combination with the web-based hosting service 'GitHub'. By the end of the tutorial, you will be able to install and configure Git and GitHub on your computer and setup and work with a version controlled project in RStudio. We won't be covering more advanced topics such as branching, forking and pull requests in much detail but I do give an overview [later](#collab) on in the tutorial.
I estimate that this tutorial should take you roughly 1.5 to 2.5 hours to complete in one sitting (to be honest I have no idea really so this is just a guess), but feel free to dip in and out over a longer period if that suits you better.
Just a few notes of caution. In this tutorial we'll be using RStudio to interface with Git as it gives you a nice friendly graphical user interface which generally makes life a little bit easier (and who doesn't want that?). However, one downside to using RStudio with Git is that RStudio only provides pretty basic Git functionality through its menu system. That's fine for most of what we'll be doing during this tutorial (although I will introduce a few Git commands as we go along) but if you really want to benefit from using Git's power you will need to [learn](#resources) some Git commands and syntax. This leads me on to my next point. I'm not going to lie, Git can become a little bewildering and frustrating when you first start using it. This is mostly due to the terminology and liberal use of jargon associated with Git, but there's no hiding the fact that it's quite easy to get yourself and your Git repository into a pickle. Therefore, I have tried hard to keep things as straight forward as I can during this tutorial and as a result I do occasionally show you a couple of very 'un-Git' ways of doing things (mostly about reverting to previous versions of documents). Don't get hung up about this, there's no shame to using these low tech solutions and if it works then it works. Lastly, GitHub was not designed to host very large files and will warn you if you try to add files greater than 50 MB and block you adding files greater than 100 MB. If your project involves using large file sizes there are a [few solutions](https://help.github.com/en/github/managing-large-files/configuring-git-large-file-storage) but I find the easiest is to host these files elsewhere (Googledrive, Dropbox etc) and create a link to them in a README file or R markdown document on Github.
\
## What is version control?
\
A [Version Control System](https://en.wikipedia.org/wiki/Version_control) (VCS) keeps a record of all the changes you make to your files that make up a particular project and allows you to revert to previous versions of files if you need to. To put it another way, if you muck things up or accidentally lose important files you can easily roll back to a previous stage in your project to sort things out. Version control was originally designed for collaborative software development, but it's equally useful for scientific research and collaborations (although admittedly a lot of the terms, jargon and functionality are focused on the software development side). There are many different version control systems currently available, but we' we'll focus on using *Git*, because it's free and open source and it integrates nicely with RStudio. This means that its can easily become part of your usual workflow with minimal additional overhead.
\
## Why use version control?
\
So why should you worry about version control? Well, first of all it helps avoid this (familiar?) situation when you're working on a project
\
```{r youNeedVC1, echo=FALSE, out.width="75%", fig.align="center", fig.cap="You need version control"}
knitr::include_graphics(path = "images/Messy_folder.PNG")
```
\
usually arising from this (familiar?) scenario
\
```{r youNeedVC2, echo=FALSE, out.width="60%", fig.align="center"}
knitr::include_graphics(path = "images/final_doc.gif")
```
\
Version control automatically takes care of keeping a record of all the versions of a particular file and allows you to revert back to previous versions if you need to. Version control also helps you (especially the future you) keep track of all your files in a single place and it helps others (especially collaborators) review, contribute to and reuse your work through the GitHub website. Lastly, your files are always available from anywhere and on any computer, all you need is an internet connection.
\
## What is Git and GitHub?
\
**Git** is a version control system originally developed by [Linus Torvalds](https://en.wikipedia.org/wiki/Linus_Torvalds) that lets you track changes to a set of files. These files can be any type of file including the menagerie of files that typically make up a data orientated project (.pdf, .Rmd, .docx, .txt, .jpg etc) although plain text files work the best. All the files that make up a project is called a **repository** (or just **repo**).
**GitHub** is a web-based hosting service for Git repositories which allows you to create a remote copy of your local version-controlled project. This can be used as a backup or archive of your project or make it accessible to you and to your colleagues so you can work collaboratively.
At the start of a project we typically (but not always) create a **remote** repository on GitHub, then **clone** (think of this as copying) this repository to our **local** computer (the one in front of you). This cloning is usually a one time event and you shouldn't need to clone this repository again unless you really muck things up. Once you have cloned your repository you can then work locally on your project as usual, creating and saving files for your data analysis (scripts, R markdown documents, figures etc). Along the way you can take snapshots (called **commits**) of these files after you've made important changes. We can then **push** these changes to the remote GitHub repository to make a backup or make available to our collaborators. If other people are working on the same project (**repository**), or maybe you're working on a different computer, you can **pull** any changes back to your local repository so everything is synchronised.
\
```{r git-cart, echo=FALSE, out.width="60%", fig.align="center"}
knitr::include_graphics(path = "images/Github_cartoon2.png")
```
\
## Getting started {#setup_git}
\
This tutorial assumes that you have already installed the latest versions of R and RStudio. If you haven't done this yet you can find instructions [here](setup.html).
\
### Install Git
\
To get started, you first need to install Git. If you're lucky you may already have Git installed (especially if you have a Mac or Linux computer). You can check if you already have Git installed by clicking on the Terminal tab in the Console window in RStudio and typing `git --version` (the space after the `git` command is important). If you see something that looks like `git version 2.25.0` (the version number may be different on your computer) then you already have Git installed (happy days). If you get an error (something like `git: command not found`) this means you don't have Git installed (yet!).
\
```{r git-version, echo=FALSE, out.width="60%", fig.align="center"}
knitr::include_graphics(path = "images/git_version.png")
```
\
You can also do this check outside RStudio by opening up a separate Terminal if you want. On Windows go to the 'Start menu' and in the search bar (or run box) type `cmd` and press enter. On a Mac go to 'Applications' in Finder, click on the 'Utilities' folder and then on the 'Terminal' program. On a Linux machine simply open the Terminal (Ctrl+Alt+T often does it).
To install Git on a **Windows** computer we recommend you download and install Git for Windows (also known as 'Git Bash'). You can find the download file and installation instructions [here](https://git-scm.com/downloads).
For those of you using a **Mac** computer we recommend you download Git from [here](https://git-scm.com/downloads) and install in the usual way (double click on the installer package once downloaded). If you've previously installed Xcode on your Mac and want to use a more up to date version of Git then you will need to follow a few more steps documented [here](https://github.com/timcharper/git_osx_installer). If you've never heard of Xcode then don't worry about it!
For those of you lucky enough to be working on a **Linux** machine you can simply use your OS package manager to install Git from the official repository. For Ubuntu Linux (or variants of) open your Terminal and type
\
```markup
sudo apt update
sudo apt install git
```
\
You will need administrative privileges to do this. For other versions of Linux see [here](https://git-scm.com/download/linux) for further installation instructions.
Whatever version of Git you're installing, once the installation has finished verify that the installation process has been successful by running the command `git --version` in the Terminal tab in RStudio (as described above). On some installations of Git (yes I'm looking at you MS Windows) this may still produce an error as you will also need to setup RStudio so it can find the Git executable (described [below](#rs_config)).
\
### Configure Git
\
After installing Git, you need to configure it so you can use it. Click on the Terminal tab in the Console window again and type the following:
\
```
git config --global user.email 'you@youremail.com'
git config --global user.name 'Your Name'
```
\
substituting `'Your Name'` for your actual name and `'you@youremail.com'` with your email address. We recommend you use your University email address as you will also use this address when you register for your GitHub account (coming up in a bit).
If this was successful, you should see no error messages from these commands. To verify that you have successfully configured Git type the following into the Terminal
\
```
git config --global --list
```
\
You should see both your `user.name` and `user.email` configured.
\
### Configure RStudio {#rs_config}
\
As you can see above, Git can be used from the command line, but it also integrates well with RStudio, providing a friendly graphical user interface. If you want to use RStudio's Git integration (we recommend you do - at least at the start), you need to check that the path to the Git executable is specified correctly. In RStudio, go to the menu `Tools` -> `Global Options` -> `Git/SVN` and make sure that 'Enable version control interface for RStudio projects' is ticked and that the 'Git executable:' path is correct for your installation. If it's not correct hit the `Browse...` button and navigate to where you installed git and click on the executable file. You will need to restart RStudio after doing this.
\
```{r git-rs, echo=FALSE, out.width="60%", fig.align="center"}
knitr::include_graphics(path = "images/git_path.png")
```
\
### Register a GitHub account
\
If all you want to do is to keep track of files and file versions on your local computer then Git is sufficient. If however, you would like to make an off-site copy of your project or make it available to your collaborators then you will need a web-based hosting service for your Git repositories. This is where GitHub comes into play (there are also other services like [GitLab](https://about.gitlab.com/), [Bitbucket](https://bitbucket.org/product) and [Savannah](https://savannah.gnu.org/)). You can sign up for a free account on GitHub [here](https://github.com/join?source=header-home). You will need to specify a username, an email address and a strong password. We suggest that you use your University email address as this will also allow you to apply for a free [educator or researcher account](https://help.github.com/en/github/teaching-and-learning-with-github-education/applying-for-an-educator-or-researcher-discount) later on which gives you some useful [benefits](https://help.github.com/en/github/teaching-and-learning-with-github-education/about-github-education-for-educators-and-researchers) (don't worry about this now though). When it comes to choosing a username we suggest you give this some thought. Choose a short(ish) rather than a long username, use all lowercase and hyphenate if you want to include multiple words, find a way of incorporating your actual name and lastly, choose a username that you will feel comfortable revealing to your future employer!
Next click on the 'Select a plan' (you may have to solve a simple puzzle first to verify you're human) and choose the 'Free Plan' option. Github will send an email to the email address you supplied for you to verify.
Once you've completed all those steps you should have both Git and GitHub setup up ready for you to use (Finally!).
\
## Setting up a project in RStudio
\
Now that you're all set up, let's create your first version controlled RStudio project. There are a couple of different approaches you can use to do this. You can either setup a remote GitHub repository first then connect an RStudio project to this repository (we'll call this Option 1). Another option is to setup a local repository first and then link a remote GitHub repository to this repository (Option 2). You can also connect an existing project to a GitHub repository but we won't cover this here. I suggest that if you're completely new to Git and GitHub then use Option 1 as this approach sets up your local Git repository nicely and you can **push** and **pull** immediately. Option 2 requires a little more work and therefore there are more opportunities to go wrong. We will cover both of these options below.
\
### Option 1 - GitHub first {#opt1}
\
To use the GitHub first approach you will first need to create a **repository (repo)** on GitHub. Go to your [GitHub page](https://github.com/) and sign in if necessary. Click on the 'Repositories' tab at the top and then on the green 'New' button on the right
\
```{r new-repo1, echo=FALSE, out.width="60%", fig.align="center"}
knitr::include_graphics(path = "images/new_repo.png")
```
\
Give your new repo a name (let's call it `first_repo` for this tutorial), select 'Public', tick on the 'Initialize this repository with a README' (this is important) and then click on 'Create repository' (ignore the other options for now).
\
```{r new-repo2, echo=FALSE, out.width="60%", fig.align="center"}
knitr::include_graphics(path = "images/new_repo_opt.png")
```
\
Your new GitHub repository will now be created. Notice the README has been rendered in GitHub and is in markdown (.md) format (see the other [tutorial](Rmarkdown_intro.html) on R markdown if this doesn't mean anything to you). Next click on the green 'Clone or Download' button and copy the `https//...` URL that pops up for later (either highlight it all and copy or click on the copy to clipboard icon to the right).
\
```{r new-repo3, echo=FALSE, out.width="60%", fig.align="center"}
knitr::include_graphics(path = "images/new_repo_clone.png")
```
\
Ok, we now switch our attention to RStudio. In RStudio click on the `File` -> `New Project` menu. In the pop up window select `Version Control`.
\
```{r new-proj1, echo=FALSE, out.width="60%", fig.align="center"}
knitr::include_graphics(path = "images/proj_setup1.png")
```
\
Now paste the the URL you previously copied from GitHub into the `Repository URL:` box. This should then automatically fill out the `Project Directory Name:` section with the correct repository name (it's important that the name of this directory has the same name as the repository you created in GitHub). You can then select where you want to create this directory by clicking on the `Browse` button opposite the `Create project as a subdirectory of:` option. Navigate to where you want the directory created and click OK. I also tick the `Open in new session` option.
\
```{r new-proj2, echo=FALSE, out.width="60%", fig.align="center"}
knitr::include_graphics(path = "images/proj_setup2.png")
```
\
RStudio will now create a new directory with the same name as your repository on your local computer and will then **clone** your remote repository to this directory. The directory will contain three new files; `first_repo.Rproj` (or whatever you called your repository), `README.md` and `.gitignore`. You can check this out in the `Files` tab usually in the bottom right pane in RStudio. You will also have a `Git` tab in the top right pane with two files listed (we will come to this later on in the tutorial). That's it for Option 1, you now have a remote GitHub repository set up and this is linked to your local repository managed by RStudio. Any changes you make to files in this directory will be version controlled by Git.
\
```{r new-proj2a, echo=FALSE, out.width="60%", fig.align="center"}
knitr::include_graphics(path = "images/proj_setup3.png")
```
\
### Option 2 - RStudio first {#opt2}
\
An alternative approach is to create a local RStudio project first and then link to a remote Github repository. As I mentioned before, this option is more involved than Option 1 so feel free to skip this now and come back later to it if you're interested. This option is also useful if you just want to create a local RStudio project linked to a local Git repository (i.e. no GitHub involved). If you want to do this then just follow the instructions below omitting the GitHub bit.
\
In RStudio click on the `File` -> `New Project` menu and select the `New Directory` option.
\
```{r new-proj3, echo=FALSE, out.width="60%", fig.align="center"}
knitr::include_graphics(path = "images/new_proj_RS1.png")
```
\
In the pop up window select the `New Project` option
\
```{r new-proj4, echo=FALSE, out.width="60%", fig.align="center"}
knitr::include_graphics(path = "images/proj_setup_RS.png")
```
\
In the New Project window specify a `Directory name` (choose `second_repo` for this tutorial) and select where you would like to create this directory on you computer (click the `Browse` button). Make sure the `Create a git repository` option is ticked
\
```{r new-proj5, echo=FALSE, out.width="60%", fig.align="center"}
knitr::include_graphics(path = "images/new_proj_RS2.png")
```
\
This will create a version controlled directory called `second_repo` on your computer that contains two files, `second_repo.Rproj` and `.gitignore` (there might also be a `.Rhistory` file but ignore this). You can check this by looking in the `Files` tab in RStudio (usually in the bottom right pane).
\
```{r new-proj6, echo=FALSE, out.width="60%", fig.align="center"}
knitr::include_graphics(path = "images/new_proj_RS3.png")
```
\
OK, before we go on to create a repository on GitHub we need to do one more thing - we need to place our `second_repo.Rproj` and `.gitignore`files under version control. Unfortunately we haven't covered this in detail yet so just follow the next few instructions (blindly!) and we'll revisit them in the [Using Git](#use_git) section of this tutorial.
To get our two files under version control click on the 'Git' tab which is usually in the top tight pane in RStudio
\
```{r new-proj7, echo=FALSE, out.width="60%", fig.align="center"}
knitr::include_graphics(path = "images/new_proj_RS4.png")
```
\
You can see that both files are listed. Next, tick the boxes under the 'Staged' column for both files and then click on the 'Commit' button.
\
```{r new-proj8, echo=FALSE, out.width="60%", fig.align="center"}
knitr::include_graphics(path = "images/new_proj_RS5.png")
```
\
This will take you to the 'Review Changes' window. Type in the commit message 'First commit' in the 'Commit message' window and click on the 'Commit' button. A new window will appear with some messages which you can ignore for now. Click 'Close' to close this window and also close the 'Review Changes' window. The two files should now have disappeared from the Git pane in RStudio indicating a successful commit.
\
```{r new-proj9, echo=FALSE, out.width="60%", fig.align="center"}
knitr::include_graphics(path = "images/new_proj_RS6.png")
```
\
OK, that's those two files now under version control. Now we need to create a new repository on GitHub. In your browser go to your [GitHub page](https://github.com/) and sign in if necessary. Click on the 'Repositories' tab and then click on the green 'New' button on the right. Give your new repo the name `second_repo` (the same as your version controlled directory name) and select 'Public'. This time **do not** tick the 'Initialize this repository with a README' (this is important) and then click on 'Create repository'.
\
```{r new-proj10, echo=FALSE, out.width="60%", fig.align="center"}
knitr::include_graphics(path = "images/new_proj_RS7.png")
```
\
This will take you to a Quick setup page which provides you with some code for various situations. The code we are interested in is the code under `...or push an existing repository from the command line` heading.
\
```{r new-proj11, echo=FALSE, out.width="60%", fig.align="center"}
knitr::include_graphics(path = "images/new_proj_RS8.png")
```
\
Highlight and copy the first line of code (note: yours will be slightly different as it will include your GitHub username not mine)
`git remote add origin https://github.com/alexd106/second_repo.git`
Switch to RStudio, click on the 'Terminal' tab and paste the command into the Terminal. Now go back to GitHub and copy the second line of code
`git push -u origin master`
and paste this into the Terminal in RStudio. You should see something like this
\
```{r new-proj12, echo=FALSE, out.width="60%", fig.align="center"}
knitr::include_graphics(path = "images/new_prog_RS9.png")
```
\
If you take a look at your repo back on GitHub (click on the `/second_repo` link at the top) you will see the `second_repo.Rproj` and `.gitignore` files have now been **pushed** to GitHub from your local repository.
\
```{r new-proj13, echo=FALSE, out.width="60%", fig.align="center"}
knitr::include_graphics(path = "images/new_proj_RS10.png")
```
\
The last thing we need to do is create and add a **README** file to your repository. A README file describes your project and is written using the same Markdown language you learned in the R markdown [tutorial](Rmarkdown_intro.html). A good README file makes it easy for others (or the future you!) to use your code and reproduce your project. You can create a README file in RStudio or in GitHub. Let's use the second option.
In your repository on GitHub click on the green `Add a README` button.
\
```{r new-proj14, echo=FALSE, out.width="60%", fig.align="center"}
knitr::include_graphics(path = "images/new_proj_RS11.png")
```
\
Now write a short description of your project in the `<> Edit new file` section and then click on the green `Commit new file` button.
\
```{r new-proj15, echo=FALSE, out.width="60%", fig.align="center"}
knitr::include_graphics(path = "images/new_proj_RS12.png")
```
\
You should now see the `README.md` file listed in your repository. It won't actually exist on your computer yet as you will need to **pull** these changes back to your local repository, but more about that in the next section.
\
Whether you followed Option 1 or Option 2 (or both) you have now successfully setup a version controlled RStudio project (and associated directory) and linked this to a GitHub repository. Git will now monitor this directory for any changes you make to files and also if you add or delete files. If the steps above seem like a bit of an ordeal, just remember, you only need to do this once for each project and it gets much easier over time.
\
## Using Git {#use_git}
\
Now that we have our project and repositories (both local and remote) set up, it's finally time to learn how to use Git in RStudio!
Typically, when using Git your workflow will go something like this:
1. You create/delete and edit files in your project directory on your computer as usual (saving these changes as you go)
2. Once you've reached a natural 'break point' in your progress (i.e. you'd be sad if you lost this progress) you **stage** these files
3. You then **commit** the changes you made to these staged files (along with a useful commit message) which creates a permanent snapshot of these changes
4. You keep on with this cycle until you get to a point when you would like to **push** these changes to GitHub
5. If you're working with other people on the same project you may also need to **pull** their changes to your local computer
```{r workflow, echo=FALSE, out.width="80%", fig.align="center"}
knitr::include_graphics(path = "images/workflow_3.png")
```
\
OK, let's go through an example to help clarify this workflow. In RStudio open up the `first_repo.Rproj` you created previously during Option 1. Either use the `File` -> `Open Project` menu or click on the top right project icon and select the appropriate project.
\
```{r rm0, echo=FALSE, out.width="80%", fig.align="center"}
knitr::include_graphics(path = "images/first_Rmd0.png")
```
\
Create an R markdown document inside this project by clicking on the `File` -> `New File` -> `R markdown` menu (remember from the R markdown [tutorial](Rmarkdown_intro.html)?).
Once created, we can delete all the example R markdown code (except the YAML header) as usual and write some interesting R markdown text and include a plot. We'll use the inbuilt `cars` dataset to do this. Save this file (cmd + s for Mac or ctrl + s in Windows). Your R markdown document should look something like the following (it doesn't matter if it's not exactly the same).
\
```{r rm1, echo=FALSE, out.width="60%", fig.align="center"}
knitr::include_graphics(path = "images/first_Rmd.png")
```
\
Take a look at the 'Git' tab which should list your new R markdown document (`first_doc.Rmd` in this example) along with `first_repo.Rproj`, and `.gitignore` (you created these files previously when following Option 1).
\
```{r rm2, echo=FALSE, out.width="80%", fig.align="center"}
knitr::include_graphics(path = "images/first_Rmd2.png")
```
\
Following our workflow, we now need to **stage** these files. To do this tick the boxes under the 'Staged' column for all files. Notice that there is a status icon next to the box which gives you an indication of how the files were changed. In our case all of the files are to be added (capital A) as we have just created them.
\
```{r rm3, echo=FALSE, out.width="15%", fig.align="center"}
knitr::include_graphics(path = "images/rstudio_git_cols.png")
```
\
After you have staged the files the next step is to **commit** the files. This is done by clicking on the 'Commit' button.
\
```{r rm4, echo=FALSE, out.width="50%", fig.align="center"}
knitr::include_graphics(path = "images/first_Rmd3.png")
```
\
After clicking on the 'Commit' button you will be taken to the 'Review Changes' window. You should see the three files you staged from the previous step in the left pane. If you click on the file name `first_doc.Rmd` you will see the changes you have made to this file highlighted in the bottom pane. Any content that you have added is highlighted in green and deleted content is highlighted in red. As you have only just created this file, all the content is highlighted in green. To commit these files (take a snapshot) first enter a mandatory commit message in the 'Commit message' box. This message should be relatively short and informative (to you and your collaborators) and indicate why you made the changes, not what you changed. This makes sense as Git keeps track of *what* has changed and so it is best not to use commit messages for this purpose. It's traditional to enter the message 'First commit' (or 'Initial commit') when you commit files for the first time. Now click on the 'Commit' button to commit these changes.
\
```{r rm5, echo=FALSE, out.width="80%", fig.align="center"}
knitr::include_graphics(path = "images/first_Rmd4.png")
```
\
A summary of the commit you just performed will be shown. Now click on the 'Close' button to return to the 'Review Changes' window. Note that the staged files have now been removed.
\
```{r rm6, echo=FALSE, out.width="60%", fig.align="center"}
knitr::include_graphics(path = "images/first_Rmd5.png")
```
\
Now that you have committed your changes the next step is to **push** these changes to GitHub. Before you push your changes it's good practice to first **pull** any changes from GitHub. This is especially important if both you and your collaborators are working on the same files as it keeps you local copy up to date and avoids any potential conflicts. In this case your repository will already be up to date but it's a good habit to get into. To do this, click on the 'Pull' button on the top right of the 'Review Changes' window. Once you have pulled any changes click on the green 'Push' button to push your changes. You will see a summary of the push you just performed. Hit the 'Close' button and then close the 'Review Changes' window.
\
```{r rm7, echo=FALSE, out.width="60%", fig.align="center"}
knitr::include_graphics(path = "images/first_Rmd6.png")
```
\
To confirm the changes you made to the project have been pushed to GitHub, open your GitHub page, click on the Repositories link and then click on the `first_repo` repository. You should see four files listed including the `first_doc.Rmd` you just pushed. Along side the file name you will see your last commit message ('First commit' in this case) and when you made the last commit.
\
```{r rm8, echo=FALSE, out.width="60%", fig.align="center"}
knitr::include_graphics(path = "images/first_Rmd7.png")
```
\
To see the contents of the file click on the `first_doc.Rmd` file name.
\
```{r rm9, echo=FALSE, out.width="60%", fig.align="center"}
knitr::include_graphics(path = "images/first_Rmd8.png")
```
\
### Tracking changes
\
After following the steps outlined above, you will have successfully modified an RStudio project by creating a new R markdown document, staged and then committed these changes and finally pushed the changes to your GitHub repository. Now let's make some further changes to your R markdown file and follow the workflow once again but this time we 'll take a look at how to identify changes made to files, examine the commit history and how to restore to a previous version of the document.
In RStudio open up the `first_repo.Rproj` file you created previously (if not already open) then open the `first_doc.Rmd` file (click on the file name in the `Files` tab in RStudio).
Let's make some changes to this document. Delete the line beginning with 'My first version controlled ...' and replace it with something more informative (see figure below). We will also change the plotted symbols to red and give the plot axes labels. Lastly, let's add a summary table of the dataframe using the `kable()` and `summary()` functions (you may need to install the `knitr` package if you haven't done so previously to use the `kable()` function) and finally render this document to pdf by changing the YAML option to `output: pdf_document`.
\
```{r mod1, echo=FALSE, out.width="80%", fig.align="center"}
knitr::include_graphics(path = "images/mod1.png")
```
\
Now save these changes and then click the `knit` button to render to pdf. A new pdf file named `first_doc.pdf` will be created which you can view by clicking on the file name in the `Files` tab in RStudio.
Notice that these two files have been added to the `Git` tab in RStudio. The status icons indicate that the `first_doc.Rmd` file has been modified (capital M) and the `first_doc.pdf` file is currently untracked (question mark).
\
```{r mod2, echo=FALSE, out.width="60%", fig.align="center"}
knitr::include_graphics(path = "images/mod2.png")
```
\
To stage these files tick the 'Staged' box for each file and click on the 'Commit' button to take you to the 'Review Changes' window
\
```{r mod3, echo=FALSE, out.width="60%", fig.align="center"}
knitr::include_graphics(path = "images/mod3.png")
```
\
Before you commit your changes notice the status of `first_doc.pdf` has changed from untracked to added (A). You can view the changes you have made to the `first_doc.Rmd` by clicking on the file name in the top left pane which will provide you with a useful summary of the changes in the bottom pane (technically called diffs). Lines that have been deleted are highlighted in red and lines that have been added are highlighted in green (note that from Git's point of view, a modification to a line is actually two operations: the removal of the original line followed by the creation of a new line). Once you're happy, commit these changes by writing a suitable commit message and click on the 'Commit' button.
\
```{r mod4, echo=FALSE, out.width="80%", fig.align="center"}
knitr::include_graphics(path = "images/mod4.png")
```
\
To push the changes to GitHub, click on the 'Pull' button first (remember this is good practice even though you are only collaborating with yourself at the moment) and then click on the 'Push' button. Go to your online GitHub repository and you will see your new commits, including the `first_doc.pdf` file you created when you rendered your R markdown document.
\
```{r mod5, echo=FALSE, out.width="80%", fig.align="center"}
knitr::include_graphics(path = "images/mod5.png")
```
\
To view the changes in `first_doc.Rmd` click on the file name for this file.
\
```{r mod6, echo=FALSE, out.width="80%", fig.align="center"}
knitr::include_graphics(path = "images/mod6.png")
```
\
### Commit history
\
One of the great things about Git and GitHub is that you can view the history of all the commits you have made along with the associated commit messages. You can do this locally using RStudio (or the Git command line) or if you have pushed your commits to GitHub you can check them out on the GitHub website.
To view your commit history in RStudio click on the 'History' button (the one that looks like a clock) in the Git pane to bring up the history view in the 'Review Changes' window. You can also click on the 'Commit' or 'Diff' buttons which takes you to the same window (you just need to additionally click on the 'History' button in the 'Review Changes' window).
\
```{r his1, echo=FALSE, out.width="60%", fig.align="center"}
knitr::include_graphics(path = "images/c_history.png")
```
\
The history window is split into two parts. The top pane lists every commit you have made in this repository (with associated commit messages) starting with the most recent one at the top and oldest at the bottom. You can click on each of these commits and the bottom pane shows you the changes you have made along with a summary of the **Date** the commit was made, **Author** of the commit and the commit message (**Subject**). There is also a unique identifier for the commit (**SHA** - Secure Hash Algorithm) and a **Parent** SHA which identifies the previous commit. These SHA identifiers are really important as you can use them to view and revert to previous versions of files (details [below](#undo)). You can also view the contents of each file by clicking on the 'View file @ SHA key` link (in our case 'View file @ 2b4693d1').
\
```{r his2, echo=FALSE, out.width="80%", fig.align="center"}
knitr::include_graphics(path = "images/c_history2.png")
```
\
You can also view your commit history on GitHub website but this will be limited to only those commits you have already pushed to GitHub. To view the commit history navigate to the repository and click on the 'commits' link (in our case the link will be labelled '3 commits' as we have made 3 commits).
\
```{r his3, echo=FALSE, out.width="80%", fig.align="center"}
knitr::include_graphics(path = "images/c_history3.png")
```
\
You will see a list of all the commits you have made, along with commit messages, date of commit and the SHA identifier (these are the same SHA identifiers you saw in the RStudio history). You can even browse the repository at a particular point in time by clicking on the `<>` link. To view the changes in files associated with the commit simply click on the relevant commit link in the list.
\
```{r his4, echo=FALSE, out.width="80%", fig.align="center"}
knitr::include_graphics(path = "images/c_history4.png")
```
\
Which will display changes using the usual format of green for additions and red for deletions.
\
```{r his5, echo=FALSE, out.width="80%", fig.align="center"}
knitr::include_graphics(path = "images/c_history5.png")
```
\
### Reverting changes {#undo}
\
One the great things about using Git is that you are able to revert to previous versions of files if you've made a mistake, broke something or just prefer and earlier approach. How you do this will depend on whether the changes you want to discard have been staged, committed or pushed to GitHub. We'll go through some common scenarios below mostly using RStudio but occasionally we will need to resort to using the Terminal (still in RStudio though).
\
#### Changes saved but not staged, committed or pushed
If you have saved changes to your file(s) but not staged, committed or pushed these files to GitHub you can right click on the offending file in the Git pane and select 'Revert ...'. This will roll back all of the changes you have made to the same state as your last commit. Just be aware that you cannot undo this operation so use with caution.
\
```{r rev1, echo=FALSE, out.width="80%", fig.align="center"}
knitr::include_graphics(path = "images/revert1.png")
```
\
You can also undo changes to just part of a file by opening up the 'Diff' window (click on the 'Diff' button in the Git pane). Select the line you wish to discard by double clicking on the line and then click on the 'Discard line' button. In a similar fashion you can discard chunks of code by clicking on the 'Discard chunk' button.
\
```{r rev2, echo=FALSE, out.width="80%", fig.align="center"}
knitr::include_graphics(path = "images/revert2.png")
```
\
#### Staged but not committed and not pushed
If you have staged your files, but not committed them then simply unstage them by clicking on the 'Staged' check box in the Git pane (or in the 'Review Changes' window) to remove the tick. You can then revert all or parts of the file as described in the section above.
\
#### Staged and committed but not pushed
If you have made a mistake or have forgotten to include a file in your last commit which you have not yet pushed to GitHub, you can just fix your mistake, save your changes, and then amend your previous commit. You can do this by staging your file and then tick the 'Amend previous commit` box in the 'Review Changes' window before committing.
\
```{r rev3, echo=FALSE, out.width="80%", fig.align="center"}
knitr::include_graphics(path = "images/revert3.png")
```
\
If we check out our commit history you can see that our latest commit contains both changes to the file rather than having two separate commits. I use the amend commit approach alot but it's important to understand that you should **not** do this if you have already pushed your last commit to GitHub as you are effectively rewriting history and all sorts bad things may happen!
\
```{r rev4, echo=FALSE, out.width="80%", fig.align="center"}
knitr::include_graphics(path = "images/revert4.png")
```
\
If you spot a mistake that has happened multiple commits back or you just want to revert to a previous version of a document you have a number of options.
**Option 1** - (probably the easiest but very unGit - but like, whatever!) is to look in your commit history in RStudio, find the commit that you would like to go back to and click on the 'View file @ ' button to show the file contents.
\
```{r rev5, echo=FALSE, out.width="80%", fig.align="center"}
knitr::include_graphics(path = "images/revert5.png")
```
\
You can then copy the contents of the file to the clipboard and paste it into your current file to replace your duff code or text. Alternatively, you can click on the 'Save As' button and save the file with a different file name. Once you have saved your new file you can delete your current unwanted file and then carry on working on your new file. Don't forget to stage and commit this new file.
\
```{r rev6, echo=FALSE, out.width="80%", fig.align="center"}
knitr::include_graphics(path = "images/revert6.png")
```
\
**Option 2** - (Git like) Go to your Git history, find the commit you would like to roll back to and write down (or copy) its SHA identifier.
\
```{r rev7, echo=FALSE, out.width="80%", fig.align="center"}
knitr::include_graphics(path = "images/revert7.png")
```
\
Now go to the Terminal in RStudio and type `git checkout <SHA> <filename>`. In our case the SHA key is `2b4693d1` and the filename is `first_doc.Rmd` so our command would look like this:
\
```
git checkout 2b4693d1 first_doc.Rmd
```
\
The command above will copy the selected file version from the past and place it into the present. RStudio may ask you whether you want to reload the file as it now changed - select yes. You will also need to stage and commit the file as usual.
If you want to revert all your files to the same state as a previous commit rather than just one file you can use (the single 'dot' `.` is important otherwise your HEAD will detach!):
\
```
git rm -r .
git checkout 2b4693d1 .
```
\
Note that this will delete all files that you have created since you made this commit so be careful!
\
#### Staged, committed and pushed
If you have already pushed your commits to GitHub you can use the `git checkout` strategy described above and then commit and push to update GitHub (although this is not really considered 'best' practice). Another approach would be to use `git revert` (Note: as far as I can tell `git revert` is not the same as the 'Revert' option in RStudio). The `revert` command in Git essentially creates a new commit based on a previous commit and therefore preserves all of your commit history. To rollback to a previous state (commit) you first need to identify the SHA for the commit you wish to go back to (as we did above) and then use the `revert` command in the Terminal. Let's say we want to revert back to our 'First commit' which has a SHA identifier `d27e79f1`.
\
```{r rev8, echo=FALSE, out.width="80%", fig.align="center"}
knitr::include_graphics(path = "images/revert8.png")
```
\
We can use the `revert` command as shown below in the Terminal. The `--no-commit` option is used to prevent us from having to deal with each intermediate commit.
\
```
git revert --no-commit d27e79f1..HEAD
```
\
Your `first_doc.Rmd` file will now revert back to the same state as it was when you did your 'First commit'. Notice also that the `first_doc.pdf` file has been deleted as this wasn't present when we made our first commit. You can now stage and commit these files with a new commit message and finally push them to GitHub. Notice that if we look at our commit history all of the commits we have made are still present.
\
```{r rev9, echo=FALSE, out.width="80%", fig.align="center"}
knitr::include_graphics(path = "images/revert9.png")
```
\
and our repo on GitHub also reflects these changes
\
```{r rev10, echo=FALSE, out.width="80%", fig.align="center"}
knitr::include_graphics(path = "images/revert10.png")
```
\
### Collaborate with Git {#collab}
\
GitHub is a great tool for collaboration, it can seem scary and complicated at first, but it is worth investing some time to learn how it works. What makes GitHub so good for collaboration is that it is a *distributed system*, which means that every collaborator works on their own copy of the project and changes are then merged together in the remote repository. There are two main ways you can set up a collaborative project on GitHub. One is the workflow we went through above, where everybody connects their local repository to the same remote one; this system works well with small projects where different people mainly work on different aspects of the project but can quickly become unwieldy if many people are collaborating and are working on the same files (merge misery!). The second approach consists of every collaborator creating a copy (or **fork**) of the main repository, which becomes their remote repository. Every collaborator then needs to send a request (a **pull request**) to the owner of the main repository to incorporate any changes into the main repository and this includes a review process before the changes are integrated. More detail of these topics can be found in the [Further resources](#resources) section.
\
### Git tips
\
Generally speaking you should commit often (including amended commits) but push much less often. This makes collaboration easier and also makes the process of reverting to previous versions of documents much more straight forward. I generally only push changes to GitHub when I'm happy for my collaborators (or the rest of the world) to see my work. However, this is entirely up to you and depends on the project (and who you are working with) and what your priorities are when using Git.
\
If you don't want to track a file in your repository (maybe they are too large or transient files) you can get Git to ignore the file by right clicking on the filename in the Git pane and selecting 'Ignore...'
\
```{r tip1, echo=FALSE, out.width="80%", fig.align="center"}
knitr::include_graphics(path = "images/tip1.png")
```
\
This will add the filename to the `.gitignore` file. If you want to ignore multiple files or a particular type of file you can also include wildcards in the `.gitignore` file. For example to ignore all `png` files you can include the expression `*.png` in your `.gitignore` file and save.
\
If it all goes pear shaped and you end up completely trashing your Git repository don't despair (we've all been there!). As long as your GitHub repository is good, all you need to do is delete the offending project directory on your computer, create a new RStudio project and link this with your remote GitHub repository using [Option 2](#opt2). Once you have cloned the remote repository you should be good to go.
\
## Further resources {#resources}
\
There are many good online guides to learn more about git and GitHub and as with any open source software there is a huge community that can be a great resource:
- The British Ecological Society guide to [Reproducible Code](https://www.britishecologicalsociety.org/wp-content/uploads/2019/06/BES-Guide-Reproducible-Code-2019.pdf)
- The [GitHub guides](https://guides.github.com/)
- The Mozilla Science Lab [GitHub for Collaboration on Open Projects guide](https://mozillascience.github.io/working-open-workshop/github_for_collaboration/)
- Jenny Bryan's [Happy Git and GitHub](https://happygitwithr.com/)
- If you have done something terribly wrong and don't know how to fix it try [Oh Shit, Git](https://ohshitgit.com/) or if you're easily offended [Dangit, Git](https://dangitgit.com/)
These are only a couple of examples, all you need to do is Google "version control with git and GitHub" to see how huge the community around these open source projects is and how many free resources are available for you to become a version control expert.