-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathR_et_co.Rmd
204 lines (156 loc) · 6.68 KB
/
R_et_co.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
---
title: "R and related tools for research and teaching"
author: "Jouni Helske & Satu Helske"
date: "4 December 2015"
output: ioslides_presentation
---
<!-- -->
## Working more efficiently with R
![kuva1][kuva1]
[kuva1]: kuva1.png
<!-- These cover a wide range of academic work from data manipulation and analysis to software development, reporting and teaching. We glance at topics such as reproducibility, version control, packaging, and testing. -->
## R
<!-- - More and more popular (also outside academia) -->
- More than 2 million users ([Oracle estimate](http://www.oracle.com/us/corporate/press/1515738) in 2012)
- 6th most popular programming language by [IEEE Spectrum](http://spectrum.ieee.org/computing/software/the-2015-top-ten-programming-languages) (in 2015)
- 3rd in number of scholarly articles in Google Scholar (in 2014; [blog post](http://r4stats.com/articles/popularity/) by R.A.Muenchen)
- UseR! 2015 conference: 660 participants, 284 from industry
- R activity around the world: http://rapporter.net/custom/R-activity
- https://www.r-project.org/
<!-- Gergely Daróczi; Switzerland 1st, Finland 18th -->
## Rstudio
- Open source graphical interface for R and related tools
- Code completion, syntax highlighting, code diagnostics
- Autosave, command history, plotting history, environment browser, integrated searchable help
- R projects for easy return to and switching between jobs
- https://www.rstudio.com/
## R Markdown
- Simple formatting syntax for authoring HTML and PDF documents and slides
- Based on Markdown language, `knitr`, and pandoc
- Fully reproducible documents
- Closely integrated to Rstudio (New File -> R Markdown)
- http://rmarkdown.rstudio.com/
<!-- This is an R Markdown presentation. -->
![Rmarkdown][Rmarkdown]
[Rmarkdown]: Rmarkdown.png
## Rmd features
- Equations like in LaTex:
\$y_i = \\beta x_i\$ produces $y_i = \beta x_i$
- Embedding R code in the document:
```{r}
summary(cars)
```
## Rmd figures
```{r}
plot(cars)
```
## Rmd tables
```
Type | Freq | %
---------| ------|-----
Apples | 7 | 44
Oranges | 9 | 56
```
Type | Freq | %
---------| ------|-----
Apples | 7 | 44
Oranges | 9 | 56
## R package creation
- Sharing work with others
+ Co-workers, CRAN etc.
- Personal projects
+ Loading functions, data and other packages at once
- Easier integration with C/C++/Fortran codes with R
- Getting started with `package.skeleton`
- In Rstudio: New Project -> New Directory -> R package
## Git
- Version control system originally for software development
- Useful for writing research articles, theses, R packages, ...
- Basics are easy and sufficient for small projects
- Embedded in Rstudio (Tools -> Version Control)
- http://rogerdudler.github.io/git-guide/
## Github
- Web-based Git repository hosting service
- Interacting with other developers and users
- Bug tracking, feature requests, task management, ... <!--assign tasks -->
- Free for public repositories
+ Free private repositories e.g. in [Bitbucket](https://bitbucket.org/)
- https://github.com/helske
## Useful packages for graphics
- `ggplot2`
+ Plotting system based on the grammar of graphics
+ Easy to produce complex multi-layered graphics
- `ggvis`
+ Similar to `ggplot2`, faster but more restricted <!-- newer, more features coming -->
+ Interactive graphics in RStudio or a browser
- `grid`, `gridBase`
+ Control and flexibility in appearance and arrangement
+ Basis for developing high-level functions <!-- (e.g., lattice and ggplot2 packages -->
<!-- grid cannot be used with base graphics, gridbase available for mixing grid and base graphics -->
## Useful packages for data manipulation
- `magrittr`
+ Piping via `%>%` operator
+ Improves readability and maintainability of code
- `dplyr` and `data.table`
+ Enhanced versions of `data.frame`
+ Fast and memory-efficient
+ More flexible data manipulation
+ Working with remote databases, automatic translation to SQL (`dplyr`)
## Useful packages for reporting
- `knitr`
+ Dynamic report generation (PDF, html, Word, ...)
+ Easy to re-compile and update outputs <!-- e.g. PDFs -->
- `shiny`
+ Building interactive web applications from R
+ Web development skills not required
## Useful packages for packaging
- `testthat`
+ Testing framework for R
+ Catching errors, warnings, messages, ...
<!-- store tests so they can be re-run automatically when code gets changed -->
<!-- Easy to learn and use -->
- `devtools`
+ Simplifying tasks in package development
- `roxygen2`
+ Writing documentation for functions etc.
- `Rcpp`
+ Simple C++ integration in R
+ Writing C++ separately, Rcpp handles the ugly stuff
<!-- - `inline`
+ Dynamically defining R functions with inlined C/C++/Fortran code -->
## Learning
- Read
+ [RStudio Online learning](https://www.rstudio.com/resources/training/online-learning/)
+ [Quick-R](http://www.statmethods.net/)
+ [OpenIntro](https://www.openintro.org/)
- Hands-on learning online
+ [DataCamp](https://www.datacamp.com/)
* R tutorials and data science courses in browser
<!-- * Checks code on-the-fly, gives useful feedback -->
<!-- * Registration needed, many courses are free One free month was enough for me -->
+ [Codeschool](https://www.codeschool.com/)
* Learn R, Git, SQL, ...
<!-- * Basics are free, some accessible without registration -->
- Hands-on in R
+ [`swirl` package](http://swirlstats.com/)
## Teaching I
- [OpenIntro](https://www.openintro.org/)
+ Free material for teachers to use and modify
+ Textbooks, slides, excercises, ...
+ Statistics, R, SAS, ...
- [DataCamp](https://www.datacamp.com/)
+ Resources for building own courses
- [`testwhat` package](https://github.com/Data-Camp/testwhat)
+ Wrapper around `testthat` for checking exercises
## Teaching II
- [Shiny apps](http://statistics.calpoly.edu/shiny) by Cal Poly State University
+ Correlation and regression game, sampling distribution demonstration, longest run of heads or tails, ...
- [Using Github](https://github.com/rundel/Presentations/blob/master/UseR2015/user2015.pdf) (experiences by Colin Rundel)
+ Assignments turned in via Github
+ Students learn version control
+ Simpler course administration
+ Keeping track of contributions
<!-- He talks about automating checks -> could use checks for CRAN -->
## Getting help
- [Stack overflow](http://stackoverflow.com/)
- [Google](http://google.com/)...