forked from tidyverse/tidyr
-
Notifications
You must be signed in to change notification settings - Fork 0
/
README.Rmd
82 lines (54 loc) · 2.99 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "README-"
)
```
# tidyr <img src="man/figures/logo.png" align="right" />
[![Build Status](https://travis-ci.org/tidyverse/tidyr.svg?branch=master)](https://travis-ci.org/tidyverse/tidyr)
[![codecov.io](http://codecov.io/github/tidyverse/tidyr/coverage.svg?branch=master)](http://codecov.io/github/tidyverse/tidyr?branch=master)
[![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/tidyr)](https://cran.r-project.org/package=tidyr)
## Overview
The goal of tidyr is to help you create __tidy data__. Tidy data is data where:
1. Each variable is in a column.
1. Each observation is a row.
1. Each value is a cell.
Tidy data describes a standard way of storing data that is used wherever possible throughout the [tidyverse](http://tidyverse.org). If you ensure that your data is tidy, you'll spend less timing fighting with the tools and more time working on your analysis.
## Installation
```{r, eval = FALSE}
# The easiest way to get tidyr is to install the whole tidyverse:
install.packages("tidyverse")
# Alternatively, install just tidyr:
install.packages("tidyr")
# Or the the development version from GitHub:
# install.packages("devtools")
devtools::install_github("tidyverse/tidyr")
```
## Getting started
```{r}
library(tidyr)
```
There are two fundamental verbs of data tidying:
* `gather()` takes multiple columns, and gathers them into key-value pairs: it
makes "wide" data longer.
* `spread()`. takes two columns (key & value) and spreads in to multiple
columns, it makes "long" data wider.
tidyr also provides `separate()` and `extract()` functions which makes it easier to pull apart a column that represents multiple variables. The complement to `separate()` is `unite()`.
To get started, read the tidy data vignette (`vignette("tidy-data")`) and check out the demos, `demo(package = "tidyr")`).
## Related work
tidyr replaces reshape2 (2010-2014) and reshape (2005-2010). Somewhat counterintuitively each iteration of the package has done less. tidyr is designed specifically for tidying data, not general reshaping (reshape2), or the general aggregation (reshape).
If you'd like to read more about data reshaping from a CS perspective, I'd recommend the following three papers:
* [Wrangler: Interactive visual specification of data transformation scripts](http://vis.stanford.edu/papers/wrangler)
* [An interactive framework for data cleaning](https://www.eecs.berkeley.edu/Pubs/TechRpts/2000/CSD-00-1110.pdf) (Potter's wheel)
* [On efficiently implementing SchemaSQL on a SQL database system](http://www.vldb.org/conf/1999/P45.pdf)
To guide your reading, here's translation between the terminology used in different places:
| tidyr | gather | spread |
|--------------|---------|--------|
| reshape(2) | melt | cast |
| spreadsheets | unpivot | pivot |
| databases | fold | unfold |