Data Manipulation with the Tidyverse

Tidyverse workshop for Northwestern Data Science and Programming Workshops Fall 2019

Instructor: Katie Evans

What is the Tidyverse?

  • Collection of packages for data manipulation, exploration, and visualization that share a common syntax
  • Intended to make data scientists more productive by guiding them through workflows
  • Allows for connections between tools

Topics to cover

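  • dplyr: The dplyr package is one of the most widely used R packages for data manipulation. It provides a consistent set of verbs such as filter(), select(), mutate(), and summarise(), and one of its greatest advantages is that you can use the pipe operator (%>%) to chain these functions together.
  • tidyr: The tidyr package complements dplyr by reshaping data into tidy form (one row per observation, one column per variable), which makes dplyr's verbs easier to apply during manipulation and pre-processing.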
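
As a rough illustration of how the two packages fit together, the sketch below reshapes a small table with tidyr and then summarises it with a dplyr pipeline. The scores data and its column names are made up for this example, and pivot_longer() assumes tidyr 1.0.0 or later.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: one row per student, one column per exam
scores <- tibble(
  student = c("Ana", "Ben", "Cy"),
  midterm = c(82, 91, 77),
  final   = c(88, 85, 93)
)

# tidyr: reshape from wide to long so each row is one student-exam pair
long_scores <- scores %>%
  pivot_longer(cols = c(midterm, final), names_to = "exam", values_to = "score")

# dplyr: chain verbs with the pipe to filter, group, and summarise
exam_means <- long_scores %>%
  filter(score >= 80) %>%
  group_by(exam) %>%
  summarise(mean_score = mean(score), n = n()) %>%
  arrange(exam)

print(exam_means)
```

Each step takes the result of the previous one as its first argument, so the pipeline reads top to bottom in the order the operations happen.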

Other Tidyverse packages to check out:

  • readr: The readr package reads rectangular data files (such as CSV and TSV) into R as tibbles and writes data frames back out to files.
  • stringr: The stringr package is used for strings. It provides a cohesive set of functions designed to make working with strings as easy as possible.
  • ggplot2: The ggplot2 package builds charts and visualizations layer by layer, and it is a favorite among data scientists.
  • lubridate: The lubridate package simplifies working with dates and times in R, from converting strings to dates to calculating the hours between two time points.
  • purrr: The purrr package provides a complete toolkit for functional programming in R. Its functions, such as map(), can replace many loops with a single line of code.
  • forcats: The forcats package is dedicated to working with categorical variables (factors).
  • broom: The broom package takes the messy output of built-in R functions, such as lm(), and turns it into tidy data frames.
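
To give a feel for a few of the packages above, here is a minimal sketch touching stringr, lubridate, purrr, and forcats. All of the values are invented for illustration, and each call shows only one small use of its package.

```r
library(stringr)
library(lubridate)
library(purrr)
library(forcats)

# stringr: trim whitespace and standardise case (made-up city names)
cities <- str_to_title(str_trim(c("  chicago", "EVANSTON ", "skokie")))

# lubridate: parse strings into date-times, then compute elapsed hours
start <- ymd_hms("2019-10-01 09:00:00")
end   <- ymd_hms("2019-10-02 12:30:00")
hours_elapsed <- as.numeric(difftime(end, start, units = "hours"))

# purrr: apply a function to each element instead of writing a for loop
squares <- map_dbl(1:5, ~ .x^2)

# forcats: reorder factor levels from most to least frequent
sizes <- factor(c("small", "large", "small", "medium", "small", "large"))
sizes_by_freq <- fct_infreq(sizes)
levels(sizes_by_freq)
```

Loading each package separately keeps it clear which function comes from where. library(tidyverse) would load stringr, purrr, and forcats together, though depending on the installed version lubridate may still need its own library() call.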

Tidyverse Resources