GitHub Repository for STAA 577
RStudio lab notebooks, full R code, cheat sheets, resources, and ad hoc notes from the “Applied Machine Learning” course, Spring 2019.
We have decided to place the course materials in a GitHub repository:
- to familiarize you with this widely used collaborative coding tool, and
- so that you will have access to them beyond your tenure at CSU, when you venture into the job market.

Jenny Bryan and Jim Hester summarize the benefits of GitHub in their fantastic reference, Happy Git with R. If you ever plan to use version control with GitHub, I strongly recommend reading it in detail.
- Intro
Labs
- Lab 00: Basic Exploring
- Lab 01: Subsetting (data frames)
- Lab 02: Data Wrangling with dplyr and the tidyverse
- Lab 03: Skipped to keep the course in sync with the ISLR textbook
- Lab 04: Classification
  - The S&P Stock Market Data Set
    - Logistic Regression
    - Discriminant Analysis
    - KNN: K-Nearest Neighbors
- Lab 05: Cross Validation
  - The Auto Data Set
    - Cross Validation (by hand)
    - LOOCV (leave-one-out)
    - K-fold CV
    - The Bootstrap
- Lab 06: Subset Selection
  - The Hitters Data Set
    - Subset Selection
    - Shrinkage Methods: Ridge Regression
    - Shrinkage Methods: The Lasso
- Lab 07: Beyond Linearity
  - The Wage Data Set
    - Polynomial Regression
    - Polynomial Logistic Regression
    - Spline Regression
    - Generalized Additive Models
- Lab 08: Tree-based Methods
  - The Carseats Data Set
    - Classification Trees
    - Regression Trees
    - Bagging
    - Random Forest
    - Boosting
  - Appendices
  - Resources
- Lab 09: Support Vector Machines
  - Create training data
  - Support Vector Classifier
  - Support Vector Machine
  - ROC curves
- Lab 10: Unsupervised Learning
  - Principal Component Analysis (PCA)
  - K-means Clustering
  - Hierarchical Clustering
Data Sets
- nycflights13
  - New York City airport flight data from 2013 (must install)
  - install with
    install.packages("nycflights13", repos="http://cran.rstudio.com")
- iris
  - classic iris flower data set from Fisher (comes with R installed)
- mtcars
  - the canonical US Motor Trend car road test data set (comes with R installed)
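A minimal sketch of getting these data sets into an R session, assuming only base R. The built-in sets are loaded with data(); the nycflights13 part uses requireNamespace() so it only runs if that package has already been installed with the command above (flights is one of the tables the package provides).

```r
# iris and mtcars ship with R's built-in 'datasets' package: no install needed
data(iris)
data(mtcars)
dim(iris)    # 150 rows, 5 columns
dim(mtcars)  # 32 rows, 11 columns

# nycflights13 must be installed first; only load it if it is available
if (requireNamespace("nycflights13", quietly = TRUE)) {
  flights <- nycflights13::flights
  print(dim(flights))
}
```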
- A useful tool for previewing HTML documents without having to clone the repository:
  - Right-click the *.html file, copy the link, then go here and paste the GitHub-specific HTML link.
1. Always use a vectorized solution over iteration when possible; otherwise … go to #2.
2. Use a functional, usually one of the apply() family or a loop-wrapper function, since R is a functional language and functionals are easier to read, unless …
   - modifying in place: you are modifying or transforming certain subsets (columns) of a data frame.
   - recursive problems: whenever an iteration depends on the previous iteration, a loop is better suited, because each call a functional makes runs in its own environment and cannot easily carry its result forward to the next call.
   - while loops: in problems where the number of iterations is not known in advance, while-loops are well suited and preferred over a functional.
3. If you must use a loop, ensure the following:
   - Initialize new objects: prior to the loop, allocate the necessary space ahead of time. Do NOT “grow” a vector on the fly within a loop; R copies the whole vector on every append, which is terribly slow.
   - Optimize operations: do NOT perform operations inside the loop that could be done either up front or applied in a vectorized fashion after the loop. Enter the loop, do the bare minimum, then get out.
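The guidelines above can be illustrated with a small base-R sketch. The tasks here are hypothetical examples chosen for illustration, not taken from the course labs: squaring 1:1000, rescaling two mtcars columns, and a "5% growth plus a deposit of 1" recursion.

```r
# Hypothetical task for illustration: square each element of x
x <- 1:1000

# 1. Vectorized: operate on the whole vector at once (preferred)
sq_vec <- x^2

# 2. Functional: vapply() calls the function on each element and
#    checks that every result is a single numeric value
sq_fun <- vapply(x, function(xi) xi^2, numeric(1))

# 3a. Loop with preallocation: create the full-length result up front
sq_loop <- numeric(length(x))
for (i in seq_along(x)) {
  sq_loop[i] <- x[i]^2
}

# 3b. Anti-pattern: growing a vector copies it on every iteration (slow)
sq_grow <- c()
for (i in seq_along(x)) {
  sq_grow <- c(sq_grow, x[i]^2)
}

# Exception "modifying in place": rescale selected data frame columns
# by their maximum, overwriting them in the same data frame
cars_scaled <- mtcars
cols <- c("mpg", "disp")
for (col in cols) {
  cars_scaled[[col]] <- cars_scaled[[col]] / max(cars_scaled[[col]])
}

# Exception "recursive problems": each value depends on the previous one
# (hypothetical: 5% growth plus a deposit of 1 at every step)
n <- 10
balance <- numeric(n)  # preallocated, per guideline 3
balance[1] <- 1
for (k in 2:n) {
  balance[k] <- balance[k - 1] * 1.05 + 1
}

# Exception "while loops": the number of iterations is not known in advance
# (hypothetical: count steps until the value first exceeds 20)
steps <- 0
value <- 1
while (value <= 20) {
  value <- value * 1.05 + 1
  steps <- steps + 1
}
```

All four squaring approaches produce the same vector; wrapping each in system.time() with a much larger x is a quick way to see how costly growing a vector becomes.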
- Hadley Wickham
  - Advanced R
  - R Packages
  - R for Data Science Book
  - Twitter: @hadleywickham
  - GitHub: https://github.com/hadley
- Jenny Bryan
  - Happy Git with R
  - Website: Jenny Bryan
  - Twitter: @JennyBryan
  - GitHub: https://github.com/jennybc
- Max Kuhn
  - Applied Predictive Modeling Book
  - Twitter: @topepo
  - GitHub: https://github.com/topepo
The rsample package is smarter than you might think.
Created on 2019-01-27 by Rmarkdown (v1.11) and R version 3.5.2 (2018-12-20).