Skip to content

Latest commit

 

History

History
85 lines (67 loc) · 2.95 KB

OUTLINE.md

File metadata and controls

85 lines (67 loc) · 2.95 KB

This document contains the proposed outline for OpenIntro - Introductory Statistics with Randomization and Simulation, Second Edition.

Chp 1. Introduction to data

  • Case Study (capable of extending to MLR or 2 by 2 table)
  • Taxonomy of Data
  • Overview of data collection principles
  • Observational studies and sampling strategies
  • Experimental design and causality
  • Revisit case study with new terminology we learned

Chp 2. Exploratory Data Analysis - Visual summaries and summary statistics

  • Cat vs. cat - segmented plots / contingency tables
    • Conditional probability from contingency tables
    • Bayes Theorem (law of total probability?)
  • Num vs. cat - side-by-side box plots / comparing distributions
    • Mention univariate - center, skew, shape, spread
    • Mention conditional probabilities as well

Chp 3. Correlation and Regression (descriptive)

  • Visual summaries of data: scatterplot, side-by-side box plots, histogram, density plot, box plot (lead out with multivariate, follow with univariate)
  • Describing distributions: correlation, central tendency, variability, skew, modality
  • Num vs. num - SLR
    • correlation
    • Line fitting, residuals, and correlation
    • Fitting a line by least squares regression
    • Types of outliers in linear regression

Chp 4. Multiple Regression (descriptive)

  • Num vs. whatever - MLR
    • Introduction to multiple regression
  • Parallel slopes
  • Hint at interaction, planes, and parallel planes but not quantify
    • Visualization of higher-dimensional models (rgl demo)
  • Logistic regression
    • Binary vs. num/whatever
    • Three scales interpretation (e.g. probability, odds, log-odds)
    • “parallel” logistic curves?

Chp 5. Foundations of Inference

  • Understanding inference through simulation
  • Randomization case study: gender discrimination
  • Randomization case study: opportunity cost
  • Hypothesis testing
  • Confidence intervals
  • Simulation case studies

Chp 6. Inference for categorical data

  • Inference for a single proportion
    • Simulation
    • Exact (if we include course on probability)
    • CLT and Normal approximation
  • Difference of two proportions
  • Testing for goodness of fit using chi-square (special topic, include simulation version)
  • Testing for independence in two-way tables (special topic)

Chp 7. Inference for numerical data

  • One-sample means
    • Bootstrap (for means, medians)
    • t-distribution
  • Paired data
  • Difference of two means
  • Comparing many means with ANOVA (special topic, include simulation version)

Chp 8. Inference for regression

  • Inference for linear regression
    • Bootstrap for regression coefficients
    • t-distribution for regression coefficients
    • Model Comparison: Occam’s Razor and R^2 > R^2_adj
  • Checking model assumptions using graphs
    • L-I-N-E
  • Inference for multiple regression
    • residuals vs. fitted instead of residuals vs. x
  • Inference for logistic regression

Appendix: Probability

(Keep same content as before, minus the bit of probability that got moved to categorical EDA)