# Introduction to Data Analysis
- a buzzword, and on its own nearly meaningless
- very easy to find real-world motivation
## Fields
- statistical analysis
  - exploratory / descriptive
  - vs. hypothesis testing
- what is R?
- what is S?
- what is SAS?
- pandas
  - brings most of the good stuff from R to Python
  - while keeping Python syntax
# Data Structures
- what is a DataFrame?
  - like a 2D "matrix"
  - index
  - columns
  - associated functions/methods
- what is the right way to represent 2D data?
  - a programmer's trick question: "how do you represent Excel?"
- what is the right way to represent higher-dimensional data?
- what is a Series? a time series?
- what is a CSV?
- what is a flat file?
  - how is it different from a database?
- what is a vectorized operation?
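For the DataFrame bullets in this section (index, columns, associated methods), here is a minimal sketch. The counts are hypothetical toy numbers, not data from the sitelinks example.

```python
import pandas as pd

# Hypothetical toy counts: one row per country, one column per measure.
df = pd.DataFrame(
    {"female": [120, 45, 80], "male": [380, 155, 220]},
    index=["US", "FR", "DE"],
)

print(df.index)    # the row labels
print(df.columns)  # the column labels
print(df.shape)    # (rows, columns)

# A few of the methods that come attached to every DataFrame.
print(df.describe())  # count, mean, std, min, quartiles, max for each column
print(df.sum())       # column totals
print(df.T)           # transpose: columns become the index and vice versa
```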
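One possible answer to the higher-dimensional question: keep the data in a 2D table and put several dimensions into a hierarchical index (`MultiIndex`). The sketch below assumes hypothetical long-format data with three dimensions (country, year, gender) and one value column.

```python
import pandas as pd

# Hypothetical long-format data: three dimensions plus one value per row.
long = pd.DataFrame({
    "country": ["US", "US", "US", "US", "FR", "FR", "FR", "FR"],
    "year":    [2000, 2000, 2010, 2010, 2000, 2000, 2010, 2010],
    "gender":  ["female", "male"] * 4,
    "count":   [10, 30, 15, 35, 4, 12, 6, 14],
})

# A MultiIndex stores all three dimensions in the row labels.
cube = long.set_index(["country", "year", "gender"])["count"]

# unstack() moves one index level into the columns, giving a 2D view.
wide = cube.unstack("gender")
print(wide)

# .xs takes a cross-section along one dimension (here: only the year 2010).
print(cube.xs(2010, level="year"))
```

The same data could be `unstack`ed along `year` or `country` instead, so one long table serves several 2D views.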
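For the Series / time series bullet: a Series is one labeled column, and a time series is a Series indexed by timestamps. A small sketch with hypothetical daily values:

```python
import pandas as pd

# A Series: values plus an index of labels.
s = pd.Series([3, 1, 4], index=["a", "b", "c"], name="example")

# A time series: a Series whose index is a DatetimeIndex.
# Hypothetical daily counts for one week.
ts = pd.Series(
    [5, 7, 6, 9, 8, 12, 11],
    index=pd.date_range("2015-01-05", periods=7, freq="D"),
)

# Date-aware operations that come with a DatetimeIndex.
print(ts["2015-01-07":"2015-01-09"])  # slice by date strings (both ends inclusive)
print(ts.resample("2D").sum())        # aggregate into 2-day bins
print(ts.rolling(3).mean())           # 3-day moving average (first two are NaN)
```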
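For the CSV / flat file vs. database bullets: the same hypothetical rows can live as plain comma-separated text or inside a database that answers queries. This sketch uses Python's built-in `sqlite3` with an in-memory database as a stand-in for a real database server.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical table.
df = pd.DataFrame({"name": ["Ada", "Grace", "Alan"], "born": [1815, 1906, 1912]})

# A flat file is just text: one record per line, fields separated by commas.
csv_text = df.to_csv(index=False)
print(csv_text)

# Reading it back loads every row; filtering happens afterwards, in pandas.
from_csv = pd.read_csv(io.StringIO(csv_text))
print(from_csv[from_csv["born"] > 1900])

# A database keeps the rows behind a query engine (here: in-memory SQLite).
con = sqlite3.connect(":memory:")
df.to_sql("people", con, index=False)

# The database applies the WHERE clause and returns only the matching rows.
from_db = pd.read_sql_query(
    "SELECT name FROM people WHERE born > 1900 ORDER BY born", con
)
print(from_db)
con.close()
```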
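For the vectorized-operation bullet: the same column sum written as an explicit Python loop and as a single whole-column expression. In the vectorized version, the per-element loop runs inside pandas/NumPy's compiled code rather than in Python. The numbers are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical counts.
df = pd.DataFrame({"female": [120, 45, 80], "male": [380, 155, 220]})

# Loop version: one Python-level addition per row.
totals_loop = []
for i in range(len(df)):
    totals_loop.append(df["female"].iloc[i] + df["male"].iloc[i])

# Vectorized version: one expression over entire columns.
totals_vec = df["female"] + df["male"]

# NumPy functions (ufuncs) also apply element-wise to a whole column.
log_totals = np.log10(totals_vec)
print(totals_vec)
print(log_totals)
```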
# Exploratory Example: sitelinks
- loading a CSV, or an HTML table
  - HTML tables can be handy too
- subsetting
  - by index label, by row position
  - by value conditions
- column operations
  - percentage example ([celebrity example](http://nbviewer.ipython.org/github/notconfusing/WIGI/blob/master/Country%20Inspector%20Analysis%20Generator.ipynb))
- pivot
- groupby
- perc_dict advantage
- no for loops?
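For the loading bullet, a minimal sketch of `pd.read_csv`. An in-memory string stands in for a real file here (`read_csv` also accepts a path or URL), and the data is hypothetical. For HTML, `pd.read_html` returns a list of DataFrames, one per `<table>` it finds, and needs an optional parser such as lxml or BeautifulSoup4 with html5lib installed.

```python
import io

import pandas as pd

# Stand-in for a CSV file on disk (hypothetical data).
raw = io.StringIO(
    "country,female,male\n"
    "US,120,380\n"
    "FR,45,155\n"
    "DE,80,220\n"
)

# index_col turns the "country" column into the row labels.
df = pd.read_csv(raw, index_col="country")
print(df.head())
print(df.dtypes)  # numeric columns are inferred as integers
```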
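For the subsetting bullets: `.loc` selects by index label, `.iloc` by integer position, and a boolean mask selects rows by value conditions. A sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical counts.
df = pd.DataFrame(
    {"female": [120, 45, 80, 10], "male": [380, 155, 220, 90]},
    index=["US", "FR", "DE", "IS"],
)

# By index label.
print(df.loc["FR"])          # one row
print(df.loc[["US", "DE"]])  # several rows

# By row position (end-exclusive, like Python slicing).
print(df.iloc[0:2])

# By value condition: the mask is True where the condition holds.
print(df[df["female"] > 50])
# Combine conditions with & and |; each condition needs its own parentheses.
print(df[(df["female"] > 50) & (df["male"] < 300)])
```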
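For the column-operations / percentage bullet, a sketch in the spirit of the linked notebook. The column names and numbers here are hypothetical and are not taken from that notebook.

```python
import pandas as pd

# Hypothetical counts.
df = pd.DataFrame(
    {"female": [120, 45, 80], "male": [380, 155, 220]},
    index=["US", "FR", "DE"],
)

# New columns are computed from whole existing columns at once.
df["total"] = df["female"] + df["male"]
df["female_pct"] = 100 * df["female"] / df["total"]

# Rank rows by the derived column.
print(df.sort_values("female_pct", ascending=False))
```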
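For the pivot bullet: `pivot` reshapes long data (one row per observation) into wide data (one column per category). It raises an error if an index/column pair appears more than once; `pivot_table` handles that case by aggregating. Hypothetical data:

```python
import pandas as pd

# Hypothetical long-format data: one row per (country, gender) pair.
long = pd.DataFrame({
    "country": ["US", "US", "FR", "FR"],
    "gender": ["female", "male", "female", "male"],
    "count": [120, 380, 45, 155],
})

# Unique values of "gender" become the new columns.
wide = long.pivot(index="country", columns="gender", values="count")
print(wide)
```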
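For the groupby / "no for loops?" bullets, a sketch of split-apply-combine that computes within-country percentages without an explicit loop. The per-person records are hypothetical, and this is not necessarily how the original `perc_dict` was built.

```python
import pandas as pd

# Hypothetical per-person records: one row per biography.
people = pd.DataFrame({
    "country": ["US", "US", "US", "FR", "FR", "DE", "DE", "DE", "DE"],
    "gender": ["female", "male", "male", "female", "male",
               "female", "male", "male", "male"],
})

# Count rows per (country, gender) in one call.
counts = people.groupby(["country", "gender"]).size()

# Divide each count by its country's total (broadcast back with transform).
pct = 100 * counts / counts.groupby(level="country").transform("sum")
print(pct.unstack("gender"))
```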
# Hypothesis Testing
- crosstab
- logistic regression
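For the crosstab bullet: `pd.crosstab` counts co-occurrences of two categorical columns. A table like this, without the margins, is the usual input to a chi-squared test of independence (for example `scipy.stats.chi2_contingency`). Hypothetical records:

```python
import pandas as pd

# Hypothetical per-person records.
people = pd.DataFrame({
    "country": ["US", "US", "US", "FR", "FR", "DE", "DE", "DE", "DE"],
    "gender": ["female", "male", "male", "female", "male",
               "female", "male", "male", "male"],
})

# Counts per (country, gender); margins=True adds "All" totals.
table = pd.crosstab(people["country"], people["gender"], margins=True)
print(table)

# normalize="index" turns each row into proportions that sum to 1.
shares = pd.crosstab(people["country"], people["gender"], normalize="index")
print(shares)
```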
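For the logistic regression bullet: the outline does not name a library. statsmodels (`statsmodels.formula.api.logit`) and scikit-learn are common choices. To avoid extra dependencies, this sketch swaps in plain NumPy and fits the model by Newton-Raphson on simulated (hypothetical) data. It is meant to show what the fit does, not to replace a statistics library, and it reports no standard errors or p-values.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Fit logistic regression by Newton-Raphson; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))       # predicted probabilities
        grad = X.T @ (y - p)                       # gradient of the log-likelihood
        hess = X.T @ (X * (p * (1 - p))[:, None])  # negative Hessian
        beta = beta + np.linalg.solve(hess, grad)  # Newton step
    return beta


# Simulated data: one predictor, true intercept -1 and slope 2.
rng = np.random.default_rng(0)
x = rng.normal(size=2000)
true_p = 1.0 / (1.0 + np.exp(-(-1.0 + 2.0 * x)))
y = (rng.random(2000) < true_p).astype(float)

X = np.column_stack([np.ones_like(x), x])
beta = fit_logistic(X, y)
print(beta)  # estimates should be close to the true values [-1, 2]
```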
# Plotting
- what is plotting?
- what is a grammar of graphics?
# Homework
- find a CSV or HTML table, subset it, compute some descriptive statistics, and plot it