This class covers the basics of data-driven urban research. I acquired computational skills, basic knowledge of statistical analysis and error analysis, good practices for handling data and big data, and communication and visualization skills. I learned how to formulate a question relevant to Urban Science, find appropriate data to answer it, prepare and analyze the data, reach an answer at a stated confidence level, and communicate both the answer and my confidence in it.
- Research reproducibility: Git, virtual environment, virtual machine, version control, hypothesis formulation
- Data ETL: Pandas, Geopandas, SQL, API
- Statistical tests: Anderson-Darling test (AD), Kullback–Leibler divergence (KL), Chi-square, Kolmogorov–Smirnov test (KS)
- Clustering: PCA, Kmeans, Gaussian Mixture
- Time series: Fourier transform
- Linear modeling: OLS, WLS, GLS
- Key data set:
- Setting up a virtual environment and formulating a null hypothesis link
- Extracting data from the MTA API link
- Demonstrating the central limit theorem with visualizations, and exploring Citi Bike data link
- Replication study for Effectiveness of the NYC Post-Prison Employment Program: formulating a null hypothesis and conducting statistical tests. link
- Running KS/AD/KL/Chi-square tests on sample data, creating OLS and WLS models link
- Visualizing the NYC LL84 dataset and comparing a linear model against a polynomial model link
- Using CartoDB and SQL queries for data ETL link
- Visualization practice with NYC HIV demographics data link
- Reviewing visualizations; using Geopandas to plot a choropleth of broadband access percentage in NYC along with LinkNYC data, drawing on the American Community Survey API and LinkNYC open data. link
- Time-series techniques: smoothing, detrending, stationarity and non-stationarity, homo- and heteroscedastic noise, vectorization. Also conducted user behavior clustering using PCA feature selection and Kmeans link
- Clustering zip codes in NYC using business activity time series data from the Census Bureau API: data whitening followed by Kmeans and Gaussian Mixture clustering link
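
To give a feel for the last item in the list above, here is a minimal sketch of clustering time series after whitening. It rests on several assumptions: the data are synthetic stand-ins rather than the Census Bureau series, "whitening" is taken to mean standardizing each series to zero mean and unit variance, and a small hand-written K-means (Lloyd's algorithm with farthest-first initialization) is swapped in for a library implementation so the example needs only NumPy. The function names are hypothetical, and the Gaussian Mixture step is not shown.

```python
import numpy as np


def standardize_rows(series):
    """Scale each time series (one per row) to zero mean and unit variance."""
    series = np.asarray(series, dtype=float)
    mean = series.mean(axis=1, keepdims=True)
    std = series.std(axis=1, keepdims=True)
    std[std == 0] = 1.0  # constant series become all zeros instead of NaN
    return (series - mean) / std


def farthest_first_centers(points, k, rng):
    """Pick one random point, then repeatedly add the point farthest from the chosen ones."""
    centers = [points[rng.integers(len(points))]]
    while len(centers) < k:
        # Squared distance from each point to its nearest already-chosen center.
        nearest = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[nearest.argmax()])
    return np.array(centers)


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    centers = farthest_first_centers(points, k, rng)
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Synthetic stand-in for per-zip-code business counts over 20 years:
# 20 growing series and 20 declining series on very different scales.
rng = np.random.default_rng(42)
years = np.arange(20)
growing = 100 + 5 * years + rng.normal(0, 3, size=(20, 20))
declining = 5000 - 80 * years + rng.normal(0, 50, size=(20, 20))
data = np.vstack([growing, declining])

white = standardize_rows(data)
labels, centers = kmeans(white, k=2)
print(labels)
```

Standardizing first means the clusters reflect the shape of each series (growth versus decline) rather than its absolute size, which would otherwise dominate the Euclidean distances.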
Special shout-out to Federica Bianco for this amazing class.