The Hitchhiker's Guide to Data Science

I keep the notes from all my online and on-campus data science courses and learning activities here. For me, putting all the pieces together makes reflection and comparison easier. For others, I hope it will be useful.

Machine Learning and Deep Learning

Course:

Machine Learning by Stanford Uni on Coursera (Currently on Week 7)

Notes:

Models
Topics and Techniques Covered
Supervised Learning Models
1. Linear Regression (Week 1&2): Predicting housing prices.
  • One variable and multi-variable algorithms
  • Learning method: gradient descent vs. normal equation
  • Regularization
  • Handling non-invertibility and removing linearly dependent features
2. Logistic Regression (Week 3): Classifying emails.
  • Binary classification and multi-class classification algorithms
  • Learning method: gradient descent and advanced optimization algorithms (e.g. Conjugate Gradient, BFGS, L-BFGS)
3. Neural Networks (Week 4&5)
  • Non-linear regression and classification algorithms
  • Activation function, network architectures
  • Forward propagation and backpropagation
  • Unrolling parameters, gradient checking, random initialization
4. Support Vector Machines (SVMs) (Week 7)
  • Linear classification and non-linear classification with kernels
  • Choice of parameters, choice of kernels/similarity functions
  • Logistic regression vs. SVMs
5. Online Learning algorithm (Week 10)
  • Continuous stream of data
  • Stochastic gradient descent
Unsupervised Learning Models
6. K-means Clustering (Week 8)
  • Random initialization
  • Choosing the value of K
  • Non-separated clusters
7. Principal Component Analysis (PCA) (Week 8)
  • Application of data compression and data visualization
  • PCA vs. linear regression
  • Reconstructing data after PCA
8. Anomaly Detection Algorithm (Week 9)
  • Density estimation, multivariate normal distribution
  • Choice of features
  • Anomaly detection vs. supervised learning
9. Recommender Systems (Week 9)
  • Content-based recommender algorithm and collaborative filtering algorithm
  • Vectorization, mean normalization
General Advice and Techniques (Week 1, 3, 6&10)
  • Model selection, learning rate
  • Diagnosing bias (underfitting) vs. variance (overfitting)
  • Error analysis and metrics
  • Feature scaling and mean normalization, feature engineering
  • Large-scale machine learning with big data: choice of batch gradient descent, stochastic gradient descent and mini-batch gradient descent
  • Data parallelism and MapReduce
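
To make the "gradient descent vs. normal equation" and "mean normalization" items above concrete, here is a minimal sketch in plain Python. It is not taken from the course material: the housing data, the learning rate `alpha`, the iteration count, and all function names are illustrative assumptions. The `normal_equation` function uses the closed form that the normal equation reduces to when there is a single feature.

```python
# Sketch: one-variable linear regression fitted two ways.
# The data, learning rate, and iteration count are illustrative assumptions.

def mean_normalize(xs):
    """Shift a feature to zero mean and divide by its range (max - min)."""
    mu = sum(xs) / len(xs)
    spread = max(xs) - min(xs)
    return [(x - mu) / spread for x in xs], mu, spread

def gradient_descent(xs, ys, alpha=0.5, iterations=2000):
    """Batch gradient descent on the squared-error cost."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        # Prediction error for every training example under the current fit.
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Update both parameters simultaneously.
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1

def normal_equation(xs, ys):
    """Closed-form least-squares solution for the one-feature case."""
    m = len(xs)
    x_mean = sum(xs) / m
    y_mean = sum(ys) / m
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    theta1 = sxy / sxx
    theta0 = y_mean - theta1 * x_mean
    return theta0, theta1

# Hypothetical house sizes and prices, made up for this sketch.
sizes = [50.0, 80.0, 100.0, 120.0, 150.0]
prices = [150.0, 240.0, 290.0, 370.0, 450.0]

scaled, mu, spread = mean_normalize(sizes)
gd_theta = gradient_descent(scaled, prices)
ne_theta = normal_equation(scaled, prices)
print("gradient descent:", gd_theta)
print("normal equation: ", ne_theta)
```

Both approaches should agree on the fitted parameters here. Gradient descent needs a learning rate and many iterations but scales to many features, while the closed form needs neither but requires solving a linear system in the general multi-feature case.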

Courses list:

Machine Learning and Deep Learning

  • Machine Learning by Stanford Uni on Coursera (Currently on Week 9)
  • Neural Networks and Deep Learning by deeplearning.ai on Coursera (Currently on Week 2)
  • Machine Learning (intro and intermediate) on Kaggle
  • Deep Learning on Kaggle
  • TensorFlow on Udacity
  • Data Science & Machine Learning using Python - A Bootcamp on Udemy
  • Google Machine Learning Crash Course

Languages and libraries

SQL

  • SQL(BigQuery) on Kaggle
  • Exploring and Preparing your Data with BigQuery by Google on Coursera
  • Querying Data with Transact-SQL by Microsoft on edX

Python

  • Python on DataCamp
  • Python on Kaggle
  • Python 3 Programming by Uni Michigan on Coursera (until Week 3)
  • Using Databases with Python by Uni Michigan on Coursera
  • Capstone: Retrieving, Processing, and Visualizing Data with Python by Uni Michigan on Coursera
  • Pandas on Kaggle
  • Data Visualization (pandas, seaborn, matplotlib, plotly, plotnine/ggplot2) on Kaggle
  • Python Visualization Dashboards with Plotly's Dash on Udemy

R

  • R on DataCamp

Scala

  • Implementing Predictive Analytics with Spark in Azure HDInsight by Microsoft on edX

Spark

  • Implementing Predictive Analytics with Spark in Azure HDInsight by Microsoft on edX

TensorFlow

  • Google Machine Learning Crash Course
  • TensorFlow on Udacity

Visualization and BI Tools

  • Tableau by Duke Uni on Coursera
  • Data-driven Decision Making by PwC on Coursera (Excel)
  • Analyzing and Visualizing Data with Power BI by Microsoft on edX
  • Azure SQL Database for the SQL Server DBA on Pluralsight
  • SQL Server Fundamentals on Pluralsight

Cloud

GCP

  • Serverless Data Analysis with Google BigQuery and Cloud Dataflow by Google on Coursera
  • Leveraging Unstructured Data with Cloud Dataproc on Google Cloud Platform by Google on Coursera
  • Google Cloud Platform Big Data and Machine Learning Fundamentals by Google on Coursera

Azure

  • Implementing Predictive Analytics with Spark in Azure HDInsight by Microsoft on edX
  • Querying Data with Transact-SQL by Microsoft on edX
  • Azure SQL Database for the SQL Server DBA on Pluralsight

Big Data

  • Introduction to Big Data by UC San Diego on Coursera

Other technology

  • Inspiring and Motivating Individuals by Uni Michigan on Coursera
  • Blockchain by Uni Buffalo & The State Uni New York on Coursera (Currently on Week 3)