tylerxiety/the-Hitchhiker-s-Guide-to-Data-Science

The Hitchhiker's Guide to Data Science

I keep all my notes from online and on-campus data science courses and learning activities here. For me, putting all the pieces together makes it easier to reflect and compare. For others, I hope it will be useful.

Machine Learning and Deep Learning

Course:

Machine Learning by Stanford Uni on Coursera (Currently on Week 7)

Notes:

Models
Topics and Techniques Covered
Supervised Learning Models
1. Linear Regression (Week 1&2): Predicting housing prices.
  • One variable and multi-variable algorithms
  • Learning method: gradient descent vs. normal equation
  • Regularization
  • Handling non-invertibility and removing linearly dependent features
2. Logistic Regression (Week 3): Classifying emails.
  • Binary classification and multi-class classification algorithms
  • Learning method: gradient descent and advanced optimization algorithms (e.g. Conjugate Gradient, BFGS, L-BFGS)
3. Neural Networks (Week 4&5)
  • Non-linear regression and classification algorithms
  • Activation function, network architectures
  • Forward propagation and backpropagation
  • Unrolling parameters, gradient checking, random initialization
4. Support Vector Machines (SVMs) (Week 7)
  • Linear classification and non-linear classification with kernels
  • Choice of parameters, choice of kernels/similarity functions
  • Logistic regression vs. SVMs
5. Online Learning algorithm (Week 10)
  • Continuous stream of data
  • Stochastic gradient descent
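The "gradient descent vs. normal equation" comparison listed under Linear Regression can be sketched as follows. This is a minimal illustration for the one-variable case, not code from the course: the function names, the toy data, the learning rate and the iteration count are all assumptions chosen for the example. For a single feature, the normal equation reduces to the familiar closed-form slope and intercept used here.

```python
def normal_equation(xs, ys):
    """Closed-form least-squares fit of y = theta0 + theta1 * x (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    theta1 = sxy / sxx
    theta0 = mean_y - theta1 * mean_x
    return theta0, theta1


def gradient_descent(xs, ys, alpha=0.05, iterations=5000):
    """Batch gradient descent on the mean squared error cost."""
    n = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / n
        grad1 = sum(e * x for e, x in zip(errors, xs)) / n
        # Update both parameters simultaneously.
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1


# Hypothetical toy data generated from y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

closed_form = normal_equation(xs, ys)
iterative = gradient_descent(xs, ys)
print(closed_form, iterative)
```

On this small, noise-free data set both approaches should agree closely. The closed form needs no learning rate or iteration count, while gradient descent is the option that keeps scaling when the number of features is large.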
Unsupervised Learning Models
6. K-means Clustering (Week 8)
  • Random initialization
  • Choosing the value of K
  • Non-separated clusters
7. Principal Component Analysis (PCA) (Week 8)
  • Application of data compression and data visualization
  • PCA vs. linear regression
  • Reconstructing data after PCA
8. Anomaly Detection Algorithm (Week 9)
  • Density estimation, multivariate normal distribution
  • Choice of features
  • Anomaly detection vs. supervised learning
9. Recommender Systems (Week 9)
  • Content-based recommender algorithm and collaborative filtering algorithm
  • Vectorization, mean normalization
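The K-means notes above (random initialization, then picking a good clustering) can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions: the helper names, the toy points, the number of restarts and the fixed seed are all hypothetical. Centroids are initialized from randomly chosen data points, and the run with the lowest average squared distance (distortion) is kept.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def run_kmeans(points, k, rng, max_iterations=100):
    """One K-means run; returns (centroids, distortion)."""
    centroids = rng.sample(points, k)  # random initialization from k examples
    for _ in range(max_iterations):
        # Cluster assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            clusters[nearest].append(p)
        # Move centroid step: an empty cluster keeps its previous centroid.
        new_centroids = [
            mean_point(c) if c else centroids[j] for j, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    cost = sum(min(squared_distance(p, c) for c in centroids) for p in points)
    return centroids, cost / len(points)


def kmeans(points, k, restarts=10, seed=0):
    """Repeat K-means from several random starts and keep the lowest distortion."""
    rng = random.Random(seed)
    runs = (run_kmeans(points, k, rng) for _ in range(restarts))
    return min(runs, key=lambda result: result[1])


# Hypothetical toy data: two well-separated groups of four points each.
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),
]
centroids, distortion = kmeans(points, k=2)
print(sorted(centroids), distortion)
```

Running `kmeans` for several values of `k` and comparing the returned distortion is one way to approach "choosing the value of K", although the distortion alone always decreases as `k` grows.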
General Advice and Techniques (Week 1, 3, 6&10)
  • Model selection, learning rate
  • Diagnosing bias (underfitting) vs. variance (overfitting)
  • Error analysis and metrics
  • Feature scaling and mean normalization, feature engineering
  • Large scale machine learning with big data: choice of batch gradient descent, stochastic gradient descent and mini-batch gradient descent
  • Data parallelism and MapReduce
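The choice between batch, stochastic and mini-batch gradient descent mentioned above comes down to how many examples feed each parameter update. A minimal sketch, assuming one-variable linear regression and hypothetical names, data and hyperparameters: a single function where `batch_size` selects the variant (the full data set gives batch gradient descent, 1 gives stochastic, anything in between gives mini-batch).

```python
import random


def minibatch_gradient_descent(xs, ys, batch_size, alpha=0.05, epochs=2000, seed=0):
    """Fit y = theta0 + theta1 * x, updating once per batch of examples."""
    rng = random.Random(seed)
    theta0, theta1 = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the examples in a new random order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            errors = [theta0 + theta1 * xs[i] - ys[i] for i in batch]
            grad0 = sum(errors) / len(batch)
            grad1 = sum(e * xs[i] for e, i in zip(errors, batch)) / len(batch)
            theta0 -= alpha * grad0
            theta1 -= alpha * grad1
    return theta0, theta1


# Hypothetical toy data generated from y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

for name, size in [("stochastic", 1), ("mini-batch", 2), ("batch", len(xs))]:
    print(name, minibatch_gradient_descent(xs, ys, batch_size=size))
```

On real, noisy data the stochastic and mini-batch variants tend to wander around the minimum rather than settle exactly, which is the trade-off for cheaper updates on very large data sets.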

Courses list:

Machine Learning and Deep Learning

  • Machine Learning by Stanford Uni on Coursera (Currently on Week 9)
  • Neural Networks and Deep Learning by deeplearning.ai on Coursera (Currently on Week 2)
  • Machine Learning (intro and intermediate) on Kaggle
  • Deep Learning on Kaggle
  • TensorFlow on Udacity
  • Data Science & Machine Learning using Python - A Bootcamp on Udemy
  • Google Machine Learning Crash Course

Languages and libraries

SQL

  • SQL (BigQuery) on Kaggle
  • Exploring and Preparing your Data with BigQuery by Google on Coursera
  • Querying Data with Transact-SQL by Microsoft on edX

Python

  • Python on DataCamp
  • Python on Kaggle
  • Python 3 Programming by Uni Michigan on Coursera (until Week 3)
  • Using Databases with Python by Uni Michigan on Coursera
  • Capstone: Retrieving, Processing, and Visualizing Data with Python by Uni Michigan on Coursera
  • Pandas on Kaggle
  • Data Visualization (pandas, seaborn, matplotlib, plotly, plotnine/ggplot2) on Kaggle
  • Python Visualization Dashboards with Plotly's Dash on Udemy

R

  • R on DataCamp

Scala

  • Implementing Predictive Analytics with Spark in Azure HDInsight by Microsoft on edX

Spark

  • Implementing Predictive Analytics with Spark in Azure HDInsight by Microsoft on edX

TensorFlow

  • Google Machine Learning Crash Course
  • TensorFlow on Udacity

Visualization and BI Tools

  • Tableau by Duke Uni on Coursera
  • Data-driven Decision Making by PwC on Coursera (Excel)
  • Analyzing and Visualizing Data with Power BI by Microsoft on edX
  • Azure SQL Database for the SQL Server DBA on Pluralsight
  • SQL Server Fundamentals on Pluralsight

Cloud

GCP

  • Serverless Data Analysis with Google BigQuery and Cloud Dataflow by Google on Coursera
  • Leveraging Unstructured Data with Cloud Dataproc on Google Cloud Platform by Google on Coursera
  • Google Cloud Platform Big Data and Machine Learning Fundamentals by Google on Coursera

Azure

  • Implementing Predictive Analytics with Spark in Azure HDInsight by Microsoft on edX
  • Querying Data with Transact-SQL by Microsoft on edX
  • Azure SQL Database for the SQL Server DBA on Pluralsight

Big Data

  • Introduction to Big Data by UC San Diego on Coursera

Other technology

  • Inspiring and Motivating Individuals by Uni Michigan on Coursera
  • Blockchain by Uni Buffalo & The State Uni New York on Coursera (Currently on Week 3)
