Lecturer: Hossein Hajiabolhassan

Webpage: Applied Machine Learning 2020

Data Science Center, Shahid Beheshti University

Teaching Assistants: Yavar Taheri Yeganeh, Erfaan Rostami Amraei, Mostafa Khodayari, Esmail Mafakheri
- Lecture 1: Toolkit Lab (Part 1)
- Lecture 2: Introduction
- Lecture 3: Empirical Risk Minimization
- Lecture 4: PAC Learning
- Lecture 5: The Bias-Complexity Tradeoff
- Lecture 6: Learning via Uniform Convergence
- Lecture 7: The VC-Dimension
- Lecture 8: Toolkit Lab (Part 2)
- Lecture 9: Linear Predictors
- Lecture 10: Decision Trees
- Lecture 11: Nearest Neighbor
- Lecture 12: Ensemble Methods
- Lecture 13: Model Selection and Validation
- Lecture 14: Neural Networks
- Lecture 15: Convex Learning Problems
- Lecture 16: Regularization and Stability
- Lecture 17: Support Vector Machines
- Lecture 18: Multiclass Classification
- Miscellaneous
Machine learning is an area of artificial intelligence that gives systems the ability to learn automatically. It allows machines to handle new situations through analysis, self-training, observation, and experience. The remarkable success of machine learning has made it the default method of choice for artificial intelligence experts. In this course, we review the fundamentals and algorithms of machine learning.
Main TextBooks:
- Understanding Machine Learning: From Theory to Algorithms, by Shai Shalev-Shwartz and Shai Ben-David
- An Introduction to Statistical Learning: with Applications in R by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani
- Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow (2nd Edition) by Aurelien Geron
Additional TextBooks:
- Pattern Recognition and Machine Learning by Christopher Bishop
- Hands-On Machine Learning with R by Bradley Boehmke and Brandon Greenwell
Recommended Slides & Papers:
Required Reading:
Anaconda, Jupyter Lab, Markdown, Git, GitHub, and Google Colab:
- Blog: Managing Environments
- Blog: Kernels for Different Environments
- Slide: Practical Data Science: Jupyter NoteBook Lab by Zico Kolter
- Awesome JupyterLab by Hai Nguyen Mau
- Blog: Learn Markdown Online
- Slide: An Introduction to Git by Politecnico di Torino
- Blog: Google Colab Free GPU Tutorial by Fuat
Teaching Assistant Class:

Python continues to take a leading position in solving data science tasks and challenges. Here are three of the most important libraries:

- NumPy is the fundamental package for scientific computing with Python.
- Pandas provides easy-to-use data structures and data analysis tools.
- Matplotlib is a Python 2D plotting library.

Resources:

- Scipy Lecture Notes
- Data Science iPython Notebooks
- Homeworks: Python Libraries for Data Science
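To get a feel for how these three libraries work together, here is a minimal sketch (the data is synthetic; nothing from the course files is assumed):

```python
# NumPy for arrays, Pandas for tabular data, Matplotlib for plotting.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 50)                  # 50 evenly spaced points
y = 2 * x + np.random.normal(0, 1, 50)      # a noisy linear signal

df = pd.DataFrame({"x": x, "y": y})         # wrap the arrays in a DataFrame
print(df.describe())                        # quick summary statistics

df.plot.scatter(x="x", y="y", title="Noisy linear data")
plt.show()
```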
Suggested Reading:
- Tools in Data Science
- 28 Jupyter Notebook Tips, Tricks, and Shortcuts by Josh Devlin
- Cheat Sheet: Markdown Syntax
- Git Cheat Sheet
- R Tutorial for Beginners: Learning R Programming
Additional Resources:
- PDF: Conda Cheat Sheet
- Blog: Conda Commands (Create Virtual Environments for Python with Conda) by LipingY
- Blog: Colab Tricks by Rohit Midha
- Paper: Machine Learning in Python: Main Developments and Technology Trends in Data Science, Machine Learning, and Artificial Intelligence by Sebastian Raschka, Joshua Patterson, and Corey Nolet
- The following resources were adapted from Applied Machine Learning and Deep Learning, created by Cuixian Chen:
- Python Overview [Word], Python Tutorial [PDF] [Code]
- NumPy [PDF] [Code]: User Guide [Link], Quickstart [Link], Reference [Link], Practice NumPy in LabEx [Link], Cheatsheet [Link]
- Matplotlib [PDF] [Code]: Example [Link], Tutorials [Link], Reference [Link], Practice Matplotlib in LabEx [Link], Cheatsheet [Link]
- Pandas [Code]: 10 Min to Pandas [Link], Cookbook [Link], Tutorials [Link], Reference [Link], Practice Pandas in LabEx [Link], Cheatsheet [Link]
- Seaborn: Statistical Data Visualization [Link]: Example [Link], Tutorials [Link], Reference [Link], Cheatsheet [Link]
- Scikit-Learn [Link], Scikit-Image [Link]: Scikit Tutorial #1 [Code], Scikit Tutorial #2 [Code], Cheatsheet [Link]
Required Reading:
- Introduction
- Chapter 1 of Understanding Machine Learning: From Theory to Algorithms
Required Reading:
- A Formal Model – The Statistical Learning Framework & Empirical Risk Minimization
- Chapter 2 of Understanding Machine Learning: From Theory to Algorithms
- Exercises: 2.1, 2.2, and 2.3
- Slide: Machine Learning by Roland Kwitt
- Slide: Lecture 1 by Shai Shalev-Shwartz
- Blog: Some Key Machine Learning Definitions by Joydeep Bhattacharjee
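To make the ERM rule concrete, here is a small illustrative sketch; the hypothesis class (threshold classifiers on one feature) and the noise level are chosen only for demonstration:

```python
# Empirical risk minimization with the 0-1 loss over a toy hypothesis
# class of threshold classifiers h_t(x) = 1[x >= t].
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 100)
y = (x >= 0.6).astype(int)          # true labeling rule
y[rng.random(100) < 0.1] ^= 1       # flip 10% of labels as noise

def empirical_risk(t):
    """Average 0-1 loss of the threshold classifier on the sample."""
    return np.mean((x >= t).astype(int) != y)

thresholds = np.linspace(0, 1, 101)
risks = [empirical_risk(t) for t in thresholds]
best = thresholds[int(np.argmin(risks))]
print(f"ERM threshold: {best:.2f}, empirical risk: {min(risks):.2f}")
```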
Required Reading:
- Chapter 3 of Understanding Machine Learning: From Theory to Algorithms
- Exercises: 3.2, 3.3, 3.4, 3.5, 3.6, 3.7
- Slide: Machine Learning by Roland Kwitt
- Slide: Lecture 2 by Shai Shalev-Shwartz
Required Reading:
- Chapter 4 of Understanding Machine Learning: From Theory to Algorithms
- Slide: Machine Learning by Roland Kwitt
Required Reading:
- Chapter 5 of Understanding Machine Learning: From Theory to Algorithms
- Exercise: 5.2
- Slide: Machine Learning by Roland Kwitt
- Slide: Lecture 3 by Shai Shalev-Shwartz
- Paper: The Bias-Variance Dilemma by Raul Rojas
Suggested Reading:
- Paper: A Unified Bias-Variance Decomposition by Pedro Domingos
Additional Reading:
- NoteBook: Exploring the Bias-Variance Tradeoff by Kevin Markham
- Blog: Bias-Variance Decomposition by Sebastian Raschka
- Slide: Bias-Variance Theory by Thomas G. Dietterich
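The decomposition can also be estimated numerically. The sketch below refits polynomials of several degrees on many independent training samples and reports bias² and variance at a single test point; the target function, noise level, and degrees are all illustrative choices:

```python
# Estimate bias^2 and variance of polynomial regression at one test point
# by refitting on many independent training samples.
import numpy as np

rng = np.random.default_rng(1)
f = lambda x: np.sin(2 * np.pi * x)   # true regression function
x_test, n, trials = 0.3, 30, 500

for degree in [1, 3, 9]:
    preds = []
    for _ in range(trials):
        x = rng.uniform(0, 1, n)
        y = f(x) + rng.normal(0, 0.3, n)
        coefs = np.polyfit(x, y, degree)        # least-squares polynomial fit
        preds.append(np.polyval(coefs, x_test))
    preds = np.array(preds)
    bias2 = (preds.mean() - f(x_test)) ** 2
    print(f"degree {degree}: bias^2 = {bias2:.4f}, variance = {preds.var():.4f}")
```

Low degrees should show high bias and low variance; high degrees the reverse.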
Required Reading:
- Chapter 6 of Understanding Machine Learning: From Theory to Algorithms
- Exercises: 6.2, 6.4, 6.6, 6.9, 6.10, and 6.11
- Slide: Machine Learning by Roland Kwitt
Required Reading:
- Machine Learning Mastery With Python by Jason Brownlee
- Data Exploration:
- NoteBook: Titanic 1 – Data Exploration by John Stamford
- NoteBook: Kaggle Titanic Supervised Learning Tutorial
- NoteBook: An Example Machine Learning Notebook by Randal S. Olson
- Homework: Take Kaggle's 7-Day Machine Learning Challenge: machine learning is the hottest field in data science, and this track will get you started quickly.
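To accompany the data-exploration notebooks above, a minimal pandas sketch; it assumes you have downloaded `train.csv` from Kaggle's Titanic competition into the working directory:

```python
# First-look exploration of the Titanic training data.
import pandas as pd

df = pd.read_csv("train.csv")
print(df.shape)                                       # rows, columns
print(df.isnull().sum())                              # missing values per column
print(df["Survived"].value_counts(normalize=True))    # class balance
print(df.groupby("Sex")["Survived"].mean())           # survival rate by sex
```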
Required Reading:
- Chapter 9 of Understanding Machine Learning: From Theory to Algorithms
- Exercises: 9.1, 9.3, 9.4, and 9.6
- Slide: Machine Learning by Roland Kwitt
- Slide: Tutorial 3: Consistent linear predictors and Linear regression by Nir Ailon
- NoteBook: Perceptron in Scikit by Chris Albon
- Blog: Why Linear Regression is not Suitable for Classification by Hong Jing
- Slide: Logistic Regression by Jeff Howbert
Additional Reading:
- NoteBook: Linear Regression by Kevin Markham
- Paper: Matrix Differentiation by Randal J. Barnes
- Lecture: Logistic Regression by Cosma Shalizi
- Lecture: Multiclass Classification by Yossi Keshet
- NoteBook: Logistic Regression-Analysis by Nitin Borwankar
- NoteBook: Logistic Regression by Kevin Markham
- Infographic and Code: Simple Linear Regression (100 Days Of ML Code) by Avik Jain
- Infographic and Code: Multiple Linear Regression (100 Days Of ML Code) by Avik Jain
- Infographic and Code: Logistic Regression (100 Days Of ML Code) by Avik Jain
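To accompany the readings, a minimal scikit-learn sketch of a linear predictor; the built-in breast cancer dataset is used only for illustration:

```python
# Logistic regression as a linear classifier.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=5000)   # raise max_iter so the solver converges
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
print("weight vector shape:", clf.coef_.shape)   # one weight per feature
```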
R (Programming Language):
- Book: Machine Learning Mastery With R by Jason Brownlee
- Blog: Linear Regression by UC Business Analytics R Programming Guide
- Blog: Linear Regression with lm() by Nathaniel D. Phillips
- Blog: Logistic Regression by UC Business Analytics R Programming Guide
Required Reading:
- Chapter 18 of Understanding Machine Learning: From Theory to Algorithms
- Exercise: 18.2
- Slide: Decision Trees by Nicholas Ruozzi
- Slide: Representation of Boolean Functions by Troels Bjerre Sørensen
- Slide: Overfitting in Decision Trees by Reid Johnson
- NoteBook: Decision Trees
Additional Reading:
- Paper: Do We Need Hundreds of Classifiers to Solve Real World Classification Problems? by Manuel Fernandez-Delgado, Eva Cernadas, Senen Barro, and Dinani Amorim
- Blog: Random Forest Classifier Example by Chris Albon. This tutorial is based on Yhat’s 2013 tutorial on Random Forests in Python.
- NoteBook
- NoteBook: Titanic Competition with Random Forest by Chris Albon
- Infographic and Code: Decision Trees (100 Days Of ML Code) by Avik Jain
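A minimal scikit-learn sketch; the depth cap illustrates the overfitting control discussed in the slides above, and the iris dataset is only an example:

```python
# A small decision tree with a depth cap to curb overfitting.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)
print("test accuracy:", tree.score(X_test, y_test))
print(export_text(tree))   # human-readable view of the learned splits
```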
R (Programming Language):
- Book: Machine Learning Mastery With R by Jason Brownlee
- Blog: Decision Tree Classifier Implementation in R by Rahul Saxena
- Blog: Regression Trees by UC Business Analytics R Programming Guide
Required Reading:
- Chapter 19 (Section 1) of Understanding Machine Learning: From Theory to Algorithms
- Slide: Nearest Neighbor Classification by Vivek Srikumar
- NoteBook: k-Nearest Neighbors
Additional Reading:
- NoteBook: Training a Machine Learning Model with Scikit-Learn by Kevin Markham
- NoteBook: Comparing Machine Learning Models in Scikit-Learn by Kevin Markham
- Infographic: K-Nearest Neighbours (100 Days Of ML Code) by Avik Jain
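A short sketch of k-nearest neighbors; note how the choice of k trades off bias against variance (the dataset and values of k are illustrative):

```python
# k-NN accuracy for several values of k, via 5-fold cross-validation.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
for k in [1, 5, 15]:
    knn = KNeighborsClassifier(n_neighbors=k)
    scores = cross_val_score(knn, X, y, cv=5)
    print(f"k = {k:2d}: mean accuracy {scores.mean():.3f}")
```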
R (Programming Language):
- Book: Machine Learning Mastery With R by Jason Brownlee
- Blog: Knn Classifier Implementation in R with Caret Package by Rahul Saxena
Required Reading:
- Chapter 10 of Understanding Machine Learning: From Theory to Algorithms and Chapter 8 of An Introduction to Statistical Learning: with Applications in R
- Exercises: 10.1, 10.3, 10.4, and 10.5 from Understanding Machine Learning: From Theory to Algorithms
- Slide: Bagging and Random Forests by David Rosenberg
- Slide: Ensemble Learning through Diversity Management: Theory, Algorithms, and Applications by Huanhuan Chen and Xin Yao
- Slide: Machine Learning by Roland Kwitt
- Slide: Introduction to Machine Learning (Boosting) by Shai Shalev-Shwartz
- Paper: Ensemble Methods in Machine Learning by Thomas G. Dietterich
- NoteBook: AdaBoost
- Question: Adaboost with a Weak Versus a Strong Learner
Additional Reading:
- Blog: Ensemble Methods by Rai Kapil
- Blog: Boosting, Bagging, and Stacking — Ensemble Methods with sklearn and mlens by Robert R.F. DeFilippi
- NoteBook: Introduction to Python Ensembles by Sebastian Flennerhag
- Library (ML-Ensemble): Graph handles for deep computational graphs and ready-made ensemble classes for ensemble networks by Sebastian Flennerhag
- NoteBook: Ensemble Methods by Vadim Smolyakov
- Paper: On Agnostic Boosting and Parity Learning by A. T. Kalai, Y. Mansour, and E. Verbin
- Paper: Faster Face Detection Using Convolutional Neural Networks & the Viola-Jones Algorithm by Karina Enriquez
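A minimal AdaBoost sketch with scikit-learn; by default the weak learner is a depth-1 decision tree (a stump), which matches the classic setting. The dataset and number of rounds are illustrative:

```python
# AdaBoost: watch test accuracy improve over boosting rounds.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

ada = AdaBoostClassifier(n_estimators=100, random_state=0)
ada.fit(X_train, y_train)
# staged_score yields the test accuracy after each boosting round.
for i, s in enumerate(ada.staged_score(X_test, y_test), start=1):
    if i % 25 == 0:
        print(f"after {i:3d} rounds: {s:.3f}")
```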
R (Programming Language):
- Book: Machine Learning Mastery With R by Jason Brownlee
- Blog: Random Forests by UC Business Analytics R Programming Guide
Required Reading:
- Chapter 11 of Understanding Machine Learning: From Theory to Algorithms
- Exercises: 11.1 and 11.2 from Understanding Machine Learning: From Theory to Algorithms
- Blog: What is the Difference Between a Parameter and a Hyperparameter? by Jason Brownlee
- Blog: A “short” introduction to model selection by David Schönleber
- Blog: K-Fold and Other Cross-Validation Techniques by Renu Khandelwal
- Tutorial: Learning Curves for Machine Learning in Python by Alex Olteanu
Suggested Reading:
- NoteBook: Split the Dataset Using Stratified K-Folds Cross-Validator
- Blog: Hyperparameter Tuning the Random Forest in Python by Will Koehrsen
- Blog: Hyperparameter Optimization: Explanation of Automatized Algorithms by Dawid Kopczyk
Additional Reading:
- Blog: Nested Cross Validation Explained by Weina Jin
- NoteBook: Cross Validation by Ritchie Ng
- NoteBook: Cross Validation With Parameter Tuning Using Grid Search by Chris Albon
- Blog: Random Test/Train Split is not Always Enough by Win-Vector
- Slide: Cross-Validation: What, How and Which? by Pradeep Reddy Raamana
- Paper: Algorithms for Hyper-Parameter Optimization (NIPS 2011) by J. Bergstra, R. Bardenet,Y. Bengio, and B. Kégl
- Library: Yellowbrick (Machine Learning Visualization)
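A minimal sketch of hyperparameter selection by grid search with k-fold cross-validation; the model and the grid are illustrative:

```python
# Grid search: every candidate is scored by 5-fold cross-validation.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_breast_cancer(return_X_y=True)
param_grid = {"n_estimators": [50, 200], "max_depth": [3, None]}

search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)
print("best hyperparameters:", search.best_params_)
print("best cross-validated accuracy:", round(search.best_score_, 3))
```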
R (Programming Language):
- Book: Machine Learning Mastery With R by Jason Brownlee
- Blog: Resampling Methods by UC Business Analytics R Programming Guide
- Blog: Linear Model Selection by UC Business Analytics R Programming Guide
Required Reading:
- Chapter 20 of Understanding Machine Learning: From Theory to Algorithms
- Slide: Neural Networks by Shai Shalev-Shwartz
- Blog: 7 Types of Neural Network Activation Functions: How to Choose?
- Blog: Activation Functions
- Blog: Back-Propagation, an Introduction by Sanjeev Arora and Tengyu Ma
Additional Reading:
- Blog: The Gradient by Khanacademy
- Blog: Activation Functions by Dhaval Dholakia
- Paper: Why Does Deep & Cheap Learning Work So Well? by Henry W. Lin, Max Tegmark, and David Rolnick
- Slide: Basics of Neural Networks by Connelly Barnes
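The activation functions discussed in the readings above are one-liners in NumPy; a small sketch:

```python
# Common activation functions, written directly in NumPy.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # squashes into (0, 1)

def relu(z):
    return np.maximum(0.0, z)         # zero for negative inputs

z = np.array([-2.0, 0.0, 2.0])
print("sigmoid:", sigmoid(z))
print("relu:   ", relu(z))
print("tanh:   ", np.tanh(z))         # squashes into (-1, 1)
```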
R (Programming Language):
- Blog: Classification Artificial Neural Network by UC Business Analytics R Programming Guide
Required Reading:
- Chapter 12 of Understanding Machine Learning: From Theory to Algorithms
- Slide: Machine Learning by Roland Kwitt
Additional Reading:
- Blog: Escaping from Saddle Points by Rong Ge
Required Reading:
- Chapter 13 of Understanding Machine Learning: From Theory to Algorithms
- Slide: Machine Learning by Roland Kwitt
- Blog: L1 and L2 Regularization by Renu Khandelwal
- Blog: L1 Norm Regularization and Sparsity Explained for Dummies by Shi Yan
Additional Resources:
- NoteBook: Regularization by Ethen
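A brief sketch contrasting L2 (ridge) and L1 (lasso) regularization: lasso drives coefficients exactly to zero, while ridge only shrinks them. The dataset and the value of `alpha` are illustrative:

```python
# Ridge vs. lasso: count how many coefficients each sets to zero.
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso, Ridge

X, y = load_diabetes(return_X_y=True)
ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=1.0).fit(X, y)

print("ridge coefficients at zero:", int(np.sum(ridge.coef_ == 0)))
print("lasso coefficients at zero:", int(np.sum(lasso.coef_ == 0)))
```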
R (Programming Language):
- Book: Machine Learning Mastery With R by Jason Brownlee
- Blog: Regularized Regression by UC Business Analytics R Programming Guide
Required Reading:
- Chapter 15 of Understanding Machine Learning: From Theory to Algorithms
- Slide: Support Vector Machines and Kernel Methods by Shai Shalev-Shwartz
- Blog: Understanding the Mathematics behind Support Vector Machines by Nikita Sharma
Additional Reading:
- Infographic: Support Vector Machines (100 Days Of ML Code) by Avik Jain
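A minimal soft-margin SVM sketch; `C` controls the trade-off between margin width and training errors, and standardizing features first matters for margin-based methods. The dataset and parameters are illustrative:

```python
# A soft-margin SVM with an RBF kernel, with feature standardization.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
svm.fit(X_train, y_train)
print("test accuracy:", svm.score(X_test, y_test))
```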
R (Programming Language):
- Book: Machine Learning Mastery With R by Jason Brownlee
- Blog: Support Vector Machine Classifier Implementation in R with Caret Package by Rahul Saxena
- Blog: Support Vector Machine by UC Business Analytics R Programming Guide
Required Reading:
- Chapter 17 of Understanding Machine Learning: From Theory to Algorithms
- Slide: Machine Learning Basics Lecture 7: Multiclass Classification by Yingyu Liang
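A short sketch of the one-vs-rest reduction, which trains one binary classifier per class; the digits dataset is illustrative:

```python
# One-vs-rest: ten binary logistic regressions, one per digit class.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

X, y = load_digits(return_X_y=True)   # 10 classes: digits 0-9
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

ovr = OneVsRestClassifier(LogisticRegression(max_iter=5000))
ovr.fit(X_train, y_train)
print("binary classifiers trained:", len(ovr.estimators_))   # 10
print("test accuracy:", ovr.score(X_test, y_test))
```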
- Course: Foundations of Machine Learning by David S. Rosenberg
- Python Machine Learning Book Code Repository
- Dive into Machine Learning
- Python code for "An Introduction to Statistical Learning with Applications in R" by Jordi Warmenhoven
- iPython-NoteBooks by John Wittenauer
- Scikit-Learn Tutorial by Jake Vanderplas
- Data Science Roadmap by Javier Estraviz
Sunday and Tuesday, 13:00–14:30 (Spring 2020), Room 210
Sunday, 11:30–12:30 (Spring 2020), Room 210
Refer to the following link to check the assignments.
Projects are programming assignments that cover the topics of this course. Each project is written in a
Jupyter Notebook. Projects will require the use of Python 3.7, as well as
the following additional Python libraries.
- Python 3.7: An interactive, object-oriented, extensible programming language.
- NumPy: A Python package for scientific computing.
- Pandas: A Python package for high-performance, easy-to-use data structures and data analysis tools.
- Scikit-Learn: A Python package for machine learning.
- Matplotlib: A Python package for 2D plotting.
- SciPy: A Python package for mathematics, science, and engineering.
- IPython: An architecture for interactive computing with Python.
- Slide: Practical Advice for Building Machine Learning Applications by Vivek Srikumar
- Blog: Comparison of Machine Learning Models by Kevin Markham
- Technical Notes On Using Data Science & Artificial Intelligence: To Fight For Something That Matters by Chris Albon
Google Colab is a free cloud service, and it supports a free GPU!
- How to Use Google Colab by Souvik Mandal
- Primer for Learning Google Colab
- Deep Learning Development with Google Colab, TensorFlow, Keras & PyTorch
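Once a GPU runtime is enabled (Runtime -> Change runtime type), you can confirm it is visible; this sketch assumes TensorFlow, which Colab preinstalls, is available:

```python
# Quick GPU check inside a Colab notebook.
import tensorflow as tf
print(tf.test.gpu_device_name() or "No GPU found")
```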
The students can include mathematical notation within markdown cells using LaTeX in their Jupyter Notebooks.
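For example, placing the following in a markdown cell renders the empirical risk formula used throughout the course (the specific formula is just an example):

```latex
$$ L_S(h) = \frac{1}{m}\sum_{i=1}^{m} \mathbf{1}\big[\,h(x_i) \neq y_i\,\big] $$
```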
- Preparing and Cleaning Data for Machine Learning by Josh Devlin
- Getting Started with Kaggle: House Prices Competition by Adam Massachi
- Scikit-learn Tutorial: Machine Learning in Python by Satyabrata Pal
- Homework – 30% (mathematical problems and/or programming assignments)
- Midterm – 20%
- Endterm – 50%
Midterm Examination: Tuesday 1399/02/16, 13:00-14:30
Final Examination: Tuesday 1399/03/27, 11:00-13:00
General mathematical sophistication and a solid understanding of algorithms, linear algebra, and probability theory, at the advanced undergraduate or beginning graduate level, or equivalent.
- Video: Professor Gilbert Strang's Video Lectures on linear algebra.
- Learn Probability and Statistics Through Interactive Visualizations: Seeing Theory was created by Daniel Kunin while an undergraduate at Brown University. The goal of this website is to make statistics more accessible through interactive visualizations (designed using Mike Bostock’s JavaScript library D3.js).
- Statistics and Probability: This website provides training and tools to help you solve statistics problems quickly, easily, and accurately - without having to ask anyone for help.
- Jupyter NoteBooks: Introduction to Statistics by Bargava
- Video: Professor John Tsitsiklis's Video Lectures on Applied Probability.
- Video: Professor Krishna Jagannathan's Video Lectures on Probability Theory.
Course (Videos, Lectures, Assignments): MIT OpenCourseWare (Discrete Mathematics)
Have a look at some reports of Kaggle or Stanford students (CS224N, CS224D) to get some general inspiration.
It is necessary to have a GitHub account to share your projects. GitHub offers free accounts as well as plans with private repositories. GitHub is like the hammer in your toolbox; you need to have it!
Honesty and integrity are vital elements of academic work. All your submitted assignments must be entirely your own (or your own group's).
We will follow the standard of Department of Mathematical Sciences approach:
- You can get help, but you MUST acknowledge the help on the work you hand in
- Failure to acknowledge your sources is a violation of the Honor Code
- You can talk to others about the algorithm(s) to be used to solve a homework problem; as long as you then mention their name(s) on the work you submit
- You should not use code of others or be looking at code of others when you write your own: You can talk to people but have to write your own solution/code
I will hold office hours for this course on Mondays (09:30 AM–12:00 PM). If this is not convenient, email me at hhaji@sbu.ac.ir or talk to me after class.