fevzikaya7/2022-Fall-388E

Data Analysis for Fundamental Sciences (Fall 2022, MAT388E)

Course description

Data science is a broad interdisciplinary field. It lies at the intersection of mathematics, statistics, machine learning, and computer science, and uses their methods and tools to extract information and insight from data. This is a course on the mathematical foundations of standard statistical and machine learning models used in the field. The class aims to teach students majoring in fundamental sciences to effectively use and deploy these algorithms in applications.

Books

  • T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. (Available on the web)
  • C.M. Bishop. Pattern Recognition and Machine Learning.
  • E. Alpaydin. Machine Learning.

Other resources

The books I listed are mostly theoretical. But for the computational homeworks you may need the following:

  1. M. Kirk, Thoughtful Machine Learning with Python.
  2. C. O'Neil, R. Schutt, Doing Data Science.
  3. J. VanderPlas, Python Data Science Handbook.

Also, there are excellent resources on the web. I would recommend:

Enroll in any of the data, machine learning, statistics, R, or Python classes that catch your fancy or that you think might be useful for you.

A sample of data sources

Technical Requirements

The course is an applied data analysis class. This means the course requires a degree of proficiency with computational tools, and acquiring that proficiency is your responsibility.

Installing and maintaining these systems on your machine is your responsibility. I can't help you if something doesn't work. You will need to figure it out on your own. If you can't install these systems on your machine, you may try to use an online service:

Course Management

I will make all course-related announcements on İTÜ's course management system NINOVA. I will post the grades on NINOVA as well. So, do check it regularly.

E-Mail Policy

I receive approximately 50 e-mails per day. So, if you need to contact me, use the subject "MAT388E" in your e-mails. Spend some time structuring your e-mail with grammatically correct sentences in Turkish or in English. Be polite, direct, and concise. State what you need in the first two sentences. Sign your e-mails with your name and student number. If I can't figure out who you are and what you need within 30 seconds of opening your message, I will delete your e-mail with no response. You are hereby warned.

Assessment

Your performance is going to be judged via 4 homework assignments posted on the course GitHub page and one final project that you need to write from scratch. Each homework is worth 15 points, and the final project is worth 40 points. Your total assessment for the course will be evaluated as follows:

If you receive 0 on any 2 of the homeworks (missing homeworks are graded as 0), or if your total from homeworks is less than 35%, you'll get a VF. If your final is less than 25%, or your total is less than 35%, you'll receive an F. Note that the conditions for receiving a VF are both necessary and sufficient, while the conditions for receiving an F are only sufficient. This means you may still get an F with a higher score than 35% depending on the distribution of the scores.
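The automatic VF and F rules above can be sketched in Python as follows. This is only an illustration, not the official grading procedure. It assumes that each percentage is taken relative to the maximum of the component it refers to (60 points for the homeworks, 40 for the final project, 100 overall), that "any 2" means at least two zeros, and that the VF check is applied before the F check. The function name `automatic_outcome` is hypothetical.

```python
HW_MAX = 15      # points per homework (from the syllabus)
FINAL_MAX = 40   # points for the final project (from the syllabus)


def automatic_outcome(hw_scores, final_score):
    """Return "VF", "F", or None when neither automatic rule applies.

    Assumption: percentages are relative to each component's maximum.
    Returning None does not guarantee a passing grade, because the F
    conditions are only sufficient.
    """
    if len(hw_scores) != 4:
        raise ValueError("expected exactly 4 homework scores")

    hw_total = sum(hw_scores)
    hw_max_total = HW_MAX * len(hw_scores)          # 60 points
    zeros = sum(1 for score in hw_scores if score == 0)

    # VF: at least two homeworks graded 0, or homework total below 35%.
    # Integer arithmetic avoids floating-point surprises at the threshold.
    if zeros >= 2 or 100 * hw_total < 35 * hw_max_total:
        return "VF"

    # F: final project below 25%, or overall total below 35%.
    total = hw_total + final_score
    total_max = hw_max_total + FINAL_MAX            # 100 points
    if 100 * final_score < 25 * FINAL_MAX or 100 * total < 35 * total_max:
        return "F"

    return None


if __name__ == "__main__":
    print(automatic_outcome([15, 0, 0, 15], 40))    # two zeros
    print(automatic_outcome([10, 10, 10, 10], 5))   # weak final project
    print(automatic_outcome([10, 10, 10, 10], 30))  # no automatic rule applies
```

Under these assumptions the thresholds are 21 of 60 homework points, 10 of 40 final project points, and 35 of 100 total points.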

| Assessment | Deadline |
| --- | --- |
| GitHub link | Sep 26 |
| Homework 1 | Oct 10 |
| Homework 2 | Oct 31 |
| Final Project Proposal | Nov 7 |
| Homework 3 | Nov 21 |
| Homework 4 | Dec 5 |
| Final Project | Dec 30 |

There is no make-up for the homeworks. If you miss any of the homework deadlines because of an emergency, do contact me to make an arrangement as soon as you can.

Attendance

I will take written attendance in each lecture. I will use the attendance records for students whose grades are edge cases, to push them up or down.

Homeworks

For the homeworks, you are going to need to open a GitHub account and create a repository for this class. I am going to pull your homeworks and final project from your GitHub repositories at 11:59PM on each deadline date. You must open a private GitHub repository and share it with my hotmail address: atabey_kaygun@hotmail.com. Then send your name, student number, and private GitHub repository link to my İTÜ address (kaygun@itu.edu.tr). Your deadline is September 26, 11:59PM. If you do not follow these instructions, I will deduct up to 15 points from your final grade.

I am going to post the homework assignments on the course GitHub page. You'll need to fill in the answers and post them on your own GitHub account by the deadline.

Final Project

The final project is worth 40 points and will be evaluated on your final project notebook. You may work with a team, but no larger than 3 students. You must open a separate repository with your team and submit the link via e-mail with the subject "MATH388E Final Project Link" by November 14th. In that proposal git repository, put a Jupyter notebook with

  • The title of the project
  • The list of team members (names and student numbers)
  • Project summary

The project summary must contain a description of the data set you are going to work with, what you want to do with it, and a clear plan for how you are going to accomplish your goals. I will grade your proposals (15 points) and might make adjustments to your data set, your hypothesis, and your approach.

At the end of the semester when you submit your final project, I also want a short description of who did what for the final project as a supplement.

Final Exam

By regulation, I must give a final exam. But in the exam I will only ask you to explain your final project.

Cheating

Passing off someone else's code or text as your own is cheating, or worse yet, theft. Copying code with variable names changed is another lazy form of cheating. Depending on the severity of the situation, I may even report you to the university. In short, don't do it.

Weekly Course Plan

The following is a tentative schedule of topics I am going to cover. I may go faster or slower depending on the week. I may even add new subjects, or even drop subjects depending on requests and participation.

| Week | Subject |
| --- | --- |
| Sep 19 | Data Science, Machine Learning, Statistics, Computer Science: Similarities and Differences. |
| Sep 26 | Deadline for GitHub link submission. Crash Course in Python and its Library Ecosystem. |
| Oct 3 | Data types, data APIs, popular data sources, and how to use them. Post HW1. |
| Oct 10 | Deadline for HW1. Supervised and unsupervised learning. Cross-validation. Clustering vs classification. k-means clustering. k-nearest neighbor classification. |
| Oct 17 | Regression: OLS, regularization, lasso, elastic net. |
| Oct 24 | Logistic regression. Decision tree regression. Post HW2. |
| Oct 31 | Deadline for HW2. |
| Nov 7 | Hierarchical clustering. Density based clustering. |
| Nov 14 | Deadline for final project proposals. Entropy and Gini. Decision trees. Random forests. Post HW3. |
| Nov 21 | Deadline for HW3. Support Vector Machines. |
| Nov 28 | Dimensionality reduction. PCA, kernel PCA, LDA, NNMD. Dimensionality reduction applications for image and natural language processing. Post HW4. |
| Dec 5 | Deadline for HW4. Newton-Raphson. Gradient Descent. Perceptron. |
| Dec 12 | Neural Networks. |
| Dec 19 | A taxonomy of neural networks. Applications. |
| Dec 29 | Autoencoders. |
