This list started out as a way for me to keep track of data science resources I've found helpful. However, other data scientists and friends looking to break into data science frequently ask me for recommendations, so I've continued to add to it, with a focus on beginner- and intermediate-level resources. Where possible, I've included links to the (legitimate) free versions of books. One of the great things about the data science community is its willingness to open-source work and make things available for free. Within each category or sub-category, the resources are ordered very loosely, from the most useful and introductory to the more advanced (though not strictly).
This list is far from complete, but I'll try to continue to add to it. Hopefully you find it helpful.
Non-exhaustive list of additional topics to add:
- Spark
- time series forecasting
- Docker

- course: Coursera - Introduction to Data Science in Python
- course: Codecademy - Learn Python 3
- ebook: Python Like You Mean It
- book: Automate the Boring Stuff with Python
- book: Python for Everybody
- book: Learn Python the Hard Way (maybe not my favorite resource, but still useful)
- book: Python Data Science Handbook
- video series: Calm Code
- Google Python style guide
- course: Khan Academy - Statistics
- Stanford Experimental Design course and course notes
- course: Coursera - Statistics with Python Specialization
- book: Open Intro Statistics
- book: Introduction to Empirical Bayes by David Robinson
- book: Think Bayes
- book: Think Stats
- course: MIT OCW - Introduction to Computer Science and Programming in Python
- ebook: Problem Solving with Algorithms and Data Structures using Python
- course: Khan Academy - Algorithms
- course: Coursera - Algorithms Specialization
- course: MIT OCW - Introduction to Algorithms
- HackerRank 30 days of code
- github repo: Awesome Algorithms
- book: Introduction to Algorithms by Cormen, Leiserson, Rivest and Stein
- article: Learn X in Y minutes: Bash
- article: Bash scripting cheatsheet
- course: Coursera - Machine Learning by Andrew Ng (foundational knowledge of machine learning)
- course: Applied Data Science with Python Specialization (more immediately applicable than the previous course)
- book: An Introduction to Statistical Learning with Applications in R (ISLR), 2nd edition by James, Witten, Hastie, Tibshirani
- book: The Hundred-Page Machine Learning Book
- book: Approaching (Almost) Any Machine Learning Problem
- book: Mining of Massive Datasets and course: edX/Stanford - Mining Massive Datasets
- article series: Machine Learning Mastery
- article: How to Train a Final Machine Learning Model on Machine Learning Mastery
- book: (advanced material) Probabilistic Machine Learning: An Introduction by Kevin Murphy
- book: (advanced material) Elements of Statistical Learning
- Papers With Code
- ArXiv Sanity Preserver
- course: Harvard - CS 109 Data Science
- course: Cornell - CS 4780 Machine Learning lecture notes and lecture YouTube videos
- course: MIT - Intro to Machine Learning
- course: Wisconsin - Machine Learning by Sebastian Raschka
- paper: UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction by McInnes et al.
- blog: Understanding UMAP by Andy Coenen and Adam Pearce
- article: How Exactly UMAP Works
- paper: Visualizing Data using t-SNE by van der Maaten and Hinton
- blog: How to Use t-SNE Effectively
- blog: Visualizing DBSCAN by Naftali Harris
- API documentation: How HDBSCAN Works
- paper: Accelerated Hierarchical Density Clustering by McInnes and Healy, 2017
- blog: Understanding HDBSCAN and Density-Based Clustering by Pepe Berba
- stackoverflow: How to select a clustering method? How to validate a cluster solution?
- stackoverflow: Evaluation measures of goodness or validity of clustering
- paper: What are the true clusters? by Christian Hennig
- paper: Density-Based Clustering Validation by Moulavi et al., 2014
- paper: On the Surprising Behavior of Distance Metrics in High Dimensional Space by Aggarwal et al., 2001
- blog: Escaping the Curse of Dimensionality by Peter Gleeson (FreeCodeCamp)
- article: Learning from Imbalanced Classes
- github repo: Curated papers, articles, and blogs on data science & machine learning in production
- course: Stanford CS 329S: Machine Learning Systems Design
- article: Overview of the different approaches to putting Machine Learning (ML) models in production
- article: A Practical Guide to Maintaining Machine Learning in Production
- github repo and tutorials: Made With ML by Goku Mohandas
- github repo: Awesome ML Ops
- article: ML Ops: Machine Learning as an Engineering Discipline
- course: Coursera - deeplearning.ai Deep Learning Specialization
- book: Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow and its associated github repo (the first ~200 pages cover general ML, so this book could go under that section, but it's probably better suited to someone looking to learn about DL)
- course: Google's Machine Learning Crash Course
- site: Neural Network Playground
- blog post: The Unreasonable Effectiveness of Recurrent Neural Networks by Andrej Karpathy
- github repo: Deep Learning Drizzle (giant list of university DL courses)
- course: Stanford CS230 - Deep Learning
- course: Stanford CS231n - Convolutional Neural Networks for Visual Recognition
- course: Yann LeCun's NYU course - DS-GA 1008 · SPRING 2020
- course: MIT Intro to Deep Learning
- github repo: Deep Learning Papers Reading Roadmap
- paper: Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification by He et al., 2015
- paper: A disciplined approach to neural network hyper-parameters: Part 1 -- learning rate, batch size, momentum, and weight decay by Smith, 2018
- book: Deep Learning with Python, 2nd edition by François Chollet
- course: Coursera - deeplearning.ai TensorFlow Developer Professional Certificate
- course: Coursera - Reinforcement Learning Specialization
- book: Reinforcement Learning by Sutton and Barto
- course: Coursera - deeplearning.ai Natural Language Processing Specialization
- course: CS224n: Natural Language Processing with Deep Learning
- course: Advanced NLP with spaCy
- course: Hugging Face course
- book: Natural Language Processing with Python by Bird, Klein and Loper
- course: Michigan NLP course videos and github
- article: FROM Pre-trained Word Embeddings TO Pre-trained Language Models — Focus on BERT
- book: Speech and Language Processing (3rd ed. draft) by
- article: How to get started in NLP
- article: Introduction to Word Embeddings
- article: Document Embedding Techniques
- paper: word2vec: Efficient Estimation of Word Representations in Vector Space by Mikolov et al.
- paper: GloVe: Global Vectors for Word Representation by Pennington et al. and Stanford website for GloVe
- paper: fastText: Bag of Tricks for Efficient Text Classification by Joulin et al.
- paper: Universal Sentence Encoder by Cer et al., 2018
- paper: LDA: Latent Dirichlet Allocation by Blei et al.
- paper: Anchored CorEx: Anchored Correlation Explanation: Topic Modeling with Minimal Domain Knowledge by Gallagher et al., 2017 and github
- github: Top2Vec and paper: Top2Vec: Distributed Representations of Topics by Dimo Angelov
- github: BERTopic and article: Topic Modeling with BERT by Maarten Grootendorst
- blog post: Stitch Fix - Introducing our Hybrid lda2vec Algorithm by Chris Moody
- paper: transformers: Attention Is All You Need by Vaswani et al., 2017
- article: The Illustrated GPT-2 (Visualizing Transformer Language Models) by Jay Alammar
- article: How GPT3 Works - Visualizations and Animations by Jay Alammar
- book: Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing by Kohavi et al.
- course: Microsoft Experimentation Platform
- Evan Miller's A/B test tools
- paper: Three Key Checklists and Remedies for Trustworthy Analysis of Online Controlled Experiments at Scale
- paper: Top Challenges from the first Practical Online Controlled Experiments Summit
- paper: Controlled experiments on the web: survey and practical guide by Kohavi et al., 2009
- article: Guidelines for A/B Testing
- article: A/B Testing: 29 Guidelines for Online Experiments (Plus a Checklist)
- course: Udacity - A/B testing by Google
- article: How Not To Run an A/B Test by Evan Miller
- article: Simple Sequential A/B testing by Evan Miller
- paper: Overlapping Experiment Infrastructure: More, Better, Faster Experimentation by Tang et al., 2010
- blog post: A/B Testing Tutorial
- paper: Controlled experiments on the web: survey and practical guide by Ron Kohavi et al.
- presentation: Online Controlled Experiments: Lessons from Running A/B/n Tests for 12 years by Ron Kohavi
- articles: Microsoft's Experimentation Platform
- article: Understanding Bayesian A/B testing by David Robinson
- article: Is Bayesian A/B Testing Immune to Peeking? Not Exactly by David Robinson
- article: Agile A/B testing with Bayesian Statistics and Python by Chris Stucchio
- article: The Power of Bayesian A/B Testing
- paper: Best arm identification in multi-armed bandits with delayed feedback
- paper: Generalized Thompson Sampling for Contextual Bandits
- paper: Analysis of Thompson Sampling for the Multi-armed Bandit Problem
- paper: A Contextual-Bandit Approach to Personalized News Article Recommendation
- article: A/B testing — Is there a better way? An exploration of multi-armed bandits
- course: Coursera - Introduction to Git and GitHub
- ebook: Pro Git
- article: Structuring Your Project: The Hitchhiker's Guide to Python
- github: Cookiecutter data science
- blog post: How to Set Up a Python Project For Automation and Collaboration by Eugene Yan
- article: The importance of structure, coding style, and refactoring in notebooks
- tutorial: Production Data Science
- article: Coding habits for data scientists
- article: Effective Python Testing With Pytest (Real Python)
- article: Becoming a Better Data Scientist: Testing with pytest by Chang Hsin Lee
- article: Unit Testing for Data Scientists
- tutorial: PyPA Packaging Python projects tutorial
- e-book: Python Packages e-book
- e-book: The Joy of Packaging
- poetry
- article: How to Build Your First Python Package
- course: Coursera - Getting Started with AWS Machine Learning
- course: Coursera - AWS Cloud Technical Essentials
- course: Coursera - Practical Data Science Specialization
- AWS Ramp up guide
- Flask Mega-tutorial by Miguel Grinberg
- article: Parameter Tuning with Hyperopt by District Data Labs
- article: On Using Hyperopt: Advanced Machine Learning by Tanay Agrawal
- article: An Introductory Example of Bayesian Optimization in Python with Hyperopt by Will Koehrsen
- github repo: EthicalML Awesome Production ML
- package: Fairlearn
- article: Data science learning resources by Microsoft Data Science team
- blog: End-to-End Machine Learning by Brandon Rohrer (some good free resources, some paid)
- github repo: Awesome Machine Learning
- blog: Free online machine learning curriculum by Chip Huyen
- book: Build a Career in Data Science by Emily Robinson and Jacqueline Nolis
- article: 80000 hours: Data Science career review
- Quora: As a data scientist, what career advice changed your life?
- blog post: A Framework for Career Decisions by Conor Dewey
- ApplyingML - Mentor interviews by Eugene Yan
- talks by Angela Bassa
- blog post: Applied / Research Scientist, ML Engineer: What’s the Difference? by Eugene Yan
- reddit: Difference between DS and MLE
- article: Machine Learning Engineer vs Data Scientist (Is Data Science Over?)
- Open-Source Data Science Masters
- article: How to Build a Data Science Portfolio
- github: Awesome Data Science
- article: Unpopular Opinion - Data Scientists Should be More End-to-End by Eugene Yan
- article: Stitch Fix - Beware the data science pin factory: The power of the full-stack data science generalist and the perils of division of labor through function by Eric Colson
- blog post: Finding Answers to your Career Questions
- blog post: Engineering Management: The Pendulum or the Ladder by Charity Majors
- blog post: The Engineer/Manager Pendulum by Charity Majors
- blog post: Senior engineer and then what? by Ju Yang
- article: Models for integrating data science teams within organizations
- article: Coursera - Analytics at Coursera: three years later
- article: Coursera - What is the most effective way to structure a data science team?
- article: Airbnb - At Airbnb, Data Science Belongs Everywhere
- article: Embedding Data Science In Cross-Functional Teams
- blog: Building a data team at a mid-stage startup: a short story by Erik Bernhardsson
- blog: Stitch Fix - Let Curiosity Drive: Fostering Innovation in Data Science
- book: Introduction to Machine Learning Interviews Book by Chip Huyen
- article: Data science career advice to my younger self by Schaun Wheeler
- blog post: How to Break Into the Tech Industry—a Guide to Job Hunting and Tech Interviews by Haseeb Qureshi
- article: Mastering the Data Science Interview Loop by Andrei Lyskov
- blog post: Reverse Interviewing Your Future Manager and Team by Gergely Orosz
- blog post: Red Flags to Look Out for When Joining a Data Team by Eugene Yan
- blog post: Red Flags in Data Science Interviews by Emily Robinson
- article: How to manage Machine Learning and Data Science projects
- article: Data Science and Agile (What works, and what doesn't) and Data Science and Agile (Frameworks for effectiveness)
- article: Jobs To Be Done Framework
- article series: Sequoia - Data-Informed Product Building
- article: Sequoia Data Science Team - Measuring Product Health
- article: Sequoia Data Science Team - Retention
- article: 10 Reads for Data Scientists Getting Started with Business Models by Conor Dewey