- About
- Section 01 - Fundamentals
- Section 02 - Mathematics and Statistics Applied in Data and Computing
- Section 03 - Programming for Data Science
- Section 03-A - Python for Data Analysis
- Section 03-B - R for Statistical Analysis and Modeling
- Section 03-C - Rust for Data Engineering and LLM
- Bonus Section - Data Structures and Algorithms
- Section 04 - Data Mining
- Section 05 - Databases, SQL, and Big Data
- Section 06 - Cloud Computing
- Section 07 - Machine Learning
- Section 08 - Deep Learning
- Section 09 - Natural Language Processing
- Section 10 - Soft Skills
- Extra Bibliography
- Notes and Clarifications
- References
The Self-taught Data Science Curriculum is a learning guide I developed to master data science concepts and skills for free. Upon realizing the vast amount of high-quality, free resources available online, I decided to compile and organize them into a coherent roadmap. This project is not only my personal journey into data science but also a guide for anyone who wishes to follow a similar path.
Initially, this curriculum was designed for my own learning, but you are welcome to clone it and explore the courses if they align with your goals. The material here covers a broad range of topics essential for a successful data science career, from programming to artificial intelligence. The sources I used can be found in the "References" section at the end of the README.
The main objective is to follow a structured learning path inspired by the roadmap from the AI Expert team. The key skills and concepts I aim to master by the end of this curriculum include:
- Python: The primary language for data manipulation, machine learning, and AI model development. Python will be heavily explored due to its versatility and wide adoption in data science.
- R: A powerful language for statistical analysis, data visualization, and in-depth exploration of statistical data.
- Rust: Known for its performance and memory safety, Rust is increasingly used in data engineering and AI model implementation. This curriculum includes enough content to get a strong grasp of the language for those purposes.
- Databases: Focus on both relational (SQL) and non-relational (NoSQL) database systems for effective data management and retrieval.
- Business Intelligence (BI): Mastery of BI tools for data-driven decision-making and insights generation.
- Data Warehousing: Understanding the design and implementation of data warehouses for efficient storage and management of large datasets.
- Machine Learning: Learn how to build and apply machine learning models for tasks such as predictive analytics, classification, and pattern recognition.
- Deep Learning: Dive into neural networks, with an emphasis on frameworks like TensorFlow and PyTorch, to explore architectures and advanced AI techniques.
This curriculum is broken down into various modules that align with the core areas of data science. You can follow them sequentially or skip to specific areas based on your current knowledge and interests. I encourage you to adapt this guide to your own learning style, pace, and goals.
The "References" section at the end of this repository contains a comprehensive list of resources that I consulted while building this guide, including free online courses, tutorials, and learning platforms.
Course | Offered by | Effort | Certificate, if applicable | Status |
---|---|---|---|---|
Data – What It Is, What We Can Do With It | Johns Hopkins University | ~11h | Certificate of Completion | ✓ |
Foundations of Data Science and AI | Data Science Academy | ~24h | -- | -- |
The Data Scientist's Toolbox | Johns Hopkins University | ~18h | Certificate of Completion | ✓ |
Course | Offered by | Effort | Certificate, if applicable | Status |
---|---|---|---|---|
Introduction to Statistics | Stanford University | ~14h | Certificate of Completion | ✓ |
Mathematical Thinking in Computer Science | UC San Diego | ~41h | -- | -- |
Combinatorics and Probability | UC San Diego | ~23h | -- | -- |
Introduction to Graph Theory | UC San Diego | ~20h | -- | -- |
Number Theory and Cryptography | UC San Diego | ~16h | -- | -- |
Delivery Problem | UC San Diego | ~13h | -- | -- |
Linear Algebra for Machine Learning and Data Science | DeepLearning.AI | ~34h | -- | -- |
Calculus for Machine Learning and Data Science | DeepLearning.AI | ~25h | -- | -- |
Probability and Statistics for Machine Learning and Data Science | DeepLearning.AI | ~33h | -- | -- |
Course | Offered by | Effort | Certificate, if applicable | Status |
---|---|---|---|---|
Python Basics | University of Michigan | ~34h | Certificate of Completion | ✓ |
Python Functions, Files, and Dictionaries | University of Michigan | ~31h | Certificate of Completion | ✓ |
Data Collection and Processing with Python | University of Michigan | ~16h | Certificate of Completion | ✓ |
Understanding and Visualizing Data with Python | University of Michigan | ~19h | -- | -- |
Inferential Statistical Analysis with Python | University of Michigan | ~21h | -- | -- |
Fitting Statistical Models to Data with Python | University of Michigan | ~14h | -- | -- |
Introduction to Data Science in Python | University of Michigan | ~34h | -- | -- |
Applied Plotting, Charting & Data Representation in Python | University of Michigan | ~24h | -- | -- |
Applied Machine Learning in Python | University of Michigan | ~31h | -- | -- |
Applied Text Mining in Python | University of Michigan | ~25h | -- | -- |
Applied Social Network Analysis in Python | University of Michigan | ~26h | -- | -- |
Course | Offered by | Effort | Certificate, if applicable | Status |
---|---|---|---|---|
R Programming | Johns Hopkins University | ~57h | -- | -- |
Advanced R Programming | Johns Hopkins University | ~18h | -- | -- |
Building R Packages | Johns Hopkins University | ~20h | -- | -- |
Introduction to Data Visualization in R | Johns Hopkins University | ~11h | -- | -- |
Data Visualization in R with ggplot2 | Johns Hopkins University | ~12h | -- | -- |
Advanced Data Visualization in R | Johns Hopkins University | ~10h | -- | -- |
Publishing Visualizations in R with Shiny and flexdashboard | Johns Hopkins University | ~11h | -- | -- |
Course | Offered by | Effort | Certificate, if applicable | Status |
---|---|---|---|---|
Rust Fundamentals | Duke University | ~40h | -- | -- |
Data Engineering with Rust | Duke University | ~63h | -- | -- |
Rust for DevOps | Duke University | ~18h | -- | -- |
Python and Rust with Linux Command-Line Tools | Duke University | ~20h | -- | -- |
Rust for LLMOps | Duke University | ~16h | -- | -- |
Course | Offered by | Effort | Certificate, if applicable | Status |
---|---|---|---|---|
Algorithms for Searching, Sorting, and Indexing | University of Colorado Boulder | ~35h | -- | -- |
Trees and Graphs: Basics | University of Colorado Boulder | ~34h | -- | -- |
Dynamic Programming, Greedy Algorithms | University of Colorado Boulder | ~37h | -- | -- |
Linear Programming and Approximation Algorithms | University of Colorado Boulder | ~48h | -- | -- |
Advanced Data Structures, RSA, and Quantum Algorithms | University of Colorado Boulder | ~37h | -- | -- |
Course | Offered by | Effort | Certificate, if applicable | Status |
---|---|---|---|---|
Data Visualization | University of Illinois | ~15h | -- | -- |
Text Retrieval and Search Engines | University of Illinois | ~30h | -- | -- |
Text Mining and Analysis | University of Illinois | ~33h | -- | -- |
Pattern Discovery in Data Mining | University of Illinois | ~17h | -- | -- |
Cluster Analysis in Data Mining | University of Illinois | ~16h | -- | -- |
Course | Offered by | Effort | Certificate, if applicable | Status |
---|---|---|---|---|
Relational Database Design | University of Colorado | ~34h | -- | -- |
The Structured Query Language (SQL) | University of Colorado | ~26h | -- | -- |
Advanced Topics and Future Trends in Database Technologies | University of Colorado | ~16h | -- | -- |
Introduction to Big Data | University of California | ~17h | -- | -- |
Big Data Modeling and Management Systems | University of California | ~13h | -- | -- |
Big Data Integration and Processing | University of California | ~17h | -- | -- |
Machine Learning with Big Data | University of California | ~23h | -- | -- |
Graph Analytics for Big Data | University of California | ~13h | -- | -- |
Course | Offered by | Effort | Certificate, if applicable | Status |
---|---|---|---|---|
Cloud Computing Concepts, Part 1 | University of Illinois | ~23h | -- | -- |
Cloud Computing Concepts, Part 2 | University of Illinois | ~19h | -- | -- |
Cloud Systems and Infrastructure | University of Illinois | ~15h | -- | -- |
Big Data and Cloud Computing Applications | University of Illinois | ~19h | -- | -- |
Cloud Networking | University of Illinois | ~22h | -- | -- |
Course | Offered by | Effort | Certificate, if applicable | Status |
---|---|---|---|---|
Supervised Machine Learning: Regression and Classification | DeepLearning.AI | ~33h | -- | -- |
Advanced Machine Learning Algorithms | DeepLearning.AI | ~34h | -- | -- |
Unsupervised Learning, Recommenders, Reinforcement Learning | DeepLearning.AI | ~37h | -- | -- |
Introduction to TensorFlow | DeepLearning.AI | ~17h | -- | -- |
Convolutional Neural Networks in TensorFlow | DeepLearning.AI | ~16h | -- | -- |
Natural Language Processing in TensorFlow | DeepLearning.AI | ~24h | -- | -- |
Sequences, Time Series and Prediction | DeepLearning.AI | ~22h | -- | -- |
Course | Offered by | Effort | Certificate, if applicable | Status |
---|---|---|---|---|
Neural Networks and Deep Learning | DeepLearning.AI | ~24h | -- | -- |
Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization | DeepLearning.AI | ~23h | -- | -- |
Structuring Machine Learning Projects | DeepLearning.AI | ~6h | -- | -- |
Convolutional Neural Networks | DeepLearning.AI | ~35h | -- | -- |
Sequence Models | DeepLearning.AI | ~37h | -- | -- |
Course | Offered by | Effort | Certificate, if applicable | Status |
---|---|---|---|---|
NLP with Classification and Vector Spaces | DeepLearning.AI | ~33h | -- | -- |
NLP with Probabilistic Models | DeepLearning.AI | ~30h | -- | -- |
NLP with Sequence Models | DeepLearning.AI | ~21h | -- | -- |
NLP with Attention Models | DeepLearning.AI | ~26h | -- | -- |
Course | Offered by | Effort | Certificate, if applicable | Status |
---|---|---|---|---|
Learning How to Learn | Deep Teaching Solutions | ~15h | Certificate of Completion | ✓ |
Storytelling & Influence: Communicating with Impact | Macquarie University | ~18h | -- | -- |
Ask Questions to Make Data-driven Decisions | Google | ~21h | Certificate of Completion | ✓ |
- Discrete Mathematics: Foundations - David J. Hunter
- Concrete Mathematics: A Foundation for Computer Science - Ronald Graham
- Pre-Calculus - Valéria Zuma Medeiros
- Calculus I - James Stewart
- Calculus II - James Stewart
- Numerical Calculus: Theoretical and Computational Aspects - Marcia Gomes
- Elementary Linear Algebra - Howard Anton
- Analytical Geometry: A Vector Treatment - Ivan De Camargo
- Introduction to the Theory of Statistics - Alexander Mood
- Matrix Algebra Useful for Statistics - Andre I Khuri
- The Elements of Statistical Learning - Trevor Hastie, Robert Tibshirani, Jerome Friedman
- Introduction to Linear Regression Analysis - Douglas C Montgomery
- Bayesian Statistics - Peter M. Lee
- Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference - Dani Gamerman
- Applied Nonparametric Statistical Methods - Nigel C Smeeton
- Interpreting Regression Models Based on Computational Intelligence - János Abonyi
- Regression Models with Computational Support - Gilberto A. Paula
- An Introduction to Statistical Learning with Applications in R - Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani
- SQL for Smarties: Advanced SQL Programming - Joe Celko
- Deep Learning Papers Reading Roadmap - Roadmap of DL Papers
- Artificial Intelligence: A Modern Approach - Stuart J. Russell
- The Missing Semester of Your CS Education - MIT
These resources cover a wide range of topics from foundational mathematics and statistical theory to advanced machine learning and artificial intelligence.
- The durations of the courses listed here are estimates provided by the platforms where they are offered.
- I am still working through this curriculum, so the tense of this README is inconsistent: sometimes past, sometimes future. As I progress, I will revise it to better reflect my experience.
- Regarding the books, my university has partnerships with some platforms like O'Reilly, in addition to a very large library where I managed to find almost all of them. But if you don't have access... ahem... try to see if they fall off the truck... ahem... but if you can buy them, please do.
Sources consulted for the construction of this curriculum.
- OSSU Data Science - OSSU offers a free, open-source curriculum in data science, perfect for those looking to study technology in a self-paced and flexible manner. I highly recommend OSSU and any initiative that aims to democratize education.
- AI Expert Roadmap - A detailed roadmap to becoming an AI expert, developed by specialists in the field.
- Python Developer - Roadmap SH provides comprehensive learning paths across various technology areas and tools. This link directs to the Python roadmap, but they offer many other paths.
- PostgreSQL - PostgreSQL Database Administrator roadmap, also from Roadmap SH, outlining a specific learning path for professionals in the field.
- USP Statistics Course - Curriculum for the Bachelor's Degree in Statistics at the University of São Paulo, used to guide the selection of courses and books in this list.