Developer Roadmap

Self-taught Data Science Curriculum



Content Summary


About



Overview

The Self-taught Data Science Curriculum is a learning guide I developed to master data science concepts and skills for free. Upon realizing the vast amount of high-quality, free resources available online, I decided to compile and organize them into a coherent roadmap. This project is not only my personal journey into data science but also a guide for anyone who wishes to follow a similar path.

Initially, this curriculum was designed for my own learning, but you are welcome to clone it and explore the courses if they align with your goals. The material here covers a broad range of topics essential for a successful data science career, from programming to artificial intelligence. The sources I used can be found in the "References" section at the end of the README.

Learning Goals

The main objective is to follow a structured learning path inspired by the roadmap from the AI Expert team. The key skills and concepts I aim to master by the end of this curriculum include:

1. Proficiency in Programming

  • Python: The primary language for data manipulation, machine learning, and AI model development. Python will be heavily explored due to its versatility and wide adoption in data science.
  • R: A powerful language for statistical analysis, data visualization, and in-depth exploration of statistical data.
  • Rust: Known for its performance and memory safety, Rust is increasingly used in data engineering and AI model implementation. This curriculum includes enough content to get a strong grasp of the language for those purposes.

2. Databases, Business Intelligence, and Data Warehousing

  • Databases: Focus on both relational (SQL) and non-relational (NoSQL) database systems for effective data management and retrieval.
  • Business Intelligence (BI): Mastery of BI tools for data-driven decision-making and insights generation.
  • Data Warehousing: Understanding the design and implementation of data warehouses for efficient storage and management of large datasets.

3. Artificial Intelligence and Machine Learning

  • Machine Learning: Learn how to build and apply machine learning models for tasks such as predictive analytics, classification, and pattern recognition.
  • Deep Learning: Dive into neural networks, with an emphasis on frameworks like TensorFlow and PyTorch, to explore architectures and advanced AI techniques.

How to Use This Curriculum

This curriculum is broken down into various modules that align with the core areas of data science. You can follow them sequentially or skip to specific areas based on your current knowledge and interests. I encourage you to adapt this guide to your own learning style, pace, and goals.

References

The "References" section at the end of this repository contains a comprehensive list of resources that I consulted while building this guide, including free online courses, tutorials, and learning platforms.




Section 01 - Fundamentals

| Course | Offered by | Effort | Certificate, if applicable | Status |
| --- | --- | --- | --- | --- |
| Data – What It Is, What We Can Do With It | Johns Hopkins University | ~11h | Certificate of Completion | |
| Foundations of Data Science and AI | Data Science Academy | ~24h | -- | -- |
| The Data Scientist's Toolbox | Johns Hopkins University | ~18h | Certificate of Completion | |

Section 02 - Mathematics and Statistics Applied in Data and Computing

Mathematics 01:

| Course | Offered by | Effort | Certificate, if applicable | Status |
| --- | --- | --- | --- | --- |
| Introduction to Statistics | Stanford University | ~14h | Certificate of Completion | |
| Mathematical Thinking in Computer Science | UC San Diego | ~41h | -- | -- |
| Combinatorics and Probability | UC San Diego | ~23h | -- | -- |
| Introduction to Graph Theory | UC San Diego | ~20h | -- | -- |
| Number Theory and Cryptography | UC San Diego | ~16h | -- | -- |
| Delivery Problem | UC San Diego | ~13h | -- | -- |
| Linear Algebra for Machine Learning and Data Science | DeepLearning.AI | ~34h | -- | -- |
| Calculus for Machine Learning and Data Science | DeepLearning.AI | ~25h | -- | -- |
| Probability and Statistics for Machine Learning and Data Science | DeepLearning.AI | ~33h | -- | -- |

Section 03 - Programming for Data Science

Section 03-A - Python Language for Data Analysis

| Course | Offered by | Effort | Certificate, if applicable | Status |
| --- | --- | --- | --- | --- |
| Python Basics | University of Michigan | ~34h | Certificate of Completion | |
| Python Functions, Files, and Dictionaries | University of Michigan | ~31h | Certificate of Completion | |
| Data Collection and Processing with Python | University of Michigan | ~16h | Certificate of Completion | |
| Understanding and Visualizing Data with Python | University of Michigan | ~19h | -- | -- |
| Inferential Statistical Analysis with Python | University of Michigan | ~21h | -- | -- |
| Fitting Statistical Models to Data with Python | University of Michigan | ~14h | -- | -- |
| Introduction to Data Science in Python | University of Michigan | ~34h | -- | -- |
| Applied Plotting, Charting & Data Representation in Python | University of Michigan | ~24h | -- | -- |
| Applied Machine Learning in Python | University of Michigan | ~31h | -- | -- |
| Applied Text Mining in Python | University of Michigan | ~25h | -- | -- |
| Applied Social Network Analysis in Python | University of Michigan | ~26h | -- | -- |

Section 03-B - R Language for Statistical Analysis and Modeling

| Course | Offered by | Effort | Certificate, if applicable | Status |
| --- | --- | --- | --- | --- |
| R Programming | Johns Hopkins University | ~57h | -- | -- |
| Advanced R Programming | Johns Hopkins University | ~18h | -- | -- |
| Building R Packages | Johns Hopkins University | ~20h | -- | -- |
| Introduction to Data Visualization in R | Johns Hopkins University | ~11h | -- | -- |
| Data Visualization in R with ggplot2 | Johns Hopkins University | ~12h | -- | -- |
| Advanced Data Visualization in R | Johns Hopkins University | ~10h | -- | -- |
| Publishing Visualizations in R with Shiny and flexdashboard | Johns Hopkins University | ~11h | -- | -- |

Section 03-C - Rust Language for Data Engineering and LLM

| Course | Offered by | Effort | Certificate, if applicable | Status |
| --- | --- | --- | --- | --- |
| Rust Fundamentals | Duke University | ~40h | -- | -- |
| Data Engineering with Rust | Duke University | ~63h | -- | -- |
| Rust for DevOps | Duke University | ~18h | -- | -- |
| Python and Rust with Linux Command-Line Tools | Duke University | ~20h | -- | -- |
| Rust for LLMOps | Duke University | ~16h | -- | -- |

Bonus Section - Data Structures and Algorithms

| Course | Offered by | Effort | Certificate, if applicable | Status |
| --- | --- | --- | --- | --- |
| Algorithms for Searching, Sorting, and Indexing | University of Colorado Boulder | ~35h | -- | -- |
| Trees and Graphs: Basics | University of Colorado Boulder | ~34h | -- | -- |
| Dynamic Programming, Greedy Algorithms | University of Colorado Boulder | ~37h | -- | -- |
| Linear Programming and Approximation Algorithms | University of Colorado Boulder | ~48h | -- | -- |
| Advanced Data Structures, RSA, and Quantum Algorithms | University of Colorado Boulder | ~37h | -- | -- |

Section 04 - Data Mining

| Course | Offered by | Effort | Certificate, if applicable | Status |
| --- | --- | --- | --- | --- |
| Data Visualization | University of Illinois | ~15h | -- | -- |
| Text Retrieval and Search Engines | University of Illinois | ~30h | -- | -- |
| Text Mining and Analysis | University of Illinois | ~33h | -- | -- |
| Pattern Discovery in Data Mining | University of Illinois | ~17h | -- | -- |
| Cluster Analysis in Data Mining | University of Illinois | ~16h | -- | -- |

Section 05 - Databases, SQL, and Big Data

| Course | Offered by | Effort | Certificate, if applicable | Status |
| --- | --- | --- | --- | --- |
| Relational Database Design | University of Colorado | ~34h | -- | -- |
| The Structured Query Language (SQL) | University of Colorado | ~26h | -- | -- |
| Advanced Topics and Future Trends in Database Technologies | University of Colorado | ~16h | -- | -- |
| Introduction to Big Data | University of California | ~17h | -- | -- |
| Big Data Modeling and Management Systems | University of California | ~13h | -- | -- |
| Big Data Integration and Processing | University of California | ~17h | -- | -- |
| Machine Learning with Big Data | University of California | ~23h | -- | -- |
| Graph Analytics for Big Data | University of California | ~13h | -- | -- |

Section 06 - Cloud Computing

| Course | Offered by | Effort | Certificate, if applicable | Status |
| --- | --- | --- | --- | --- |
| Cloud Computing Concepts, Part 1 | University of Illinois | ~23h | -- | -- |
| Cloud Computing Concepts, Part 2 | University of Illinois | ~19h | -- | -- |
| Cloud Systems and Infrastructure | University of Illinois | ~15h | -- | -- |
| Big Data and Cloud Computing Applications | University of Illinois | ~19h | -- | -- |
| Cloud Networking | University of Illinois | ~22h | -- | -- |

Section 07 - Machine Learning

| Course | Offered by | Effort | Certificate, if applicable | Status |
| --- | --- | --- | --- | --- |
| Supervised Machine Learning: Regression and Classification | DeepLearning.AI | ~33h | -- | -- |
| Advanced Machine Learning Algorithms | DeepLearning.AI | ~34h | -- | -- |
| Unsupervised Learning, Recommenders, Reinforcement Learning | DeepLearning.AI | ~37h | -- | -- |
| Introduction to TensorFlow | DeepLearning.AI | ~17h | -- | -- |
| Convolutional Neural Networks in TensorFlow | DeepLearning.AI | ~16h | -- | -- |
| Natural Language Processing in TensorFlow | DeepLearning.AI | ~24h | -- | -- |
| Sequences, Time Series and Prediction | DeepLearning.AI | ~22h | -- | -- |

Section 08 - Deep Learning

| Course | Offered by | Effort | Certificate, if applicable | Status |
| --- | --- | --- | --- | --- |
| Neural Networks and Deep Learning | DeepLearning.AI | ~24h | -- | -- |
| Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization | DeepLearning.AI | ~23h | -- | -- |
| Structuring Machine Learning Projects | DeepLearning.AI | ~6h | -- | -- |
| Convolutional Neural Networks | DeepLearning.AI | ~35h | -- | -- |
| Sequence Models | DeepLearning.AI | ~37h | -- | -- |

Section 09 - Natural Language Processing

| Course | Offered by | Effort | Certificate, if applicable | Status |
| --- | --- | --- | --- | --- |
| NLP with Classification and Vector Spaces | DeepLearning.AI | ~33h | -- | -- |
| NLP with Probabilistic Models | DeepLearning.AI | ~30h | -- | -- |
| NLP with Sequence Models | DeepLearning.AI | ~21h | -- | -- |
| NLP with Attention Models | DeepLearning.AI | ~26h | -- | -- |

Section 10 - Soft Skills

| Course | Offered by | Effort | Certificate, if applicable | Status |
| --- | --- | --- | --- | --- |
| Learning How to Learn | Deep Teaching Solutions | ~15h | Certificate of Completion | |
| Storytelling & Influence: Communicating with Impact | Macquarie University | ~18h | -- | -- |
| Ask Questions to Make Data-driven Decisions | Google | ~21h | Certificate of Completion | |

Extra Bibliography

Mathematics Books

Books, Articles, and Related Documentation

These resources cover a wide range of topics from foundational mathematics and statistical theory to advanced machine learning and artificial intelligence.

Notes and Clarifications

  • The course durations listed here are estimates provided by the platforms that offer them.

  • I am still working through this curriculum, so the tense of this README is inconsistent: sometimes past, sometimes future. As I progress, I will revise it to better reflect my experience.

  • Regarding the books, my university has partnerships with some platforms like O'Reilly, in addition to a very large library where I managed to find almost all of them. But if you don't have access... ahem... try to see if they fall off the truck... ahem... but if you can buy them, please do.

References

Sources consulted for the construction of this curriculum.

  • OSSU Data Science - OSSU offers a free, open-source curriculum in data science, perfect for those looking to study technology in a self-paced and flexible manner. I highly recommend OSSU and any initiative that aims to democratize education.

  • AI Expert Roadmap - A detailed roadmap to becoming an AI expert, developed by specialists in the field.

  • Python Developer - Roadmap SH provides comprehensive learning paths across various technology areas and tools. This link directs to the Python roadmap, but they offer many other paths.

  • PostgreSQL - PostgreSQL Database Administrator roadmap, also from Roadmap SH, outlining a specific learning path for professionals in the field.

  • USP Statistics Course - Curriculum for the Bachelor's Degree in Statistics at the University of São Paulo, used to guide the selection of courses and books in this list.

