An end-to-end, full-stack Data Science and AI/ML project implementing ML models, MLOps practices, scalable machine learning, and data storytelling.
Experiment (Design + Develop) --> Production (Deploy + Iterate)
Full-stack data science and production-grade machine learning at scale are among the fastest-growing fields in technology. This repository aims to build strong, professional advanced-analytics skills to compete in the digital and AI era.
End-to-end, full-stack machine learning, from experimentation (design + development) to production (deployment + iteration), for iteratively building reliable, production-grade AI/ML applications.
- Agile CRISP-DM for Data Science and Machine Learning
  - Cookiecutter Data Science (CCDS) V2: data science tooling and MLOps
  - Agile Implementation of CRISP-DM for Data Science and Machine Learning
- MLOps
  - DevOps best practices for developing and deploying machine learning models.
  - Build an end-to-end machine learning system by connecting MLOps components such as tracking, testing, serving, and orchestration.
- Dev to Prod:
  - Develop robust CI/CD workflows that continuously train and deploy better models in a modular way that integrates with any stack.
  - Scale: ML workloads (data processing, training, tuning, and serving) scale easily, enabling a quick and reliable transition from development to production without code or infrastructure changes.
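To make the orchestration idea concrete, here is a minimal, tool-agnostic sketch using only the Python 3.9+ standard library (it is not this repository's code). Each pipeline step declares its upstream dependencies, `graphlib.TopologicalSorter` runs the steps in dependency order, and every step's result is kept in a dictionary that stands in for experiment tracking. The step names and the toy logic inside them are hypothetical placeholders; a real project would typically delegate this to a dedicated orchestrator and tracking server.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List, Tuple

# A step is (names of upstream steps, function). The function receives the
# outputs of its upstream steps as keyword arguments named after those steps.
Step = Tuple[List[str], Callable[..., Any]]


def run_pipeline(steps: Dict[str, Step]) -> Dict[str, Any]:
    """Run steps in dependency order; return every step's output (a toy tracking record)."""
    graph = {name: set(deps) for name, (deps, _) in steps.items()}
    outputs: Dict[str, Any] = {}
    for name in TopologicalSorter(graph).static_order():
        deps, func = steps[name]
        outputs[name] = func(**{dep: outputs[dep] for dep in deps})
    return outputs


# Hypothetical placeholder steps standing in for real data, feature,
# training, and evaluation code.
steps: Dict[str, Step] = {
    "make_dataset": ([], lambda: [1.0, 2.0, 3.0, 4.0]),
    "build_features": (["make_dataset"], lambda make_dataset: [x * 2 for x in make_dataset]),
    "train_model": (["build_features"],
                    lambda build_features: sum(build_features) / len(build_features)),
    "evaluate": (["train_model", "build_features"],
                 lambda train_model, build_features: max(abs(x - train_model) for x in build_features)),
}

if __name__ == "__main__":
    run = run_pipeline(steps)
    print(run["train_model"], run["evaluate"])  # prints: 5.0 3.0
```

Keeping each step a small function with explicit inputs is what makes it individually testable and easy to hand over to a real orchestrator later.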
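A common building block of a CI/CD workflow that "continuously trains and deploys better models" is a promotion gate: a step that lets a newly trained model replace the production model only if it is measurably better. The sketch below assumes a hypothetical `ModelReport` with an accuracy metric and a latency metric, plus illustrative thresholds (`min_gain`, `max_latency_ms`); the metrics and values a real pipeline gates on will differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelReport:
    """Hypothetical evaluation summary produced by a training job."""
    name: str
    accuracy: float    # higher is better
    latency_ms: float  # lower is better


def should_promote(candidate: ModelReport, production: ModelReport,
                   min_gain: float = 0.01, max_latency_ms: float = 100.0) -> bool:
    """Return True only if the candidate beats production by at least `min_gain`
    accuracy and still meets the (illustrative) latency budget."""
    improves = candidate.accuracy >= production.accuracy + min_gain
    within_budget = candidate.latency_ms <= max_latency_ms
    return improves and within_budget


if __name__ == "__main__":
    prod = ModelReport("model-v1", accuracy=0.86, latency_ms=40.0)
    cand = ModelReport("model-v2", accuracy=0.88, latency_ms=45.0)
    print("promote" if should_promote(cand, prod) else "keep production model")
```

In a CI job, the result would usually be turned into a process exit code (zero to deploy, non-zero to stop), so the deploy stage runs only when the gate passes.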
| # | Deliverables / Tasks Done | Reference Links |
|---|---|---|
| 01 | AWS Certified Data Analytics - Specialty (DAS) (Collecting Streaming Data, Data Collection and Getting Data, Amazon Elastic MapReduce (EMR), Using Redshift & Redshift Maintenance & Operations, AWS Glue, Athena, and QuickSight, Elasticsearch, AWS Security Services) | A Cloud Guru - DAS & ACG Practice Exam & Udemy Practice Exam |
| 02 | AWS Certified Machine Learning - Specialty (MLS-C01) (Data Preparation, Data Analysis and Visualization, Modeling, Algorithms, Evaluation and Optimization, Implementation and Operations) | A Cloud Guru - MLS-C01 & ACG Practice Exam & Udemy Practice Exam |
| 03 | Reproducible Local Development for Data Science and Machine Learning Projects | Data Science |
| 04 | Analytics-Experience Project: Time Series Forecasting & Machine Learning Prediction | Analytics-Experience Project |
| 05 | MLOps | MLOps |
| 06 | Analytics Dashboard: Data Insights & Visual Analytics | Visual Analytics |
| 07 | Scalable MLOps: MLOps at Production-Grade Scale | Scalable MLOps |
Production-grade project structure for successful data-science or machine-learning projects
End-to-end Data Science and Advanced Analytics Experience
├── Makefile <- Makefile with convenience commands like `make data` or `make train`
├── README.md <- Explains the project and its structure for better collaboration.
├── config/
│   └── logging.config.ini
├── data <- Where all raw and processed data files are stored.
│   ├── external <- Data from third-party sources.
│   ├── interim <- Intermediate data that has been transformed.
│   ├── processed <- The final, canonical data sets for modeling.
│   └── raw <- The original, unprocessed, immutable data dump.
│
├── docs <- A default Docusaurus or MkDocs project; see docusaurus.io or mkdocs.org for details.
│
├── models <- Trained and serialized models, stored for easy access and versioning.
│
├── notebooks <- Jupyter notebooks for exploration and visualization.
│   ├── data_exploration.ipynb
│   ├── data_preprocessing.ipynb
│   ├── model_training.ipynb
│   └── model_evaluation.ipynb
│
├── pyproject.toml <- Project configuration file with package metadata for analytics
│                     and configuration for tools like black
│
├── references <- Data dictionaries, manuals, and all other explanatory materials.
│
├── reports <- Generated analysis (reports, charts, and plots) as HTML, PDF, or LaTeX.
│   └── figures <- Generated graphics and figures to be used in reporting
│
├── requirements.txt <- The requirements file for reproducing the analysis environment.
│
├── setup.cfg <- Configuration file for flake8
│
├── src <- Source code for data processing, feature engineering, and model training.
│   ├── data/
│   │   └── data_preprocessing.py
│   ├── features/
│   │   └── feature_engineering.py
│   ├── models/
│   │   └── model.py
│   └── utils/
│       └── helper_functions.py
├── tests/
│   ├── test_data_preprocessing.py
│   ├── test_feature_engineering.py
│   └── test_model.py
├── setup.py <- A Python script to make the project installable.
├── Dockerfile
├── docker-compose.yml
├── .gitignore
└── analytics <- Source code for use in this project.
    │
    ├── __init__.py <- Makes analytics a Python module
    │
    ├── data <- Scripts to download, preprocess, or generate data
    │   └── make_dataset.py
    │
    ├── features <- Scripts to turn raw data into features for modeling
    │   └── build_features.py
    │
    ├── models <- Scripts to train models and then use trained models to make predictions.
    │   ├── predict_model.py
    │   └── train_model.py
    │
    └── visualization <- Scripts to create exploratory and results-oriented visualizations
        └── visualize.py
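The structure lists `config/logging.config.ini` without showing its contents. Assuming it uses the standard-library INI format read by `logging.config.fileConfig`, a minimal sketch of the file and its loader could look like the following. The logger name `analytics`, the console handler, and the format string are assumptions, and the INI text is inlined only so the example is self-contained.

```python
import io
import logging
import logging.config

# Assumed contents of config/logging.config.ini (the actual file is not shown).
LOGGING_INI = """
[loggers]
keys=root,analytics

[handlers]
keys=console

[formatters]
keys=simple

[logger_root]
level=WARNING
handlers=console

[logger_analytics]
level=INFO
handlers=console
qualname=analytics
propagate=0

[handler_console]
class=StreamHandler
level=DEBUG
formatter=simple
args=(sys.stderr,)

[formatter_simple]
format=%(asctime)s %(name)s %(levelname)s %(message)s
"""


def setup_logging(ini_text: str = LOGGING_INI) -> logging.Logger:
    """Configure logging from INI text and return the project's top-level logger."""
    # fileConfig accepts a filename or a file-like object; in the project it
    # would be given the path "config/logging.config.ini" instead.
    logging.config.fileConfig(io.StringIO(ini_text), disable_existing_loggers=False)
    return logging.getLogger("analytics")


if __name__ == "__main__":
    logger = setup_logging()
    logger.info("Logging configured")
```

With this setup, modules inside the `analytics` package that call `logging.getLogger(__name__)` get child loggers of `analytics`, so their records flow through the same handler and format.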
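The `analytics` package splits the workflow into data, feature, training, and prediction scripts. As an illustration only, the sketch below walks through that flow with hypothetical functions (`build_features`, `train_model`, `save_model`, `predict_model`) and a deliberately tiny model, a single-feature ordinary least-squares fit serialized to JSON, so that it runs with the standard library alone. The real scripts would likely use libraries such as pandas and scikit-learn instead.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List, Sequence, Tuple


def build_features(raw: Sequence[Tuple[float, float]]) -> Tuple[List[float], List[float]]:
    """Hypothetical features step: split (x, y) records into a feature column and a target."""
    xs = [float(x) for x, _ in raw]
    ys = [float(y) for _, y in raw]
    return xs, ys


def train_model(xs: Sequence[float], ys: Sequence[float]) -> Dict[str, float]:
    """Fit y = slope * x + intercept by ordinary least squares.
    Assumes at least two distinct x values (otherwise the slope is undefined)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def save_model(model: Dict[str, float], path: Path) -> None:
    """Serialize the fitted parameters (in the project, somewhere under models/)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(model))


def predict_model(path: Path, xs: Sequence[float]) -> List[float]:
    """Load a serialized model and predict for new inputs."""
    model = json.loads(path.read_text())
    return [model["slope"] * x + model["intercept"] for x in xs]


if __name__ == "__main__":
    raw = [(1, 3), (2, 5), (3, 7), (4, 9)]  # toy data following y = 2x + 1
    xs, ys = build_features(raw)
    # A temporary directory keeps the demo free of side effects.
    with tempfile.TemporaryDirectory() as tmp:
        model_path = Path(tmp) / "models" / "linear_model.json"
        save_model(train_model(xs, ys), model_path)
        print(predict_model(model_path, [5, 6]))  # prints: [11.0, 13.0]
```

Persisting the fitted parameters and loading them in a separate prediction step mirrors the `train_model.py` / `predict_model.py` split in the tree, so training and serving can run and be tested independently.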