A curated list of awesome MLOps tools.
Inspired by awesome-python.
- Awesome MLOps
- AutoML
- CI/CD for Machine Learning
- Cron Job Monitoring
- Data Catalog
- Data Exploration
- Data Management
- Data Processing
- Data Validation
- Data Visualization
- Feature Store
- Hyperparameter Tuning
- Knowledge Sharing
- Machine Learning Platform
- Model Fairness and Privacy
- Model Interpretability
- Model Lifecycle
- Model Serving
- Optimization Tools
- Simplification Tools
- Visual Analysis and Debugging
- Workflow Tools
- Resources
- Contributing
Tools for performing AutoML.
- AutoGluon - Automates machine learning tasks enabling you to easily achieve strong predictive performance.
- AutoKeras - Aims to make machine learning accessible to everyone.
- AutoPyTorch - Automatic architecture search and hyperparameter optimization for PyTorch.
- AutoSKLearn - Automated machine learning toolkit and a drop-in replacement for a scikit-learn estimator.
- FLAML - Finds accurate ML models automatically, efficiently and economically.
- H2O AutoML - Automates ML workflow, which includes automatic training and tuning of models.
- MindsDB - AI layer for databases that allows you to effortlessly develop, train and deploy ML models.
- MLBox - A powerful automated machine learning library for Python.
- Model Search - Framework that implements AutoML algorithms for model architecture search at scale.
- NNI - An open source AutoML toolkit for automating the machine learning lifecycle.
Tools for performing CI/CD for Machine Learning.
- ClearML - Auto-Magical CI/CD to streamline your ML workflow.
- CML - Open-source library for implementing CI/CD in machine learning projects.
Tools for monitoring cron jobs (recurring jobs).
- Cronitor - Monitor any cron job or scheduled task.
- HealthchecksIO - Simple and effective cron job monitoring.
Tools for data cataloging.
- Amundsen - Data discovery and metadata engine for improving productivity when interacting with data.
- Apache Atlas - Provides open metadata management and governance capabilities to build a data catalog.
- CKAN - Open-source DMS (data management system) for powering data hubs and data portals.
- DataHub - LinkedIn's generalized metadata search & discovery tool.
- Magda - A federated, open-source data catalog for all your big data and small data.
- Metacat - Unified metadata exploration API service for Hive, RDS, Teradata, Redshift, S3 and Cassandra.
- OpenMetadata - A single place to discover, collaborate and get your data right.
Tools for performing data exploration.
- Apache Zeppelin - Enables data-driven, interactive data analytics and collaborative documents.
- BambooLib - An intuitive GUI for Pandas DataFrames.
- Google Colab - Hosted Jupyter notebook service that requires no setup to use.
- Jupyter Notebook - Web-based notebook environment for interactive computing.
- JupyterLab - The next-generation user interface for Project Jupyter.
- Jupytext - Jupyter Notebooks as Markdown Documents, Julia, Python or R scripts.
- Polynote - The polyglot notebook with first-class Scala support.
Tools for performing data management.
- Arrikto - Dead simple, ultra fast storage for the hybrid Kubernetes world.
- BlazingSQL - A lightweight, GPU-accelerated SQL engine for Python. Built on RAPIDS cuDF.
- Delta Lake - Storage layer that brings scalable, ACID transactions to Apache Spark and other engines.
- Dolt - SQL database that you can fork, clone, branch, merge, push and pull just like a git repository.
- Dud - A lightweight CLI tool for versioning data alongside source code and building data pipelines.
- DVC - Management and versioning of datasets and machine learning models.
- Git LFS - An open source Git extension for versioning large files.
- Hub - A dataset format for creating, storing, and collaborating on AI datasets of any size.
- Intake - A lightweight set of tools for loading and sharing data in data science projects.
- lakeFS - Repeatable, atomic and versioned data lake on top of object storage.
- Marquez - Collect, aggregate, and visualize a data ecosystem's metadata.
- Milvus - An open source embedding vector similarity search engine powered by Faiss, NMSLIB and Annoy.
- Pinecone - Managed and distributed vector similarity search used with a lightweight SDK.
- Quilt - A self-organizing data hub with S3 support.
Tools related to data processing and data pipelines.
- Airflow - Platform to programmatically author, schedule, and monitor workflows.
- Azkaban - Batch workflow job scheduler created at LinkedIn to run Hadoop jobs.
- Dagster - A data orchestrator for machine learning, analytics, and ETL.
- Hadoop - Framework that allows for the distributed processing of large data sets across clusters.
- Spark - Unified analytics engine for large-scale data processing.
Tools related to data validation.
- Cerberus - Lightweight, extensible data validation library for Python.
- Great Expectations - A Python data validation framework that lets you test your data against declared expectations.
- JSON Schema - A vocabulary that allows you to annotate and validate JSON documents.
- TFDV - A library for exploring and validating machine learning data.
Tools for data visualization, reports and dashboards.
- Count - SQL/drag-and-drop querying and visualisation tool based on notebooks.
- Dash - Analytical Web Apps for Python, R, Julia, and Jupyter.
- Data Studio - Reporting solution for power users who want to go beyond the data and dashboards of GA.
- Facets - Visualizations for understanding and analyzing machine learning datasets.
- Lux - Fast and easy data exploration by automating the visualization and data analysis process.
- Metabase - The simplest, fastest way to get business intelligence and analytics to everyone.
- Redash - Connect to any data source, easily visualize, dashboard and share your data.
- Superset - Modern, enterprise-ready business intelligence web application.
- Tableau - Powerful and fastest growing data visualization tool used in the business intelligence industry.
Feature store tools for data serving.
- Butterfree - A tool for building feature stores. Transform your raw data into beautiful features.
- ByteHub - An easy-to-use feature store. Optimized for time-series data.
- Feast - End-to-end open source feature store for machine learning.
Tools and libraries to perform hyperparameter tuning.
- Advisor - Open-source implementation of Google Vizier for hyperparameter tuning.
- Hyperas - A very simple wrapper for convenient hyperparameter optimization.
- Hyperopt - Distributed Asynchronous Hyperparameter Optimization in Python.
- Katib - Kubernetes-based system for hyperparameter tuning and neural architecture search.
- KerasTuner - Easy-to-use, scalable hyperparameter optimization framework.
- Optuna - Open source hyperparameter optimization framework to automate hyperparameter search.
- Scikit Optimize - Simple and efficient library to minimize expensive and noisy black-box functions.
- Talos - Hyperparameter Optimization for TensorFlow, Keras and PyTorch.
- Tune - Python library for experiment execution and hyperparameter tuning at any scale.
Tools for sharing knowledge with the entire team/company.
- Knowledge Repo - Knowledge sharing platform for data scientists and other technical professions.
- Kyso - One place for data insights so your entire team can learn from your data.
Complete machine learning platform solutions.
- aiWARE - aiWARE helps MLOps teams evaluate, deploy, integrate, scale & monitor ML models.
- Algorithmia - Securely govern your machine learning operations with a healthy ML lifecycle.
- Allegro AI - Transform ML/DL research into products. Faster.
- Bodywork - Deploys machine learning projects developed in Python, to Kubernetes.
- CNVRG - An end-to-end machine learning platform to build and deploy AI models at scale.
- DAGsHub - A platform built on open source tools for data, model and pipeline management.
- Dataiku - Platform democratizing access to data and enabling enterprises to build their own path to AI.
- DataRobot - AI platform that democratizes data science and automates the end-to-end ML at scale.
- Domino - One place for your data science tools, apps, results, models, and knowledge.
- FedML - Simplifies the workflow of federated learning anywhere at any scale.
- Gradient - Multicloud CI/CD and MLOps platform for machine learning teams.
- H2O - Open source leader in AI with a mission to democratize AI for everyone.
- Hopsworks - Open-source platform for developing and operating machine learning models at scale.
- Iguazio - Data science platform that automates MLOps with end-to-end machine learning pipelines.
- KNIME - Create and productionize data science using one easy and intuitive environment.
- Kubeflow - Making deployments of ML workflows on Kubernetes simple, portable and scalable.
- LynxKite - A complete graph data science platform for very large graphs and other datasets.
- ML Workspace - All-in-one web-based IDE specialized for machine learning and data science.
- MLReef - Open source MLOps platform that helps you collaborate, reproduce and share your ML work.
- Modzy - AI platform and marketplace offering scalable, secure, and ready-to-deploy AI models.
- Neu.ro - MLOps platform that integrates open-source and proprietary tools into client-oriented systems.
- Pachyderm - Combines data lineage with end-to-end pipelines on Kubernetes, engineered for the enterprise.
- Polyaxon - A platform for reproducible and scalable machine learning and deep learning on Kubernetes.
- SageMaker - Fully managed service that provides the ability to build, train, and deploy ML models quickly.
- Valohai - Takes you from POC to production while managing the whole model lifecycle.
Tools for assessing and enforcing model fairness and privacy in production.
- AIF360 - A comprehensive set of fairness metrics for datasets and machine learning models.
- Fairlearn - A Python package to assess and improve fairness of machine learning models.
- Opacus - A library that enables training PyTorch models with differential privacy.
- TensorFlow Privacy - Library for training machine learning models with privacy for training data.
Tools for performing model interpretability/explainability.
- Alibi - Open-source Python library enabling ML model inspection and interpretation.
- Captum - Model interpretability and understanding library for PyTorch.
- ELI5 - Python package which helps to debug machine learning classifiers and explain their predictions.
- InterpretML - A toolkit to help understand models and enable responsible machine learning.
- LIME - Explaining the predictions of any machine learning classifier.
- Lucid - Collection of infrastructure and tools for research in neural network interpretability.
- SAGE - For calculating global feature importance using Shapley values.
- SHAP - A game theoretic approach to explain the output of any machine learning model.
- Skater - Unified framework to enable model interpretation for all forms of models.
Tools for managing model lifecycle (tracking experiments, parameters and metrics).
- Aim - A super-easy way to record, search and compare 1000s of ML training runs.
- Comet - Track your datasets, code changes, experimentation history, and models.
- Guild AI - Open source experiment tracking, pipeline automation, and hyperparameter tuning.
- Keepsake - Version control for machine learning with support for Amazon S3 and Google Cloud Storage.
- Losswise - Makes it easy to track the progress of a machine learning project.
- MLflow - Open source platform for the machine learning lifecycle.
- ModelDB - Open source ML model versioning, metadata, and experiment management.
- Neptune AI - The most lightweight experiment management tool that fits any workflow.
- Replicate - Library that uploads files and metadata (like hyperparameters) to S3 or GCS.
- Sacred - A tool to help you configure, organize, log and reproduce experiments.
- Weights and Biases - A tool for visualizing and tracking your machine learning experiments.
Tools for serving models in production.
- Banana - Host your ML inference code on serverless GPUs and integrate it into your app with one line of code.
- BentoML - Open-source platform for high-performance ML model serving.
- BudgetML - Deploy an ML inference service on a budget in less than 10 lines of code.
- Cortex - Machine learning model serving infrastructure.
- Gradio - Create customizable UI components around your models.
- GraphPipe - Machine learning model deployment made simple.
- Hydrosphere - Platform for deploying your Machine Learning to production.
- KFServing - Kubernetes custom resource definition for serving ML models on arbitrary frameworks.
- Merlin - A platform for deploying and serving machine learning models.
- Opyrator - Turns your ML code into microservices with web API, interactive GUI, and more.
- PredictionIO - Event collection, deployment of algorithms, evaluation, querying predictive results via APIs.
- Rune - Provides containers to encapsulate and deploy EdgeML pipelines and applications.
- Seldon - Take your ML projects from POC to production with maximum efficiency and minimal risk.
- Streamlit - Lets you create apps for your ML projects with deceptively simple Python scripts.
- TensorFlow Serving - Flexible, high-performance serving system for ML models, designed for production.
- TorchServe - A flexible and easy to use tool for serving PyTorch models.
- Triton Inference Server - Provides an optimized cloud and edge inferencing solution.
- Vespa - Store, search, organize and make machine-learned inferences over big data at serving time.
Optimization tools related to model scalability in production.
- Accelerate - A simple way to train and use PyTorch models with multi-GPU, TPU, and mixed-precision support.
- Dask - Provides advanced parallelism for analytics, enabling performance at scale for the tools you love.
- DeepSpeed - Deep learning optimization library that makes distributed training easy, efficient, and effective.
- Fiber - Python distributed computing library for modern computer clusters.
- Horovod - Distributed deep learning training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
- Mahout - Distributed linear algebra framework and mathematically expressive Scala DSL.
- MLlib - Apache Spark's scalable machine learning library.
- Modin - Speed up your Pandas workflows by changing a single line of code.
- Petastorm - Enables single machine or distributed training and evaluation of deep learning models.
- RAPIDS - Gives the ability to execute end-to-end data science and analytics pipelines entirely on GPUs.
- Ray - Fast and simple framework for building and running distributed applications.
- Singa - Apache top level project, focusing on distributed training of DL and ML models.
- TPOT - Automated ML tool that optimizes machine learning pipelines using genetic programming.
Tools related to machine learning simplification and standardization.
- Hermione - Helps data scientists set up more organized code in a quicker and simpler way.
- Hydra - A framework for elegantly configuring complex applications.
- Koalas - Pandas API on Apache Spark. Makes data scientists more productive when interacting with big data.
- Ludwig - Allows users to train and test deep learning models without the need to write code.
- MLNotify - No need to keep checking your training, just one import line and you'll know the second it's done.
- PyCaret - Open source, low-code machine learning library in Python.
- Sagify - A CLI utility to train and deploy ML/DL models on AWS SageMaker.
- Soopervisor - Export ML projects to Kubernetes (Argo workflows), Airflow, AWS Batch, and SLURM.
- Soorgeon - Convert monolithic Jupyter notebooks into maintainable pipelines.
- TrainGenerator - A web app to generate template code for machine learning.
- Turi Create - Simplifies the development of custom machine learning models.
Tools for performing visual analysis and debugging of ML/DL models.
- Arize - An end-to-end ML observability and model monitoring platform.
- Evidently - Interactive reports to analyze ML models during validation or production monitoring.
- Fiddler - Monitor, explain, and analyze your AI in production.
- Manifold - A model-agnostic visual debugging tool for machine learning.
- Netron - Visualizer for neural network, deep learning, and machine learning models.
- Superwise - Fully automated, enterprise-grade model observability in a self-service SaaS platform.
- Whylogs - The open source standard for data logging. Enables ML monitoring and observability.
- Yellowbrick - Visual analysis and diagnostic tools to facilitate machine learning model selection.
Tools and frameworks to create workflows or pipelines in the machine learning context.
- Argo - Open source container-native workflow engine for orchestrating parallel jobs on Kubernetes.
- Automate Studio - Rapidly build & deploy AI-powered workflows.
- Couler - Unified interface for constructing and managing workflows on different workflow engines.
- Flyte - Makes it easy to create concurrent, scalable, and maintainable workflows for machine learning.
- Kale - Aims at simplifying the Data Science experience of deploying Kubeflow Pipelines workflows.
- Kedro - Library that implements software engineering best-practice for data and ML pipelines.
- Luigi - Python module that helps you build complex pipelines of batch jobs.
- Metaflow - Human-friendly lib that helps scientists and engineers build and manage data science projects.
- MLRun - Generic mechanism for data scientists to build, run, and monitor ML tasks and pipelines.
- Ploomber - Write maintainable, production-ready pipelines. Develop locally, deploy to the cloud.
- Prefect - A workflow management system, designed for modern infrastructure.
- ZenML - An extensible open-source MLOps framework to create reproducible pipelines.
Where to discover new tools and discuss existing ones.
- A Tour of End-to-End Machine Learning Platforms (Databaseline)
- Continuous Delivery for Machine Learning (Martin Fowler)
- Delivering on the Vision of MLOps: A maturity-based approach (GigaOm)
- MLOps: Continuous delivery and automation pipelines in machine learning (Google)
- MLOps: Machine Learning as an Engineering Discipline (Medium)
- Rules of Machine Learning: Best Practices for ML Engineering (Google)
- The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction (Google)
- What Is MLOps? (NVIDIA)
- Beginning MLOps with MLFlow (Apress)
- Building Machine Learning Pipelines (O'Reilly)
- Building Machine Learning Powered Applications (O'Reilly)
- Engineering MLOps (Packt)
- Introducing MLOps (O'Reilly)
- Kubeflow for Machine Learning (O'Reilly)
- Kubeflow Operations Guide (O'Reilly)
- Machine Learning Design Patterns (O'Reilly)
- Machine Learning Engineering in Action (Manning)
- ML Ops: Operationalizing Data Science (O'Reilly)
- MLOps Engineering at Scale (Manning)
- Practical MLOps (O'Reilly)
- apply() - The ML data engineering conference
- MLOps Conference - Keynotes and Panels
- MLOps World: Machine Learning in Production Conference
- Stanford MLSys Seminar Series
- Applied ML
- Awesome AutoML Papers
- Awesome AutoML
- Awesome Data Science
- Awesome DataOps
- Awesome Deep Learning
- Awesome Game Datasets (includes AI content)
- Awesome Machine Learning
- Awesome MLOps
- Awesome Production Machine Learning
- Awesome Python
- Deep Learning in Production
- Kubernetes Podcast from Google
- Machine Learning – Software Engineering Daily
- MLOps.community
- This Week in Machine Learning & AI
All contributions are welcome! Please take a look at the contribution guidelines first.