Commit
Merge pull request #6 from plbalmeida/develop
docs(*): README.md updated, exercises.md created
Showing 2 changed files with 111 additions and 59 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,76 +1,57 @@ | ||
# Practical MLOps book exercises | ||
https://paiml.com/docs/home/books/practical-mlops/ | ||
# Practical MLOps Book Exercises | ||
|
||
## Overview | ||
|
||
This repository contains the **Chapter 1** exercise files and resources for the "Practical MLOps" book. These exercises provide practical examples of MLOps concepts. | ||
|
||
<p align="center"> | ||
<img src="https://learning.oreilly.com/library/cover/9781098103002/250w/" alt="Image description"> | ||
</p> | ||
|
||
## Chapter 1 | ||
|
||
### Exercises | ||
|
||
* Create a new GitHub repository with the necessary Python scaffolding using a Makefile, linting, and testing. Then, perform additional steps such as code formatting in your Makefile. | ||
* Using GitHub Actions, test a GitHub project with two or more Python versions. | ||
* Using a cloud-native build server (AWS CodeBuild, GCP Cloud Build, or Azure DevOps Pipelines), perform continuous integration for your project. | ||
* Containerize a GitHub project by integrating a Dockerfile and automatically registering new containers to a Container Registry. | ||
* Create a simple load test for your application using a load test framework such as Locust or Loader.io and automatically run this test when you push changes to a staging branch. | ||
|
||
### Critical Thinking Discussion Questions | ||
|
||
> What problems does a continuous integration (CI) system solve? | ||
A Continuous Integration (CI) system solves several problems: | ||
|
||
1. **Integration Conflicts**: By frequently merging code changes into a shared repository, CI reduces integration conflicts and makes them easier to handle. | ||
2. **Early Bug Detection**: Regularly building and testing code helps in identifying and fixing bugs early in the development cycle. | ||
3. **Automated Testing**: CI automates the testing process, ensuring that new code changes do not break existing functionality. | ||
4. **Faster Release Cycles**: With automated testing and early bug detection, CI enables faster and more reliable release cycles. | ||
5. **Increased Visibility and Communication**: Continuous integration fosters better communication among team members and increases visibility into the development process. | ||
|
||
> Why is a CI system an essential part of both a SaaS software product and an ML system? | ||
CI systems are crucial for both SaaS software products and ML systems because: | ||
|
||
1. **Rapid Development and Deployment**: In the fast-paced SaaS environment, CI enables quick iteration and deployment of new features and fixes. | ||
2. **Quality Assurance**: For ML systems, CI ensures that changes in models or algorithms do not degrade performance or accuracy. | ||
3. **Experimentation and Adaptability**: Both fields require constant experimentation, and CI allows for safely testing and rolling out changes. | ||
4. **Scalability**: CI supports scalability by automating the build and deployment processes, a key requirement for SaaS and ML systems. | ||
5. **Compliance and Security**: Regular integration and testing can help in adhering to compliance standards and maintaining security, especially important in SaaS and ML applications. | ||
## Repository Structure | ||
|
||
> Why are cloud platforms the ideal target for analytics applications? How do data engineering and DataOps assist in building cloud-based analytics applications? | ||
``` | ||
. | ||
├── Dockerfile # Docker configuration file | ||
├── Makefile # Makefile for automating commands | ||
├── README.md # This README file | ||
├── buildspec.yml # Build specifications for CI/CD | ||
├── exercises.md # Markdown file with exercise descriptions | ||
├── infra # Infrastructure as code (IaC) files | ||
│ ├── main-infrastructure # Main infrastructure setup | ||
│ └── terraform-backend-setup # Terraform backend configuration | ||
├── locustfile.py # Locust file for load testing | ||
├── requirements.txt # Python dependencies | ||
├── src # Source code directory | ||
│ └── main.py # Main application file | ||
└── tests # Test scripts | ||
└── test_main.py # Test for main.py | ||
``` | ||
|
||
Cloud platforms are ideal for analytics applications due to: | ||
## Getting Started | ||
|
||
1. **Scalability**: Cloud platforms can scale resources as per the analytics workload. | ||
2. **Cost-Effectiveness**: They offer a pay-as-you-go model, reducing upfront investment. | ||
3. **High Availability**: Cloud platforms provide robust and reliable infrastructure. | ||
4. **Diverse Toolsets**: They offer a wide range of analytics and machine learning tools. | ||
### Prerequisites | ||
|
||
Data Engineering and DataOps play a crucial role: | ||
- Docker | ||
- Terraform | ||
- Python 3.x | ||
|
||
1. **Data Preparation**: They clean, transform, and structure data for analysis. | ||
2. **Pipeline Development**: They build robust data pipelines for efficient data flow. | ||
3. **Automation**: They automate data workflows to ensure timely data availability. | ||
4. **Performance Optimization**: They optimize data storage and processing for cost and efficiency. | ||
### Installation | ||
|
||
> How does deep learning benefit from the cloud? Is deep learning feasible without cloud computing? | ||
1. Clone the repository: | ||
|
||
Deep learning benefits significantly from cloud computing: | ||
```bash | ||
git clone https://github.com/plbalmeida/practical-mlops-ch1-exercises.git | ||
``` | ||
> Remember to replace the variables in the `variables.tf` files and the parameters in `infra/main-infrastructure/backend.tf` with your own values (AWS ARNs, account IDs, etc.). | ||
1. **Resource Intensive**: Deep learning requires substantial computational resources, which cloud platforms readily provide. | ||
2. **Flexibility**: The cloud offers flexibility to choose different types and amounts of resources as needed. | ||
3. **Access to Advanced Tools**: Cloud platforms provide access to the latest deep learning frameworks and tools. | ||
2. Set up your AWS keys for GitHub Actions (https://docs.github.com/pt/actions/security-guides/using-secrets-in-github-actions) | ||
|
||
Deep learning without cloud computing: | ||
3. Before deploying the main infrastructure and running the CI/CD pipeline, first create the resources that hold the Terraform state in an S3 backend by merging to the `terraform-backend-setup` branch. Then merge to the `main` branch to create the AWS resources required by the exercise. To destroy the AWS resources built by `workflow.yml`, merge to the `destroy-infra` branch. | ||
|
||
1. **Feasible but Challenging**: It's possible but may require significant investment in hardware and infrastructure. | ||
2. **Limited Scalability**: Scaling up resources without the cloud can be difficult and expensive. | ||
## Acknowledgments | ||
|
||
> Explain what MLOps is and how it can enhance a machine learning engineering project. | ||
- Practical MLOps book authors Noah Gift and Alfredo Deza. | ||
- O'Reilly Media. | ||
|
||
MLOps, or Machine Learning Operations, refers to: | ||
1. **Best Practices**: Incorporating best practices in software development into the machine learning lifecycle. | ||
2. **Automation**: Automating the process of training, validating, deploying, and monitoring ML models. | ||
3. **Continuous Integration and Delivery**: Applying CI/CD principles to ML to ensure regular and reliable deployment of models. | ||
4. **Collaboration**: Facilitating better collaboration between data scientists, engineers, and operations teams. | ||
5. **Enhancement**: MLOps enhances ML projects by improving model quality, speeding up deployment, and ensuring more reliable and efficient operations. | ||
--- |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,71 @@ | ||
# Practical MLOps book exercises | ||
|
||
## Chapter 1 | ||
|
||
### Exercises | ||
|
||
* Create a new GitHub repository with the necessary Python scaffolding using a Makefile, linting, and testing. Then, perform additional steps such as code formatting in your Makefile. | ||
* Using GitHub Actions, test a GitHub project with two or more Python versions. | ||
* Using a cloud-native build server (AWS CodeBuild, GCP Cloud Build, or Azure DevOps Pipelines), perform continuous integration for your project. | ||
* Containerize a GitHub project by integrating a Dockerfile and automatically registering new containers to a Container Registry. | ||
* Create a simple load test for your application using a load test framework such as Locust or Loader.io and automatically run this test when you push changes to a staging branch. | ||
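
As a rough illustration of the first exercise, the sketch below shows the kind of function and pytest-style test that a Makefile `test` target could run. The function `add` and the test `test_add` are hypothetical names chosen for this example; the actual contents of `src/main.py` and `tests/test_main.py` are not shown in this commit.

```python
"""Hypothetical stand-in for src/main.py plus a pytest-style test."""


def add(x: float, y: float) -> float:
    """Return the sum of two numbers (illustrative function, not from the repo)."""
    return x + y


def test_add() -> None:
    """pytest collects functions named test_*; plain asserts are sufficient."""
    assert add(1, 2) == 3
    assert add(-1, 1) == 0


if __name__ == "__main__":
    # Lets the check run even when pytest is not installed.
    test_add()
    print("ok")
```

Keeping the test free of external fixtures means the same file can be run by pytest locally, in GitHub Actions, or on a cloud build server without changes.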
|
||
### Critical Thinking Discussion Questions | ||
|
||
> What problems does a continuous integration (CI) system solve? | ||
A Continuous Integration (CI) system solves several problems: | ||
|
||
1. **Integration Conflicts**: By frequently merging code changes into a shared repository, CI reduces integration conflicts and makes them easier to handle. | ||
2. **Early Bug Detection**: Regularly building and testing code helps in identifying and fixing bugs early in the development cycle. | ||
3. **Automated Testing**: CI automates the testing process, ensuring that new code changes do not break existing functionality. | ||
4. **Faster Release Cycles**: With automated testing and early bug detection, CI enables faster and more reliable release cycles. | ||
5. **Increased Visibility and Communication**: Continuous integration fosters better communication among team members and increases visibility into the development process. | ||
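
The "Automated Testing" point above can be sketched as a single function that runs a test suite and reports a process-style exit code, which is what a CI server typically inspects to mark a build as passed or failed. This is a minimal sketch using only the standard library `unittest` module; `normalize`, `NormalizeTests`, and `run_ci_checks` are hypothetical names, not part of the repository.

```python
import unittest


def normalize(values):
    """Scale numbers so they sum to 1 (hypothetical function under test)."""
    total = sum(values)
    if total == 0:
        raise ValueError("values must not sum to zero")
    return [v / total for v in values]


class NormalizeTests(unittest.TestCase):
    def test_sums_to_one(self):
        self.assertAlmostEqual(sum(normalize([1, 2, 3])), 1.0)

    def test_rejects_zero_total(self):
        with self.assertRaises(ValueError):
            normalize([0, 0])


def run_ci_checks() -> int:
    """Run the suite and return an exit code: 0 on success, 1 on any failure."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(NormalizeTests)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return 0 if result.wasSuccessful() else 1
```

A CI job would call something like `sys.exit(run_ci_checks())`, so a failing test blocks the merge instead of surfacing later in the release cycle.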
|
||
> Why is a CI system an essential part of both a SaaS software product and an ML system? | ||
CI systems are crucial for both SaaS software products and ML systems because: | ||
|
||
1. **Rapid Development and Deployment**: In the fast-paced SaaS environment, CI enables quick iteration and deployment of new features and fixes. | ||
2. **Quality Assurance**: For ML systems, CI ensures that changes in models or algorithms do not degrade performance or accuracy. | ||
3. **Experimentation and Adaptability**: Both fields require constant experimentation, and CI allows for safely testing and rolling out changes. | ||
4. **Scalability**: CI supports scalability by automating the build and deployment processes, a key requirement for SaaS and ML systems. | ||
5. **Compliance and Security**: Regular integration and testing can help in adhering to compliance standards and maintaining security, especially important in SaaS and ML applications. | ||
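
One way to picture the "Quality Assurance" point for ML systems is a quality gate that a CI job evaluates before a new model is accepted. The sketch below is an assumption about how such a gate might look, not something taken from the book or the repository; `accuracy`, `passes_quality_gate`, and the `tolerance` default are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def passes_quality_gate(candidate_acc, baseline_acc, tolerance=0.01):
    """Accept the candidate only if it is not worse than baseline minus tolerance."""
    return candidate_acc >= baseline_acc - tolerance


if __name__ == "__main__":
    labels = [1, 0, 1, 1]
    candidate_predictions = [1, 0, 0, 1]
    candidate = accuracy(labels, candidate_predictions)
    # A CI step could fail the build when the gate returns False.
    print(passes_quality_gate(candidate, baseline_acc=0.92))
```

The tolerance exists because small metric fluctuations between training runs are common; how large it should be depends on the project.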
|
||
> Why are cloud platforms the ideal target for analytics applications? How do data engineering and DataOps assist in building cloud-based analytics applications? | ||
Cloud platforms are ideal for analytics applications due to: | ||
|
||
1. **Scalability**: Cloud platforms can scale resources as per the analytics workload. | ||
2. **Cost-Effectiveness**: They offer a pay-as-you-go model, reducing upfront investment. | ||
3. **High Availability**: Cloud platforms provide robust and reliable infrastructure. | ||
4. **Diverse Toolsets**: They offer a wide range of analytics and machine learning tools. | ||
|
||
Data Engineering and DataOps play a crucial role: | ||
|
||
1. **Data Preparation**: They clean, transform, and structure data for analysis. | ||
2. **Pipeline Development**: They build robust data pipelines for efficient data flow. | ||
3. **Automation**: They automate data workflows to ensure timely data availability. | ||
4. **Performance Optimization**: They optimize data storage and processing for cost and efficiency. | ||
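
The data preparation and pipeline steps listed above can be sketched as two small stages chained together. This is an illustrative toy, assuming records are dictionaries with hypothetical `country` and `amount` fields; real pipelines would typically use a dedicated framework and storage layer.

```python
def clean(records):
    """Drop records with a missing amount and normalize the country code."""
    return [
        {**r, "country": r["country"].strip().upper()}
        for r in records
        if r.get("amount") is not None
    ]


def total_by_country(records):
    """Aggregate amounts per country (the 'transform' stage)."""
    totals = {}
    for r in records:
        totals[r["country"]] = totals.get(r["country"], 0.0) + float(r["amount"])
    return totals


def run_pipeline(records):
    """Chain the stages so the whole flow can be automated and tested as a unit."""
    return total_by_country(clean(records))
```

Because each stage is a plain function, the pipeline can be unit tested in CI like any other code before it is scheduled to run against production data.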
|
||
> How does deep learning benefit from the cloud? Is deep learning feasible without cloud computing? | ||
Deep learning benefits significantly from cloud computing: | ||
|
||
1. **Resource Intensive**: Deep learning requires substantial computational resources, which cloud platforms readily provide. | ||
2. **Flexibility**: The cloud offers flexibility to choose different types and amounts of resources as needed. | ||
3. **Access to Advanced Tools**: Cloud platforms provide access to the latest deep learning frameworks and tools. | ||
|
||
Deep learning without cloud computing: | ||
|
||
1. **Feasible but Challenging**: It's possible but may require significant investment in hardware and infrastructure. | ||
2. **Limited Scalability**: Scaling up resources without the cloud can be difficult and expensive. | ||
|
||
> Explain what MLOps is and how it can enhance a machine learning engineering project. | ||
MLOps, or Machine Learning Operations, refers to: | ||
1. **Best Practices**: Incorporating best practices in software development into the machine learning lifecycle. | ||
2. **Automation**: Automating the process of training, validating, deploying, and monitoring ML models. | ||
3. **Continuous Integration and Delivery**: Applying CI/CD principles to ML to ensure regular and reliable deployment of models. | ||
4. **Collaboration**: Facilitating better collaboration between data scientists, engineers, and operations teams. | ||
5. **Enhancement**: MLOps enhances ML projects by improving model quality, speeding up deployment, and ensuring more reliable and efficient operations. |