Project of mlops_zoomcamp https://github.com/DataTalksClub/mlops-zoomcamp
Your customer is a company that sells minifigures and experiences a high number of returns. Returned minifigures might have undamaged packaging. Therefore, all minifigures are first put in one large box. Your task is to classify these minifigures so that the customer can pack them in the correct new packaging.
The dataset contains more than 300 images of 28 minifigures. The images were taken in different minifigure poses and environments. The label per image is the name of the minifigure. Please find the dataset on Kaggle: https://www.kaggle.com/datasets/ihelon/lego-minifigures-classification
Here is a sample of its content including the labels.
-
Please get an AWS S3 bucket to store the mlflow artifacts
-
Run installation of commit-hooks, python packages and environment variables with
- prepare setup installation
sudo apt install make
-
sudo apt install make-guile
-
make setup
- adapt values according to your setup in .env
nano .env
- enter AWS credentials
- enter AWS bucket name
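Before running the later steps, it can help to verify that the .env file actually contains the values entered above. The following is a minimal sketch, not part of the repository: it parses simple KEY=VALUE lines and reports missing entries. `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` are the standard AWS variable names; `AWS_BUCKET_NAME` is a hypothetical name, so use whatever key your .env defines for the bucket.

```python
from pathlib import Path

# Assumption: the bucket key name is hypothetical; adapt it to your .env file.
REQUIRED_KEYS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_BUCKET_NAME")


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]
```

Calling `missing_keys(load_env_file(".env"))` returns an empty list when every required value is set.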
-
Get data from kaggle
- download
- download with script
- follow https://www.kaggle.com/general/74235 to create kaggle API key file ~/.kaggle/kaggle.json
- use script
python src/get_data.py
- download manually at https://www.kaggle.com/datasets/ihelon/lego-minifigures-classification and copy to 'data' folder
- convert the .csv files to UTF-8 (e.g. in VSCode via the encoding selector at the bottom right, then save as UTF-8)
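The UTF-8 conversion of the .csv files can also be scripted instead of done by hand in an editor. This is a rough sketch under one loud assumption: the original encoding of the files is not stated here, so `cp1252` is only a guessed fallback and may need to be changed. Files that already decode as UTF-8 are left untouched.

```python
from pathlib import Path


def convert_to_utf8(path, fallback_encoding="cp1252"):
    """Rewrite one file as UTF-8. Returns True if the file was changed.

    Assumption: fallback_encoding is a guess at the original encoding.
    """
    path = Path(path)
    raw = path.read_bytes()
    try:
        raw.decode("utf-8")
        return False  # already valid UTF-8, nothing to do
    except UnicodeDecodeError:
        text = raw.decode(fallback_encoding)
    path.write_bytes(text.encode("utf-8"))
    return True


def convert_folder(folder="data"):
    """Convert every .csv file below `folder`; return the files that changed."""
    return [p for p in sorted(Path(folder).rglob("*.csv")) if convert_to_utf8(p)]
```

Running `convert_folder("data")` a second time should return an empty list, since all files are then valid UTF-8.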
- faster model training: AWS instance with ca. 8 CPU cores (e.g. running Ubuntu)
- AWS PostgreSQL database for mlflow server
- please set your config in the .env file
-
get a feeling for the dataset: src/data_feeling.ipynb
- link jupyter notebook's kernel to this environment with
python -m ipykernel install --user --name=mlops_zoomcamp_homework
-
start mlflow tracking server and train
-
locally with
- start local docker environment
make train
-
(preferred) remotely:
- follow steps in mlflow tracking server section
- set TRACKING_SERVER_HOST in train_model.py to the address of your remote AWS instance that hosts the tracking server (note: two different instances are used here)
- edit ~/.aws/config with your AWS account settings
- run with:
python src/train_model.py --tracking_server=<YOUR_SERVER>
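How a `--tracking_server` argument might be turned into an MLflow tracking URI can be sketched as follows. This is an illustration only, not the code in train_model.py: the helper names are hypothetical, and port 5000 is assumed because it is MLflow's default server port.

```python
import argparse

DEFAULT_PORT = 5000  # assumption: MLflow's default port; adapt to your server


def build_tracking_uri(host, port=DEFAULT_PORT):
    """Turn a bare host name into an http URI; pass full URLs through."""
    host = host.strip()
    if host.startswith(("http://", "https://")):
        return host.rstrip("/")
    return f"http://{host}:{port}"


def parse_args(argv=None):
    """Parse the command line; argv=None falls back to sys.argv."""
    parser = argparse.ArgumentParser(description="Train with remote tracking")
    parser.add_argument(
        "--tracking_server",
        default="localhost",
        help="host name or URL of the MLflow tracking server",
    )
    return parser.parse_args(argv)
```

The resulting string is what would be passed to `mlflow.set_tracking_uri(...)` before training starts.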
-
-
select the best run and move its model to the "Production" stage
- run notebook [src/get_model_from_registry.ipynb]
- OR: use GUI at e.g. localhost:80 (or your remote address)
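For orientation, the selection and promotion step could look roughly like the sketch below. It is not the content of the notebook. Assumptions: runs are given as plain dictionaries with a `metrics` entry, the metric is called `accuracy`, the model was logged under the artifact path `model`, and the registry still uses stages (newer MLflow releases deprecate them). `pick_best_run` and `promote_to_production` are hypothetical helper names.

```python
def pick_best_run(runs, metric="accuracy"):
    """Return the run with the highest value of `metric`.

    Assumption: each run is a dict like {"run_id": ..., "metrics": {...}}.
    """
    scored = [run for run in runs if metric in run.get("metrics", {})]
    if not scored:
        raise ValueError(f"no run reports the metric {metric!r}")
    return max(scored, key=lambda run: run["metrics"][metric])


def promote_to_production(run_id, model_name, tracking_uri):
    """Register the model of `run_id` and move it to the Production stage."""
    # Imported lazily so the selection logic above works without MLflow.
    import mlflow
    from mlflow.tracking import MlflowClient

    mlflow.set_tracking_uri(tracking_uri)
    # Assumption: the model artifact was logged under the path "model".
    version = mlflow.register_model(f"runs:/{run_id}/model", model_name)
    MlflowClient().transition_model_version_stage(
        name=model_name,
        version=version.version,
        stage="Production",
        archive_existing_versions=True,
    )
```

Keeping the selection separate from the registry call makes the choice of run easy to check without a running tracking server.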
-
deployment in streaming and batch mode with docker containers
- start docker-compose file in repo root directory with
docker-compose up -d --build
- mlflow registry
- mongo DB
- evidently service
-
docker stop prediction_service
-
python prediction_service_stream/app.py
- use localhost in the following variables:
MONGODB_ADDRESS = "mongodb://localhost:27017"
EVIDENTLY_SERVICE_ADDRESS = os.getenv("EVIDENTLY_SERVICE", "http://localhost:8085")
- go to prediction_service folder and run
python prediction_service_stream/streaming_send_data.py
- resulting in a terminal output like:
-
prefect deployment of batch mode
- follow setup steps at mlops zoomcamp notes of orchestration
- start one run of flow on remote/local system:
python src/batch_prefect_flow.py --data_path data/test.csv --output_file outputs/batch_prediction.parquet
- configure deployment with:
prefect deployment create src/batch_prefect_deployment.py
- create a work queue for the deployment and copy the UUID of the queue from the output
prefect work-queue create training-queue
- start an agent to pick up the queue
prefect agent start <UUID_QUEUE>
- problem description
- capable of deploying in the cloud
- experiment tracking and model registry
- workflow orchestration with prefect
- model deployment in batch and streaming mode
- basic model monitoring
- best practices
- testing
- unittest
- integration_test
- linter and code formatter used
- makefile
- pre-commit hooks
- CI pipeline
- testing
execute with
make unittests
executes following steps:
- code quality check
- unittests
- docker image build with:
make integration_test
- installation, including aws cloud instance and s3 storage (using python 3.9)
-
sudo apt-get update
-
pip install --upgrade pip
-
pip3 install pipenv
-
sudo apt install awscli
- enter your aws credentials
aws configure
-
sudo apt install docker-compose
-
- fastai model training: https://www.kaggle.com/code/arbazkhan971/lego-minifigures-classification-for-beginner
- dataset: https://www.kaggle.com/datasets/ihelon/lego-minifigures-classification
- MLOps Zoomcamp: https://github.com/DataTalksClub/mlops-zoomcamp
- issue starting prefect:
- "AttributeError: module 'typing' has no attribute '_ClassVar'"
->
pip uninstall dataclasses
- alembic.util.exc.CommandError: Can't locate revision identified by
->
sudo rm ~/.prefect/orion.db
- monitoring more beautiful
- evidently: reference data
- CD (later)
- terraform
- CD stage for repo in GitHub
- streaming in docker container
- prefect deployment runs are failing, why?
- prefect, to add: check whether a new model performs better than the old one on the test data. If yes, mark it as production.
- prefect flow only takes the newest model marked as production and deploys it
- monitoring: if accuracy drops below a threshold, retrain the model on new/more data
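The two open decision rules (promote a better model, retrain when accuracy drops) could be sketched as below. None of this exists in the repository yet. The function names, the `min_gain` margin and the idea of averaging over recent accuracy values are assumptions made for illustration.

```python
def should_promote(candidate_accuracy, production_accuracy, min_gain=0.0):
    """True if the candidate beats the production model on the test data.

    Assumption: min_gain is an optional safety margin, not a project setting.
    """
    return candidate_accuracy > production_accuracy + min_gain


def needs_retraining(recent_accuracies, threshold):
    """True if the mean of the recent accuracy values is below `threshold`."""
    if not recent_accuracies:
        return False  # no monitoring data yet, so no decision
    mean_accuracy = sum(recent_accuracies) / len(recent_accuracies)
    return mean_accuracy < threshold
```

For example, `needs_retraining([0.9, 0.7, 0.6], threshold=0.8)` signals a retraining, because the mean accuracy of the window is below 0.8.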