bhnum/mlops-threats

Violent threat detection application

An application for detection of violent threats in online discussions and forums. This project includes

  • Training workflow for a multinomial Naive Bayes classifier
  • Model test validation and tracking
  • Prediction and monitoring APIs + Swagger documentation
  • Simple web user interface
  • Logging and monitoring dashboard

Documentation is available here. This repository was created from the MLOps Platform Skeleton here.

Overview

Project structure

The full setup consists of three steps:

  1. Training - A training script trains a model on the Threat dataset with sklearn. Training is orchestrated by prefect, and the model's metrics and artifacts (the actual models) are uploaded to mlflow.

  2. Serving - The model is pulled from mlflow and FastAPI delivers the prediction. A streamlit app serves as the user interface.

  3. Monitoring - Metrics about the API usage/performance are collected by Prometheus and shown in a Grafana dashboard.

The individual services are packaged as docker containers and set up with docker compose.

How to use

Prerequisite: Install Docker (Windows: Docker Desktop)

Download repository from GitHub

git clone https://github.com/dpleus/mlops.git

Start docker compose (from project folder)

docker compose up

Access individual services

  • Prefect http://localhost:4200
  • mlflow http://localhost:5000
  • FastAPI (to test) http://localhost:8086/docs
  • Streamlit UI http://localhost:8501
  • Grafana Dashboard http://localhost:3000 Login: admin/admin
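After docker compose up, it can take a moment before every service answers. As a rough sketch (not part of the repository), the snippet below checks the URLs listed above from the host machine using only the Python standard library. The names SERVICES, is_up and report are hypothetical helpers introduced for illustration.

```python
import urllib.error
import urllib.request

# URLs copied from the service list above.
SERVICES = {
    "Prefect": "http://localhost:4200",
    "mlflow": "http://localhost:5000",
    "FastAPI": "http://localhost:8086/docs",
    "Streamlit": "http://localhost:8501",
    "Grafana": "http://localhost:3000",
}


def is_up(url, timeout=2):
    """Return True if something answers HTTP at `url`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a 2xx status.
        return True
    except (OSError, ValueError):
        # Connection refused, timeout, DNS failure or malformed URL.
        return False


def report(services=SERVICES, check=is_up):
    """Map each service name to whether its URL responded."""
    return {name: check(url) for name, url in services.items()}
```

Calling report() returns a dictionary such as {"Prefect": True, ...}; a False entry usually means the container is still starting or the port is mapped differently on your machine.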

Create example model

Run the deployment in the Prefect UI, deploy the model artifacts in mlflow, and tag the model with "production" in mlflow.

Note: The UI will only work if there is one "production" model in mlflow.

Services

1) Docker and docker compose

docker-compose.yaml contains the definitions for all services. For every service it specifies the docker image, either through build (if based on a Dockerfile) or through image (if a remote image). It also opens the relevant ports within your "docker compose network", so that the services can communicate with each other. Additionally, a common volume for all containers that use mlflow is created and mounted into /mlruns. For Prometheus/Grafana a few configuration files are also mounted.

To initialize all services the command docker compose up can be used from the project folder.

2+3) Training script and prefect

The training script and prefect (for orchestration) are packaged into one service.

The training script is placed under training/model_training.py.

The train function is wrapped in a prefect flow decorator. It also uses mlflow autolog.
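The project trains its classifier with sklearn. To illustrate what a multinomial Naive Bayes text classifier computes, here is a dependency-free sketch with Laplace smoothing. It is not the code in training/model_training.py: the class TinyMultinomialNB, the whitespace tokenizer and the toy sentences are all invented for this example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase whitespace tokenization; a real pipeline would use a vectorizer.
    return text.lower().split()


class TinyMultinomialNB:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # word counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def log_scores(self, text):
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)  # log prior
            for token in tokenize(text):
                if token not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][token]
                # Smoothed log likelihood of the token given the class.
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Invented toy examples, not taken from the THREAT corpus.
texts = [
    "i will hurt you",
    "i will find you and hurt you",
    "see you at lunch",
    "thanks for the help",
]
labels = ["threat", "threat", "no_threat", "no_threat"]
model = TinyMultinomialNB().fit(texts, labels)
```

model.predict(text) returns the label with the highest log score. Working in log space avoids underflow when many small probabilities are multiplied.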

Prefect is an orchestration tool and can therefore be used to schedule, monitor and organize jobs.

Based on the training script, a prefect deployment file train-deployment.yaml is generated using the following command:

prefect deployment build training/model_training.py:train

The Dockerfile ultimately glues these components together. It

  1. Creates folders
  2. Installs requirements.txt
  3. Sets the PREFECT_API_URL and MLFLOW_TRACKING_URI*
  4. Starts the server, pushes the deployment and starts an agent**

*Using docker you can refer to the host machine using host.docker.internal and refer to the other services by their docker compose name, e.g. http://mlflow:5000

**In this project the prefect server and the agent (which executes the scripts) run in one container.

4) FastAPI

FastAPI is a framework for building high-performance APIs. In this project I implemented a /predict endpoint. When that endpoint is queried, it downloads the latest model from mlflow and returns the prediction. Additionally, prometheus_fastapi_instrumentator collects request metrics and exposes them for Prometheus to scrape.

Please note: Currently the script will fetch the first model that is in production. It won't show any error if there is no model or there are multiple models.
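One way to make this failure mode explicit would be to check that exactly one model is in production before loading it. The sketch below is an assumption about how such a check could look, not the project's code. It works on a plain list of dicts with the hypothetical keys name, version and stage, standing in for whatever the mlflow model registry returns.

```python
def select_production_model(versions):
    """Return the single model version whose stage is 'production'.

    `versions` is a list of dicts with the hypothetical keys 'name',
    'version' and 'stage'; it stands in for the registry response.
    """
    production = [
        v for v in versions if str(v.get("stage", "")).lower() == "production"
    ]
    if not production:
        raise LookupError("no model is tagged 'production'")
    if len(production) > 1:
        found = ", ".join(f"{v['name']} v{v['version']}" for v in production)
        raise LookupError(f"expected one 'production' model, found: {found}")
    return production[0]
```

With such a check, the API could return a clear error message instead of silently serving an arbitrary model.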

5+6) Prometheus/Grafana

Prometheus is an open-source monitoring system. Grafana is a dashboarding platform. In short, Prometheus collects the data, while Grafana puts a dashboard on top. For this project, I used the provided images and just added a few configuration files:

monitoring/prometheus.yml - Contains configuration to connect Prometheus to FastAPI

monitoring/datasource.yml - Grafana: Datasource configuration

monitoring/dashboard.json - Grafana: Dashboard

This part was heavily inspired by https://github.com/Kludex/fastapi-prometheus-grafana

7) Streamlit

Streamlit is a Python library to rapidly build UIs. The app is very simple and only passes input to the API to retrieve results.
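The call from the UI to the API could look roughly like the sketch below, written with the standard library only. The port 8086 and the /predict path come from this README. The POST method and the JSON body {"text": ...} are assumptions, so check the Swagger page at /docs for the actual request schema. build_predict_request and predict are hypothetical helper names.

```python
import json
import urllib.request

# Port from the service list, path from the FastAPI section.
API_URL = "http://localhost:8086/predict"


def build_predict_request(text, url=API_URL):
    # ASSUMPTION: the endpoint accepts POST with a JSON body {"text": ...}.
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(text, url=API_URL, timeout=5):
    # Send the request and return the decoded JSON response.
    request = build_predict_request(text, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Inside the docker compose network, the UI container would address the API by its compose service name rather than localhost.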

Limitations

Multiple host machines: Kubernetes

This project is meant to be deployed on a single host machine. In practice, you might want to use Kubernetes to deploy it on multiple instances to gain more isolation and scalability. Kompose could be an option to convert your docker compose file to Kubernetes yaml.

Storage on cloud

All artifacts, logs, etc. are saved locally/on docker volumes. In practice, you would save them to the cloud.

Advanced Security

Security is not covered in this project: authentication, SSL encryption, API authentication and so on are all missing. A good example using nginx: Example

References

Threat Corpus Dataset

Hammer, H. L., Riegler, M. A., Øvrelid, L. & Velldal, E. (2019). "THREAT: A Large Annotated Corpus for Detection of Violent Threats". 7th IEEE International Workshop on Content-Based Multimedia Indexing.

Wester, A. L., Øvrelid, L., Velldal, E., & Hammer, H. L. (2016). "Threat detection in online discussions". Proceedings of the 7th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis.