In order to work as intended, the docker-compose stack requires some setup:
- A docker network named `www`. Use the following command to create it:

  ```
  docker network create www
  ```

- A Traefik service running on the `www` network. Traefik is a service capable of routing requests for web sub-domains to services built with docker. We use it just for this purpose, although it can also perform other tasks. To create this service, check the file `extra/docker-compose.traefik.yaml` (see the example command after this list).

- A `.env` file, which needs to be created first. This file is not included in the repository since it is server-dependent. Its content is the following:

  ```
  DOMAIN=<domain of the machine (used only for traefik labels)>
  CELERY_BROKER_URL=pyamqp://rabbitmq/
  CELERY_BACKEND_URL=redis://redis/
  CELERY_QUEUE=
  DATABASE_SCHEMA=mlpdb
  DATABASE_USER=mlp
  DATABASE_PASS=mlp
  DATABASE_HOST=database
  DATABASE_URL=postgresql://${DATABASE_USER}:${DATABASE_PASS}@${DATABASE_HOST}/${DATABASE_SCHEMA}
  GRAFANA_ADMIN_PASS=grafana
  ```

  Remember that these passwords are stored unencrypted. This is not a safe solution.
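The Traefik compose file is provided by the repository at `extra/docker-compose.traefik.yaml`; assuming it is a standalone compose file attached to the `www` network, it can be started with a standard command such as:

```
docker-compose -f extra/docker-compose.traefik.yaml up -d
```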
Then, to launch the stack with docker-compose, execute the following command from the root directory of this repository:

```
docker-compose up -d
```
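To check that the stack came up correctly, the usual docker-compose commands can be used, for example:

```
docker-compose ps        # list the services and their current state
docker-compose logs -f   # follow the logs of all the services
```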
This proof-of-concept software uses synthetic data generated by sampling some distributions. To generate these data, just run the following command; it will populate the `/dataset` folder with TSV (Tab Separated Value) files:

```
python dataset_generator.py
```
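The actual generation logic lives in `dataset_generator.py`; as a rough illustration of the idea (sampling distributions and writing the samples to a TSV file in the `dataset` folder), a minimal hypothetical sketch could look like the following. The column names, distributions, and labelling rule here are made up for the example and are not those of the real script:

```python
# Hypothetical illustration only: the real logic is in dataset_generator.py.
import csv
import random
from pathlib import Path


def generate(n_rows: int = 1000, out_dir: str = "dataset") -> None:
    Path(out_dir).mkdir(exist_ok=True)
    with open(Path(out_dir) / "samples.tsv", "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(["feature_1", "feature_2", "label"])  # made-up columns
        for _ in range(n_rows):
            x1 = random.gauss(0.0, 1.0)       # sample from a normal distribution
            x2 = random.uniform(0.0, 10.0)    # sample from a uniform distribution
            label = int(x1 + 0.1 * x2 > 0.5)  # synthetic label rule (illustrative)
            writer.writerow([x1, x2, label])


if __name__ == "__main__":
    generate()
```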
To simulate the use of the application by external users, the script `traffic_generator.py` can be used.

The basic command, using the default parameters, is:

```
python traffic_generator.py
```
Some parameters can be used to control the behavior of the users:
- `--config <path>` is the path to a configuration file. A configuration file is a `.tsv` (Tab Separated Value) file that contains all the parameters for the `UserData` and `UserLabeller` behavior. See the files `config/user.tsv` and `config/user_noise.tsv` for some examples.
- `-p` is the number of parallel threads to run. Each thread will contact the application independently.
- `-d` is the probability of having a response. If set to 1.0, there will always be a response; if set to 0.0, the user will never give a response.
- To control the waiting time, use the `-tmin` and `-tmax` parameters. The values are expressed in seconds; for less than a second use decimals (i.e. 100 ms is written as 0.1). `-tmin` is the minimum amount of time to wait after a request to the application, while `-tmax` is the maximum. The wait is randomly chosen between the `-tmin` and `-tmax` values. Higher values mean a slower generation of new data, and the bigger the difference between the two parameters, the higher the variance in the waiting time.
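For example, a run with four parallel users, an 80% response probability, and waits between 100 ms and 2 s could look like the following (the flag names are those described above, while the specific values are only illustrative):

```
python traffic_generator.py --config config/user.tsv -p 4 -d 0.8 -tmin 0.1 -tmax 2.0
```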
To develop this application, a Python virtual environment is highly recommended. If a development machine with Docker is not available, it is possible to use the three `requirements*.txt` files to create a fully working environment:

- `requirements.api.txt` contains all the packages for the API service,
- `requirements.worker.txt` contains all the packages for the Celery worker service,
- `requirements.txt` contains extra packages and utilities required by the scripts or for development.
To create a virtual environment using the `python-venv` package, use the following command:

```
python -m venv MLPenv
```
Then remember to activate the environment before launching the scripts:

```
source ./MLPenv/bin/activate
```
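With the environment active, the requirements files listed above can be installed with standard pip commands, for example:

```
pip install -r requirements.txt -r requirements.api.txt -r requirements.worker.txt
```

(`pip install` accepts multiple `-r` options; installing the files one at a time works as well.)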
- SQL (Relational) Databases
- Python ML in Production - Part 1: FastAPI + Celery with Docker
- First Steps with Celery
- Next Steps
- Serving ML Models in Production with FastAPI and Celery
- Multi-stage builds #2: Python specifics
- SQLAlchemy ORM — a more “Pythonic” way of interacting with your database
- Events: startup - shutdown
- Overview | Prometheus
- Instrumentation | Prometheus
- prometheus/client_python | GitHub
- kozhushman/prometheusrock | GitHub
This software was built as a proof-of-concept and as support material for the course Machine Learning in Production.
It is not intended to be used in a real production system, although some state-of-the-art best practices have been followed to implement it.