Easily deploy an MLflow tracking server with 1 command.
The MLflow tracking server is composed of 4 Docker containers:

- MLflow client (runs experiments)
- MLflow server / web interface at `localhost:5555` (receives data from experiments)
- MinIO object storage server `minio` (holds artifacts from experiments)
- A database to track tabular experimental results (either PostgreSQL or MySQL)
- (and a fifth, temporary container) MinIO client `mc`, to create the initial `s3://mlflow/` bucket upon startup (sketched below)
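For reference, the bucket-creation step that the temporary `mc` container performs boils down to something like the following (a minimal sketch; the alias name and exact flags are assumptions, not lifted from this repo's compose file):

```sh
# Register the MinIO server under an alias, then create the bucket idempotently
mc alias set local http://minio:9000 minio minio123
mc mb --ignore-existing local/mlflow
```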
- Install Docker and ensure you have `docker-compose` installed. Make sure you have `make` installed as well (and `awk`, `grep`, `curl`, `head`, and `tail` for the serving example).
- Clone (download) this repository: `git clone https://github.com/ml-starter-packs/mlflow-experiment.git`
- `cd` into the `mlflow-experiment` directory.
- Build and run the containers with `docker-compose up -d --build`: `make` (a quick sanity check is sketched just after this list).
- Access the MLflow UI at http://localhost:5555.
- Watch as runs begin to populate in the `demo` experiment as the script `./examples/main.py` executes. (NOTE: most of the HuggingFace models seem to be unsupported on `arm64` architectures, so this demo is best run on a machine with an `amd64` processor.)
- (optional) Access the MinIO UI at http://localhost:9000 to see how MLflow artifacts are organized in the S3-compatible object storage (default credentials are `minio` / `minio123`).
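If a step fails, you can sanity-check the stack by listing the compose services and probing the UI port (a sketch; a healthy server should return `200`):

```sh
docker-compose ps
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:5555
```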
To stop all containers and remove all volumes (i.e., purge all stored data), run `make clean`.

To stop all running containers without removing volumes (i.e., if you want the state of the application to persist), run `make stop`.
A complete example that resembles local usage can be found at `./examples/train-and-serve.sh` and run with `make serve`. This demo trains a model (using `mlflow/mlflow-example`, under the `Default` experiment) and then serves it as an API endpoint; a sketch of the two underlying steps follows.
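For orientation, those two steps look roughly like this (a sketch only; the actual flags, ports, and run IDs live in `./examples/train-and-serve.sh` and may differ):

```sh
# 1) Train via the MLflow Project (logs under the Default experiment)
mlflow run https://github.com/mlflow/mlflow-example -P alpha=0.5 --no-conda

# 2) Serve the trained model as a REST endpoint (<run-id> is a placeholder)
mlflow models serve -m runs:/<run-id>/model -p 5001 --no-conda
```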
Give it a set of samples to predict on (using `curl`) with `make post`.
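Under the hood, `make post` sends something along these lines (a sketch; the serving port and payload schema depend on your MLflow version, and the model expects the full set of wine-quality feature columns, abbreviated here):

```sh
# MLflow 1.x scoring servers accept the pandas-split JSON format
curl -X POST http://localhost:5001/invocations \
  -H 'Content-Type: application/json; format=pandas-split' \
  -d '{"columns": ["alcohol", "chlorides"], "data": [[12.8, 0.029]]}'
```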
You can stop serving your model (perhaps if you want to try running the serving demo a second time) with `make stop`.
Note: you can run `./examples/train-and-serve.sh` locally if you prefer (it is designed as a complete example), but you need to change the URLs to point to your local IP address and to reflect that MLflow is exposed on port 5555 (the service runs on port 5000 within its container, but since this is a commonly used port, it is remapped to avoid potential conflicts with existing services on your machine). Take note that you may want to omit the `--no-conda` flags if you want the default behavior of `mlflow serve`, which leverages Anaconda.
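One hypothetical alternative to editing URLs in the script is to override them through environment variables, if the script reads its endpoints from the environment (the variable names are standard MLflow/boto3 settings; the values assume this demo's defaults):

```sh
export MLFLOW_TRACKING_URI=http://localhost:5555     # 5555 on the host, 5000 in-container
export MLFLOW_S3_ENDPOINT_URL=http://localhost:9000  # MinIO, not AWS S3
export AWS_ACCESS_KEY_ID=minio
export AWS_SECRET_ACCESS_KEY=minio123
./examples/train-and-serve.sh
```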
Edit `./examples/main.py` and re-run the experiment service (if you commit your code, the latest git hash will be reflected in MLflow) using `docker-compose run nlp`: `make run`

When it completes after a few minutes, you will find new results populated in the existing `demo` experiment, and a stopped container associated with the run will be visible when running `docker ps -a`.
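To narrow `docker ps -a` down to just the stopped containers left behind by such runs, a standard status filter works (nothing here is specific to this repo):

```sh
docker ps -a --filter status=exited
```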
The container associated with the example runs can be removed with `make rm`. Note: this step is also performed by `make clean`.
This may be of more relevance to some than others, depending on which container-orchestration client you are using. If you get credential errors when trying to pull the images, it is because your program is not sure which domain name to infer (some private registry, or Docker's default?). You can make explicit where images that are not prefixed with a domain name should come from by setting your Docker config file:
```
$ cat ~/.docker/config.json
{
  "experimental": "disabled",
  "credStore": "desktop",
  "auths": {
    "https://index.docker.io/v1/": {}
  }
}
```

Be aware that it may be `credStore` or `credsStore` depending on your setup.
When using the docker-compose setup here, `make clean` will wipe your whole database, which is convenient for testing. However, you may eventually move to a "real" database (perhaps a managed service) and notice that runs you delete in the MLflow UI are NOT removed from your tables. To remove deleted runs from your tables, the command resembles the one used to launch the MLflow server:
```sh
docker exec -ti mlflow_server bash
DB_HOST=<hosted db>
DB_USER=<username>
DB_PASS=<password>
DB_TYPE=<postgresql or mysql+pymysql>
DB_NAME=<name>
mlflow gc --backend-store-uri ${DB_TYPE}://${DB_USER}:${DB_PASS}@${DB_HOST}/${DB_NAME}
```
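For example, with a hosted PostgreSQL backend, the assembled command would look like this (host and credentials are placeholders):

```sh
mlflow gc --backend-store-uri postgresql://mlflow_user:s3cret@db.example.com/mlflow
```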
For neon.tech, note that you need to pass extra arguments in your `DB_NAME` (and note that the `project-id` is not the same thing as the "project name"):

`DB_NAME=<db-name>?sslmode=require&options=project%3D<project-id>`

(Alternatively, leave `options` off if your `project-id` is used as your subdomain when specifying `DB_HOST`.)
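When assembling this in a shell, quote the value so the `&` is not treated as a command separator (the database name and project-id below are placeholders):

```sh
DB_NAME='mydb?sslmode=require&options=project%3Dmy-project-123456'
mlflow gc --backend-store-uri ${DB_TYPE}://${DB_USER}:${DB_PASS}@${DB_HOST}/${DB_NAME}
```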