Data scientists are usually well versed in the languages and packages used for statistical analysis and modeling. They are less often equipped to implement those models properly in production pipelines.
MLStack provides a toolkit for data scientists to develop production-level modules in their local development environment.
The toolkit has two parts with shared dependencies:
- Conda environment - An Anaconda environment with common ML Python libraries
- Kubernetes cluster - A Kubernetes cluster with common ML components
MLStack assumes that you have Docker (19.03), Kubernetes (1.16), and Conda (4.7) installed. Installation instructions are not given because different operating systems and environments require specific configuration.
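Before installing, it can help to confirm which versions are actually on your machine. The following is a minimal sketch, not part of MLStack: `check_tool` is a hypothetical helper, and it assumes that the `kubectl` client version is a reasonable proxy for your Kubernetes version.

```shell
# check_tool NAME CMD [ARGS...]
# Hypothetical helper: prints the first line of a tool's version output,
# or a "not found" message if NAME is not on PATH.
check_tool() {
  name="$1"
  shift
  if command -v "$name" >/dev/null 2>&1; then
    printf '%s: %s\n' "$name" "$("$@" 2>&1 | head -n 1)"
  else
    printf '%s: not found on PATH\n' "$name"
    return 1
  fi
}

# Compare the reported versions against Docker 19.03, Kubernetes 1.16, Conda 4.7.
check_tool docker docker --version || true
check_tool kubectl kubectl version --client || true
check_tool conda conda --version || true
```

The script only reports versions; it does not enforce them, since newer releases may or may not work with MLStack.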
Install MLStack with the commands below. Setup takes some time because Docker images are pulled and/or built, so grab a cup of ☕ and relax (or read the logs, or both).
# Clone into the repository
git clone https://github.com/sebastianvermaas/mlstack.git
cd mlstack
# Create your mlstack conda environment
conda env create -f conda.yml
conda activate mlstack
# Install the Python library and CLI
pip install -e .
# Setup command for building Docker images
mlstack setup
The mlstack build command builds the Docker images in the build directory. Images that need additional Python requirements can be built with the --requirements flag. For example:
mlstack build --image airflow --requirements requirements.txt
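As a rough sketch of the full round trip, the snippet below writes a small requirements file and then passes it to the build. The package names are placeholders chosen for illustration, and it assumes the path given to --requirements is resolved relative to the current directory, which the text above does not confirm.

```shell
# Placeholder packages for illustration only; list whatever your image needs.
cat > requirements.txt <<'EOF'
pandas
scikit-learn
EOF

# Run the build only if the mlstack CLI is available on PATH.
if command -v mlstack >/dev/null 2>&1; then
  mlstack build --image airflow --requirements requirements.txt
else
  echo "mlstack CLI not found; activate the mlstack conda environment first" >&2
fi
```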
The mlstack create command creates a Kubernetes cluster specified in the manifests.
mlstack create
mlstack create --manifest spark --volume-mount mymount --host-path path/to/my/host
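After creating the cluster, you may want to see what is actually running. The sketch below uses plain `kubectl` rather than any MLStack command; `show_cluster_state` and the `KUBECTL` variable are hypothetical names, and it assumes your current `kubectl` context points at the cluster that MLStack deployed to.

```shell
# KUBECTL can be overridden if your client binary has a different name.
KUBECTL="${KUBECTL:-kubectl}"

# Hypothetical helper: list pods and services across all namespaces.
show_cluster_state() {
  if ! command -v "$KUBECTL" >/dev/null 2>&1; then
    echo "$KUBECTL not found on PATH" >&2
    return 1
  fi
  "$KUBECTL" get pods --all-namespaces
  "$KUBECTL" get services --all-namespaces
}

show_cluster_state || echo "could not query the cluster" >&2
```

Listing all namespaces is a deliberately broad choice here, because the namespace MLStack deploys into is not stated above.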
The mlstack delete command deletes a Kubernetes manifest.
mlstack close
mlstack close --manifest spark
mlstack create bucket mybucket
mlstack upload data --bucket mybucket