AQUATOPE

AQUATOPE is a QoS-and-uncertainty-aware resource manager designed for multi-stage serverless workflows.

Deployment

The container pool scheduler (src/container_pool_scheduler) and container resource manager (src/container_resource_manager) are implemented in Python3. The following dependencies need to be installed: gevent, pytorch, botorch, gpytorch. For the container pool scheduler, run train_lstm_encoder_decoder.py and train_prediction_network.py to train the hybrid model before deployment. The container resource manager updates the resource configurations of registered workflow containers based on workflow_config file.

Benchmarks are located in benchmark directory. The serverless workflows are created with OpenWhisk Composer. We use Locust as the workload generator to create custom-shaped load based on Azure Function Dataset. The Locust scripts are located in locust directory.

By default, OpenWhisk only supports CPU sharing (in proportion to the allocated memory) and pre-defined static container pool configuration. To enable explicit CPU-limit-based resource scheduling, and dynamic adjustment of prewarmed container pool, #4648 and #4790 need to be merged.

Both container pool scheduler and container resource manager utilize owlib to interact with OpenWhisk framework to adjust the container pool and function resource configurations accordingly.

Container Pool Scheduler

Before training the prediction model, we first construct and train the LSTM encoder-decoder to extract latent features from a serverless trace. The model lstm_encoder_decoder will be saved in model_artifacts directory.

python train_lstm_encoder_decoder.py \
    --n_input_steps <default=48> \
    --n_output_steps <default=12> \
    --num_days <number of days in the training dataset, default=7> \
    --num_epochs <default=128> \
    --batch_size <default=128> \
    --learning_rate <default=1e-4> \
    --variational_dropout_p <default=0.25> \
    --trace_id <Function ID in Azure Function Dataset> \
    --dataset_dir <Directory of Azure Function Dataset>

Then, we train a prediction network to forecast the number of active containers in the next time window. The model predict will be saved in model_artifacts directory.

python train_lstm_encoder_decoder.py \
    --n_input_steps <default=48> \
    --n_output_steps <default=1> \
    --num_days <number of days in the training dataset, default=7> \
    --num_epochs <default=128> \
    --batch_size <default=128> \
    --learning_rate <default=1e-3> \
    --dropout_p <default=0.25> \
    --trace_id <Function ID in Azure Function Dataset> \
    --dataset_dir <Directory of Azure Function Dataset>

Finally, the container_pool_scheduler uses the pretrained LSTM encoder-decoder and prediction network to perform full inference and adjust the number of containers in the container pool accordingly.

python scheduler.py \
    --n_input_steps <default=48> \
    --n_output_steps <default=1> \ 
    --workflow_config <workflow configuration file>

Container Resource Manager

Deploy the container resource manager and optimize the resource configurations of target workflows.

python manager.py \
    --n_init <default=10> \
    --n_batch <default=20> \
    --mc_samples <default=32> \
    --batch_size <default=3> \
    --num_restarts <default=10> \
    --raw_samples <default=32> \
    --infeasible_cost <default=0> \
    --anomaly_detection <default=true> \
    --confidence <default=0.95> \
    --workflow_config <workflow configuration file>

Publication

AQUATOPE: QoS-and-Uncertainty-Aware Resource Management for Multi-Stage Serverless Workflows.

@inproceedings{10.1145/3567955.3567960,
author = {Zhou, Zhuangzhuang and Zhang, Yanqi and Delimitrou, Christina},
title = {AQUATOPE: QoS-and-Uncertainty-Aware Resource Management for Multi-Stage Serverless Workflows},
year = {2022},
isbn = {9781450399159},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3567955.3567960},
doi = {10.1145/3567955.3567960},
pages = {1–14},
numpages = {14},
keywords = {resource management, machine learning for systems, quality of service, resource allocation, datacenter, function-as-a-service, serverless computing, Cloud computing, resource efficiency},
location = {Vancouver, BC, Canada},
series = {ASPLOS 2023}
}

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
benchmarks		benchmarks
composer		composer
helpers		helpers
locust		locust
owlib		owlib
runtimes		runtimes
src		src
utils		utils
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

AQUATOPE

Deployment

Container Pool Scheduler

Container Resource Manager

Publication

About

Releases

Packages

Languages

License

zzhou612/aquatope

Folders and files

Latest commit

History

Repository files navigation

AQUATOPE

Deployment

Container Pool Scheduler

Container Resource Manager

Publication

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages