Skip to content
/ ckptman Public

ckptman, a tool to automate the checkpointing in spot instances

License

Notifications You must be signed in to change notification settings

amcaar/ckptman

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

40 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

ckptman

ckptman is a tool that automates checkpointing in the jobs running on spot instances in order to save as much as job process and money as possible. From now, it implements two checkpointing algorithms:

  • Hourly Checkpointing algorithm: checkpoints are taken just prior to the beginning of next instance hour. Since Amazon is not charging any partial hour, this scheme will save as much tasks as the user is paying.
  • Threshold Checkpointing algorithm: checkpoints are taken when a rising in the price of the spot instance is observed inside an interval. The interval is a % of the price the user has determined.

It is created to work with Amazon Spot Instances , BLCR checkpointing tool and SLURM as local resource management system, inside the EC3 cluster deployment tool.

Usage

First, configure the algorithm that you want to apply for the checkpointing (HOUR or THRESHOLD). You can configure also the parameters associated with the algorithm you have chosen and the log file of the tool. For that, you only need to edit the config.py file with your favourite editor:

vi config.py

Start ckptman just by by typing:

./ckptman start

About

ckptman, a tool to automate the checkpointing in spot instances

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published