Distributed power management framework to optimize performance of a cluster given a power budget. Focus is on the distributed protocol and creating a fast & customizable framework.

pranavsb/wattson

wattson

Distributed power capping framework to optimize performance of a cluster given a power budget. Uses an Agent-Controller architecture (similar to Facebook's Dynamo): lightweight Agent processes run on the servers (alongside production workloads) and report power readings to a dedicated central Controller, which makes sure that the total cluster power draw does not exceed the configured power budget at any point in time.
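
To make the Controller's role concrete, here is a minimal sketch of the bookkeeping it might do with incoming readings. The names PowerReading and BudgetTracker are hypothetical and are not taken from the Wattson source; the sketch only assumes that each Agent reports a node identifier and a power value in watts, and it leaves out the socket transport entirely.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical report an Agent might send; field names are illustrative.
struct PowerReading {
    std::string node_id;
    double watts;
};

// Controller-side bookkeeping: keep the latest reading per node and
// compare the cluster total against the configured budget.
class BudgetTracker {
public:
    explicit BudgetTracker(double budget_watts) : budget_watts_(budget_watts) {
        assert(budget_watts >= 0.0);
    }

    // Record (or overwrite) the most recent reading from one Agent.
    void record(const PowerReading& r) { latest_[r.node_id] = r.watts; }

    // Sum of the latest reading from every node seen so far.
    double total_watts() const {
        double total = 0.0;
        for (const auto& entry : latest_) total += entry.second;
        return total;
    }

    // Positive: spare power. Negative: the cluster is over budget.
    double headroom_watts() const { return budget_watts_ - total_watts(); }

    bool over_budget() const { return total_watts() > budget_watts_; }

private:
    double budget_watts_;
    std::map<std::string, double> latest_;
};
```

A Controller loop built on this would call record() for every message it receives and react whenever over_budget() becomes true or headroom_watts() shrinks.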

Initially, the focus is on the distributed protocol and on creating general interfaces for both the Agent and the Controller. For now, we also prioritize safety (keeping the cluster within the total power budget at all times) over performance (optimizing the power allocation algorithm). We chose RAPL as the power capping mechanism and support Primary-Replica priority: if two nodes are both hitting the limit of their allocated power, the Primary node is preferred over the Replica.
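
One simple way to honor both the safety goal and Primary-Replica priority is a greedy pass that serves Primary nodes first and never hands out more than what remains of the budget. The sketch below is an illustration under that assumption, not Wattson's allocation algorithm; Role, NodeDemand, Allocation and allocate are hypothetical names. It also ignores details a real allocator would need, such as a minimum cap per node.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

enum class Role { Primary, Replica };

struct NodeDemand {
    std::string node_id;
    Role role;
    double demand_watts;  // power the node would use if uncapped
};

struct Allocation {
    std::string node_id;
    double cap_watts;
};

// Grant each node min(demand, remaining budget), serving Primaries first.
// The caps can never sum to more than budget_watts.
std::vector<Allocation> allocate(double budget_watts, std::vector<NodeDemand> nodes) {
    assert(budget_watts >= 0.0);
    // stable_partition keeps the original order within each role.
    std::stable_partition(nodes.begin(), nodes.end(),
                          [](const NodeDemand& n) { return n.role == Role::Primary; });
    std::vector<Allocation> result;
    double remaining = budget_watts;
    for (const auto& n : nodes) {
        double cap = std::min(std::max(n.demand_watts, 0.0), remaining);
        result.push_back({n.node_id, cap});
        remaining -= cap;
    }
    return result;
}
```

With a 200 W budget, a Primary asking for 150 W and a Replica asking for 120 W, the Primary is granted its full 150 W and the Replica receives the remaining 50 W.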

Wattson differs from Facebook's Dynamo in that it aims to optimize the performance of a cluster, not merely act as "insurance" for power usage. It differs from approaches like PUPiL in that it aims to optimize performance within a power budget for a cluster of nodes, and it can leverage systems like PUPiL to optimize single-node performance given that node's power allocation. In other words, Wattson is a dynamic, feedback-based power management framework that calculates each node's power allocation at any point in time. Given this calculation, we can use RAPL, PUPiL, or any other system to enforce the power limit on the node while extracting maximum performance. So, Wattson is closer to PoDD in its aim.
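
As an illustration of the enforcement step, the sketch below applies a computed allocation through RAPL using the Linux powercap sysfs interface. It assumes a Linux machine with the intel_rapl powercap driver loaded and enough privileges to write the file; constraint_0 is typically the long-term limit. The function names are hypothetical, and this is not necessarily how Wattson's Agent enforces caps.

```cpp
#include <cassert>
#include <cmath>
#include <fstream>
#include <string>

// Linux powercap sysfs file holding the power limit of one RAPL package zone.
std::string rapl_limit_path(int package) {
    assert(package >= 0);
    return "/sys/class/powercap/intel-rapl/intel-rapl:" + std::to_string(package) +
           "/constraint_0_power_limit_uw";
}

// The sysfs interface expects an integer number of microwatts.
long long watts_to_microwatts(double watts) {
    assert(watts >= 0.0);
    return std::llround(watts * 1e6);
}

// Try to apply a cap; returns false if the file cannot be opened or written
// (for example on macOS, or when not running as root).
bool apply_rapl_cap(int package, double cap_watts) {
    std::ofstream file(rapl_limit_path(package));
    if (!file) return false;
    file << watts_to_microwatts(cap_watts) << '\n';
    file.flush();
    return static_cast<bool>(file);
}
```

An Agent could call apply_rapl_cap(0, cap) each time it receives a new allocation, and report a failure back instead of assuming the cap took effect.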

Wattson in a single diagram:

[Figure: Wattson architecture diagram showing the Controller getting power readings from multiple Agents]

To understand what happens after the Controller gets the readings, see "Wattson explained in 3 simple diagrams".

Using Wattson

We're using sockpp as a static library dependency for socket communication. Note that libs/libsockpp.a has so far been compiled only on macOS (for local development). Wattson plans to support Linux, since production servers widely use Linux and RAPL and other power-capping mechanisms typically support it.

Related work:

  • PoDD: power-capping dependent distributed applications - Zhang et al., 2019
  • Dynamo: Facebook’s Data Center-Wide Power Management System - Wu et al., 2016
  • Maximizing Performance Under a Power Cap: A Comparison of Hardware, Software, and Hybrid Techniques - Zhang et al., 2016
