Power consumption becomes a problem when we run LLMs in data centers and on Kubernetes.
Referring to the Cloud Native AI white paper, differences across the technology stack make the case more complex.
For example: different GPU devices, different deployment architectures, TEEs from a security point of view, etc.
Recently the Kepler community completed a POC for setting up Tekton on a clean bare-metal (BM) instance on AWS, and there have been other discussions around validating Kepler with a pipeline.
An interesting question is how Kepler can validate itself between the current testing version and the latest stable version.
If that works, it means that with a stable version of Kepler and a pipeline (GitHub Actions, Tekton, etc.) we can build a pattern for measuring power consumption for any project via a CI/CD pipeline.
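Such a pipeline could look roughly like the following GitHub Actions sketch. This is an assumption-laden illustration, not the community's agreed design: the workload manifest, step names, and sleep duration are hypothetical, and it assumes the Kepler Helm chart and kind action are available as shown.

```yaml
# Hypothetical sketch: measure a workload's power via Kepler on a kind cluster.
name: power-measurement
on: [workflow_dispatch]
jobs:
  measure:
    runs-on: ubuntu-latest  # a self-hosted bare-metal runner gives real RAPL readings
    steps:
      - uses: actions/checkout@v4
      - name: Create kind cluster
        uses: helm/kind-action@v1
      - name: Install Kepler
        run: |
          helm repo add kepler https://sustainable-computing-io.github.io/kepler-helm-chart
          helm install kepler kepler/kepler -n kepler --create-namespace
      - name: Run the target workload
        run: kubectl apply -f workload.yaml   # hypothetical workload manifest
      - name: Collect power metrics
        run: |
          kubectl port-forward -n kepler svc/kepler 9102:9102 &
          sleep 60   # let the workload run and metrics accumulate
          curl -s localhost:9102/metrics | grep kepler_container_joules_total
```

Swapping the last two steps for another project's workload and validation logic is what would turn this into the reusable pattern discussed below.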
Outcome
Concept level: a pattern for any project on Kubernetes to measure power consumption via a group of cloud native tools such as Tekton, kind, and Kepler.
Implementation level: the pattern should be implemented flexibly enough to cover different cases, with pluggable parts and a sample code repo to share and reuse as a GitHub Action or via other approaches:
self-owned GitHub runner.
different architectures.
different OSes.
BM/VM.
etc.
Delivery level: a blog post and events to share this pattern.
Ownership level: from the Kepler community, share it with the TAG as common/generic infra?
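The pluggable dimensions above (runner ownership, architecture, OS, BM vs. VM) map naturally onto a CI matrix. A minimal sketch, assuming GitHub Actions is the chosen pipeline (the label values are illustrative, not a decided list):

```yaml
# Hypothetical matrix covering the pluggable dimensions listed above.
jobs:
  measure:
    strategy:
      matrix:
        runner: [ubuntu-latest, self-hosted]   # hosted VM vs. self-owned bare metal
        arch: [amd64, arm64]
    runs-on: ${{ matrix.runner }}
    steps:
      - uses: actions/checkout@v4
      # ... install Kepler and run the workload for this runner/arch combination
```

A Tekton implementation would express the same matrix as parameterized PipelineRuns instead.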
To-Do
The Kepler community completes validating Kepler itself this year.
Refine the Kepler model server's Tekton logic to decouple the workload phase from model server training, so the workload can be reused on its own.
Based on that workload, build a pipeline to validate Kepler between versions.
Find another project to replace the workload and validation parts, as an example of the pattern.
Note: since Kepler's model splits power into idle and dynamic components, a workload is needed for the target project in order to capture both idle and dynamic power changes?
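The idle/dynamic split in the note, and the between-versions check, can be sketched as simple arithmetic: measure an idle baseline first, subtract it from the reading taken while the target workload runs, and compare readings from two Kepler versions within a tolerance. The numbers and the 10% tolerance below are illustrative assumptions, not Kepler's actual API or thresholds.

```python
def dynamic_power(total_watts: float, idle_watts: float) -> float:
    """Dynamic power attributed to the workload: total minus the idle baseline."""
    return max(total_watts - idle_watts, 0.0)

def within_tolerance(a: float, b: float, rel_tol: float = 0.1) -> bool:
    """Do two versions' readings agree within a relative tolerance (assumed 10%)?"""
    return abs(a - b) <= rel_tol * max(abs(a), abs(b), 1e-9)

# Illustrative numbers, not real measurements:
idle = 35.0        # watts measured with no workload running
under_load = 52.5  # watts measured while the target project's workload runs
print(dynamic_power(under_load, idle))  # 17.5
```

A version-validation pipeline would compute `dynamic_power` for the current testing build and the latest stable build on the same workload, then fail the run if `within_tolerance` returns False.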
leonardpahlke changed the title to "[Action] Practice to measure power consumption for a project which CI/CD" on Apr 26, 2024.
cc: @rootfs, @sunya-ch, @marceloamaral, please help correct any mistakes, or we can fix them later on.
Comments
This may take years to complete; maybe we can break down the tasks and work on them in parallel.
Some previous discussion is in sustainable-computing-io/kepler-model-server#212.
See the example https://github.com/sustainable-computing-io/aws_ec2_self_hosted_runner/blob/main/.github/workflows/ci_integration.yml#L35-L73 for setting up Tekton on a newly created EC2 instance.