Taekyung Heo edited this page Nov 18, 2023 · 6 revisions

Chakra Project Wiki

Mission

Advancing performance benchmarking and co-design in AI through standardized execution traces.

Overview

Chakra offers a graph-based representation of AI/ML workload specifications, known as Execution Traces (ETs). Unlike conventional AI/ML frameworks, it targets replay benchmarks, simulators, and emulators, and it prioritizes agile performance modeling and adaptable methodologies.
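
To make the idea of a graph-based trace concrete, here is a minimal Python sketch. It assumes a hypothetical node layout (`TraceNode` with `id`, `name`, `kind`, and `deps`) and made-up operation names; none of these are taken from Chakra's actual schema, which should be consulted directly for the real field definitions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TraceNode:
    """One operation in a toy execution trace (hypothetical layout, not Chakra's schema)."""
    id: int
    name: str
    kind: str  # illustrative labels only, e.g. "compute" or "comm"
    deps: list[int] = field(default_factory=list)  # ids of nodes that must finish first


def build_trace() -> dict[int, TraceNode]:
    """Return a small example trace as a mapping from node id to node."""
    nodes = [
        TraceNode(0, "load_batch", "compute"),
        TraceNode(1, "forward", "compute", deps=[0]),
        TraceNode(2, "backward", "compute", deps=[1]),
        TraceNode(3, "all_reduce_grads", "comm", deps=[2]),
        TraceNode(4, "optimizer_step", "compute", deps=[3]),
    ]
    return {node.id: node for node in nodes}


def roots(trace: dict[int, TraceNode]) -> list[int]:
    """Return the ids of nodes with no dependencies (the entry points of the graph)."""
    return sorted(node.id for node in trace.values() if not node.deps)
```

Storing dependencies as node ids rather than object references keeps the trace easy to serialize, which matters for a format meant to be shared between tools.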

Purpose

Chakra's purpose encompasses:

  • Projecting Workloads for Future AI Systems: Assisting in the co-design of upcoming AI technologies, beneficial for internal cloud/hyperscaler use cases, vendor collaborations, and open industry-academic sharing.
  • Overcoming Benchmarking Challenges: Addressing the limitations in existing benchmarking methods, particularly in creating stand-alone reproducers while safeguarding proprietary information.
  • Facilitating Rapid Issue Resolution: Enabling teams to swiftly reproduce bugs or performance issues observed in production.

Schema Standardization and Enhanced Tools

Chakra's comprehensive approach includes:

  • Standardizing Schema: Establishing a universal format for AI/ML performance modeling.
  • Collecting Execution Traces: Accommodating a range of frameworks, including PyTorch, TF/XLA, and others.
  • Synthesizing Traces with ML Models: Utilizing advanced ML techniques for obfuscating and projecting workload behavior.
  • Developing Support Tools: Offering analyzers, visualization tools, and more for enhanced trace interaction.
  • Enabling Downstream Tools: Enhancing compatibility and functionality with various simulation, emulation, and replay tools.

Enhanced Tools for Downstream Applications

As part of downstream tool enhancement, Chakra focuses on:

  • Replay Tools: Replaying execution traces accurately to benchmark and optimize systems. PARAM is one example; its related research paper provides further detail.
  • Simulation Tools: Using simulation platforms such as ASTRA-sim to model and analyze the performance of AI workload executions in a controlled environment.
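
As a rough illustration of what a downstream tool does with a trace, the sketch below walks a dependency graph in a valid order and estimates a lower bound on run time. It is a toy model under stated assumptions: the `deps` and `duration_us` dictionaries are hypothetical inputs, resources are treated as unlimited, and nothing here reflects how PARAM or ASTRA-sim are actually implemented.

```python
from __future__ import annotations

from collections import deque


def replay_order(deps: dict[int, list[int]]) -> list[int]:
    """Return node ids in an order that respects dependencies (Kahn's algorithm)."""
    indegree = {node: len(parents) for node, parents in deps.items()}
    children: dict[int, list[int]] = {node: [] for node in deps}
    for node, parents in deps.items():
        for parent in parents:
            children[parent].append(node)

    ready = deque(sorted(node for node, count in indegree.items() if count == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)

    if len(order) != len(deps):
        raise ValueError("trace contains a dependency cycle")
    return order


def critical_path_time(deps: dict[int, list[int]], duration_us: dict[int, float]) -> float:
    """Estimate total time assuming every node starts as soon as its dependencies finish."""
    finish: dict[int, float] = {}
    for node in replay_order(deps):
        start = max((finish[parent] for parent in deps[node]), default=0.0)
        finish[node] = start + duration_us[node]
    return max(finish.values(), default=0.0)


# Hypothetical four-node trace: node 3 waits on both node 1 and node 2.
deps = {0: [], 1: [0], 2: [0], 3: [1, 2]}
duration_us = {0: 10.0, 1: 30.0, 2: 5.0, 3: 20.0}
print(replay_order(deps))
print(critical_path_time(deps, duration_us))
```

A real replay tool would execute each operation on hardware, and a real simulator would model compute, memory, and network contention; the critical-path estimate here only shows why the dependency edges in a trace are the essential input.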

Contributing to Chakra