Update docs for EmitC testing (#1918)
svuckovicTT authored Jan 22, 2025
1 parent 4ba1b73 commit 9e5980e
Showing 13 changed files with 125 additions and 109 deletions.
4 changes: 4 additions & 0 deletions docs/src/SUMMARY.md
@@ -6,13 +6,17 @@

- [Building](./build.md)
- [Docker Notes](./docker-notes.md)
+- [Testing](./testing.md)
+- [Lit testing](./lit-testing.md)
+- [EmitC testing](./emitc-testing.md)
- [Tools](./tools.md)
- [ttmlir-opt](./ttmlir-opt.md)
- [ttmlir-translate](./ttmlir-translate.md)
- [ttrt](./ttrt.md)
- [tt-explorer](./tt-explorer.md)
- [Usage & API](./tt-explorer-usage-api.md)
- [Roadmap](./tt-explorer-roadmap.md)
+- [ttnn-standalone](./ttnn-standalone.md)
- [Flatbuffers](./flatbuffers.md)
- [CI](./ci.md)
- [Additional Reading](./additional-reading.md)
22 changes: 18 additions & 4 deletions docs/src/adding-an-op.md
@@ -23,7 +23,9 @@ This guide will cover the following steps:
- [`runtime/lib/ttnn/operations/CMakeLists.txt`](#runtimelibttnnoperationscmakeliststxt)
- [`runtime/lib/ttnn/program.cpp`](#runtimelibttnnprogramcpp)
- [8. Add a silicon unit test for the Op](#8-add-a-silicon-unit-test-for-the-op)
-- [`test/ttmlir/Silicon/TTNN/simple_matmul.mlir`](#testttmlirsiliconttnnsimple_matmulmlir)
+- [`test/ttmlir/Silicon/TTNN/matmul/simple_matmul.mlir`](#testttmlirsiliconttnnmatmulsimple_matmulmlir)
+- [9. Add an EmitC test for the Op](#9-add-an-emitc-test-for-the-op)
+- [`test/ttmlir/EmitC/TTNN/matmul/matmul.mlir`](#testttmliremitcttnnmatmulmatmulmlir)

## 1. Define the Op in the TTIR frontend dialect

@@ -302,11 +304,11 @@ ttrt run out.ttnn
After adding runtime support, we're ready to test our Op on silicon. All silicon tests are located
under `test/ttmlir/Silicon`. The process is similar to [adding a compiler unit test](#4-add-a-compiler-unit-test-for-the-op).

-In our specific case, we create a unit test here: `test/ttmlir/Silicon/TTNN/simple_matmul.mlir`:
+In our specific case, we create a unit test here:

-#### `test/ttmlir/Silicon/TTNN/simple_matmul.mlir`
+#### `test/ttmlir/Silicon/TTNN/matmul/simple_matmul.mlir`
```mlir
-{{#include ../../../test/ttmlir/Silicon/TTNN/simple_matmul.mlir}}
+{{#include ../../../test/ttmlir/Silicon/TTNN/matmul/simple_matmul.mlir}}
```

A couple of things to point out about this process:
@@ -318,3 +320,15 @@ If you want the module to run on silicon in CI, the test must be placed under `t
Ensuring the system descriptor accurately reflects the target hardware is essential for running the module correctly.
- `// RUN: ttmlir-translate --ttnn-to-flatbuffer %t.mlir > %t.ttnn`: This runs `ttmlir-translate`, which serializes the output MLIR module to a flatbuffer binary.
We added the logic for this serialization in the [Serialize the Op in the flatbuffer format](#6-serialize-the-op-in-the-flatbuffer-format) section.
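Putting these directives together, a minimal silicon test header might look like the sketch below (the `FileCheck` line illustrates typical lit usage and is an assumption; the pipeline flags mirror those used elsewhere in this commit):

```mlir
// RUN: ttmlir-opt --ttir-to-ttnn-backend-pipeline="system-desc-path=%system_desc_path%" %s > %t.mlir
// RUN: FileCheck %s --input-file=%t.mlir
// RUN: ttmlir-translate --ttnn-to-flatbuffer %t.mlir > %t.ttnn
```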

+## 9. Add an EmitC test for the Op
+
+The op should also be tested in the EmitC (C++ codegen) path.
+
+TTNN EmitC tests live under `test/ttmlir/EmitC/TTNN`. In our case, the test is `test/ttmlir/EmitC/TTNN/matmul/matmul.mlir`.
+
+#### `test/ttmlir/EmitC/TTNN/matmul/matmul.mlir`
+```cpp
+{{#include ../../../test/ttmlir/EmitC/TTNN/matmul/matmul.mlir}}
+```
+The first two `RUN` lines compile the module and serialize it to a flatbuffer. The third and fourth convert it to the EmitC dialect, translate it to C++, and write the result to a `matmul.mlir.cpp` file.
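For reference, the four `RUN` lines of this test (reproduced verbatim from `test/ttmlir/EmitC/TTNN/matmul/matmul.mlir`, shown in full later in this commit) are:

```mlir
// RUN: ttmlir-opt --ttir-to-ttnn-backend-pipeline="system-desc-path=%system_desc_path%" %s > %t.mlir
// RUN: ttmlir-translate --ttnn-to-flatbuffer %t.mlir > %basename_t.ttnn
// RUN: ttmlir-opt --ttnn-modify-signatures-for-dylib --convert-ttnn-to-emitc %t.mlir > %t2.mlir
// RUN: ttmlir-translate --mlir-to-cpp %t2.mlir > %basename_t.cpp
```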
18 changes: 0 additions & 18 deletions docs/src/build.md
@@ -60,24 +60,6 @@ source env/activate
cmake --build build -- check-ttmlir
```

-### llvm-lit
-
-Under the hood the check-ttmlir cmake target is running `llvm-lit`. With it you
-can:
-
-```bash
-# Query which tests are available
-llvm-lit -sv ./build/test --show-tests
-
-# Run an individual test:
-llvm-lit -sv ./build/test/ttmlir/Dialect/TTIR/test_allocate.mlir
-
-# Run a sub-suite:
-llvm-lit -sv ./build/test/ttmlir/Dialect/TTIR
-```
-
-> See the full [llvm-lit documentation](https://llvm.org/docs/CommandGuide/lit.html) for more information.
-
## Lint

```bash
14 changes: 14 additions & 0 deletions docs/src/emitc-testing.md
@@ -0,0 +1,14 @@
# EmitC testing

To run EmitC tests locally:

```bash
# Generate flatbuffers and .cpp files
llvm-lit -sv test/ttmlir/EmitC/TTNN

# Compile .cpp files to shared objects
tools/ttnn-standalone/ci_compile_dylib.py

# Run flatbuffers + shared objects and compare results
ttrt run --emitc build/test/ttmlir/EmitC/TTNN
```
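To iterate on a single test directory, narrower paths can be used (a sketch; `matmul` is the subdirectory added in this commit, and passing a subdirectory to `ttrt run --emitc` is an assumption based on its directory-based usage above):

```bash
# Generate the flatbuffer and .cpp file for just the matmul tests
llvm-lit -sv test/ttmlir/EmitC/TTNN/matmul

# Compile the generated .cpp files to shared objects
tools/ttnn-standalone/ci_compile_dylib.py

# Run and compare only that directory (assumed to accept a subdirectory)
ttrt run --emitc build/test/ttmlir/EmitC/TTNN/matmul
```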
16 changes: 16 additions & 0 deletions docs/src/lit-testing.md
@@ -0,0 +1,16 @@
# Lit testing

The `llvm-lit` tool is used for MLIR testing. With it you can:

```bash
# Query which tests are available
llvm-lit -sv ./build/test --show-tests

# Run an individual test:
llvm-lit -sv ./build/test/ttmlir/Dialect/TTIR/test_allocate.mlir

# Run a sub-suite:
llvm-lit -sv ./build/test/ttmlir/Dialect/TTIR
```

> See the full [llvm-lit documentation](https://llvm.org/docs/CommandGuide/lit.html) for more information.
8 changes: 8 additions & 0 deletions docs/src/testing.md
@@ -0,0 +1,8 @@
# Testing

To run tests:

```bash
source env/activate
cmake --build build -- check-ttmlir
```
2 changes: 2 additions & 0 deletions docs/src/tools.md
@@ -3,5 +3,7 @@
Currently, there are a few primary tools that are part of the `ttmlir` project:

- `ttmlir-opt`: The `ttmlir` optimizer driver. This tool is used to run the `ttmlir` compiler passes on `.mlir` source files and is central to developing and testing the compiler.
+- `ttmlir-translate`: The `ttmlir` translation tool. This tool converts between IR and external representations, in both directions. For example, IR in the EmitC dialect can be converted into C++ code, as sketched below.
- `ttrt`: This tool is intended to be a swiss army knife for working with flatbuffers generated by the compiler. Its primary role is to inspect and run flatbuffer files.
- [`tt-explorer`](https://github.com/vprajapati-tt/tt-explorer): Visualizer tool for `ttmlir`-powered compiler results. It visualizes emitted `.mlir` files to display the compiled model, attributes, and performance results, and provides a platform for human-driven overrides to _gamify_ model tuning.
+- `ttnn-standalone`: This tool is used to run C++ TTNN code outside of the compiler environment.
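As a sketch of how these tools compose (the file names are placeholders; the flags mirror the usage shown in `ttnn-standalone.md` within this commit):

```bash
# Lower a TTIR module to the EmitC dialect, then translate it to C++
./build/bin/ttmlir-opt --ttir-to-emitc-pipeline my_model.mlir \
| ./build/bin/ttmlir-translate --mlir-to-cpp \
> my_model.cpp
```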
24 changes: 24 additions & 0 deletions docs/src/ttnn-standalone.md
@@ -0,0 +1,24 @@
# `ttnn-standalone`

TTNN Standalone is a post-compile tuning/debugging tool.

Forge and third-party ML models (PyTorch, Jax, ONNX, ...) can be compiled to a set of TTNN library calls in C++. This generated code can then be used outside of the compiler environment. The TTNN Standalone tool offers all the scaffolding needed to run the C++ code on device (build and run scripts).

### Usage

```bash
# Compile a model to EmitC dialect => translate to C++ code => pipe to ttnn-standalone.cpp
./build/bin/ttmlir-opt --ttir-to-emitc-pipeline test/ttmlir/EmitC/TTNN/sanity_add.mlir \
| ./build/bin/ttmlir-translate --mlir-to-cpp \
> tools/ttnn-standalone/ttnn-standalone.cpp

# Change dir to `tools/ttnn-standalone` and use the `run` script to compile and run the ttnn standalone:
cd tools/ttnn-standalone
./run
```

Note: if you receive the following error
```bash
-bash: ./run: Permission denied
```
run `chmod +x run` to make the script executable.
1 change: 1 addition & 0 deletions lib/Dialect/TTNN/Pipelines/TTNNPipelines.cpp
@@ -148,6 +148,7 @@ void createTTIRToTTNNBackendPipeline(
void createTTIRToEmitCPipeline(OpPassManager &pm,
                               const TTIRToEmitCPipelineOptions &options) {
  createTTIRToTTNNBackendPipeline(pm, options);
+  pm.addPass(createTTNNCreateInputGenerators());
  pm.addPass(createConvertTTNNToEmitCPass());
}

2 changes: 1 addition & 1 deletion test/ttmlir/Dialect/TTNN/pipelines/ttir_to_emitc_add.mlir
@@ -1,5 +1,5 @@
// RUN: ttmlir-opt --ttir-to-emitc-pipeline="system-desc-path=%system_desc_path%" %s > %direct.mlir
-// RUN: ttmlir-opt --ttir-to-ttnn-backend-pipeline="system-desc-path=%system_desc_path%" --convert-ttnn-to-emitc %s > %indirect.mlir
+// RUN: ttmlir-opt --ttir-to-ttnn-backend-pipeline="system-desc-path=%system_desc_path%" --ttnn-create-input-gens --convert-ttnn-to-emitc %s > %indirect.mlir
// RUN: diff %direct.mlir %indirect.mlir
//
// This test checks that the (TTIR to EmitC pipeline) is equivalent to (TTIR to TTNN pipeline + dialect conversion from TTNN to EmitC).
10 changes: 10 additions & 0 deletions test/ttmlir/EmitC/TTNN/matmul/matmul.mlir
@@ -0,0 +1,10 @@
// RUN: ttmlir-opt --ttir-to-ttnn-backend-pipeline="system-desc-path=%system_desc_path%" %s > %t.mlir
// RUN: ttmlir-translate --ttnn-to-flatbuffer %t.mlir > %basename_t.ttnn
// RUN: ttmlir-opt --ttnn-modify-signatures-for-dylib --convert-ttnn-to-emitc %t.mlir > %t2.mlir
// RUN: ttmlir-translate --mlir-to-cpp %t2.mlir > %basename_t.cpp

func.func @forward(%arg0: tensor<64x128xbf16>, %arg1: tensor<128x96xbf16>) -> tensor<64x96xbf16> {
%0 = tensor.empty() : tensor<64x96xbf16>
%1 = "ttir.matmul"(%arg0, %arg1, %0) : (tensor<64x128xbf16>, tensor<128x96xbf16>, tensor<64x96xbf16>) -> tensor<64x96xbf16>
return %1 : tensor<64x96xbf16>
}
37 changes: 2 additions & 35 deletions tools/ttnn-standalone/README.md
@@ -1,36 +1,3 @@
-## Table of contents
+# TTNN Standalone

-- [TTNN Standalone](#ttnn-standalone)
-  - [Usage](#usage)
-- [TTNN Dylib](#ttnn-dylib)
-
-## TTNN Standalone
-
-TTNN Standalone is a post-compile tuning tool.
-
-Third party ML models (PyTorch, Jax, ONNX, ...) can be compiled to a set of TTNN library calls in C++. This generated code can then be manually fine-tuned outside of the compiler environment. TTNN Standalone tool offers all the scaffolding needed to run the C++ code on device (build & run scripts).
-
-### Usage
-
-```bash
-# Compile a model to C++ code
-./build/bin/ttmlir-opt --ttir-to-emitc-pipeline test/ttmlir/Silicon/TTNN/emitc/simple_add.mlir | ./build/bin/ttmlir-translate --mlir-to-cpp
-
-# Copy paste the generated function into `ttnn-standalone.cpp`.
-
-# Adapt the `main()` function in `ttnn-standalone.cpp` to feed tensors needed for the model
-
-# Run the following script from within this folder (`tools/ttnn-standalone`) to compile and run the ttnn standalone:
-
-./run
-```
-
-Note: if you receive this error
-```bash
--bash: ./run: Permission denied
-```
-running `chmod +x run` will allow the execution of the script.
-
-## TTNN Dylib
-
-Similarly to the Standalone, this tool offers the ability to compile third party ML models, but to dylibs. Initial intent for compiled dylibs is to be used in testing infrastructure, but sky's the limit :)
+Please refer to [TTMLIR docs](https://docs.tenstorrent.com/tt-mlir/ttnn-standalone.html).
76 changes: 25 additions & 51 deletions tools/ttnn-standalone/ttnn-standalone.cpp
@@ -1,70 +1,44 @@
-// SPDX-FileCopyrightText: (c) 2024 Tenstorrent AI ULC
+// SPDX-FileCopyrightText: (c) 2025 Tenstorrent AI ULC
//
// SPDX-License-Identifier: Apache-2.0

#include "ttnn-precompiled.hpp"

-// To generate forward function, run:
-// ./build/bin/ttmlir-opt --ttir-to-emitc-pipeline
-// test/ttmlir/Silicon/TTNN/emitc/simple_add.mlir | ./build/bin/ttmlir-translate
-// --mlir-to-cpp
-
-ttnn::Tensor forward(ttnn::Tensor v1, ttnn::Tensor v2) {
-  ttnn::IDevice *v3 = ttnn::DeviceGetter::getInstance();
-  ttnn::MemoryConfig v4 = ttnn::MemoryConfig(
-      ttnn::TensorMemoryLayout::INTERLEAVED, ttnn::BufferType::DRAM);
+ttnn::Tensor add(ttnn::Tensor v1, ttnn::Tensor v2) {
+  ttnn::IDevice* v3 = ttnn::DeviceGetter::getInstance();
+  ttnn::MemoryConfig v4 = ttnn::MemoryConfig(ttnn::TensorMemoryLayout::INTERLEAVED, ttnn::BufferType::DRAM);
  ttnn::Tensor v5 = ttnn::to_device(v1, v3, v4);
-  ttnn::Tensor v6 =
-      ttnn::to_layout(v5, ttnn::Layout::TILE, std::nullopt, std::nullopt,
-                      static_cast<::ttnn::IDevice *>(nullptr));
+  ttnn::Tensor v6 = ttnn::to_layout(v5, ttnn::Layout::TILE, std::nullopt, std::nullopt, static_cast<::ttnn::IDevice *>(nullptr));
  ttnn::deallocate(v5, false);
-  ttnn::MemoryConfig v7 = ttnn::MemoryConfig(
-      ttnn::TensorMemoryLayout::INTERLEAVED, ttnn::BufferType::DRAM);
+  ttnn::MemoryConfig v7 = ttnn::MemoryConfig(ttnn::TensorMemoryLayout::INTERLEAVED, ttnn::BufferType::DRAM);
  ttnn::Tensor v8 = ttnn::to_device(v2, v3, v7);
-  ttnn::Tensor v9 =
-      ttnn::to_layout(v8, ttnn::Layout::TILE, std::nullopt, std::nullopt,
-                      static_cast<::ttnn::IDevice *>(nullptr));
+  ttnn::Tensor v9 = ttnn::to_layout(v8, ttnn::Layout::TILE, std::nullopt, std::nullopt, static_cast<::ttnn::IDevice *>(nullptr));
  ttnn::deallocate(v8, false);
-  ttnn::SimpleShape v10 = ttnn::SimpleShape({
-      32,
-      32,
-  });
-  ttnn::MemoryConfig v11 = ttnn::MemoryConfig(
-      ttnn::TensorMemoryLayout::INTERLEAVED, ttnn::BufferType::DRAM);
-  ttnn::Tensor v12 =
-      ttnn::empty(v10, ttnn::DataType::BFLOAT16, ttnn::Layout::TILE, v3, v11);
+  ttnn::SimpleShape v10 = ttnn::SimpleShape(tt::tt_metal::LegacyShape({32, 32, }));
+  ttnn::MemoryConfig v11 = ttnn::MemoryConfig(ttnn::TensorMemoryLayout::INTERLEAVED, ttnn::BufferType::DRAM);
+  ttnn::Tensor v12 = ttnn::empty(v10, ttnn::DataType::BFLOAT16, ttnn::Layout::TILE, v3, v11);
  ttnn::Tensor v13 = ttnn::add(v6, v9, std::nullopt, std::nullopt, v12);
  ttnn::deallocate(v9, false);
  ttnn::deallocate(v6, false);
  ttnn::Tensor v14 = ttnn::from_device(v13);
  ttnn::deallocate(v12, false);
-  ttnn::Tensor v15 =
-      ttnn::to_layout(v14, ttnn::Layout::ROW_MAJOR, std::nullopt, std::nullopt,
-                      static_cast<::ttnn::IDevice *>(nullptr));
+  ttnn::Tensor v15 = ttnn::to_layout(v14, ttnn::Layout::ROW_MAJOR, std::nullopt, std::nullopt, static_cast<::ttnn::IDevice *>(nullptr));
  ttnn::deallocate(v14, false);
  return v15;
}

-int main() {
-  // Create shapes
-  //
-  const size_t tensor_height = 32;
-  const size_t tensor_width = 32;
-  ttnn::SimpleShape xs =
-      ttnn::SimpleShape({1, 1, tensor_height, tensor_width});
-  ttnn::SimpleShape ys =
-      ttnn::SimpleShape({1, 1, tensor_height, tensor_width});
-
-  // Create tensors on cpu
-  //
-  auto x = ttnn::ones(xs, ttnn::DataType::BFLOAT16, ttnn::Layout::TILE);
-  auto y = ttnn::ones(ys, ttnn::DataType::BFLOAT16, ttnn::Layout::TILE);
-
-  // Run fwd pass on device
-  //
-  ttnn::Tensor result = forward(x, y);
-
-  // Print result
-  //
-  result.print();
+std::tuple<ttnn::Tensor, ttnn::Tensor> createInputsFor_add() {
+  ttnn::SimpleShape v1 = ttnn::SimpleShape(tt::tt_metal::LegacyShape({32, 32, }));
+  ttnn::Tensor v2 = ttnn::ones(v1, ttnn::DataType::BFLOAT16, ttnn::Layout::ROW_MAJOR, std::nullopt, std::nullopt);
+  ttnn::SimpleShape v3 = ttnn::SimpleShape(tt::tt_metal::LegacyShape({32, 32, }));
+  ttnn::Tensor v4 = ttnn::ones(v3, ttnn::DataType::BFLOAT16, ttnn::Layout::ROW_MAJOR, std::nullopt, std::nullopt);
+  return std::make_tuple(v2, v4);
+}

+int32_t main() {
+  ttnn::Tensor v1;
+  ttnn::Tensor v2;
+  std::tie(v1, v2) = createInputsFor_add();
+  ttnn::Tensor v3 = add(v1, v2);
+  int32_t v4 = 0;
+  return v4;
}
