#1190: Added runtime support for doing golden comparison for flatbuffers in ttrt #1218

Merged
merged 1 commit into from
Nov 15, 2024

Conversation

tapspatel

@tapspatel tapspatel commented Nov 11, 2024

The goal of this PR is to introduce golden support in the runtime. The design takes into account that other runtimes can plug into the MLIR runtime (ttrt is just one possible frontend for it). A frontend built on the MLIR runtime can register a "callback" function through Python; the callback runs after every op is executed and receives a ProgramContext* and an Operation*. The frontend can then call supporting APIs to retrieve data for the currently running op.

  • golden information embedded into the flatbuffer as a golden_map
    • key = loc() of the operation
    • value = vector(float)
  • ability to dump a debug string from both Operation* and its OpType*
  • added a Hook callback function that executes after every op
    • the Python interface to the C++ runtime can register a callback function that receives ProgramContext* and Operation* as void pointers
    • getOpOutputTensor(programcontext, operation) pybind function to get a tensor from the device for a particular op
    • getOpDebugString(programcontext, operation) pybind function to get an op debug string
      • this is needed to get the loc() information for the op (which is the key in the golden map)
  • ttrt callback module that does golden comparison against the device tensor using PCC
  • ttrt golden option that sets a golden function and triggers golden comparison for each op at runtime
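The per-op comparison described in the list above can be sketched in plain Python. This is only an illustration under stated assumptions, not the ttrt implementation: `pcc`, `make_golden_callback`, `get_loc`, and `get_output` are hypothetical names, with `get_loc` and `get_output` standing in for whatever the frontend builds on top of getOpDebugString and getOpOutputTensor, and the 0.99 threshold is an arbitrary example value.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def pcc(golden: Sequence[float], actual: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally sized sequences."""
    if len(golden) != len(actual) or len(golden) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    n = len(golden)
    mean_g = sum(golden) / n
    mean_a = sum(actual) / n
    cov = sum((g - mean_g) * (a - mean_a) for g, a in zip(golden, actual))
    var_g = sum((g - mean_g) ** 2 for g in golden)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    if var_g == 0 or var_a == 0:
        # PCC is undefined for constant data; this sketch treats identical
        # inputs as a perfect match and anything else as no correlation.
        return 1.0 if list(golden) == list(actual) else 0.0
    return cov / math.sqrt(var_g * var_a)


def make_golden_callback(
    golden_map: Dict[str, List[float]],
    get_loc: Callable[[object, object], str],              # hypothetical helper
    get_output: Callable[[object, object], List[float]],   # hypothetical helper
    threshold: float = 0.99,                                # assumed example value
):
    """Build a post-op callback that records (pcc, passed) per loc()."""
    results: Dict[str, Tuple[float, bool]] = {}

    def callback(program_context: object, op_context: object) -> None:
        loc = get_loc(program_context, op_context)
        golden = golden_map.get(loc)
        if golden is None:
            return  # no golden recorded for this op, nothing to compare
        value = pcc(golden, get_output(program_context, op_context))
        results[loc] = (value, value >= threshold)

    return callback, results
```

Keeping the results keyed by loc() mirrors the golden_map layout, so a failing op can be traced back to the location it came from.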

Some ops have bad PCC; these will be followed up in separate issues. Parent issue: #1219

@nsmithtt nsmithtt requested a review from ctodTT November 12, 2024 02:20

@nsmithtt nsmithtt left a comment


Hey Taps, things are looking great! Commented inline for a few API improvements.

include/ttmlir/Target/TTNN/program.fbs (outdated, resolved)
runtime/lib/runtime.cpp (outdated, resolved)
runtime/lib/ttnn/program.cpp (outdated, resolved)
runtime/lib/ttnn/program.cpp (outdated, resolved)
runtime/tools/python/ttrt/common/callback.py (outdated, resolved)
runtime/include/tt/runtime/runtime.h (outdated, resolved)
runtime/tools/python/ttrt/common/callback.py (outdated, resolved)
@tapspatel tapspatel force-pushed the tpatel/issue-1190 branch 2 times, most recently from 9a1457a to e9cf188 Compare November 14, 2024 00:08
@tapspatel tapspatel requested a review from nsmithtt November 14, 2024 00:09
runtime/lib/runtime.cpp (outdated, resolved)
@tapspatel tapspatel force-pushed the tpatel/issue-1190 branch 2 times, most recently from 018095b to 5a40629 Compare November 14, 2024 01:56
runtime/include/tt/runtime/detail/ttmetal.h (outdated, resolved)
runtime/include/tt/runtime/detail/ttnn.h (outdated, resolved)
runtime/include/tt/runtime/detail/ttnn.h (outdated, resolved)
runtime/include/tt/runtime/runtime.h (outdated, resolved)
runtime/lib/runtime.cpp (outdated, resolved)
runtime/include/tt/runtime/types.h (outdated, resolved)
runtime/lib/ttnn/include/tt/runtime/ttnn/types.h (outdated, resolved)
runtime/lib/runtime.cpp (outdated, resolved)
runtime/lib/runtime.cpp (outdated, resolved)
runtime/lib/runtime.cpp (outdated, resolved)
runtime/lib/runtime.cpp (outdated, resolved)
@tapspatel tapspatel force-pushed the tpatel/issue-1190 branch 2 times, most recently from d958bfb to 21a7384 Compare November 15, 2024 01:59
@tapspatel

Fixed all the issues. @jnie-TT @nsmithtt Let me know if there's something missing.

@nsmithtt I tried to pybind tt::target::GoldenTensor, but it complained a lot about a data type mismatch (i.e. between the type autogenerated by the flatbuffer compiler and what I was trying to pybind). So I have this workaround:

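.def("get_debug_info_golden", [](tt::runtime::Binary &binary,
                                       std::string &loc) {
        const ::tt::target::GoldenTensor *goldenTensor =
            binary.getDebugInfoGolden(loc);
        // No golden recorded for this loc(): return an empty vector.
        if (goldenTensor == nullptr) {
          return std::vector<float>();
        }

        // Element count is the product of the shape dimensions. The initial
        // value sets the accumulator type, so pass an int64_t rather than int.
        int64_t totalDataSize = std::accumulate(
            goldenTensor->shape()->begin(), goldenTensor->shape()->end(),
            static_cast<int64_t>(1), std::multiplies<int64_t>());
        std::vector<float> dataVec(totalDataSize);
        // memcpy takes a byte count, so scale the element count by sizeof(float).
        std::memcpy(dataVec.data(), goldenTensor->data(),
                    totalDataSize * sizeof(float));
        return dataVec;
      })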

I'm not 100% happy with this workaround, however :)
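The conversion this workaround performs (shape to element count, then raw bytes to a list of floats) can be sketched on the Python side as follows. It is a minimal sketch under assumptions that the thread does not confirm: that the golden payload is a flat buffer of little-endian 32-bit floats, and that `golden_bytes_to_floats` (a hypothetical helper, not part of ttrt) is given the shape and raw bytes directly. The point it illustrates is that the element count and the byte count differ by the size of a float.

```python
import math
import struct
from typing import List, Sequence


def golden_bytes_to_floats(shape: Sequence[int], raw: bytes) -> List[float]:
    """Decode a flat float32 payload whose element count is given by `shape`.

    Hypothetical helper; assumes little-endian float32 data.
    """
    count = math.prod(shape)                 # number of elements
    needed = count * struct.calcsize("<f")   # number of bytes (4 per float32)
    if len(raw) < needed:
        raise ValueError(f"expected at least {needed} bytes, got {len(raw)}")
    return list(struct.unpack(f"<{count}f", raw[:needed]))
```

For a tensor of shape (2, 3) this reads 6 elements from 24 bytes; copying only 6 bytes would leave most of the output unfilled.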

@tapspatel tapspatel force-pushed the tpatel/issue-1190 branch 2 times, most recently from 4a90156 to ea1d74f Compare November 15, 2024 02:59

@nsmithtt nsmithtt left a comment


Hey Taps, it's looking really close! Some more comments inline.

runtime/include/tt/runtime/detail/debug.h (outdated, resolved)
runtime/include/tt/runtime/detail/ttmetal.h (outdated, resolved)
runtime/include/tt/runtime/runtime.h (outdated, resolved)
runtime/lib/ttnn/runtime.cpp (outdated, resolved)
runtime/lib/ttnn/runtime.cpp (outdated, resolved)
runtime/tools/python/ttrt/common/run.py (outdated, resolved)
runtime/tools/python/ttrt/common/run.py (outdated, resolved)
runtime/tools/python/ttrt/common/util.py (outdated, resolved)
runtime/tools/python/ttrt/common/util.py (outdated, resolved)
runtime/tools/python/ttrt/runtime/module.cpp (outdated, resolved)

@nsmithtt nsmithtt left a comment


This is looking really great! Thank you for being receptive to my pedantry.

runtime/tools/python/ttrt/common/golden.py (resolved)
runtime/tools/python/ttrt/common/util.py (outdated, resolved)
@tapspatel tapspatel force-pushed the tpatel/issue-1190 branch 3 times, most recently from 911ddd6 to 4d55210 Compare November 15, 2024 18:05
@tapspatel tapspatel requested a review from nsmithtt November 15, 2024 18:05

@nsmithtt nsmithtt left a comment


A few minor things left, otherwise I think it's good to go!

python/test_infra/ttir_builder.py (resolved)
runtime/lib/ttnn/runtime.cpp (outdated, resolved)
runtime/include/tt/runtime/types.h (outdated, resolved)
@tapspatel

Many PCC issues with several ops are resolved :D. Remaining ones:

  • logical_not
  • test_eq
  • test_ne
  • test_ge
  • test_gt
  • test_le
  • test_lt


4 participants