Add TTNN ToLayout + To/FromDevice Ops around hoisted func calls #1925
base: main
Conversation
lib/Conversion/TTNNToEmitC/Utils.cpp
Outdated
// If this attr is null, it should mean device is on host; this should be
// legal, so we propagate here.
if (!attr) {
  return builder.getType<emitc::OpaqueAttr>("nullptr");
note: I'm not strictly sure this is the correct way to do this. I talked with @nsmithtt about how we should treat tensors w/o this TensorMemoryLayoutAttr, and we agreed it should be treated as optional, so this seems like a reasonable way to handle it for emitC.
FYI @svuckovicTT, we need some way of constructing an empty host tensor that can act as storage for the CPU fallback destination.
@vwellsTT, I don't think we can return a nullptr attribute. I think host tensor creation probably needs a whole new path in emitC that doesn't program tensor layout to begin with. I found some runtime code from runtime/lib/ttnn/runtime.cpp that creates TTNN host tensors in C++:
auto tensor = std::make_shared<::ttnn::Tensor>(
createStorage<BorrowedStorage>(data.get(), numElements, dataType),
::ttnn::Shape(small_vector_shape), utils::toTTNNDataType(dataType),
::ttnn::Layout::ROW_MAJOR);
Oh 🤦♂️ yeah that's clearly not right, sorry. I meant to make the ::ttnn::MemoryConfig itself null, not an enum, but yeah I guess I need to spend a bit more time looking into how our emitC stuff works. (I made similar changes to ttrt recently, and I think many of the ::ttnn ops do accept std::optional, including ::ttnn::empty, so hopefully I can do something similar)
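The std::optional pattern mentioned above can be sketched as follows. Note this is a minimal illustration with hypothetical stand-in types (`BufferType`, `MemoryConfig`, `resolveBufferType` are all mocks invented here), not the real ::ttnn API:

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-ins for the real ::ttnn types, for illustration only.
enum class BufferType { SystemMemory, DRAM };
struct MemoryConfig {
  BufferType buffer_type;
};

// Mirrors the idea that ops taking std::optional<MemoryConfig> can treat
// std::nullopt as "host tensor, no device layout programmed", instead of
// abusing a null attribute to mean the same thing.
BufferType resolveBufferType(const std::optional<MemoryConfig> &memoryConfig) {
  if (!memoryConfig.has_value()) {
    // No memory config: fall back to the host (system memory) path.
    return BufferType::SystemMemory;
  }
  return memoryConfig->buffer_type;
}
```

Under this scheme the caller simply passes std::nullopt for host tensors, and a concrete MemoryConfig for device tensors.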
static ::ttnn::Tensor
createEmptyOnSingleDevice(ProgramContext &context, EmptyTensorConfig &config,
const ::tt::target::DeviceRef *deviceRef) {
if (deviceRef && config.memoryConfig.has_value() &&
config.memoryConfig.value().buffer_type !=
::ttnn::BufferType::SYSTEM_MEMORY) {
::ttnn::MeshDevice &subMesh = context.getSubMesh(deviceRef->global_id());
LOG_ASSERT(subMesh.num_devices() == 1);
::ttnn::Device *device = subMesh.get_device_index(0);
return ::ttnn::empty(config.shape, config.dtype, config.layout, device,
config.memoryConfig.value());
}
return ::ttnn::zeros(config.shape, config.dtype, config.layout);
}
I guess I probably just want to mirror this logic from ttrt ideally
@@ -349,13 +406,14 @@ class TTNNLayoutForceSystemMemoryRewriter : public OpRewritePattern<SrcOp>
      appendInputSuffix(op.getLoc(), operand.getOperandNumber());
      std::optional<Value> layout = createToLayoutOp(
          rewriter, newLoc, operand.get(), BufferType::SystemMemory,
-         nullptr /* tensorMemoryLayoutAttr */, false /* tiled */);
+         nullptr /* desiredMemLayoutAttr */, false /* tiled */);
Really, tensorMemoryLayoutAttr is more descriptive, but desiredMemLayoutAttr is the actual param name, which imo makes more sense for this type of comment.
// Insert ToDevice op after the result.
auto toDeviceOp = rewriter.create<ttnn::ToDeviceOp>(
    callOp.getLoc(), result.getType(), result, device,
    ttnn::MemoryConfigAttr{});
It should be ok to unconditionally call FromDevice despite the input tensor potentially already being on host (though I haven't tested). But it's not clear what to do afterwards; ideally we put the tensor back into the exact same memory space that the input tensors were in. I think the way you have this written is the most correct thing to do, just want to spawn a discussion in case others have any ideas.
Yeah, this has worked fine in all my testing so far (though I confess I haven't tried anything complicated yet).
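The "unconditional FromDevice" intuition above can be modeled as idempotence: moving a tensor to host is a no-op if it is already there. This is a toy sketch with a made-up location model (`Location`, `Tensor`, `fromDevice` are mocks invented here; the real ::ttnn behavior remains untested per the thread):

```cpp
#include <cassert>

// Hypothetical model of where a tensor lives, for illustration only.
enum class Location { Host, Device };
struct Tensor {
  Location loc;
};

// fromDevice modeled as idempotent: a tensor already on host passes
// through unchanged, so calling it unconditionally is harmless.
Tensor fromDevice(Tensor t) {
  t.loc = Location::Host;
  return t;
}
```

If the real op has this property, the rewriter never needs to branch on the input tensor's current memory space before moving it to host.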
Goal: The end-to-end goal is to integrate a path to compile and execute specific ops or sets of ops on the CPU.
Context:
The entire task will be split into (tentatively) 7 PRs, as follows:
This PR represents the 6th bullet above. It makes changes to TTNN passes s.t. we move inputs to cpu-hoisted funcs to host via from_device, we move results of cpu-hoisted funcs back from host via to_device, and we ensure that inputs to cpu-hoisted funcs are in row-major layout. We also make it so various assertions in TTNN passes do not reject func decls with no bodies, since we rely on these for hoisted funcs.
Example
Input:
Output:
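The overall wrapping described in this PR can be sketched at the value level as follows. This is a hypothetical mock (the real pass rewrites MLIR ops around the hoisted call site; `Location`, `Tensor`, `hoistedCpuFunc`, and `wrappedCall` are all invented here for illustration):

```cpp
#include <cassert>

// Toy location model, for illustration only.
enum class Location { Host, Device };
struct Tensor {
  Location loc;
  int value;
};

Tensor fromDevice(Tensor t) { t.loc = Location::Host; return t; }
Tensor toDevice(Tensor t) { t.loc = Location::Device; return t; }

// Stand-in for a cpu-hoisted func: it requires its input on host.
Tensor hoistedCpuFunc(Tensor t) {
  assert(t.loc == Location::Host);
  t.value += 1;
  return t;
}

// What the pass effectively arranges: inputs are moved to host before
// the hoisted call, and the result is moved back to device afterwards.
Tensor wrappedCall(Tensor deviceInput) {
  Tensor hostResult = hoistedCpuFunc(fromDevice(deviceInput));
  return toDevice(hostResult);
}
```

From the caller's perspective the wrapped call still consumes and produces device tensors; the host round-trip is an implementation detail of the CPU fallback.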