Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#14366: Allow creating pre-allocated buffer with an address. Minor perf improvements #14394

Merged
merged 6 commits into from
Oct 29, 2024

Conversation

sminakov-tt
Copy link
Contributor

@sminakov-tt sminakov-tt commented Oct 28, 2024

Ticket

#14366

Problem description

During the recent Buffer refactoring, Buffer::set_address method was removed, which broke the clients who want to create a Buffer with an already allocated address.
As mentioned in the ticket, a call to can_use_passthrough_scheduling() in Buffer::address() is slower than expected.

What's changed

Added another overload to Buffer::create and CreateBuffer, which allows to specify the address.
Stored a flag to indicate whether a Buffer owns the data it points to or not.
Use std::make_shared for non-owning use case to optimize performance.
Replaced the call to can_use_passthrough_scheduling() in Buffer::address() with atomics to improve performance

Checklist

  • Post commit CI passes
  • Blackhole Post commit (if applicable)
  • Model regression CI testing passes (if applicable)
  • Device performance regression CI testing passes (if applicable)
  • New/Existing tests provide coverage for changes

tt_metal/host_api.hpp Outdated Show resolved Hide resolved
@@ -144,6 +144,8 @@ struct BufferPageMapping {
inline namespace v0 {

class Buffer final {
struct Private { explicit Private() = default; };
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Private structure is used to limit access to the Buffer constructor, which can't be truly private, because of std::make_shared. This is the way recommended by CppReference documentation: https://en.cppreference.com/w/cpp/memory/enable_shared_from_this (see Best example)

@@ -240,16 +244,38 @@ std::shared_ptr<Buffer> Buffer::create(
buffer->address_ = detail::AllocateBuffer(buffer.get());

std::unique_lock lock(buffer->allocation_mutex_);
buffer->allocation_status_.store(AllocationStatus::ALLOCATED, std::memory_order::relaxed);
buffer->allocation_status_.store(AllocationStatus::ALLOCATED, std::memory_order::release);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this really need to be made release? It seems like we're already protected by a lock.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. In Buffer::address() the first check is allocation_status_.load(std::memory_order::acquire) != AllocationStatus::ALLOCATION_REQUESTED performed without a lock. This is done this way for performance optimization reasons. The only two places who capture a lock areBuffer::create when allocating and Buffer::address when waiting on an allocation using condition variable.

@@ -289,6 +324,10 @@ bool Buffer::is_allocated() const {
}

uint32_t Buffer::address() const {
if (allocation_status_.load(std::memory_order::acquire) != AllocationStatus::ALLOCATION_REQUESTED) {
return address_;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optimization for the happy case, no waiting or other checks if buffer is already allocated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense :)

@sminakov-tt sminakov-tt merged commit d4b7fc2 into main Oct 29, 2024
8 checks passed
@sminakov-tt sminakov-tt deleted the sminakov/buffer-address branch October 29, 2024 21:37
ct-clmsn pushed a commit to ct-clmsn/tt-metal that referenced this pull request Nov 12, 2024
…s. Minor perf improvements (tenstorrent#14394)

* tenstorrent#14366: Allow creating pre-allocated buffer with an address. Minor performance improvements

* #0: Cleanup

* #0: Updated methods documentation

* #0: Handle allocation failure

* #0: Scheduling fix
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants