Metal (Apple) GPU back-end for Tracy #793

Merged
merged 27 commits into from
Sep 10, 2024

Contributor

@slomp slomp commented May 17, 2024

(I still need to update the manual, but I'm putting the code here for review to save some time).

The Metal back-end in Tracy operates differently than other GPU back-ends like Vulkan, Direct3D and OpenGL. Specifically, TracyMetalZone() must be placed around the site where a command encoder is created.

This is because not all hardware supports timestamps at command granularity; some can only provide timestamps around an entire command encoder. Placing the zone at encoder creation accommodates all tiers of hardware. In the future, variants of TracyMetalZone() will be added to support the command-level granularity that is usual for Tracy GPU back-ends.

Metal also imposes a few restrictions that make the process of requesting and collecting queries more complicated in Tracy:

  • timestamp query buffers are limited to 4096 queries (32KB, where each query is 8 bytes)
  • when a timestamp query buffer is created, Metal initializes all timestamps with zeroes, and there's no way to reset them back to zero after timestamps get resolved; the only way to clear the timestamps is by allocating a new timestamp query buffer
  • if a command encoder records no commands and its corresponding command buffer ends up committed to the command queue, Metal will "optimize-away" the encoder along with any timestamp queries associated with it (the timestamp will remain as zero and will never get resolved)

Because of the limitations above, two timestamp buffers are managed internally. Once one of the buffers fills up with requests, the second buffer can start serving new requests.

Once all requests in a buffer get resolved and collected, the entire buffer is discarded and a new one allocated for future requests. (Proper cycling through a ring buffer would require bookkeeping and completion handlers to collect only the known complete queries.)
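The two-buffer scheme described above can be sketched in plain C++, without Metal. This is only an illustration under stated assumptions: `QueryBuffer`, `TimestampPool`, and all of their members are hypothetical names, not the types used in TracyMetal.hmm, and a zero value stands in for "not yet resolved" as in the restrictions listed earlier.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <optional>

// Hypothetical stand-in for a Metal timestamp query buffer: 4096 slots of
// 8 bytes each, all zero on allocation (zero means "not resolved yet").
struct QueryBuffer {
    static constexpr std::size_t kCapacity = 4096;
    std::array<std::uint64_t, kCapacity> timestamps{};
    std::size_t requested = 0;  // slots handed out so far
    std::size_t collected = 0;  // slots already read back
};
static_assert(QueryBuffer::kCapacity * sizeof(std::uint64_t) == 32 * 1024,
              "4096 queries * 8 bytes = 32 KB");

class TimestampPool {
public:
    TimestampPool() {
        for (auto& b : m_buffers) b = std::make_unique<QueryBuffer>();
    }

    // Hands out the next free slot, switching to the other buffer once the
    // active one is full. Returns nothing when both buffers are exhausted.
    std::optional<std::size_t> Request() {
        if (m_buffers[m_active]->requested == QueryBuffer::kCapacity) {
            const std::size_t other = 1 - m_active;
            if (m_buffers[other]->requested == QueryBuffer::kCapacity) return std::nullopt;
            m_active = other;
        }
        const std::size_t slot = m_buffers[m_active]->requested++;
        return m_active * QueryBuffer::kCapacity + slot;
    }

    // Simulates the GPU resolving a query by writing a non-zero timestamp.
    void Resolve(std::size_t id, std::uint64_t t) {
        assert(id < 2 * QueryBuffer::kCapacity && t != 0);
        m_buffers[id / QueryBuffer::kCapacity]->timestamps[id % QueryBuffer::kCapacity] = t;
    }

    // Collects resolved timestamps in request order within each buffer.
    // A buffer that has been fully requested and fully collected is
    // discarded and replaced by a fresh, zero-initialized one.
    std::size_t Collect() {
        std::size_t n = 0;
        for (auto& buf : m_buffers) {
            while (buf->collected < buf->requested && buf->timestamps[buf->collected] != 0) {
                ++buf->collected;
                ++n;
            }
            if (buf->collected == QueryBuffer::kCapacity) {
                buf = std::make_unique<QueryBuffer>();
                ++m_reallocations;
            }
        }
        return n;
    }

    std::size_t Reallocations() const { return m_reallocations; }

private:
    std::array<std::unique_ptr<QueryBuffer>, 2> m_buffers;
    std::size_t m_active = 0;
    std::size_t m_reallocations = 0;
};
```

In this sketch, `Request()` fails once both buffers are full, which is why calling the collection step frequently matters: it is what frees a whole buffer for reuse.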

In the current implementation, there is potential for a race condition when the buffer is discarded and reallocated. In practice, the race condition will never materialize so long as TracyMetalCollect() is called frequently enough to keep the number of unresolved queries low.

Finally, there's a timeout mechanism during timestamp collection to detect "empty" command encoders and ensure progress.

Contributor Author

slomp commented May 17, 2024

@wolfpld I'd like to request reviews from @nosferalatu and @JamesMcCarthy44, but I can't seem to be able to add reviewers.

Owner

wolfpld commented May 17, 2024

I don't know how assigning reviewers works on GitHub. Mentioning people should be enough to get their attention.

Contributor Author

slomp commented May 18, 2024

Also pinging @theblackunknown for a code review.

public/tracy/TracyMetal.hmm
Contributor

@theblackunknown theblackunknown left a comment

Very nice work Marcos.
I have made a first pass of code review, without testing so far.
The PR description and analysis are greatly appreciated!

Can you also clarify whether this code is supposed to be used with ObjC ARC or not?
I know that from codebase to codebase we can run into different expectations about this behavior.

Comment on lines +113 to +114
TracyMetalDebug(1<<0, TracyMetalPanic(, "MTLCounterErrorValue = 0x%llx", MTLCounterErrorValue));
TracyMetalDebug(1<<0, TracyMetalPanic(, "MTLCounterDontSample = 0x%llx", MTLCounterDontSample));
Contributor

Do you plan on keeping all those debug points?
Coming from a Tracy Vulkan background, they look unfamiliar to me.

Contributor Author

Yes, at least at this initial stage. Should people start reporting issues, this can help me triage the problem.

TracyMetalDebug(1<<0, TracyMetalPanic(, "Calibration: CPU timestamp (Metal): %llu", cpuTimestamp));
TracyMetalDebug(1<<0, TracyMetalPanic(, "Calibration: GPU timestamp (Metal): %llu", gpuTimestamp));

cpuTimestamp = Profiler::GetTime();
Contributor

Is it expected that you discard the CPU timestamp returned by the MTLDevice? Is this for consistency with other Tracy event messages?

Contributor Author

That's what Tracy expects. The CPU timestamp reported when creating the GPU context must be the timestamp that Tracy understands. This is consistent with other backends as well.

Comment on lines +314 to +315
t_start = m_mostRecentTimestamp + 5;
t_end = t_start + 5;
Contributor

Does this mean you try to "patch" unresolved or not-yet-resolved timestamps?
Can't we defer their resolution instead?

Contributor Author

We can't defer, because the main reason a timestamp can't be resolved is that the command encoder with the associated timestamp(s) was never scheduled for execution.
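As a rough illustration of how the timeout and the patched timestamps could fit together, here is a minimal C++ sketch. It is not the code from this PR: `PendingZone`, `ZoneCollector`, `TryCollect`, and the timeout value are hypothetical, and only the `+ 5` offsets mirror the two lines quoted in the review comment.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <optional>
#include <utility>

using Clock = std::chrono::steady_clock;

// Hypothetical record of one GPU zone waiting for its two timestamps.
struct PendingZone {
    std::uint64_t gpuStart = 0;     // zero means "not resolved"
    std::uint64_t gpuEnd = 0;
    Clock::time_point requestTime;  // when the queries were requested
};

// Hypothetical collector: returns the (start, end) pair to report, or
// nothing if the zone should simply be retried on the next collection.
class ZoneCollector {
public:
    explicit ZoneCollector(std::chrono::milliseconds timeout) : m_timeout(timeout) {
        assert(timeout.count() > 0);
    }

    std::optional<std::pair<std::uint64_t, std::uint64_t>>
    TryCollect(const PendingZone& zone, Clock::time_point now) {
        std::uint64_t start = zone.gpuStart;
        std::uint64_t end = zone.gpuEnd;
        if (start == 0 || end == 0) {
            // Still within the timeout: the queries may simply be in flight.
            if (now - zone.requestTime < m_timeout) return std::nullopt;
            // Timed out: assume the encoder never executed and synthesize a
            // tiny zone right after the most recent known timestamp.
            start = m_mostRecentTimestamp + 5;
            end = start + 5;
        }
        m_mostRecentTimestamp = end;
        return std::make_pair(start, end);
    }

private:
    std::chrono::milliseconds m_timeout;
    std::uint64_t m_mostRecentTimestamp = 0;
};
```

The point of the synthesized pair in this sketch is that collection keeps moving forward: an encoder that was never scheduled would otherwise block every query requested after it.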

const bool m_active;

MetalCtx* m_ctx;
id<MTLComputeCommandEncoder> m_cmdEncoder;
Contributor

This is currently unused because the above code is within #if 0 ... #endif

Contributor Author

Yes, and I'll be working on that in a subsequent PR. I don't have hardware that supports command granularity right now.

MemWrite( &item->gpuZoneEnd.context, ctx->GetContextId() );
Profiler::QueueSerialFinish();

TracyMetalDebug(1<<2, TracyAllocN((void*)(uintptr_t)queryId, 1, "TracyMetalGpuZone"));
Contributor

Aren't those supposed to come in TracyAllocN/TracyFreeN pairs?

Contributor Author

TracyFree happens in Collect, when the two timestamps announced here (begin and end timestamps) are resolved.
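The pairing described in this reply (announce a query id when the zone ends, release it when the timestamps are collected) can be modeled with a small stand-in. This is a sketch only: `InFlightTracker` and its methods are hypothetical and do not call any Tracy API; they just mimic the alloc-at-zone-end, free-at-collect lifetime.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>

// Hypothetical stand-in for the alloc/free event pair: it records which
// query ids have been announced but not yet collected.
class InFlightTracker {
public:
    // Corresponds to the point where the quoted code emits TracyAllocN.
    void OnZoneEnd(std::uint32_t queryId) {
        const bool inserted = m_live.insert(queryId).second;
        assert(inserted && "query id announced twice");
        (void)inserted;
    }

    // Corresponds to the point in collection where the matching free is emitted.
    void OnCollected(std::uint32_t queryId) {
        const std::size_t erased = m_live.erase(queryId);
        assert(erased == 1 && "query id collected without being announced");
        (void)erased;
    }

    // Queries that were announced but whose timestamps are still pending.
    std::size_t Unresolved() const { return m_live.size(); }

private:
    std::unordered_set<std::uint32_t> m_live;
};
```

Any id still present in the set after a collection pass corresponds to a query that has not been resolved yet, which is the kind of information such a debug aid is meant to surface.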

private:
const bool m_active;

MetalCtx* m_ctx;
Contributor

Alternatively you could just store the context ID

Contributor Author

The other back-ends seem to all store the context pointer, so I'll stick to that.

{
auto checkTime = std::chrono::high_resolution_clock::now();
auto requestTime = m_timestampRequestTime[k];
auto ms_in_flight = std::chrono::duration<float>(checkTime-requestTime).count()*1000.0f;
Contributor Author

@wolfpld I want to remove uses of std::chrono and use what's available in Tracy already, that is, Profiler::GetTime(). I may be missing something obvious here, but how do you convert the difference between two Profiler::GetTime() samples to, say, seconds?
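For reference, the std::chrono computation quoted above can be expressed with a millisecond duration type directly, which avoids the manual `*1000.0f`. This sketch covers only the std::chrono side; it does not show how to convert Profiler::GetTime() values, and `ElapsedMs` is a hypothetical helper name.

```cpp
#include <cassert>
#include <chrono>

// Elapsed time between two time points of the same clock, in milliseconds.
template <class TimePoint>
float ElapsedMs(TimePoint start, TimePoint end) {
    assert(end >= start);
    // duration<float, std::milli> converts the clock's native tick period
    // to floating-point milliseconds.
    return std::chrono::duration<float, std::milli>(end - start).count();
}
```

A call such as `ElapsedMs(requestTime, checkTime)` would then play the role of `ms_in_flight` in the quoted snippet.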

Contributor Author

slomp commented Sep 10, 2024

@wolfpld Please review the changes to the manual, and if it looks good to you, this PR is ready to merge!

@@ -401,7 +401,8 @@ enum class GpuContextType : uint8_t
     Vulkan,
     OpenCL,
     Direct3D12,
-    Direct3D11
+    Direct3D11,
+    Metal
Owner

This requires a protocol version bump (I'll do it).

@wolfpld wolfpld merged commit e8ff26e into wolfpld:master Sep 10, 2024
4 checks passed
@slomp slomp mentioned this pull request Sep 13, 2024