Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[dxvk] Periodically flush handles for long-running queries #3273

Closed
wants to merge 1 commit into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
52 changes: 39 additions & 13 deletions src/dxvk/dxvk_gpu_query.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -32,30 +32,23 @@ namespace dxvk {


DxvkGpuQueryStatus DxvkGpuQuery::getData(DxvkQueryData& queryData) const {
queryData = DxvkQueryData();

if (!m_ended)
return DxvkGpuQueryStatus::Invalid;

// Empty begin/end pair
if (!m_handle.queryPool)
return DxvkGpuQueryStatus::Available;


// Get query data from all associated handles
DxvkGpuQueryStatus status = getDataForHandle(queryData, m_handle);
DxvkGpuQueryStatus status = flushHandles();

for (size_t i = 0; i < m_handles.size()
&& status == DxvkGpuQueryStatus::Available; i++)
status = getDataForHandle(queryData, m_handles[i]);

// Treat non-precise occlusion queries as available
// if we already know the result will be non-zero
if ((status == DxvkGpuQueryStatus::Pending)
&& (m_type == VK_QUERY_TYPE_OCCLUSION)
&& !(m_flags & VK_QUERY_CONTROL_PRECISE_BIT)
&& (queryData.occlusion.samplesPassed))
status = DxvkGpuQueryStatus::Available;


if (status == DxvkGpuQueryStatus::Available)
queryData = m_queryData;

return status;
}

Expand All @@ -69,6 +62,8 @@ namespace dxvk {
for (const auto& handle : m_handles)
cmd->trackGpuQuery(handle);
m_handles.clear();

m_queryData = DxvkQueryData();
}


Expand All @@ -78,13 +73,44 @@ namespace dxvk {


void DxvkGpuQuery::addQueryHandle(const DxvkGpuQueryHandle& handle) {
// Some apps forget to End() their queries; instead of accumulating query
// handles indefinitely, try to clean them up periodically.
if (m_handles.size() >= 256)
Copy link
Owner

@doitsujin doitsujin Mar 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason not to do this on every call? 256 queries as a threshold is a lot for a single query object, and games tend to have thousands of query objects. At the same time, we'll very rarely have more than one Vulkan query per D3D query in well-behaved games, so there won't really be any overhead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You guessed it in your edit: the reason is to avoid overhead. Checking a deque's size, comparing against a constant, and conditionally branching over a call is one or two orders of magnitude cheaper than a vkGetQueryPoolResults which will likely only come back with VK_NOT_READY if run on a query handle so recently allocated.

Yeah, 256 is a pretty big threshold! The vast, vast majority of D3D queries out there in the wild are used responsibly and don't come anywhere near that number, which I picked since I didn't want to spend time seeking out the not-as-well-behaved games which do use multiple Vulkan queries per D3D query to make sure this new codepath wasn't causing some regression somewhere. I chose 256 conservatively to be sure that this would only affect "leaked" D3D queries. (And, let's remember that the old threshold was actually infinity, and while games do tend to have thousands of query objects, I only observed about a dozen being leaked in this manner.)

If m_handles is to become a small_vector, it does make sense to make the threshold the same as its preallocated region, however.

flushHandles();

if (m_handle.queryPool)
m_handles.push_back(m_handle);

m_handle = handle;
}


DxvkGpuQueryStatus DxvkGpuQuery::flushHandles() const {
// Consume handles in FIFO order, aggregating them into m_queryData
while (!m_handles.empty()) {
DxvkGpuQueryHandle handle = m_handles.front();

DxvkGpuQueryStatus status = getDataForHandle(m_queryData, handle);
if (status != DxvkGpuQueryStatus::Available)
return status;

handle.allocator->freeQuery(handle);
m_handles.pop_front();
}

if (m_handle.queryPool) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably easier to turn this whole mess into a small_vector. The main reason why there are two structures at all was to avoid allocations in the common case (1 query), but we have better ways to deal with this now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The concern that I have here is that the small_vector's equivalent to pop_front() (erase(0)) is O(n), which would make the while loop O(n²). I suppose this could be mitigated if small_vector::erase() got a range overload, which could then be called once after the loop finishes; flushHandles() would be back to O(n).

Either way, that's a separate change so I'd be rebasing the PR to add a commit if I do it.

DxvkGpuQueryStatus status = getDataForHandle(m_queryData, m_handle);
if (status != DxvkGpuQueryStatus::Available)
return status;

m_handle.allocator->freeQuery(m_handle);
m_handle = DxvkGpuQueryHandle();
}

return DxvkGpuQueryStatus::Available;
}


DxvkGpuQueryStatus DxvkGpuQuery::getDataForHandle(
DxvkQueryData& queryData,
const DxvkGpuQueryHandle& handle) const {
Expand Down
10 changes: 6 additions & 4 deletions src/dxvk/dxvk_gpu_query.h
Original file line number Diff line number Diff line change
Expand Up @@ -224,10 +224,12 @@ namespace dxvk {
uint32_t m_index;
bool m_ended;

DxvkGpuQueryHandle m_handle;

std::vector<DxvkGpuQueryHandle> m_handles;

mutable DxvkQueryData m_queryData;
mutable DxvkGpuQueryHandle m_handle;
mutable std::deque<DxvkGpuQueryHandle> m_handles;
Copy link
Owner

@doitsujin doitsujin Mar 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I strongly dislike this mutable spam; if methods modify internal state then they shouldn't be const, especially since doing it like this hides potential thread safety concerns.

If anything, this highlights that the current implementation is bugged (in that accesses to m_ended do not enforce memory order).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really like mutable either, though I wanted to avoid a pointer to an extra struct that would have just meant yet one more allocation.

Since DxvkGpuQuery::getData() is part of the "public" (D3D layer facing) API, it seemed appropriate to keep it const as it is indeed logically const. If that isn't The DXVK Way™, it's easy enough to make getData() non-const.

I can replace m_ended with an atomic if you'd like, though this is a separate change and I'd rebase this PR to put that commit deeper than this one.


DxvkGpuQueryStatus flushHandles() const;

DxvkGpuQueryStatus getDataForHandle(
DxvkQueryData& queryData,
const DxvkGpuQueryHandle& handle) const;
Expand Down