Use memory mapped files for D3D9 texture mapping data #2663
In my own test, this PR makes Saints Row 2 even worse. In #2524, Saints Row 2 gets stuck on the new game loading screen and then crashes. With this PR's artifacts, the game crashes immediately when choosing new game in the menu. And the weirdest thing is that no error can be found in the log file. sr2_pc_d3d9.log
@qinlili23333 No new logging entries have been added in this PR yet (it's a draft, so I'm guessing it's not complete), judging from a quick code peek, if it is this code specifically that causes the crash, that is. Edit: Great to see all the work on this, btw. I will do some testing when it is deemed ready.
Saints Row 2 crashes before reaching the main menu regardless of what I do. The Linux port crashes, stable Proton crashes, Proton Experimental crashes, WineD3D crashes. I doubt that's caused by DXVK.
Odd. It worked fine for me when I tested a few days ago, and I could only make it crash when I turned off large address aware. Is this ready for testing now? 👀
dxvk.conf
# DXVK will unmap D3D9 buffer data after a certain number of frames.
# 0 to disable unmapping.

# d3d9.bufferUnmapDelay = 16
this->bufferUnmapDelay = config.getOption<int32_t>("d3d9.bufferUnmapDelay", 256);
So the default is 256?
Also, will e.g. d3d9.presentInterval = 2 affect those delays? I didn't read the code, but delays will be 8 and 128 real frames in this case.
Also, will e.g. d3d9.presentInterval = 2 affect those delays? I didn't read the code, but delays will be 8 and 128 real frames in this case.
It should but that also shouldn't be a problem. This is just the first best thing I came up with and the values are somewhat arbitrary.
Tested on Windows 11 Preview 22621 with an NVIDIA RTX 3060: Edit: I finally made a Large Address Aware patched Saints Row 2 executable. It runs well with the 1.10.1 release, but does not work with this PR's artifact, which keeps blinking and then crashes. Screen recording video provided:
The PR can't improve performance or reduce RAM usage.
That's weird. My test shows that Alan Wake got a performance improvement and lower memory usage. In my test scene, the 1.10.1 release used about 730M and this PR's version used only about 510M.
@qinlili23333 This is the master build this PR is built on, it's better to compare it to that to see if anything has changed.
I compared with this build. The master build used the same memory, but there is a performance difference. This PR really got ~5-10 more performance than the master build in Alan Wake.
I am sadly not able to test this on Windows since my card is too old for the new driver requirements there. Could you make an apitrace of an NPC conversation where it usually happens? 🙂
Before using |
This is annoying to maintain and hopefully won't be necessary anymore.
Mostly nits, some questions about the allocation logic.
@jochuan Try again with the latest changes if you can. I wasn't able to reproduce your issue on Linux with Mesa drivers.
Found & fixed a bug with Shogun 2
And remove some tracking that will no longer be necessary.
Otherwise D3DPOOL_DEFAULT can hit the draw time late upload path.
In order to finally fix some of our address space problems, I took a page out of Gallium Nine's book.
Most resources require a copy in system memory that we have to keep around in order to avoid stalling in Lock* calls. Those copies take up a lot of address space, leading to crashes when we exhaust the 2 GB (or 3 GB with LAA) of address space available to a 32-bit process.
To avoid that, we need to unmap resources once the application is done with them on the CPU. In theory that is possible with Vulkan memory, there are however two problems with doing it with Vulkan memory directly:
To work around those limitations, we use Win32 memory mapped files. Those can be mapped and unmapped as we please.
Allocation & Mapping
We suballocate from 64 MB memory mapped files to avoid the overhead of allocating those. 64 MB is quite a lot, so we try to lock at the level of a suballocation. There is however a problem with that: MapViewOfFile requires the offset to be aligned to the memory allocation granularity, which is 64 KiB (65536 bytes). To avoid a lot of wasted memory and address space, and lots of MapViewOfFile calls for tiny resources, there are two strategies. We try to allocate tightly packed, so we cannot guarantee the alignment of an individual resource. Every mem file is split up into "mapping pages" that are 1 MB (± alignment to 64 KiB) each. When a resource gets mapped, we either map the entire page or reuse the existing pointer if the page is already mapped. If a resource is either larger than such a page or crosses from one page to another, it gets a separate mapping. For that, we round the offset down to a multiple of 64 KiB and increase the size accordingly.
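The page and alignment rules above can be sketched as plain offset arithmetic. This is only an illustration under stated assumptions, not the code from this PR: `MappingRange`, `computeMappingRange`, `kGranularity` and `kPageSize` are hypothetical names, the page size is assumed to be exactly 1 MiB (a multiple of the granularity), and no Win32 calls are made. The result is what a caller would then pass on to a mapping call such as MapViewOfFile.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constants mirroring the numbers in the description.
constexpr uint64_t kGranularity = 64 * 1024;    // allocation granularity (64 KiB)
constexpr uint64_t kPageSize    = 1024 * 1024;  // assumed "mapping page" size (1 MiB)

// Hypothetical result type: which byte range of the mem file to map.
struct MappingRange {
  uint64_t mapOffset;   // granularity-aligned offset into the mem file
  uint64_t mapSize;     // number of bytes to map
  uint64_t ptrDelta;    // add to the mapped base pointer to reach the resource
  bool     sharedPage;  // true: whole page mapped, pointer reusable by neighbours
};

// Decides how a suballocation at [offset, offset + size) would be mapped.
inline MappingRange computeMappingRange(uint64_t offset, uint64_t size) {
  assert(size > 0);
  const uint64_t firstPage = offset / kPageSize;
  const uint64_t lastPage  = (offset + size - 1) / kPageSize;

  if (firstPage == lastPage) {
    // Resource lies inside one page: map (or reuse) the entire page.
    const uint64_t pageStart = firstPage * kPageSize;
    return { pageStart, kPageSize, offset - pageStart, true };
  }

  // Resource is larger than a page or crosses a page boundary:
  // separate mapping, offset rounded down, size grown by the same amount.
  const uint64_t aligned = offset - (offset % kGranularity);
  const uint64_t delta   = offset - aligned;
  return { aligned, size + delta, delta, false };
}
```

For example, a 200 byte resource at offset 100 shares the mapping of page 0, while a 100 byte resource that starts 10 bytes before the 1 MiB boundary gets its own mapping starting at the previous 64 KiB boundary.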
D3D resources
We use memory mapped files for all texture types, except ones placed in D3DPOOL_DEFAULT. I originally used it for buffers too but dropped that because Crysis apparently reads or writes from/to a locking pointer outside of the correct scope. That's not super rare either according to Joshua.
GPU Readback
D3DPOOL_SYSTEMMEM textures can be written by the GPU with functions like GetRenderTargetData. When that happens, we lazily create a DXVK buffer and use that instead of the memory mapped file allocation for all future locking calls. That works well because GetRenderTargetData and GetFrontBufferData usually overwrite the entire image. We compare the texture sizes and when the destination texture is smaller, we copy over the data from the memory mapped file. After that, we free the memory mapped file allocation and use the buffer for everything else.
Unmapping
Unmapping is done using a least recently used list. There is a configurable virtual memory budget which is set to 100 MB by default. Once we cross that, we start unmapping the least recently used resources until we are only using 3/4 of the budget.
We can only unmap resources that aren't currently locked, otherwise we could be potentially invalidating a pointer that the application is still going to use.
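A rough sketch of that policy follows. It is an assumption-laden illustration rather than the PR's implementation: `MappedResource` and `MappingLru` are hypothetical names, resources are identified by a plain integer, and "unmapping" is modelled by simply dropping the entry from the list. It shows the two rules from the text: trimming starts only once the budget is exceeded and stops at 3/4 of it, and locked resources are skipped.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>

// Hypothetical bookkeeping entry for one mapped resource.
struct MappedResource {
  int      id;
  uint64_t mappedBytes;
  bool     locked;  // still locked by the application, must not be unmapped
};

class MappingLru {
public:
  explicit MappingLru(uint64_t budget) : m_budget(budget) {}

  // Marks a resource as most recently used (back of the list).
  void touch(const MappedResource& res) {
    for (auto it = m_entries.begin(); it != m_entries.end(); ++it) {
      if (it->id == res.id) {
        m_used -= it->mappedBytes;
        m_entries.erase(it);
        break;
      }
    }
    m_entries.push_back(res);
    m_used += res.mappedBytes;
  }

  // Drops least recently used, unlocked entries once the budget is exceeded,
  // until usage is at most 3/4 of the budget. Returns how many were dropped.
  size_t trim() {
    if (m_used <= m_budget)
      return 0;

    const uint64_t target = m_budget / 4 * 3;
    size_t unmapped = 0;
    for (auto it = m_entries.begin(); it != m_entries.end() && m_used > target; ) {
      if (it->locked) {  // the application may still use this pointer
        ++it;
        continue;
      }
      m_used -= it->mappedBytes;
      it = m_entries.erase(it);
      ++unmapped;
    }
    return unmapped;
  }

  uint64_t used() const { return m_used; }

private:
  uint64_t                  m_budget;
  uint64_t                  m_used = 0;
  std::list<MappedResource> m_entries;  // front = least recently used
};
```

Note that with locked entries in the way, a trim pass may end above the target. The sketch accepts that instead of touching a pointer the application still holds.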
64 bit builds & DXVK Native
All of this is only enabled for 32-bit Win32 builds. It gets removed by the preprocessor otherwise.
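As a minimal sketch of what such a preprocessor gate can look like: the macro `SKETCH_USE_MEMORY_MAPPED_FILES` and the helper `usesMemoryMappedFiles` are hypothetical and not the names used in DXVK. The check relies only on the predefined compiler macros `_WIN32` (defined for both 32-bit and 64-bit Windows targets) and `_WIN64` (defined only for 64-bit Windows targets).

```cpp
#include <cassert>

// Hypothetical feature macro: on for 32-bit Windows targets only.
#if defined(_WIN32) && !defined(_WIN64)
  #define SKETCH_USE_MEMORY_MAPPED_FILES 1
#else
  #define SKETCH_USE_MEMORY_MAPPED_FILES 0
#endif

// Compile-time query so callers can avoid scattering #if blocks.
constexpr bool usesMemoryMappedFiles() {
  return SKETCH_USE_MEMORY_MAPPED_FILES != 0;
}
```

On 64-bit and non-Windows builds the macro evaluates to 0, so any code wrapped in `#if SKETCH_USE_MEMORY_MAPPED_FILES` is compiled out.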
Cleanup
I removed the direct upload path because we pretty much always use the staging buffer upload path now anyway. The staging path is necessary for the memory mapped files to work, and the direct path was only implemented because I was worried about increased address space usage. We now have a better solution for that. Along with that, there's some nice cleanup in the LockImage/LockBuffer functions.
I also removed evictManagedOnUnlock. This PR basically solves the same problem without the downsides. The option has never really been useful as pretty much all games that tended to crash also relied on the system memory copies to avoid terrible hitches.