Does this actually override CUDA or NVENC sessions? #53

Closed
imayo opened this issue Feb 10, 2019 · 36 comments

@imayo

imayo commented Feb 10, 2019

I tried this for 418.81 on 64-bit Windows 10 and it is not working.
Our software uses NVENC, and after testing I was not able to run more than 2 NVENC instances. My concern is why nvcuvid.dll is being patched: as far as I know, this DLL is the old one with the CUDA enc/dec implementations. The new NVENC encoder implementation does not seem to rely on nvcuvid.dll; it is not even loaded into our server process. In contrast, when using NVENC the one loaded is:

NVIDIA Video Encoder API, Version 8.0
C:\Windows\System32\nvEncodeAPI64.dll

Could this patch be made for this DLL?

@imayo imayo added the bug label Feb 10, 2019
@niXta1
Contributor

niXta1 commented Feb 10, 2019

Do you use ffmpeg?

@imayo imayo closed this as completed Feb 10, 2019
@Snawoot Snawoot reopened this Feb 10, 2019
@Snawoot Snawoot added question and removed bug labels Feb 10, 2019
@Snawoot Snawoot self-assigned this Feb 10, 2019
@imayo
Author

imayo commented Feb 10, 2019

To clarify things :)
The software we are developing is Dixper (www.dixper.gg). We use the NVIDIA Video Codec SDK directly and also support the GRID SDK. I tried applying the patch and running our server to see if we could connect more than 2 peers with the NVENC encoder, but after patching it behaved just as it did without the patch: the 3rd NVENC instance fails with, I think, NV_ENC_ERR_OUT_OF_MEMORY. After that I checked whether nvcuvid.dll was loaded in the process, and it is not, so maybe we are using something quite different from ffmpeg/Plex.

@imayo
Author

imayo commented Feb 10, 2019

> Do you use ffmpeg?

No, we don't. We use the NVIDIA Video Codec SDK and GRID SDK directly.

@Snawoot
Collaborator

Snawoot commented Feb 10, 2019

@imayo

Nice to meet you. Let's say I'm the guy with the debugger and disassembler here.

This patch is intended to patch NVENC (and only NVENC). It should work with any NVENC-enabled software, but the test criterion is still ffmpeg, since a lot of software is derived from it.

About your concern over the name of the patched library: as I mentioned before, nvcuvid.dll is loaded dynamically by NvEncodeAPI.dll.

Your NV_ENC_ERR_OUT_OF_MEMORY error suggests you really are using NVENC, but something is going wrong. For some reason the library is either not patched, or was rolled back by System File Protection after patching, or a 32-bit library is being used somehow.

To sort things out, please run a test with 64-bit ffmpeg. You can run 3 simultaneous transcodes with a command like this:

ffmpeg -i input.avi -s 1280x720 -v:c h264_nvenc output1.mp4 -s 640x480 -v:c h264_nvenc output2.mp4 -s 320x240 -v:c h264_nvenc output3.mp4

If it simply fails with the same error, we will know the patch is not applied. If it works, we will have to look for the problem somewhere else.

@imayo
Author

imayo commented Feb 10, 2019

I am getting:

Invalid loglevel "h264_nvenc". Possible levels are numbers or:
"quiet"
"panic"
"fatal"
"error"
"warning"
"info"
"verbose"
"debug"
"trace"

Maybe the binaries I downloaded were compiled without NVENC support?

@Snawoot
Collaborator

Snawoot commented Feb 10, 2019

@imayo No, that's unlikely. The binaries on the FFmpeg site refer to this site, and they are built with NVENC.

It's probably a typo in your command line. You may post it here and we'll take a look.

@imayo
Author

imayo commented Feb 10, 2019

Yeah, got it: -c:v instead of -v:c.

@imayo
Author

imayo commented Feb 10, 2019

Yeah, it has no problems with it, it can encode 3 files concurrently.
I have seen debug logs and it is using

It seems that it is using CUDA functions to use a CUDA device as input for NVENC. NVENC can be initialized with CUDA or DirectX, and we use a DirectX device, so maybe this patch currently only unlocks encoding sessions initialized with NV_ENC_DEVICE_TYPE_CUDA.

EDIT: We can now encode more than 2 concurrent streams after running the ffmpeg code sample.

@Snawoot
Collaborator

Snawoot commented Feb 10, 2019

That looks weird. Is it possible that multiple versions of nvcuvid.dll co-exist in your dev environment, perhaps installed with some additional SDK package?

@imayo
Author

imayo commented Feb 10, 2019

I can confirm. I was able to encode more than 2 instances with our software. If I reboot Windows after that, I can't encode more than 2 again, but if I run the ffmpeg command again, we can once more encode more than 2 instances.
My guess: maybe the code you are modifying in nvcuvid, when initialized, sets some flag that NVIDIA uses to allow this Windows session to encode more than 2 streams.

@Snawoot
Collaborator

Snawoot commented Feb 10, 2019

No, it removes the conditional jump that leads to a failure return when one of the subroutines indicates that the number of active sessions is above the limit.

You should probably use x64dbg to see which libraries are getting loaded. This debugger has a useful feature that sets a breakpoint on each DLL load, including programmatically initiated dynamic loads. I bet different sets of libraries co-exist on the system.
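
As a rough complement to the debugger approach, the suspicion that several copies of the same library co-exist can also be checked on disk. The following is only a sketch: the function names `find_dll_copies` and `differing_copies` are made up for illustration, and the default file names are simply the ones mentioned in this thread. It does not show which copy a process actually loads; x64dbg is still needed for that.

```python
import hashlib
import os
from collections import defaultdict


def find_dll_copies(roots, names=("nvcuvid.dll", "nvEncodeAPI64.dll", "nvEncodeAPI.dll")):
    """Walk the given directories and return {lowercased name: [(path, sha256), ...]}."""
    wanted = {n.lower() for n in names}
    found = defaultdict(list)
    for root in roots:
        for dirpath, _dirs, files in os.walk(root):
            for fname in files:
                if fname.lower() not in wanted:
                    continue
                path = os.path.join(dirpath, fname)
                try:
                    with open(path, "rb") as fh:
                        digest = hashlib.sha256(fh.read()).hexdigest()
                except OSError:
                    continue  # unreadable file: skip it rather than abort the scan
                found[fname.lower()].append((path, digest))
    return dict(found)


def differing_copies(found):
    """Keep only the names whose copies do not all have identical content."""
    return {
        name: copies
        for name, copies in found.items()
        if len({digest for _path, digest in copies}) > 1
    }
```

On a 64-bit Windows system the System32 and SysWOW64 copies are expected to differ, since one is the 64-bit and the other the 32-bit build, so this is mostly useful for spotting extra copies shipped with an SDK or next to an application.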

@imayo
Author

imayo commented Feb 10, 2019

I will try, although this is beyond my skills :)
And why do you think that, after running ffmpeg (which calls the patched nvcuvid), our software can initialize more NVENC instances?

@imayo imayo closed this as completed Feb 10, 2019
@jantenhove

jantenhove commented Feb 26, 2019

I have exactly the same problem. We use NVENC directly with Direct3D. After patching we still get NV_ENC_ERR_OUT_OF_MEMORY for the third session. When analyzing the libraries loaded by the executable, we see 'nvEncodeAPI64.dll' getting loaded. nvcuvid.dll is not loaded by our binary.

I can also confirm that running the ffmpeg command above enables 1 extra session for our software. The fourth session still fails with the out-of-memory error. When I change the ffmpeg command to create 6 outputs, I can use 6 NVENC sessions in our software.

@Snawoot Snawoot reopened this Feb 26, 2019
@Snawoot Snawoot added the bug label Feb 26, 2019
@Snawoot
Collaborator

Snawoot commented Feb 26, 2019

@jantenhove Hello,

Is there some way I can reproduce it on a clean Windows machine? A minimal executable would probably be ideal.

@jantenhove

Thanks for reopening. I will create a simple test program based on the Direct3D sample from the SDK.

@jantenhove

@Snawoot
I've created a simple test program: https://www.filehosting.org/file/details/784491/nvenc-patch-test.exe
It tries to create 3 encoding sessions on each graphics card and shows whether it succeeds or fails. You probably need the VS 2017 redistributable installed.

When it fails after creating 2 encoding sessions, you can run the ffmpeg command from #53 (comment) (with c:v instead of v:c). After that, you should be able to create more than 2 encoding sessions until you restart the computer.

@Snawoot
Collaborator

Snawoot commented Feb 27, 2019

@jantenhove Thank you! I'm going to start looking at it.

@Snawoot
Collaborator

Snawoot commented Feb 27, 2019

@jantenhove This is a 32-bit binary, which uses libraries from %WINDIR%\SysWOW64. Those are the 32-bit versions of the libraries, and they are not patched. As for 32-bit apps, I will not support them: a 32-bit patch requires almost the same effort as the 64-bit one, even though it is a legacy platform.

I can also confirm that nvcuvid.dll isn't loaded at all in this app.

Could you please provide an x64 build of your test app? Maybe it is possible to derive a solution that fits both D3D and CUDA encoding sessions.
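
For anyone who wants to check a binary's bitness before testing, the target architecture can be read from the PE header without running it. This is a minimal sketch, not a full PE parser: the names `pe_machine` and `pe_machine_of_file` are illustrative, and only the two machine types relevant to this thread are decoded.

```python
import struct

# COFF "Machine" values for the two architectures discussed in this thread.
MACHINE_NAMES = {0x014C: "x86 (32-bit)", 0x8664: "x64 (64-bit)"}


def pe_machine(data: bytes) -> str:
    """Return the target architecture recorded in a PE image's COFF header."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("not a PE file: missing MZ header")
    # e_lfanew at offset 0x3C holds the file offset of the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if len(data) < pe_offset + 6 or data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE file: missing PE signature")
    # The 16-bit Machine field follows the signature directly.
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    return MACHINE_NAMES.get(machine, f"unknown (0x{machine:04x})")


def pe_machine_of_file(path: str) -> str:
    """Read the first 4 KiB of a file (assumed to contain the headers) and decode it."""
    with open(path, "rb") as fh:
        return pe_machine(fh.read(4096))
```

Calling `pe_machine_of_file` on a test executable or on a DLL copy reports whether it is the 32-bit or the 64-bit build, which matters here because only the 64-bit libraries are patched.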

@Snawoot
Collaborator

Snawoot commented Feb 27, 2019

I just made an important discovery.

The 32-bit ffmpeg build exhibits exactly the same behavior. It fails to open 3 sessions on a patched system, but after a successful run of the 64-bit version of ffmpeg, it becomes able to open 3 sessions.

@jantenhove your x64 test binary will be very helpful for getting to the root of the problem and for distinguishing between CUDA vs D3D mode and 32-bit vs 64-bit.

@jantenhove

@Snawoot Sorry for uploading the wrong binary. I debugged the x64 version, but created an x86 release build. Anyway, here is an x64 build: https://www.filehosting.org/file/details/784619/nvenc-patch-test.exe

@Snawoot
Collaborator

Snawoot commented Mar 1, 2019

Here are my results. A journey through about ten levels of call stack leads to D3D and then to nvwgf2umx.dll. This patch has been applied in the memory of a process debugged with x64dbg. With this patch the D3D encoding session opens successfully, and the bumped limit persists even if the process is restarted with an unmodified nvwgf2umx.dll.

This discovery may also help Plex users on Windows, since Plex currently uses dxva2 and MF.

But there are two problems:

  1. I didn't test real encoding, because the test binary only opens encoding sessions. But I think it most likely works.
  2. nvwgf2umx.dll is part of a driver component and is covered by system integrity protection or digital signatures. Normally I can't modify this file even with Administrator privileges. Of course, I can modify the file on disk using an external OS; I actually did so by mounting the qcow2 image of the Windows system on my Linux host. It appears the system won't load the modified library.

It seems more practical to me to bump D3D sessions by bumping CUDA sessions, using some sort of minimal binary that opens several sessions: it is simpler to add a single one-shot executable to autostart than to fight system protection every time. Also, maintaining one binary patch takes less effort than maintaining two.

If anyone feels up to implementing such a bumping binary as a standalone application with source code, feel free to make a Pull Request. A parameterized script for FFmpeg will also do. I wonder if ffmpeg has some sort of dummy input that would fit best here.

@Snawoot
Collaborator

Snawoot commented Mar 2, 2019

And here is a minimal ffmpeg script which bumps 10 sessions with nullsrc input and null output: https://gist.github.com/Snawoot/243c53bb52044297f5ceb6125d59dc93 (don't forget to set the actual ffmpeg path in the script).

I'll add this with a proper description to the win readme.md, thereby closing this issue.
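
For readers who prefer to see the idea spelled out, the nullsrc-to-null trick can be sketched as a small command builder. This does not reproduce the contents of the linked gist; the function names, the 256x256 frame size, and the one-second duration are assumptions chosen for illustration, and `ffmpeg_path` has to point at a 64-bit ffmpeg built with NVENC.

```python
import subprocess


def build_bump_command(ffmpeg_path="ffmpeg", sessions=10):
    """Build an ffmpeg argument list that opens `sessions` NVENC encoders at once.

    One synthetic lavfi `nullsrc` input is fanned out to `sessions` outputs, each
    encoded with h264_nvenc and discarded by the `null` muxer.
    """
    if sessions < 1:
        raise ValueError("sessions must be at least 1")
    cmd = [ffmpeg_path, "-hide_banner", "-f", "lavfi", "-i", "nullsrc=s=256x256:d=1"]
    for _ in range(sessions):
        # Note the option order: -c:v (codec for video), not -v:c.
        cmd += ["-c:v", "h264_nvenc", "-f", "null", "-"]
    return cmd


def bump(ffmpeg_path="ffmpeg", sessions=10):
    """Run the command and return ffmpeg's exit code (0 means all sessions opened)."""
    return subprocess.run(build_bump_command(ffmpeg_path, sessions), check=False).returncode
```

A call such as `bump(r"C:\path\to\ffmpeg.exe", sessions=10)` (the path is a placeholder) would then be run once after each boot, since the thread reports that the raised limit is lost on restart.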

Snawoot added a commit that referenced this issue Mar 2, 2019
Snawoot added a commit that referenced this issue Mar 2, 2019
@Snawoot
Collaborator

Snawoot commented Mar 2, 2019

@jantenhove Thank you for your code!

@jantenhove

@Snawoot Thank you for analyzing the problem so quickly.
If there is demand I can make a dedicated, open source program to unlock a configurable number of D3D11 sessions.

@Snawoot
Collaborator

Snawoot commented Mar 2, 2019

@jantenhove Yes, this would be much better than the current trick with ffmpeg, so we'd appreciate such a contribution.

@jantenhove

I have created a 'session bump' program which bumps the sessions for Direct3D by creating a configurable number of CUDA encoding sessions. Code + binary can be found here: https://github.com/jantenhove/NvencSessionLimitBump

Anyone willing to test/comment?

@Snawoot
Collaborator

Snawoot commented Mar 4, 2019

@jantenhove I'm going to set up a clean VM with a Windows 10 installation within a couple of days. I'm planning to check which dependencies are required (if any) and do the whole walkthrough manually.

@jantenhove

@Snawoot In theory you'll only need the Visual Studio 2017 Redistributable (x64) when using the binary. When compiling yourself, you need the NVIDIA Video Codec SDK + CUDA SDK installed. I've created a small readme: https://github.com/jantenhove/NvencSessionLimitBump/blob/master/readme.md

@Snawoot
Collaborator

Snawoot commented Mar 4, 2019

@jantenhove

Your app works just fine with the VC++ redist installed.

I asked some colleagues to see if this code could be statically linked against the VC++ runtime, to simplify things for users and make the app standalone. @svjukov modified the project to link the runtime statically, leaving the link to nvcuda dynamic. I tested the app on a clean system without the VC2017 Redist and it works without a hitch.

Sergey prepared a PR, which is awaiting your review. I think it is a very useful change and hope it gets merged.

@jantenhove

The PR is merged. I commented on the PR report. It's nice to see it working for others!

@Snawoot
Collaborator

Snawoot commented Mar 4, 2019

Thank you! I'll have to update the docs for this patch to add a reference to the new workaround. Could you please issue a new release with the static binary, or add the static binary to the current latest release?

@jantenhove

I will do that tomorrow. I'm currently on mobile. Thanks for all your work!

@jantenhove

New release is uploaded: https://github.com/jantenhove/NvencSessionLimitBump/releases

@Snawoot
Collaborator

Snawoot commented Mar 5, 2019

Thank you!

@Snawoot Snawoot mentioned this issue Mar 5, 2019
@Snawoot Snawoot pinned this issue Mar 5, 2019
@niXta1
Contributor

niXta1 commented Mar 6, 2019

Good job guys! ⭐⭐⭐⭐⭐

@matthew1972

The 3D bump is working for me. Great work, guys.
