feat(win/video): support native YUV 4:4:4 encoding #2533

Merged on Aug 16, 2024 (8 commits)

Contributor

@ns6089 ns6089 commented May 15, 2024

Description

Adds support for YUV 4:4:4 encoding; requires changes on the Moonlight side. Windows-only for now.

moonlight-common-c pull request: moonlight-stream/moonlight-common-c#91 merged
moonlight-qt pull request: moonlight-stream/moonlight-qt#1282 merged

Current state

  • NVIDIA GPU support is implemented by using NVENC directly (Windows-only)
  • Intel GPU support was implemented blindly; no idea if it works at all (Windows-only)
  • AMD GPUs don't support YUV 4:4:4 at all
  • NVENC doesn't accept any Direct3D surfaces for 10-bit 4:4:4 encoding, so we have to use CUDA interop
    • the CUDA runtime can't be unloaded once loaded
      • it doesn't seem to affect the GPU idle power state, so we should be fine
    • NVENC in CUDA mode leaks CPU memory on decoder destruction (NVENC-mapped CUDA surfaces can't be unmapped and unregistered)
      • fixed by slightly adjusting the API calls
  • Linux support may be possible through FFmpeg, but is not yet implemented or even investigated
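
For context, the difference between 4:2:0 and 4:4:4 can be sketched as a per-frame sample count. This is an illustrative C++ snippet, not code from this PR; `chroma_format` and `samples_per_frame` are hypothetical names, and even frame dimensions are assumed.

```cpp
#include <cassert>
#include <cstdint>

enum class chroma_format { yuv420, yuv444 };

// Total number of Y, U and V samples in one frame.
// Assumes even width and height so the 4:2:0 chroma planes divide cleanly.
inline std::uint64_t samples_per_frame(std::uint32_t width, std::uint32_t height, chroma_format format) {
  assert(width % 2 == 0 && height % 2 == 0);
  const std::uint64_t luma = static_cast<std::uint64_t>(width) * height;
  // 4:4:4 keeps chroma at full resolution; 4:2:0 halves it in both directions.
  const std::uint64_t chroma_plane = (format == chroma_format::yuv444) ? luma : luma / 4;
  return luma + 2 * chroma_plane;
}
```

For a 1920x1080 frame this gives 3,110,400 samples in 4:2:0 and 6,220,800 in 4:4:4, so the encoder receives twice the raw data before compression.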

Screenshot

moonlight_yuv444

Issues Fixed or Closed

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Dependency update (updates to dependencies)
  • Documentation update (changes to documentation)
  • Repository update (changes to repository files, e.g. .github/...)

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have added or updated the in code docstring/documentation-blocks for new or existing methods/components

Branch Updates

LizardByte requires that branches be up-to-date before merging. This means that after any PR is merged, this branch must be updated before it can be merged. You must also allow edits from maintainers.

  • I want maintainers to keep my branch updated

@mirh

mirh commented May 28, 2024

Cool, didn't realize NVENC supported that (even though AV1 is still software only right?).
And I see an image is worth a thousand words. Would there also be benefits with the lossless profile? Or is that just a bandwidth (and/or latency?) monstrosity?

@ns6089
Contributor Author

ns6089 commented May 28, 2024

AV1 is still software only right?

Yes, no hardware encoding for AV1 4:4:4 on the current generation of GPUs.

Would there also be benefits with the lossless profile?

I'm not against the idea of supporting lossless encoding options (NVENC can do it for H.264 and HEVC, not AV1), but the current netcode imposes a hard limit on maximum video packet size, and this limit is very easy to hit with lossless encoding. So we need to improve the netcode first.

@mirh

mirh commented May 28, 2024

Supposedly.. it may even be possible to dynamically switch between lossy and lossless?
Although maybe it's not really necessary, if YUV 4:4:4 can already score an SSIM of 0.98.

@ns6089
Contributor Author

ns6089 commented May 28, 2024

Supposedly.. it may even be possible to dynamically switch between lossy and lossless?

You're describing near-lossless encoding, or including both DCT transform and quantization bypasses in the rate control assessment. As far as I know, NVENC is not capable of this (quantization can be dynamically lossless when rate control selects QP=4, but DCT transform bypass is a static on/off switch).
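
The QP=4 remark can be illustrated with the commonly used approximation of the H.264/HEVC quantizer step size, which doubles every 6 QP and equals 1 at QP=4. A minimal sketch, assuming 8-bit content and the usual 0 to 51 QP range; `approx_qstep` is a hypothetical helper, not an NVENC API.

```cpp
#include <cassert>
#include <cmath>

// Approximate quantizer step size for a given QP.
// The step doubles every 6 QP and is exactly 1.0 at QP = 4.
inline double approx_qstep(int qp) {
  assert(qp >= 0 && qp <= 51);
  return std::pow(2.0, (qp - 4) / 6.0);
}
```

With a step size of 1, quantization roughly leaves integer coefficients unchanged, which is consistent with QP=4 behaving losslessly at the quantization stage. The transform bypass is a separate switch and is not modeled here.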

@ns6089
Contributor Author

ns6089 commented Jun 25, 2024

Should be more or less done.
Still need to figure out how best to handle the NVENC/CUDA driver bug, and maybe adjust how 8-bit colors are mapped into 10-bit colors depending on the particular client renderer (255 colors don't evenly map into 1023 colors).
But the code should be already complete and correct, as far as I can tell.
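
The 8-bit to 10-bit mapping problem can be sketched with three generic expansion strategies. These are textbook techniques shown for illustration, assuming full-range values; they are not necessarily what this PR or any Moonlight renderer does, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Plain left shift: cheap, but 255 maps to 1020, so the 10-bit maximum is never reached.
inline std::uint16_t expand_shift(unsigned v) {
  assert(v <= 255);
  return static_cast<std::uint16_t>(v << 2);
}

// Bit replication: reuse the top two bits as the new low bits, so 255 maps to 1023.
inline std::uint16_t expand_replicate(unsigned v) {
  assert(v <= 255);
  return static_cast<std::uint16_t>((v << 2) | (v >> 6));
}

// Rounded rescale: nearest integer to v * 1023 / 255.
inline std::uint16_t expand_scale(unsigned v) {
  assert(v <= 255);
  return static_cast<std::uint16_t>((v * 1023u + 127u) / 255u);
}
```

Because 1023 / 255 is not an integer, the last two functions produce steps of 4 or 5 between neighbouring outputs, so a renderer that reverses the mapping with a different rule may not recover the original 8-bit value exactly.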

@ns6089 ns6089 marked this pull request as ready for review June 25, 2024 16:49
@ns6089
Contributor Author

ns6089 commented Jun 25, 2024

Ah, and I still have no idea if the Intel encoder works correctly since I don't have supported hardware at hand right now.

@ns6089 ns6089 changed the title from "Support YUV 4:4:4 encoding" to "Support native YUV 4:4:4 encoding (Windows-only for now)" on Jun 26, 2024
@ns6089
Contributor Author

ns6089 commented Jun 28, 2024

Resolved the CUDA bug, only minor stuff is left.

@ReenigneArcher
Member

We have some new patterns for docs which help produce cleaner doxygen docs. https://docs.lizardbyte.dev/projects/sunshine/en/master/source_code/source_code.html

@ns6089
Contributor Author

ns6089 commented Jun 28, 2024

Alright, I will update the comments to what the codebase will be using at the time of merge. Currently this PR is held back at moonlight side, and merging one without the other is pointless.

@ReenigneArcher
Member

Currently this PR is held back at moonlight side

I will mark this as draft. Please mark it ready, once moonlight side is taken care of.

@ns6089
Contributor Author

ns6089 commented Aug 1, 2024

@mohemohe Much appreciated, there was a redundant check that I missed. Pushed the fix, new CI builds should be ready shortly.

@mohemohe

mohemohe commented Aug 2, 2024

@ns6089
Thanks for the quick response.
I can now connect with YUV 4:4:4 enabled, but it does not seem to be encoded in 4:4:4.

There is no longer the hassle of switching options for each host.

Moonlight: https://ci.appveyor.com/project/cgutman/moonlight-qt/builds/50287816/job/p870yhl7y1gd587n/artifacts
Sunshine: https://github.com/LizardByte/Sunshine/actions/runs/10206571176 (0.0.0.cd06345e30d2569961370781cbf64aa374d1c900)

CleanShot_2024-08-02_11-18-23

sunshine_verbose_log_cd06345e30d2569961370781cbf64aa374d1c900.txt

@ns6089
Contributor Author

ns6089 commented Aug 2, 2024

Hmm, it appears we send the right surfaces but then ffmpeg misconfigures the encoder and internal conversion happens.
This one is going to be tougher....

@ns6089
Contributor Author

ns6089 commented Aug 2, 2024

@mohemohe I tried to nudge quicksync in the right direction, CI builds are up.
If it doesn't work we will have to patch ffmpeg 😞

@mohemohe

mohemohe commented Aug 3, 2024

@ns6089
I tried build #8648 and it appears to be encoding correctly in YUV 4:4:4.
Thank you for your great work.
CleanShot_2024-08-03_17-45-30

src/video.cpp Outdated
{}, // SDR-specific options
{}, // HDR-specific options
{}, // YUV444 SDR-specific options
{}, // YUV444 HDR-specific options
Collaborator

I think we're relying on QSV to automatically determine the best profile by the input pixel format. Is it going to pick a regular HEVC Main profile to encode 4:4:4 on older CPUs that lack 4:4:4 encoding support?

Contributor Author

I brought back the explicit profile selection through the dictionary option so I think it should be fixed now.

src/video.cpp Outdated
// Fallback options
{}, // SDR-specific options
{}, // HDR-specific options
{}, // YUV444 SDR-specific options
Collaborator

QSV doesn't support H.264 High 4:4:4, so it ends up picking H.264 Main profile when the client asks for 4:4:4. We should probably block H.264 4:4:4 from being used with QSV to prevent this.

Contributor Author

Changed the code to explicitly request the profile QSV doesn't support, so I think it should now fail the encoder validation.
In the future (or if QSV still falls back to some valid profile) we might want to add more flexible codec blacklisting logic, but this PR is large enough already so I'd rather not do it here.

cgutman previously approved these changes Aug 16, 2024
Collaborator

@cgutman cgutman left a comment

There are definitely still some QSV issues, but I think we can merge this to unblock your other work and get some broader testing. I don't think there are regressions to existing functionality.

Locally I noticed:

  • QSV seems to treat those profile options as soft suggestions and will simply ignore them if the hardware cannot support them (ex: AV1 Profile 1 or H.264 High 4:4:4 on any current Intel GPU)
  • Some much older Intel GPUs seem to not even be able to handle the YUV 4:4:4 surfaces at all and fail at runtime to encode anything when asked to stream in 4:4:4

@ns6089
Contributor Author

ns6089 commented Aug 16, 2024

That's a bummer, but in this case I'm not sure what we can even do with the degree of QSV functionality that ffmpeg exposes.
What we need is MFXVideoENCODE_Query()

If the in parameter is non-zero, the function checks the validity of the fields in the input structure. Then the function returns the corrected values in the output structure. If there is insufficient information to determine the validity or correction is impossible, the function zeroes the fields. This feature can verify whether the implementation supports certain profiles, levels or bitrates.

@ns6089
Contributor Author

ns6089 commented Aug 16, 2024

I guess we can simply blacklist H.264 and AV1 4:4:4 for QSV, bypassing FFmpeg entirely; that should be enough for a while.
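
The blacklist idea could look roughly like the following. This is a hypothetical sketch, not Sunshine's actual implementation; `codec_e`, `stream_request_t`, `is_blacklisted` and the "quicksync" identifier are illustrative assumptions.

```cpp
#include <string>

enum class codec_e { h264, hevc, av1 };

// What the client asked for, plus which encoder backend would serve it.
struct stream_request_t {
  std::string encoder;  // hypothetical backend name, e.g. "quicksync"
  codec_e codec;
  bool yuv444;
};

// Returns true for combinations that should be refused up front,
// without asking the encoder library whether it can handle them.
inline bool is_blacklisted(const stream_request_t &request) {
  if (!request.yuv444) {
    return false;  // 4:2:0 requests are never affected
  }
  if (request.encoder != "quicksync") {
    return false;  // only the QSV backend is covered by this sketch
  }
  return request.codec == codec_e::h264 || request.codec == codec_e::av1;
}
```

Checking this before the encoder is opened avoids relying on QSV to reject unsupported profiles, which the observations earlier in this thread suggest it does not do reliably.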

@ns6089
Contributor Author

ns6089 commented Aug 16, 2024

Either way, let's go ahead and merge this in its current state. I will add the H.264/AV1 blacklist shortly after.

@ns6089
Contributor Author

ns6089 commented Aug 16, 2024

@cgutman Something like this #3029 6488a2c

@ns6089 ns6089 changed the title from "Support native YUV 4:4:4 encoding (Windows-only for now)" to "feat(win/video): support native YUV 4:4:4 encoding" on Aug 16, 2024

sonarcloud bot commented Aug 16, 2024

@ReenigneArcher ReenigneArcher merged commit bfdfceb into LizardByte:master Aug 16, 2024
46 of 49 checks passed
KuleRucket pushed a commit to KuleRucket/Sunshine that referenced this pull request Oct 9, 2024