Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add FidelityFX Super Resolution 2.2 (FSR 2.2.1) support. #81197

Merged
merged 1 commit into from
Sep 25, 2023

Conversation

DarioSamo
Copy link
Contributor

@DarioSamo DarioSamo commented Aug 31, 2023

FidelityFX Super Resolution 2.2

Introduces support for FSR2 as a new upscaler option available from the project settings.

It's worth noting that while I've done my best effort in making improvements to the pipeline to provide the information as well as possible to FSR2 with the minimal performance impact, some artifacts inherent to the algorithm are to be expected. This is a pretty competitive area at the moment where both DLSS and FSR aren't perfect in all cases and it'll vary a lot on the type of content that is used. While I can address artifacts introduced by my own implementation due to errors, if the contents of the render fit what FSR2 expects, there's not much else I can do. Therefore, we should probably limit reporting to noticeable issues (e.g. broken results) that aren't FSR2 problems.

To verify

There's some UX issues where it'd be appreciated to gather some feedback on how they should be solved.

  • We can arguably improve loading times in general if FSR2 is not used by opting to move shader initialization outside of the FSR2Effect constructor, but maybe the delay when turning on the feature is not desired.

While the FSR version included in the PR mostly matches the public GitHub version, some changes have been made to better accommodate for Godot's needs.

TODO

  • MSAA seems to work with TAA but not FSR2. Probably not desired anyway but should be fixable.
  • Mark modified sections of FSR2 code with comment blocks.
  • Viewport constant for FSR2 is missing or not exposed
  • Extra newlines on shader files
  • Move FSR2 Context to forward clustered render buffers
  • Add extra get_texture_slice_view function for public-facing API
  • Fix validation errors when using SSS when using SSS or other separate specular effects.
  • Clear alpha to zero after opaque pass when using SSS or other separate specular effects (check specular_merge).
  • Verify Glow
  • Verify Auto exposure
  • Fix DOF


🍀 This work has been financed and kindly donated to the Godot Engine project by W4 Games. 🍀

@mrjustaguy
Copy link
Contributor

mrjustaguy commented Aug 31, 2023

My initial report on FSR 2.2 vs TAA

FSR 2.2 is not compatible with MSAA, as it breaks rendering at lower resolutions, though it is usable with Native resolution if TAA is also enabled (else it turns into a jittering mess)

Aside From that, Comparing with TAA, There is no Motion pixelization with FSR2, However static scenes with high frequency detail are a little less stable, It also seems to be properly resolving Transparencies unlike TAA which breaks with transparencies causing anything affected by transparent surfaces to go crazy (say Quad effects show the underlying quad moving with TAA, not so with FSR 2.2)

FSR 2.2 costs 2x as TAA, However looks significantly better IMHO and is well worth it, but All other Anti Aliasing should probably be forced to off when FSR 2.2 is on
One thing that FSR is not good for however is what it was intended for, Upscaling. It results in a highly pixelated look, especially around Alpha transparencies, and kinda feels like mixing Integer Scaling and FSR 1 with TAA, however these issues seem to be FSR 2 issues and not Godot's Implementation of it (Same issues have been observed in shipped games, Such as Star Wars Jedi Survivor)

@Saul2022
Copy link

as noted by mrj quality increases a lot with fsr 2.2, though performance on an igpu is pain xdd, though the flickering is gone unlike taa 1.0. some screenshots: at scaling quality at 0.5 the pixelated one is bilinear, the flicker fsr 1 and the quality one is on 2.2editor_screenshot_2023-08-31T175135editor_screenshot_2023-08-31T175122

editor_screenshot_2023-08-31T175038

@Calinou
Copy link
Member

Calinou commented Aug 31, 2023

Note that this PR includes changes to optimize motion vector generation (by deriving from the depth buffer when possible), which may improve performance substantially, even with traditional native TAA. Therefore, this PR might be enough to close #61905 (I haven't tested yet).

While the FSR version included in the PR mostly matches the public GitHub version, some changes have been made to better accommodate for Godot's needs.

I'd recommend adding // -- GODOT start -- and // -- GODOT end -- before and after modified sections in FSR2 code and shaders, so that updating these files is easier in the future. We do this in third-party libraries where we need to modify the code. cc @akien-mga

PS: The .patch file included is considered as a binary file in GitHub's web viewer, even though it's a text file. Maybe we need to fix something in .gitattributes.

@DarioSamo
Copy link
Contributor Author

DarioSamo commented Aug 31, 2023

Note that this PR includes changes to optimize motion vector generation (by deriving from the depth buffer when possible), which may improve performance substantially, even with traditional native TAA. Therefore, this PR might be enough to close #61905 (I haven't tested yet).

We'll probably see the biggest gains when the render list is sorted in a way that only renders static geometry first and dynamic geometry afterwards. This PR doesn't take care of that nor does it add that behavior to the existing TAA shader, but we can very well take advantage of the implementation here.

I'd recommend adding // -- GODOT start -- and // -- GODOT end -- before and after modified sections in FSR2 code and shaders, so that updating these files is easier in the future. We do this in third-party libraries where we need to modify the code.

Sounds good to me.

@ettiSurreal
Copy link
Contributor

Well someone has to ask about this.
FSR 3.0 is apparently supposed to come out as soon as this month, so any words on this?

@Calinou
Copy link
Member

Calinou commented Aug 31, 2023

Well someone has to ask about this. FSR 3.0 is apparently supposed to come out as soon as this month, so any words on this?

The FSR 3 code isn't available yet. This will likely be worked on after it's open sourced and has a Vulkan + GLSL port, but we can't give an ETA.

Copy link
Contributor

@BastiaanOlij BastiaanOlij left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of my remarks are more general ones, except for some of the FSR logic that ended up in RenderSceneBuffers, but either should be in RenderBufferDataForwardClustered or handled by the FSR effect itself like other effects do.

All in all this looks amazing!

@@ -732,6 +735,11 @@ void RendererSceneRenderRD::_render_buffers_debug_draw(const RenderDataRD *p_ren
}
}

if (debug_draw == RS::VIEWPORT_DEBUG_DRAW_INTERNAL_BUFFER) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably have made this remark earlier on, but we need a better name then internal buffer. Internal buffer is far to non-descript. Which internal buffer? We have dozens of internal buffers...

Copy link
Contributor Author

@DarioSamo DarioSamo Sep 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we should solve this in this particular PR? It matches the name of the internal texture on Render Buffers so that's pretty much why I used that.

Best alternatives I can think of is something like Color Buffer or related.

@Calinou
Copy link
Member

Calinou commented Sep 1, 2023

This fails to build with MinGW from Linux (Fedora 38) for me:

Compiling servers/rendering/renderer_rd/storage_rd/utilities.cpp ...
servers/rendering/renderer_rd/effects/fsr2.cpp: In function 'FfxErrorCode create_pipeline_rd(FfxFsr2Interface*, FfxFsr2Pass, const FfxPipelineDescription*, FfxPipelineState*)':
servers/rendering/renderer_rd/effects/fsr2.cpp:334:47: error: invalid conversion from 'const wchar_t*' to 'size_t' {aka 'long long unsigned int'} [-fpermissive]
  334 |                         wcsncpy(binding.name, L"r_dilated_motion_vectors", sizeof(binding.name) / sizeof(wchar_t));
      |                                               ^~~~~~~~~~~~~~~~~~~~~~~~~~~
      |                                               |
      |                                               const wchar_t*
servers/rendering/renderer_rd/effects/fsr2.cpp:334:97: error: invalid conversion from 'long long unsigned int' to 'const wchar_t*' [-fpermissive]
  334 |                         wcsncpy(binding.name, L"r_dilated_motion_vectors", sizeof(binding.name) / sizeof(wchar_t));
      |                                                                            ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~
      |                                                                                                 |
      |                                                                                                 long long unsigned int
servers/rendering/renderer_rd/effects/fsr2.cpp:334:32: error: too few arguments to function 'errno_t wcsncpy_s(wchar_t*, size_t, const wchar_t*, size_t)'
  334 |                         wcsncpy(binding.name, L"r_dilated_motion_vectors", sizeof(binding.name) / sizeof(wchar_t));
      |                                ^
In file included from /usr/x86_64-w64-mingw32/sys-root/mingw/include/wchar.h:1561,
                 from /usr/x86_64-w64-mingw32/sys-root/mingw/include/c++/cwchar:44,
                 from /usr/x86_64-w64-mingw32/sys-root/mingw/include/c++/bits/postypes.h:40,
                 from /usr/x86_64-w64-mingw32/sys-root/mingw/include/c++/iosfwd:40,
                 from /usr/x86_64-w64-mingw32/sys-root/mingw/include/c++/system_error:40,
                 from /usr/x86_64-w64-mingw32/sys-root/mingw/include/c++/mutex:41,
                 from ./core/os/mutex.h:37,
                 from ./servers/rendering/renderer_rd/shader_rd.h:34,
                 from ./servers/rendering/renderer_rd/shaders/effects/fsr2_accumulate_pass.glsl.gen.h:5,
                 from servers/rendering/renderer_rd/effects/fsr2.h:34,
                 from servers/rendering/renderer_rd/effects/fsr2.cpp:31:
/usr/x86_64-w64-mingw32/sys-root/mingw/include/sec_api/wchar_s.h:316:27: note: declared here
  316 |   _CRTIMP errno_t __cdecl wcsncpy_s(wchar_t *_Dst,size_t _DstSizeInChars,const wchar_t *_Src,size_t _MaxCount);
      |                           ^~~~~~~~~
servers/rendering/renderer_rd/effects/fsr2.cpp: In function 'FfxResource get_resource_rd(RID*, const wchar_t*)':
servers/rendering/renderer_rd/effects/fsr2.cpp:496:27: error: invalid conversion from 'const wchar_t*' to 'size_t' {aka 'long long unsigned int'} [-fpermissive]
  496 |         wcsncpy(res.name, name, sizeof(res.name) / sizeof(wchar_t));
      |                           ^~~~
      |                           |
      |                           const wchar_t*
servers/rendering/renderer_rd/effects/fsr2.cpp:496:50: error: invalid conversion from 'long long unsigned int' to 'const wchar_t*' [-fpermissive]
  496 |         wcsncpy(res.name, name, sizeof(res.name) / sizeof(wchar_t));
      |                                 ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~
      |                                                  |
      |                                                  long long unsigned int
servers/rendering/renderer_rd/effects/fsr2.cpp:496:16: error: too few arguments to function 'errno_t wcsncpy_s(wchar_t*, size_t, const wchar_t*, size_t)'
  496 |         wcsncpy(res.name, name, sizeof(res.name) / sizeof(wchar_t));
      |                ^
/usr/x86_64-w64-mingw32/sys-root/mingw/include/sec_api/wchar_s.h:316:27: note: declared here
  316 |   _CRTIMP errno_t __cdecl wcsncpy_s(wchar_t *_Dst,size_t _DstSizeInChars,const wchar_t *_Src,size_t _MaxCount);
      |                           ^~~~~~~~~
scons: *** [servers/rendering/renderer_rd/effects/fsr2.windows.editor.x86_64.o] Error 1
scons: building terminated because of errors.
[Time elapsed: 00:01:15.512]

Command line: scons -j1 platform=windows optimize=speed lto=full progress=no debug_symbols=yes

Also, the Viewport constant for FSR2 is missing or not exposed:

image

Benchmark

All tests done with this PR. I couldn't notice a performance difference with master compared to this PR when using native (or FSR 1) TAA.

Mode Native Quality Balanced Performance Ultra Performance
Bilinear (+ TAA) 270 FPS (3.70 mspf) 499 FPS (2.00 mspf) 566 FPS (1.76 mspf) 671 FPS (1.49 mspf) 875 FPS (1.14 mspf)
FSR 11 (+ TAA) N/A 453 FPS (2.20 mspf) 513 FPS (1.94 mspf) 599 FPS (1.66 mspf) 788 FPS (1.26 mspf)
FSR 2 229 FPS (4.36 mspf) 376 FPS (2.65 mspf) 420 FPS (2.38 mspf) 476 FPS (2.10 mspf) 585 FPS (1.70 mspf)

Binary size

Stripped Linux x86_64 optimize=speed lto=full editor build:

  • master: 104,858,720 bytes
  • This PR: 106,485,888 bytes (+ 1.62 MB)

Android APK size (release export template, arm64 only):

  • master: 19,065,334 bytes
  • This PR: 19,287,926 bytes (+ 223 KB)

Given the increase in binary size, we should look into excluding as much FSR2 code as possible on mobile/web platforms, as it won't be used there.

Visual comparison

Still scene

Output resolution: 3840×2160

TAA Native FSR2 Native
native fsr2_native
Bilinear Quality Bilinear Balanced Bilinear Performance Bilinear Ultra Performance
quality balanced performance ultra_performance
FSR1 Quality FSR1 Balanced FSR1 Performance FSR1 Ultra Performance
fsr1_quality fsr1_balanced fsr1_performance fsr1_ultra_performance
FSR2 Quality FSR2 Balanced FSR2 Performance FSR2 Ultra Performance
fsr2_quality fsr2_balanced fsr2_performance fsr2_ultra_performance

Scene in motion

Output resolution: 1920×1080

Camera is rotating towards the left. This means newly discovered pixels are on the left of the image. These are the ones to watch out for disocclusion artifacts 🙂

TAA Native FSR2 Native
motion_native_taa motion_fsr2_native
FSR2 Quality FSR2 Balanced FSR2 Performance
motion_fsr2_quality motion_fsr2_balanced motion_fsr2_performance

@DarioSamo I've noticed the disoccluded pixels look pin-sharp, almost like the image is using nearest-neighbor filtering before being upscaled. Is this expected? I recall most games looking softer with newly disoccluded pixels, even if fizzle does occur in more complex scenes.

Videos

All videos are in 1920×1080 (matching the rendering resolution used to record those videos), with high-quality 4:4:4 (full chroma) VP9 encoding. Each video is about 40 MB, so previews have been disabled.

Command used to record videos:

path/to/godot.binary --path /path/to/godot-reflection/ --resolution 1920x1080 --write-movie "$PWD/fsr2_balanced/input.png" --quit-after 367 -- --fsr2 --balanced --spin

ffmpeg -r 60 -i input%08d.png -c:v libvpx-vp9 -pix_fmt yuv444p -crf 15 fsr2_balanced.webm 

Footnotes

  1. Using same scale factors as FSR2 for easier comparison.

@mrjustaguy
Copy link
Contributor

there are some that don't look smooth and exhibit very similar FSR 2 behaviors, such as Star Wars Jedi Survivor mentioned above, though that is the only FSR 2 game where I've heard of such complaints so could be just bad implementation there too, and not an actual FSR 2 issue.

@DarioSamo
Copy link
Contributor Author

@DarioSamo I've noticed the disoccluded pixels look pin-sharp, almost like the image is using nearest-neighbor filtering before being upscaled. Is this expected? I recall most games looking softer with newly disoccluded pixels, even if fizzle does occur in more complex scenes.

That sounds about right to me. You can check the sample in https://github.com/GPUOpen-Effects/FidelityFX-FSR2 as well if you'd like a reference to compare to from AMD themselves, but I don't think anything in the implementation could cause that.

@Saul2022
Copy link

Saul2022 commented Sep 1, 2023

We'll probably see the biggest gains when the render list is sorted in a way that only renders static geometry first and dynamic geometry afterwards.

Yea though if you plan to add it will there be a sething where we choose if an object is static or dynamic ?, kind of similar to bastian pr, but for geometry.

@DarioSamo
Copy link
Contributor Author

DarioSamo commented Sep 1, 2023

Yea though if you plan to add it will there be a sething where we choose if an object is static or dynamic ?, kind of similar to bastian pr, but for geometry.

Nope, the plan would be to auto-detect it based on whether the element has moved or not.

@DarioSamo
Copy link
Contributor Author

Did all the changes that I marked on the TODO list as well as change the logic and possibly fixing that compilation error pointed out for MINGW.

@Saul2022
Copy link

Saul2022 commented Sep 1, 2023

Nope, the plan would be to auto-detect it based on whether the element has moved or not.

Alright, but it can have an effect even dissabling taa? I mostly ask, because it could fix this issue indirectly #73411 (comment)

@DarioSamo
Copy link
Contributor Author

No, there would be no effect when TAA or FSR2 isn't used as the performance degradation comes from writing motion vector information. If motion vectors aren't required, then there's no point in splitting the render lists.

@DarioSamo DarioSamo force-pushed the fsr2-rd branch 2 times, most recently from 594e4b3 to b1944df Compare September 4, 2023 14:27
@DarioSamo
Copy link
Contributor Author

DarioSamo commented Sep 4, 2023

I've added the optimization we talked about, a new type of render list for geometry that requires motion vectors will be used when FSR2 is used and motion is deemed as necessary. This should lead to significant savings in bandwidth since only geometry that moves will write motion vectors.

This improvement could be extended to the TAA shader once it's able to derive motion vectors from depth, but that'll probably be best separated into its own PR as there are some design changes that are required on the algorithm to support it correctly.

@DarioSamo DarioSamo force-pushed the fsr2-rd branch 2 times, most recently from 349b4df to 14ab457 Compare September 4, 2023 17:13
@DarioSamo DarioSamo marked this pull request as ready for review September 4, 2023 17:15
@DarioSamo DarioSamo requested a review from a team as a code owner September 4, 2023 17:15
@Calinou
Copy link
Member

Calinou commented Sep 12, 2023

Likewise DOF doesn't work at all with FSR 2 turned on

DoF in Godot is dependent on input resolution (i.e. before upscaling), given how the DoF strength is currently never scaled according to resolution. #57210 needs to be salvaged to prevent this.

@Rytelier
Copy link

There appear to be ghosting related to moving shaders which probably should be investigated.
I tested it on a "mirror" camera viewport shader which has ghosting when moving shader's camera.
image

@DarioSamo
Copy link
Contributor Author

DarioSamo commented Sep 15, 2023

There appear to be ghosting related to moving shaders which probably should be investigated. I tested it on a "mirror" camera viewport shader which has ghosting when moving shader's camera.

I'm not sure there's much to investigate here, it's not gonna grab motion vectors from a flat animated surface. Godot doesn't support sampling the motion vectors from your other render target.

The best we can hope for is to allow people to be able to output to the reactivity mask somehow, but that's not a feature that Godot currently has as a render target. I'm fairly sure you'd find the same problems with TAA as well.

An specific draw mode for surfaces like this might be a good idea, like proposed here #77523.

@DarioSamo
Copy link
Contributor Author

I've verified DOF is broken when using FSR2 as it was suggested on a comment before, will look into fixing it.

@DarioSamo
Copy link
Contributor Author

@mrjustaguy DoF should be working now, thanks for the report.

@Calinou
Copy link
Member

Calinou commented Sep 22, 2023

I've tested performance again on https://github.com/Calinou/godot-reflection/tree/fsr2-test with the latest revision of this PR (cf3fd3b1a). All tests done in 3840×2160 on a GeForce RTX 4090.

  • No AA: 334 FPS (3.00 mspf)
  • Native TAA: 264 FPS (3.79 mspf)
  • FSR2 native: 227 FPS (4.40 mspf)
  • FSR2 quality: 358 FPS (2.80 mspf)
  • FSR2 balanced: 405 FPS (2.47 mspf)
  • FSR2 performance: 459 FPS (2.18 mspf)

PS: Is #81350 incorporated in this PR? I've tested the latest revision of this PR, but mipmap bias isn't adjusted according to AMD's formula when FSR2 is enabled. This is noticeable if you decrease resolution scale all the way down near a checkerboard texture. Further manual changes of the Texture Mipmap Bias project settings aren't visible either.

@InitialCon
Copy link

Testing out this PR with Screen Space Reflections enabled and it seems that using FSR2 causes them to be even more jittery than when using the built-in taa. Is this a bug or just a downside of them being less blurry?
SSRBugFSR2
I'm using @Calinou's reflection test scene with normal settings.

@Calinou
Copy link
Member

Calinou commented Sep 22, 2023

Is this a bug or just a downside of them being less blurry?

I think this is just a consequence of SSR looking less blurry, making the jitter more obvious. SSR needs to be redesigned to take advantage of TAA jitter into account at some point.

@InitialCon
Copy link

Cool, just wanted to be sure in case it was a bug. I've seen the sdfgi lighting jittering a bit too but that might also be a downside of using a sharper taa solution.

@DarioSamo
Copy link
Contributor Author

PS: Is #81350 incorporated in this PR? I've tested the latest revision of this PR, but mipmap bias isn't adjusted according to AMD's formula when FSR2 is enabled. This is noticeable if you decrease resolution scale all the way down near a checkerboard texture. Further manual changes of the Texture Mipmap Bias project settings aren't visible either.

Nope it's not incorporated. Now that it's merged I'll rebase it based on that (although I'll go a bit less aggressive than AMD's formula by using -0.5 instead of -1 as it causes some visible jittering)

@DarioSamo
Copy link
Contributor Author

DarioSamo commented Sep 22, 2023

Rebased and now mipmap bias works properly. Texture quality should now be maintained better when using lower scales.

@mrjustaguy
Copy link
Contributor

I don't remember if this was mentioned anywhere before, so here goes:
What's the plan regarding Mesh LODing with FSR2 on? Are they planned to be always running as Native LODs with FSR2 or using current behavior (which shows Lower LODs in the reconstruction that can become quite apparent, not so with Bilinear/FSR1)

Copy link
Member

@clayjohn clayjohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me! I left one nitpick on an error message, but overall the code looks very good.

I tested locally with the TPS demo and tried various post FX (DoF, Glow, Auto exposure, etc) and it appears to work well.

Let's get this merged before the next dev release so we can get some testing before beta.

servers/rendering/renderer_viewport.cpp Outdated Show resolved Hide resolved
Copy link
Member

@clayjohn clayjohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be good to merge after a rebase to squash the commits

@DarioSamo
Copy link
Contributor Author

Done.

Introduces support for FSR2 as a new upscaler option available from the project settings. Also introduces an specific render list for surfaces that require motion and the ability to derive motion vectors from depth buffer and camera motion.
Copy link
Member

@akien-mga akien-mga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Third-party code integration looks good.

@akien-mga akien-mga merged commit cd39da2 into godotengine:master Sep 25, 2023
15 checks passed
@akien-mga
Copy link
Member

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Integrate AMD FSR 2 in the Vulkan renderer (currently, FSR 1.0 is supported)