feat: add deferred rendering #264

Merged on May 1, 2024 (142 commits)
Changes from 105 commits
Commits
3e91760
feat: semi-deferred rendering
doodlum Mar 20, 2024
1742e4e
refactor: cleanup setting targets
doodlum Mar 20, 2024
154bfe4
chore: ae hook
doodlum Mar 20, 2024
2e45afa
feat: full albedo, improved snow shader support
doodlum Mar 20, 2024
2160765
feat: deferred direct and ambient light
doodlum Mar 21, 2024
9a6be79
feat: octahedron normal encoding
doodlum Mar 21, 2024
5450891
fix: increased normal precision
doodlum Mar 21, 2024
65d8180
feat: normal mapping shadows
doodlum Mar 22, 2024
16f42da
Merge pull request #227 from doodlum/dev
doodlum Mar 22, 2024
ff45f01
feat: blur normal mapping shadows with groupshared
doodlum Mar 22, 2024
c162055
Merge branch 'deferred2' of https://github.com/doodlum/skyrim-communi…
doodlum Mar 22, 2024
aabacd8
feat: separated normal mapping shadows pass
doodlum Mar 22, 2024
d8e9e66
chore: optimised sampling and formats
doodlum Mar 22, 2024
df524b5
chore: further optimisation
doodlum Mar 22, 2024
f0d3a1c
fix: depth distance
doodlum Mar 22, 2024
06d5700
fix: disabling CS ingame
doodlum Mar 22, 2024
f297fc7
feat: use alpha channel of render targets
doodlum Mar 22, 2024
10c4271
fix: envmapping with deferred
doodlum Mar 23, 2024
0af9dc8
fix: output color
doodlum Mar 23, 2024
5c960e3
feat: add initial VR support
alandtse Mar 23, 2024
e16e651
Merge pull request #228 from alandtse/deferred2
doodlum Mar 23, 2024
ae5beff
feat: moved screenspace shadows to deferred
doodlum Mar 23, 2024
e6b647e
fix: fix VR hook for shadowmask (#229)
alandtse Mar 24, 2024
9e586ae
chore: screenspace shadows performance tweaks
doodlum Mar 24, 2024
2bacd5e
Merge branch 'deferred2' of https://github.com/doodlum/skyrim-communi…
doodlum Mar 24, 2024
7001446
feat: wip porting screen shadows to deferred
doodlum Mar 25, 2024
f380885
feat: bend studios shadows
doodlum Mar 25, 2024
56e90db
feat: naive bilateral blur and optimisations
doodlum Mar 26, 2024
daf3b41
Delete FilterCS2.hlsl
doodlum Mar 26, 2024
4067e12
fix: removed unused filtering
doodlum Mar 26, 2024
1e96520
fix: missing ae hook
doodlum Mar 27, 2024
c20db12
feat: fixed double sided mesh shading
doodlum Mar 27, 2024
e02b8dd
chore: more optimal wave size
doodlum Mar 27, 2024
4436737
chore: halve kernel radius for performance
doodlum Mar 27, 2024
cc3613e
chore: revise shadow settings
doodlum Mar 27, 2024
8d372af
chore: optimisations, cleanup and moved NMS out of Bindings
doodlum Mar 27, 2024
9633b48
chore: removed dirlight shadows from deferred
doodlum Mar 27, 2024
95973b9
fix: black horizon
doodlum Mar 27, 2024
068c498
chore: cleanup cleaning up of bindings
doodlum Mar 27, 2024
0b9f943
chore: optimised blur + fixes
doodlum Mar 27, 2024
ea5f9b2
chore: optimise normal mapping shadows
doodlum Mar 27, 2024
4e37a20
fix: black grass
doodlum Mar 27, 2024
a95f084
fix: tree lod normals
doodlum Mar 27, 2024
2bc7471
fix: vegetation detection
doodlum Mar 27, 2024
d3dcd11
fix: ignore first-person
doodlum Mar 27, 2024
a29c3bf
chore: adjust default shadows settings
doodlum Mar 28, 2024
00ddda5
fix: multi-layer parallax deferred
doodlum Mar 30, 2024
35e6bd3
feat: add ssgi and terrain occlusion (#255)
Pentalimbed Apr 8, 2024
774d49e
build: fix compilation errors in terrainocclusion (#256)
alandtse Apr 8, 2024
6f42980
Merge branch 'deferred-shadows' into devmerge
doodlum Apr 8, 2024
be1768a
Merge pull request #263 from doodlum/devmerge
alandtse Apr 9, 2024
747c0db
style: 🎨 apply clang-format changes
alandtse Apr 9, 2024
40dab64
fix: fix warning `Ternary error with non scalar` (#262)
alandtse Apr 9, 2024
8de8946
refactor: simplify VR code and structures (#261)
alandtse Apr 9, 2024
68cd4d4
revert: "fix: fix warning `Ternary error with non scalar`" (#265)
alandtse Apr 9, 2024
13ba95a
fix: fix compilation errors
alandtse Apr 9, 2024
7aedd3e
refactor: use state version of render members
alandtse Apr 9, 2024
35ae2fd
fix: convert Albedo to Diffuse
alandtse Apr 9, 2024
2534b54
fix: fix compilation errors
alandtse Apr 9, 2024
9177ca6
Merge branch 'dev' of https://github.com/doodlum/skyrim-community-sha…
alandtse Apr 9, 2024
f3e1885
Merge pull request #266 from alandtse/deferred-shadows
alandtse Apr 9, 2024
8ec4a55
style: 🎨 apply clang-format changes
alandtse Apr 9, 2024
87d031c
feat: interior directional light
doodlum Apr 9, 2024
c86f92b
fix: disable vertex fix on foliage
doodlum Apr 9, 2024
1457572
fix: compiler warning with SSGI
doodlum Apr 9, 2024
801c3fa
style: 🎨 apply clang-format changes
doodlum Apr 9, 2024
8e243b1
feat: ported SSS to deferred
doodlum Apr 9, 2024
29f2f78
Merge branch 'deferred-shadows' of https://github.com/doodlum/skyrim-…
doodlum Apr 9, 2024
02db753
style: 🎨 apply clang-format changes
doodlum Apr 9, 2024
6eb9855
fix: consistent padding in shaders for older cards
Pentalimbed Apr 8, 2024
5ecf259
fix: foliage shadows
doodlum Apr 9, 2024
84ad90e
style: 🎨 apply clang-format changes
doodlum Apr 9, 2024
6cc6d40
Merge pull request #268 from Pentalimbed/deferred-shadows-staging
doodlum Apr 9, 2024
cdc6ce3
chore: update to latest CLIB-NG
FlayaN Apr 9, 2024
d64a56e
Merge pull request #269 from FlayaN/deferred-shadows
doodlum Apr 9, 2024
c9acb7b
feat: variable rate shading (#270)
doodlum Apr 10, 2024
f9b3889
style: 🎨 apply clang-format changes
doodlum Apr 10, 2024
e93214a
fix: VRS merge
doodlum Apr 10, 2024
c4455cb
Merge branch 'deferred-shadows' of https://github.com/doodlum/skyrim-…
doodlum Apr 10, 2024
286cac8
fix: fix vrs merge
FlayaN Apr 10, 2024
ee36ee7
Merge pull request #271 from FlayaN/deferred-shadows
doodlum Apr 10, 2024
56e72c8
fix: foliage mask
doodlum Apr 11, 2024
37dcace
style: 🎨 apply clang-format changes
doodlum Apr 11, 2024
de8fdd9
fix: black tree LOD
doodlum Apr 11, 2024
d2a6370
Merge branch 'deferred-shadows' of https://github.com/doodlum/skyrim-…
doodlum Apr 11, 2024
cea644c
style: 🎨 apply clang-format changes
doodlum Apr 11, 2024
18b32bc
fix: disable VRS on water
doodlum Apr 11, 2024
1cb9304
Merge branch 'deferred-shadows' of https://github.com/doodlum/skyrim-…
doodlum Apr 11, 2024
5feeaec
fix: fix common VR includes
FlayaN Apr 11, 2024
3f3f09e
Merge pull request #274 from FlayaN/deferred-shadows
doodlum Apr 11, 2024
ee1580b
refactor: Rename newly added hlsl without entry to hlsli
FlayaN Apr 11, 2024
4371b83
fix: depth linearisation
Pentalimbed Apr 11, 2024
19a2e24
Merge pull request #275 from FlayaN/deferred-shadows
doodlum Apr 11, 2024
9c10d4b
chore: soft shadow as per RDR2
Pentalimbed Apr 11, 2024
c7adbbc
fix: remove cloud shadows macros from rungrass and distanttree
Pentalimbed Apr 11, 2024
ea7726e
Merge pull request #276 from Pentalimbed/deferred-shadows-staging
doodlum Apr 11, 2024
e3a81ed
fix: use GET_INSTANCE_MEMBER macro to get proper state runtime
FlayaN Apr 11, 2024
15b2883
Merge pull request #277 from FlayaN/deferred-shadows
doodlum Apr 11, 2024
342501e
Merge branch 'deferred-shadows' of https://github.com/doodlum/skyrim-…
doodlum Apr 11, 2024
e00e7f0
chore: rename to deferred, VRS in interiors only
doodlum Apr 11, 2024
4517a82
style: 🎨 apply clang-format changes
doodlum Apr 11, 2024
17a46b5
fix: SSS blur direction
doodlum Apr 12, 2024
cfb5b71
Merge branch 'deferred-shadows' of https://github.com/doodlum/skyrim-…
doodlum Apr 12, 2024
7bca006
chore: reduce VRS threshold
doodlum Apr 12, 2024
fe87428
chore: removed tree LOD lighting feature
doodlum Apr 12, 2024
e93a1b7
feat: port deferred DistantTree to VR
FlayaN Apr 12, 2024
f67bd2e
feat: add deferred screen space shadows for VR
FlayaN Apr 12, 2024
5fa4454
Merge pull request #278 from FlayaN/vr-tree-deferred-shadows
doodlum Apr 12, 2024
e46ea99
Merge pull request #279 from FlayaN/vr-screen-space-shadows-deferred-…
doodlum Apr 12, 2024
b2942e9
fix: check for compute shader file existence (#280)
alandtse Apr 13, 2024
33fed01
fix: undo delete of ComputeShadingRate.hlsl
FlayaN Apr 13, 2024
2ab9a53
Merge pull request #282 from FlayaN/deferred-shadows
doodlum Apr 13, 2024
71dd8c3
fix: add back GetScreenDepth to RaymarchCS
FlayaN Apr 13, 2024
4450bf9
Merge pull request #283 from FlayaN/deferred-shadows
doodlum Apr 13, 2024
250e88b
feat: grass lighting deferred support
doodlum Apr 14, 2024
42adc52
style: 🎨 apply clang-format changes
doodlum Apr 14, 2024
c2dfb1a
feat: screenspace shadow VR support
doodlum Apr 14, 2024
3c3f42e
Merge branch 'deferred-shadows' of https://github.com/doodlum/skyrim-…
doodlum Apr 14, 2024
e26b97b
style: 🎨 apply clang-format changes
doodlum Apr 14, 2024
567b9ad
fix: shader errors and warnings
FlayaN Apr 14, 2024
ca0946e
Merge pull request #284 from FlayaN/deferred-shadows
alandtse Apr 14, 2024
ef34330
feat: effect shader deferred support
doodlum Apr 16, 2024
5f2dbf5
style: 🎨 apply clang-format changes
doodlum Apr 16, 2024
8186947
fix: effect shader compile issue
doodlum Apr 17, 2024
4a6ccd7
style: 🎨 apply clang-format changes
doodlum Apr 17, 2024
bca262d
chore: fix hlsl warnings
alandtse Apr 18, 2024
50d604a
fix: add hlsl include guards
alandtse Apr 18, 2024
d87b47c
chore: match alphatest naming
alandtse Apr 18, 2024
b85290c
feat: re effect for VR
alandtse Apr 18, 2024
02394eb
feat: re effect shader for VR (#286)
alandtse Apr 18, 2024
11b0820
chore: minor changes (#287)
Pentalimbed Apr 18, 2024
d9e7e18
Merge branch 'deferred-shadows' of https://github.com/doodlum/skyrim-…
alandtse Apr 21, 2024
dc32de9
fix: fix compilation errors
alandtse Apr 21, 2024
8671204
feat: re effect for VR
alandtse Apr 18, 2024
8815d59
Merge branch 'deferred-shadows' of https://github.com/alandtse/skyrim…
alandtse Apr 29, 2024
c2b10fc
Merge branch 'dev' of https://github.com/doodlum/skyrim-community-sha…
alandtse Apr 29, 2024
a879ab5
Merge pull request #297 from alandtse/deferred-shadows
alandtse Apr 30, 2024
679b237
fix: fix VR normal space shadows
alandtse Apr 30, 2024
2a1b1a3
chore: correct parameter description
alandtse Apr 30, 2024
f661f58
build: bump commonlibng
alandtse Apr 30, 2024
837e786
Merge branch 'dev' of https://github.com/doodlum/skyrim-community-sha…
alandtse Apr 30, 2024
ac95b04
Merge pull request #298 from alandtse/deferred-shadows
alandtse Apr 30, 2024
3 changes: 3 additions & 0 deletions .gitmodules
@@ -1,3 +1,6 @@
[submodule "extern/CommonLibSSE-NG"]
path = extern/CommonLibSSE-NG
url = https://github.com/alandtse/CommonLibVR.git
[submodule "extern/NVAPI"]
path = extern/NVAPI
url = https://github.com/NVIDIA/nvapi.git
6 changes: 6 additions & 0 deletions CMakeLists.txt
@@ -40,11 +40,16 @@ find_package(pystring CONFIG REQUIRED)
find_package(cppwinrt CONFIG REQUIRED)
find_package(unordered_dense CONFIG REQUIRED)
find_package(efsw CONFIG REQUIRED)

set(NVAPI_INCLUDE_DIR "${CMAKE_SOURCE_DIR}/extern/nvapi/" CACHE STRING "Path to NVAPI include headers/shaders" )
set(NVAPI_LIBRARY "${CMAKE_SOURCE_DIR}/extern/nvapi/amd64/nvapi64.lib" CACHE STRING "Path to NVAPI .lib file")

target_include_directories(
${PROJECT_NAME}
PRIVATE
${BSHOSHANY_THREAD_POOL_INCLUDE_DIRS}
${CLIB_UTIL_INCLUDE_DIRS}
${NVAPI_INCLUDE_DIR}
)

target_link_libraries(
@@ -63,6 +68,7 @@ target_link_libraries(
pystring::pystring
unordered_dense::unordered_dense
efsw::efsw
${NVAPI_LIBRARY}
)

# https://gitlab.kitware.com/cmake/cmake/-/issues/24922#note_1371990
1 change: 1 addition & 0 deletions extern/NVAPI
Submodule NVAPI added at 3a83ef
@@ -1,20 +1,23 @@
#include "../Common/DeferredShared.hlsli"
#include "../Common/VR.hlsli"

struct PerPassCloudShadow
{
uint EnableCloudShadows;

float CloudHeight;
float PlanetRadius;

float EffectMix;

float TransparencyPower;
float AbsorptionAmbient;

float RcpHPlusR;
};

- StructuredBuffer<PerPassCloudShadow> perPassCloudShadow : register(t23);
- TextureCube<float4> cloudShadows : register(t40);
+ StructuredBuffer<PerPassCloudShadow> perPassCloudShadow : register(t0);
+ TextureCube<float4> cloudShadows : register(t1);
+ Texture2D<unorm half> TexDepth : register(t2);

+ RWTexture2D<unorm float> RWTexShadowMask : register(u0);

+ SamplerState defaultSampler;

float3 getCloudShadowSampleDir(float3 rel_pos, float3 eye_to_sun)
{
@@ -38,13 +41,40 @@ float3 getCloudShadowSampleDirFlatEarth(float3 rel_pos, float3 eye_to_sun)
return v;
}

- float3 getCloudShadowMult(float3 rel_pos, float3 eye_to_sun, SamplerState samp)
+ float3 getCloudShadowMult(float3 rel_pos, float3 eye_to_sun)
{
// float3 cloudSampleDir = getCloudShadowSampleDirFlatEarth(rel_pos, eye_to_sun).xyz;
float3 cloudSampleDir = getCloudShadowSampleDir(rel_pos, eye_to_sun).xyz;

- float4 cloudCubeSample = cloudShadows.Sample(samp, cloudSampleDir);
+ float4 cloudCubeSample = cloudShadows.SampleLevel(defaultSampler, cloudSampleDir, 0); // TODO Sample in pixel shader
float alpha = pow(saturate(cloudCubeSample.w), perPassCloudShadow[0].TransparencyPower);

return lerp(1.0, 1.0 - alpha, perPassCloudShadow[0].EffectMix);
}

[numthreads(32, 32, 1)] void main(uint2 dtid
: SV_DispatchThreadID) {
float2 uv = (dtid + .5) * RcpBufferDim;
#ifdef VR
const uint eyeIndex = uv.x > .5;
#else
const uint eyeIndex = 0;
#endif

float3 ndc = float3(ConvertToStereoUV(uv, eyeIndex), 1);
ndc = ndc * 2 - 1;
ndc.y = -ndc.y;
ndc.z = TexDepth[dtid];

if (ndc.z > 0.9999)
return;

float4 worldPos = mul(InvViewMatrix[eyeIndex], mul(InvProjMatrix[eyeIndex], float4(ndc, 1)));
worldPos.xyz /= worldPos.w;

float3 dirLightDirWS = mul((float3x3)InvViewMatrix[eyeIndex], DirLightDirectionVS[eyeIndex].xyz);
float cloudShadow = getCloudShadowMult(worldPos.xyz, dirLightDirWS);

half shadow = RWTexShadowMask[dtid];
RWTexShadowMask[dtid] = shadow * cloudShadow;
}
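
The blend in `getCloudShadowMult` and the final write to the shadow mask can be checked off the GPU. Below is a minimal CPU-side sketch in C++, assuming the alpha channel of the cube-map sample is already available as a plain float. `CloudShadowMult` and `ApplyToShadowMask` are hypothetical helper names for illustration and are not part of this PR; the `[0, 1]` range check on `effectMix` is also an assumption of the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical CPU reference for the blend in getCloudShadowMult.
// cloudAlpha stands in for cloudCubeSample.w; the other parameters mirror
// the TransparencyPower and EffectMix fields of PerPassCloudShadow.
inline float CloudShadowMult(float cloudAlpha, float transparencyPower, float effectMix)
{
    // Assumption of this sketch only: the mix factor is a fraction.
    assert(effectMix >= 0.0f && effectMix <= 1.0f);

    // saturate(): clamp the sampled alpha to [0, 1].
    const float a = std::min(std::max(cloudAlpha, 0.0f), 1.0f);
    const float alpha = std::pow(a, transparencyPower);

    // lerp(1.0, 1.0 - alpha, effectMix) written out as a + (b - a) * t.
    return 1.0f + ((1.0f - alpha) - 1.0f) * effectMix;
}

// The compute shader multiplies the existing mask value by the cloud term.
inline float ApplyToShadowMask(float shadow, float cloudShadow)
{
    return shadow * cloudShadow;
}
```

With `effectMix` at 0 the multiplier is always 1, so the mask passes through unchanged; with `effectMix` at 1 a fully opaque cloud sample drives the mask to 0.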
16 changes: 0 additions & 16 deletions features/Grass Lighting/Shaders/RunGrass.hlsl
@@ -313,10 +313,6 @@ float3x3 CalculateTBN(float3 N, float3 p, float2 uv)
# include "DynamicCubemaps/DynamicCubemaps.hlsli"
# endif

# if defined(CLOUD_SHADOWS)
# include "CloudShadows/CloudShadows.hlsli"
# endif

PS_OUTPUT main(PS_INPUT input, bool frontFace
: SV_IsFrontFace)
{
@@ -389,14 +385,6 @@ PS_OUTPUT main(PS_INPUT input, bool frontFace
dirLightColor *= SunlightScale;
}

# if defined(CLOUD_SHADOWS)
float3 cloudShadowMult = 1.0;
if (perPassCloudShadow[0].EnableCloudShadows && !lightingData[0].Reflections) {
cloudShadowMult = getCloudShadowMult(input.WorldPosition.xyz, DirLightDirection.xyz, SampColorSampler);
dirLightColor *= cloudShadowMult;
}
# endif

dirLightColor *= shadowColor.x;

# if defined(SCREEN_SPACE_SHADOWS)
@@ -480,10 +468,6 @@ PS_OUTPUT main(PS_INPUT input, bool frontFace
# endif

float3 directionalAmbientColor = mul(DirectionalAmbient, float4(worldNormal.xyz, 1));
# if defined(CLOUD_SHADOWS)
if (perPassCloudShadow[0].EnableCloudShadows && !lightingData[0].Reflections)
directionalAmbientColor *= lerp(1.0, cloudShadowMult, perPassCloudShadow[0].AbsorptionAmbient);
# endif
lightsDiffuseColor += directionalAmbientColor;

diffuseColor += lightsDiffuseColor;
2 changes: 2 additions & 0 deletions features/Screen Space GI/Shaders/Features/ScreenSpaceGI.ini
@@ -0,0 +1,2 @@
[Info]
Version = 2-9-0
198 changes: 198 additions & 0 deletions features/Screen Space GI/Shaders/ScreenSpaceGI/common.hlsli
@@ -0,0 +1,198 @@
///////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
// Copyright (C) 2016-2021, Intel Corporation
//
// SPDX-License-Identifier: MIT
///////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
//
// XeGTAO is based on GTAO/GTSO "Jimenez et al. / Practical Real-Time Strategies for Accurate Indirect Occlusion",
// https://www.activision.com/cdn/research/Practical_Real_Time_Strategies_for_Accurate_Indirect_Occlusion_NEW%20VERSION_COLOR.pdf
//
// Implementation: Filip Strugar (filip.strugar@intel.com), Steve Mccalla <stephen.mccalla@intel.com> (\_/)
// Version: (see XeGTAO.h) (='.'=)
// Details: https://github.com/GameTechDev/XeGTAO (")_(")
//
///////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

// with additional edits by FiveLimbedCat/ProfJack

#ifndef SSGI_COMMON
#define SSGI_COMMON

#ifndef USE_HALF_FLOAT_PRECISION
# define USE_HALF_FLOAT_PRECISION 1
#endif

#if (USE_HALF_FLOAT_PRECISION != 0)
# if 1 // old fp16 approach (<SM6.2)
typedef min16float lpfloat;
typedef min16float2 lpfloat2;
typedef min16float3 lpfloat3;
typedef min16float4 lpfloat4;
typedef min16float3x3 lpfloat3x3;
# else // new fp16 approach (requires SM6.2 and -enable-16bit-types) - WARNING: perf degradation noticed on some HW, while the old (min16float) path is mostly at least a minor perf gain so this is more useful for quality testing
typedef float16_t lpfloat;
typedef float16_t2 lpfloat2;
typedef float16_t3 lpfloat3;
typedef float16_t4 lpfloat4;
typedef float16_t3x3 lpfloat3x3;
# endif
#else
typedef float lpfloat;
typedef float2 lpfloat2;
typedef float3 lpfloat3;
typedef float4 lpfloat4;
typedef float3x3 lpfloat3x3;
#endif

///////////////////////////////////////////////////////////////////////////////

#include "../Common/DeferredShared.hlsli"

cbuffer SSGICB : register(b1)
{
float4x4 PrevInvViewMat[2];
float4 NDCToViewMul;
float4 NDCToViewAdd;
float4 NDCToViewMul_x_PixelSize;

float2 FrameDim;
float2 RcpFrameDim;
uint FrameIndex;

uint NumSlices;
uint NumSteps;
float DepthMIPSamplingOffset;

float EffectRadius;
float EffectFalloffRange;
float ThinOccluderCompensation;
float Thickness;
float2 DepthFadeRange;
float DepthFadeScaleConst;

float BackfaceStrength;
float GIBounceFade;
float GIDistanceCompensation;
float GICompensationMaxDist;

float AOPower;
float GIStrength;

float DepthDisocclusion;
uint MaxAccumFrames;

float pad;
};

SamplerState samplerPointClamp : register(s0);
SamplerState samplerLinearClamp : register(s1);

///////////////////////////////////////////////////////////////////////////////

#ifdef HALF_RES
const static float res_scale = .5;
# define READ_DEPTH(tex, px) tex.Load(int3(px, 1))
# define FULLRES_LOAD(tex, px, uv, samp) tex.SampleLevel(samp, uv, 0)
#else
const static float res_scale = 1.;
# define READ_DEPTH(tex, px) tex[px]
# define FULLRES_LOAD(tex, px, uv, samp) tex[px]
#endif

#ifdef VR
# define GET_EYE_IDX(uv) (uv.x > 0.5)
#else
# define GET_EYE_IDX(uv) (0)
#endif

///////////////////////////////////////////////////////////////////////////////

#define ISNAN(x) (!(x < 0.f || x > 0.f || x == 0.f))

// http://h14s.p5r.org/2012/09/0x5f3759df.html, [Drobot2014a] Low Level Optimizations for GCN, https://blog.selfshadow.com/publications/s2016-shading-course/activision/s2016_pbs_activision_occlusion.pdf slide 63
lpfloat FastSqrt(float x)
{
return (lpfloat)(asfloat(0x1fbd1df5 + (asint(x) >> 1)));
}

// input [-1, 1] and output [0, PI], from https://seblagarde.wordpress.com/2014/12/01/inverse-trigonometric-functions-gpu-optimization-for-amd-gcn-architecture/
lpfloat FastACos(lpfloat inX)
{
const lpfloat PI = 3.141593;
const lpfloat HALF_PI = 1.570796;
lpfloat x = abs(inX);
lpfloat res = -0.156583 * x + HALF_PI;
res *= FastSqrt(1.0 - x);
return (inX >= 0) ? res : PI - res;
}
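
The accuracy of `FastSqrt` and `FastACos` is easy to probe on the CPU. The following is a rough C++ port, a sketch rather than the shader itself: it uses 32-bit `float` throughout (no `min16float`), and `AsInt`/`AsFloat` are hypothetical stand-ins for the HLSL `asint`/`asfloat` intrinsics, implemented with `std::memcpy`.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Stand-ins for HLSL asint()/asfloat(): reinterpret the bits, no conversion.
inline std::int32_t AsInt(float f)
{
    std::int32_t i;
    std::memcpy(&i, &f, sizeof i);
    return i;
}

inline float AsFloat(std::int32_t i)
{
    float f;
    std::memcpy(&f, &i, sizeof f);
    return f;
}

// Halving the float's bit pattern roughly halves its exponent; the magic
// constant re-centres the exponent bias. Intended for x >= 0 only.
inline float FastSqrt(float x)
{
    assert(x >= 0.0f);
    return AsFloat(0x1fbd1df5 + (AsInt(x) >> 1));
}

// Input in [-1, 1], output in [0, PI]: a linear term times sqrt(1 - |x|),
// mirrored around PI / 2 for negative inputs.
inline float FastACos(float inX)
{
    assert(inX >= -1.0f && inX <= 1.0f);
    const float PI = 3.141593f;
    const float HALF_PI = 1.570796f;
    const float x = std::fabs(inX);
    float res = -0.156583f * x + HALF_PI;
    res *= FastSqrt(1.0f - x);
    return (inX >= 0.0f) ? res : PI - res;
}
```

Comparing against `std::sqrt` and `std::acos` at a few points shows errors on the order of a few percent, which is the trade-off the approximation makes for speed.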

///////////////////////////////////////////////////////////////////////////////

// Inputs are screen XY and viewspace depth, output is viewspace position
float3 ScreenToViewPosition(const float2 screenPos, const float viewspaceDepth, const uint eyeIndex)
{
const float2 _mul = eyeIndex == 0 ? NDCToViewMul.xy : NDCToViewMul.zw;
const float2 _add = eyeIndex == 0 ? NDCToViewAdd.xy : NDCToViewAdd.zw;

float3 ret;
ret.xy = (_mul * screenPos.xy + _add) * viewspaceDepth;
ret.z = viewspaceDepth;
return ret;
}

float ScreenToViewDepth(const float screenDepth)
{
return (CameraData.w / (-screenDepth * CameraData.z + CameraData.x));
}

float3 ViewToWorldPosition(const float3 pos, const float4x4 invView)
{
float4 worldpos = mul(invView, float4(pos, 1));
return worldpos.xyz / worldpos.w;
}

float3 ViewToWorldVector(const float3 vec, const float4x4 invView)
{
return mul((float3x3)invView, vec);
}

///////////////////////////////////////////////////////////////////////////////

// "Efficiently building a matrix to rotate one vector to another"
// http://cs.brown.edu/research/pubs/pdfs/1999/Moller-1999-EBA.pdf / https://dl.acm.org/doi/10.1080/10867651.1999.10487509
// (using https://github.com/assimp/assimp/blob/master/include/assimp/matrix3x3.inl#L275 as a code reference as it seems to be best)
lpfloat3x3 RotFromToMatrix(lpfloat3 from, lpfloat3 to)
{
const lpfloat e = dot(from, to);
const lpfloat f = abs(e); //(e < 0)? -e:e;

// WARNING: This has not been tested/worked through, especially not for 16bit floats; seems to work in our special use case (from is always {0, 0, -1}) but wouldn't use it in general
if (f > lpfloat(1.0 - 0.0003))
return lpfloat3x3(1, 0, 0, 0, 1, 0, 0, 0, 1);

const lpfloat3 v = cross(from, to);
/* ... use this hand optimized version (9 mults less) */
const lpfloat h = (1.0) / (1.0 + e); /* optimization by Gottfried Chen */
const lpfloat hvx = h * v.x;
const lpfloat hvz = h * v.z;
const lpfloat hvxy = hvx * v.y;
const lpfloat hvxz = hvx * v.z;
const lpfloat hvyz = hvz * v.y;

lpfloat3x3 mtx;
mtx[0][0] = e + hvx * v.x;
mtx[0][1] = hvxy - v.z;
mtx[0][2] = hvxz + v.y;

mtx[1][0] = hvxy + v.z;
mtx[1][1] = e + h * v.y * v.y;
mtx[1][2] = hvyz - v.x;

mtx[2][0] = hvxz - v.y;
mtx[2][1] = hvyz + v.x;
mtx[2][2] = e + hvz * v.z;

return mtx;
}

#endif
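
Since the comment in `RotFromToMatrix` warns that the routine has not been fully worked through, a small CPU-side check may be useful. The C++ sketch below mirrors the shader's arithmetic in 32-bit `float` and treats vectors as columns; `Vec3`, `Mat3`, `Dot`, `Cross`, `Mul` and `RotFromTo` are hypothetical names for illustration, and both inputs are assumed to be unit length.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<float, 3>;
using Mat3 = std::array<Vec3, 3>;  // row-major: m[row][col]

inline float Dot(const Vec3& a, const Vec3& b)
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

inline Vec3 Cross(const Vec3& a, const Vec3& b)
{
    return Vec3{ a[1] * b[2] - a[2] * b[1],
                 a[2] * b[0] - a[0] * b[2],
                 a[0] * b[1] - a[1] * b[0] };
}

// Matrix times column vector.
inline Vec3 Mul(const Mat3& m, const Vec3& v)
{
    return Vec3{ Dot(m[0], v), Dot(m[1], v), Dot(m[2], v) };
}

inline Mat3 RotFromTo(const Vec3& from, const Vec3& to)
{
    // Assumption of this sketch: both inputs are normalized.
    assert(std::fabs(Dot(from, from) - 1.0f) < 1e-3f);
    assert(std::fabs(Dot(to, to) - 1.0f) < 1e-3f);

    const float e = Dot(from, to);  // cosine of the angle between the vectors

    // Same early-out as the shader. Note it also fires for nearly opposite
    // vectors, where the identity is not a true from-to rotation.
    if (std::fabs(e) > 1.0f - 0.0003f) {
        return Mat3{ Vec3{ 1.0f, 0.0f, 0.0f },
                     Vec3{ 0.0f, 1.0f, 0.0f },
                     Vec3{ 0.0f, 0.0f, 1.0f } };
    }

    const Vec3 v = Cross(from, to);
    const float h = 1.0f / (1.0f + e);
    const float hvx = h * v[0];
    const float hvz = h * v[2];
    const float hvxy = hvx * v[1];
    const float hvxz = hvx * v[2];
    const float hvyz = hvz * v[1];

    // e * I + skew(v) + h * v * v^T, laid out exactly as in the shader.
    Mat3 m;
    m[0] = Vec3{ e + hvx * v[0], hvxy - v[2], hvxz + v[1] };
    m[1] = Vec3{ hvxy + v[2], e + h * v[1] * v[1], hvyz - v[0] };
    m[2] = Vec3{ hvxz - v[1], hvyz + v[0], e + hvz * v[2] };
    return m;
}
```

For unit vectors that are not nearly parallel or opposite, `Mul(RotFromTo(from, to), from)` should reproduce `to` up to rounding, which gives a quick regression check for the hand-optimized form.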