Surfel-based Global Illumination in Dust Engine #14
Neo-Zhixing
Current state on Global Illumination
Dust Engine currently supports multi-bounce diffuse lighting. Specular materials and emissive voxels are not yet supported.
Lighting passes
Test case
All timings were obtained on an NVIDIA RTX 3060 Ti.
This was captured on this commit. Total frame time is 9.03ms.
Primary Rays (GBuffer): 2.36ms
Cast rays from the camera directly into the scene. We don't have jittering or TAA just yet.
At 2.36ms, there isn't much point in rasterizing anything at all. Virtualized geometry techniques like Nanite will likely offer little to no gain, and you'd have to keep two copies of the same geometry in memory.
Our GBuffer contains:
Final Gather: 2.23ms (ambient occlusion) + 0.94ms (final gather)
The final gather pass runs at full resolution, and it's done in two stages.
Ambient Occlusion
First, we trace ambient occlusion rays from the primary ray hitpoints with a cosine distribution. Those rays traverse geometry at the finest LOD to capture ambient occlusion caused by small-scale geometric details. They have a maximum traversal distance of 8.0, meaning that they will trigger the miss shader if they don't hit anything within the distance of 8 voxels.
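As a rough illustration of the ray setup described above, here is a minimal CPU-side sketch in Rust. It is not the engine's shader code: the `Ray` struct, `make_ao_ray`, and `to_world` are hypothetical names, and the sampling uses the standard "sample a disk, project onto the hemisphere" construction. Only the cosine distribution and the 8.0 maximum distance come from the text.

```rust
use std::f32::consts::PI;

/// Maximum ambient occlusion ray length in voxels (the 8.0 from the text).
pub const AO_MAX_DISTANCE: f32 = 8.0;

/// Hypothetical ray description, for illustration only.
pub struct Ray {
    pub origin: [f32; 3],
    pub direction: [f32; 3],
    pub t_max: f32,
}

/// Maps two uniform random numbers in [0, 1) to a cosine-distributed
/// direction in a local frame where +Z is the surface normal.
pub fn cosine_sample_hemisphere(u1: f32, u2: f32) -> [f32; 3] {
    let r = u1.sqrt(); // radius on the unit disk
    let phi = 2.0 * PI * u2; // angle on the unit disk
    // Lift the disk sample onto the hemisphere.
    [r * phi.cos(), r * phi.sin(), (1.0 - u1).max(0.0).sqrt()]
}

fn cross(a: [f32; 3], b: [f32; 3]) -> [f32; 3] {
    [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]
}

/// Rotates a local-frame direction so that +Z lines up with the unit `normal`.
pub fn to_world(local: [f32; 3], normal: [f32; 3]) -> [f32; 3] {
    // Pick a helper axis that is not parallel to the normal.
    let helper = if normal[0].abs() > 0.9 { [0.0, 1.0, 0.0] } else { [1.0, 0.0, 0.0] };
    let t = cross(helper, normal);
    let len = (t[0] * t[0] + t[1] * t[1] + t[2] * t[2]).sqrt();
    let t = [t[0] / len, t[1] / len, t[2] / len]; // tangent
    let b = cross(normal, t); // bitangent
    [
        local[0] * t[0] + local[1] * b[0] + local[2] * normal[0],
        local[0] * t[1] + local[1] * b[1] + local[2] * normal[1],
        local[0] * t[2] + local[1] * b[2] + local[2] * normal[2],
    ]
}

/// Builds an ambient occlusion ray at a primary hit point.
pub fn make_ao_ray(hit_point: [f32; 3], normal: [f32; 3], u1: f32, u2: f32) -> Ray {
    Ray {
        origin: hit_point,
        direction: to_world(cosine_sample_hemisphere(u1, u2), normal),
        t_max: AO_MAX_DISTANCE, // a ray that travels this far counts as a miss
    }
}
```

With a cosine distribution the cosine term of the diffuse integral cancels against the sampling density, so each ray's visibility can be averaged without extra weighting.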
We do not run a closest-hit shader for ambient occlusion rays at this time. This biases the rendered image darker. In the future, we can try to retrieve the irradiance at secondary hit points from the previous frame's image.
Profiling shows this pass is slower than expected, so doing some screen-space tracing before falling back to the world-space acceleration structure should help here. As an optimization, we can also sample the previous frame's denoised image when the hit point is visible on screen.
Final Gather
For ambient occlusion rays that missed the geometry, we continue their path with the actual final gather rays.
In this pass, we intersect the rays with the rough geometry instead, where each block represents 4x4x4 voxels in world space. Notably, when an ambient occlusion ray terminates, we do not launch the final gather ray unless all voxels in the current 4x4x4 block are empty. This is to prevent light leaking.
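The light-leak guard can be pictured with a small sketch. Everything here is an assumption made for illustration: `CoarseGrid` stores one 64-bit occupancy mask per 4x4x4 block in a `HashMap`, which is not necessarily how the engine lays out its data, and `may_launch_final_gather` is a hypothetical helper name.

```rust
use std::collections::HashMap;

/// Edge length of a coarse block in voxels (4x4x4 in the text).
pub const BLOCK_SIZE: i32 = 4;

/// Hypothetical coarse grid: block coordinate -> 64-bit occupancy mask,
/// one bit per voxel inside the 4x4x4 block.
#[derive(Default)]
pub struct CoarseGrid {
    blocks: HashMap<[i32; 3], u64>,
}

impl CoarseGrid {
    /// Splits a voxel coordinate into its block coordinate and bit index.
    /// Euclidean division keeps negative coordinates in the right block.
    fn split(voxel: [i32; 3]) -> ([i32; 3], u32) {
        let block = [
            voxel[0].div_euclid(BLOCK_SIZE),
            voxel[1].div_euclid(BLOCK_SIZE),
            voxel[2].div_euclid(BLOCK_SIZE),
        ];
        let local = [
            voxel[0].rem_euclid(BLOCK_SIZE),
            voxel[1].rem_euclid(BLOCK_SIZE),
            voxel[2].rem_euclid(BLOCK_SIZE),
        ];
        let bit = (local[0] + local[1] * 4 + local[2] * 16) as u32;
        (block, bit)
    }

    /// Marks a single voxel as solid.
    pub fn set_voxel(&mut self, voxel: [i32; 3]) {
        let (block, bit) = Self::split(voxel);
        *self.blocks.entry(block).or_insert(0) |= 1u64 << bit;
    }

    /// True when every voxel in the block containing `voxel` is empty.
    pub fn block_is_empty(&self, voxel: [i32; 3]) -> bool {
        let (block, _) = Self::split(voxel);
        self.blocks.get(&block).map_or(true, |mask| *mask == 0)
    }
}

/// Decides whether a final gather ray may start where an AO ray ended.
/// Starting inside a partially filled block could let the coarse ray begin
/// behind fine geometry, which is the light leak the text guards against.
pub fn may_launch_final_gather(grid: &CoarseGrid, ao_end_voxel: [i32; 3]) -> bool {
    grid.block_is_empty(ao_end_voxel)
}
```

For example, if voxel (5, 0, 0) is solid, an ambient occlusion ray ending at (4, 3, 3) sits in the same block and is not continued, while one ending at (3, 0, 0) is.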
When final gather rays miss the rough geometry, we sample the sky dome using the Hosek-Wilkie sky model. If they hit something, we sample the spatial hash instead, and then stochastically schedule that surfel patch to be sampled.
Surfels: 0.73ms
We schedule surfel patches to be sampled by writing an entry into a buffer of SURFEL_RAY_BUDGET entries. SURFEL_RAY_BUDGET is a constant we can adjust based on the performance of the GPU. The timing shown above is for SURFEL_RAY_BUDGET = 720*480.
Each frame, we launch SURFEL_RAY_BUDGET surfel rays from the surfel patches into the scene with a cosine distribution. The surfel rays intersect with the rough geometry; they sample the spatial hash on hit and the sky dome on miss. The sample gets injected into the spatial hash entry of the origin surfel. Additionally, the surfel patch that was hit can also get scheduled stochastically. This is how we get multi-bounce.
One potential optimization here is sampling the previous frame's denoised image when the surfel patches happen to be on screen.
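To make the budgeted scheduling more concrete, here is a minimal CPU-side sketch. It is not the engine's buffer layout: `SurfelSchedule`, `try_schedule`, the per-request probability, and storing a plain `u32` patch key per slot are all assumptions for illustration. On the GPU the slot would more likely be claimed through an atomic counter.

```rust
/// Ray budget used for the timing quoted in the text.
pub const SURFEL_RAY_BUDGET: usize = 720 * 480;

/// Hypothetical schedule buffer: a fixed number of slots, each holding the
/// key of a surfel patch that should shoot a ray in the next surfel pass.
pub struct SurfelSchedule {
    slots: Vec<u32>,
    budget: usize,
}

impl SurfelSchedule {
    pub fn new(budget: usize) -> Self {
        Self { slots: Vec::with_capacity(budget), budget }
    }

    /// Schedules `patch_key` with the given probability.
    /// `random_bits` is any per-ray random u32 (for example a hashed pixel
    /// index). Returns true if the request was written into the buffer.
    pub fn try_schedule(&mut self, patch_key: u32, random_bits: u32, probability: f32) -> bool {
        // Turn the top 24 bits into a uniform float in [0, 1).
        let u = (random_bits >> 8) as f32 / (1u32 << 24) as f32;
        if u >= probability || self.slots.len() >= self.budget {
            return false; // rejected by the coin flip, or the budget is spent
        }
        self.slots.push(patch_key);
        true
    }

    /// Patches scheduled so far, in the order they were accepted.
    pub fn scheduled(&self) -> &[u32] {
        &self.slots
    }

    /// Resets the buffer at the start of a frame.
    pub fn clear(&mut self) {
        self.slots.clear();
    }
}
```

Because both final gather hits and surfel ray hits can call something like `try_schedule`, a patch lit only indirectly still gets rays of its own in a later frame, which matches the multi-bounce behaviour described above.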
Spatial Hash
The spatial hash is keyed based on the surfel patch position and orientation.
Like GI-1.0, we leverage the study by Jarzynski and Olano 2020 and pick two fast hash functions, pcg and xxhash32, that were found to produce little to no collision between one another. One of the hashes is used to index into the buffer and the other is used as the fingerprint. We store the fingerprint (instead of the whole key) in the spatial hash entry.
When reading the hash map, we perform linear probing for up to 3 entries using the fingerprint. When writing the hash map, we evict the entry that was least recently used. Note that we do not perform synchronization on this write, and we basically say "hehe" to those hazards. As long as the radiance value is a single u32, we should be mostly fine.
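The lookup scheme can be sketched on the CPU as follows. This is a single-threaded illustration under several assumptions, not the engine's shader: the key is taken as an already packed `u32`, the `occupied` flag and the per-entry `last_used` frame stamp are hypothetical fields, and the hash constants are the commonly circulated single-word versions of pcg and xxhash32, which should be checked against the paper before reuse.

```rust
/// Number of slots probed linearly on read and write (3 in the text).
pub const MAX_PROBES: usize = 3;

/// PCG hash, used here to pick the starting slot.
pub fn pcg(v: u32) -> u32 {
    let state = v.wrapping_mul(747796405).wrapping_add(2891336453);
    let word = ((state >> ((state >> 28) + 4)) ^ state).wrapping_mul(277803737);
    (word >> 22) ^ word
}

/// Single-word xxhash32, used here as the fingerprint.
pub fn xxhash32(p: u32) -> u32 {
    const PRIME32_2: u32 = 2246822519;
    const PRIME32_3: u32 = 3266489917;
    const PRIME32_4: u32 = 668265263;
    const PRIME32_5: u32 = 374761393;
    let mut h = p.wrapping_add(PRIME32_5);
    h = PRIME32_4.wrapping_mul(h.rotate_left(17));
    h = PRIME32_2.wrapping_mul(h ^ (h >> 15));
    h = PRIME32_3.wrapping_mul(h ^ (h >> 13));
    h ^ (h >> 16)
}

/// Hypothetical entry layout: fingerprint instead of the full key.
#[derive(Clone, Copy, Default)]
struct Entry {
    occupied: bool,
    fingerprint: u32,
    radiance: u32,  // packed radiance, a single u32 as in the text
    last_used: u32, // frame index of the last read or write
}

pub struct SpatialHash {
    entries: Vec<Entry>,
}

impl SpatialHash {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0);
        Self { entries: vec![Entry::default(); capacity] }
    }

    fn slot(&self, key: u32, probe: usize) -> usize {
        (pcg(key) as usize).wrapping_add(probe) % self.entries.len()
    }

    /// Probes up to MAX_PROBES slots for a matching fingerprint.
    pub fn read(&mut self, key: u32, frame: u32) -> Option<u32> {
        let fingerprint = xxhash32(key);
        for probe in 0..MAX_PROBES {
            let i = self.slot(key, probe);
            let e = &mut self.entries[i];
            if e.occupied && e.fingerprint == fingerprint {
                e.last_used = frame; // mark as recently used
                return Some(e.radiance);
            }
        }
        None
    }

    /// Updates a matching entry; otherwise takes an empty probed slot, or
    /// evicts the least recently used entry among the probed slots.
    pub fn write(&mut self, key: u32, radiance: u32, frame: u32) {
        let fingerprint = xxhash32(key);
        let mut victim = self.slot(key, 0);
        for probe in 0..MAX_PROBES {
            let i = self.slot(key, probe);
            let e = self.entries[i];
            if e.occupied && e.fingerprint == fingerprint {
                victim = i;
                break;
            }
            let v = self.entries[victim];
            if v.occupied && (!e.occupied || e.last_used < v.last_used) {
                victim = i;
            }
        }
        self.entries[victim] = Entry { occupied: true, fingerprint, radiance, last_used: frame };
    }
}
```

Storing only the fingerprint keeps each entry small, at the cost that two keys landing in the same probe window with the same fingerprint would be treated as one surfel patch. On the GPU the write is unsynchronized, as noted above, so a racing write could pair a fingerprint with another patch's radiance; this sketch does not model that.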
Performance Analysis
Nsight Tracing
Trace file: tracing.zip
Analysis and Next Steps