Thesis Candidate #19

Open
wants to merge 14 commits into
base: master
from

Condzi
Owner

@Condzi Condzi commented Nov 11, 2023

No description provided.

Condzi added 14 commits June 23, 2023 14:33
* logf, check_, dbg_check_

* temp and perm memory

* add errf

* Baisc error handling, os_read_entire_file

* Move OS to own folder

* More file manipulation functions

* minor style stuff

* Update the docs with official information

* Test String_Builder declaration

* String_Builder

* os_get_app_uptime_as_string

* Delete log.txt

* exception handler w.i.p

* Fix compilation errors

* Debugging functions

* Tidy up OS code

* add pathf

* more pathf stuff

* to_string -> to_temp_string

* move code to base/

* notes

* vector math

* Matrix math for 2D

* print cppcheck version in pipeline

* notes

* continue on error for cppcheck
* Window module declaration

* add window procs

* add a note
* Draft of renderer interface

* DirectX devices init

* Immediate mode pipeline

* enable debug mode in DX11

* Cool shapes

* column_major for constants

* No need to zero the vs_out

* Use typedefs for ID3D types

* Request D3D 11.1

* formatting
* add ImGui source

* Extract the relevant backend files

* imgui integrated

* ignore 3rdparty code in cppcheck
* first config

* Update .clang-format

* Final config I think

* clang-format the code!
* add stb_image_write.h

* add write_png_or_panic

* gradient

* surface normals

* World of spheres

* Write also alpha channel to the final image

* Display the result in imgui window

* add RNG

* move some code to camera.cxx

* multisampling

* diffuse material

* Metal material

* fuzz

* dielectric material

* camera manipulation

* defocus blur (DoF)

* Final scene from the RT book pt. 1

* added multithreading + and preview

* ADd cmakelists

* optimizing tests

* inline operations in hit_sphere

* one thread per row approach

* remove asm listing

* plan update ;)

* remove old picture of ball
* move materials out of cpu_rt

* Add AABB

* Compute AABB for a sphere

* add_sphere function

* Add BVH

* Add BVH to RT

* Fix BVH early out

* Fixed version with better grouping

* Cache inverted ray direction

* Fix bug with pointer to spheres being invalidated

* wrong code

* update notes
* shorter hit_bvh recursive function

* huge improv by using a loop...

* Move ASM of iterative and recursive hit_bvh to notes/

* cleaner make_bvh

* Introduce spatial binning

* Fix 'off by one' error

* Find the best axis to split by

* Pack 4 spheres per leaf (wide bvh)

* Pack the data in arrays

* Use SIMD for early discriminant check

* Introduce make_ray  function

* Slightly improved ray_vs_aabb

* misc stuff

* add elapsed time info

* perf note

* update notes

* count rays per sec

* remove dead code

* Spinlock implementation for good kids
* It's ray tracing!

* extremes, not extremas

* Move Sphere to shapes.hxx

* add_padding_if_too_narrow

* Add Quad shape

* Add hit_quad

* Remove Sphere-only simd code

* Move World out of cpu_rt.cpp

* add create_world

* Refactor the code for supporting more shapes

* working quads

* Lights
* Introduce perm<T>

* uber material

* Update plan.md
* Update notes

* flat tree v1 (doesnt work)

* fix flattening

* replace unordered map with two arrays

* update plan
* Working CS pipeline

* Use indices instead of pointers to materials

* Working constants buffer

* RT world input to shader concept

* Working uploading to the gpu

* Add RNG, I guess

* Ray and Hit_Info

* Materials, ray_color

* first not working version

* polka dot xD

* still polka

* I'm insane

* better rand, use dedicated gpu

* fix alignment problem

* fix depth of field

* simplify ray_color

* try query when GPU finishes

* update plan

* Fix issue with incorrect reflections
* Disable GPU timeout on device creation.

* Handle cases where no quads or spheres are present on the scene

* Working SimpleLights scene.
* Update notes

* Use iterative version of ray_color in cpu_rt

* Move common code to base_rt

* Move common creation out of cpu_rt

* Camera parameters struct

* Cleaner camera organisation

* Camera options in GUI

* Update plan

* Add ability to select world

* update plan

* Don't error on different CWD

@medranSolus medranSolus left a comment

I haven't analyzed the RT shader that closely, but the general idea is to avoid branches. An if statement can cause execution divergence on the GPU: when threads in the same wave (SIMD execution unit) disagree on the condition, the wave has to evaluate both sides. Instead, it may be better to compute the final result by nullifying the unwanted parts of the computation, e.g. line 288:

// Before
if (can_refract && !reflectance_test) {
  direction = refract(unit_direction, hi.normal, refraction_ratio);
} else {
  direction = reflect(unit_direction, hi.normal);
}
// After
float cond = can_refract * (reflectance(cos_theta, refraction_ratio) <= random_f32(rand_seed));
direction = cond * refract(unit_direction, hi.normal, refraction_ratio) + abs(cond - 1) * reflect(unit_direction, hi.normal);

Still, it has to be measured; that's just a quick idea for turning branch statements into computations.
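To make the blending idea above concrete, here is a minimal CPU-side C++ sketch (not the shader itself). `Vec3`, `reflect_dir`, `refract_dir`, and the two `pick_direction_*` functions are hypothetical names used only for illustration; the reflect/refract formulas are the usual textbook ones and are assumed, not taken from this PR. The sketch only shows that a 0/1 weight selects the same vector as the branch; whether it is faster still has to be measured.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

inline Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline Vec3 operator*(float s, Vec3 v) { return {s * v.x, s * v.y, s * v.z}; }
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Mirror v around the unit normal n.
inline Vec3 reflect_dir(Vec3 v, Vec3 n) { return v - 2.0f * dot(v, n) * n; }

// Snell refraction of unit vector uv at unit normal n; eta is the ratio of indices.
// fabs() keeps the result finite under total internal reflection, which matters
// for blending: both sides are always computed, and 0 * NaN would still be NaN.
inline Vec3 refract_dir(Vec3 uv, Vec3 n, float eta) {
  assert(eta > 0.0f);
  float cos_theta = std::fmin(dot(-1.0f * uv, n), 1.0f);
  Vec3 r_perp = eta * (uv + cos_theta * n);
  Vec3 r_par = -std::sqrt(std::fabs(1.0f - dot(r_perp, r_perp))) * n;
  return r_perp + r_par;
}

// Branching version: only one side is evaluated.
inline Vec3 pick_direction_branch(bool refract_it, Vec3 d, Vec3 n, float eta) {
  if (refract_it) return refract_dir(d, n, eta);
  return reflect_dir(d, n);
}

// Blended version: both sides are evaluated, and a 0/1 weight selects one of them.
inline Vec3 pick_direction_blend(bool refract_it, Vec3 d, Vec3 n, float eta) {
  float w = refract_it ? 1.0f : 0.0f;
  return w * refract_dir(d, n, eta) + (1.0f - w) * reflect_dir(d, n);
}
```

Note the trade-off: the blended version always pays for both `refract_dir` and `reflect_dir`, so it only wins when divergence is more expensive than the extra arithmetic.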

Comment on lines +12 to +18
set(COMMON_COMPILER_FLAGS "-std:c++20 -diagnostics:column -WL -O2 -nologo -fp:fast -fp:except- -Gm- -GR- -EHa- -Zo -Oi -WX -W4 -wd4127 -wd4201 -wd4324 -FC -Z7 -GS-")
set(COMMON_COMPILER_FLAGS "${COMMON_COMPILER_FLAGS} -D_CRT_SECURE_NO_WARNINGS -DHANDMADE_INTERNAL=1 -DHANDMADE_SLOW=1 -DHANDMADE_WIN32=1")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${COMMON_COMPILER_FLAGS}")

# Linker Flags
set(COMMON_LINKER_FLAGS "-STACK:0x100000,0x100000 -incremental:no -opt:ref -profile")
set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} ${COMMON_LINKER_FLAGS}")

Instead of passing flags like that, consider using the following functions (depending on whether a single-target or a project-wide setting fits better):

When using the target-specific versions, they must appear after the target (e.g. the executable) has been declared.

Instead of adding the whole ImGui repo to the project, you could set it up as a Git submodule. This way you would have only a single reference to the 3rd-party project in your repository, and updating it would take a single command.

- [ ] Rendering gifs? (Simple moving camera would be cool)
- [ ] priority based dielectrics
- [ ] use iGPU for ImGui and stuff, and dedicated for RT so it won't hang

Not every PC will have a second GPU. The best way here would be to use async compute, but it's only available in newer APIs. For DX11 you could still run ImGui at the start of the frame and then the compute workload for RT (provided the driver makes use of an async path). But then you still have to present the final image, so synchronization is inevitable.

Comment on lines +11 to +17
# Printf / Sprintf
Because libc is bad at handling Unicode, I thought about rewriting these functions.
Main ideas:
- use templates to get rid of format specifiers, so the call may look like:
`tprint("Value of '%' is %.", "my_number", 123);`
- no need to write own formatting for everything, just handle our String separately
and forward other arguments to libc sprintf.
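
The template idea in the notes above could be sketched roughly as follows. This is only an illustration under stated assumptions: `tformat` and `tformat_into` are hypothetical names, the sketch returns a `std::string` instead of printing, and it streams arguments through `std::ostringstream` rather than forwarding them to libc `sprintf` as the notes propose. It also has no escape for a literal `%` and no special handling for the project's own String type.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <string_view>

// Base case: no arguments left, so copy the rest of the format string verbatim.
inline void tformat_into(std::ostringstream& out, std::string_view fmt) {
  out << fmt;
}

// Recursive case: emit text up to the next '%', then the next argument.
template <typename T, typename... Rest>
void tformat_into(std::ostringstream& out, std::string_view fmt,
                  const T& first, const Rest&... rest) {
  const std::size_t pos = fmt.find('%');
  if (pos == std::string_view::npos) {
    // More arguments than placeholders: the extra arguments are dropped.
    out << fmt;
    return;
  }
  out << fmt.substr(0, pos) << first;
  tformat_into(out, fmt.substr(pos + 1), rest...);
}

// Replaces each '%' in fmt with the next argument, in order.
template <typename... Args>
std::string tformat(std::string_view fmt, const Args&... args) {
  std::ostringstream out;
  tformat_into(out, fmt, args...);
  return out.str();
}
```

With this sketch, `tformat("Value of '%' is %.", "my_number", 123)` yields `Value of 'my_number' is 123.`; placeholders without a matching argument are left in the output as-is.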

You could also abandon wide-character "Unicode" strings (UTF-16 on Windows), which are awkward to work with anyway, and use UTF-8 only: it can encode the same characters and can be stored in the same manner as a normal std::string. A good read about that: UTF-8 Everywhere.
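
As a small illustration of the point about storing UTF-8 in a plain std::string, here is a hedged sketch. `utf8_code_points` is a hypothetical helper, and it assumes the input is already valid UTF-8 (it does no validation).

```cpp
#include <cstddef>
#include <string>

// Counts Unicode code points in a UTF-8 encoded std::string.
// Every code point starts with a byte that is NOT of the form 10xxxxxx,
// so counting the non-continuation bytes gives the number of code points.
inline std::size_t utf8_code_points(const std::string& s) {
  std::size_t count = 0;
  for (unsigned char c : s) {
    if ((c & 0xC0) != 0x80) ++count;
  }
  return count;
}
```

For example, the Polish word "zażółć" written as explicit bytes, `"za\xC5\xBC\xC3\xB3\xC5\x82\xC4\x87"`, occupies 10 bytes in the string but contains 6 code points. `size()` reports bytes, so any code that needs character counts or cursor positions has to be UTF-8 aware.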

Comment on lines +32 to +34
This is problematic if we strive for interactivity. GPU waits until the CS finishes.
The only way to avoid that is to use command buffers from DX12 :(. Or use two GPUs,
which may work for me.

DX12 command lists are only for recording commands; they still have to be submitted to the GPU, and that is where the hang in responsiveness will happen. If you were to use DX12, you could retain responsiveness in the following manner: the main GPU queue is used for rendering to the window (ImGui, presents, etc.) and an async compute queue is used for RT happening across frames. To see updates, you could copy the image that is being worked on to show progress in the main queue. That would still be a bit hard, though, without a way to control RT passes, since you can't copy from a resource that is being written to at the same time (data race). But if you are doing your RT in steps, then it's achievable to copy the current image between steps to a texture that will be presented in the main queue.
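
The "render in steps, copy between steps" pattern can be sketched on the CPU as below. This is only an analogue with hypothetical names (`Progressive_Image`, `render_step`, `publish`, `snapshot`); it uses no D3D12 API, and a mutex stands in for whatever GPU synchronization (e.g. fences) a real implementation would need.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

class Progressive_Image {
public:
  explicit Progressive_Image(std::size_t pixel_count)
      : working_(pixel_count, 0.0f), presented_(pixel_count, 0.0f) {
    assert(pixel_count > 0);
  }

  // One RT step: accumulate one more sample per pixel into the working image.
  // Only the compute side touches working_, so no lock is needed here.
  void render_step(float sample_value) {
    for (float& p : working_) p += sample_value;
    ++steps_done_;
  }

  // Compute side, between steps: copy the averaged result to the presented image.
  void publish() {
    if (steps_done_ == 0) return;  // nothing rendered yet
    std::lock_guard<std::mutex> lock(mutex_);
    for (std::size_t i = 0; i < working_.size(); ++i) {
      presented_[i] = working_[i] / static_cast<float>(steps_done_);
    }
  }

  // Presenting side: read a consistent copy, never the image being written.
  std::vector<float> snapshot() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return presented_;
  }

private:
  std::vector<float> working_;    // written by the RT steps
  std::vector<float> presented_;  // read by the UI / present path
  std::size_t steps_done_ = 0;
  mutable std::mutex mutex_;
};
```

The presenting side only ever sees images from completed steps, which is the property the copy-between-steps approach is meant to provide.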

Comment on lines +61 to +63
#pragma comment(lib, "d3d11.lib") // direct3D library
#pragma comment(lib, "dxgi.lib") // directx graphics interface
#pragma comment(lib, "d3dcompiler.lib") // shader compiler

It would be more consistent to move them to CMakeLists.txt, e.g. with target_link_libraries.

It may be worth considering something like DirectXMath; this way you could speed up the vector math by using SSE.
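
To give a rough idea of what SSE buys here, the sketch below computes four dot products at once from a structure-of-arrays layout. It uses raw SSE intrinsics rather than DirectXMath, and `Vec3x4` and `dot4` are hypothetical names, not types from this PR. A scalar fallback is included for targets without SSE.

```cpp
#include <cstddef>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_HAS_SSE 1
#else
#define SKETCH_HAS_SSE 0
#endif

// Four 3D vectors stored as structure-of-arrays: all x values, all y values, all z values.
struct Vec3x4 {
  float x[4];
  float y[4];
  float z[4];
};

// Computes out[i] = dot(a[i], b[i]) for i in 0..3.
inline void dot4(const Vec3x4& a, const Vec3x4& b, float out[4]) {
#if SKETCH_HAS_SSE
  // Each multiply/add below processes four lanes in a single instruction.
  __m128 xx = _mm_mul_ps(_mm_loadu_ps(a.x), _mm_loadu_ps(b.x));
  __m128 yy = _mm_mul_ps(_mm_loadu_ps(a.y), _mm_loadu_ps(b.y));
  __m128 zz = _mm_mul_ps(_mm_loadu_ps(a.z), _mm_loadu_ps(b.z));
  _mm_storeu_ps(out, _mm_add_ps(_mm_add_ps(xx, yy), zz));
#else
  // Scalar fallback with the same result.
  for (std::size_t i = 0; i < 4; ++i) {
    out[i] = a.x[i] * b.x[i] + a.y[i] * b.y[i] + a.z[i] * b.z[i];
  }
#endif
}
```

A library such as DirectXMath wraps this kind of code behind vector types, so the actual gain depends on how the data is laid out and should be measured.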

Spinlocks are good enough for really short critical sections (measuring needed), but for the most part std::shared_mutex is sufficient, since a sleeping thread frees up the CPU for other, more demanding work. Or you could take a look at the native WinAPI locks if you prefer that kind of implementation.
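
For comparison, a minimal sketch of both options follows. `Spinlock` and `count_with_lock` are hypothetical names (this is not the spinlock from the PR), and `std::mutex` stands in for the blocking lock here because the example only needs exclusive access. Both types satisfy the same lock/unlock interface, so swapping one for the other while measuring is cheap.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Busy-waiting lock: cheap when the critical section is tiny, wasteful otherwise.
class Spinlock {
public:
  void lock() {
    while (flag_.test_and_set(std::memory_order_acquire)) {
      std::this_thread::yield();  // let other threads run while we wait
    }
  }
  void unlock() { flag_.clear(std::memory_order_release); }

private:
  std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Increments a shared counter from several threads, guarded by the given lock type.
template <typename Lock>
long count_with_lock(int thread_count, int increments_per_thread) {
  assert(thread_count > 0);
  Lock lock;
  long counter = 0;
  std::vector<std::thread> threads;
  for (int t = 0; t < thread_count; ++t) {
    threads.emplace_back([&] {
      for (int i = 0; i < increments_per_thread; ++i) {
        std::lock_guard<Lock> guard(lock);
        ++counter;
      }
    });
  }
  for (std::thread& th : threads) th.join();
  return counter;
}
```

Calling `count_with_lock<Spinlock>(4, 10000)` and `count_with_lock<std::mutex>(4, 10000)` should both return 40000; timing the two calls on the real workload is what decides which lock is appropriate.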

Comment on lines +12 to +26
using IDevice = ::ID3D11Device;
using IDeviceContext = ::ID3D11DeviceContext;
using ISwapChain = ::IDXGISwapChain;
using IRenderTargetView = ::ID3D11RenderTargetView;
using ITexture2D = ::ID3D11Texture2D;
using IRasterizerState = ::ID3D11RasterizerState;
using IComputeShader = ::ID3D11ComputeShader;
using IVertexShader = ::ID3D11VertexShader;
using IPixelShader = ::ID3D11PixelShader;
using IInputLayout = ::ID3D11InputLayout;
using IBuffer = ::ID3D11Buffer;
using IBlob = ::ID3DBlob;
using IUnorderedAccessView = ::ID3D11UnorderedAccessView;
using IShaderResourceView = ::ID3D11ShaderResourceView;
using IQuery = ::ID3D11Query;

In DX11 you can always use the newest interfaces available, since it's an older API. This gives you more ways to accomplish a task, or another idea of how to approach a problem.

do { \
if (x) { \
x->Release(); \
x = NULL; \

nullptr provides better type safety than NULL in C++.
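
A short sketch of why this matters, together with a function-template alternative to the release macro. `Fake_Com_Object`, `safe_release`, and `takes_pointer` are hypothetical names for illustration; the fake object only mimics the `Release()` call of a COM interface.

```cpp
// Hypothetical stand-in for a COM object: Release() just counts its calls.
struct Fake_Com_Object {
  int* release_count;
  void Release() { ++*release_count; }
};

// Template alternative to a release macro: releases the object and nulls the pointer.
template <typename T>
void safe_release(T*& p) {
  if (p != nullptr) {
    p->Release();
    p = nullptr;
  }
}

// Two overloads that show the type-safety difference.
inline bool takes_pointer(int) { return false; }
inline bool takes_pointer(void*) { return true; }

// takes_pointer(nullptr) selects the void* overload, because nullptr has its own
// type (std::nullptr_t) that converts only to pointer types.
// takes_pointer(0) selects the int overload, and takes_pointer(NULL) may do the
// same or be ambiguous, depending on how the implementation defines NULL.
```

Calling `safe_release` twice on the same pointer is harmless: the second call sees `nullptr` and does nothing.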

2 participants