4x~9x VBO generating/uploading performance improvement #1733

Open
wants to merge 1 commit into
base: master

a1batross
Member

While looking into the R_GenerateVBO function, I noticed that it wastes a lot of time checking the same data again and again. After reordering the loops so that useless surfaces are excluded first, level loading times improved significantly.
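
The general idea of excluding useless surfaces before the nested loops can be sketched as follows. This is only an illustration under assumptions, not the engine's code: `surf_t`, its `drawable` flag, and both function names are hypothetical, and the real R_GenerateVBO works on the engine's own surface structures. Both functions return how many surface checks they performed, so the two loop orders can be compared.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified surface record; the engine's real structure differs. */
typedef struct {
    int texture;   /* texture index */
    int lightmap;  /* lightmap index */
    int numverts;  /* vertex count of the surface */
    int drawable;  /* assumed flag: 0 means the surface never goes into a VBO */
} surf_t;

/* Naive order: every (lightmap, texture) pair rescans all surfaces and
   re-checks the drawable flag each time. Returns the number of checks. */
size_t scan_naive(const surf_t *s, size_t n, int numtex, int numlm, int *verts_out)
{
    size_t checks = 0;
    int verts = 0;

    for (int lm = 0; lm < numlm; lm++)
        for (int t = 0; t < numtex; t++)
            for (size_t i = 0; i < n; i++) {
                checks++;
                if (!s[i].drawable) continue;
                if (s[i].texture != t || s[i].lightmap != lm) continue;
                verts += s[i].numverts;
            }

    *verts_out = verts;
    return checks;
}

/* Reordered: useless surfaces are filtered out once, so the nested loops
   only visit the kept ones. 'kept' is caller-provided scratch space for
   at least n indices. Returns the number of checks. */
size_t scan_filtered(const surf_t *s, size_t n, int numtex, int numlm,
                     size_t *kept, int *verts_out)
{
    size_t checks = 0, nkept = 0;
    int verts = 0;

    assert(kept != NULL);

    for (size_t i = 0; i < n; i++) {
        checks++;
        if (s[i].drawable) kept[nkept++] = i;
    }

    for (int lm = 0; lm < numlm; lm++)
        for (int t = 0; t < numtex; t++)
            for (size_t k = 0; k < nkept; k++) {
                const surf_t *surf = &s[kept[k]];
                checks++;
                if (surf->texture != t || surf->lightmap != lm) continue;
                verts += surf->numverts;
            }

    *verts_out = verts;
    return checks;
}
```

Both variants collect the same vertices; the filtered one simply stops paying for surfaces that could never match, which matters more as the number of texture and lightmap combinations grows.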

I didn't notice any regressions, and FPS unexpectedly improved as well, though within the margin of error. Comparing the logs on heavy maps like ad_sepulcher.bsp shows that the VBOs got reordered. I'm not sure what could have caused it, so I'm asking @mittorn for review, since he probably knows his own code better.

All comparisons were made on an AMD Ryzen 2600X and an NVIDIA GeForce GTX 1070, with the engine compiled with the default build configuration plus bsp2 support.

Attaching the ad_sepulcher.bsp loading log before and after the patch:

before
[22:19:16] Note: R_GenerateVBO: allocated array of 65532 verts, texture 16, lm 0
[22:19:16] Note: R_GenerateVBO: allocated array of 65533 verts, texture 30, lm 0
[22:19:16] Note: R_GenerateVBO: allocated array of 65531 verts, texture 39, lm 0
[22:19:16] Note: R_GenerateVBO: allocated array of 65533 verts, texture 76, lm 0
[22:19:16] Note: R_GenerateVBO: allocated array of 65532 verts, texture 6, lm 1
[22:19:16] Note: R_GenerateVBO: allocated array of 65535 verts, texture 17, lm 1
[22:19:16] Note: R_GenerateVBO: allocated array of 65535 verts, texture 36, lm 1
[22:19:16] Note: R_GenerateVBO: allocated array of 65534 verts, texture 58, lm 1
[22:19:17] Note: R_GenerateVBO: allocated array of 65533 verts, texture 160, lm 1
[22:19:17] Note: R_GenerateVBO: allocated array of 65532 verts, texture 13, lm 2
[22:19:17] Note: R_GenerateVBO: allocated array of 65531 verts, texture 30, lm 2
[22:19:17] Note: R_GenerateVBO: allocated array of 65534 verts, texture 94, lm 2
[22:19:17] Note: R_GenerateVBO: allocated array of 26935 verts in 0.821 seconds
[22:19:18] Note: R_GenerateVBO: uploaded VBOs in 0.883 seconds, 1.7 seconds total
after
[22:16:31] Note: R_GenerateVBO: allocated array of 65535 verts, texture 107, lm 0
[22:16:31] Note: R_GenerateVBO: allocated array of 65533 verts, texture 17, lm 0
[22:16:31] Note: R_GenerateVBO: allocated array of 65535 verts, texture 13, lm 0
[22:16:31] Note: R_GenerateVBO: allocated array of 65532 verts, texture 41, lm 0
[22:16:31] Note: R_GenerateVBO: allocated array of 65532 verts, texture 53, lm 1
[22:16:31] Note: R_GenerateVBO: allocated array of 65534 verts, texture 30, lm 1
[22:16:31] Note: R_GenerateVBO: allocated array of 65532 verts, texture 152, lm 1
[22:16:31] Note: R_GenerateVBO: allocated array of 65533 verts, texture 13, lm 1
[22:16:31] Note: R_GenerateVBO: allocated array of 65533 verts, texture 35, lm 1
[22:16:31] Note: R_GenerateVBO: allocated array of 65535 verts, texture 53, lm 1
[22:16:31] Note: R_GenerateVBO: allocated array of 65535 verts, texture 108, lm 2
[22:16:31] Note: R_GenerateVBO: allocated array of 65534 verts, texture 13, lm 2
[22:16:31] Note: R_GenerateVBO: allocated array of 65534 verts, texture 19, lm 2
[22:16:31] Note: R_GenerateVBO: allocated array of 26926 verts in 0.139 seconds
[22:16:31] Note: R_GenerateVBO: uploaded VBOs in 0.0522 seconds, 0.191 seconds total

FPS before and after the patch: 137-140 and 140-145.

On a smaller map like disposal.bsp, 0.586 and 0.0231 seconds were spent on VBO generation before and after the patch respectively, but there was no frametime difference.

Hazard Course timedemo:

before
[22:48:14] Program args: ./xash3d -timedemo bench -rodir ../roXash -dev 2
[22:48:14] Note: R_GenerateVBO: allocated array of 30188 verts in 0.0145 seconds
[22:48:14] Note: R_GenerateVBO: uploaded VBOs in 0.0177 seconds, 0.0322 seconds total
[22:48:17] Note: R_GenerateVBO: allocated array of 30452 verts in 0.0139 seconds
[22:48:17] Note: R_GenerateVBO: uploaded VBOs in 0.0186 seconds, 0.0325 seconds total
[22:48:19] Note: R_GenerateVBO: allocated array of 22568 verts in 0.00904 seconds
[22:48:19] Note: R_GenerateVBO: uploaded VBOs in 0.0126 seconds, 0.0217 seconds total
[22:48:20] Note: R_GenerateVBO: allocated array of 15892 verts in 0.00366 seconds
[22:48:20] Note: R_GenerateVBO: uploaded VBOs in 0.00612 seconds, 0.00978 seconds total
[22:48:23] Note: R_GenerateVBO: allocated array of 18266 verts in 0.0058 seconds
[22:48:23] Note: R_GenerateVBO: uploaded VBOs in 0.00748 seconds, 0.0133 seconds total
[22:48:24] Note: R_GenerateVBO: allocated array of 10492 verts in 0.00267 seconds
[22:48:24] Note: R_GenerateVBO: uploaded VBOs in 0.00412 seconds, 0.00679 seconds total
[22:48:26] timedemo result: 21066 frames 12.102 seconds 1740.715 fps
after
[22:46:25] Program args: ./xash3d -timedemo bench -rodir ../roXash -dev 2
[22:46:26] Note: R_GenerateVBO: allocated array of 30188 verts in 0.00284 seconds
[22:46:26] Note: R_GenerateVBO: uploaded VBOs in 0.00351 seconds, 0.00635 seconds total
[22:46:28] Note: R_GenerateVBO: allocated array of 30452 verts in 0.00276 seconds
[22:46:28] Note: R_GenerateVBO: uploaded VBOs in 0.00261 seconds, 0.00537 seconds total
[22:46:30] Note: R_GenerateVBO: allocated array of 22568 verts in 0.00182 seconds
[22:46:30] Note: R_GenerateVBO: uploaded VBOs in 0.00227 seconds, 0.00408 seconds total
[22:46:31] Note: R_GenerateVBO: allocated array of 15892 verts in 0.00134 seconds
[22:46:31] Note: R_GenerateVBO: uploaded VBOs in 0.00221 seconds, 0.00356 seconds total
[22:46:34] Note: R_GenerateVBO: allocated array of 18266 verts in 0.00154 seconds
[22:46:34] Note: R_GenerateVBO: uploaded VBOs in 0.00197 seconds, 0.00352 seconds total
[22:46:35] Note: R_GenerateVBO: allocated array of 10492 verts in 0.00112 seconds
[22:46:35] Note: R_GenerateVBO: uploaded VBOs in 0.00178 seconds, 0.00291 seconds total
[22:46:37] timedemo result: 21066 frames 11.678 seconds 1803.883 fps

Though the FPS difference here might be caused by the overall better loading times.
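
For reference, the ratios implied by the logged totals can be worked out with a one-line helper. The function name `speedup` is just an illustration; the numbers in the note that follows are the "seconds total" values from the logs above.

```c
#include <assert.h>

/* Ratio of the time spent before the patch to the time spent after it.
   Both arguments are in seconds and 'after' must be positive. */
double speedup(double before, double after)
{
    assert(after > 0.0);
    return before / after;
}
```

With the ad_sepulcher.bsp totals, `speedup(1.7, 0.191)` comes out at roughly 8.9, and with the first Hazard Course load, `speedup(0.0322, 0.00635)` comes out at roughly 5.1.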

@a1batross
Member Author

It crashes for me on Win32 for some reason

@SNMetamorph
Member

> It crashes for me on Win32 for some reason

OK, I can test it.

@a1batross
Member Author

@SNMetamorph I found the bug: I forgot to allocate indexarray. :)

It should be fine now, but it's better to test it anyway.
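
To illustrate this class of bug: writing indices through a pointer that was never allocated corrupts memory or crashes. Below is a minimal sketch, assuming a stand-in record with 16-bit indices; `vbo_indices_t` and the helper functions are hypothetical names, and plain `calloc` is used here rather than whatever allocator the engine actually uses.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-VBO record; not the engine's actual struct. */
typedef struct {
    unsigned short *indexarray; /* must be allocated before any index is written */
    size_t          capacity;   /* number of indices that fit in indexarray */
} vbo_indices_t;

/* Allocates zeroed storage for 'count' indices. Returns 1 on success, 0 on failure. */
int vbo_indices_alloc(vbo_indices_t *v, size_t count)
{
    v->indexarray = calloc(count, sizeof(*v->indexarray));
    v->capacity = v->indexarray ? count : 0;
    return v->indexarray != NULL;
}

/* Bounds-checked write: returns 0 instead of writing through a missing
   or too-small buffer, which is where the corruption would happen. */
int vbo_indices_set(vbo_indices_t *v, size_t pos, unsigned short idx)
{
    if (!v->indexarray || pos >= v->capacity)
        return 0;
    v->indexarray[pos] = idx;
    return 1;
}

/* Releases the storage and resets the record to its empty state. */
void vbo_indices_free(vbo_indices_t *v)
{
    free(v->indexarray);
    v->indexarray = NULL;
    v->capacity = 0;
}
```

A guarded setter like this turns a forgotten allocation into a visible failure instead of silent corruption, which is also easier to catch on platforms where the unguarded write happens not to crash.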

@SNMetamorph
Member

Tested it, and everything is OK. But I found another, unrelated problem:
[image]
This is on the map hg_industrialzone.

@a1batross
Member Author

@SNMetamorph PHS building should be unrelated to these changes. Does it take this much time on master branch?

@SNMetamorph
Member

SNMetamorph commented Jul 22, 2024

> @SNMetamorph PHS building should be unrelated to these changes. Does it take this much time on master branch?

For some mysterious reason, on subsequent attempts PHS generation did not happen at all, on both master and this branch.

@a1batross
Member Author

@SNMetamorph check that you're on the latest version, because the initial version had memory corruption due to indexarray being uninitialized.

PHS is generated only in multiplayer.

@SNMetamorph
Member

SNMetamorph commented Jul 22, 2024

Yeah, this issue is not related to this PR at all. I also checked the configure logs and found out that the OpenMP build was disabled. Perhaps this is what causes the issue.

@a1batross
Member Author

OpenMP is experimental and disabled by default for everyone.

I only enable it on my server for faster changelevels.

@SNMetamorph
Member

OK, I created a separate issue, #1734, for this.

@a1batross
Member Author

There is another memory corruption

@mittorn
Member

mittorn commented Jul 24, 2024 via email

3 participants