-
Notifications
You must be signed in to change notification settings - Fork 115
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimizations #96
Optimizations #96
Conversation
Triggers UBSAN failures. You should probably have that in your CI... |
Yes, the GitHub CI does not run any tests. It is on my TODO list, but haven't got the time to add that yet. Is UBSAN/ASAN triggered only with these changes? I experimented with this and I'm not sure I'm running it right, or that I'm interpreting the results the right way. For example, comparing a $ ./benchmark.sh ./old ./new in.bin 10
x before.txt
+ after.txt
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| x + + |
|x x x x + x x + x * + x + + + +|
| |________________________________M_____A________|_____________________________|_______________M__A________________________________________________| |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 10 1.406 1.434 1.417 1.4186 0.010297788
+ 10 1.416 1.456 1.433 1.4339 0.012931014
Difference at 95.0% confidence
0.0153 +/- 0.0109827
1.07853% +/- 0.774195%
(Student's t, pooled s = 0.0116888) Doesn't this mean that the new version is actually slower? Compiled for release with gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, running on a Ubuntu 22.04.4 LTS (via WSL2 on Windows 11 23H2), with a 13th Gen Intel(R) Core(TM) i7-1370P CPU. |
UBSAN was triggered with these changes, the previous code is clean. The branchless sign extension is not OK when the input is already 64-bit. That output means that it is indeed slower. I was using GCC 13 on Debian testing running on real hardware. It could also still be a measurement artifact. I was using 20 samples. |
What is the size of your Do you get the expected null result if On an AMD 5950x with Gentoo, GCC 13 and the same ~60 MB binary with 100 samples:
Old vs new code
The performance characteristics of modern systems are just completely f***** up. Maybe I should configure it for 99% confidence and always do more samples but that takes forever to run. |
I used a 58M file I obtained by concatenating a bunch of shellcodes together. If I use the same binary I get timings that are much closer: $ ./benchmark.sh ./old ./old in.bin 10
N Min Max Median Avg Stddev
x 10 1.456 1.518 1.502 1.4941 0.021855332
+ 10 1.452 1.527 1.502 1.4937 0.02150478
No difference proven at 95.0% confidence But this is just a small sample. These kinds of changes are small enough that if we want to accurately measure them we probably need to build some micro benchmarks. |
I managed to fix UBSAN. It turns out that sometimes |
What do I need to do to get this merged? |
Hello, We'll go through the PR soon, and we will let you know if changes are needed. Thanks for the contribution! |
First of all, Looking at your benchmark, I see that you used the
Of course, being ~5% slower is not necessary bad by itself, but the point is that I fail to see any consistent decoding performance improvement with the current changes. |
I/O effects should have been eliminated by the pre-runs before the actual benchmark loop. Interrupt effects can't be eliminated without radically changing the system configuration. If the OS wants to punt our program from the CPU to do something else it will. We can only minimize the effect by running more samples. I replaced use of You appear to be using Numbers on my Intel i7-12700H laptop but I get broadly similar results from AMD 5950x. For a real executable (.text section only, 32MB):
And 32MB of randomness:
There is something very very strange going on in your setup. Are you only running each benchmark once? The reason I use a multiple runs is to
|
There is nothing strange about my setup - it's bare metal Windows. |
bddisasm/bdx86_decoder.c
Outdated
@@ -3378,22 +3378,22 @@ NdFindInstruction( | |||
break; | |||
} | |||
|
|||
while ((!stop) && (ND_NULL != pTable)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding so many goto
s inside the while
loop makes the code quite hard to follow. I wouldn't make such changes to the decoder loop, unless they offer a significant benefit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This commit alone is about 2% faster. I'd call that significant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may do so on Linux, but it brings no benefit on Windows (in fact, I see a small performance downgrade with this modification). Combined with the more hard to follow code, I see no reason to merge the changes to the NdFindInstruction
function at this moment.
The other modifications seem OK though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've left this bit out for now.
Updated. New numbers:
|
And I'm running Linux on real hardware, no virtualization. @ianichitei was using WSL2 which I suspected might be screwing up the timings. |
bddisasm/bddisasm/bdx86_decoder.c Line 1419 in 8b4cc48
because a But with it there's a warning at bddisasm/bdshemu/bdshemu_x86.c Line 2860 in 8b4cc48
What would be the best way to fix this? |
Hello! The problem there is that the displacement sign extension takes place even if there is no displacement (the The |
Thanks, that did it. |
benchmark.sh
Outdated
*) ;; | ||
esac | ||
|
||
truncate -s 0 before.txt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of using hard-coded output files, maybe use some names based on the disasmtool used - for example, $FIRST.result
and $SECOND.result
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Also improved error messages and cleaned up white space.
A few cleanups, a benchmark script and some optimizations.
Disassembling a ~60MB binary on my Intel i7-12700H laptop: