Measure RGBDS performance #653

Open
Labels
meta This isn't related to the tools directly: repo organization, maintainership... optimization This increases performance or decreases size

Comments

ISSOtm (Member) commented Dec 19, 2020

Some people have been complaining that RGBDS performance is subpar. With the growing complexity of the codebase, particularly the increasing number of features (especially in RGBASM), I am worried about how taxing a given change may be on overall performance.

This splits into two sub-problems:

Profiling

To know how slow the programs currently are, we should identify their processing bottlenecks. I did that with perf once, but the data was fairly lackluster beyond "60% of your time is spent inside yyparse". Maybe gprof would be better, or something else?

  • Decide what profiler(s) to use.
  • Pick codebases to profile on.
  • Interpret data, identify improvements.
  • DO IT

Measuring the performance impact of changes

  • Decide on a strategy to follow regarding breaking changes
  • Pick codebase(s) to measure on
  • Set up measurement script
  • Integrate with CI (?)
  • Measure whether mmap is actually worth it, in several scenarios (macro-heavy vs. largely linear code, for example)
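The measurement-script step above could start as a small shell helper that times a fixed workload over several runs (a sketch; the rgbasm invocation and run count are placeholders, and GNU date is assumed for nanosecond timestamps):

```shell
#!/bin/sh
# bench <runs> <cmd...>: run <cmd> <runs> times and print the mean
# wall-clock time in milliseconds.
bench() {
    runs=$1; shift
    total=0
    for _ in $(seq "$runs"); do
        start=$(date +%s%N)
        "$@" >/dev/null 2>&1
        end=$(date +%s%N)
        total=$(( total + (end - start) ))
    done
    echo $(( total / runs / 1000000 ))
}

# Hypothetical usage against a chosen codebase:
# bench 10 ./rgbasm -o main.o main.asm
```

Comparing the printed means before and after a change (on the same machine, same codebase) would give a first-order answer, and the same helper could run in CI against a pinned workload.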
@ISSOtm ISSOtm added enhancement Typically new features; lesser priority than bugs meta This isn't related to the tools directly: repo organization, maintainership... labels Dec 19, 2020
Rangi42 (Contributor) commented Jan 7, 2021

For reference, gprof output after running current master rgbasm on pokecrystal's main.asm (which produces a 5 MB main.o file): https://pastebin.com/F3n5vfA6

Rangi42 (Contributor) commented Jan 7, 2021

gprof is probably not a good choice: https://stackoverflow.com/a/1779343

Valgrind could be useful, though not on Windows or Cygwin.

daid (Contributor) commented Jan 7, 2021

gprof can point to hotspots, but it has all kinds of problems with the optimizer and with static functions. Quite likely the fstk_Init count actually belongs to a static function located somewhere after it in the binary.
Disabling compiler optimizations isn't really an option, as that would make the measurement invalid.
It also looks like gprof is spending a lot of time in its own accounting function (_mcount_private), which makes any time-based result invalid.

Rangi42 (Contributor) commented Jan 7, 2021

valgrind --tool=callgrind --dump-instr=yes --simulate-cache=yes --collect-jumps=yes ./rgbasm -o main.o main.asm produces accurate-looking results.

[callgrind screenshot showing peekInternal among the top hotspots]

Maybe peekInternal could be optimized for the common peek(0) and peek(1) cases, and for when nothing is getting expanded further yet.

daid (Contributor) commented Jan 7, 2021

I think peekInternal is also reading from disk, so I would run the test from tmpfs to make sure it's not disk performance you are seeing here.

Rangi42 (Contributor) commented Jan 11, 2021

Also regarding performance, look into whether mmap is actually an improvement, as noted in #557.

ISSOtm (Member, Author) commented Feb 1, 2021

According to perf annotate on pokecrystal's main.asm, 31% of the CPU time is spent copying the yylval type, which is 264 bytes (0x108) large. Moving to variable-size strings (#650) should help reduce this; I think the next-largest member of the %union is the struct Expression, but that one is already much shorter.

@Rangi42 Rangi42 added this to the v1.0.0 milestone May 5, 2021
Rangi42 (Contributor) commented Nov 23, 2021

Copying of struct Expression could also be sped up by arranging its members from largest to smallest for better packing; the same goes for any other widely-copied struct. (Unfortunately this can mean losing a meaningful ordering, and unlike in Rust, the C compiler won't reorder fields for you.)

@ISSOtm ISSOtm mentioned this issue Feb 28, 2022
13 tasks
@Rangi42 Rangi42 added optimization This increases performance or decreases size and removed enhancement Typically new features; lesser priority than bugs labels Mar 13, 2024
Rangi42 (Contributor) commented Jun 17, 2024

Bison's C++ parser, which uses its own variant type instead of a union and allows tokens to have nontrivial constructors, slows things down significantly. We might be able to switch back to a C-style parser and manually allocate nontrivial token values (plus %destructors to free them).
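A C-style value setup with manual ownership might look roughly like this (a sketch with placeholder token names, not rgbasm's actual grammar):

```yacc
%union {
    char *str;   /* heap-allocated; the union itself stays pointer-sized */
    int32_t num;
}
%token <str> STRING
%token <num> NUMBER

/* Free owned strings when the parser discards a token, e.g. during
   error recovery, so the manual allocation doesn't leak. */
%destructor { free($$); } <str>
```

Actions that consume a STRING would take ownership of the pointer, keeping copies of the union cheap while %destructor covers the discard paths.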

@Rangi42 Rangi42 removed this from the v1.0.0 milestone Aug 6, 2024

3 participants