term_esc_rgb() dominates profiles #277

Closed
dankamongmen opened this issue Jan 15, 2020 · 3 comments
Labels: perf sweet sweet perf

Comments

@dankamongmen
Owner

Running a profile against an optimized build of notcurses, I get this output from notcurses-view:

-   27.05%     0.00%  notcurses-view  [unknown]                 [.] 0x000055cc0
   - 0x55cc08c4e040
      + 22.02% fprintf
        3.27% notcurses_render
        0.54% _IO_default_xsputn
-   23.06%     1.01%  notcurses-view  libc-2.29.so              [.] fprintf
   - 22.04% fprintf
        13.47% __vfprintf_internal
        3.73% _IO_default_xsputn
        2.40% _itoa_word
        1.56% __strchrnul_avx2
   - 0.93% 0x55cc08c4e040
        0.82% fprintf
-   15.03%    14.07%  notcurses-view  libc-2.29.so              [.] __vfprintf_
   + 13.09% 0x55cc08c4e040
     0.97% __vfprintf_internal
-   12.31%     0.01%  notcurses-view  [kernel.vmlinux]          [k] entry_SYSCA
   - 12.30% entry_SYSCALL_64_after_hwframe
      + 12.27% do_syscall_64
-   12.27%     0.11%  notcurses-view  [kernel.vmlinux]          [k] do_syscall_
   - 12.16% do_syscall_64
      + 11.86% __x64_sys_write

Absolutely dominated by fprintf(), which is entirely due to term_esc_rgb(). If we can speed that up, we're going to recover a hell of a lot of cycles.

For notcurses-demo, it was even more pronounced.
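For context, here is a minimal sketch of what emitting a direct (24-bit) color with fprintf() looks like. It is not notcurses' actual term_esc_rgb(); the name `esc_rgb_fprintf` is hypothetical, and the sketch assumes the standard SGR sequences `ESC[38;2;R;G;Bm` (foreground) and `ESC[48;2;R;G;Bm` (background). Every call re-parses the format string and converts three integers to decimal, which is the kind of work the profile above attributes to __vfprintf_internal and _itoa_word.

```c
#include <assert.h>
#include <stdio.h>

// Hypothetical baseline, not the notcurses implementation: emit a
// direct-color SGR sequence with one fprintf() per call.
// ESC[38;2;R;G;Bm sets the foreground, ESC[48;2;R;G;Bm the background.
static int esc_rgb_fprintf(FILE* out, int is_fg,
                           unsigned char r, unsigned char g, unsigned char b){
  assert(out != NULL);
  // The format string is parsed and three integers are converted on every call.
  int ret = fprintf(out, "\x1b[%d;2;%u;%u;%um", is_fg ? 38 : 48,
                    (unsigned)r, (unsigned)g, (unsigned)b);
  return ret < 0 ? -1 : 0;
}
```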

@dankamongmen added the perf sweet sweet perf label Jan 15, 2020
@dankamongmen added this to the v1.1.0 milestone Jan 15, 2020
@dankamongmen self-assigned this Jan 15, 2020
@dankamongmen
Owner Author

Yeah, dropping the fprintf() entirely boosts FPS by 36%! w0000000000000000t. Alright, let's do this.

@dankamongmen
Owner Author

Reducing it to a lookup table and some fputs() calls boosted us from 2300 FPS to 2600, and combining all those fputs() calls into one boosted us further to 2750. That's almost a 20% increase, not bad at all for an hour's work :D :D :D. Just broke 2800! (xfce4-terminal 80x70)
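The first step described above (a lookup table plus several fputs() calls) might look roughly like this sketch. The names `u8_str`, `init_u8_table`, and `esc_rgb_fputs` are hypothetical; the idea is that the decimal text for every byte value 0 to 255 is computed once at startup, so the per-cell path never parses a format string.

```c
#include <assert.h>
#include <stdio.h>

// Hypothetical lookup table: decimal text for every byte value.
// Three digits plus a NUL is enough for "255".
static char u8_str[256][4];

// Fill the table once at startup; the only format-string parsing happens here.
static void init_u8_table(void){
  for(int i = 0; i < 256; ++i){
    snprintf(u8_str[i], sizeof(u8_str[i]), "%d", i);
  }
}

// Emit ESC[38;2;R;G;Bm (foreground) or ESC[48;2;R;G;Bm (background)
// as a series of fputs() calls. Returns 0 on success, -1 on error.
static int esc_rgb_fputs(FILE* out, int is_fg,
                         unsigned char r, unsigned char g, unsigned char b){
  assert(u8_str[255][0] != '\0'); // init_u8_table() must have run
  const char* parts[] = {
    is_fg ? "\x1b[38;2;" : "\x1b[48;2;",
    u8_str[r], ";", u8_str[g], ";", u8_str[b], "m",
  };
  // Seven separate stdio calls per escape sequence.
  for(size_t i = 0; i < sizeof(parts) / sizeof(*parts); ++i){
    if(fputs(parts[i], out) == EOF){
      return -1;
    }
  }
  return 0;
}
```

Each fputs() still carries stdio's per-call overhead, which is presumably why merging them into a single call bought the further gain from 2600 to 2750 FPS.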

dankamongmen added a commit that referenced this issue Jan 15, 2020
Profiling with `perf` revealed the fprintf() inside term_esc_rgb()
to dominate our performance. Replace it with a u8->str lookup table
and a hand-assembled string fed into a single fputs(). On an 80x70
xfce4-terminal geometry, this wins 20%+ FPS on the demo, w00t!

Huzzah for profiling!
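One way to realize "a u8->str lookup table and a hand-assembled string fed into a single fputs()" is sketched below. It is an illustration under assumptions, not the committed code: `u8_str`, `u8_len`, `append_u8`, and `esc_rgb_assembled` are hypothetical names, and the real implementation may size or reuse its buffer differently.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

// Hypothetical tables: decimal text and its length for each byte value.
static char u8_str[256][4];
static unsigned char u8_len[256];

static void init_u8_table(void){
  for(int i = 0; i < 256; ++i){
    u8_len[i] = (unsigned char)snprintf(u8_str[i], sizeof(u8_str[i]), "%d", i);
  }
}

// Copy the digits for v into dst (no NUL); return how many bytes were written.
static size_t append_u8(char* dst, unsigned char v){
  memcpy(dst, u8_str[v], u8_len[v]);
  return u8_len[v];
}

// Assemble the whole escape in a stack buffer, then hand it to one fputs().
// Worst case "\x1b[38;2;255;255;255m" is 19 bytes, plus the NUL.
static int esc_rgb_assembled(FILE* out, int is_fg,
                             unsigned char r, unsigned char g, unsigned char b){
  assert(u8_len[255] == 3); // init_u8_table() must have run
  char seq[20];
  size_t n = 7;
  memcpy(seq, is_fg ? "\x1b[38;2;" : "\x1b[48;2;", n); // ESC [ 3 8 ; 2 ;
  n += append_u8(seq + n, r);
  seq[n++] = ';';
  n += append_u8(seq + n, g);
  seq[n++] = ';';
  n += append_u8(seq + n, b);
  seq[n++] = 'm';
  seq[n] = '\0';
  return fputs(seq, out) == EOF ? -1 : 0; // a single stdio call per sequence
}
```

The trade is a few small memcpy() calls into a fixed-size stack buffer in exchange for one stdio call per escape instead of several.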
@dankamongmen
Owner Author

[schwarzgerat](0) $ ./notcurses-demo -p ../data/ -c
Term: xterm with direct-color indexing

122896 renders, 14.1s total (7.1e-05s min, 0.00478s max, 0.000115s avg 8725.1 fps)
171.27MiB total (0.00B min, 118.37KiB max, 1.43KiB avg)
0 failed renders
Emits/elides: def 72976/18641 fg 4264137/1974870 bg 4815725/1493123
 Elide rates: 20.35% 31.65% 23.67%
Cells emitted: 6422251 elided: 373710512 (98.31%)

        total│frames│output(B)│rendering│%r│    FPS║
══╤═╤════════╪══════╪═════════╪═════════╪══╪═══════╣
 0│i│   2.00s│   129│   1.77Mi│ 116.18ms│ 5│ 1110.3║
 1│x│   5.03s│   302│   9.05Mi│ 308.93ms│ 6│  977.6║
 2│e│   3.11s│   171│   8.91Mi│ 340.14ms│10│  502.7║
 3│t│  17.05s│   633│   1.70Mi│ 186.49ms│ 1│ 3394.1║
 4│c│   3.38s│    33│ 630.83Ki│  33.72ms│ 0│  978.4║
 5│g│   1.31s│   768│  74.35Mi│    1.20s│91│  638.7║
 6│s│   5.08s│   321│ 424.43Ki│  63.16ms│ 1│ 5082.0║
 7│w│  10.00s│119187│   7.28Mi│    9.23s│92│12909.5║
 8│u│  21.07s│    63│ 550.11Ki│  34.95ms│ 0│ 1802.3║
 9│b│   1.01s│    10│ 141.67Ki│   7.44ms│ 0│ 1342.5║
10│v│  19.60s│   753│  52.67Mi│    1.96s│ 9│  384.1║
11│l│   1.04s│    74│ 367.93Ki│  35.14ms│ 3│ 2105.3║
12│p│   5.00s│    18│ 337.71Ki│  16.25ms│ 0│ 1107.4║
13│o│  13.01s│   434│  13.09Mi│ 547.58ms│ 4│  792.6║
══╧═╧════════╪══════╪═════════╪═════════╪══╪═══════╝
      107.75s│122896│ 171.27Mi│   14.08s│13│
[schwarzgerat](0) $

alacritty 91x34, hot shit!
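The summary lines and per-demo table above can be cross-checked with a few simple ratios. The sketch below is one reading of the output, not notcurses' code: the helper names are hypothetical, and the formulas are assumptions that happen to reproduce the printed values (for example, 18641 / (72976 + 18641) ≈ 20.35%, and 129 frames / 116.18 ms ≈ 1110.3 FPS).

```c
#include <assert.h>

// Assumed reading of "Elide rates": the share of candidate emissions
// that were skipped, elided / (emitted + elided), as a percentage.
static double elide_pct(double emitted, double elided){
  assert(emitted + elided > 0);
  return 100.0 * elided / (emitted + elided);
}

// Assumed reading of the FPS column: frames divided by time spent in
// rendering (the "rendering" column), not by wall-clock "total" time.
static double render_fps(double frames, double render_seconds){
  assert(render_seconds > 0);
  return frames / render_seconds;
}

// Assumed reading of "%r": rendering time as a percentage of total time,
// truncated toward zero rather than rounded.
static int render_pct(double render_seconds, double total_seconds){
  assert(total_seconds > 0);
  return (int)(100.0 * render_seconds / total_seconds);
}
```

Under this reading, FPS measures render throughput rather than on-screen frame rate: demo `i` reports 1110.3 FPS although it drew 129 frames over 2.00 s of wall time.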
