Skip to content

Performance

Jana edited this page Aug 9, 2020 · 83 revisions

Rick uses a 400TB for multiverse data, and nnn to manage it ;)

Optimization techniques

Significant factors

  • quicksort with pre-filters to sort by filename, time, size etc.
  • 0 fragmentation - no byte loss while storing file name of directory entries
  • no copying of filename strings while sorting/filtering
  • entries not matching a filter are moved to the lowest slots to ignore later
  • increased number of open file descriptors
  • fast redraw only affected lines to avoid complete refresh
  • all the large buffers are aligned
  • frequently used structures optimized to facilitate SSE quad-instruction support
  • readahead request (to kernels supporting the feature)
  • fast routines to calculate and render file size
  • shifts instead of div and mul (modern compilers do this already)
  • optimized memory usage everywhere, buffer re-use
  • no floating point arithmetic
  • static routines
  • controlled binary size
  • O3 level compiler optimization
  • 0-warning statically-analyzed code (forced -Wall -Wextra -Werror in CI)

Rejected options

  • replace quicksort with a more aggressive algorithm (favor space over time complexity)
    • the difference is not usually perceptible by human beings
    • the random load option is also removed to emphasize this
  • non-standard calls like statx()/getdents()/getdents64()
    • platform-specific
    • man page Description begins with "These are not the interfaces you are interested in."
  • optimize handling 10K+ entries in a dir
    • not a mass use case
    • performance with 10K files is good enough today
    • SSD/NVMe are the future
  • use lazy/background/threaded load
    • du, sort and type-to-nav program options require a stat() on all entries for correctness

Comparison

Stripped binary (or script) size & memory usage of nnn and some other popular FMs (which existed before nnn) while viewing a directory with 13.5K files (0 directories), sorted by size/du:

 BINSZ    VIRT    RES    SHR S  %MEM   COMMAND
  650K  139720  91220   8460 S   1.1   ranger
    1M   50496  15328   4076 S   0.2   vifm
    1M   72152  12468   7336 S   0.2   mc
   74K   15740   4348   2460 S   0.1   nnn -S

nnn vs. ls

The stripped binary size of ls is 130.7K.

nnn takes less than 50% time to list a directory with 2083 files:

$ time nnn /usr/bin
0.00user 0.01system 0:00.02elapsed 42%CPU (0avgtext+0avgdata 3540maxresident)k
1608inputs+0outputs (3major+325minor)pagefaults 0swaps

$ export LS_COLORS='ex=00:su=00:sg=00:ca=00:'
$ time ls -l /usr/bin
0.01user 0.01system 0:00.05elapsed 47%CPU (0avgtext+0avgdata 3800maxresident)k
784inputs+0outputs (0major+303minor)pagefaults 0swaps

nnn roughly makes one-third the system calls made by ls.

$ time strace -c nnn -d /usr/bin | wc -l
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 92.56    0.014030           7      1989           newfstatat
  4.95    0.000751         188         4           getdents
  1.19    0.000180           3        57         1 stat
  0.53    0.000081          81         1           inotify_add_watch
  0.15    0.000023           2        15           close
  0.11    0.000017           2        10           brk
  0.10    0.000015           1        14           openat
  0.08    0.000012           2         6           write
  0.07    0.000010           0        23         3 ioctl
  0.05    0.000007           0        15           fstat
  0.05    0.000007           7         1         1 unlink
  0.04    0.000006           1         8           read
  0.04    0.000006           0        14           mmap
  0.04    0.000006           6         1           inotify_rm_watch
  0.03    0.000005           5         1           sysinfo
  0.01    0.000002           2         1           lseek
  0.00    0.000000           0         2           lstat
  0.00    0.000000           0        10           mprotect
  0.00    0.000000           0         1           munmap
  0.00    0.000000           0         9           rt_sigaction
  0.00    0.000000           0         9         7 access
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         1           arch_prctl
  0.00    0.000000           0         1           inotify_init1
------ ----------- ----------- --------- --------- ----------------
100.00    0.015158                  2194        12 total
0.01user 0.05system 0:00.08elapsed 84%CPU (0avgtext+0avgdata 3524maxresident)k
4576inputs+0outputs (10major+545minor)pagefaults 0swaps
0

$ export LS_COLORS='ex=00:su=00:sg=00:ca=00:'
$ time strace -c ls -l /usr/bin | wc -l
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 42.88    0.011274           6      1989           lstat
 27.69    0.007281           4      1989      1989 lgetxattr
 20.74    0.005454           4      1548      1548 getxattr
  5.82    0.001529           3       442           readlink
  1.35    0.000355          89         4           getdents
  0.38    0.000100           3        31           write
  0.24    0.000064           2        33         5 openat
  0.18    0.000048           1        34           close
  0.17    0.000046           3        17           lseek
  0.16    0.000042           1        40           mmap
  0.16    0.000041           4        11           munmap
  0.11    0.000030           1        30           fstat
  0.04    0.000011           2         5           brk
  0.04    0.000011          11         1           mremap
  0.03    0.000007           0        17           read
  0.00    0.000000           0        20           mprotect
  0.00    0.000000           0         2           rt_sigaction
  0.00    0.000000           0         1           rt_sigprocmask
  0.00    0.000000           0         2         2 ioctl
  0.00    0.000000           0        12        12 access
  0.00    0.000000           0         4           socket
  0.00    0.000000           0         4         4 connect
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         2         2 statfs
  0.00    0.000000           0         1           arch_prctl
  0.00    0.000000           0         1           futex
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         1           set_robust_list
  0.00    0.000000           0         1           prlimit64
------ ----------- ----------- --------- --------- ----------------
100.00    0.026293                  6244      3562 total
0.02user 0.12system 0:00.13elapsed 106%CPU (0avgtext+0avgdata 3728maxresident)k
3752inputs+0outputs (7major+526minor)pagefaults 0swaps
1989

Note: Each reading is taken at cold boot.

C vs. other languages

/* compare.c */

int main(void)
{
  for (int i = 0; i < 1000000; i++)
    printf("hello\n");

  return 0;
}

$ time -p ./compare
real 1.89
user 0.20 <<
sys 1.25
---------------------------------
# compare.sh
#!/bin/bash

for (( i = 0; i < 1000000; i++ ))
do
  echo hello
done

exit 0

$ time -p ./compare.sh
real 5.88
user 4.66 <<
sys 1.21
Clone this wiki locally