General performance improvements #8

ghost · 2017-01-15T07:00:23Z

The basic idea is: ns == countNl x == lineIndex (measure ch) and n - ns == lineIndex (measure f) - p - 1

Before:

benchmarking splitAtLine/long 1200
time                 82.20 μs   (81.81 μs .. 82.59 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 82.33 μs   (82.04 μs .. 82.67 μs)
std dev              1.052 μs   (879.3 ns .. 1.291 μs)

After:

benchmarking splitAtLine/long 1200
time                 76.08 μs   (75.58 μs .. 76.51 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 75.41 μs   (75.04 μs .. 75.85 μs)
std dev              1.362 μs   (1.171 μs .. 1.654 μs)
variance introduced by outliers: 13% (moderately inflated)

ghost · 2017-01-15T08:28:06Z

Benchmarks for folding

Before:

benchmarking foldCount/long 1200
time                 500.3 μs   (498.6 μs .. 502.4 μs)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 503.8 μs   (500.6 μs .. 508.0 μs)
std dev              12.75 μs   (9.952 μs .. 18.13 μs)
variance introduced by outliers: 17% (moderately inflated)

benchmarking foldId/long 1200
time                 4.336 s    (4.281 s .. 4.386 s)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 4.360 s    (4.348 s .. 4.367 s)
std dev              11.29 ms   (0.0 s .. 12.86 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarking foldReverse/long 1200
time                 5.072 s    (4.600 s .. 5.292 s)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 5.189 s    (5.084 s .. 5.240 s)
std dev              91.04 ms   (0.0 s .. 93.10 ms)
variance introduced by outliers: 19% (moderately inflated)

After:

benchmarking foldCount/long 1200
time                 468.4 μs   (463.8 μs .. 475.6 μs)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 467.6 μs   (465.8 μs .. 470.0 μs)
std dev              7.297 μs   (5.419 μs .. 11.26 μs)

benchmarking foldId/long 1200
time                 3.914 s    (3.768 s .. 4.080 s)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 3.886 s    (3.853 s .. 3.900 s)
std dev              29.55 ms   (0.0 s .. 32.84 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarking foldReverse/long 1200
time                 4.889 s    (4.847 s .. 4.927 s)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 4.900 s    (4.889 s .. 4.906 s)
std dev              9.898 ms   (0.0 s .. 11.07 ms)
variance introduced by outliers: 19% (moderately inflated)

ghost · 2017-01-15T08:55:10Z

The motivation for lines should be obvious with this:

benchmarking lines/long 1200
time                 164.9 μs   (164.5 μs .. 165.7 μs)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 167.6 μs   (166.4 μs .. 169.6 μs)
std dev              5.137 μs   (3.668 μs .. 7.423 μs)
variance introduced by outliers: 27% (moderately inflated)

benchmarking lines'/long 1200
time                 12.29 ms   (12.10 ms .. 12.58 ms)
                     0.998 R²   (0.995 R² .. 0.999 R²)
mean                 12.03 ms   (11.89 ms .. 12.17 ms)
std dev              345.9 μs   (262.0 μs .. 458.8 μs)

ethercrow · 2017-01-15T21:01:26Z

src/Yi/Rope.hs

-    go x = case viewl x of
-      EmptyL -> False
-      Chunk _ t :< ts -> TX.any p t || go ts
+any p = Prelude.any (==True) . fmap (TX.any p) . toTextList


any (== True) is or

ethercrow · 2017-01-15T21:01:53Z

src/Yi/Rope.hs

-    go x = case viewl x of
-      EmptyL -> True
-      Chunk _ t :< ts -> TX.all p t && go ts
+all p = Prelude.all (==True) . fmap (TX.all p) . toTextList


all (== True) is and

ghost · 2017-01-15T21:29:27Z

I think that's about everything. I removed the folding patch, because folding over small tress takes longer, and most users will probably be using small text files anyway. There should be an order of magnitude improvement in lines and lines', and everything else should remain about the same.

ethercrow · 2017-01-17T08:45:27Z

I've tried to run the benchmarks on my machine.

lines' before

benchmarking lines'/long 1200
time                 13.12 ms   (12.85 ms .. 13.40 ms)
                     0.997 R²   (0.994 R² .. 0.999 R²)
mean                 13.15 ms   (12.96 ms .. 13.32 ms)
std dev              467.1 μs   (385.0 μs .. 580.4 μs)
variance introduced by outliers: 11% (moderately inflated)

benchmarking lines'/wide 1200
time                 203.4 μs   (200.0 μs .. 206.6 μs)
                     0.998 R²   (0.997 R² .. 0.999 R²)
mean                 203.3 μs   (200.6 μs .. 206.0 μs)
std dev              9.382 μs   (7.785 μs .. 11.53 μs)
variance introduced by outliers: 45% (moderately inflated)

benchmarking lines'/short 1200
time                 3.303 μs   (3.261 μs .. 3.348 μs)
                     0.999 R²   (0.998 R² .. 0.999 R²)
mean                 3.279 μs   (3.239 μs .. 3.320 μs)
std dev              131.0 ns   (109.2 ns .. 163.8 ns)
variance introduced by outliers: 52% (severely inflated)

benchmarking lines'/tiny 1200
time                 112.0 ns   (109.5 ns .. 114.3 ns)
                     0.998 R²   (0.997 R² .. 0.999 R²)
mean                 110.3 ns   (108.8 ns .. 111.8 ns)
std dev              5.329 ns   (4.519 ns .. 6.495 ns)
variance introduced by outliers: 69% (severely inflated)

lines' after

benchmarking lines'/long 1200
time                 539.5 μs   (529.6 μs .. 549.5 μs)
                     0.997 R²   (0.996 R² .. 0.998 R²)
mean                 536.6 μs   (527.6 μs .. 545.8 μs)
std dev              29.39 μs   (24.15 μs .. 34.74 μs)
variance introduced by outliers: 48% (moderately inflated)

benchmarking lines'/wide 1200
time                 413.2 μs   (399.4 μs .. 424.1 μs)
                     0.995 R²   (0.993 R² .. 0.997 R²)
mean                 404.6 μs   (397.6 μs .. 411.4 μs)
std dev              23.16 μs   (19.38 μs .. 28.29 μs)
variance introduced by outliers: 52% (severely inflated)

benchmarking lines'/short 1200
time                 1.489 μs   (1.464 μs .. 1.512 μs)
                     0.998 R²   (0.997 R² .. 0.999 R²)
mean                 1.511 μs   (1.487 μs .. 1.535 μs)
std dev              83.93 ns   (69.78 ns .. 102.7 ns)
variance introduced by outliers: 70% (severely inflated)

benchmarking lines'/tiny 1200
time                 86.99 ns   (85.23 ns .. 88.76 ns)
                     0.997 R²   (0.996 R² .. 0.998 R²)
mean                 86.19 ns   (84.42 ns .. 87.87 ns)
std dev              5.633 ns   (4.807 ns .. 6.571 ns)
variance introduced by outliers: 81% (severely inflated)

ethercrow · 2017-01-17T08:46:53Z

splitAtLine before

benchmarking splitAtLine/long 1200
time                 75.64 μs   (74.29 μs .. 77.32 μs)
                     0.997 R²   (0.997 R² .. 0.998 R²)
mean                 76.61 μs   (75.65 μs .. 77.63 μs)
std dev              3.257 μs   (2.811 μs .. 3.974 μs)
variance introduced by outliers: 45% (moderately inflated)

benchmarking splitAtLine/wide 1200
time                 31.19 μs   (30.81 μs .. 31.64 μs)
                     0.998 R²   (0.997 R² .. 0.999 R²)
mean                 30.91 μs   (30.60 μs .. 31.32 μs)
std dev              1.193 μs   (949.2 ns .. 1.701 μs)
variance introduced by outliers: 44% (moderately inflated)

benchmarking splitAtLine/short 1200
time                 74.45 μs   (73.43 μs .. 75.37 μs)
                     0.998 R²   (0.997 R² .. 0.999 R²)
mean                 73.92 μs   (72.89 μs .. 74.85 μs)
std dev              3.387 μs   (2.839 μs .. 4.136 μs)
variance introduced by outliers: 49% (moderately inflated)

benchmarking splitAtLine/tiny 1200
time                 71.72 μs   (70.58 μs .. 73.06 μs)
                     0.997 R²   (0.996 R² .. 0.999 R²)
mean                 72.46 μs   (71.52 μs .. 73.47 μs)
std dev              3.273 μs   (2.760 μs .. 4.074 μs)
variance introduced by outliers: 48% (moderately inflated)

splitAtLine after

benchmarking splitAtLine/long 1200
time                 87.69 μs   (86.19 μs .. 89.15 μs)
                     0.998 R²   (0.997 R² .. 0.999 R²)
mean                 89.88 μs   (88.57 μs .. 91.28 μs)
std dev              4.584 μs   (3.764 μs .. 5.755 μs)
variance introduced by outliers: 54% (severely inflated)

benchmarking splitAtLine/wide 1200
time                 42.09 μs   (41.54 μs .. 42.74 μs)
                     0.998 R²   (0.997 R² .. 0.999 R²)
mean                 42.96 μs   (42.19 μs .. 43.90 μs)
std dev              2.863 μs   (2.232 μs .. 3.787 μs)
variance introduced by outliers: 69% (severely inflated)

benchmarking splitAtLine/short 1200
time                 84.99 μs   (83.96 μs .. 86.25 μs)
                     0.998 R²   (0.998 R² .. 0.999 R²)
mean                 86.01 μs   (84.96 μs .. 87.22 μs)
std dev              3.781 μs   (2.874 μs .. 5.199 μs)
variance introduced by outliers: 46% (moderately inflated)

benchmarking splitAtLine/tiny 1200
time                 86.77 μs   (84.82 μs .. 88.66 μs)
                     0.997 R²   (0.995 R² .. 0.999 R²)
mean                 85.24 μs   (83.94 μs .. 86.58 μs)
std dev              4.499 μs   (3.670 μs .. 5.889 μs)
variance introduced by outliers: 55% (severely inflated)

ethercrow · 2017-01-17T08:48:07Z

The speedup for lines'/long is impressive but do you know why lines'/wide has regressed?

ghost · 2017-01-17T09:33:16Z

do you know why lines'/wide has regressed?

I'm guessing that it has to do with creating the rope at the end (which results in calls to TX.chunksOf). The cost of this operation is O(n). For wide text, n is fairly large, and hence we see it being slower. For everything else, n ~ 1 => O(1) in an amortized sense.

I'm not sure what caused the slowdown in splitAtLine. I can verify the regression on my machine as well, and I don't think it has to do with the change to the function itself. Maybe GHC is missing a potential optimization because of the increase in code size?

I'm hoping I didn't make any mistakes. In the short term, there's a noticeable reduction in lag for the rasa text editor (which is what made me realize this in the first place). In the long run, it should allow us to use lines more liberally in Yi, since running lines on the full buffer isn't an expensive operation anymore (sans wide text).

ethercrow · 2017-01-24T12:08:09Z

Could you extract the lines' patch in a separate PR so we can release it for Rasa and then tackle the rest?

ChrisPenner · 2017-01-26T20:47:23Z

Thanks for putting time into this! I'm also looking into ways to cache some of the information we use lines' for in Rasa so as to further improve performance; but Rasa's referencing scheme is based around Rows and Columns so anything which improves indexing/splitting on lines is helpful.

In particular it would be useful to either index into specific lines in constant time or to be able to determine the number of characters within a given range/within a given row. I'm considering caching this on a per buffer level in Rasa. Since we recently limited the text-interface to range operations it should be plausible to keep an updated cache mapping line-number to line-length. That's besides this issue though; just figured I'd keep you informed.

ethercrow · 2017-09-14T16:52:56Z

This has bitrotten a bit, please update.

ghost · 2017-09-14T18:16:31Z

This has bitrotten a bit, please update.

Rebased.

ghost changed the title ~~Avoid unnecessary linecount in splitAtLine~~ WIP: General performance improvements Jan 15, 2017

ethercrow reviewed Jan 15, 2017

View reviewed changes

ghost changed the title ~~WIP: General performance improvements~~ General performance improvements Jan 15, 2017

Siddhanathan Shanmugam added 2 commits September 14, 2017 13:36

General performance improvements

a838b2f

Whoops. Fix lines'

735a876

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

General performance improvements #8

General performance improvements #8

ghost commented Jan 15, 2017

ghost commented Jan 15, 2017

ghost commented Jan 15, 2017

ethercrow Jan 15, 2017 •

edited

Loading

ethercrow Jan 15, 2017

ghost commented Jan 15, 2017

ethercrow commented Jan 17, 2017

ethercrow commented Jan 17, 2017

ethercrow commented Jan 17, 2017

ghost commented Jan 17, 2017

ethercrow commented Jan 24, 2017

ChrisPenner commented Jan 26, 2017

ethercrow commented Sep 14, 2017

ghost commented Sep 14, 2017

General performance improvements #8

Are you sure you want to change the base?

General performance improvements #8

Conversation

ghost commented Jan 15, 2017

ghost commented Jan 15, 2017

ghost commented Jan 15, 2017

ethercrow Jan 15, 2017 • edited Loading

Choose a reason for hiding this comment

ethercrow Jan 15, 2017

Choose a reason for hiding this comment

ghost commented Jan 15, 2017

ethercrow commented Jan 17, 2017

ethercrow commented Jan 17, 2017

ethercrow commented Jan 17, 2017

ghost commented Jan 17, 2017

ethercrow commented Jan 24, 2017

ChrisPenner commented Jan 26, 2017

ethercrow commented Sep 14, 2017

ghost commented Sep 14, 2017

ethercrow Jan 15, 2017 •

edited

Loading