
Optimize the linear edge list generation using caching #2321

Merged: 1 commit merged into nortikin:master on Nov 29, 2018

Conversation

@DolphinDream (Collaborator) commented Nov 28, 2018

Profiling the edge generation in some nodes indicated that it is a slow operation, and since this edge list is generated over and over in various nodes, it makes sense to cache it. With caching, a speedup of more than an order of magnitude in generating the edges was measured, compared to generating the edges with a list comprehension (e.g. edges = [[i, i+1] for i in range(N)]).

With the proposed optimization, the list of edges is stored in an edge cache (e.g. [[0, 1], [1, 2], ..., [n-1, n]]), and the cache is extended as longer edge lists are requested. Any call to get_edge_list returns a subset of this edge cache, thus not having to re-generate the same list over and over. A sketch of the mechanism follows.
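A minimal sketch of this mechanism, assuming a module-level cache (the name _edge_cache is illustrative; the merged code in data_structure.py may differ in detail):

_edge_cache = []  # grows monotonically: [[0, 1], [1, 2], [2, 3], ...]

def update_edge_cache(n):
    # extend the cache so it holds at least n edges
    for i in range(len(_edge_cache), n):
        _edge_cache.append([i, i + 1])

def get_edge_list(n):
    # return the first n edges: [[0, 1], [1, 2], ..., [n-1, n]]
    update_edge_cache(n)
    return _edge_cache[:n]

The slice copies the outer list but shares the inner [i, i+1] pairs, so callers should treat the returned edges as read-only.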

Various nodes like the line, spiral, torus knot and ellipse, which generate lines with a linear list of edges, can benefit substantially from this speedup.

Example code:

edges = get_edge_list(n)

vs

edges = [[i, i+1] for i in range(n)]

Additionally, if multiple calls to get_edge_list are going to be made with different values of n whose maximum is known, the edge generation can be optimized further by pre-caching the largest edge list: first call update_edge_cache(maxN), and then none of the subsequent get_edge_list calls for n values up to maxN will have to extend the cache, since it is already large enough. A usage sketch follows.
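A usage sketch of that pre-caching pattern, assuming the functions sketched above (the sizes are illustrative):

# pre-extend the cache once to the largest size needed
update_edge_cache(1000)

# every call below is now a pure slice of the cache, no appends
for n in (250, 1000, 37, 512):
    edges = get_edge_list(n)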

  • Code changes complete.
  • Code documentation complete.
  • Documentation for users complete (or not required, if user never sees these changes).
  • Manual testing done.
  • Unit-tests implemented.
  • Ready for merge.

@DolphinDream (Collaborator, Author) commented Nov 28, 2018

Below is the profiling for the Line MK4 node (in development) with and without the edge generation optimization. The first set of statistics is for the line node using the optimized edge generation (the edge cache); the other is for the node using the typical list comprehension. The percentages of the edge generation time (get_edge_list) out of the total time spent making the line (make_line) differ significantly:

0.420 / 30.962 = 1.36% for the optimized edge generation
20.392 / 50.969 = 40.0% for the non-optimized edge generation
  
   Ordered by: cumulative time

(optimized edge generation)

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     2126    0.395    0.000   35.379    0.017 line_mk4.py:350(process)
   257246    0.450    0.000   33.981    0.000 profile.py:89(wrapper)
   257246    9.885    0.000   30.962    0.000 line_mk4.py:186(make_line)
 24023800   16.645    0.000   16.645    0.000 line_mk4.py:81(<lambda>)
   257246    0.719    0.000    2.516    0.000 profile.py:45(is_profiling_enabled)
 29598172    2.317    0.000    2.317    0.000 {method 'append' of 'list' objects}
   514492    0.168    0.000    0.943    0.000 {built-in method builtins.next}
   257246    0.103    0.000    0.915    0.000 contextlib.py:57(__enter__)
   514492    0.463    0.000    0.774    0.000 context_managers.py:29(sv_preferences)
  2402380    0.602    0.000    0.602    0.000 line_mk4.py:70(<lambda>)
  2402380    0.595    0.000    0.595    0.000 line_mk4.py:76(<lambda>)
   257246    0.188    0.000    0.530    0.000 contextlib.py:134(helper)
   257246    0.275    0.000    0.420    0.000 data_structure.py:968(get_edge_list)
     6378    0.022    0.000    0.411    0.000 socket_data.py:85(SvGetSocket)
     4252    0.015    0.000    0.370    0.000 node_tree.py:299(sv_get)
   257246    0.221    0.000    0.351    0.000 contextlib.py:63(__exit__)
   257246    0.298    0.000    0.342    0.000 contextlib.py:37(__init__)
     4252    0.153    0.000    0.333    0.000 data_structure.py:94(match_long_repeat)

(not optimized edge generation)

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     2089    0.433    0.000   55.702    0.027 line_mk4.py:350(process)
   252769    0.485    0.000   54.237    0.000 profile.py:89(wrapper)
   252769   10.016    0.000   50.969    0.000 line_mk4.py:186(make_line)
   252769    0.290    0.000   20.392    0.000 data_structure.py:968(get_edge_list)
   252769   20.102    0.000   20.102    0.000 data_structure.py:978(<listcomp>)
 23605700   16.376    0.000   16.376    0.000 line_mk4.py:81(<lambda>)
   252769    0.794    0.000    2.730    0.000 profile.py:45(is_profiling_enabled)
 29083058    2.308    0.000    2.308    0.000 {method 'append' of 'list' objects}
   505538    0.178    0.000    1.014    0.000 {built-in method builtins.next}
   252769    0.111    0.000    0.986    0.000 contextlib.py:57(__enter__)
   505538    0.510    0.000    0.836    0.000 context_managers.py:29(sv_preferences)
  2360570    0.770    0.000    0.770    0.000 line_mk4.py:76(<lambda>)
  2360570    0.666    0.000    0.666    0.000 line_mk4.py:70(<lambda>)
   252769    0.201    0.000    0.576    0.000 contextlib.py:134(helper)
     6267    0.023    0.000    0.420    0.000 socket_data.py:85(SvGetSocket)
     4178    0.014    0.000    0.377    0.000 node_tree.py:299(sv_get)
   252769    0.326    0.000    0.376    0.000 contextlib.py:37(__init__)
   252769    0.236    0.000    0.375    0.000 contextlib.py:63(__exit__)
     4178    0.155    0.000    0.339    0.000 data_structure.py:94(match_long_repeat)
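For reference, tables of this shape can be produced with Python's cProfile and pstats sorted by cumulative time; a generic sketch (the figures above actually come through Sverchok's own profiling wrapper, seen as profile.py in the listings):

import cProfile
import io
import pstats

profiler = cProfile.Profile()
profiler.enable()
# ... trigger the node tree updates to be measured here ...
profiler.disable()

stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats('cumulative').print_stats(20)
print(stream.getvalue())  # ncalls/tottime/percall/cumtime columns, as above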



Review comment on the diff:

def update_edge_cache(n):
'''

Collaborator: these look like backticks?

@DolphinDream (Author): I checked, they are not. I can replace them with normal quotes if that's more desirable.

Collaborator: no matter, docstring syntax is unfortunately mixed in data_structure.py, so asking you to use double quotes would be almost arbitrary.

@DolphinDream (Collaborator, Author) commented Nov 29, 2018

(zeffii) isn't something like this already ridiculously fast?


import numpy as np

def get_edges_A(n, as_list=False):
    edge_array = np.array((np.arange(n), np.arange(1, 1+n)), np.int32).transpose()
    if as_list:
        edge_array = edge_array.tolist()
    return edge_array

def get_edges_B(n, as_list=False):
    array_one = np.arange(n + 1)
    array_two = np.roll(array_one, -1)
    edge_array = np.array((array_one, array_two), np.int32).transpose()[:-1]
    if as_list:
        edge_array = edge_array.tolist()
    return edge_array


print(get_edges_A(4))
print(get_edges_B(4))

It seems neither of these two is any faster than even the list comprehension. Here are some profiling results:

Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     2383    0.588    0.000   72.575    0.030 line_mk4.py:352(process)
   288343    0.679    0.000   70.499    0.000 profile.py:89(wrapper)
   288343   13.154    0.000   65.959    0.000 line_mk4.py:186(make_line)
   288343    0.731    0.000   26.097    0.000 data_structure.py:933(get_edges_A)
   288343   23.613    0.000   23.613    0.000 {method 'tolist' of 'numpy.ndarray' objects}
 26927900   21.312    0.000   21.312    0.000 line_mk4.py:81(<lambda>)
   288343    1.115    0.000    3.798    0.000 profile.py:45(is_profiling_enabled)
 33176126    2.927    0.000    2.927    0.000 {method 'append' of 'list' objects}
   576686    0.233    0.000    1.387    0.000 {built-in method builtins.next}
   288343    0.145    0.000    1.345    0.000 contextlib.py:57(__enter__)
   576686    0.720    0.000    1.154    0.000 context_managers.py:29(sv_preferences)
  2692790    0.982    0.000    0.982    0.000 line_mk4.py:76(<lambda>)
   288343    0.958    0.000    0.958    0.000 {built-in method numpy.core.multiarray.array}
  2692790    0.872    0.000    0.872    0.000 line_mk4.py:70(<lambda>)
   288343    0.305    0.000    0.819    0.000 contextlib.py:134(helper)
   576686    0.702    0.000    0.702    0.000 {built-in method numpy.core.multiarray.arange}
     7149    0.033    0.000    0.618    0.000 socket_data.py:85(SvGetSocket)
     4766    0.021    0.000    0.554    0.000 node_tree.py:299(sv_get)
   288343    0.332    0.000    0.519    0.000 contextlib.py:63(__exit__)
   288343    0.442    0.000    0.514    0.000 contextlib.py:37(__init__)
     4766    0.221    0.000    0.493    0.000 data_structure.py:91(match_long_repeat)


ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     2527    0.613    0.000   78.833    0.031 line_mk4.py:352(process)
   305767    0.766    0.000   76.767    0.000 profile.py:89(wrapper)
   305767   13.525    0.000   71.786    0.000 line_mk4.py:186(make_line)
   305767    0.997    0.000   30.353    0.000 data_structure.py:939(get_edges_B)
   305767   25.037    0.000   25.037    0.000 {method 'tolist' of 'numpy.ndarray' objects}
 28555100   22.338    0.000   22.338    0.000 line_mk4.py:81(<lambda>)
   305767    1.221    0.000    4.146    0.000 profile.py:45(is_profiling_enabled)
 35180894    2.999    0.000    2.999    0.000 {method 'append' of 'list' objects}
   305767    0.935    0.000    2.811    0.000 numeric.py:1335(roll)
   611534    0.248    0.000    1.514    0.000 {built-in method builtins.next}
   305767    0.158    0.000    1.478    0.000 contextlib.py:57(__enter__)
   611534    0.798    0.000    1.265    0.000 context_managers.py:29(sv_preferences)
  2855510    1.025    0.000    1.025    0.000 line_mk4.py:76(<lambda>)
   611534    0.996    0.000    0.996    0.000 {built-in method numpy.core.multiarray.array}
   917301    0.906    0.000    0.906    0.000 {built-in method numpy.core.multiarray.arange}
  2855510    0.894    0.000    0.894    0.000 line_mk4.py:70(<lambda>)
   305767    0.320    0.000    0.886    0.000 contextlib.py:134(helper)
     7581    0.032    0.000    0.601    0.000 socket_data.py:85(SvGetSocket)
   305767    0.577    0.000    0.577    0.000 {built-in method numpy.core.multiarray.concatenate}
   305767    0.484    0.000    0.566    0.000 contextlib.py:37(__init__)
   305767    0.367    0.000    0.562    0.000 contextlib.py:63(__exit__)
     5054    0.021    0.000    0.539    0.000 node_tree.py:299(sv_get)
     5054    0.210    0.000    0.473    0.000 data_structure.py:91(match_long_repeat)

The percentage of edge generation time out of the total line-making time:

26.097 / 65.959 = 39.56% - for get_edges_A
30.353 / 71.786 = 42.28% - for get_edges_B 

which is close to the list comprehension results.
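For a head-to-head comparison outside Blender, a micro-benchmark along these lines could be used (a sketch with timeit; it assumes get_edge_list, get_edges_A and get_edges_B from this thread are defined in the same module, and absolute numbers will vary by machine):

import timeit

n = 10_000

for stmt in ('[[i, i + 1] for i in range(n)]',   # list comprehension baseline
             'get_edge_list(n)',                 # cached slice
             'get_edges_A(n, as_list=True)',     # numpy arange/transpose
             'get_edges_B(n, as_list=True)'):    # numpy roll variant
    print(stmt, timeit.timeit(stmt, globals=globals(), number=1_000))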

Honestly, I suspect it's going to be hard to beat the edge cache implementation. Not to brag, but any conversion to other types necessarily incurs some overhead, which should cost more than just slicing a cached list (though I can't claim I know enough Python to say this for sure; it's just my hunch). Sometimes sacrificing some memory to gain speed may be the only way. Besides, memory is not a big concern here: even if we cache a one-million-edge list (likely way beyond what any node tree would ever need), it is hardly a burden. Per the link below, the outer list of a million entries costs only about 8 MB, though the inner two-element pairs add to the total. Also, an edge cache that grows with each node tree's actual needs seems better than pre-caching some arbitrarily large edge list.

https://stackoverflow.com/questions/43404210/how-much-memory-will-a-list-with-one-million-elements-take-up-in-python
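The memory figure can be sanity-checked with sys.getsizeof; a sketch (note that the ~8 MB in the linked answer covers only the outer array of pointers, while the inner two-element lists add to the total):

import sys

n = 1_000_000
edges = [[i, i + 1] for i in range(n)]

outer = sys.getsizeof(edges)                  # the outer list: roughly 8 MB of pointers
inner = sum(sys.getsizeof(e) for e in edges)  # the million inner [i, i + 1] lists
print(f'{(outer + inner) / 2**20:.1f} MB')    # total, excluding the int objects themselves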

@zeffii (Collaborator) commented Nov 29, 2018

by all means cache the living snot out of the things that can be cached :)

@DolphinDream (Collaborator, Author)

If there are no objections this is ready to land.

@zeffii (Collaborator) commented Nov 29, 2018

do it

The merged commit message reads:

Profiling the edge generation indicated that it is a slow operation, and since this edge list is generated over and over in various nodes, it makes sense to cache it. Profiling of various edge-generating nodes indicated a speedup of almost two orders of magnitude in generating the edges compared to the list comprehension counterpart (e.g. edges = [[i, i+1] for i in range(N)]).

With the proposed optimization, the list of edges is stored in an edge cache (e.g. [[0, 1], [1, 2], ..., [n-1, n]]), and the cache is extended as longer edge lists are requested. Any call to get_edge_list returns a subset of this edge cache, thus not having to re-generate the same list over and over.

Various nodes like the line, spiral, torus knot and ellipse, which generate lines with a linear list of edges, can benefit substantially from this speedup.

To get the linear edge list use:
edges = get_edge_list(n)
returning: [[0, 1], [1, 2], ... , [n-1, n]]

To get the loop edge list use:
edges = get_edge_loop(n)
returning: [[0, 1], [1, 2], ... , [n-2, n-1], [n-1, 0]]
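A loop variant can sit on top of the same cache; a hypothetical sketch (the merged implementation may differ):

def get_edge_loop(n):
    # n edges over n vertices, closed back to vertex 0:
    # [[0, 1], [1, 2], ..., [n-2, n-1], [n-1, 0]]
    edges = get_edge_list(n - 1)  # slice copy of the n-1 linear edges
    edges.append([n - 1, 0])      # closing edge; safe because the slice is a new list
    return edges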
@DolphinDream merged commit f3379f2 into nortikin:master on Nov 29, 2018
@DolphinDream deleted the edgeGenerationSpeedUp branch on November 29, 2018 at 22:12