
Optimize the linear edge list generation using caching #2321

Merged: 1 commit merged into nortikin:master on Nov 29, 2018

Conversation

@DolphinDream (Collaborator) commented Nov 28, 2018

Profiling the edge generation in some nodes indicated that it is a slow operation, and since this edge list is generated over and over in various nodes, it makes sense to cache it. With caching, a speedup of more than an order of magnitude in generating the edges was measured, compared to generating the edges with a list comprehension (e.g. edges = [[i, i+1] for i in range(N)]).

With the proposed optimization, the list of edges is stored in an edge cache (e.g. [[0, 1], [1, 2], ..., [n-1, n]]), and the cache is extended as longer edge lists are requested. Any call to get_edge_list returns a subset of this edge cache, thus not having to re-generate the same list over and over. A sketch of the mechanism follows.
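A minimal sketch of this mechanism, assuming a module-level cache (the name _edge_cache is illustrative; the merged code in data_structure.py may differ in detail):

_edge_cache = []  # grows monotonically: [[0, 1], [1, 2], [2, 3], ...]

def update_edge_cache(n):
    # extend the cache so it holds at least n edges
    for i in range(len(_edge_cache), n):
        _edge_cache.append([i, i + 1])

def get_edge_list(n):
    # return the first n edges: [[0, 1], [1, 2], ..., [n-1, n]]
    update_edge_cache(n)
    return _edge_cache[:n]

The slice copies the outer list but shares the inner [i, i+1] pairs, so callers should treat the returned edges as read-only.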

Various nodes like the line, spiral, torus knot and ellipse, which generate lines with a linear list of edges, can benefit substantially from this speedup.

Example code:

edges = get_edge_list(n)

vs

edges = [[i, i+1] for i in range(n)]

Additionally, if multiple calls to get_edge_list are going to be made with different values of n whose maximum is known, the edge generation can be optimized further by pre-caching the largest edge list: first call update_edge_cache(maxN), and then none of the subsequent get_edge_list calls for n values up to maxN will have to extend the cache, since it is already large enough. A usage sketch follows.
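A usage sketch of that pre-caching pattern, assuming the functions sketched above (the sizes are illustrative):

# pre-extend the cache once to the largest size needed
update_edge_cache(1000)

# every call below is now a pure slice of the cache, no appends
for n in (250, 1000, 37, 512):
    edges = get_edge_list(n)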

  • Code changes complete.
  • Code documentation complete.
  • Documentation for users complete (or not required, if user never sees these changes).
  • Manual testing done.
  • Unit-tests implemented.
  • Ready for merge.

@DolphinDream (Collaborator, Author) commented Nov 28, 2018

Below is the profiling for the Line MK4 node (in development) with and without the edge generation optimization. The first set of statistics is for the line node using the optimized edge generation (the edge cache); the other is for the node using the typical list comprehension. The percentages of the edge generation time (get_edge_list) out of the total time spent making the line (make_line) differ significantly:

0.420 / 30.962 = 1.36% for the optimized edge generation
20.392 / 50.969 = 40.0% for the non-optimized edge generation
  
   Ordered by: cumulative time

(optimized edge generation)

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     2126    0.395    0.000   35.379    0.017 line_mk4.py:350(process)
   257246    0.450    0.000   33.981    0.000 profile.py:89(wrapper)
   257246    9.885    0.000   30.962    0.000 line_mk4.py:186(make_line)
 24023800   16.645    0.000   16.645    0.000 line_mk4.py:81(<lambda>)
   257246    0.719    0.000    2.516    0.000 profile.py:45(is_profiling_enabled)
 29598172    2.317    0.000    2.317    0.000 {method 'append' of 'list' objects}
   514492    0.168    0.000    0.943    0.000 {built-in method builtins.next}
   257246    0.103    0.000    0.915    0.000 contextlib.py:57(__enter__)
   514492    0.463    0.000    0.774    0.000 context_managers.py:29(sv_preferences)
  2402380    0.602    0.000    0.602    0.000 line_mk4.py:70(<lambda>)
  2402380    0.595    0.000    0.595    0.000 line_mk4.py:76(<lambda>)
   257246    0.188    0.000    0.530    0.000 contextlib.py:134(helper)
   257246    0.275    0.000    0.420    0.000 data_structure.py:968(get_edge_list)
     6378    0.022    0.000    0.411    0.000 socket_data.py:85(SvGetSocket)
     4252    0.015    0.000    0.370    0.000 node_tree.py:299(sv_get)
   257246    0.221    0.000    0.351    0.000 contextlib.py:63(__exit__)
   257246    0.298    0.000    0.342    0.000 contextlib.py:37(__init__)
     4252    0.153    0.000    0.333    0.000 data_structure.py:94(match_long_repeat)

(not optimized edge generation)

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     2089    0.433    0.000   55.702    0.027 line_mk4.py:350(process)
   252769    0.485    0.000   54.237    0.000 profile.py:89(wrapper)
   252769   10.016    0.000   50.969    0.000 line_mk4.py:186(make_line)
   252769    0.290    0.000   20.392    0.000 data_structure.py:968(get_edge_list)
   252769   20.102    0.000   20.102    0.000 data_structure.py:978(<listcomp>)
 23605700   16.376    0.000   16.376    0.000 line_mk4.py:81(<lambda>)
   252769    0.794    0.000    2.730    0.000 profile.py:45(is_profiling_enabled)
 29083058    2.308    0.000    2.308    0.000 {method 'append' of 'list' objects}
   505538    0.178    0.000    1.014    0.000 {built-in method builtins.next}
   252769    0.111    0.000    0.986    0.000 contextlib.py:57(__enter__)
   505538    0.510    0.000    0.836    0.000 context_managers.py:29(sv_preferences)
  2360570    0.770    0.000    0.770    0.000 line_mk4.py:76(<lambda>)
  2360570    0.666    0.000    0.666    0.000 line_mk4.py:70(<lambda>)
   252769    0.201    0.000    0.576    0.000 contextlib.py:134(helper)
     6267    0.023    0.000    0.420    0.000 socket_data.py:85(SvGetSocket)
     4178    0.014    0.000    0.377    0.000 node_tree.py:299(sv_get)
   252769    0.326    0.000    0.376    0.000 contextlib.py:37(__init__)
   252769    0.236    0.000    0.375    0.000 contextlib.py:63(__exit__)
     4178    0.155    0.000    0.339    0.000 data_structure.py:94(match_long_repeat)
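For reference, tables of this shape can be produced with Python's cProfile and pstats sorted by cumulative time; a generic sketch (the figures above actually come through Sverchok's own profiling wrapper, seen as profile.py in the listings):

import cProfile
import io
import pstats

profiler = cProfile.Profile()
profiler.enable()
# ... trigger the node tree updates to be measured here ...
profiler.disable()

stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats('cumulative').print_stats(20)
print(stream.getvalue())  # ncalls/tottime/percall/cumtime columns, as above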



Review comment on the diff:

def update_edge_cache(n):
'''

Collaborator: these look like backticks?

@DolphinDream (Author): I checked, they are not. I can replace them with normal quotes if that's more desirable.

Collaborator: no matter, docstring syntax is unfortunately mixed in data_structure.py, so asking you to use double quotes would be almost arbitrary.

@DolphinDream (Collaborator, Author) commented Nov 29, 2018

(zeffii) isn't something like this already ridiculously fast?


import numpy as np

def get_edges_A(n, as_list=False):
    edge_array = np.array((np.arange(n), np.arange(1, 1+n)), np.int32).transpose()
    if as_list:
        edge_array = edge_array.tolist()
    return edge_array

def get_edges_B(n, as_list=False):
    array_one = np.arange(n + 1)
    array_two = np.roll(array_one, -1)
    edge_array = np.array((array_one, array_two), np.int32).transpose()[:-1]
    if as_list:
        edge_array = edge_array.tolist()
    return edge_array


print(get_edges_A(4))
print(get_edges_B(4))

It seems neither of these two is any faster than even the list comprehension. Here are some profiling results:

Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     2383    0.588    0.000   72.575    0.030 line_mk4.py:352(process)
   288343    0.679    0.000   70.499    0.000 profile.py:89(wrapper)
   288343   13.154    0.000   65.959    0.000 line_mk4.py:186(make_line)
   288343    0.731    0.000   26.097    0.000 data_structure.py:933(get_edges_A)
   288343   23.613    0.000   23.613    0.000 {method 'tolist' of 'numpy.ndarray' objects}
 26927900   21.312    0.000   21.312    0.000 line_mk4.py:81(<lambda>)
   288343    1.115    0.000    3.798    0.000 profile.py:45(is_profiling_enabled)
 33176126    2.927    0.000    2.927    0.000 {method 'append' of 'list' objects}
   576686    0.233    0.000    1.387    0.000 {built-in method builtins.next}
   288343    0.145    0.000    1.345    0.000 contextlib.py:57(__enter__)
   576686    0.720    0.000    1.154    0.000 context_managers.py:29(sv_preferences)
  2692790    0.982    0.000    0.982    0.000 line_mk4.py:76(<lambda>)
   288343    0.958    0.000    0.958    0.000 {built-in method numpy.core.multiarray.array}
  2692790    0.872    0.000    0.872    0.000 line_mk4.py:70(<lambda>)
   288343    0.305    0.000    0.819    0.000 contextlib.py:134(helper)
   576686    0.702    0.000    0.702    0.000 {built-in method numpy.core.multiarray.arange}
     7149    0.033    0.000    0.618    0.000 socket_data.py:85(SvGetSocket)
     4766    0.021    0.000    0.554    0.000 node_tree.py:299(sv_get)
   288343    0.332    0.000    0.519    0.000 contextlib.py:63(__exit__)
   288343    0.442    0.000    0.514    0.000 contextlib.py:37(__init__)
     4766    0.221    0.000    0.493    0.000 data_structure.py:91(match_long_repeat)


ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     2527    0.613    0.000   78.833    0.031 line_mk4.py:352(process)
   305767    0.766    0.000   76.767    0.000 profile.py:89(wrapper)
   305767   13.525    0.000   71.786    0.000 line_mk4.py:186(make_line)
   305767    0.997    0.000   30.353    0.000 data_structure.py:939(get_edges_B)
   305767   25.037    0.000   25.037    0.000 {method 'tolist' of 'numpy.ndarray' objects}
 28555100   22.338    0.000   22.338    0.000 line_mk4.py:81(<lambda>)
   305767    1.221    0.000    4.146    0.000 profile.py:45(is_profiling_enabled)
 35180894    2.999    0.000    2.999    0.000 {method 'append' of 'list' objects}
   305767    0.935    0.000    2.811    0.000 numeric.py:1335(roll)
   611534    0.248    0.000    1.514    0.000 {built-in method builtins.next}
   305767    0.158    0.000    1.478    0.000 contextlib.py:57(__enter__)
   611534    0.798    0.000    1.265    0.000 context_managers.py:29(sv_preferences)
  2855510    1.025    0.000    1.025    0.000 line_mk4.py:76(<lambda>)
   611534    0.996    0.000    0.996    0.000 {built-in method numpy.core.multiarray.array}
   917301    0.906    0.000    0.906    0.000 {built-in method numpy.core.multiarray.arange}
  2855510    0.894    0.000    0.894    0.000 line_mk4.py:70(<lambda>)
   305767    0.320    0.000    0.886    0.000 contextlib.py:134(helper)
     7581    0.032    0.000    0.601    0.000 socket_data.py:85(SvGetSocket)
   305767    0.577    0.000    0.577    0.000 {built-in method numpy.core.multiarray.concatenate}
   305767    0.484    0.000    0.566    0.000 contextlib.py:37(__init__)
   305767    0.367    0.000    0.562    0.000 contextlib.py:63(__exit__)
     5054    0.021    0.000    0.539    0.000 node_tree.py:299(sv_get)
     5054    0.210    0.000    0.473    0.000 data_structure.py:91(match_long_repeat)

The percentage of edge generation time out of the total line-making time:

26.097 / 65.959 = 39.56% - for get_edges_A
30.353 / 71.786 = 42.28% - for get_edges_B 

which is close to the list comprehension results.
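For a head-to-head comparison outside Blender, a micro-benchmark along these lines could be used (a sketch with timeit; it assumes get_edge_list, get_edges_A and get_edges_B from this thread are defined in the same module, and absolute numbers will vary by machine):

import timeit

n = 10_000

for stmt in ('[[i, i + 1] for i in range(n)]',   # list comprehension baseline
             'get_edge_list(n)',                 # cached slice
             'get_edges_A(n, as_list=True)',     # numpy arange/transpose
             'get_edges_B(n, as_list=True)'):    # numpy roll variant
    print(stmt, timeit.timeit(stmt, globals=globals(), number=1_000))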

Honestly, I suspect it's going to be hard to beat the edge cache implementation. Not to brag, but any conversion to other types necessarily incurs some overhead, which should cost more than just slicing a cached list (though I can't claim I know enough Python to say this for sure; it's just my hunch). Sometimes sacrificing some memory to gain speed may be the only way. Besides, memory is not a big concern here: even if we cache a one-million-edge list (likely way beyond what any node tree would ever need), it is hardly a burden. Per the link below, the outer list of a million entries costs only about 8 MB, though the inner two-element pairs add to the total. Also, an edge cache that grows with each node tree's actual needs seems better than pre-caching some arbitrarily large edge list.

https://stackoverflow.com/questions/43404210/how-much-memory-will-a-list-with-one-million-elements-take-up-in-python
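The memory figure can be sanity-checked with sys.getsizeof; a sketch (note that the ~8 MB in the linked answer covers only the outer array of pointers, while the inner two-element lists add to the total):

import sys

n = 1_000_000
edges = [[i, i + 1] for i in range(n)]

outer = sys.getsizeof(edges)                  # the outer list: roughly 8 MB of pointers
inner = sum(sys.getsizeof(e) for e in edges)  # the million inner [i, i + 1] lists
print(f'{(outer + inner) / 2**20:.1f} MB')    # total, excluding the int objects themselves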

@zeffii (Collaborator) commented Nov 29, 2018

by all means cache the living snot out of the things that can be cached :)

@DolphinDream (Collaborator, Author)

If there are no objections this is ready to land.

@zeffii (Collaborator) commented Nov 29, 2018

do it

The merged commit message reads:

Profiling the edge generation indicated that it is a slow operation, and since this edge list is generated over and over in various nodes, it makes sense to cache it. Profiling of various edge-generating nodes indicated a speedup of almost two orders of magnitude in generating the edges compared to the list comprehension counterpart (e.g. edges = [[i, i+1] for i in range(N)]).

With the proposed optimization, the list of edges is stored in an edge cache (e.g. [[0, 1], [1, 2], ..., [n-1, n]]), and the cache is extended as longer edge lists are requested. Any call to get_edge_list returns a subset of this edge cache, thus not having to re-generate the same list over and over.

Various nodes like the line, spiral, torus knot and ellipse, which generate lines with a linear list of edges, can benefit substantially from this speedup.

To get the linear edge list use:
edges = get_edge_list(n)
returning: [[0, 1], [1, 2], ... , [n-1, n]]

To get the loop edge list use:
edges = get_edge_loop(n)
returning: [[0, 1], [1, 2], ... , [n-2, n-1], [n-1, 0]]
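A loop variant can sit on top of the same cache; a hypothetical sketch (the merged implementation may differ):

def get_edge_loop(n):
    # n edges over n vertices, closed back to vertex 0:
    # [[0, 1], [1, 2], ..., [n-2, n-1], [n-1, 0]]
    edges = get_edge_list(n - 1)  # slice copy of the n-1 linear edges
    edges.append([n - 1, 0])      # closing edge; safe because the slice is a new list
    return edges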
@DolphinDream merged commit f3379f2 into nortikin:master on Nov 29, 2018
@DolphinDream deleted the edgeGenerationSpeedUp branch on November 29, 2018 at 22:12