Optimize the linear edge list generation using caching #2321
Conversation
Force-pushed from 15108ce to 3ac7eff
Below is the profiling for the Line MK4 node (in development) with and without the edge generation optimization. The first set of statistics is for the line node using the optimized edge generation (the edge cache); the second is for the node using the typical list comprehension. The percentages of the edge generation time (`get_edge_list`) out of the total time spent making the line (`make_line`) are significantly different.
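A comparison along these lines can be reproduced with a small `timeit` sketch (illustrative only; `get_edge_list` is the helper added by this PR, and the import path is an assumption):

```python
import timeit

# Assumed import path for the PR's cached helper.
from sverchok.data_structure import get_edge_list

N = 100_000  # number of edges to generate

# Plain list comprehension, rebuilt on every call.
comprehension = timeit.timeit(
    lambda: [[i, i + 1] for i in range(N)], number=100)

# Cached version: slices a pre-built, growing edge cache.
cached = timeit.timeit(lambda: get_edge_list(N), number=100)

print(f"list comprehension: {comprehension:.3f}s")
print(f"edge cache:         {cached:.3f}s")
```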
Review comment on data_structure.py (marked outdated):

```python
def update_edge_cache(n):
    '''
```
these look like backticks?
I checked. They are not. I can replace them with normal quotes if that’s more desirable.
No matter; docstring syntax is unfortunately mixed in data_structure.py, so asking you to use double quotes would be almost arbitrary.
It seems neither of these two is any faster than the list comprehension. Some profiling shows the percentage of time to generate the edges out of the time to make the line is close to the list comprehension results. Honestly, I suspect it’s going to be hard to beat the edge cache implementation. I’m not bragging, just thinking that any conversion to other types would necessarily incur some overhead, which is going to be more than just slicing a cached list (yet I can’t claim I know enough Python to say this for sure; it’s just my own hunch). Sometimes sacrificing some memory to gain speed may be the only way. Besides, memory is not even a big concern: even if we cache a one-million-edge list (which would likely be way beyond what any node tree would ever need to create), such a list hardly imposes any memory burden (about 8 MB). Still, I think a growing edge cache, extended per node-tree necessity, is better than pre-caching some arbitrarily large edge list.
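The ~8 MB figure can be sanity-checked with `sys.getsizeof` (a quick illustrative check; note it only measures the outer list of pointers, so the total footprint, including the small inner lists, is larger):

```python
import sys

# One million linear edges.
edges = [[i, i + 1] for i in range(1_000_000)]

# Size of the outer list (the array of pointers to the inner lists):
# roughly 8 MB on a 64-bit build. The inner two-element lists add
# more on top of this.
print(sys.getsizeof(edges) / 2**20, "MB")
```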
by all means cache the living snot out of the things that can be cached :)

If there are no objections this is ready to land.

do it
Force-pushed from 0c32ed9 to 12ff390
Profiling the edge generation indicated that it is a slow operation, and since this edge list is generated over and over in various nodes, it makes sense to cache it. Profiling of various nodes (generating edges) indicated a speedup of almost two orders of magnitude in generating the edges compared to the list comprehension counterpart (e.g. `edges = [[i, i+1] for i in range(N)]`).

With the proposed optimization, the list of edges is stored in an edge cache (e.g. `[[0, 1], [1, 2], ..., [n-1, n]]`), and the cache is extended as longer edge lists are requested/generated. Any call to `get_edge_list` returns a subset of this edge cache, thus not having to re-generate the same list over and over. Various nodes like the line, spiral, torus knot, ellipse etc., which generate lines with a linear list of edges, can benefit substantially from this speedup.

To get the linear edge list use `edges = get_edge_list(n)`, returning `[[0, 1], [1, 2], ..., [n-1, n]]`. To get the loop edge list use `edges = get_edge_loop(n)`, returning `[[0, 1], [1, 2], ..., [n-2, n-1], [n-1, 0]]`.
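A minimal sketch of the growing-cache approach described here (the function names follow the PR; the bodies are an illustration under those assumptions, not the actual patch):

```python
# Module-level cache shared by all callers: [[0, 1], [1, 2], ...].
_edge_cache = []


def update_edge_cache(n):
    '''Extend the cached linear edge list so it holds at least n edges.'''
    m = len(_edge_cache)
    if n > m:
        _edge_cache.extend([i, i + 1] for i in range(m, n))


def get_edge_list(n):
    '''Return [[0, 1], [1, 2], ..., [n-1, n]] by slicing the shared cache.'''
    update_edge_cache(n)
    return _edge_cache[:n]


def get_edge_loop(n):
    '''Return n edges closing a loop over n vertices, ending in [n-1, 0].'''
    return get_edge_list(n - 1) + [[n - 1, 0]]
```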
Force-pushed from 12ff390 to 283a662
Profiling the edge generation in some nodes indicated that it is a slow operation, and since this edge list is generated over and over in various nodes, it makes sense to cache it. With caching, a speedup of more than an order of magnitude in generating the edges was measured, compared to generating them with a list comprehension (e.g. `edges = [[i, i+1] for i in range(N)]`).

With the proposed optimization, the list of edges is stored in an edge cache (e.g. `[[0, 1], [1, 2], ..., [n-1, n]]`), and the cache is extended as longer edge lists are requested/generated. Any call to `get_edge_list` returns a subset of this edge cache, thus not having to re-generate the same list over and over.

Various nodes like the line, spiral, torus knot, ellipse etc., which generate lines with a linear list of edges, can benefit substantially from this speedup.
Example code:
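A usage sketch, assuming the helpers are imported from data_structure.py (the expected return values follow the description above):

```python
# Assumed import path for the PR's helpers.
from sverchok.data_structure import get_edge_list, get_edge_loop

n = 5

# Open polyline: n edges over n + 1 vertices.
edges = get_edge_list(n)
# [[0, 1], [1, 2], [2, 3], [3, 4], [4, 5]]

# Closed loop: n edges over n vertices.
loop = get_edge_loop(n)
# [[0, 1], [1, 2], [2, 3], [3, 4], [4, 0]]
```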
Additionally, if multiple calls are going to be made to `get_edge_list` with different values of n, whose maximum is known in advance, the edge generation can be optimized further by pre-caching the largest edge list: call `update_edge_list(maxN)` first, so that subsequent calls to `get_edge_list` with n values less than maxN will not have to extend the cache, since it is already large enough for all of them.
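A sketch of that pre-caching pattern (`update_edge_list` is the name used above; whether it matches the final helper name in data_structure.py is an assumption):

```python
# Assumed import path and helper names, following the description above.
from sverchok.data_structure import get_edge_list, update_edge_list

counts = [10, 250, 4000, 1200]

# Grow the cache once, up to the largest request...
update_edge_list(max(counts))

# ...so none of these calls needs to extend it again; each one
# only slices the already-built cache.
edge_lists = [get_edge_list(n) for n in counts]
```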