-
Notifications
You must be signed in to change notification settings - Fork 75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Naive implementations of optimizations suggested by breznak on pull-request n. 769 #776
Conversation
wow! 6, 14 and 55% increase are very encouraging numbers!
does it translate well into the real speed? I mean, the local inh function is now faster, but other computations can be slower. Is the real speed (reported by the benchmark, eg) also better? And something is crashing, I'll have a look if I find a bug.. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks promising, let's cleanup the errors and then we can make the implementation nicer and maybe even faster!
neighborMap_.clear(); | ||
neighborMap_.reserve(wrapAroundNeighbors_ * numColumns_); | ||
for(UInt column=0; column<numColumns_;column++) { | ||
for(auto neighbor: WrappingNeighborhood(column, inhibitionRadius_,columnDimensions_)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my idea was to spend a really lot of time, mem in init, and cache this for all of columns x inhibition_range
where cols typically 1-4k, inh radius (I'll have to see if it can be fixed, or) varies in min(columns dim)/2
. I think we can stuff that into RAM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you saying that there are more things that should be cached? What did I miss?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What did I miss?
yes, this depends on inhibitionRadius_
that's why you need to call the method each time it changes in https://github.com/htm-community/htm.core/pull/776/files/95b8dfee87a8caab33955e1f691ca4d6a89cd24a#diff-75db947f594e8a477099a6c8bb86daa0R484
the extreme idea is:
//in constructor, pseudocode:
for inhRadius in <all ranges of inh radius>:
for col in all-columns:
for auto neighbor in WrappingNeighborhood(col, inhR, columnDimensions):
this->neighborMap[inhR][col].push_back(neighbor);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How often does the inhibition radius change in real applications? I ran the "perf" tool against dynamic_hotgym with the commits in this PR and now mapAllNeighbors() (including the functions it calls) accounts for less than 1% of total run-time (plus or minus the margin of error of perf). Among the spatialPooler functions, the big villains are still updateBoostFactorsLocal_() (27% of total runtime) and inhibitColumnsLocal_() (13% of total runtime), but this has more to do with the amount of calculations they have to perform, I think.
Also, how do you figure memory requirements would scale with regards to amount of columns and disposition of columns in dimensions?
Edit: I mean, because in wrapAroundNeighborhoods, there can be easily a single map that goes from minInhibitionRadius_ to maxInhibitionRadius_, so RAM isn't really a concern, but without wrapAround this becomes more difficult (I think? maybe I'm wrong there).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How often does the inhibition radius change in real applications?
every isUpdateRound_
which defaults to 500, so 1/500th of the time. We've almost never changed that default.
now mapAllNeighbors() (including the functions it calls) accounts for less than 1% of total run-time
ok, I agree this is good enough and not worth complicating things. Leave as is.
the big villains are still updateBoostFactorsLocal_() (27% of total runtime)
this is where the cached Neighborhood should help, all the function does is just looping through the neighborhood and increasing counter to get numNeighbors
, I'd like to get it in O(1)
there can be easily a single map that goes from minInhibitionRadius_ to maxInhibitionRadius_, so RAM isn't really a concern, but without wrapAround this becomes more difficult
I'd suggest not to worry about wrapAround=False
case (unless you can do it easily too). If these patches work well, I'll suggest keeping only the WrappingNeighborhood
(unless there are proven use-cases for the non-wrapping).
|
||
const UInt numActive_wrap = (UInt)(0.5f + (density * (numNeighbors + 1))); | ||
|
||
const UInt mapOffset = column * (numNeighbors+1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to shield off this logic of indexing in the map (of cached values) from the SP. Ideally later we extend Neighbourhood() to have param preCache=true
and do all of this internally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it should be done in such a way that we'd still use the same (auto neighbor: WrappingNeighborhood) logic, is that it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we'd like the data const vector<SynapseIdx>[][]& mapNeighbors
accessible from WrappingNeighborhood and filled in constructor there.
Used as
vector& neighbors = map[radius][col];
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll try. This would probably the hardest change to make, I think(?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, but I think it can be done quite nicely. (I haven't looked into the details, but this would be my high-level idea:)
- add a new class CachedWrappingNeighborhood:
CachedWrappingNeighborhood(Neighborhood):
constructor(rangeCenter, rangeRadius, dimColumns) {
//this takes a lot of time, once
for r in rangeRadius {
for col in rangeCenter {
hood = WrappingNeighborhood(col, r, colDimensions);
vector locals={};
for( neighbor : hood) {
locals.append(neighbor);
}
this->cachedHood[r][col] = locals;
}}
}
Some tests are failing,
if you run |
I saw that. This last commit should fix it. |
It does translate into real speed. Real speed is much improved (at least on my machine). |
It didn't entirely fix things. It seems the C++ tests are passing, but the python tests fail at some point (in Pickle)? I may have a hard time figuring this one out, as I've never even looked at the python bindings. I'll post an update if I find out anything. |
I'll have a look later, or we can as @dkeeney ?, he's been digging into the serializations. EDIT: yes, it'll likely be serialization, you've added new member variables. But before focusing on fixing that, if other C++ tests pass, let's make this code clean and look at things like |
I am deep into the REST interface PR at the moment but if you cannot find the problem with serialization I can take a look at it later. |
ok, thanks! Not urgent right now, I want to figure the implementation where the code would actually end up. So we'd just ignore failing serialization tests for now. |
@dkeeney it's a bit strange but the user completely disappeared, I like the changes so-far but we cannot expect any requested fixes or answers.
|
I am getting close to the having the first draft of the REST interface ready. Maybe a day or two more. Then I can look at the serialization. |
Closing this branch, follow up in -> #783 |
With regards to the optimizations suggested by breznak toward the end of pull-request n. 769 ( #769 (comment) and subsequent posts), I've added naive implementations of the caching of numNeigbors (in wrapping neighborhoods) and of the caching of all the neighbors of each column (in wrapping neighborhoods).
My results are as follows: NOTE: I'm still only using dynamic_hotgym as a benchmark, and haven't looked at mnist_sp yet
(*) It should be noted that in dynamic_hotgym, updateBoostFactorsLocal_() could be skipped completely, as boostFactor_ is always = 0. However, I don't believe this has any bearing on the results.
I forgot to keep track of memory usage, but I did some back of hand calculations and figured it shouldn't be so bad if you have many GB of RAM and don't go over 10k columns. Maybe this speed/memory trade-off could be added as an option, instead of as mandatory usage?
Thanks again breznak.
For #123