Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Naive implementations of optimizations suggested by breznak on pull-request n. 769 #776

Closed
wants to merge 4 commits into from

Conversation

ghost
Copy link

@ghost ghost commented Feb 20, 2020

With regards to the optimizations suggested by breznak toward the end of pull-request n. 769 ( #769 (comment) and subsequent posts), I've added naive implementations of the caching of numNeigbors (in wrapping neighborhoods) and of the caching of all the neighbors of each column (in wrapping neighborhoods).

My results are as follows: NOTE: I'm still only using dynamic_hotgym as a benchmark, and haven't looked at mnist_sp yet

  • Caching of numNeighbors resulted in a a drop of ~6.5% in total time spent in local inhibition (as compared to the original run time);
  • Caching of all neighbors of columns, when applied only to inhibitColumnsLocal_(), resulted in a drop of ~14% in total time spent in local inhibition (as compared to the original run time);
  • (*)Caching of all neighbors of columns, when applied to updateBoostFactorsLocal_(), resulted in a drop of ~55% in total time spent in local inhibition (as compared to the original run time).

(*) It should be noted that in dynamic_hotgym, updateBoostFactorsLocal_() could be skipped completely, as boostFactor_ is always = 0. However, I don't believe this has any bearing on the results.

I forgot to keep track of memory usage, but I did some back of hand calculations and figured it shouldn't be so bad if you have many GB of RAM and don't go over 10k columns. Maybe this speed/memory trade-off could be added as an option, instead of as mandatory usage?

Thanks again breznak.

For #123

@ghost ghost requested a review from breznak February 20, 2020 20:16
@breznak
Copy link
Member

breznak commented Feb 20, 2020

(*)Caching of all neighbors of columns, when applied to updateBoostFactorsLocal_(), resulted in a drop of ~55% in total time spent in local inhibition

wow! 6, 14 and 55% increase are very encouraging numbers!

in total time spent in local inhibition (as compared to the original run time)

does it translate well into the real speed? I mean, the local inh function is now faster, but other computations can be slower. Is the real speed (reported by the benchmark, eg) also better?

And something is crashing, I'll have a look if I find a bug..

Copy link
Member

@breznak breznak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks promising, let's cleanup the errors and then we can make the implementation nicer and maybe even faster!

src/htm/algorithms/SpatialPooler.cpp Show resolved Hide resolved
src/htm/algorithms/SpatialPooler.cpp Show resolved Hide resolved
neighborMap_.clear();
neighborMap_.reserve(wrapAroundNeighbors_ * numColumns_);
for(UInt column=0; column<numColumns_;column++) {
for(auto neighbor: WrappingNeighborhood(column, inhibitionRadius_,columnDimensions_)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my idea was to spend a really lot of time, mem in init, and cache this for all of columns x inhibition_range where cols typically 1-4k, inh radius (I'll have to see if it can be fixed, or) varies in min(columns dim)/2. I think we can stuff that into RAM

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you saying that there are more things that should be cached? What did I miss?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What did I miss?

yes, this depends on inhibitionRadius_ that's why you need to call the method each time it changes in https://github.com/htm-community/htm.core/pull/776/files/95b8dfee87a8caab33955e1f691ca4d6a89cd24a#diff-75db947f594e8a477099a6c8bb86daa0R484

the extreme idea is:

//in constructor, pseudocode:

for inhRadius in <all ranges of inh radius>:
  for col in all-columns:
    for auto neighbor in WrappingNeighborhood(col, inhR, columnDimensions):
      this->neighborMap[inhR][col].push_back(neighbor);

Copy link
Author

@ghost ghost Feb 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How often does the inhibition radius change in real applications? I ran the "perf" tool against dynamic_hotgym with the commits in this PR and now mapAllNeighbors() (including the functions it calls) accounts for less than 1% of total run-time (plus or minus the margin of error of perf). Among the spatialPooler functions, the big villains are still updateBoostFactorsLocal_() (27% of total runtime) and inhibitColumnsLocal_() (13% of total runtime), but this has more to do with the amount of calculations they have to perform, I think.

Also, how do you figure memory requirements would scale with regards to amount of columns and disposition of columns in dimensions?

Edit: I mean, because in wrapAroundNeighborhoods, there can be easily a single map that goes from minInhibitionRadius_ to maxInhibitionRadius_, so RAM isn't really a concern, but without wrapAround this becomes more difficult (I think? maybe I'm wrong there).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How often does the inhibition radius change in real applications?

every isUpdateRound_ which defaults to 500, so 1/500th of the time. We've almost never changed that default.

now mapAllNeighbors() (including the functions it calls) accounts for less than 1% of total run-time

ok, I agree this is good enough and not worth complicating things. Leave as is.

the big villains are still updateBoostFactorsLocal_() (27% of total runtime)

this is where the cached Neighborhood should help, all the function does is just looping through the neighborhood and increasing counter to get numNeighbors, I'd like to get it in O(1)

there can be easily a single map that goes from minInhibitionRadius_ to maxInhibitionRadius_, so RAM isn't really a concern, but without wrapAround this becomes more difficult

I'd suggest not to worry about wrapAround=False case (unless you can do it easily too). If these patches work well, I'll suggest keeping only the WrappingNeighborhood (unless there are proven use-cases for the non-wrapping).


const UInt numActive_wrap = (UInt)(0.5f + (density * (numNeighbors + 1)));

const UInt mapOffset = column * (numNeighbors+1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to shield off this logic of indexing in the map (of cached values) from the SP. Ideally later we extend Neighbourhood() to have param preCache=true and do all of this internally.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it should be done in such a way that we'd still use the same (auto neighbor: WrappingNeighborhood) logic, is that it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'd like the data const vector<SynapseIdx>[][]& mapNeighbors accessible from WrappingNeighborhood and filled in constructor there.
Used as

vector& neighbors = map[radius][col];

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try. This would probably the hardest change to make, I think(?).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, but I think it can be done quite nicely. (I haven't looked into the details, but this would be my high-level idea:)

  • add a new class CachedWrappingNeighborhood:
CachedWrappingNeighborhood(Neighborhood): 
constructor(rangeCenter, rangeRadius, dimColumns) {
  //this takes a lot of time, once
  for r in rangeRadius {
    for col in rangeCenter {
      hood = WrappingNeighborhood(col, r, colDimensions);
       vector locals={};
      for( neighbor : hood) {
        locals.append(neighbor); 
      }
      this->cachedHood[r][col] = locals;
  }}
}

@breznak
Copy link
Member

breznak commented Feb 20, 2020

Some tests are failing,

https://github.com/htm-community/htm.core/pull/776/checks?check_run_id=458711654#step:5:246

if you run ./build/Release/bin/unit_tests you'll get a complete test suite we use

@ghost
Copy link
Author

ghost commented Feb 20, 2020

Some tests are failing,

I saw that. This last commit should fix it.

@breznak breznak assigned ghost Feb 20, 2020
@ghost
Copy link
Author

ghost commented Feb 20, 2020

does it translate well into the real speed? I mean, the local inh function is now faster, but other computations can be slower. Is the real speed (reported by the benchmark, eg) also better?

It does translate into real speed. Real speed is much improved (at least on my machine).

@ghost
Copy link
Author

ghost commented Feb 20, 2020

This last commit should fix it.

It didn't entirely fix things. It seems the C++ tests are passing, but the python tests fail at some point (in Pickle)? I may have a hard time figuring this one out, as I've never even looked at the python bindings. I'll post an update if I find out anything.

@breznak
Copy link
Member

breznak commented Feb 20, 2020

It didn't entirely fix things. It seems the C++ tests are passing, but the python tests fail at some point (in Pickle)? I may have a hard time figuring this one out, as I've never even looked at the python bindings. I'll post an update if I find out anything.

I'll have a look later, or we can as @dkeeney ?, he's been digging into the serializations.

EDIT: yes, it'll likely be serialization, you've added new member variables. But before focusing on fixing that, if other C++ tests pass, let's make this code clean and look at things like
#776 (comment)

@dkeeney
Copy link

dkeeney commented Feb 21, 2020

I'll have a look later, or we can as @dkeeney ?, he's been digging into the serializations.

I am deep into the REST interface PR at the moment but if you cannot find the problem with serialization I can take a look at it later.

@breznak
Copy link
Member

breznak commented Feb 21, 2020

I am deep into the REST interface PR at the moment but if you cannot find the problem with serialization I can take a look at it later.

ok, thanks! Not urgent right now, I want to figure the implementation where the code would actually end up. So we'd just ignore failing serialization tests for now.

@breznak
Copy link
Member

breznak commented Feb 26, 2020

@dkeeney it's a bit strange but the user completely disappeared, I like the changes so-far but we cannot expect any requested fixes or answers.

  • I'll try to finish review and do changes for the code that is here
  • not sure if I have time to implement the biggest proposed change- cached Neighborhoods.

@dkeeney
Copy link

dkeeney commented Feb 26, 2020

I am getting close to the having the first draft of the REST interface ready. Maybe a day or two more. Then I can look at the serialization.

@breznak breznak self-assigned this Feb 26, 2020
@breznak breznak mentioned this pull request Mar 17, 2020
3 tasks
@breznak
Copy link
Member

breznak commented Jun 3, 2020

Closing this branch, follow up in -> #783

@breznak breznak closed this Jun 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants