
Memory footprint #534

Merged: 9 commits, Jul 6, 2019
Conversation

breznak (Member) commented Jul 3, 2019

  • Connections use unsigned char for SynapseIdx, SegmentIdx
    • this limits synapses per segment and segments per cell to 255, which sounds reasonable
  • SP: remove the tieBreaker vector
    • and fix the resulting exact-output checks

For #360

breznak added 5 commits July 1, 2019 19:10
this is for memory footprint, using unsigned char (up to 255) instead of
uint16_t
as the larger range is not needed, and this reduces the memory & CPU footprint.
Also changed the Encoder gold values, because in Hotgym I needed to change the seed
to get a reasonable representation from the TM (a separate problem)
@breznak breznak added ready code code enhancement, optimization, cleanup..programmer stuff labels Jul 3, 2019
@breznak breznak self-assigned this Jul 3, 2019
@@ -166,29 +166,29 @@ Real64 BenchmarkHotgym::run(UInt EPOCHS, bool useSPlocal, bool useSPglobal, bool
// check deterministic SP, TM output
SDR goldEnc({DIM_INPUT});
const SDR_sparse_t deterministicEnc{
0, 4, 13, 21, 24, 30, 32, 37, 40, 46, 47, 48, 50, 51, 64, 68, 79, 81, 89, 97, 99, 114, 120, 135, 136, 140, 141, 143, 144, 147, 151, 155, 161, 162, 164, 165, 169, 172, 174, 179, 181, 192, 201, 204, 205, 210, 213, 226, 237, 242, 247, 249, 254, 255, 262, 268, 271, 282, 283, 295, 302, 306, 307, 317, 330, 349, 353, 366, 368, 380, 393, 399, 404, 409, 410, 420, 422, 441,446, 447, 456, 458, 464, 468, 476, 497, 499, 512, 521, 528, 531, 534, 538, 539, 541, 545, 550, 557, 562, 565, 575, 581, 589, 592, 599, 613, 617, 622, 647, 652, 686, 687, 691, 699, 704, 710, 713, 716, 722, 729, 736, 740, 747, 749, 753, 754, 758, 766, 778, 790, 791, 797, 800, 808, 809, 812, 815, 826, 828, 830, 837, 838, 852, 853, 856, 863, 864, 873, 878, 882, 885, 893, 894, 895, 905, 906, 914, 915, 920, 924, 927, 937, 939, 944, 947, 951, 954, 956, 967, 968, 969, 973, 975, 976, 981, 991, 998
0, 4, 13, 21, 24, 30, 32, 37, 40, 46, 47, 48, 50, 51, 64, 68, 79, 81, 89, 97, 99, 114, 120, 135, 136, 140, 141, 143, 144, 147, 151, 155, 161, 162, 164, 165, 169, 172, 174, 179, 181, 192, 201, 204, 205, 210, 213, 226, 227, 237, 242, 247, 249, 254, 255, 262, 268, 271, 282, 283, 295, 302, 306, 307, 317, 330, 349, 353, 366, 380, 383, 393, 404, 409, 410, 420, 422, 441,446, 447, 456, 458, 464, 468, 476, 497, 499, 512, 521, 528, 531, 534, 538, 539, 541, 545, 550, 557, 562, 565, 575, 581, 589, 592, 599, 613, 617, 622, 647, 652, 686, 687, 691, 699, 704, 710, 713, 716, 722, 729, 736, 740, 747, 749, 753, 754, 758, 766, 778, 790, 791, 797, 800, 808, 809, 812, 815, 826, 828, 830, 837, 852, 853, 856, 863, 864, 873, 878, 882, 885, 893, 894, 895, 905, 906, 914, 915, 920, 924, 927, 937, 939, 944, 947, 951, 954, 956, 967, 968, 969, 973, 975, 976, 979, 981, 991, 998
Member Author:

The encoder gold values changed because I had to change the seed; otherwise the TM gave no output in this configuration.
A proper fix is queued.

@@ -199,7 +199,7 @@ void Connections::destroySegment(Segment segment) {
const auto segmentOnCell =
std::lower_bound(cellData.segments.begin(), cellData.segments.end(),
segment, [&](Segment a, Segment b) {
return segmentOrdinals_[a] < segmentOrdinals_[b];
return segmentOrdinals_[a] < segmentOrdinals_[b]; //TODO will this be slow if ordinals moved to SegmentData?
Member Author:

My idea is to get rid of the *Ordinals_ arrays in Connections and keep that number in SegmentData.

using SegmentIdx= UInt16; /** Index of segment in cell. */
using SynapseIdx= UInt16; /** Index of synapse in segment. */ //TODO profile to use better (smaller?) types
using SegmentIdx= unsigned char; /** Index of segment in cell. */
using SynapseIdx= unsigned char; /** Index of synapse in segment. */
Member Author:

The new type is only 1 B! If it should overflow, we have a check in place to inform us.

Member Author:

Thanks for calling me out on this! Reverted back to UInt16, as it's more usable for many potential synapses. Also, UInt16 actually runs faster.

src/htm/algorithms/SpatialPooler.cpp Show resolved Hide resolved
// Add a tiebreaker to the overlaps so that the output is deterministic.
vector<Real> overlaps_(overlaps.begin(), overlaps.end());
for(UInt i = 0; i < numColumns_; i++)
overlaps_[i] += tieBreaker_[i];
Member Author:

Removed the tieBreakers:

  • saves memory
  • makes inhibition faster

breznak (Member Author) commented Jul 3, 2019

[ RUN ] TMRegionTest.testLinking
In VectorFileSensor TestOutputDir/TMRegionTestInput.csv
Reading CSV file
Read 10 vectors
unknown file: error: SEH exception with code 0xc0000005 thrown in the test body.

Some weird error on the Windows CI; restarting in the hope it's just random.

@breznak breznak closed this Jul 3, 2019
@breznak breznak reopened this Jul 3, 2019
breznak (Member Author) commented Jul 3, 2019

@dkeeney if you have time, could you please test this on a local Windows machine? The PR keeps throwing a weird error on the Windows CI or crashing the CI build, while the change is relatively simple and I don't see a reason why this should happen.

@breznak breznak mentioned this pull request Jul 3, 2019
3 tasks
dkeeney commented Jul 3, 2019

could you please test this on a local Windows machine, please?

Sure, I will try it out.

dkeeney commented Jul 3, 2019

Ok, here is the problem: in TMRegionTest.cpp line 292

  std::vector<Byte> expected3out = VectorHelpers::sparseToBinary<Byte>(
            {
	     0, 1, 2, 3, 4, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 30, 
	     31, 32, 33, 34, 35, 36, 37, 38, 39, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 70,
	     71, 72, 73, 74, 85, 86, 87, 88, 890, 1, 2, 3, 4, 10, 11, 12, 13, 14, 15, 16, 17,
	     18, 19, 20, 21, 22, 23, 24, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 50, 51, 52, 53, 
	     54, 55, 56, 57, 58, 59, 70, 71, 72, 73, 74, 85, 86, 87, 88, 89 
	     }, (UInt32)r3OutputArray.getCount());

The r3OutputArray size is 100.
There is a typo in the middle of the initialization string.
The value 890 should be 89.

This sparseToBinary() function probably should have checked for index-out-of-range, at least in debug mode.

I will let you make the change.

breznak (Member Author) commented Jul 3, 2019

The value 890 should be 89.

Wow, you have the eye of a bug hunter :) Thank you!

Is SEH a kind of segfault error on Windows? I was surprised I didn't find anything like that.

It's also strange that it passes (?) on Unixes.

I had a PR getting rid of sparseToBinary in favour of SDR; I should revisit it once things clear up.

dkeeney previously approved these changes Jul 3, 2019:
Looks good to me.

@@ -64,7 +64,7 @@ Real64 BenchmarkHotgym::run(UInt EPOCHS, bool useSPlocal, bool useSPglobal, bool
SpatialPooler spLocal(enc.dimensions, vector<UInt>{COLS}); // Spatial pooler with local inh
spGlobal.setGlobalInhibition(true);
spLocal.setGlobalInhibition(false);
Random rnd(1); //uses fixed seed for deterministic output checks
Random rnd(42); //uses fixed seed for deterministic output checks

Ah, back to 42 ... "the meaning of life" 👍

@@ -287,7 +287,6 @@ class SpatialPooler : public Serializable
ar(CEREAL_NVP(overlapDutyCycles_));
ar(CEREAL_NVP(activeDutyCycles_));
ar(CEREAL_NVP(minOverlapDutyCycles_));
ar(CEREAL_NVP(tieBreaker_));

Since tieBreaker is deterministic, it does not need to be saved, so this is correct. But are you sure it is regenerated after the restore? If it ends up different, the answers will eventually drift off of the correct set on subsequent iterations.

Member Author:

No, the tie breakers are totally removed and not used, as they did not have 100% the effect we expected. And you obsoleted them in the long-dragging Deterministic-builds PR 👍 where you figured out we need to add the a > b comparison to the segment ordering algorithm.

This is because tie breakers sometimes work and break ties:
{1, 1} + {0, 0.01} -> L < R

but sometimes they just break the ordering (we apply tie breakers flat out to all columns, for performance reasons):
{1, 0.99} + {0, 0.01} -> L ? R broken

TL;DR: this PR removes tieBreakers altogether because they are not needed (we do a deterministic sort).

Member Author:

see the removal #534 (comment)


Oh, somehow I missed the fact that they were removed. I like that you did that.

@@ -26,6 +26,7 @@ class VectorHelpers
{
std::vector<T> binary(width);
for (auto sparseIdx: sparseVector) {
NTA_ASSERT(sparseIdx < binary.size()) << "attemping to insert out of bounds element! " << sparseIdx;
👍

0, 1, 2, 3, 4, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 30,
31, 32, 33, 34, 35, 36, 37, 38, 39, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 70,
71, 72, 73, 74, 85, 86, 87, 88, 89
}, (UInt32)r3OutputArray.getCount());
Good, you corrected the paste error :)

ctrl-z-9000-times (Collaborator) commented:

I would really like to see some measurements of the run-time & memory usage of an HTM before and after these changes. Tests using real world data would be even better. Optimizing code without measuring the results is haphazard.

dkeeney commented Jul 3, 2019

Is the SEH kind of segfault error on windows?

There are two exception facilities on Windows; SEH is the one that handles most hardware faults. We use the other one, EHsc, which handles only std::exceptions per the standard. C++ cannot use both facilities at the same time, so hardware faults (like index out of range and divide by 0) cause crashes.

I suspect the reason Linux did not see this error is that the offset just happened to fall within some other accessible data space...would have caused a really hard to find bug if it hit something important.

breznak (Member Author) commented Jul 3, 2019

I would really like to see some measurements of the run-time & memory usage of an HTM before and after these changes. Tests using real world data would be even better.

  • tie breakers are now useless and should just go. This saves a bit on computation and memory.

  • 255 synapses per segment / segments per cell:
    well, my idea was playing with very large models. I also thought less storage -> more fits in the CPU cache.
    I have never seen a real-world model here that would exceed the 255 limit, so I thought, why not.
    The real-world impact is quite minor (2 B per segment).

Figure 6 illustrates the performance of a network of HTM neurons implementing a high-order sequence memory. The network used in Figure 6 consists of 2048 mini-columns with 32 neurons per mini-column. Each neuron has 128 basal dendritic segments, and each dendritic segment has up to 40 actual synapses

https://www.frontiersin.org/articles/10.3389/fncir.2016.00023/full

To sum up, I think we can back out the "to char" change. If we never need the larger range anyway, we could use the smaller version; if there is a possible scenario where 1000s of synapses per segment would be useful, let's return to uint16.

SynapseData is a different beast, which is much more populous, #360

ctrl-z-9000-times (Collaborator) commented:

if there is a possible scenario where 1000s synapses per segment would be useful, let's return to uint16.

Segments should not have 1000s of synapses, but they could have 1000s of potential synapses.

This is only an issue for the spatial pooler which makes synapses for the whole potential pool. The temporal memory makes the synapses as needed.

breznak added 2 commits July 4, 2019 19:50
as there may be cases where 255 would be too small.
UInt16 actually performs faster.
breznak (Member Author) commented Jul 4, 2019

Reverted back to UInt16. This is now only the tieBreaker-removal PR. Please re-review.

}

overlapDutyCycles_.assign(numColumns_, 0);
overlapDutyCycles_.assign(numColumns_, 0); //TODO make all these sparse or rm to reduce footprint
Collaborator:

I think we can get rid of overlapDutyCycles_ altogether. IIRC it is a boosting method? We have better/stronger boosting methods, and I don't think this ever even kicks in. But that needs to be checked before removing it; an issue for a different PR.

Member Author:

I think it's used for something... bumpUpWeakSegments or so. I have another PR waiting that puts all of the boosting code into 2 methods: one for boosting overlaps, the other for weak columns (and shows that "weak columns" is not really useful on the limited MNIST and hotgym tests). But yes, another PR.

breznak (Member Author) commented Jul 6, 2019

Is this good to go?

@breznak breznak merged commit eb00bfc into master Jul 6, 2019
@breznak breznak deleted the mem_footprint branch July 6, 2019 16:50
breznak (Member Author) commented Jul 6, 2019

Thank you both for good reviews and help with the PR!
