
Memory footprint #534

Merged: 9 commits, Jul 6, 2019
Conversation

breznak (Member) commented Jul 3, 2019

  • Connections use unsigned char for SynapseIdx, SegmentIdx
    • this limits synapses per segment and segments per cell to 255, which sounds reasonable
  • SP: remove the tieBreaker vector
    • and fix the resulting exact-output checks

For #360

breznak added 5 commits July 1, 2019 19:10
this is for memory footprint, using unsigned char (up to 255) instead of
uint16_t
as the larger range is not needed, and this reduces the memory & CPU footprint.
Also changed the Encoder gold values, because in Hotgym I needed to change the seed
to get a reasonable representation from the TM (a separate problem)
@breznak breznak added ready code code enhancement, optimization, cleanup..programmer stuff labels Jul 3, 2019
@breznak breznak self-assigned this Jul 3, 2019
@@ -166,29 +166,29 @@ Real64 BenchmarkHotgym::run(UInt EPOCHS, bool useSPlocal, bool useSPglobal, bool
// check deterministic SP, TM output
SDR goldEnc({DIM_INPUT});
const SDR_sparse_t deterministicEnc{
0, 4, 13, 21, 24, 30, 32, 37, 40, 46, 47, 48, 50, 51, 64, 68, 79, 81, 89, 97, 99, 114, 120, 135, 136, 140, 141, 143, 144, 147, 151, 155, 161, 162, 164, 165, 169, 172, 174, 179, 181, 192, 201, 204, 205, 210, 213, 226, 237, 242, 247, 249, 254, 255, 262, 268, 271, 282, 283, 295, 302, 306, 307, 317, 330, 349, 353, 366, 368, 380, 393, 399, 404, 409, 410, 420, 422, 441,446, 447, 456, 458, 464, 468, 476, 497, 499, 512, 521, 528, 531, 534, 538, 539, 541, 545, 550, 557, 562, 565, 575, 581, 589, 592, 599, 613, 617, 622, 647, 652, 686, 687, 691, 699, 704, 710, 713, 716, 722, 729, 736, 740, 747, 749, 753, 754, 758, 766, 778, 790, 791, 797, 800, 808, 809, 812, 815, 826, 828, 830, 837, 838, 852, 853, 856, 863, 864, 873, 878, 882, 885, 893, 894, 895, 905, 906, 914, 915, 920, 924, 927, 937, 939, 944, 947, 951, 954, 956, 967, 968, 969, 973, 975, 976, 981, 991, 998
0, 4, 13, 21, 24, 30, 32, 37, 40, 46, 47, 48, 50, 51, 64, 68, 79, 81, 89, 97, 99, 114, 120, 135, 136, 140, 141, 143, 144, 147, 151, 155, 161, 162, 164, 165, 169, 172, 174, 179, 181, 192, 201, 204, 205, 210, 213, 226, 227, 237, 242, 247, 249, 254, 255, 262, 268, 271, 282, 283, 295, 302, 306, 307, 317, 330, 349, 353, 366, 380, 383, 393, 404, 409, 410, 420, 422, 441,446, 447, 456, 458, 464, 468, 476, 497, 499, 512, 521, 528, 531, 534, 538, 539, 541, 545, 550, 557, 562, 565, 575, 581, 589, 592, 599, 613, 617, 622, 647, 652, 686, 687, 691, 699, 704, 710, 713, 716, 722, 729, 736, 740, 747, 749, 753, 754, 758, 766, 778, 790, 791, 797, 800, 808, 809, 812, 815, 826, 828, 830, 837, 852, 853, 856, 863, 864, 873, 878, 882, 885, 893, 894, 895, 905, 906, 914, 915, 920, 924, 927, 937, 939, 944, 947, 951, 954, 956, 967, 968, 969, 973, 975, 976, 979, 981, 991, 998
Member Author:

The encoder gold values changed because I had to change the seed; otherwise the TM gave no output in this configuration.
A proper fix is queued.

@@ -199,7 +199,7 @@ void Connections::destroySegment(Segment segment) {
const auto segmentOnCell =
std::lower_bound(cellData.segments.begin(), cellData.segments.end(),
segment, [&](Segment a, Segment b) {
return segmentOrdinals_[a] < segmentOrdinals_[b];
return segmentOrdinals_[a] < segmentOrdinals_[b]; //TODO will this be slow if ordinals moved to SegmentData?
Member Author:

My idea is to get rid of the *Ordinals_ arrays in Connections and keep that number in SegmentData.

using SegmentIdx= UInt16; /** Index of segment in cell. */
using SynapseIdx= UInt16; /** Index of synapse in segment. */ //TODO profile to use better (smaller?) types
using SegmentIdx= unsigned char; /** Index of segment in cell. */
using SynapseIdx= unsigned char; /** Index of synapse in segment. */
Member Author:

The new type is only 1 B! If it should overflow, we have a check in place to inform us.

Member Author:

Thanks for calling me out on this! Reverted back to UInt16, as it's more usable for many potential synapses. Also, UInt16 actually runs faster.

src/htm/algorithms/SpatialPooler.cpp Show resolved Hide resolved
// Add a tiebreaker to the overlaps so that the output is deterministic.
vector<Real> overlaps_(overlaps.begin(), overlaps.end());
for(UInt i = 0; i < numColumns_; i++)
overlaps_[i] += tieBreaker_[i];
Member Author:

Removed the tieBreakers:

  • saves memory
  • makes inhibition faster

breznak (Member Author) commented Jul 3, 2019

[ RUN ] TMRegionTest.testLinking
In VectorFileSensor TestOutputDir/TMRegionTestInput.csv
Reading CSV file
Read 10 vectors
unknown file: error: SEH exception with code 0xc0000005 thrown in the test body.

Some weird error on the Windows CI; restarting in the hope it's just random.

@breznak breznak closed this Jul 3, 2019
@breznak breznak reopened this Jul 3, 2019
breznak (Member Author) commented Jul 3, 2019

@dkeeney if you have time, could you please test this on a local Windows machine? The PR keeps throwing a weird error on the Windows CI or crashing the CI build, while the change is relatively simple and I don't see a reason why this should happen.

@breznak breznak mentioned this pull request Jul 3, 2019
3 tasks
dkeeney commented Jul 3, 2019

could you please test this on a local Windows machine, please?

Sure, I will try it out.

dkeeney commented Jul 3, 2019

Ok, here is the problem: in TMRegionTest.cpp line 292

  std::vector<Byte> expected3out = VectorHelpers::sparseToBinary<Byte>(
            {
	     0, 1, 2, 3, 4, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 30, 
	     31, 32, 33, 34, 35, 36, 37, 38, 39, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 70,
	     71, 72, 73, 74, 85, 86, 87, 88, 890, 1, 2, 3, 4, 10, 11, 12, 13, 14, 15, 16, 17,
	     18, 19, 20, 21, 22, 23, 24, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 50, 51, 52, 53, 
	     54, 55, 56, 57, 58, 59, 70, 71, 72, 73, 74, 85, 86, 87, 88, 89 
	     }, (UInt32)r3OutputArray.getCount());

The r3OutputArray size is 100.
There is a typo in the middle of the initialization string.
The value 890 should be 89.

This sparseToBinary() function probably should have checked for index-out-of-range, at least in debug mode.

I will let you make the change.

breznak (Member Author) commented Jul 3, 2019

The value 890 should be 89.

Wow, you have the eye of a bug hunter :) Thank you!

Is SEH a kind of segfault error on Windows? I was surprised I didn't find anything like that.

It's also strange that it passes (?) on Unixes.

I had a PR getting rid of sparseToBinary in favour of SDR; I should revisit it once things clear up.

dkeeney previously approved these changes Jul 3, 2019:
Looks good to me.

@@ -64,7 +64,7 @@ Real64 BenchmarkHotgym::run(UInt EPOCHS, bool useSPlocal, bool useSPglobal, bool
SpatialPooler spLocal(enc.dimensions, vector<UInt>{COLS}); // Spatial pooler with local inh
spGlobal.setGlobalInhibition(true);
spLocal.setGlobalInhibition(false);
Random rnd(1); //uses fixed seed for deterministic output checks
Random rnd(42); //uses fixed seed for deterministic output checks

Ah, back to 42 ... "the meaning of life" 👍

@@ -287,7 +287,6 @@ class SpatialPooler : public Serializable
ar(CEREAL_NVP(overlapDutyCycles_));
ar(CEREAL_NVP(activeDutyCycles_));
ar(CEREAL_NVP(minOverlapDutyCycles_));
ar(CEREAL_NVP(tieBreaker_));

Since tieBreaker is deterministic, it does not need to be saved, so this is correct. But are you sure it is regenerated after the restore? If it ends up different, the answers will eventually drift off of the correct set on subsequent iterations.

Member Author:

No, the tie breakers are totally removed and not used, as they did not have 100% the effect we expected. And you obsoleted them in the long-dragging Deterministic-builds PR 👍 where you figured out we need to add the a > b comparison to the segment ordering algorithm.

This is because tie breakers sometimes work and break ties:
{1, 1} + {0, 0.01} -> L < R

but sometimes they just break the ordering (we apply tie breakers flat out to all columns, for performance reasons):
{1, 0.99} + {0, 0.01} -> L ? R broken

TL;DR: this PR removes tieBreakers altogether because they are not needed (we do a deterministic sort).

Member Author:

see the removal #534 (comment)


Oh, somehow I missed the fact that they were removed. I like that you did that.

@@ -26,6 +26,7 @@ class VectorHelpers
{
std::vector<T> binary(width);
for (auto sparseIdx: sparseVector) {
NTA_ASSERT(sparseIdx < binary.size()) << "attemping to insert out of bounds element! " << sparseIdx;
👍

0, 1, 2, 3, 4, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 30,
31, 32, 33, 34, 35, 36, 37, 38, 39, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 70,
71, 72, 73, 74, 85, 86, 87, 88, 89
}, (UInt32)r3OutputArray.getCount());
Good, you corrected the paste error :)

ctrl-z-9000-times (Collaborator) commented:

I would really like to see some measurements of the run-time & memory usage of an HTM before and after these changes. Tests using real world data would be even better. Optimizing code without measuring the results is haphazard.

dkeeney commented Jul 3, 2019

Is the SEH kind of segfault error on windows?

There are two exception facilities on Windows; SEH is the one that handles most hardware faults. We use the other one, EHsc, which handles only std::exceptions per the standard. C++ cannot use both facilities at the same time, so hardware faults (like index out of range and divide by 0) cause crashes.

I suspect the reason Linux did not see this error is that the offset just happened to fall within some other accessible data space...would have caused a really hard to find bug if it hit something important.

breznak (Member Author) commented Jul 3, 2019

I would really like to see some measurements of the run-time & memory usage of an HTM before and after these changes. Tests using real world data would be even better.

  • tie breakers are now useless and should just go. This saves a bit on computation and memory.

  • 255 synapses per segment / segments per cell:
    well, my idea was playing with very large models. I also thought less storage -> more fits in the CPU cache.
    I have never seen a real-world model here that would exceed the 255 limit, so I thought, why not.
    The real-world impact is quite minor (2 B per segment).

Figure 6 illustrates the performance of a network of HTM neurons implementing a high-order sequence memory. The network used in Figure 6 consists of 2048 mini-columns with 32 neurons per mini-column. Each neuron has 128 basal dendritic segments, and each dendritic segment has up to 40 actual synapses

https://www.frontiersin.org/articles/10.3389/fncir.2016.00023/full

To sum up, I think we can back out the "to char" change. If we never need the larger range anyway, we could use the smaller version; if there is a possible scenario where 1000s of synapses per segment would be useful, let's return to uint16.

SynapseData is a different beast, which is much more populous, #360

ctrl-z-9000-times (Collaborator) commented:

if there is a possible scenario where 1000s synapses per segment would be useful, let's return to uint16.

Segments should not have 1000s of synapses, but they could have 1000s of potential synapses.

This is only an issue for the spatial pooler which makes synapses for the whole potential pool. The temporal memory makes the synapses as needed.

breznak added 2 commits July 4, 2019 19:50
as there may be cases where 255 would be too small.
UInt16 actually performs faster.
breznak (Member Author) commented Jul 4, 2019

Reverted back to UInt16. This is now only the tieBreaker-removal PR. Please re-review.

}

overlapDutyCycles_.assign(numColumns_, 0);
overlapDutyCycles_.assign(numColumns_, 0); //TODO make all these sparse or rm to reduce footprint
Collaborator:

I think we can get rid of overlapDutyCycles_ altogether. IIRC it is a boosting method? We have better/stronger boosting methods, and I don't think this ever even kicks in. But that needs to be checked before removing it; an issue for a different PR.

Member Author:

I think it's used for something... bumpUpWeakSegments or so. I have another PR waiting that puts all of the boosting code into 2 methods: one for boosting overlaps, the other for weak columns (and shows that "weak columns" is not really useful on the limited MNIST and hotgym tests). But yes, another PR.

breznak (Member Author) commented Jul 6, 2019

Is this good to go?

@breznak breznak merged commit eb00bfc into master Jul 6, 2019
@breznak breznak deleted the mem_footprint branch July 6, 2019 16:50
breznak (Member Author) commented Jul 6, 2019

Thank you both for good reviews and help with the PR!
