Memory footprint #534
This PR reduces the memory footprint by using unsigned char (up to 255) instead of UInt16 where the larger range is not needed, which shrinks both the memory and CPU footprint.
The output is now at least something, not empty.
Also changed the encoder gold values, because in Hotgym I needed to change the seed to get a reasonable representation from TM (a separate problem).
@@ -166,29 +166,29 @@ Real64 BenchmarkHotgym::run(UInt EPOCHS, bool useSPlocal, bool useSPglobal, bool
// check deterministic SP, TM output
SDR goldEnc({DIM_INPUT});
const SDR_sparse_t deterministicEnc{
0, 4, 13, 21, 24, 30, 32, 37, 40, 46, 47, 48, 50, 51, 64, 68, 79, 81, 89, 97, 99, 114, 120, 135, 136, 140, 141, 143, 144, 147, 151, 155, 161, 162, 164, 165, 169, 172, 174, 179, 181, 192, 201, 204, 205, 210, 213, 226, 237, 242, 247, 249, 254, 255, 262, 268, 271, 282, 283, 295, 302, 306, 307, 317, 330, 349, 353, 366, 368, 380, 393, 399, 404, 409, 410, 420, 422, 441, 446, 447, 456, 458, 464, 468, 476, 497, 499, 512, 521, 528, 531, 534, 538, 539, 541, 545, 550, 557, 562, 565, 575, 581, 589, 592, 599, 613, 617, 622, 647, 652, 686, 687, 691, 699, 704, 710, 713, 716, 722, 729, 736, 740, 747, 749, 753, 754, 758, 766, 778, 790, 791, 797, 800, 808, 809, 812, 815, 826, 828, 830, 837, 838, 852, 853, 856, 863, 864, 873, 878, 882, 885, 893, 894, 895, 905, 906, 914, 915, 920, 924, 927, 937, 939, 944, 947, 951, 954, 956, 967, 968, 969, 973, 975, 976, 981, 991, 998
0, 4, 13, 21, 24, 30, 32, 37, 40, 46, 47, 48, 50, 51, 64, 68, 79, 81, 89, 97, 99, 114, 120, 135, 136, 140, 141, 143, 144, 147, 151, 155, 161, 162, 164, 165, 169, 172, 174, 179, 181, 192, 201, 204, 205, 210, 213, 226, 227, 237, 242, 247, 249, 254, 255, 262, 268, 271, 282, 283, 295, 302, 306, 307, 317, 330, 349, 353, 366, 380, 383, 393, 404, 409, 410, 420, 422, 441, 446, 447, 456, 458, 464, 468, 476, 497, 499, 512, 521, 528, 531, 534, 538, 539, 541, 545, 550, 557, 562, 565, 575, 581, 589, 592, 599, 613, 617, 622, 647, 652, 686, 687, 691, 699, 704, 710, 713, 716, 722, 729, 736, 740, 747, 749, 753, 754, 758, 766, 778, 790, 791, 797, 800, 808, 809, 812, 815, 826, 828, 830, 837, 852, 853, 856, 863, 864, 873, 878, 882, 885, 893, 894, 895, 905, 906, 914, 915, 920, 924, 927, 937, 939, 944, 947, 951, 954, 956, 967, 968, 969, 973, 975, 976, 979, 981, 991, 998
The encoder gold values changed because I had to change the seed; otherwise TM gave no output in this configuration.
A proper fix is queued.
@@ -199,7 +199,7 @@ void Connections::destroySegment(Segment segment) {
const auto segmentOnCell =
    std::lower_bound(cellData.segments.begin(), cellData.segments.end(),
                     segment, [&](Segment a, Segment b) {
                       return segmentOrdinals_[a] < segmentOrdinals_[b];
                       return segmentOrdinals_[a] < segmentOrdinals_[b]; //TODO will this be slow if ordinals moved to SegmentData?
My idea is to get rid of the *Ordinals_ arrays in Connections and keep that number in SegmentData.
src/htm/algorithms/Connections.hpp
using SegmentIdx= UInt16; /** Index of segment in cell. */
using SynapseIdx= UInt16; /** Index of synapse in segment. */ //TODO profile to use better (smaller?) types
using SegmentIdx= unsigned char; /** Index of segment in cell. */
using SynapseIdx= unsigned char; /** Index of synapse in segment. */
The new type is only 1 byte! If it should overflow, we have a check in place to report it.
thanks for calling me out on this! Reverted back to UInt16, as it's more usable for many potential synapses. Also the UInt16 actually runs faster.
// Add a tiebreaker to the overlaps so that the output is deterministic.
vector<Real> overlaps_(overlaps.begin(), overlaps.end());
for(UInt i = 0; i < numColumns_; i++)
  overlaps_[i] += tieBreaker_[i];
removed tieBreakers.
- saves memory
- makes inhibition faster
Some weird error on Windows CI; restarting in the hope it's just random.
@dkeeney if you have time, could you please test this on a local Windows machine? The PR keeps throwing a weird error on Windows CI or crashing the CI build, while the PR is relatively simple and I don't see a reason why this should happen.
Sure, I will try it out.
Ok, here is the problem: the r3OutputArray size is 100. This sparseToBinary() function probably should have checked for index-out-of-range, at least in debug mode. I will let you make the change.
Wow, you have the eye of a bug hunter :) Thank you! Is the SEH error the Windows kind of segfault? I was surprised I didn't find anything like that. It's also strange that it passes (?) on Unixes. I had a PR getting rid of sparseToBinary in favour of SDR; I should revisit it once things clear up.
Looks good to me.
@@ -64,7 +64,7 @@ Real64 BenchmarkHotgym::run(UInt EPOCHS, bool useSPlocal, bool useSPglobal, bool
SpatialPooler spLocal(enc.dimensions, vector<UInt>{COLS}); // Spatial pooler with local inh
spGlobal.setGlobalInhibition(true);
spLocal.setGlobalInhibition(false);
Random rnd(1); //uses fixed seed for deterministic output checks
Random rnd(42); //uses fixed seed for deterministic output checks
Ah, back to 42 ... "the meaning of life" 👍
@@ -287,7 +287,6 @@ class SpatialPooler : public Serializable
ar(CEREAL_NVP(overlapDutyCycles_));
ar(CEREAL_NVP(activeDutyCycles_));
ar(CEREAL_NVP(minOverlapDutyCycles_));
ar(CEREAL_NVP(tieBreaker_));
Since tieBreaker is deterministic it does not need to be saved so this is correct. But are you sure it is regenerated after the restore? If it should be different then answers will eventually drift off of the correct set on subsequent iterations.
No, tie breakers are totally removed and not used, as they did not have 100% the effect we expected. And you've obsoleted them in the long-dragging Deterministic builds PR 👍 where you figured we need to add the `a > b` comparison to the segment ordering algorithm.
This is because tie breakers sometimes work and break ties:
{1, 1} + {0, 0.01} -> L < R
but sometimes they just break things (we apply the tie breakers flat out to all columns, for performance reasons):
{1, 0.99} + {0, 0.01} -> L ? R, broken
TL;DR: this PR removes tieBreakers altogether because they are not needed (we do a deterministic sort).
see the removal #534 (comment)
Oh, somehow I missed the fact that they were removed. I like that you did that.
@@ -26,6 +26,7 @@ class VectorHelpers
{
  std::vector<T> binary(width);
  for (auto sparseIdx: sparseVector) {
    NTA_ASSERT(sparseIdx < binary.size()) << "attempting to insert out of bounds element! " << sparseIdx;
👍
0, 1, 2, 3, 4, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 30,
31, 32, 33, 34, 35, 36, 37, 38, 39, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 70,
71, 72, 73, 74, 85, 86, 87, 88, 89
}, (UInt32)r3OutputArray.getCount());
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good, you corrected the paste error :)
I would really like to see some measurements of the run-time & memory usage of an HTM before and after these changes. Tests using real world data would be even better. Optimizing code without measuring the results is haphazard.
There are two exception facilities on Windows; SEH is the one that handles most hardware faults. We use the other one, EHsc, which handles only std::exceptions per the standard. C++ cannot use both facilities at the same time, so hardware faults (like index out of range and divide by 0) will cause crashes. I suspect the reason Linux did not see this error is that the offset just happened to fall within some other accessible data space; it would have caused a really hard-to-find bug if it hit something important.
https://www.frontiersin.org/articles/10.3389/fncir.2016.00023/full To sum up, I think we can back out the "to char" change. If we never need the larger range anyway, we could use the smaller version; if there is a possible scenario where 1000s of synapses per segment would be useful, let's return to UInt16. SynapseData is a different beast, which is much more populous, #360
Segments should not have 1000s of synapses, but they could have 1000s of potential synapses. This is only an issue for the spatial pooler, which makes synapses for the whole potential pool. The temporal memory makes the synapses as needed.
There may be cases where 255 would be too small, and UInt16 actually performs faster.
Reverted back to UInt16. This is now only the tieBreaker-removal PR. Please re-review.
overlapDutyCycles_.assign(numColumns_, 0);
overlapDutyCycles_.assign(numColumns_, 0); //TODO make all these sparse or rm to reduce footprint
I think we can get rid of overlapDutyCycles_ altogether. IIRC it is a boosting method? We have better/stronger boosting methods, and I don't think this ever even kicks in. But that needs to be checked before removing it; an issue for a different PR.
I think it's used for something... bumpupWeakSegments or so. I have another PR waiting that puts all of the boosting code into 2 methods: one for boosting overlaps, the other for weak columns (and it shows that the "weak columns" boosting is not really useful on the limited MNIST and hotgym tests). But yes, another PR.
Is this good to go?
Thank you both for good reviews and help with the PR!
unsigned char for SynapseIdx, SegmentIdx (for #360)