Binary self organizing map #53

fsicre · 2020-04-22T23:08:40Z

Hi,

First I want to thank you for this awesome library.

I'm experimenting on binary NLP technics and I was wondering if you could publish a bitmagic version of SOM (self organizing map) using about ~ 10 millions of extremely sparse (< 5 / 100 000) binary vectors of same size ~ 1Mbits. The map could be for instance 128 x 128 locations.

It could be a huge step froward for me to see a correct use of bitmagic on this particular subject.

Thanks for your help.

tlk00 · 2020-05-13T16:05:46Z

Are you asking for an example on how to use BM?
Could you provide more info on the case, I am not sure I understand the 128x128 locations problem really well.

Thank you!

fsieduc · 2020-05-21T14:55:32Z

Are you asking for an example on how to use BM?

yes, exactly, on the use case of a Self Organizing Map of binary vectors.

Usually, SOM are used on integers or float vectors as you can see on these to pages :
https://en.wikipedia.org/wiki/Self-organizing_map
https://www.google.com/search?q=self+organizing+map&tbm=isch

In my case I need to do this kind of classification on sparse binary vectors.

BitMagic seems to be a very very good candidate for that.

The naïve SOM algorithm is very simple :

build a 2D matrix of empty or random binary vectors (preferably with same sparsity than the vectors to classify)
iterate while the map is not stabilized
- pick a vector V to classify (for e.g. randomly in the ~ 10 millions)
- find the best location on the map i.e. the location which has the most similar vector to V, call it the Best Matching Unit (BMU)
- hack the BMU in order to make it even more similar to the vector V
- hack the BMU's neighbourhood's vectors in order to "move" them in the same direction of the BMU but with less effect (the farther away of the BMU, the lesser you "move" the neighbour)

The map will progressively evolved and finally each vector to classify will have a best location identified and all similar vectors in the ~ 10 millions will also be associated to locations nearby on the map.

Now imaging you have ~ 10 millions of extremely sparse binary vectors (same size ~ 1Mbits) and you want to place them on a 2D map of 128 x 128 locations.
Hamming Distance can be used.
The map will be a 2D matrix of binary vectors of ~ 1Mbits.
The algorithm on the picture below will be used in order to "hack" the locations' vectors.

As you can see, a lot of vectors will be loaded in RAM, a lot of BV comparisons will be performed, a lot of hamming distance will be calculated, a lot of bit modification will be done, so the question is

how to correctly use BitMagic for best / maximum speed ? (32/64/128Go RAM available)

https://www.semanticscholar.org/paper/An-alternative-approach-for-binary-and-categorical-Santana-Morais/f1f079640218b72822911ef3dc7394765703d62a/figure/0
https://d3i71xaburhd42.cloudfront.net/f1f079640218b72822911ef3dc7394765703d62a/3-Figure1-1.png

tlk00 · 2020-05-22T12:43:44Z

Yes, i understand the problem.
The closest example publicly available now is:
https://github.com/tlk00/BitMagic/tree/master/samples/xsample07a

It creates binary fingerprints and implements primitive (but relatively fast) clusterization using similarity measure (COUNT_AND). Not quite well documented yet, but the code should be instructive as a use case for how to compute binary distances.

I am planning to create a separate example on how to do a more efficient distance computations for Hamming, Jacquard distances using cache locking and SIMD. Current distance compute is fast but can be even faster actually to facilitate binary SOMs.

fsieduc · 2020-05-22T18:27:40Z

Awesome, many thanks for the hint.

fsieduc · 2020-05-24T19:56:18Z

xsample07a is very interesting, although it's difficult to understand exactly why you've choosen some technics. never mind.

this project is a side project for me.

once I have written something (in few weeks) would you mind my showing to you ? in order to have your advices concrete bitmagic usage.

tlk00 · 2020-05-26T23:27:08Z

Surest thing. I would certainly appreciate your example, please share it.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Binary self organizing map #53

Binary self organizing map #53

fsicre commented Apr 22, 2020

tlk00 commented May 13, 2020

fsieduc commented May 21, 2020

tlk00 commented May 22, 2020

fsieduc commented May 22, 2020

fsieduc commented May 24, 2020

tlk00 commented May 26, 2020

Binary self organizing map #53

Binary self organizing map #53

Comments

fsicre commented Apr 22, 2020

tlk00 commented May 13, 2020

fsieduc commented May 21, 2020

tlk00 commented May 22, 2020

fsieduc commented May 22, 2020

fsieduc commented May 24, 2020

tlk00 commented May 26, 2020