
Large Bin File Error #28

Open
stanlivshin opened this issue Aug 4, 2015 · 7 comments

@stanlivshin

DoubleBuffer vectors = ByteBuffer.allocateDirect(vocabSize * layerSize * 8).asDoubleBuffer();

This line was throwing an error: the int multiplication vocabSize * layerSize * 8 overflows once the product exceeds Integer.MAX_VALUE, so a negative number was passed into the method.

As a dirty fix I changed it to the following:

DoubleBuffer vectors = DoubleBuffer.allocate(1000000000);
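For illustration, a minimal sketch of the overflow (the sizes here are illustrative, chosen to match the Google News model discussed later in this thread):

```java
public class OverflowDemo {
    public static void main(String[] args) {
        // Illustrative sizes: 3M vocabulary entries, 300-dimensional vectors.
        int vocabSize = 3_000_000;
        int layerSize = 300;

        // All operands are ints, so the product is computed in 32-bit
        // arithmetic and wraps around before it reaches the method call.
        int wrapped = vocabSize * layerSize * 8;

        // Widening one operand to long first yields the true byte count.
        long actual = (long) vocabSize * layerSize * 8;

        System.out.println(wrapped); // negative: -1389934592
        System.out.println(actual);  // 7200000000 bytes (~6.7 GiB)
    }
}
```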

@wko27
Contributor

wko27 commented Aug 4, 2015

Hi, do you mind opening a pull request?

I'd suggest a more proper fix as:

long bufferSize = (long) vocabSize * layerSize;
Preconditions.checkState(bufferSize <= Integer.MAX_VALUE, "Unable to allocate a buffer of %s doubles, vocab size is %s, layerSize is %s", bufferSize, vocabSize, layerSize);
DoubleBuffer vectors = DoubleBuffer.allocate((int) bufferSize);

@jkinkead

I ran into this as well. Note that this will still only let you go as big as 16G worth of vectors, and you lose the memory mapping from calling allocateDirect. It might be better to shard the vectors into 1 or 2 G direct byte buffers, and let the model call in to the correct one.

@dirkgr FYI, side-effect of your efficiency fixes causes the max number of doubles to be 2^28 - 1, or about 250 million. Google's Google News vector file contains 3 million vectors of 300 entries, or 900 million doubles, and can't be loaded by this new code.
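A rough sketch of that sharding idea, assuming a fixed per-shard capacity and long-based indexing (the class and method names here are hypothetical, not part of the project's API):

```java
import java.nio.ByteBuffer;
import java.nio.DoubleBuffer;

// Hypothetical sketch: split the vector table across several direct
// buffers, each within the int-based NIO limits, and route long indices
// to the right shard.
public class ShardedDoubleStore {
    private final DoubleBuffer[] shards;
    // Doubles per shard; shardDoubles * 8 must fit in an int,
    // e.g. 1 << 27 doubles = 1 GiB per direct buffer.
    private final int shardDoubles;

    public ShardedDoubleStore(long totalDoubles, int shardDoubles) {
        this.shardDoubles = shardDoubles;
        int numShards = (int) ((totalDoubles + shardDoubles - 1) / shardDoubles);
        shards = new DoubleBuffer[numShards];
        long remaining = totalDoubles;
        for (int i = 0; i < numShards; i++) {
            int capacity = (int) Math.min(remaining, shardDoubles);
            shards[i] = ByteBuffer.allocateDirect(capacity * 8).asDoubleBuffer();
            remaining -= capacity;
        }
    }

    public double get(long index) {
        return shards[(int) (index / shardDoubles)]
                .get((int) (index % shardDoubles));
    }

    public void put(long index, double value) {
        shards[(int) (index / shardDoubles)]
                .put((int) (index % shardDoubles), value);
    }
}
```

ByteBuffer.allocateDirect keeps the data off-heap as in the original code; a memory-mapped variant could presumably shard FileChannel.map regions the same way.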

@jkinkead

I'm going to look into a fix for this.

@dirkgr
Contributor

dirkgr commented Aug 24, 2015

Thanks for looking at it. Let me know if you want me to contribute in some way. The limit is the number of doubles you can put into a DoubleBuffer, right? Because Java can't map more than 2GB of memory at a time?

@jkinkead

I don't know if Java can't, but the API for ByteBuffer only accepts an int, so you're capped at Integer.MAX_VALUE for what you can build.
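Concretely, since ByteBuffer.allocateDirect(int capacity) takes an int, a single byte-backed buffer caps out at Integer.MAX_VALUE bytes, which works out to the 2^28 - 1 doubles mentioned above:

```java
public class BufferCapDemo {
    public static void main(String[] args) {
        // A direct ByteBuffer's capacity parameter is an int, so one buffer
        // holds at most Integer.MAX_VALUE bytes.
        long maxBytes = Integer.MAX_VALUE;       // 2^31 - 1
        long maxDoubles = maxBytes / 8;          // 2^28 - 1 = 268435455

        // The Google News model needs 3M vectors x 300 dims = 900M doubles.
        long needed = 3_000_000L * 300;

        System.out.println(maxDoubles);          // 268435455
        System.out.println(needed > maxDoubles); // true: one buffer can't hold it
    }
}
```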

@jkinkead

See PR #29 @wko27

@scobrown

Seems like this would benefit from using nd4j; if nothing else, you could use their DoubleBuffer, which supports longs for the length:
https://github.com/deeplearning4j/nd4j/blob/master/nd4j-buffer/src/main/java/org/nd4j/linalg/api/buffer/BaseDataBuffer.java

If there is interest, I could maybe try it out and submit a pull request. Not sure how you feel about adding that dependency
