OverflowError: Python int too large to convert to C long #321

Closed
luopuya opened this issue Apr 10, 2015 · 15 comments

Comments

@luopuya

luopuya commented Apr 10, 2015

word, weights = parts[0], map(REAL, parts[1:])
# ...
result.syn0[line_no] = weights

In Python 3, map returns an iterator instead of a list.
It seems this needs to be wrapped in a list call: list(map(REAL, parts[1:]))
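The behavior described above can be reproduced without gensim. The following is a minimal sketch, assuming the built-in float as a stand-in for REAL and a made-up input line; it only shows that map is lazy in Python 3 and that list() materializes it.

```python
# Hypothetical input line in the "word v1 v2 v3" text format.
parts = "word 0.1 0.2 0.3".split()

# float stands in for REAL here (an assumption for illustration).
word, weights = parts[0], map(float, parts[1:])

# In Python 3, map() returns a lazy iterator, not a list.
is_list_before = isinstance(weights, list)

# Wrapping in list() yields a real sequence that can be assigned to an array row.
weights = list(weights)
is_list_after = isinstance(weights, list)

print(word, is_list_before, is_list_after, weights)
```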

@piskvorky
Owner

Yes, it is wrapped in a list call. This was fixed in this commit: 2562d00

Or is the issue something else? I'm not sure I understood your report.

@luopuya
Author

luopuya commented Apr 10, 2015

Oh, I use the master branch.
I got another error with it.

At line 521 in reset_weights

def reset_weights(self):
    # ...
    random.seed(uint32(self.hashfxn(self.index2word[i] + str(self.seed))))

The built-in hash function in 64-bit Python returns a 64-bit integer, which raises OverflowError: Python int too large to convert to C long
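As a rough illustration of this kind of failure, the sketch below uses only the standard library (not gensim or Numpy) to check whether an integer fits in a signed C long on the current platform. The helper name fits_in_c_long is hypothetical, and this does not claim to reproduce the exact code path inside reset_weights.

```python
import array
import struct

# Width of a native C long on this platform (32 bits on some, 64 on others).
C_LONG_BITS = struct.calcsize("l") * 8


def fits_in_c_long(value):
    """Return True if value can be stored in a signed C long here."""
    try:
        # The "l" typecode converts each item to a signed C long.
        array.array("l", [value])
        return True
    except OverflowError:
        return False


print("C long width:", C_LONG_BITS)
print("2**31 - 1 fits:", fits_in_c_long(2 ** 31 - 1))
print("2**64 fits:", fits_in_c_long(2 ** 64))
```

The width of a C long depends on the platform, which is one reason the same Python code can behave differently across operating systems.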

@luopuya
Author

luopuya commented Apr 10, 2015

[In doc2vec]
Is it the case that a sentence (len < 5) would not produce a vector?
Thanks for your reply.

@gojomo
Collaborator

gojomo commented Jun 11, 2015

re: "OverflowError: Python int too large to convert to C long"

Are you still getting that? What's your OS & python version?

re: "sentence (len < 5)"

See #325 & #346 – known problem with min_count affecting short docs in Doc2Vec; fix is on its way.

@piskvorky changed the title from "load_word2vec_format ( gensim.models.word2vec.py ) in Python 3" to "OverflowError: Python int too large to convert to C long" Jul 1, 2015
@piskvorky
Owner

@luopuya is this relevant? Do you still get the "OverflowError: Python int too large to convert to C long" in reset_weights?

@luopuya closed this as completed Jul 5, 2015
@luopuya
Author

luopuya commented Jul 5, 2015

I fixed it by passing my own hashfxn to the constructor:

def myhashfxn(obj):
    # Reduce the hash to the unsigned 32-bit range [0, 2**32).
    return hash(obj) % (2 ** 32)

Word2Vec(hashfxn=myhashfxn)
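To see what the workaround does to the hash value, here is a small standard-library sketch. It checks that reducing modulo 2 ** 32 always lands in the unsigned 32-bit range, including for negative hashes, and that it agrees with a bitwise mask. The helper mask_to_uint32 and the sample values are assumptions for illustration, not gensim code.

```python
def myhashfxn(obj):
    # Same reduction as the workaround above.
    return hash(obj) % (2 ** 32)


def mask_to_uint32(value):
    # Hypothetical helper: keep only the low 32 bits of an integer.
    return value & 0xFFFFFFFF


# Sample integers, including negatives and values wider than 32 bits.
samples = [0, 1, -1, 2 ** 31, 2 ** 32, 2 ** 63 - 1, -(2 ** 63)]

# Each pair holds (modulo result, bitmask result) for one sample.
results = [(value % (2 ** 32), mask_to_uint32(value)) for value in samples]

for value, (by_mod, by_mask) in zip(samples, results):
    print(value, by_mod, by_mask)

# A string key such as a word plus a seed also ends up in range.
seed_value = myhashfxn("word" + str(1))
```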

@luopuya reopened this Jul 5, 2015
@gojomo
Collaborator

gojomo commented Jul 11, 2015

I'm looking at related 32-/64-bit issues, and would like to understand what's happening here... but I can only trigger a ValueError (not an OverflowError) when passing large values into random.seed(), and further, the call to uint32() should (and in my testing does) reliably mask the value into the acceptable range. (Essentially, it's already doing what myhashfxn() does.)

Is there a chance that you were getting the error from some older code (that doesn't match the "line 521" excerpt you quoted)? Or maybe you're using some older Numpy version of numpy.uint32() that would raise the OverflowError?

If your hashfxn-workaround still seems necessary, it'd be good to know the versions of Python, gensim, and Numpy in play.

@luopuya
Author

luopuya commented Jul 17, 2015

Sorry for the late reply. It was an old version.
Anaconda uses conda to manage packages, and conda's latest gensim version is 0.10.3.

@eyaler

eyaler commented Jul 30, 2015

I am getting this error as well on:
python 3.4.3
anaconda 2.3.0 (64-bit)
numpy 1.9.2
gensim 0.10.3
code from: rare-technologies.com/deep-learning-with-word2vec-and-gensim/
the hashfxn workaround works

@eyaler

eyaler commented Jul 30, 2015

@GraphGrailAi

@eyaler I confirm the same error:
Winpython 3.4.3.2 64 bit
numpy 1.10b
scipy 0.15
gensim 0.12.1

@gojomo
Collaborator

gojomo commented Aug 14, 2015

@GraphGrail – are you sure it's gensim 0.12.1 that's being run when you get the error, and not some older version? (Can you share the exact error message, with reported line numbers?)

(If this is happening, then maybe the Windows Python + numpy that people are using, that claims to be 64-bit, may not really be 64-bit? Another user having a similar problem suggested there is no true numpy support for 64-bit Windows... and even though I'd usually suggest 64-bit Python, they reported fewer problems, on Windows, after they switched to 32 bit.)

@tmylk mentioned this issue Nov 5, 2015
@tmylk
Contributor

tmylk commented Jan 10, 2016

@GraphGrail Could you create a PR with a test that triggers this error? We now have continuous Win 32 and 64 builds.
Otherwise the issue will be closed in 7 days.

@maribethrc

I am still getting the same error as @luopuya with the following versions:
Gensim 3.4.0
Python 3.6
Numpy 1.15.4

@luopuya where did you put your hashfxn? Somewhere inside word2vec.py?

@gojomo
Collaborator

gojomo commented Nov 28, 2018

@maribethrc What OS, and is there a chance you're running a 32-bit Python, or an inadvertently mixed set of 32/64-bit versions of Python and 3rd-party libraries? (How did you install Python, and how did you install required libraries like Numpy?)
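One way to gather the details asked for here is a short report script. This is a sketch under stated assumptions: environment_report is a hypothetical helper, it relies on importlib.metadata (available in Python 3.8 and later), and it reads installed package versions without importing the packages themselves.

```python
import platform
import struct
import sys
from importlib import metadata


def environment_report(packages=("numpy", "gensim")):
    """Collect OS, Python, integer-width, and package version details."""
    report = {
        "os": platform.system(),
        "python": platform.python_version(),
        # Pointer size distinguishes a 32-bit from a 64-bit interpreter.
        "pointer_bits": struct.calcsize("P") * 8,
        # A native C long can be narrower than a pointer on some platforms.
        "c_long_bits": struct.calcsize("l") * 8,
        # Width in bits of values returned by the built-in hash().
        "hash_bits": sys.hash_info.width,
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


report = environment_report()
for key, value in report.items():
    print(f"{key}: {value}")
```

If pointer_bits and c_long_bits differ, integers that fit in the interpreter's native word may still be too large for a C long.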
