Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix OverflowError when loading a large term-document matrix in MatrixMarket format. Fix #1998 #2001

Merged
merged 4 commits into from
Mar 28, 2018

Conversation

arlenk
Copy link
Contributor

@arlenk arlenk commented Mar 27, 2018

cython mmreader was (incorrectly) assuming that num_docs, num_terms, etc. would fit in a c int.

This version uses long longs for all related types.

Related issue from @KartikTaskhuman: #1998

@menshikh-iv menshikh-iv changed the title Bug-fix #1998: OverflowError when loading a large term-document matrix in Matrix Market format Fix OverflowError when loading a large term-document matrix in MatrixMarket format. Fix #1998 Mar 27, 2018
@menshikh-iv
Copy link
Contributor

menshikh-iv commented Mar 27, 2018

@arlenk please have a look to error - https://ci.appveyor.com/project/piskvorky/gensim-431bq/build/job/87sjiebimbr726ya#L306 (looks like a problem with python2 and L postfix for long long values)

Descriptions of reason (python2)

>>> isinstance(2, int)
True
>>> isinstance(2L, int)
False

@arlenk
Copy link
Contributor Author

arlenk commented Mar 28, 2018

@menshikh-iv

@arlenk please have a look to error - https://ci.appveyor.com/project/piskvorky/gensim-431bq/build/job/87sjiebimbr726ya#L306 (looks like a problem with python2 and L postfix for long long values)

unfortunately I can't seem to replicate this on osx, but I think you are right, the problem is that python 2 still treats ints and longs differently. I updated the tests to use isinstance(val, numbers.Integral) instead of isinstance(val, int). It seems like most other tests were already using number.Integral instead of int already.

Hopefully this will fix the issue on the windows build.

@menshikh-iv
Copy link
Contributor

Thanks for fast fix @arlenk 👍

@menshikh-iv menshikh-iv merged commit 10a3dab into piskvorky:develop Mar 28, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants