-
-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix phrases load, for backward compatibility #1758
Conversation
@menshikh-iv it's ready to validate. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fix! What is the rationale behind introducing the new class (PhrasesTransformation
)?
gensim/models/phrases.py
Outdated
""" | ||
|
||
# for python 2 and 3 compatibility. basestring is used to check if model.scoring is a string | ||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gensim uses the six
library to bridge py2/py3.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, note that it was from a previous commit…
gensim/models/phrases.py
Outdated
raise ValueError( | ||
'failed to load %s model with unknown scoring setting %s' % | ||
(cls.__name__, model.scoring)) | ||
# if there is non common_terms attribute, inizialize |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo - initialize
Phrases and Phraser needs the same |
@piskvorky fixed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for fast fix @alexgarel 🔥, the last fixes remain :)
gensim/test/test_phrases.py
Outdated
|
||
with temporary_file("test.pkl") as fpath: | ||
bigram = Phrases(self.sentences, min_count=1, threshold=1) | ||
del(bigram.common_terms) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better solution - checkout to the previous version, tran+save model and try to load old model here (in your current implementation, we possibly skip other bugs, that can be hidden due to the current bug)
FYI - del
isn't a function, this is operator -> no needed brackets.
gensim/models/phrases.py
Outdated
if hasattr(model, 'scoring'): | ||
if isinstance(model.scoring, six.string_types): | ||
if model.scoring == 'default': | ||
logger.info( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: we use 140 char limits for code, no need to split this call into several lines
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
argh travis failed, it was 120 not 140 @menshikh-iv, you joker ! ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alexgarel oh, sorry, my mistake (120 instead of 80, not 140)
gensim/test/test_phrases.py
Outdated
if sys.version_info[0] >= 3: | ||
unicode = str | ||
|
||
@contextlib.contextmanager |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks useful, please move it to gensim.test.utils
https://github.com/RaRe-Technologies/gensim/blob/develop/gensim/test/utils.py
… bug in loading phrases model before scoring). Also moving temporary_file context manager in gensim.test.utils
@menshikh-iv this should be ok. Your proposal to test load backward compatibility was right, for I extend it to scoring and found there was a bug in it. |
Big thanks for fast fix @alexgarel, I checked with 2.0.0, 1.0.1 - works fine! |
This should fix #1751.
Also it gives backward compatible load for Phraser.