Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix phrases load, for backward compatibility #1758

Merged
merged 8 commits into from
Dec 6, 2017

Conversation

alexgarel
Copy link
Contributor

This should fix #1751.
Also it gives backward compatible load for Phraser.

@alexgarel
Copy link
Contributor Author

@menshikh-iv it's ready to validate.

Copy link
Owner

@piskvorky piskvorky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix! What is the rationale behind introducing the new class (PhrasesTransformation)?

"""

# for python 2 and 3 compatibility. basestring is used to check if model.scoring is a string
try:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gensim uses the six library to bridge py2/py3.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, note that it was from a previous commit…

raise ValueError(
'failed to load %s model with unknown scoring setting %s' %
(cls.__name__, model.scoring))
# if there is non common_terms attribute, inizialize
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo - initialize

@alexgarel
Copy link
Contributor Author

alexgarel commented Dec 4, 2017

What is the rationale behind introducing the new class (PhrasesTransformation)?

Phrases and Phraser needs the same load tweaks to preserve backward compatibility, and I wanted to avoid copy/pasting same code in both classes (Phraser backward compatible load was not yet supported, but needs to).

@alexgarel
Copy link
Contributor Author

@piskvorky fixed.

Copy link
Contributor

@menshikh-iv menshikh-iv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fast fix @alexgarel 🔥, the last fixes remain :)


with temporary_file("test.pkl") as fpath:
bigram = Phrases(self.sentences, min_count=1, threshold=1)
del(bigram.common_terms)
Copy link
Contributor

@menshikh-iv menshikh-iv Dec 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better solution - checkout to the previous version, tran+save model and try to load old model here (in your current implementation, we possibly skip other bugs, that can be hidden due to the current bug)

FYI - del isn't a function, this is operator -> no needed brackets.

if hasattr(model, 'scoring'):
if isinstance(model.scoring, six.string_types):
if model.scoring == 'default':
logger.info(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: we use 140 char limits for code, no need to split this call into several lines

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

argh travis failed, it was 120 not 140 @menshikh-iv, you joker ! ;-)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alexgarel oh, sorry, my mistake (120 instead of 80, not 140)

if sys.version_info[0] >= 3:
unicode = str

@contextlib.contextmanager
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks useful, please move it to gensim.test.utils https://github.com/RaRe-Technologies/gensim/blob/develop/gensim/test/utils.py

… bug in loading phrases model before scoring). Also moving temporary_file context manager in gensim.test.utils
@alexgarel
Copy link
Contributor Author

@menshikh-iv this should be ok.

Your proposal to test load backward compatibility was right, for I extend it to scoring and found there was a bug in it.

@menshikh-iv
Copy link
Contributor

Big thanks for fast fix @alexgarel, I checked with 2.0.0, 1.0.1 - works fine!

@menshikh-iv menshikh-iv merged commit a7120d7 into piskvorky:develop Dec 6, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Gensim 3.1.0 Phrase loader incompatible with older models due to introduction of common_terms
3 participants