-
-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce Phraser
memory usage (drop frequencies)
#2208
Changes from 1 commit
242c80e
bba2e46
9f9b05f
c391fe5
d154e3a
40b6672
40dcbde
21c3911
80e9222
9943909
021226a
9de2495
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -208,6 +208,10 @@ def load(cls, *args, **kwargs): | |
""" | ||
model = super(PhrasesTransformation, cls).load(*args, **kwargs) | ||
# update older models | ||
# if value in phrasegrams dict is a tuple, load only the scores. | ||
if len(model.__dict__['phrasegrams']): | ||
if isinstance(list(model.__dict__['phrasegrams'].values())[0], tuple): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. More readable:
|
||
model.__dict__['phrasegrams'].update((k, v[1]) for k, v in model.__dict__['phrasegrams'].items()) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is hard to read. Can you rephrase this with normal dict access syntax, and using better variable names? (not Also, changing the dict at the same time you're iterating over it sounds tricky. It may be safer to iterate over each value, and if it's a tuple, change it right away (in-place assignment, without a full dict copy). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Would there be any case where some values are tuple and some are not? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think so -- all values should always be in the same format (all old, all new). My suggestion was more around being careful with modify-while-iterating and creating-a-copy-of-large-dict. So whether you check each value or check just once makes little difference. In fact, per-value checks may be a bit cleaner, because there's no need for the special case of "dictionary is empty", or materializing all the values into a list only to pick the first one. |
||
# if no scoring parameter, use default scoring | ||
if not hasattr(model, 'scoring'): | ||
logger.info('older version of %s loaded without scoring function', cls.__name__) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if model.phrasegrams:
more Pythonic.