-
-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expose max_final_vocab parameter in FastText constructor #2516
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @scribu,
Sorry for taking so long to review this. Are you still able to work on this PR?
I left you some comments. Please have a look.
Finally, please add some tests for your new functionality to demonstrate that it does what it is supposed to do.
window=5, | ||
min_count=5, | ||
max_vocab_size=None, | ||
max_final_vocab=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this parameter to the end of the list. We don't want to break backward compatibility for people who insist on passing keyword arguments as unnamed.
Also, please avoid mixing in innocuous formatting changes with actual functionality, as it makes your code more difficult to review. Roll back this formatting change in the constructor.
@@ -507,6 +534,10 @@ def __init__(self, sentences=None, corpus_file=None, sg=0, hs=0, size=100, alpha | |||
Limits the RAM during vocabulary building; if there are more unique | |||
words than this, then prune the infrequent ones. Every 10 million word types need about 1GB of RAM. | |||
Set to `None` for no limit. | |||
max_final_vocab : int, optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's worth explaining the difference between this and max_vocab_size
. How do they interact together?
sample=sample, | ||
sorted_vocab=bool(sorted_vocab), | ||
null_word=null_word, | ||
ns_exponent=ns_exponent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Future-proof your code. If we want to add more parameters to end of this list, we won't have to change the last line.
ns_exponent=ns_exponent | |
ns_exponent=ns_exponent, |
Continued in #2867 (I don't have permissions to push changes to this branch). |
I think it was just an oversight in #1915.