Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose max_final_vocab parameter in FastText constructor #2516

Closed
wants to merge 2 commits into from
Closed

Expose max_final_vocab parameter in FastText constructor #2516

wants to merge 2 commits into from

Conversation

scribu
Copy link
Contributor

@scribu scribu commented May 31, 2019

I think it was just an oversight in #1915.

Copy link
Collaborator

@mpenkov mpenkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @scribu,

Sorry for taking so long to review this. Are you still able to work on this PR?

I left you some comments. Please have a look.

Finally, please add some tests for your new functionality to demonstrate that it does what it is supposed to do.

window=5,
min_count=5,
max_vocab_size=None,
max_final_vocab=None,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this parameter to the end of the list. We don't want to break backward compatibility for people who insist on passing keyword arguments as unnamed.

Also, please avoid mixing in innocuous formatting changes with actual functionality, as it makes your code more difficult to review. Roll back this formatting change in the constructor.

@@ -507,6 +534,10 @@ def __init__(self, sentences=None, corpus_file=None, sg=0, hs=0, size=100, alpha
Limits the RAM during vocabulary building; if there are more unique
words than this, then prune the infrequent ones. Every 10 million word types need about 1GB of RAM.
Set to `None` for no limit.
max_final_vocab : int, optional
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's worth explaining the difference between this and max_vocab_size. How do they interact together?

sample=sample,
sorted_vocab=bool(sorted_vocab),
null_word=null_word,
ns_exponent=ns_exponent
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Future-proof your code. If we want to add more parameters to end of this list, we won't have to change the last line.

Suggested change
ns_exponent=ns_exponent
ns_exponent=ns_exponent,

@mpenkov
Copy link
Collaborator

mpenkov commented Jun 26, 2020

Continued in #2867 (I don't have permissions to push changes to this branch).

@mpenkov mpenkov closed this Jun 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants