Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve
FastText
documentation #2353Improve
FastText
documentation #2353Changes from 13 commits
18f5302
b57a086
5fa0c9f
d66f55c
9696657
3019bea
74b740c
09ab630
2e728cb
c435a8e
c688877
677679c
29c5210
044d699
f9df136
cdc727a
4ea3f06
e532d62
931d3d7
177c712
bd83886
11fabca
2d490f0
9b6f8bb
9b5e161
6aa013a
7d2b562
25b24c7
b4e8405
1fc9bf2
72ec312
29c4faf
74410fc
a3456a4
96eab08
ff72185
9140cf6
31c79c3
2f479ca
c48a7f3
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is this
model.epochs
coming from? The model instantiation above shows no such variable.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the epochs parameter is optional, so not included during instantiation.
Unfortunately, there is also some confusion about its name: the FastText constructor uses iter to specify the number of epochs, whereas the superclass uses the proper name epochs.
The presence of the epochs parameter to the train function (which seems to override the one set in the constructor) also complicates matters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm. If it's optional, let's not use it in
train
. Or if we use it intrain
, let's instantiate it explicitly. This neither-here-nor-there example is confusing ("where is this value come from?").Regarding
iter
/epochs
-- can you please rename it toepochs
, consistently? I remember some discussion around this (cc @menshikh-iv @gojomo ), but can't imagine why we'd want both. At most we could supportiter
for a while as an alias, but with a clear deprecation warning.This is a perfect opportunity to clean up some of the API mess, rather than piling on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree regarding the cleanup. My preference would be to leave epochs/iter out of the constructor. The model doesn't need that parameter until training time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Models in Gensim generally allow the
trained_model = Constructor(params_including_training_params)
pattern. So breaking that could be confusing to existing users (and a big change to backward incompatibility).I'm not totally opposed though, especially if we still allow ctr params for a while with "deprecated" warnings. The API needs a clean up, and now is a good time.
Not a big priority though, and the documentation examples can already promote the
instantiate then train, as 2 steps
pattern.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not just training parameters you need to include in the constructor. It's also parameters for vocabulary creation. So you're managing at least 3 sets of separate parameters, 2 of which are duplicated by other methods of the class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we should promote them as separate steps in docs. Question is, do we deprecate (certainly not remove) them from ctr?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand your motivation in not removing them (backward compatibility). Unfortunately, the current mess won't go away until we remove things like this.
I think the first step should be to deprecate them. After a while, we can remove them, perhaps in time for a major release.
If we want a one-liner way to instantiate and train, we can always write a pure function and promote that. That should make it easier for users to cut over to the cleaner API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, deprecation is what I suggest.