-
-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve FastText
documentation
#2353
Merged
Merged
Changes from 7 commits
Commits
Show all changes
40 commits
Select commit
Hold shift + click to select a range
18f5302
WIP: doco improvements
mpenkov b57a086
more doco
mpenkov 5fa0c9f
Merge remote-tracking branch 'upstream/develop' into doc-improv
mpenkov d66f55c
flake8-docs updates
mpenkov 9696657
adding fixmes
mpenkov 3019bea
minor fixup
mpenkov 74b740c
review response
mpenkov 09ab630
Remove magic constant
mpenkov 2e728cb
deprecate the iter parameter to the FastText constructor
mpenkov c435a8e
minor documentation fixes
mpenkov c688877
review response: use absolute references
mpenkov 677679c
review response
mpenkov 29c5210
fix unit test
mpenkov 044d699
Revert "deprecate the iter parameter to the FastText constructor"
mpenkov f9df136
Revert "fix unit test"
mpenkov cdc727a
more documentation improvements
mpenkov 4ea3f06
comment out pesky import
mpenkov e532d62
fix typo
mpenkov 931d3d7
improve tutorial notebook
mpenkov 177c712
minor documentation update
mpenkov bd83886
flake8-docs
mpenkov 11fabca
more doco fixes
mpenkov 2d490f0
fix example
mpenkov 9b6f8bb
git rm docs/fasttext-notes.md
mpenkov 9b5e161
review response: include _fasttext_bin in docs
mpenkov 6aa013a
review response: make examples more readable
mpenkov 7d2b562
review response: remove blank line
mpenkov 25b24c7
review response: add emphasis
mpenkov b4e8405
review response: add comment
mpenkov 1fc9bf2
review response: add example
mpenkov 72ec312
review response: remove redundant line
mpenkov 29c4faf
review response: update comment
mpenkov 74410fc
Update gensim/models/fasttext.py
piskvorky a3456a4
review response: improve examples
mpenkov 96eab08
clarify example
mpenkov ff72185
review response: improve example
mpenkov 9140cf6
review response: improve tokenization in example
mpenkov 31c79c3
flake8
mpenkov 2f479ca
fix long lines
mpenkov c48a7f3
fixup: use correct parameter name
mpenkov File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is this
model.epochs
coming from? The model instantiation above shows no such variable.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the epochs parameter is optional, so not included during instantiation.
Unfortunately, there is also some confusion about its name: the FastText constructor uses iter to specify the number of epochs, whereas the superclass uses the proper name epochs.
The presence of the epochs parameter to the train function (which seems to override the one set in the constructor) also complicates matters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm. If it's optional, let's not use it in
train
. Or if we use it intrain
, let's instantiate it explicitly. This neither-here-nor-there example is confusing ("where is this value come from?").Regarding
iter
/epochs
-- can you please rename it toepochs
, consistently? I remember some discussion around this (cc @menshikh-iv @gojomo ), but can't imagine why we'd want both. At most we could supportiter
for a while as an alias, but with a clear deprecation warning.This is a perfect opportunity to clean up some of the API mess, rather than piling on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree regarding the cleanup. My preference would be to leave epochs/iter out of the constructor. The model doesn't need that parameter until training time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Models in Gensim generally allow the
trained_model = Constructor(params_including_training_params)
pattern. So breaking that could be confusing to existing users (and a big change to backward incompatibility).I'm not totally opposed though, especially if we still allow ctr params for a while with "deprecated" warnings. The API needs a clean up, and now is a good time.
Not a big priority though, and the documentation examples can already promote the
instantiate then train, as 2 steps
pattern.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not just training parameters you need to include in the constructor. It's also parameters for vocabulary creation. So you're managing at least 3 sets of separate parameters, 2 of which are duplicated by other methods of the class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we should promote them as separate steps in docs. Question is, do we deprecate (certainly not remove) them from ctr?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand your motivation in not removing them (backward compatibility). Unfortunately, the current mess won't go away until we remove things like this.
I think the first step should be to deprecate them. After a while, we can remove them, perhaps in time for a major release.
If we want a one-liner way to instantiate and train, we can always write a pure function and promote that. That should make it easier for users to cut over to the cleaner API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, deprecation is what I suggest.