Fix the train method of TranslationMatrix #1838
Merged
22 commits (the diff below shows changes from 21 commits):
1aa3f33 (robotcator) fix the compatibility between python2 & 3
24e6331 (robotcator) Merge https://github.com/RaRe-Technologies/gensim into fix-word2vec-n…
f6f571f (gojomo) require explicit corpus size, epochs for train()
5e9529b (gojomo) make all train() calls use explicit count, epochs
5c24a90 (robotcator) add tests to make sure that ValueError is indeed thrown
c89f285 (robotcator) update test
10ff8a5 (robotcator) fix the word2vec's reset_from()
a6312ca (robotcator) Merge branch 'fix-word2vec' into fix-word2vec-notebook
be5216a (robotcator) Merge branch 'develop' of https://github.com/RaRe-Technologies/gensim…
504bd09 (gojomo) require explicit corpus size, epochs for train()
43f9689 (gojomo) make all train() calls use explicit count, epochs
49e3d00 (robotcator) update notebooks
c9eab32 (robotcator) fix some error
8024eb5 (robotcator) fix test error
d3562b6 (robotcator) Merge branch 'test-word2vec' of https://github.com/robotcator/gensim …
ff93cdf (robotcator) Merge branch 'develop' of https://github.com/RaRe-Technologies/gensim…
c11d007 (robotcator) Merge branch 'develop' of https://github.com/RaRe-Technologies/gensim…
155e1db (robotcator) Merge branch 'develop' of https://github.com/RaRe-Technologies/gensim…
6cfc651 (robotcator) make tagged_docs optional
c47303b (robotcator) Merge branch 'develop' of https://github.com/RaRe-Technologies/gensim…
fd2f753 (robotcator) fix the train method
6f05130 (robotcator) add comments for the translation matrix
@@ -365,23 +365,23 @@ class BackMappingTranslationMatrix(utils.SaveLoad):
     >>> src_model = Doc2Vec.load(datapath("small_tag_doc_5_iter50"))
     >>> dst_model = Doc2Vec.load(datapath("large_tag_doc_10_iter50"))
     >>>
-    >>> model_trans = BackMappingTranslationMatrix(data, src_model, dst_model)
+    >>> model_trans = BackMappingTranslationMatrix(src_model, dst_model)
     >>> trans_matrix = model_trans.train(data)
     >>>
     >>> result = model_trans.infer_vector(dst_model.docvecs[data[3].tags])
 
     """
-    def __init__(self, tagged_docs, source_lang_vec, target_lang_vec, random_state=None):
+    def __init__(self, source_lang_vec, target_lang_vec, tagged_docs=None, random_state=None):
         """
 
         Parameters
         ----------
-        tagged_docs : list of :class:`~gensim.models.doc2vec.TaggedDocument`, optional
-            Documents that will be used for training
         source_lang_vec : :class:`~gensim.models.doc2vec.Doc2Vec`
             Source Doc2Vec model.
         target_lang_vec : :class:`~gensim.models.doc2vec.Doc2Vec`
             Target Doc2Vec model.
+        tagged_docs : list of :class:`~gensim.models.doc2vec.TaggedDocument`, optional
+            Documents that will be used for training
         random_state : {None, int, array_like}, optional
             Seed for random state.
 
@@ -393,22 +393,24 @@ def __init__(self, tagged_docs, source_lang_vec, target_lang_vec, random_state=N
         self.random_state = utils.get_random_state(random_state)
         self.translation_matrix = None
 
+        if tagged_docs is not None:
+            self.train(tagged_docs)
+
     def train(self, tagged_docs):
         """Build the translation matrix that mapping from the source model's vector to target model's vector
 
         Parameters
         ----------
-        tagged_docs : list of :class:`~gensim.models.doc2vec.TaggedDocument`, optional
-            THIS ARGUMENT WILL BE IGNORED.
+        tagged_docs : list of :class:`~gensim.models.doc2vec.TaggedDocument`
 
         Returns
         -------
         numpy.ndarray
             Translation matrix that mapping from the source model's vector to target model's vector.
 
         """
-        m1 = [self.source_lang_vec.docvecs[item.tags].flatten() for item in self.tagged_docs]
-        m2 = [self.target_lang_vec.docvecs[item.tags].flatten() for item in self.tagged_docs]
+        m1 = [self.source_lang_vec.docvecs[item.tags].flatten() for item in tagged_docs]
+        m2 = [self.target_lang_vec.docvecs[item.tags].flatten() for item in tagged_docs]
 
         self.translation_matrix = np.linalg.lstsq(m2, m1, -1)[0]
         return self.translation_matrix

Review comment on the added tagged_docs line in the train() docstring: Need to add a description for this parameter: what is it?
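The constructor change in this diff moves tagged_docs from the first positional slot to an optional third parameter. A caller that still passes (data, src_model, dst_model) positionally is not rejected at call time; Python simply binds each value to a different parameter than before. The sketch below uses only the standard library's inspect module to show the binding under both signatures. The names init_before, init_after and bind are hypothetical helpers; only their parameter lists are copied from the diff.

```python
import inspect


# Parameter lists copied from the two __init__ signatures in the diff (self omitted).
# The bodies are irrelevant: only argument binding is being inspected.
def init_before(tagged_docs, source_lang_vec, target_lang_vec, random_state=None):
    pass


def init_after(source_lang_vec, target_lang_vec, tagged_docs=None, random_state=None):
    pass


def bind(func, *args, **kwargs):
    """Return the parameter -> value mapping Python would use for this call."""
    return dict(inspect.signature(func).bind(*args, **kwargs).arguments)


old_style_call = ("data", "src_model", "dst_model")

# Under the old signature the documents land in tagged_docs, as intended.
print(bind(init_before, *old_style_call))
# Under the new signature the same call still binds, but every value shifts slot.
print(bind(init_after, *old_style_call))
# Passing the documents by keyword is unambiguous under the new signature.
print(bind(init_after, "src_model", "dst_model", tagged_docs="data"))
```

Passing tagged_docs by keyword, or constructing with the two models and then calling train(data) as the updated docstring example does, avoids depending on positional order.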
Review comment: Problem isn't fixed in the train method (you continue to ignore the passed parameter).
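The reviewer's point, that train() must build the matrix from the documents it is given rather than from an attribute stored at construction time, can be turned into a small regression check. The sketch below is a reduced, hypothetical stand-in for the patched class (BackMappingSketch and train_uses_argument are illustrative names, not gensim code). It assumes plain dicts mapping a tag to a 1-D NumPy vector in place of Doc2Vec.docvecs, and solves the same least-squares problem as the diff: find X with m2 @ X ≈ m1 via np.linalg.lstsq. It passes rcond=None (NumPy's current default) where the patch passes the legacy value -1.

```python
import numpy as np


class BackMappingSketch:
    """Hypothetical, reduced stand-in for the patched BackMappingTranslationMatrix.

    source_vecs and target_vecs are plain dicts mapping a document tag to a
    1-D NumPy vector; this replaces Doc2Vec.docvecs and is an assumption.
    """

    def __init__(self, source_vecs, target_vecs, tags=None):
        self.source_vecs = source_vecs
        self.target_vecs = target_vecs
        self.translation_matrix = None
        # As in the patched __init__: train only if training data is supplied.
        if tags is not None:
            self.train(tags)

    def train(self, tags):
        # Build the rows from the *argument*, never from a stored attribute.
        m1 = np.array([self.source_vecs[t] for t in tags])  # shape (n, d_source)
        m2 = np.array([self.target_vecs[t] for t in tags])  # shape (n, d_target)
        # Least-squares X with m2 @ X ~= m1; X has shape (d_target, d_source).
        self.translation_matrix = np.linalg.lstsq(m2, m1, rcond=None)[0]
        return self.translation_matrix

    def infer_vector(self, target_vec):
        # Map a target-space vector back into the source space.
        return np.dot(target_vec, self.translation_matrix)


def train_uses_argument(model, tags_a, tags_b):
    """Regression check: training on two different tag sets should give different fits.

    Only meaningful when the vectors have no exact linear relation (e.g. random
    data); otherwise both subsets can recover the same matrix.
    """
    first = model.train(tags_a).copy()
    second = model.train(tags_b)
    return not np.allclose(first, second)
```

Against a version of train() that iterates over a stored attribute instead of its argument, both fits are identical and train_uses_argument returns False, which is the failure mode described in the review comment.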