[WIP] Implement save_word2vec_format for Doc2Vec #699
@cedias Apologies for the late reply. What would be the advantage of saving in this format? What is your use case?
Well, since For the As for the load function, I'm not really sure about the use case besides API consistency, which is why I wanted an opinion about it :)
I can see this being useful for others as well. Some thoughts:
Achieving those may require some new conventions in the method API and on-disk format to indicate the word-vec/doc-vec distinction. If at all possible, those conventions should put a minimal burden on people using older files, other tools, or just one set of vectors (word-vecs or doc-vecs).
OK, I'll work on it later this month then. Thanks for the tips.
Hi @cedias, would you have time to work on this for our release this month?
I believe this feature will be removed/realized in #1107
If Doc2Vec's Plausibly as per my earlier comment, there could be key-munging conventions for mixing word & doc vectors into the same flattened file on save, or even disentangling them on load. (Some downstream applications might like them mixed together.)
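To make the key-munging idea concrete, here is a rough sketch using plain dicts rather than gensim objects. The `*dt_` prefix and the helper names `merge_vectors` and `split_vectors` are hypothetical, chosen only for illustration; nothing in this thread fixes a convention.

```python
# Hypothetical marker for doc-vector keys; an assumption, not an agreed convention.
DOC_PREFIX = "*dt_"


def merge_vectors(word_vecs, doc_vecs, prefix=DOC_PREFIX):
    """Flatten word and doc vectors into one mapping, prefixing doc tags."""
    merged = dict(word_vecs)
    for tag, vec in doc_vecs.items():
        key = f"{prefix}{tag}"
        if key in merged:
            # A word that already looks like a munged doc tag would be ambiguous.
            raise ValueError(f"key collision after prefixing: {key!r}")
        merged[key] = vec
    return merged


def split_vectors(merged, prefix=DOC_PREFIX):
    """Undo merge_vectors: separate prefixed doc keys from plain word keys."""
    word_vecs, doc_vecs = {}, {}
    for key, vec in merged.items():
        if key.startswith(prefix):
            doc_vecs[key[len(prefix):]] = vec
        else:
            word_vecs[key] = vec
    return word_vecs, doc_vecs
```

Files holding only word vectors pass through `split_vectors` unchanged, which keeps the burden on existing files low. A word that itself starts with the prefix would be misread as a doc tag on load, so the prefix needs to be something unlikely to occur in the vocabulary.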
Another use case: I wanted to visualize docvecs in TensorBoard, which requires vectors to be in text file format, and this functionality would be useful for that.
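As a stopgap for the TensorBoard case, something along these lines might do. It is a minimal sketch that assumes the doc vectors are already available as a plain mapping from tag to a sequence of floats; `write_projector_tsv` is a hypothetical helper, not part of gensim. It writes the two tab-separated files the Embedding Projector can load: one row of values per vector, and a single-column metadata file of labels in the same order with no header row.

```python
def write_projector_tsv(doc_vecs, vectors_path, metadata_path):
    """Write vectors and their labels as two TSV files with matching row order.

    doc_vecs is assumed to be a mapping of tag -> sequence of floats.
    """
    with open(vectors_path, "w", encoding="utf-8") as vec_file, \
            open(metadata_path, "w", encoding="utf-8") as meta_file:
        for tag, vec in doc_vecs.items():
            # One vector per line, components separated by tabs.
            vec_file.write("\t".join(f"{float(x):.6f}" for x in vec) + "\n")
            # Single-column metadata: one label per line, no header.
            meta_file.write(f"{tag}\n")
```

Tags containing tabs or newlines would break the row alignment, so they should be cleaned before writing.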
Ping @cedias, what is the status of this PR? Will you finish it soon?
@menshikh-iv This feature was added in #1256.
Then I'll close this PR. I hope the author agrees with @parulsethi (if not, you can reopen it).
Hi,
I recently had to export my Doc2Vec model to the base word2vec format, but save_word2vec_format wasn't implemented for the Doc2Vec class and simply called the Word2Vec one.
So I quickly put this implementation together.
If you believe it's worthwhile, I'll go ahead and implement the load function to properly test the save/load pipeline. Let me know.
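For reference, the save/load round trip could be exercised roughly as follows. This is a standalone sketch of the word2vec text layout (a header line with the vector count and dimensionality, then one `key v1 v2 ...` line per vector), not the code in this PR. `save_word2vec_text` and `load_word2vec_text` are hypothetical names, and the vectors are assumed to be a plain mapping from doc tag to a list of floats.

```python
def save_word2vec_text(vectors, path):
    """Write a mapping of key -> list of floats in word2vec text format."""
    if not vectors:
        raise ValueError("no vectors to save")
    dim = len(next(iter(vectors.values())))
    with open(path, "w", encoding="utf-8") as f:
        # Header: number of vectors, then dimensionality.
        f.write(f"{len(vectors)} {dim}\n")
        for key, vec in vectors.items():
            if len(vec) != dim:
                raise ValueError(f"vector for {key!r} has wrong length")
            if not key or any(ch.isspace() for ch in key):
                # Whitespace in a key would corrupt the space-separated layout.
                raise ValueError(f"invalid key: {key!r}")
            # repr(float) keeps enough digits to round-trip exactly.
            f.write(key + " " + " ".join(repr(float(x)) for x in vec) + "\n")


def load_word2vec_text(path):
    """Read a file written by save_word2vec_text back into a dict."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        count, dim = (int(x) for x in f.readline().split())
        for line in f:
            parts = line.rstrip("\n").split(" ")
            key, values = parts[0], [float(x) for x in parts[1:]]
            if len(values) != dim:
                raise ValueError(f"vector for {key!r} has wrong length")
            vectors[key] = values
    if len(vectors) != count:
        raise ValueError("header count does not match number of vectors")
    return vectors
```

A test for the pipeline would then save a small set of doc vectors, load them back, and compare the two mappings for equality.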