Tidy up KeyedVectors.most_similar() API #3000
w/r/t the removed check, I presume the original intention was to guard against someone calling most_similar('word1', 'word2'). With that check in place, such a call was parsed as most_similar(positive=['w', 'o', 'r', 'd', '1'], negative=['w', 'o', 'r', 'd', '2']) instead of most_similar(positive=['word1'], negative=['w', 'o', 'r', 'd', '2']), so I figure this is an improvement. (In fact, in the case of the tweet I cited, it was only the good fortune that the key
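To make the pitfall concrete: in Python a bare string is itself an iterable of characters, so any code that loops over `positive` or `negative` without first wrapping a lone string will silently treat each character as a key. The sketch below assumes the removed check wrapped a lone string `positive` only when `negative` was empty, as the description above implies. `old_normalize` and `new_normalize` are hypothetical stand-ins written for illustration, not gensim's code.

```python
def old_normalize(positive=None, negative=None):
    """Hypothetical pre-PR behaviour: a lone string `positive` is wrapped
    in a list only when no `negative` was supplied."""
    positive = positive or []
    negative = negative or []
    if isinstance(positive, str) and not negative:
        positive = [positive]
    # Anything still a bare string is now iterated character by character.
    return list(positive), list(negative)


def new_normalize(positive=None, negative=None):
    """Hypothetical post-PR behaviour: each argument is wrapped
    independently when it is a single key."""
    def wrap(value):
        if value is None:
            return []
        if isinstance(value, str):
            return [value]
        return list(value)
    return wrap(positive), wrap(negative)


print(old_normalize('word1', 'word2'))
# (['w', 'o', 'r', 'd', '1'], ['w', 'o', 'r', 'd', '2'])
print(new_normalize('word1', 'word2'))
# (['word1'], ['word2'])
```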
Thanks for this, and sorry it's taken so long to review. A few test cases that verify/demo the new possibilities (and confirm that the possible lost capability I flagged in a line comment isn't actually lost) would make this an easy merge.
@simonwiles Ping. Are you able to complete this PR?
Note to self:
The slightly larger #3188 may make sense to apply first, then this.
@piskvorky Is #3188 in scope for the current release? It isn't mentioned anywhere in the associated milestone or project. If not, my preference is to:
Please let me know your thoughts.
My preference is to get all API-breaking changes in ASAP. Looking at #3188, it seems backward-compatible, so it can potentially wait. Of course, merging both this and #3188 would be ideal, but I see @gojomo added some good new review comments to #3188. So if #3188 would block the release, we can release without it. If it's trivial, let's do both.
Merged. Thank you @simonwiles!
* Allow supplying a string-key as the negative arg. to most_similar()
* Allow a single vector as a positive or negative arg. to most_similar()
* Update comments
* Accept single arguments when positive and negative are both supplied
* Update most_similar_cosmul to match most_similar
  I'm not sure if this fully addresses the `# TODO: Update to better match & share code with most_similar()` at line #981 or not, so I've left it in.
* minor code cleanup
* add unit tests
* Update CHANGELOG.md
* remove redundant variable declaration
* enforce consistency
* respond to review feedback
* Update keyedvectors.py

Co-authored-by: Michael Penkov <misha.penkov@gmail.com>
Co-authored-by: Michael Penkov <m@penkov.dev>
Fixes #2998.
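This PR:

* allows supplying a string key as the `negative` argument,
* allows supplying a single vector as the `positive` or `negative` argument,
* accepts single arguments when `positive` and `negative` are both supplied in the same call, and
* updates `most_similar_cosmul()` to match this behaviour as well.

The API is now, imo, a little more straightforward, in that it simply wraps single values in a list where appropriate.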
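A minimal sketch of the "wrap single values in a list where appropriate" step. The name `ensure_list` and its exact rules (which types count as a single key, and detecting a single vector via an `ndim` attribute such as a 1-D NumPy array has) are assumptions made for this example, not a quote of gensim's code.

```python
from typing import Any, List


def ensure_list(value: Any) -> List[Any]:
    """Hypothetical helper: wrap a single key or a single vector in a list.

    Assumed rules for this sketch:
    * None means "no terms" and becomes an empty list;
    * a str or int is a single key;
    * anything with ndim == 1 (e.g. a 1-D NumPy array) is a single vector;
    * everything else (lists, tuples, 2-D arrays) is already a collection
      of terms and is iterated as-is (a 2-D array yields one row per term).
    """
    if value is None:
        return []
    if isinstance(value, (str, int)):
        return [value]
    if getattr(value, "ndim", None) == 1:
        return [value]
    return list(value)


# Single values and lists now end up in the same shape.
print(ensure_list("king"))             # ['king']
print(ensure_list(["king", "woman"]))  # ['king', 'woman']
print(ensure_list(None))               # []
```

The `str` check has to come before iteration: without it, `list("king")` would yield `['k', 'i', 'n', 'g']`.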
There's a comment in `most_similar_cosmul` at line ~981, `# TODO: Update to better match & share code with most_similar()`, but I'm not sure whether the changes in the last commit completely address it, so I've left the comment in place.
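One way to read the TODO about sharing code between `most_similar()` and `most_similar_cosmul()` is that both methods could funnel their arguments through a single preparation step and differ only in how they score candidates. The toy, pure-Python sketch below shows that split under several assumptions: `_prepare`, `_as_list` and the other helpers are hypothetical names; only string keys (no raw vectors) are accepted to keep it short; the vocabulary is a plain dict of made-up 3-d vectors; and the multiplicative score uses the common 3CosMul form with cosines shifted to `(1 + cos) / 2` and a small epsilon of `1e-6`. It is not gensim's implementation.

```python
import math


def _unit(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _as_list(value):
    """Wrap a single key in a list; None means no keys."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def _prepare(vectors, positive, negative):
    """Hypothetical shared step: normalise arguments, look up unit vectors,
    and list the candidate keys (every key not used in the query)."""
    pos, neg = _as_list(positive), _as_list(negative)
    if not pos and not neg:
        raise ValueError("cannot compute similarity with no input")
    pos_vecs = [_unit(vectors[k]) for k in pos]
    neg_vecs = [_unit(vectors[k]) for k in neg]
    used = set(pos) | set(neg)
    candidates = [k for k in vectors if k not in used]
    return pos_vecs, neg_vecs, candidates


def most_similar(vectors, positive=None, negative=None, topn=3):
    """Additive ranking: cosine to the mean of +positive / -negative unit vectors."""
    pos_vecs, neg_vecs, candidates = _prepare(vectors, positive, negative)
    terms = pos_vecs + [[-x for x in v] for v in neg_vecs]
    query = _unit([sum(col) / len(terms) for col in zip(*terms)])
    scored = [(k, _dot(query, _unit(vectors[k]))) for k in candidates]
    return sorted(scored, key=lambda kv: kv[1], reverse=True)[:topn]


def most_similar_cosmul(vectors, positive=None, negative=None, topn=3):
    """Multiplicative ranking: product of shifted cosines, (1 + cos) / 2."""
    pos_vecs, neg_vecs, candidates = _prepare(vectors, positive, negative)
    scored = []
    for k in candidates:
        cand = _unit(vectors[k])
        num = math.prod((1 + _dot(cand, p)) / 2 for p in pos_vecs)
        den = math.prod((1 + _dot(cand, n)) / 2 for n in neg_vecs)
        scored.append((k, num / (den + 1e-6)))  # epsilon avoids division by zero
    return sorted(scored, key=lambda kv: kv[1], reverse=True)[:topn]


# Toy 3-d vectors, made up purely for illustration.
vectors = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man": [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [-0.5, 0.2, 0.2],
}
print(most_similar(vectors, ["king", "woman"], negative="man", topn=1)[0][0])         # queen
print(most_similar_cosmul(vectors, ["king", "woman"], negative="man", topn=1)[0][0])  # queen
```

Because both scorers go through `_prepare`, a single-key shortcut such as `negative="man"` behaves identically in each, which is the kind of consistency the PR aims for.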
I don't think these changes necessitate an update to the documentation, which has always specified that lists should be passed to these methods; this is just a defensive pattern that preserves the shortcut that's already in production and makes the API less likely to violate the principle of least astonishment :)