Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean up OOP / stub methods #2957

Open
piskvorky opened this issue Sep 24, 2020 · 0 comments
Open

Clean up OOP / stub methods #2957

piskvorky opened this issue Sep 24, 2020 · 0 comments
Labels
housekeeping internal tasks and processes

Comments

@piskvorky
Copy link
Owner

Do we really need such stub methods that only call the same superclass method with the same arguments? That's already the default which occurs if no method is present. By my understanding, doc-comment tools like Sphinx will, in their current versions, already propagate superclass API docs down to subclasses.

The only thing that's varying is the comment, and while it expresses a different expected-type from the superclass, in practice that doc may be misleading: I think (but have not recently checked) that these SaveLoad .load() methods can return objects that may not be what the caller expects. They return the class that's in the file, not the class-that-.load()-was-called-on.

If so, it might be a worthwhile short-term step as soon as 4.0.0 – for limiting the risk of confusion & requirement for redundant/caveat-filled docs – to deprecate the practice of calling SpecificClass.load(filename) entirely, despite its common appearance in previously-idiomatic gensim example code. Instead, either (1) call it only on class SaveLoad itself, to express that the only expectation for the returned type is that it's a SaveLoad subclass; (2) promote load functionality to model-specific top-level functions in each relevant model – a bit more like the load_facebook_model() function for loading Facebook-FasttText-native-format models – which might themselves do some type-checking, so any docs which imply they return a certain type are true; (3) just make one utils.py generic load() or load_model(), perhaps with an optional class-enforcement parameter, and encourage its use.

(For explicitness, I think I like this third option. In practice, it might appear in example code as:

from gensim.utils import load_model
from gensim.models import Word2Vec

w2v_model_we_hope = load_model('w2v.bin')
w2v_model_or_error = load_model('w2v.bin', expected_class=Word2Vec)

Plenty of code where the file is saved/loaded in the same example block, or under strong expectations & naming conventions, might skip the enforced-type-checking – but it'd be an option & true/explicit, rather than something that's implied-but-not-really-enforced in the current idiom Word2Vec.load('kv.bin'))

Despite the effort involved in making such changes, they could minimize duplicated code/comments & avoid some unintuitive gotchas in the current SaveLoad approach. They might also help make a future migration to some more standard big-model-serialization convention (as proposed by #2848) cleaner.

Originally posted by @gojomo in #2939 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
housekeeping internal tasks and processes
Projects
None yet
Development

No branches or pull requests

1 participant