-
Notifications
You must be signed in to change notification settings - Fork 239
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RepresentationSeries: pca, nmf, tsne #140
RepresentationSeries: pca, nmf, tsne #140
Conversation
…ion" funcionts of representaion module - Implement full support for Representation Series in "Dimensionality Reduction" functions of representation module - add appropriate parameterized tests in `test_representation.py` Note: the dimensionality reduction functions *do support* both flat and represenational input; this is because as discussed in jbesomi#134 we will not implement a `to_repr` function, so users might have e.g. a flattened tfidf series they want to perform pca on -> they can use the function. Of course, we could remove the flat support (if we basically want to "forbid" users from using `VectorSeries` for tfidf/term_freq/count; we're not 100% sure. Co-authored-by: Maximilian Krahn <maximilian.krahn@icloud.com>
🎉 Opinions:
Review:
Hope you got some insights! 👍 |
I believe that it is a bad idea to restrict the
So two functions that both do dimensionality reduction need a different pipeline, that's not nice for users! Yes, This allows the user to call all dimensionality reduction functions the same way. This is why I see two possibilities:
I think it should stay an implementation detail and not be visible to users that |
Hey Henri, I totally agree with your view and understand your points! I agree with having all representation functions (pca, nmf, tfidf) to receive as input a
I believe the users should be very aware that Regarding the |
I will add something about that to the docstring! I agree that when writing a full tutorial on representation, we have to describe all this in detail; otherwise users will just use
I agree it's good to stick to the scikit-learn defaults and mention this in the tutorial (the representation tutorial will need to be very good 🤓 !) |
- dim. red. functions only support Representation Series input - appropriately change tests
Just removed support for QUESTION: About the clustering functions: should they:
|
Good question! a) Only vector series: would very inefficient in some cases ... b) We would not respect the rule "one function type for So, to recap right now we have:
This might be a bit confusing/overcomplicated/limited, what's your advice on that? An alternative might be: keep only Conclusion: I would like to see/understand your opinion! 👍 |
I agree it's really annoying that we can't just only work with
I think even if possible, it's not really a nice UX if users have to use I think the possibilities are:
I believe that possibility two is the most intuitive for users, as the only time they'll ever notice it is when trying to put a Series with |
Closing this, see #156 |
test_representation.py
Note: the dimensionality reduction functions do support both flat and represenational input; this is because as discussed in #134 we will not implement a
to_repr
function, so users might have e.g. a flattened tfidf series they want to perform pca on -> we allow them to use the dimensionality reduction functions. Of course, we could remove the flat support (if we basically want to "forbid" users from usingVectorSeries
for tfidf/term_freq/count); we're not 100% sure.