Fix/check estimator #196
Conversation
@agoscinski I am honestly perplexed by some of the testing errors coming up right now (matmul failing for correctly shaped matrices). Would appreciate thoughts -- @ceriottm @PicoCentauri @Luthaf and others are also welcome to weigh in. Running
I've checked, it's within the
For the record, this only happens for
In the StandardFlexibleScaler, np.ma.* is used, which transforms the mean into a <class 'numpy.ma.core.MaskedArray'> and thus also the transformed array. Just replace np.ma.* with regular np.*
Unfortunately, `np.average` flags an error with scikit-learn (scikit-learn
does not like the name `average`, even if we rename it upon import). Unless
there's another function we can use, we might need to make a spoof.
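A stand-in along those lines could be a small helper under a different name. This is only a sketch of the idea; the name `weighted_column_mean` and its signature are illustrative, not from the PR:

```python
import numpy as np

def weighted_column_mean(X, sample_weight=None):
    """Hypothetical np.average stand-in for per-column (axis=0) means."""
    X = np.asarray(X, dtype=float)
    if sample_weight is None:
        return X.mean(axis=0)
    w = np.asarray(sample_weight, dtype=float)
    # np.average semantics along axis 0: sum_i w_i * X_i / sum_i w_i
    return (X * w[:, None]).sum(axis=0) / w.sum()
```

For 2-D input with one weight per sample, this matches `np.average(X, axis=0, weights=sample_weight)` while avoiding the name `average` entirely.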
On Thu, May 18, 2023, 07:09 Alexander Goscinski wrote:
In the StandardFlexibleScaler, np.ma.* is used, which transforms the mean
into a <class 'numpy.ma.core.MaskedArray'> and thus also the transformed
array.
https://github.com/lab-cosmo/scikit-matter/blob/766e3dabc42f26727b484b37fcd696ad64a9c222/src/skmatter/preprocessing/_data.py#L152
Then, in the metrics code, the @ operation between masked arrays is executed
as an elementwise multiplication, which results in the error because the
shapes are not broadcastable.
Just replace np.ma.* with regular np.*
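A minimal standalone illustration of the mask propagation (not the skmatter code itself):

```python
import numpy as np

X = np.arange(6, dtype=float).reshape(3, 2)

# np.ma.average returns a MaskedArray even when nothing is masked ...
mean = np.ma.average(X, axis=0)
print(type(mean))        # <class 'numpy.ma.core.MaskedArray'>

# ... and the mask propagates, so the centered data is masked too
Xc = X - mean
print(type(Xc))          # <class 'numpy.ma.core.MaskedArray'>

# np.average keeps everything as plain ndarrays
print(type(X - np.average(X, axis=0)))   # <class 'numpy.ndarray'>
```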
I don't fully understand the problem with np.average; could you elaborate? Where does scikit-learn return an error?
Everything passes! Ready for a proper review with one or two outstanding q's:
I should note what many of the necessary fixes were:
I have a question here: shouldn't we explicitly add tests with
If they are okay with it, sure. We can also write in the docs that it is not yet compatible with Pipeline, if that helps convince them. In the end we need to mark this somehow programmatically. I am not sure how to do that, since scikit-learn's Pipeline never checks the type of the transformers, so changing the base class does not help here. Maybe the contributors have some useful suggestions.
Yes, it is an estimator, so it should inherit from it. What is the problem with it? That it does not always agree with the shape of the input, right? We can mark it as a private class (_OrthogonalRegression), because it is only used for reconstruction measures; then I think we can omit it from the estimator check. But it basically has the same problems as the sample selection classes, so I would apply the same solution.
Looks good overall. I would use the chance to add tests for uncovered code.
-    xnew -= col @ (col.T @ xnew)
+    xnew -= (col @ (col.T @ xnew)).astype(xnew.dtype)
Why is this necessary? It seems weird to suddenly enforce the type here.
Without this we get `numpy.core._exceptions._UFuncOutputCastingError: Cannot cast ufunc 'divide' output from dtype('float64') to dtype('int64') with casting rule 'same_kind'`.
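The failure mode can be reproduced in isolation. This is a standalone sketch, not the PR's code path; the PR's traceback mentions 'divide', while here an in-place subtract trips the same 'same_kind' casting rule:

```python
import numpy as np

xnew = np.array([[2], [0], [0]])        # integer-typed input data
col = np.array([[1.0], [0.0], [0.0]])   # float64 orthonormal column

try:
    xnew -= col @ (col.T @ xnew)        # float64 result into an integer buffer
except TypeError as err:
    print(err)                          # ufunc output casting error

# Casting the correction term to xnew's dtype lets the in-place update proceed:
xnew -= (col @ (col.T @ xnew)).astype(xnew.dtype)
```

An alternative fix would be to validate the input as float up front, which sidesteps the cast entirely.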
It became quite a huge PR. I have the feeling that if we merge it as it is, we will merge some buggy code that is hard to identify because it is entangled with so many changes. I therefore suggest putting the renaming of private variables into a separate PR to reduce the noise here. Then I can review it again.
These are the changes I could identify from this PR
* renaming member variables marked as private to sklearn style
* consistently validate and check input data in fit functions
* adding whitening option in PCovR
* KernelFlexibleCenterer was not consistently using validated kernel, this has been fixed
* adding tests in tests/test_kernel_pcovr.py for different solvers
* adding tests in tests/test_standard_flexible_scaler.py for taking the average
* adding sklearn estimator_checks tests
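On the whitening option mentioned above: whitening rescales the latent projections to unit variance per component. A generic PCA-style sketch of the idea in plain numpy (illustrative only, not the PCovR implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([3.0, 1.0, 0.3])  # unequal scales
Xc = X - X.mean(axis=0)

U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
T = Xc @ Vt.T                  # plain scores: per-component variance S**2 / n
T_white = U * np.sqrt(len(X))  # whitened scores: unit variance per component

print(T.std(axis=0))           # unequal spreads
print(T_white.std(axis=0))     # ~ [1. 1. 1.]
```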
One comment only. Otherwise looks fine.
Suggested merge commit
add sklearn estimator_checks tests and fix emerging test errors
* consistently validate and check input data in fit functions
* adding whitening option in PCovR
* KernelFlexibleCenterer was not consistently using validated kernel, this has been fixed
* adding tests in tests/test_standard_flexible_scaler.py for taking the average
* create new test file tests/test_check_estimators.py with sklearn estimator_checks tests
Co-authored-by: Alexander Goscinski <alex.goscinski@posteo.de>
Looks good.
scikit-learn-contrib (our PR is scikit-learn-contrib/scikit-learn-contrib#62) requires that all estimators pass a check_estimators test à la https://scikit-learn.org/stable/modules/generated/sklearn.utils.estimator_checks.parametrize_with_checks.html#sklearn.utils.estimator_checks.parametrize_with_checks, which ours currently do not. I've been going through everything that inherits from the BaseEstimator class (or should) and making the required changes. @agoscinski I would appreciate your input on the linear_models section, as I'm not sure that OrthogonalRegression can pass the estimator checks: to my knowledge, the predicted values may have a different shape than the fitted y, no? Please correct me if I'm wrong.

📚 Documentation preview 📚: https://scikit-matter--196.org.readthedocs.build/en/196/
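The check_estimators setup described above looks roughly like the following sketch. A stock scikit-learn estimator stands in here; the PR's actual test file uses skmatter's own classes:

```python
from sklearn.preprocessing import StandardScaler
from sklearn.utils.estimator_checks import parametrize_with_checks

# Each estimator instance is expanded into one pytest case per API check,
# so a single failing convention shows up as an individually named test.
@parametrize_with_checks([StandardScaler()])
def test_sklearn_compatible_estimator(estimator, check):
    check(estimator)
```

Run with pytest; `sklearn.utils.estimator_checks.check_estimator` runs the same suite programmatically when debugging a single class.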