Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[enhancement] remove sklearn_check_version dependence from onedal/svm #1835

Merged
merged 25 commits into from
May 22, 2024

Conversation

icfaust
Copy link
Contributor

@icfaust icfaust commented May 14, 2024

Description

Add a design rule which forces all sklearn-related interfacing to occur in its proper package, namely sklearnex. This prevents bad estimator architectures which place too much of the sklearn-related aspects in the wrong area. Ideally this test would have no exceptions going forward.

This is a subsegment of #1829 which handles just the svm algorithms so that sklearn_check_version is not in onedal/svm

This also solves the PR #1727 by removing sklearn inheritance from onedal/svm classes

This also solves issues with gpu offloading with fit_proba that had been removed errantly in #1361

This also fixes an issue where sample_weights are always generated, which may impact performance.

Tasks -

  • Initial compile
  • correct public CI failures
  • Split into separate PRs as necessary (and merge them)
  • pass private CI

@icfaust
Copy link
Contributor Author

icfaust commented May 14, 2024

/intelci: run

@icfaust icfaust marked this pull request as ready for review May 15, 2024 04:11
@icfaust
Copy link
Contributor Author

icfaust commented May 15, 2024

/intelci: run

sklearnex/svm/_common.py Outdated Show resolved Hide resolved
@samir-nasibli
Copy link
Contributor

@icfaust
onedal4py depends on sklearn. Could you please clarify more why we can not use sklearn versioning here?

@icfaust
Copy link
Contributor Author

icfaust commented May 17, 2024

@icfaust onedal4py depends on sklearn. Could you please clarify more why we can not use sklearn versioning here?

The onedal folder installs via setup.py, requirements for the setup.py do not include a sklearn version (as per INSTALL.md). Therefore, if the user were install onedal without having run setup_sklearnex.py, they are in for a nasty surprise.

Secondly, sklearn_check_version is used only for input verification and conformance on raising errors in sklearn (in practice in onedal/). By design, sklearn conformance should be done in the sklearnex directory, and not in the onedal directory. Why would a user care about sklearn if they were using onedal classes directly? For example, why else have these functions defined at all? https://github.com/intel/scikit-learn-intelex/blob/main/onedal/utils/validation.py

Thirdly, operational maintenance of the onedal folder should be minimized (i.e. . By using sklearn_check_version, it is increasing our burden in observation and knowledge into the various onedal classes).

If we want to remove sklearn entirely, we need to fix issues in (not counting testing):
onedal/cluster/kmeans.py
onedal/cluster/dbscan.py (solved in #1837)
onedal/ensemble/forest.py
onedal/common/_mixin.py
onedal/utils/validation.py (This one would be difficult)

@icfaust icfaust requested a review from Alexsandruss May 17, 2024 05:17
@icfaust
Copy link
Contributor Author

icfaust commented May 17, 2024

/intelci: run

@samir-nasibli
Copy link
Contributor

samir-nasibli commented May 17, 2024

@icfaust onedal4py depends on sklearn. Could you please clarify more why we can not use sklearn versioning here?

The onedal folder installs via setup.py, requirements for the setup.py do not include a sklearn version (as per INSTALL.md). Therefore, if the user were install onedal without having run setup_sklearnex.py, they are in for a nasty surprise.

Secondly, sklearn_check_version is used only for input verification and conformance on raising errors in sklearn (in practice in onedal/). By design, sklearn conformance should be done in the sklearnex directory, and not in the onedal directory. Why would a user care about sklearn if they were using onedal classes directly? For example, why else have these functions defined at all? https://github.com/intel/scikit-learn-intelex/blob/main/onedal/utils/validation.py

Thirdly, operational maintenance of the onedal folder should be minimized (i.e. . By using sklearn_check_version, it is increasing our burden in observation and knowledge into the various onedal classes).

If we want to remove sklearn entirely, we need to fix issues in (not counting testing): onedal/cluster/kmeans.py onedal/cluster/dbscan.py (solved in #1837) onedal/ensemble/forest.py onedal/common/_mixin.py onedal/utils/validation.py (This one would be difficult)

The onedal folder installs via setup.py, requirements for the setup.py do not include a sklearn version (as per INSTALL.md). Therefore, if the user were install onedal without having run setup_sklearnex.py, they are in for a nasty surprise.

onedal4py already depends on daal4py, and it is actually part of current daal4py package. sklearn is a runtime dependency, we are versioning it on daal4py, so why we can not do it on onedal4py?
Probably make sense design onedal4py as a independent, but currently it is not sklearn-independent. And since for some primitives it is used in onedal4py and sklearn provide different ifaces/deprecate smth based on version, so make sense also having versioning there in my opinion for a while.

@icfaust
Copy link
Contributor Author

icfaust commented May 17, 2024

/intelci: run

@icfaust icfaust requested a review from Alexsandruss May 22, 2024 05:13
icfaust and others added 4 commits May 22, 2024 07:20
Co-authored-by: Alexander Andreev <alexander.andreev@intel.com>
@icfaust
Copy link
Contributor Author

icfaust commented May 22, 2024

/intelci: run

@icfaust
Copy link
Contributor Author

icfaust commented May 22, 2024

@icfaust onedal4py depends on sklearn. Could you please clarify more why we can not use sklearn versioning here?

The onedal folder installs via setup.py, requirements for the setup.py do not include a sklearn version (as per INSTALL.md). Therefore, if the user were install onedal without having run setup_sklearnex.py, they are in for a nasty surprise.
Secondly, sklearn_check_version is used only for input verification and conformance on raising errors in sklearn (in practice in onedal/). By design, sklearn conformance should be done in the sklearnex directory, and not in the onedal directory. Why would a user care about sklearn if they were using onedal classes directly? For example, why else have these functions defined at all? https://github.com/intel/scikit-learn-intelex/blob/main/onedal/utils/validation.py
Thirdly, operational maintenance of the onedal folder should be minimized (i.e. . By using sklearn_check_version, it is increasing our burden in observation and knowledge into the various onedal classes).
If we want to remove sklearn entirely, we need to fix issues in (not counting testing): onedal/cluster/kmeans.py onedal/cluster/dbscan.py (solved in #1837) onedal/ensemble/forest.py onedal/common/_mixin.py onedal/utils/validation.py (This one would be difficult)

The onedal folder installs via setup.py, requirements for the setup.py do not include a sklearn version (as per INSTALL.md). Therefore, if the user were install onedal without having run setup_sklearnex.py, they are in for a nasty surprise.

onedal4py already depends on daal4py, and it is actually part of current daal4py package. sklearn is a runtime dependency, we are versioning it on daal4py, so why we can not do it on onedal4py? Probably make sense design onedal4py as a independent, but currently it is not sklearn-independent. And since for some primitives it is used in onedal4py and sklearn provide different ifaces/deprecate smth based on version, so make sense also having versioning there in my opinion for a while.

There is a key distinction here: if daal4py depends on sklearn versions, then problems with sklearn versioning will be solved in daal4py. If we add sklearn versioning to onedal4py, we increase our maintenance burden. If you list the primitives that you mention I'd like to review, from my perspective sklearn has a stable API for what we use in onedal4py. If this becomes a problem in a future sklearn version, then this test will specifically trigger and highlight a discussion about the architecture which we should have.

@icfaust icfaust merged commit 8b5de5a into intel:main May 22, 2024
15 of 16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants