[pyspark] support typing for model.py #9156

Merged: 1 commit into dmlc:master on May 17, 2023

Conversation

wbo4958 (Contributor) commented May 15, 2023

This PR adds typing for model.py of xgboost.spark; I will create a follow-up PR to support typing for core.py of xgboost.spark.

wbo4958 (Contributor, Author) commented May 15, 2023

It seems the failure in TestTreeMethod.test_categorical_ames_housing is not caused by this PR. @trivialfis, any thoughts on this?

)

# xgboost types
XGB_ESTIMATOR = Union[XGBClassifier, XGBRanker, XGBRegressor]
trivialfis (Member)

Use XGBModel class instead. It's the parent class for all referenced classes here.
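A minimal sketch of that suggestion, assuming the annotation only needs the shared scikit-learn wrapper interface; the helper function below is hypothetical and only illustrates the point:

from xgboost.sklearn import XGBModel  # parent class of XGBClassifier, XGBRanker, XGBRegressor


def _params_of(estimator: XGBModel) -> dict:
    # Hypothetical helper: any of the three concrete estimators is accepted
    # here because they all subclass XGBModel, so no Union alias is needed.
    return estimator.get_params()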

Comment on lines 42 to 57
SPARK_XGB_ESTIMATOR = Union[SparkXGBClassifier, SparkXGBRanker, SparkXGBRegressor]
SPARK_XGB_ESTIMATOR_TYPE = Union[
    Type[SparkXGBClassifier], Type[SparkXGBRanker], Type[SparkXGBRegressor]
]
SPARK_XGB_MODEL = Union[
    SparkXGBClassifierModel, SparkXGBRegressorModel, SparkXGBRankerModel
]
SPARK_XGB_MODEL_TYPE = Union[
    Type[SparkXGBClassifierModel],
    Type[SparkXGBRegressorModel],
    Type[SparkXGBRankerModel],
]

SPARK_XGB_INSTANCE = TypeVar(
    "SPARK_XGB_INSTANCE", SPARK_XGB_ESTIMATOR, SPARK_XGB_MODEL
)
trivialfis (Member)

  • Please consider the inheritance structure here instead of using Union.
  • Please use CamelCase for types.
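A hedged sketch of what those two suggestions could look like together; the names below are illustrative and bound to the generic pyspark base classes, not necessarily the layout this PR settled on:

from typing import TypeVar

from pyspark.ml import Estimator, Model

# CamelCase type variables bound to a common parent class instead of a Union
# that enumerates every concrete subclass.
SparkXGBEstimatorT = TypeVar("SparkXGBEstimatorT", bound=Estimator)
SparkXGBModelT = TypeVar("SparkXGBModelT", bound=Model)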

wbo4958 (Contributor, Author)

Yeah. For example, if I replace

 SPARK_XGB_ESTIMATOR = Union[SparkXGBClassifier, SparkXGBRanker, SparkXGBRegressor]

with

from .core import _SparkXGBEstimator
SparkXGBEstimator = _SparkXGBEstimator

it complains with the error below:

xgboost/spark/model.py:41: error: Module "xgboost.spark.core" has no attribute "_SparkXGBEstimator"  [attr-defined]
xgboost/spark/model.py:57: error: Variable "xgboost.spark.model.SparkXGBEstimator" is not valid as a type  [valid-type]
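The second error likely follows from the first: once mypy cannot resolve the imported name, the alias is just a variable of type Any, and a plain variable cannot be used in an annotation. A small sketch of an alias mypy does accept, using the public XGBModel class purely for illustration:

from xgboost.sklearn import XGBModel

# mypy treats this as an implicit type alias because the right-hand side is a
# class it can resolve, so the alias is valid in annotations.
XGBEstimator = XGBModel


def check(estimator: XGBEstimator) -> None:
    ...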

wbo4958 (Contributor, Author)

@trivialfis any thoughts on that?

trivialfis (Member)

I think it's the same issue one would encounter in any typed language: the modules need to be written in a structured way. Python duck typing hides the issue; now that we want to annotate the code, some refactoring needs to be done.
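A generic, hedged sketch of that kind of restructuring (module and class names are made up, not the layout chosen by this PR): shared definitions move into a leaf module that imports nothing from the other two, so both can depend on it without forming a cycle.

# shared.py -- leaf module: defines only what both sides need, imports neither of them
class BaseWidget:
    ...


# a.py
from shared import BaseWidget


class ConcreteWidget(BaseWidget):
    ...


# b.py -- needs the base type for annotations; no import of a.py required
from shared import BaseWidget


def describe(widget: BaseWidget) -> str:
    return type(widget).__name__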

from pyspark.ml.util import DefaultParamsReader, DefaultParamsWriter, MLReader, MLWriter
from pyspark.sql import SparkSession

from xgboost.core import Booster

from .utils import get_class_name, get_logger


def _get_or_create_tmp_dir():
if TYPE_CHECKING:
trivialfis (Member) commented May 15, 2023

Why type checking only? These are hard dependencies of the spark module, so you can import them freely. The spark module is not part of the package import (the top-level package only imports things like "from .training import cv, train"). As a result, what you import here doesn't affect users who don't import the spark module.

wbo4958 (Contributor, Author)

If I put the imports outside of TYPE_CHECKING, it runs into a circular import issue:

../../../anaconda3/envs/xgboost-dev/lib/python3.9/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/test_distributed/test_with_spark/test_spark_local_cluster.py:23: in <module>
    from xgboost.spark import SparkXGBClassifier, SparkXGBRegressor
python-package/xgboost/spark/__init__.py:8: in <module>
    from .estimator import (
python-package/xgboost/spark/estimator.py:9: in <module>
    from .core import (  # type: ignore
python-package/xgboost/spark/core.py:52: in <module>
    from .model import (
python-package/xgboost/spark/model.py:28: in <module>
    from . import (
E   ImportError: cannot import name 'SparkXGBClassifier' from partially initialized module 'xgboost.spark' (most likely due to a circular import) (/home/bobwang/work.d/ml/xgboost/python-package/xgboost/spark/__init__.py)
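For reference, a hedged sketch of the TYPE_CHECKING pattern discussed here (a generic illustration, not the exact code in this PR): the guarded import is only evaluated by the type checker, never at runtime, so the circular import above cannot happen, and the name is referenced as a string annotation.

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only seen by mypy and IDEs; never executed, so no runtime circular import.
    from xgboost.spark import SparkXGBClassifier


def class_name(estimator: "SparkXGBClassifier") -> str:
    # Hypothetical helper for illustration only.
    return type(estimator).__name__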

wbo4958 (Contributor, Author)

@trivialfis any thoughts on that?

trivialfis (Member)

> It seems the failure in TestTreeMethod.test_categorical_ames_housing is not caused by this PR. @trivialfis, any thoughts on this?

Please ignore it. It's the mighty internet error.

Currently, pyspark runs into a circular import issue when enabling typing for model.py, so this PR refactors the pyspark module slightly to avoid it.
wbo4958 (Contributor, Author) commented May 17, 2023

I will create a follow-up PR to support typing for core.py.

trivialfis merged commit caf326d into dmlc:master on May 17, 2023
wbo4958 deleted the pyspark-mypy branch on May 17, 2023 at 23:13