From 7e1248ab770b8cf8b4b9929f551c8599d467a462 Mon Sep 17 00:00:00 2001
From: Hyukjin Kwon
Date: Wed, 15 Jan 2025 15:52:56 +0900
Subject: [PATCH] [SPARK-50824][PYTHON] Avoid importing optional Python packages for checking

### What changes were proposed in this pull request?

This PR proposes to avoid importing optional Python packages when checking whether they are available, by using `importlib.util.find_spec` instead of actually loading/importing the package.

### Why are the changes needed?

https://github.com/apache/spark/commit/a40919912f5ce7f63fff2907b30e473dd4155227 changed the main code to import optional dependencies.

After that, https://github.com/apache/spark/commit/f223b8da9e23e4e028e145e0d4dd74eeae5d2d52 technically broke the Python Spark Core tests (because we now import `pyspark.testing`, which in turn imports optional dependencies), but those tests were not run for that commit. Importing `deepspeed` can write to stdout through its logger (https://github.com/microsoft/DeepSpeed/blob/master/accelerator/real_accelerator.py#L182), and this broke the test in `pyspark.conf`.

The actual test failure was found only when a core change triggered the tests at https://github.com/apache/spark/commit/6f3b778e1a12901726c2a35072904f36f46f7888. The build passed in that PR because it ran before https://github.com/apache/spark/commit/f223b8da9e23e4e028e145e0d4dd74eeae5d2d52 was merged.

### Does this PR introduce _any_ user-facing change?

Technically yes. Importing optional dependencies might have side effects, and this PR avoids them.

### How was this patch tested?

CI in this PR should verify it.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #49500 from HyukjinKwon/SPARK-50824.

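For illustration only, the approach described under "What changes were proposed" could be sketched standalone as follows. This is not the code in the patch: the explicit `import importlib.util` and the `try`/`except` are assumptions added here so the sketch does not depend on another module having already loaded `importlib.util`, and so that a dotted name with a missing parent package returns `False` instead of raising.

```python
import importlib.util


def have_package(name: str) -> bool:
    """Return True if `name` can be located, without importing `name` itself."""
    try:
        # find_spec only locates a top-level module; it does not execute it.
        # For a dotted name it imports the parent package first.
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # Assumption for this sketch: treat lookup errors as "not available".
        # ModuleNotFoundError (an ImportError) is raised when a parent package
        # is missing; ValueError when an already-loaded module has no __spec__.
        return False


# Example checks: a stdlib module and a name that should not exist.
have_json = have_package("json")
have_missing = have_package("no_such_package_xyz_12345")
```

Because `find_spec` does not execute a top-level module, import-time side effects such as logging to stdout are not triggered by the check itself.
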
Authored-by: Hyukjin Kwon
Signed-off-by: Hyukjin Kwon
---
 python/pyspark/testing/utils.py | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/python/pyspark/testing/utils.py b/python/pyspark/testing/utils.py
index 233b432766b75..76f5b48ff9bb0 100644
--- a/python/pyspark/testing/utils.py
+++ b/python/pyspark/testing/utils.py
@@ -52,13 +52,9 @@
 
 
 def have_package(name: str) -> bool:
-    try:
-        import importlib
+    import importlib
 
-        importlib.import_module(name)
-        return True
-    except Exception:
-        return False
+    return importlib.util.find_spec(name) is not None
 
 
 have_numpy = have_package("numpy")