
Resolve pyspark / numpy conflicts #992

Merged
merged 3 commits on Jan 21, 2023

Conversation

loomlike
Collaborator

@loomlike loomlike commented Jan 18, 2023

Signed-off-by: Jun Ki Min 42475935+loomlike@users.noreply.github.com

Description

PySpark still relies on the old NumPy API, referring to np.bool, which has been deprecated (and removed in NumPy 1.24).
Because of that, calling sparkDF.toPandas() throws AttributeError: module 'numpy' has no attribute 'bool'.
We should upgrade the pyspark version once they release a new patch.
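For context, a minimal sketch of the failure mode and a temporary shim (spark_df is a hypothetical PySpark DataFrame; the shim is a stop-gap workaround, not the approach this PR takes):

```python
import numpy as np

# NumPy deprecated the np.bool alias in 1.20 and removed it in 1.24;
# affected pyspark releases still reference np.bool inside toPandas(),
# which raises: AttributeError: module 'numpy' has no attribute 'bool'.
# A temporary shim is to restore the alias before converting:
if not hasattr(np, "bool"):
    np.bool = bool  # re-create the removed alias for pyspark's benefit

# spark_df.toPandas() on a dataframe containing boolean-type columns
# would now succeed (spark_df is illustrative, not defined here).
```

NumPy 2.0 reintroduced np.bool (as an alias of np.bool_), so the hasattr guard keeps the shim from clobbering it on newer NumPy versions.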

How was this PR tested?

Does this PR introduce any user-facing changes?

  • No. You can skip the rest of this section.
  • Yes. Make sure to clarify your proposed changes.

Signed-off-by: Jun Ki Min <42475935+loomlike@users.noreply.github.com>
@loomlike loomlike added the safe to test Tag to execute build pipeline for a PR from forked repo label Jan 18, 2023
xiaoyongzhu
xiaoyongzhu previously approved these changes Jan 19, 2023
@blrchen
Collaborator

blrchen commented Jan 19, 2023

I have concerns about version pinning in Feathr. It's okay to pin versions in Registry since it's a standalone app, but Feathr is a library and is normally used in an environment alongside other Python packages. Introducing a version pin risks package installation errors or incompatibilities with other Python packages.

I have actually already run into issues when numpy was pinned earlier:

  • Nightly notebook test fails with error ImportError: this version of pandas is incompatible with numpy < 1.20.3
  • Installation issue on python 3.10 RuntimeWarning: NumPy 1.20.3 may not yet support Python 3.10

And it seems this is already fixed in Spark (apache/spark#37817), so perhaps we can just set a minimum version for pyspark instead?


@loomlike
Collaborator Author

@xiaoyongzhu @blrchen I verified that unless we explicitly call sparkDF.toPandas() on a dataframe that includes boolean-type features, we can avoid the pyspark bug. Once pyspark has a new release, let's change the pyspark dependency to >new_version to address this.

Until then, we can stick with the current version. I'll put a comment in our setup.py instead of pinning numpy.
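A hypothetical setup.py fragment sketching that approach (the package name, version bounds, and dependency list here are illustrative, not the project's actual setup.py):

```python
from setuptools import setup

setup(
    name="feathr",  # illustrative fragment only
    install_requires=[
        # NOTE: current pyspark releases still reference the removed
        # np.bool alias, so sparkDF.toPandas() on boolean-type columns
        # fails with numpy >= 1.24. Rather than pinning numpy here,
        # bump this lower bound once a patched pyspark release is out.
        "pyspark>=3.1.2",
        "numpy",  # intentionally left unpinned; see note above
    ],
)
```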

…e notebooks

Signed-off-by: Jun Ki Min <42475935+loomlike@users.noreply.github.com>
@loomlike loomlike changed the title Put numpy pinning back Resolve pyspark / numpy conflicts Jan 19, 2023
Signed-off-by: Jun Ki Min <42475935+loomlike@users.noreply.github.com>
@xiaoyongzhu xiaoyongzhu merged commit f9cdccd into feathr-ai:main Jan 21, 2023