Commit

Fixes pyspark dependencies in install target
PySpark has a bunch of hidden dependencies; these are resolved by the sql
and connect targets.
elijahbenizzy committed Aug 23, 2023
1 parent 1f0d5df commit 8236b47
Showing 1 changed file with 6 additions and 1 deletion.
7 changes: 6 additions & 1 deletion setup.py
@@ -74,7 +74,12 @@ def load_requirements():
  "dask-diagnostics": ["dask[diagnostics]"],
  "dask-distributed": ["dask[distributed]"],
  "ray": ["ray>=2.0.0", "pyarrow"],
- "pyspark": ["pyspark[pandas_on_spark]", "pandas<2.0"], # I'm sure they'll add support soon,
+ "pyspark": [
+     # we have to run these dependencies cause Spark does not check to ensure the right target was called
+     "pyspark[pandas_on_spark,connect,sql]",
+     # This is problematic, see https://stackoverflow.com/questions/76072664/convert-pyspark-dataframe-to-pandas-dataframe-fails-on-timestamp-column
+     "pandas<2.0",
+ ], # I'm sure they'll add support soon,
  # but for now its not compatible
  "pandera": ["pandera"],
  },
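
The commit message says PySpark does not itself check that the right extras were installed. One way to surface a missing dependency early might look like the sketch below. It is not part of this commit: `missing_modules` and `ASSUMED_SPARK_MODULES` are hypothetical names, and the module list is an assumption about what the `sql`, `connect`, and `pandas_on_spark` extras are expected to provide.

```python
import importlib.util


def missing_modules(module_names):
    """Return the names from module_names that cannot be found for import."""
    missing = []
    for name in module_names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # Raised for a missing parent package or a module with no spec.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


# Assumption: top-level modules the extras are expected to pull in
# (pandas and pyarrow for sql / pandas_on_spark, grpc for connect).
ASSUMED_SPARK_MODULES = ["pyspark", "pandas", "pyarrow", "grpc"]

if __name__ == "__main__":
    absent = missing_modules(ASSUMED_SPARK_MODULES)
    print("missing:", absent if absent else "none")
```

Running a check like this right after installing the `pyspark` extra would report any module the extras failed to bring in, rather than leaving the failure to appear later at import time inside Spark code.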
