Delta Lake is not compatible with pyspark 2.4.3 #63

Closed
xctom opened this issue May 31, 2019 · 6 comments
Comments


xctom commented May 31, 2019

Hi Delta Lake team,

I tried to set up Delta Lake by following this instruction and got the same error as described in this issue. I also left a comment about the error and my environment in that issue.

Today I downgraded my pyspark version to 2.4.2 and it worked:

pyspark --packages io.delta:delta-core_2.12:0.1.0
Python 2.7.16 (default, Apr 12 2019, 15:32:40)
[GCC 4.2.1 Compatible Apple LLVM 10.0.1 (clang-1001.0.46.3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
Ivy Default Cache set to: /Users/xuc/.ivy2/cache
The jars for the packages stored in: /Users/xuc/.ivy2/jars
:: loading settings :: url = jar:file:/usr/local/lib/python2.7/site-packages/pyspark/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
io.delta#delta-core_2.12 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-bbcee9fd-e9b9-444d-88ac-389b062944c6;1.0
	confs: [default]
	found io.delta#delta-core_2.12;0.1.0 in central
:: resolution report :: resolve 195ms :: artifacts dl 2ms
	:: modules in use:
	io.delta#delta-core_2.12;0.1.0 from central in [default]
	---------------------------------------------------------------------
	|                  |            modules            ||   artifacts   |
	|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
	---------------------------------------------------------------------
	|      default     |   1   |   0   |   0   |   0   ||   1   |   0   |
	---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-bbcee9fd-e9b9-444d-88ac-389b062944c6
	confs: [default]
	0 artifacts copied, 1 already retrieved (0kB/6ms)
19/05/31 12:05:28 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.2
      /_/

Using Python version 2.7.16 (default, Apr 12 2019 15:32:40)
SparkSession available as 'spark'.
>>> data = spark.range(5, 10)
>>> data.write.format("delta").mode("overwrite").save("/tmp/delta-table")
>>> exit()

I wonder if there is a compatibility issue?

@mukulmurthy
Collaborator

Hi Chen,

There was a bug where Spark 2.4.2 was built with Scala 2.12 instead of 2.11. So if you're using Spark 2.4.2, you need the Delta Lake artifact built for Scala 2.12 (delta-core_2.12), but for 2.4.3 (and future 2.4.x versions) use the Scala 2.11 artifact (delta-core_2.11).
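
For example, a minimal sanity check of the two combinations (assuming Delta Lake 0.1.0, the version from the log above, and that both Scala builds are published under those coordinates):

# Spark 2.4.2 (accidentally built with Scala 2.12)
pyspark --packages io.delta:delta-core_2.12:0.1.0

# Spark 2.4.3 and later 2.4.x (built with Scala 2.11)
pyspark --packages io.delta:delta-core_2.11:0.1.0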

xctom (Author) commented Jun 7, 2019

Got it. Thanks!

xctom closed this as completed Jun 7, 2019
@manuzhang

@mukulmurthy It seems the quickstart doc is not up to date for Spark 2.4.3.

@mukulmurthy
Collaborator

You're correct; thanks for calling that out. We'll fix that.

LantaoJin added a commit to LantaoJin/delta that referenced this issue May 27, 2020

kc-1891 commented Aug 31, 2021

Hi, we are facing the same issue ('module not found: io.delta#delta-core_2.12;1.0.0') with spark-3.1.2-bin-hadoop3.2.
Any help on how we can resolve this issue and run the command below successfully?
pyspark --packages io.delta:delta-core_2.12:1.0.0 --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"

[screenshot: Ivy dependency resolution failing with a "connection reset" error]

tdas (Contributor) commented Aug 31, 2021

"connection reset" sounds like connection issues with maven servers.
