
org.apache.spark.sql.delta.sources.DeltaDataSource could not be instantiated #6

Closed
saj1th opened this issue Apr 24, 2019 · 4 comments

saj1th commented Apr 24, 2019

Trying to run

./spark-shell --packages io.delta:delta-core_2.11:0.1.0

and then

val df = spark.read.format("delta").load(deltaPath)

This error gets thrown:

java.util.ServiceConfigurationError: org.apache.spark.sql.sources.DataSourceRegister: Provider org.apache.spark.sql.delta.sources.DeltaDataSource could not be instantiated
  at java.base/java.util.ServiceLoader.fail(ServiceLoader.java:581)
  at java.base/java.util.ServiceLoader$ProviderImpl.newInstance(ServiceLoader.java:803)
  at java.base/java.util.ServiceLoader$ProviderImpl.get(ServiceLoader.java:721)
  at java.base/java.util.ServiceLoader$3.next(ServiceLoader.java:1394)
  at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:44)
  at scala.collection.Iterator.foreach(Iterator.scala:941)
  at scala.collection.Iterator.foreach$(Iterator.scala:941)
  at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
  at scala.collection.IterableLike.foreach(IterableLike.scala:74)
  at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
  at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
  at scala.collection.TraversableLike.filterImpl(TraversableLike.scala:250)
  at scala.collection.TraversableLike.filterImpl$(TraversableLike.scala:248)
  at scala.collection.AbstractTraversable.filterImpl(Traversable.scala:108)
  at scala.collection.TraversableLike.filter(TraversableLike.scala:262)
  at scala.collection.TraversableLike.filter$(TraversableLike.scala:262)
  at scala.collection.AbstractTraversable.filter(Traversable.scala:108)
  at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:630)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:194)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
  ... 49 elided
Caused by: java.lang.NoClassDefFoundError: org/apache/spark/internal/Logging$class
  at org.apache.spark.sql.delta.sources.DeltaDataSource.<init>(DeltaDataSource.scala:42)

Environment

Scala 2.11.12
Spark 2.4.2

zsxwing (Member) commented Apr 24, 2019

This error occurs because your Spark is built with Scala 2.12 but the delta-core jar you are using is built with Scala 2.11. If you use the Scala 2.12 build of delta-core like this, it should work:

./spark-shell --packages io.delta:delta-core_2.12:0.1.0
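
For anyone unsure which build they have: the spark-shell REPL runs on the scala-library that Spark itself ships, so it can report the relevant version directly. A minimal check (sc is the SparkContext the shell creates):

```scala
// Run inside spark-shell. Because the shell runs on Spark's bundled
// scala-library, this is the Scala version your Spark build was compiled for.
println(scala.util.Properties.versionString) // e.g. "version 2.12.8" -> use delta-core_2.12
println(sc.version)                          // the Spark version, e.g. "2.4.2"
```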

saj1th (Author) commented Apr 24, 2019

Cool! Thanks!

saj1th closed this as completed Apr 24, 2019

tdas (Contributor) commented Apr 24, 2019

For more info: you can verify which version of Scala Spark is running by looking at the startup banner of the Spark/PySpark shell.

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.2
      /_/

Using Scala version 2.12.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_201)

Note the Scala version here ^^^, whether it is 2.11 or 2.12. You should use the matching version of delta-core: either --packages io.delta:delta-core_2.11:0.1.0 or --packages io.delta:delta-core_2.12:0.1.0.
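
If reading the banner feels error-prone, the same mapping can be done mechanically from inside the shell. A small sketch (assuming, as in this thread, that delta-core 0.1.0 is published for Scala 2.11 and 2.12 only):

```scala
// Paste into spark-shell: derive the delta-core coordinate from the Scala
// binary version of the running shell (e.g. "2.12.8" -> "2.12").
val binaryVersion = scala.util.Properties.versionNumberString.split('.').take(2).mkString(".")
val coordinate = s"io.delta:delta-core_${binaryVersion}:0.1.0"
println(coordinate) // then restart with: ./spark-shell --packages <that coordinate>
```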

xctom commented May 31, 2019

@zsxwing Hi, I also ran into this error.

I ran pyspark --packages io.delta:delta-core_2.12:0.1.0

But got the same error:

Python 2.7.16 (default, Apr 12 2019, 15:32:40)
[GCC 4.2.1 Compatible Apple LLVM 10.0.1 (clang-1001.0.46.3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
Ivy Default Cache set to: /Users/xuc/.ivy2/cache
The jars for the packages stored in: /Users/xuc/.ivy2/jars
:: loading settings :: url = jar:file:/usr/local/lib/python2.7/site-packages/pyspark/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
io.delta#delta-core_2.12 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-fedfbd0b-ef8a-4239-b4d1-3a1af140aa07;1.0
	confs: [default]
	found io.delta#delta-core_2.12;0.1.0 in central
:: resolution report :: resolve 145ms :: artifacts dl 4ms
	:: modules in use:
	io.delta#delta-core_2.12;0.1.0 from central in [default]
	---------------------------------------------------------------------
	|                  |            modules            ||   artifacts   |
	|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
	---------------------------------------------------------------------
	|      default     |   1   |   0   |   0   |   0   ||   1   |   0   |
	---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-fedfbd0b-ef8a-4239-b4d1-3a1af140aa07
	confs: [default]
	0 artifacts copied, 1 already retrieved (0kB/5ms)
19/05/30 22:43:04 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.3
      /_/

Using Python version 2.7.16 (default, Apr 12 2019 15:32:40)
SparkSession available as 'spark'.
>>> data = spark.range(0, 5)

>>>
>>> data.write.format("delta").save("/tmp/delta-table")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/pyspark/sql/readwriter.py", line 734, in save
    self._jwrite.save(path)
  File "/usr/local/lib/python2.7/site-packages/pyspark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/usr/local/lib/python2.7/site-packages/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/usr/local/lib/python2.7/site-packages/pyspark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o37.save.
: java.util.ServiceConfigurationError: org.apache.spark.sql.sources.DataSourceRegister: Provider org.apache.spark.sql.delta.sources.DeltaDataSource could not be instantiated
	at java.util.ServiceLoader.fail(ServiceLoader.java:232)
	at java.util.ServiceLoader.access$100(ServiceLoader.java:185)
	at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:384)
	at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
	at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
	at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
	at scala.collection.Iterator$class.foreach(Iterator.scala:891)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
	at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
	at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
	at scala.collection.TraversableLike$class.filterImpl(TraversableLike.scala:247)
	at scala.collection.TraversableLike$class.filter(TraversableLike.scala:259)
	at scala.collection.AbstractTraversable.filter(Traversable.scala:104)
	at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:630)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:245)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoSuchMethodError: org.apache.spark.internal.Logging.$init$(Lorg/apache/spark/internal/Logging;)V
	at org.apache.spark.sql.delta.sources.DeltaDataSource.<init>(DeltaDataSource.scala:42)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at java.lang.Class.newInstance(Class.java:442)
	at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
	... 24 more

The Scala and Java versions are below. I already have Scala version 2.12.8, so I'm not sure what happened here:

java -version
openjdk version "1.8.0_212"
OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_212-b03)
OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.212-b03, mixed mode)
scala -version
Scala code runner version 2.12.8 -- Copyright 2002-2018, LAMP/EPFL and Lightbend, Inc.
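
Worth noting: `scala -version` reports a standalone Scala installation, not the scala-library bundled inside the pip-installed pyspark, and the two can differ. One way to see the bundled version, sketched here against the site-packages path from the log above (adjust the path to your layout):

```scala
import java.io.File

// List the scala-library jar that Spark itself ships. Its version, not the
// system Scala's, is what the delta-core suffix (_2.11 vs _2.12) must match.
val jarsDir = new File("/usr/local/lib/python2.7/site-packages/pyspark/jars")
Option(jarsDir.listFiles).getOrElse(Array.empty[File])
  .map(_.getName)
  .filter(_.startsWith("scala-library"))
  .foreach(println) // e.g. scala-library-2.11.12.jar -> use delta-core_2.11
```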
