This repository has been archived by the owner on Dec 20, 2018. It is now read-only.

spark-avro 3.2.0 doesn't work with spark 2.2.0 (abstract OutputWriter.write) #240

Closed
gnmerritt opened this issue Jul 12, 2017 · 33 comments

Comments

@gnmerritt

I'm trying to upgrade to the recently released spark 2.2.0. I'm using spark-avro version 3.2.0 and I get the following error when trying to write to an avro file.

org.apache.spark.SparkException: Job aborted.
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply$mcV$sp(FileFormatWriter.scala:215)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:173)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:173)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
...
Caused by: java.lang.AbstractMethodError: org.apache.spark.sql.execution.datasources.OutputWriter.write(Lorg/apache/spark/sql/catalyst/InternalRow;)V
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:327)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:258)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:256)
	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1375)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:261)
	... 8 more

At first glance it appeared to be related to #208, but on closer inspection the problem here is the OutputWriter.write method, not OutputWriterFactory.getFileExtension, which was what caused trouble with the spark 2.1.0 upgrade.

Happy to provide more info or help debug, just let me know!
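
For illustration, a minimal sketch of the kind of write that hits this (the session setup and output path are placeholders, not our actual job):

import org.apache.spark.sql.SparkSession

// Placeholder session and path; any DataFrame write through spark-avro 3.2.0
// on Spark 2.2.0 fails the same way once the task calls OutputWriter.write.
val spark = SparkSession.builder().appName("avro-write-repro").getOrCreate()

spark.range(10).toDF("id")
  .write
  .format("com.databricks.spark.avro")
  .save("/tmp/avro-write-repro")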

@squito
Contributor

squito commented Jul 18, 2017

I'm seeing this too. I think it's from SPARK-19085: apache/spark@b3d3962

You get a compile error if you bump the Spark version to 2.2.0 and rebuild:

[info] Compiling 5 Scala sources to /Users/irashid/github/pub/spark-avro/target/scala-2.11/classes...
[error] /Users/irashid/github/pub/spark-avro/src/main/scala/com/databricks/spark/avro/AvroOutputWriter.scala:41: class AvroOutputWriter needs to be abstract, since method write in class OutputWriter of type (row: org.apache.spark.sql.catalyst.InternalRow)Unit is not defined
[error] (Note that org.apache.spark.sql.catalyst.InternalRow does not match org.apache.spark.sql.Row)
[error] private[avro] class AvroOutputWriter(
[error]                     ^
[error] /Users/irashid/github/pub/spark-avro/src/main/scala/com/databricks/spark/avro/AvroOutputWriter.scala:69: method write overrides nothing.
[error] Note: the super classes of class AvroOutputWriter contain the following, non final members named write:
[error] def write(row: org.apache.spark.sql.catalyst.InternalRow): Unit
[error]   override def write(row: Row): Unit = {
[error]                ^
[error] two errors found
[error] (compile:compileIncremental) Compilation failed

This was discussed in the PR for Spark: apache/spark#16479 (comment)
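
For context, a simplified sketch of the shape of the API change (the real OutputWriter lives in org.apache.spark.sql.execution.datasources and has more members; the class names here are abbreviated for illustration):

import org.apache.spark.sql.Row
import org.apache.spark.sql.catalyst.InternalRow

// Spark 2.1.x shape (what spark-avro 3.2.0's AvroOutputWriter overrides):
abstract class OutputWriter21 {
  def write(row: Row): Unit
  def close(): Unit
}

// Spark 2.2.0 shape after SPARK-19085: the row type changed to InternalRow,
// so the 2.1-compiled override no longer implements the abstract method and
// the JVM throws AbstractMethodError at runtime.
abstract class OutputWriter22 {
  def write(row: InternalRow): Unit
  def close(): Unit
}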

@marcintustin

I'm also hitting this; would love to see the PR merged.

@sathish-io

I got the same problem. Any fix coming soon?

@gnmerritt
Author

We're using the fix proposed in #242 and it seems to be working fine. You'll just have to build spark-avro with the patch applied and point your maven/gradle/whatever at the custom build rather than the released version.
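
For anyone wondering what that looks like concretely, a rough sketch (the version string below is hypothetical; use whatever coordinates your patched build publishes):

// In the patched spark-avro checkout, publish the artifact to the local Ivy repo:
//   sbt +publishLocal
// Then, in the consuming project's build.sbt, depend on the locally published
// coordinates instead of the released 3.2.0 artifact:
libraryDependencies += "com.databricks" %% "spark-avro" % "3.2.0-with-240-fix"  // hypothetical version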

@ritesh-dineout

Is there any confirmation of when this PR will be merged? We are facing this in production and would appreciate a release as soon as possible.

@airawat

airawat commented Aug 17, 2017

We have a similar urgency. Please do share when this PR will be merged.

@ljank

ljank commented Aug 22, 2017

Since the PR has already been merged, is there any roadmap/timeline for the release? A custom build is always an option, but going the clean way feels much better :) Thank you!

@nightscape

If you need it now, you can use the JitPack build like this:

spark-shell --repositories https://jitpack.io --packages com.github.databricks:spark-avro:204864b6cf 
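If you're on sbt rather than spark-shell, the equivalent (as far as I can tell) should be the JitPack resolver plus the commit hash as the version:

// build.sbt
resolvers += "jitpack" at "https://jitpack.io"

// JitPack builds the artifact from this commit of databricks/spark-avro,
// which already contains the merged fix.
libraryDependencies += "com.github.databricks" % "spark-avro" % "204864b6cf"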

@jung-kim

I understand there are workarounds, but what is blocking the release of this fix?

@omervk

omervk commented Sep 3, 2017

+1 waiting for the release. Meanwhile, using @nightscape's suggestion.

@reflog

reflog commented Sep 4, 2017

@nightscape's solution works!

@mateo41

mateo41 commented Sep 6, 2017

I know other people have asked, but is there a timeline for the next spark-avro release?

@rxin
Contributor

rxin commented Sep 7, 2017

We want to make a release, although the eng team is pretty swamped at the moment.

For now please use the workaround provided above.

@dsfarrar

+1 waiting for the release. I appreciate the engineering team's time!

@geek311

geek311 commented Sep 26, 2017

I have built the jar from the master branch; it is the 4.0.0 snapshot. What should my sbt dependency be to use this instead of spark-avro 3.2.0? This error is coming up with Spark 2.2 and Redshift write operations.

@geek311

geek311 commented Sep 27, 2017

Building the jar from the master branch gives me spark-avro-assembly-4.0.0-SNAPSHOT.jar. If, on the CDH cluster, I replace the spark-avro-3.2.0 jar with this newly built jar and then run spark-submit with --jars pointing at the new assembly jar, I get a LinkageError. Can you please tell me how to solve this? What changes do I need to make in my sbt to point to this assembly jar? I have only passed it to the cluster via --jars.

Exception in thread "main" java.lang.LinkageError: loader constraint violation: when resolving method "org.apache.spark.streaming.StreamingContext$.getOrCreate(Ljava/lang/String;Lscala/Function0;Lorg/apache/hadoop/conf/Configuration;Z)Lorg/apache/spark/streaming/StreamingContext;" the class loader (instance of org/apache/spark/util/ChildFirstURLClassLoader) of the current class and the class loader (instance of sun/misc/Launcher$AppClassLoader) for the method's defining class, org/apache/spark/streaming/StreamingContext$, have different Class objects for the type scala/Function0 used in the signature

@marcintustin

marcintustin commented Sep 27, 2017 via email

@geek311

geek311 commented Sep 27, 2017

I have the following in my project/plugins.sbt for building the fat jar.
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.4")

And my sbt has -
"com.databricks" % "spark-redshift_2.11" % "3.0.0-preview1"
"org.apache.spark" % "spark-streaming-kafka-0-10_2.11" % "2.2.0.cloudera1"

And build.sbt has scalaVersion := "2.11.8" set in every project's settings.

I have put the spark-avro-assembly-4.0.0-SNAPSHOT.jar in the [root]/lib directory and also tried putting it in the sub-project/lib directory as my root project has many sub-projects. Is there anything else I need to do in my sbt to point to my local snapshot jar?
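
For reference, a sketch of what the build.sbt pieces described above boil down to (the resolver needed for the cloudera1 artifact and any dependency scoping are left out of this sketch):

// build.sbt (sketch of the setup described above)
scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  "com.databricks"   %% "spark-redshift"             % "3.0.0-preview1",
  "org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.2.0.cloudera1"
)

// sbt treats jars under <project>/lib as unmanaged dependencies, so dropping
// spark-avro-assembly-4.0.0-SNAPSHOT.jar into lib/ puts it on the compile
// classpath without extra settings; the LinkageError reported earlier in this
// thread looks like a runtime classloader clash rather than a missing compile
// dependency.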

My Scala version is 2.11.8. I am using IntelliJ IDEA and I see Scala 2.11.8 jars in my IDEA libraries, but I also see Scala 2.10 in the ~/.sbt and ~/.ivy2/cache folders. I had cleaned them out but they reappear; how can I fix this? I don't get the LinkageError if I don't use the SNAPSHOT jar and just use these:
com.databricks.spark-redshift_2.11-3.0.0-preview1.jar
com.databricks.spark-avro_2.11-3.2.0.jar

But then I'm back to square one with the InternalRow error.

@geek311

geek311 commented Sep 28, 2017

I am not getting any LinkageError with the published spark-avro 3.2.0 jar when using it with the redshift 3.0.0-preview1 jar. But after checking out the master branch, building the jar with sbt assembly, and adding it as an unmanaged dependency in the sbt root/lib, the LinkageError appears. Any advice? Or when can this fix be published?

@geek311

geek311 commented Sep 28, 2017

I was able to resolve that error and now I am getting the following. Is there something else I need to change in the patched master jar for a Spark 2.2 CDH 5.10 cluster?

Caused by: java.lang.UnsatisfiedLinkError: org.xerial.snappy.SnappyNative.maxCompressedLength(I)I
at org.xerial.snappy.SnappyNative.maxCompressedLength(Native Method)
at org.xerial.snappy.Snappy.maxCompressedLength(Snappy.java:316)
at org.apache.avro.file.SnappyCodec.compress(SnappyCodec.java:43)
at org.apache.avro.file.DataFileStream$DataBlock.compressUsing(DataFileStream.java:361)
at org.apache.avro.file.DataFileWriter.writeBlock(DataFileWriter.java:394)
at org.apache.avro.file.DataFileWriter.sync(DataFileWriter.java:413)
at org.apache.avro.file.DataFileWriter.flush(DataFileWriter.java:422)
at org.apache.avro.file.DataFileWriter.close(DataFileWriter.java:445)
at org.apache.avro.mapreduce.AvroKeyRecordWriter.close(AvroKeyRecordWriter.java:83)
at com.databricks.spark.avro.AvroOutputWriter.close(AvroOutputWriter.scala:84)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.releaseResources(FileFormatWriter.scala:337)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:330)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:258)
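
Not verified on this setup, but one possible sidestep: the spark-avro README documents a compression-codec setting, so switching the output codec away from Snappy should keep the Avro writer off the native snappy call shown above (whether that avoids the underlying classpath problem is an assumption):

// `spark` is the active SparkSession.
// Assumption: with a non-snappy codec the Avro writer never reaches
// org.xerial.snappy native code.
spark.conf.set("spark.sql.avro.compression.codec", "deflate")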

@beatlevic

+1 waiting for the release. Is there anything blocking it?

@geek311

geek311 commented Oct 8, 2017

After days of trying several options, I found a simple workaround. Hope this saves people tons of time and a lot of frustration!
Solution: just add this line to the spark-redshift write block: .option("tempformat","CSV")
This bypasses the default Avro format that spark-redshift uses and writes CSV instead. And voila! Everything works great with the released spark-redshift-3.0.0-preview1.jar and the released spark-avro jar.
sbt entries are:
"com.databricks" %% "spark-redshift" % "3.0.0-preview1"
"com.databricks" %% "spark-avro" % "3.2.0"

It may not be super efficient, but this one-line change saves the hassle of building from master and applying that patch on the cluster (plus the other problems that go with it), at least until the new jar is released.
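
In code, the one-line change looks roughly like this (a sketch; df, the JDBC URL, table name, and temp dir are placeholders):

// Writing through spark-redshift with CSV as the temp format instead of Avro.
df.write
  .format("com.databricks.spark.redshift")
  .option("url", "jdbc:redshift://<host>:5439/<db>?user=<user>&password=<pass>")
  .option("dbtable", "my_table")
  .option("tempdir", "s3n://my-bucket/tmp")
  .option("tempformat", "CSV")  // bypasses the Avro write path entirely
  .mode("append")
  .save()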

@ryanmickler

I'm hitting the same error with spark-avro 3.2.0 against spark 2.1.0

@pmatpadi

pmatpadi commented Oct 24, 2017

Thanks a lot for the CSV workaround, @geek311.

@lokkju

lokkju commented Oct 26, 2017

Hey, question for the engineering team here: how are you handling cross-Spark-version compatibility? I'm running into the same issue with a custom OutputWriter implementation, and I'm not sure how to support both Spark 2.1 and Spark 2.2, as this is a breaking change. I don't really want to maintain two branches...

Suggestions as to how you're planning to handle this, or whether you're just going to have a minimum required Spark version, would be appreciated.

@gatorsmile

cc @gengliangwang Could you take a look at this issue?

@gengliangwang
Contributor

We will make a release soon. Sorry for the wait.

@nemo83

nemo83 commented Oct 27, 2017

Hello, thanks for solving this issue. Any ETA on the release?

Thanks!

@luckyvaliin

How can I get the latest release?

@nemo83

nemo83 commented Nov 1, 2017

@luckyvaliin

Yes, I got that and it's working great!! Thank you.

@gatorsmile

gatorsmile commented Nov 2, 2017

We will send out a release announcement soon. Thanks!

@gnmerritt
Author

thanks guys
