
Error while using spark-redshift jar #315

Open
ghost opened this issue Dec 29, 2016 · 37 comments

@ghost

ghost commented Dec 29, 2016

Hi,

I'm getting the error below while using the jar to integrate Redshift with Spark locally.

Exception in thread "main" java.lang.AbstractMethodError: com.databricks.spark.redshift.RedshiftFileFormat.prepareRead(Lorg/apache/spark/sql/SparkSession;Lscala/collection/immutable/Map;Lscala/collection/Seq;)Lscala/collection/immutable/Map;

at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:330)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:160)
	at com.databricks.spark.redshift.RedshiftRelation.buildScan(RedshiftRelation.scala:168)
	at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$3.apply(DataSourceStrategy.scala:141)
	at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$3.apply(DataSourceStrategy.scala:141)
	at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:184)
	at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:183)
	at org.apache.spark.sql.execution.datasources.DataSourceStrategy$.pruneFilterProjectRaw(DataSourceStrategy.scala:257)
	at org.apache.spark.sql.execution.datasources.DataSourceStrategy$.pruneFilterProject(DataSourceStrategy.scala:179)
	at org.apache.spark.sql.execution.datasources.DataSourceStrategy$.apply(DataSourceStrategy.scala:137)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:59)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:59)
	at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:60)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:55)
	at org.apache.spark.sql.execution.SparkStrategies$SpecialLimits$.apply(SparkStrategies.scala:54)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:59)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:59)
	at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:60)
	at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:77)
	at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:75)
	at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:82)
	at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:82)
	at org.apache.spark.sql.Dataset.withTypedCallback(Dataset.scala:2462)
	at org.apache.spark.sql.Dataset.head(Dataset.scala:1861)
	at org.apache.spark.sql.Dataset.take(Dataset.scala:2078)
	at org.apache.spark.sql.Dataset.showString(Dataset.scala:240)
	at org.apache.spark.sql.Dataset.show(Dataset.scala:533)
	at org.apache.spark.sql.Dataset.show(Dataset.scala:493)
	at org.apache.spark.sql.Dataset.show(Dataset.scala:502)
	at simpleSample.RedshiftToSpark$.main(RedshiftToSpark.scala:53)
	at simpleSample.RedshiftToSpark.main(RedshiftToSpark.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)

I see that the prepareRead method is not present in RedshiftFileFormat.
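For reference, here's a minimal sketch of the kind of read that exercises this code path; the JDBC URL, table name, and tempdir below are placeholders, not my actual values:

import org.apache.spark.sql.SparkSession

object RedshiftReadExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("redshift-read").master("local[*]").getOrCreate()

    // Read a Redshift table through the spark-redshift data source.
    val df = spark.read
      .format("com.databricks.spark.redshift")
      .option("url", "jdbc:redshift://host:5439/db?user=USER&password=PASS") // placeholder
      .option("dbtable", "my_table")                                         // placeholder
      .option("tempdir", "s3n://my-bucket/tmp/")                             // placeholder
      .load()

    // The show() call (RedshiftToSpark.scala:53 in the trace above) is where the error surfaces.
    df.show()
  }
}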

Thanks & Regards,
Ravi

@JoshRosen
Contributor

JoshRosen commented Dec 29, 2016

Which version of Spark are you using? If you're using 2.1.x, then I suspect that changes to internal APIs may have broken spark-redshift, in which case we'll need to make a new release.

@JoshRosen
Contributor

Actually, looking a little more closely: since this problem relates to prepareRead, I don't think it's a 2.1.x issue, because that method had been completely removed from Spark by that point (see apache/spark#13698). According to https://issues.apache.org/jira/browse/SPARK-15983, that change went into 2.0.

So: are you using a newer version of spark-redshift with Spark 1.x? You'll need to use a 1.x release of this library with Spark 1.x; newer versions won't work there.
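For reference, keeping the two aligned looks roughly like this in sbt (version numbers below are illustrative, not a specific recommendation):

// build.sbt sketch: pair a 1.x release of this library with a Spark 1.x application.
// Version numbers are illustrative.
val sparkVersion = "1.6.3"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql"      % sparkVersion % "provided",
  "com.databricks"   %% "spark-redshift" % "1.1.0"  // 1.x line of this library for Spark 1.x
)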

@lminer

lminer commented Jan 6, 2017

I'm getting the same exception with a different stack trace, but only when I switch from Spark 2.0.1 to Spark 2.1.0 (Hadoop 2.7, Mesos, spark-redshift_2.11-2.0.1.jar, RedshiftJDBC41-1.1.17.1017.jar).

48f7-81e8-02403dbc2b57-S107): java.lang.AbstractMethodError: org.apache.spark.sql.execution.datasources.OutputWriterFactory.getFileExtension(Lorg/apache/hadoop/mapreduce/TaskAttemptContext;)Ljava/lang/String;
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.<init>(FileFormatWriter.scala:232)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:182)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$3.apply(FileFormatWriter.scala:129)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$3.apply(FileFormatWriter.scala:128)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:99)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

@schwartzmx

schwartzmx commented Jan 10, 2017

I'm getting this error as well with Spark 2.1.0. I've also tried 3.0.0-preview1 of this library; previously I was using 2.0.0.

java.lang.AbstractMethodError: org.apache.spark.sql.execution.datasources.OutputWriterFactory.getFileExtension(Lorg/apache/hadoop/mapreduce/TaskAttemptContext;)Ljava/lang/String;
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.<init>(FileFormatWriter.scala:232)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:182)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$3.apply(FileFormatWriter.scala:129)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$3.apply(FileFormatWriter.scala:128)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:99)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

Edit: here's a fuller stack trace that may help.

17/01/09 22:45:34 ERROR FileFormatWriter: Aborting job null.
org.apache.spark.SparkException: Job aborted due to stage failure: Task 5 in stage 1.0 failed 1 times, most recent failure: Lost task 5.0 in stage 1.0 (TID 6, localhost, executor driver): java.lang.AbstractMethodError: org.apache.spark.sql.execution.datasources.OutputWriterFactory.getFileExtension(Lorg/apache/hadoop/mapreduce/TaskAttemptContext;)Ljava/lang/String;
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.<init>(FileFormatWriter.scala:232)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:182)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$3.apply(FileFormatWriter.scala:129)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$3.apply(FileFormatWriter.scala:128)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:99)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1423)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1422)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1422)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
	at scala.Option.foreach(Option.scala:257)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:802)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1650)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1605)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1594)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:628)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1918)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1931)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1951)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply$mcV$sp(FileFormatWriter.scala:127)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:121)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:121)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:121)
	at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:101)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:87)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:87)
	at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:492)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:215)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:198)
	at com.databricks.spark.redshift.RedshiftWriter.unloadData(RedshiftWriter.scala:295)
	at com.databricks.spark.redshift.RedshiftWriter.saveToRedshift(RedshiftWriter.scala:392)
	at com.databricks.spark.redshift.DefaultSource.createRelation(DefaultSource.scala:108)
	at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:426)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:215)
	at org.lucidhq.SFRedshiftETL.SFObject.redshiftLoad(SFObject.scala:115)
	at org.lucidhq.SFRedshiftETL.SFObject.load(SFObject.scala:256)
	at org.lucidhq.SFRedshiftETL.SFRedshiftETL$$anonfun$run$1.apply(main.scala:61)
	at org.lucidhq.SFRedshiftETL.SFRedshiftETL$$anonfun$run$1.apply(main.scala:44)
	at scala.collection.immutable.List.foreach(List.scala:381)
	at org.lucidhq.SFRedshiftETL.SFRedshiftETL$.run(main.scala:44)
	at org.lucidhq.SFRedshiftETL.SFRedshiftETL$$anonfun$main$1.apply(main.scala:83)
	at org.lucidhq.SFRedshiftETL.SFRedshiftETL$$anonfun$main$1.apply(main.scala:83)
	at scala.Option.map(Option.scala:146)
	at org.lucidhq.SFRedshiftETL.SFRedshiftETL$.main(main.scala:83)
	at org.lucidhq.SFRedshiftETL.SFRedshiftETL.main(main.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.AbstractMethodError: org.apache.spark.sql.execution.datasources.OutputWriterFactory.getFileExtension(Lorg/apache/hadoop/mapreduce/TaskAttemptContext;)Ljava/lang/String;
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.<init>(FileFormatWriter.scala:232)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:182)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$3.apply(FileFormatWriter.scala:129)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$3.apply(FileFormatWriter.scala:128)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:99)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

@lminer

lminer commented Jan 13, 2017

@JoshRosen Any plans to make a new release soon? It seems like one is needed to use this with 2.1.0.

@elyast

elyast commented Jan 20, 2017

@JoshRosen We hit the same issue: after upgrading from Spark 2.0.2 to Spark 2.1.0, our pipeline started throwing exceptions with the same cause.

Caused by: java.lang.AbstractMethodError: org.apache.spark.sql.execution.datasources.OutputWriterFactory.getFileExtension(Lorg/apache/hadoop/mapreduce/TaskAttemptContext;)Ljava/lang/String;

We are using spark-redshift 2.0.1 with https://s3.amazonaws.com/redshift-downloads/drivers/RedshiftJDBC41-1.1.17.1017.jar

@carlos-eduardo-gb

@elyast I hit the same issue using Spark 2.1.0.

I asked this question on Stack Overflow.

Do you see the same issue with Spark 2.0.2? I haven't been able to make spark-redshift work on 2.0.2 either; any help would be appreciated.

@elyast

elyast commented Jan 20, 2017

Found the root cause: Spark 2.1 added a new method to the interface:

org.apache.spark.sql.execution.datasources.OutputWriterFactory#getFileExtension(context: TaskAttemptContext): String

which is not implemented in spark-avro, hence the AbstractMethodError.
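Concretely, an OutputWriterFactory compiled against Spark 2.0 has no body for that method, so calling it at runtime blows up. A sketch of the kind of override the Spark 2.1 interface expects follows; the actual spark-avro patch may differ, and the ".avro" extension here is assumed:

import org.apache.hadoop.mapreduce.TaskAttemptContext
import org.apache.spark.sql.execution.datasources.{OutputWriter, OutputWriterFactory}
import org.apache.spark.sql.types.StructType

class PatchedAvroOutputWriterFactory extends OutputWriterFactory {

  // Existing writer construction, unchanged (elided in this sketch).
  override def newInstance(
      path: String,
      dataSchema: StructType,
      context: TaskAttemptContext): OutputWriter = ???

  // The method Spark 2.1 added; without this override, FileFormatWriter.scala:232
  // hits an AbstractMethodError when it asks the factory for a file extension.
  override def getFileExtension(context: TaskAttemptContext): String = ".avro"
}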

@apurva-sharma

Ran into the same issue with Spark 2.1.0. Is there a workaround (besides downgrading Spark)?

@elyast

elyast commented Jan 30, 2017

@apurva-sharma you can build this patch, databricks/spark-avro#206, and replace the spark-avro dependency with that custom build; at least it worked for us.

@apurva-sharma

@elyast thanks for that. I can verify that patching spark-avro as above worked for me with Spark 2.1.0.
It would be great if this were merged.

@elyast

elyast commented Jan 30, 2017

@apurva-sharma +1

@alexander-branevskiy

Looks like spark-avro was fixed. Any updates here?

@sanketvega

Any update on when this issue will be fixed?

@diegorep

^ @JoshRosen

@caeleth

caeleth commented Feb 24, 2017

At the moment this driver is completely unusable...

@hnfmr

hnfmr commented Feb 25, 2017

Fixed mine by adding this line to my sbt project's build.sbt:

dependencyOverrides += spark_avro_320

where

val spark_avro_320: ModuleID = "com.databricks" % "spark-avro_2.11" % "3.2.0"

I am using spark-redshift 3, by the way.

Hopefully this library can be actively supported in the long run; it looks like it has not been updated for several months.

@mrdmnd

mrdmnd commented Mar 9, 2017

I've tried what @hnfmr suggests, but I am still running into this issue.

@hnfmr

hnfmr commented Mar 9, 2017

@mrdmnd To be specific, I am using spark-redshift v3.0.0-preview1 and my build.sbt looks like:

lazy val app = (project in file("app"))
  .settings(commonSettings: _*)
  .settings(
    libraryDependencies += "com.databricks" % "spark-redshift_2.11" % "3.0.0-preview1",
    dependencyOverrides += "com.databricks" % "spark-avro_2.11" % "3.2.0"
  )

BTW, I am using Spark 2.1.0... hope this helps

@wafisher

@elyast Can you please describe what you did? My guess:

  1. Clone the spark-avro repo and check out the commit from that PR (post-merge).
  2. Build the jar.
  3. Use sbt to pick up this jar. (Do you know how to do this offhand? A rough sketch follows below.)

Thank you!
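One way to do step 3 is to publish the patched build to the local Ivy repository with `sbt publishLocal` and then depend on that version; the "3.2.0-SNAPSHOT" version string below is hypothetical — use whatever the patched build.sbt declares:

// build.sbt sketch (assumes the patched spark-avro was published with `sbt publishLocal`)
libraryDependencies ++= Seq(
  ("com.databricks" %% "spark-redshift" % "3.0.0-preview1").exclude("com.databricks", "spark-avro_2.11"),
  "com.databricks" %% "spark-avro" % "3.2.0-SNAPSHOT"  // hypothetical locally published version
)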

@sadowski

Also seeing this issue here. @hnfmr's fix is working for me now, but it would be nice to have this properly fixed. Spark is a popular tool and Redshift usage is only going to grow.

Exact workaround was to add the following to my build.sbt file:

// Temporary fix for: https://github.com/databricks/spark-redshift/issues/315
dependencyOverrides += "com.databricks" % "spark-avro_2.11" % "3.2.0"

@mrdmnd

mrdmnd commented Mar 22, 2017

Yeah, I had a minor typo. Can confirm that this works.

@cockroachzl

I use Zeppelin to do ETL into Redshift and encountered the same AbstractMethodError.

Configuring the Spark interpreter to exclude com.databricks:spark-avro_2.11:3.0.0 from the com.databricks:spark-redshift_2.11:2.0.1 dependency, and then adding a separate dependency on com.databricks:spark-avro_2.11:3.2.0, works for me.

Thanks a lot!
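For anyone doing the same outside Zeppelin, the equivalent exclusion expressed in sbt would look roughly like this (coordinates as mentioned above):

// Exclude the transitive spark-avro 3.0.0 pulled in by spark-redshift 2.0.1,
// and depend on spark-avro 3.2.0 explicitly instead.
libraryDependencies ++= Seq(
  ("com.databricks" % "spark-redshift_2.11" % "2.0.1").exclude("com.databricks", "spark-avro_2.11"),
  "com.databricks" % "spark-avro_2.11" % "3.2.0"
)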

@Aung-Myint-Thein

Yes! Just update or replace spark-avro_2.11-3.1.0.jar with spark-avro_2.11-3.2.0.jar and this problem should be solved.

https://mvnrepository.com/artifact/com.databricks/spark-avro_2.11/3.2.0

@cshintov

Hi, I have the same problem.
I am using Spark 2.1.0 and have tried spark-redshift 3.0.0-preview1, 2.0.1, and 2.0.0. All of them give the same error.

java.lang.AbstractMethodError: org.apache.spark.sql.execution.datasources.OutputWriterFactory.getFileExtension(Lorg/apache/hadoop/mapreduce/TaskAttemptContext;)Ljava/lang/String;
at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.<init>(FileFormatWriter.scala:232)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:182)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$3.apply(FileFormatWriter.scala:129)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$3.apply(FileFormatWriter.scala:128)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
17/04/24 21:12:15 ERROR TaskSetManager: Task 1 in stage 2.0 failed 1 times; aborting job
17/04/24 21:12:15 ERROR FileFormatWriter: Aborting job null.
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 2.0 failed 1 times, most recent failure: Lost task 1.0 in stage 2.0 (TID 202, localhost, executor driver): java.lang.AbstractMethodError: org.apache.spark.sql.execution.datasources.OutputWriterFactory.getFileExtension(Lorg/apache/hadoop/mapreduce/TaskAttemptContext;)Ljava/lang/String;
at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.<init>(FileFormatWriter.scala:232)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:182)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$3.apply(FileFormatWriter.scala:129)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$3.apply(FileFormatWriter.scala:128)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

@giaosudau

I have the same problem, and I am using code from the Spark 2.2 branch. spark-avro was already spark-avro_2.11-3.2.0.jar.

Caused by: java.lang.AbstractMethodError: org.apache.spark.sql.execution.datasources.OutputWriter.write(Lorg/apache/spark/sql/catalyst/InternalRow;)V
  at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:318)
  at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:249)
  at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:247)
  at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1375)
  at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:252)

@davidzhao

Any updates on this one? It seems that the underlying dependency (spark-avro_2.11 3.2.0) has resolved this issue. Instead of having everyone depend on the workaround, could the owner release a version that depends on spark-avro 3.2.0?

@schwartzmx

It seems this issue and repo are getting stale; I would love to have this updated. @JoshRosen, would it be possible to open this up to new contributors?

@tylermichael

Any updates on this? I'm using this through PySpark and am unable to try the workarounds suggested.

@dnaumenko

Looks like this issue is going to be fixed in the next version of the spark-avro library (databricks/spark-avro#242). It was merged to master 8 days ago.

@pmatpadi

Thanks for the hint on updating the spark-avro dependency version.

I resolved this issue with the spark-submit command below in an AWS EMR environment:

spark-submit --deploy-mode cluster \
  --class <main_class> \
  --packages com.databricks:spark-redshift_2.11:3.0.0-preview1,com.databricks:spark-avro_2.11:3.2.0,com.amazon.redshift:redshift-jdbc42:1.2.8.1005 \
  --repositories http://redshift-maven-repository.s3-website-us-east-1.amazonaws.com/release \
  s3://<path_to_my_spark_application_jar>

@marcintustin

I've updated to spark-avro 4.0.0 and I still have this issue.

@zhassanbey

I faced the same exception with Spark 2.1.1 (Scala 2.11).
The reason was that my project depended on a custom library which in turn depended on the spark-avro v4.0.0 artifact in 'compileOnly' scope, so the spark-avro dependency wasn't actually propagating to my project. After I added the spark-avro v4.0.0 dependency explicitly, the problem was resolved.
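In sbt terms, the fix amounts to declaring the dependency in the application's own build instead of relying on it to propagate from the compile-only upstream library (a sketch, assuming Scala 2.11 artifacts):

// Declare spark-avro explicitly so it lands on the runtime classpath.
libraryDependencies += "com.databricks" % "spark-avro_2.11" % "4.0.0"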

@schwartzmx

I wouldn't expect this to be properly fixed; it seems Databricks has decided not to update this library anymore outside of their own Databricks Runtime, which as far as I can tell requires you to be using their entire platform: 717a4ad#diff-04c6e90faac2675aa89e2176d2eec7d8

@Renien

Renien commented Mar 9, 2018

@hnfmr Thanks for the hint. I faced the same issue while running an ALS model on GCP and storing the output using com.databricks.spark.csv. Initially I was using com.databricks.spark.csv 1.2.0 with Spark 2.2.0 and the issue occurred. I updated to the latest version, 1.5.0, and that solved my issue.

@yggowda

yggowda commented Nov 2, 2018

I was using an older version of spark-redshift_2.11; after changing to 3.0.0-preview1 it started working.

@schwartzmx

schwartzmx commented Nov 2, 2018

If anyone is still having issues and wants to collaborate on this, I forked both this connector and spark-avro; we can get a working group around fixing these. I think this library and spark-avro are dead from an open-source support/contributor perspective (outside of the Databricks Runtime, 717a4ad#diff-04c6e90faac2675aa89e2176d2eec7d8), as the last commits are 2+ years old aside from README updates.

We would be required to adhere to the licensing (Apache 2.0), via a NOTICE file and other means.
If you're interested in collaborating or in updates, email me at schwartzmx@gmail.com.

Love the library, @databricks, but many people use this connector; I asked if it could be opened up to collaborators and didn't hear a response in over a year. It seems like it was quietly moved to closed source, which is understandable from a business perspective.

Thanks for all the initial work on this; it has helped a ton of people, and I personally have built many ETLs and data analysis tools using this connector.

Cheers,
Phil
