
Sample example which does not work #53

Open · fabiofumarola opened this issue Jan 10, 2016 · 15 comments

@fabiofumarola
The library looks interesting. I tried a simple example with a sample app, but I got the following error:

[error] (run-main-0) org.apache.spark.SparkException: Job aborted due to stage failure: Task 5 in stage 3.0 failed 1 times, most recent failure: Lost task 5.0 in stage 3.0 (TID 29, localhost): java.lang.NullPointerException
[error]     at com.tribbloids.spookystuff.utils.Utils$.uriSlash(Utils.scala:55)
[error]     at com.tribbloids.spookystuff.utils.Utils$$anonfun$uriConcat$1.apply(Utils.scala:49)
[error]     at com.tribbloids.spookystuff.utils.Utils$$anonfun$uriConcat$1.apply(Utils.scala:48)
[error]     at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
[error]     at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
[error]     at com.tribbloids.spookystuff.utils.Utils$.uriConcat(Utils.scala:48)
[error]     at com.tribbloids.spookystuff.pages.PageUtils$.autoRestore(PageUtils.scala:183)
[error]     at com.tribbloids.spookystuff.actions.TraceView$$anonfun$4.apply(TraceView.scala:95)
[error]     at com.tribbloids.spookystuff.actions.TraceView$$anonfun$4.apply(TraceView.scala:95)
[error]     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
[error]     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
[error]     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
[error]     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
[error]     at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
[error]     at scala.collection.AbstractTraversable.map(Traversable.scala:105)
[error]     at com.tribbloids.spookystuff.actions.TraceView.fetchOnce(TraceView.scala:95)
[error]     at com.tribbloids.spookystuff.actions.TraceView$$anonfun$2.apply(TraceView.scala:83)
[error]     at com.tribbloids.spookystuff.actions.TraceView$$anonfun$2.apply(TraceView.scala:83)
[error]     at scala.util.Try$.apply(Try.scala:161)
[error]     at com.tribbloids.spookystuff.utils.Utils$.retry(Utils.scala:22)
[error]     at com.tribbloids.spookystuff.actions.TraceView.fetch(TraceView.scala:82)
[error]     at com.tribbloids.spookystuff.sparkbinding.PageRowRDD$$anonfun$26.apply(PageRowRDD.scala:491)
[error]     at com.tribbloids.spookystuff.sparkbinding.PageRowRDD$$anonfun$26.apply(PageRowRDD.scala:490)
[error]     at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
[error]     at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
[error]     at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
[error]     at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
[error]     at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
[error]     at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
[error]     at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:99)
[error]     at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
[error]     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
[error]     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
[error]     at org.apache.spark.scheduler.Task.run(Task.scala:88)
[error]     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
[error]     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[error]     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[error]     at java.lang.Thread.run(Thread.java:745)

The application is pretty simple:

import org.apache.spark.{SparkConf, SparkContext}
import com.tribbloids.spookystuff.SpookyContext
import com.tribbloids.spookystuff.actions._

object SimpleApp {

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("Test")
    val sc = new SparkContext(conf)
    val spooky = new SpookyContext(sc)
    import spooky.dsl._

    val df = spooky.wget("https://news.google.com/?output=rss&q=barack%20obama"
    ).join(S"item title".texts)(
      Wget(x"http://api.mymemory.translated.net/get?q=${'A}&langpair=en|fr")
    )('A ~ 'title, S"translatedText".text ~ 'translated).toDF()

    val csv = df.toCSV()

    csv.foreach(println)
  }
}

Do you have any ideas?

@fabiofumarola (Author)

Would you be interested in help with developing the library?

@DominikRoy commented Aug 18, 2016

Hello, I have a similar issue. I tried to run the sample app, but I got the following error:

Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.Accumulator.<init>(Ljava/lang/Object;Lorg/apache/spark/AccumulatorParam;Lscala/Option;)V
    at com.tribbloids.spookystuff.Metrics$.accumulator(SpookyContext.scala:20)
    at com.tribbloids.spookystuff.Metrics$.$lessinit$greater$default$1(SpookyContext.scala:25)
    at com.tribbloids.spookystuff.SpookyContext.<init>(SpookyContext.scala:68)
    at com.tribbloids.spookystuff.SpookyContext.<init>(SpookyContext.scala:72)
    at FTest$.main(FTest.scala:15)
    at FTest.main(FTest.scala)
16/08/18 11:48:16 INFO SparkContext: Invoking stop() from shutdown hook
16/08/18 11:48:16 INFO SparkUI: Stopped Spark web UI at http://127.0.1.1:4040

The application has the following code:

import org.apache.spark.{SparkConf, SparkContext}
import com.tribbloids.spookystuff.SpookyContext
import com.tribbloids.spookystuff.actions._

object FTest {
  def main(args: Array[String]): Unit = {
    //val logFile = "/home/ait/spark/README.md" // Should be some file on your system
    val conf = new SparkConf().setAppName("Simple Application").setMaster("local[*]")
    val sc = new SparkContext(conf)
    assert(sc.parallelize(1 to 100).reduce(_ + _) == 5050)
    val spooky = new SpookyContext(sc)
    import spooky.dsl._
    spooky.wget("https://news.google.com/?output=rss&q=barack%20obama").join(S"item title".texts)(
      Wget(x"http://api.mymemory.translated.net/get?q=${'A}&langpair=en|fr"))('A ~ 'title, S"translatedText".text ~ 'translated).toDF()
  }
}

Could it be because of a wrong configuration?
Also, I loaded all the jar files I need into my IDE, so Spark itself should be working. Can you help me, or give me a hint as to why this error occurred?

Thanks in advance.

@fahadsiddiqui

Hello @DominikRoy, this looks like a version incompatibility to me. Use Spark dependencies at the versions supported by your spookystuff version.
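As a sketch, pinning Spark and Scala to versions that match the spookystuff artifact might look like the following build.sbt fragment. The version numbers are illustrative assumptions taken from versions mentioned later in this thread, not a confirmed working combination; check the project README before copying:

```scala
// build.sbt sketch (hypothetical versions; verify the supported
// Spark/Scala combination against the spookystuff release notes).
scalaVersion := "2.10.5"

libraryDependencies ++= Seq(
  // "provided" so the cluster's own Spark jars are used at runtime
  "org.apache.spark" %% "spark-core" % "1.6.2" % "provided",
  "org.apache.spark" %% "spark-sql"  % "1.6.2" % "provided",
  "com.tribbloids.spookystuff" %% "spookystuff-core" % "0.3.2"
)
```

The point is that spookystuff is compiled against one specific Spark/Scala binary combination; a NoSuchMethodError at runtime typically means the Spark version on the classpath differs from the one the library was built against.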

@nimbusgo commented Nov 15, 2017

I'm getting the same error, and I'm wondering which version of Spark I should be using; I don't see it specified in the documentation.

Currently I'm trying Spark 1.6.2 with Scala 2.10.5, using com.tribbloids.spookystuff:spookystuff-core:0.3.2.

I have also tried Spark 2.1.1 (Scala 2.11), but that broke even sooner.

Which version of Spark works?

I also get the same error with Spark 1.5.1.

@tribbloid (Owner) commented Nov 15, 2017 via email

@nimbusgo commented Nov 15, 2017

What version of Spark would you recommend I use with 0.4.0-SNAPSHOT?

Also, should I just be adding it via spark-shell --jars spookystuff-core-0.4.0-SNAPSHOT.jar, for example, or do I need to include more?

@nimbusgo commented Nov 15, 2017

I tried this (including spookystuff-core-0.4.0-SNAPSHOT.jar) on Spark 1.5.1 and 1.6.2, and I get an error when running:

import com.tribbloids.spookystuff.actions._
import com.tribbloids.spookystuff.dsl._
import com.tribbloids.spookystuff.SpookyContext

//this is the entry point of all queries & configurations
val spooky = SpookyContext(sc)

It errors with:

error: bad symbolic reference. A signature in AbstractConf.class refers to term dsl
in package org.apache.spark.ml which is not available.
It may be completely missing from the current classpath, or the version on
the classpath might be incompatible with the version used when compiling AbstractConf.class.
error: bad symbolic reference. A signature in AbstractConf.class refers to term utils
in value org.apache.spark.ml.dsl which is not available.
It may be completely missing from the current classpath, or the version on
the classpath might be incompatible with the version used when compiling AbstractConf.class.
<console>:36: error: bad symbolic reference. A signature in AbstractConf.class refers to term messaging
in value org.apache.spark.ml.utils which is not available.
It may be completely missing from the current classpath, or the version on
the classpath might be incompatible with the version used when compiling AbstractConf.class.
         val spooky = SpookyContext(sc)

org.apache.spark.ml is present, but I'm not sure why it expects org.apache.spark.ml.dsl to exist.
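A quick way to sanity-check which of these packages are actually visible on the spark-shell classpath is a reflective lookup. A minimal sketch, where the probed class names are illustrative (org.apache.spark.ml.dsl.Foo is a made-up name standing in for any class from the missing package):

```scala
// Minimal classpath probe: Class.forName succeeds only if the class is
// resolvable on the current classpath.
object ClasspathProbe {
  def classPresent(fqcn: String): Boolean =
    try { Class.forName(fqcn); true }
    catch { case _: ClassNotFoundException => false }

  def main(args: Array[String]): Unit = {
    // Always on a Scala classpath:
    println(classPresent("scala.Option"))
    // Absent unless the mldsl jar is on the classpath:
    println(classPresent("org.apache.spark.ml.dsl.Foo"))
  }
}
```

Pasting classPresent into the spark-shell and probing a class from org.apache.spark.ml.dsl would confirm whether the missing package is a classpath problem rather than a compilation problem.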

@tribbloid (Owner) commented Nov 15, 2017 via email

@nimbusgo

So, after some guesswork, it looks like I should probably be including spookystuff-assembly-0.4.0-SNAPSHOT-spark1.6.jar, which I'm doing now.

Currently I've got this happening:

java.lang.UnsupportedClassVersionError: com/tribbloids/spookystuff/session/python/PythonProcess : Unsupported major.minor version 52.0

I'm guessing it has something to do with Java/py4j version inconsistencies.
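For what it's worth, the major.minor number in an UnsupportedClassVersionError maps directly to a JDK release: major 52 is Java 8, so this error usually means the jar was compiled with JDK 8 but is running on an older JRE (Java 7 or below), rather than being a py4j issue. A small sketch of the well-known mapping:

```scala
// The class-file major version is a fixed, documented mapping to JDK releases.
object ClassFileVersion {
  val majorToJava: Map[Int, String] = Map(
    49 -> "Java 5",
    50 -> "Java 6",
    51 -> "Java 7",
    52 -> "Java 8",
    53 -> "Java 9"
  )

  def describe(major: Int): String =
    majorToJava.getOrElse(major, s"unknown (major $major)")

  def main(args: Array[String]): Unit =
    println(describe(52)) // prints: Java 8
}
```

So the likely fix here is running the spark-shell on a Java 8 JVM.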

@tribbloid (Owner) commented Nov 15, 2017 via email

@tribbloid (Owner) commented Nov 15, 2017 via email

@nimbusgo

Thanks for the advice; I was able to get it working on the cluster without version errors now.

I'm a little new to the library's syntax, and it seems the quickstart example is a little out of date for 0.4.0-SNAPSHOT.

When executing this:

spooky.wget("https://news.google.com/?output=rss&q=barack%20obama").join(S"item title".texts){
    Wget(x"http://api.mymemory.translated.net/get?q=${'A}&langpair=en|fr")
}('A ~ 'title, S"translatedText".text ~ 'translated).toDF()

I get this error:

error: com.tribbloids.spookystuff.rdd.FetchedDataset does not take parameters
       }('A ~ 'title, S"translatedText".text ~ 'translated).toDF()

Are there any quickstart examples that work for 0.4.0-SNAPSHOT that I can take a look at?

@tribbloid (Owner) commented Nov 16, 2017 via email

@tribbloid (Owner)

It's been 13 days; should I close it?

@dev590t commented Dec 13, 2020

org.apache.spark.ml is present, but I'm not sure why it's expecting org.apache.spark.ml.dsl to exist

After checking the source code, I see that org.apache.spark.ml.dsl is a package contained in the project's mldsl/ directory.
You can publish it to your local repository and then include it in your spark-shell:

~/.m2/repository/com/tribbloids/spookystuff/spookystuff-mldsl/0.7.0-SNAPSHOT/spookystuff-mldsl-0.7.0-SNAPSHOT.jar

The mldsl module should be published to a Maven repository and listed as a dependency on the documentation page!
spookystuff is an interesting project, but if the getting-started example doesn't work, it can dissuade many developers from using it.
It is urgent to update the documentation website. Where can we modify the documentation page? @tribbloid
