Add the remaining odds and ends to Execution[T] #985
This is a very strange failure:
But cascading.tuple.hadoop.TupleSerialization does extend Serialization:
BTW, this also passes locally for me.
Fails locally for me: [error] x An ExecutionJob should
Current theory: Execution runs work on threads from the scala.concurrent.ExecutionContext, and that can expose some classloader issues. I'm trying to set the classloader explicitly. This is a tricky one for me; I have not really debugged classpath issues before. @cwensel any comments? Will calling Cascading from multiple threads work, given all the classloader work going on internally?
Keeps passing for me. :/ @ianoc, can you try this patch?
Cascading has been embedded in a few multithreaded contexts. The SpringSource guys requested some minor changes years ago, and Lingual JDBC runs under threads (connection pooling, etc.). Make sure you are setting the context ClassLoader where appropriate: Cascading either calls it directly, or Hadoop itself relies on it when doing class loading (which we delegate to).
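A minimal sketch of what "setting the context ClassLoader where appropriate" could look like when Execution hands work to a scala.concurrent.ExecutionContext: wrap the context so every task runs with a chosen loader installed and the previous one restored afterwards. `ContextClassLoaderEC` is a hypothetical name, not part of Scalding, Cascading, or the Scala library. Which loader to pass in (for example, the one that created the job's Configuration) is an assumption you would have to settle for your own setup.

```scala
import scala.concurrent.ExecutionContext

// Hypothetical helper (not part of Scalding, Cascading, or the Scala library):
// runs every task submitted to `underlying` with `loader` installed as the
// thread's context ClassLoader, then restores whatever loader the thread had.
final class ContextClassLoaderEC(underlying: ExecutionContext, loader: ClassLoader)
    extends ExecutionContext {

  override def execute(task: Runnable): Unit =
    underlying.execute(new Runnable {
      def run(): Unit = {
        val thread = Thread.currentThread()
        val previous = thread.getContextClassLoader
        thread.setContextClassLoader(loader)
        try task.run()
        finally thread.setContextClassLoader(previous) // pool threads are reused
      }
    })

  override def reportFailure(cause: Throwable): Unit =
    underlying.reportFailure(cause)
}
```

Usage would look roughly like `implicit val ec: ExecutionContext = new ContextClassLoaderEC(ExecutionContext.global, loader)`. Restoring the previous loader in `finally` matters because pool threads are shared across unrelated tasks.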
@cwensel can you take a look at this stack: (line 3977) We don't do much reflection at all in our code (two calls, both of which get the context loader). What it looks like to me is that the classloader Hadoop is using is somehow inconsistent. I've tried a few ideas to fix it, but nothing is working out. The context here is that the job itself is started in a different thread from the one where all the taps were created.
You can try messing around with having your own UnitOfWorkExecutorStrategy on the Flow and owning the context loader the thread executor uses. But to have the problem you are seeing, you would need two peer classloaders loading the same classpath independently; classloaders aren't child-first. Maybe try loading org.apache.hadoop.mapred.OutputFormat statically early and force it into a parent to see what happens. Or you have some messed-up dependencies. FWIW, Lingual jumps through a bunch of hoops to allow for dynamic classloading of Taps/Schemes from remote sources. Things work great, even embedded (which is why we have the capability).
ckw

On Jul 30, 2014, at 8:59 PM, P. Oscar Boykin notifications@github.com wrote:

Chris K Wensel
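One way to check for the peer-classloader situation described above is to resolve the same class name through two loaders and compare the resulting Class objects. If they differ, a class such as TupleSerialization loaded on one side will not be assignable to the Serialization interface loaded on the other, even though the source says it extends it. The helpers below are a hypothetical diagnostic sketch built only on java.lang.Class and java.lang.ClassLoader. They are not part of Cascading or Hadoop.

```scala
// Hypothetical diagnostic helpers (not part of Cascading or Hadoop), built only
// on java.lang.Class and java.lang.ClassLoader.
object ClassLoaderDiagnostics {

  // Loader that actually defined `className` when resolved through `loader`;
  // None means the bootstrap loader (Class#getClassLoader returns null for it).
  def definingLoader(className: String, loader: ClassLoader): Option[ClassLoader] =
    Option(Class.forName(className, false, loader).getClassLoader)

  // True only if both loaders resolve `className` to the very same Class object.
  // If this is false, instances from one side are not assignable to the other.
  def sameClass(className: String, a: ClassLoader, b: ClassLoader): Boolean =
    Class.forName(className, false, a) eq Class.forName(className, false, b)

  // `loader` followed by its parents, stopping before the bootstrap loader (null).
  def parentChain(loader: ClassLoader): List[ClassLoader] =
    if (loader == null) Nil else loader :: parentChain(loader.getParent)
}
```

For example, `sameClass("org.apache.hadoop.io.serializer.Serialization", Thread.currentThread.getContextClassLoader, conf.getClassLoader)` returning false would suggest two loaders defining the Hadoop interface independently; comparing `parentChain` for each loader shows where their hierarchies diverge.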
Well, setting the context classloader explicitly to the one that created the job's Configuration works for Scala 2.9.3, but 2.10.4 fails with this:
/home/travis/build.sh: line 41: 1473 Killed ./sbt -Dlog4j.configuration=file://$TRAVIS_BUILD_DIR/project/travis-log4j.properties ++$TRAVIS_SCALA_VERSION assembly
I tried restarting it twice, and the log ends with that line both times.
I think this works now, but Travis just can't download the jars. It passes for me and Ian. I appreciate Travis, but I'd guess its false-positive rate is greater than 20%. That dramatically undermines faith in test failures (and we often ignore real failures because of it).
@@ -136,7 +219,9 @@ object Execution {
  def runStats(conf: Config, mode: Mode)(implicit cec: ConcurrentExecutionContext) = {
    for {
      (flowDef, fn) <- Future(result(conf, mode))
There is no reason for these first two steps to run in Future threads.
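A sketch of the reviewer's point, under assumptions: if the first steps of runStats are cheap, synchronous planning (like building the FlowDef), wrapping them in Future only adds a thread hop. `plan` and `runBlocking` below are hypothetical stand-ins for those steps, not Scalding's actual `result` or flow-running code.

```scala
import scala.concurrent.{ExecutionContext, Future}
import scala.util.control.NonFatal

object FutureSketch {
  // Hypothetical stand-in for a cheap, synchronous planning step
  // (think: building the FlowDef plus a post-processing function).
  def plan(input: Int): (List[Int], Int => Int) =
    (List(input, input + 1), (x: Int) => x * 2)

  // Hypothetical stand-in for the step that actually blocks (think: running the job).
  def runBlocking(steps: List[Int]): Int = steps.sum

  // Every step hops onto a pool thread, including the cheap planning step.
  def allInFutures(input: Int)(implicit ec: ExecutionContext): Future[Int] =
    for {
      planned <- Future(plan(input))
      raw <- Future(runBlocking(planned._1))
    } yield planned._2(raw)

  // Plan on the calling thread; only the blocking step goes to the ExecutionContext.
  def onlyBlockingInFuture(input: Int)(implicit ec: ExecutionContext): Future[Int] =
    try {
      val (steps, fn) = plan(input)
      Future(runBlocking(steps)).map(fn)
    } catch {
      // Keep the Future-based error contract: planning failures become failed Futures.
      case NonFatal(e) => Future.failed(e)
    }
}
```

Both versions produce the same value. The second runs `plan` on the caller's thread but still reports a planning exception as a failed Future rather than throwing it.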
…mporary file size
…r/scalding into back_to_the_future_201407
Add the remaining odds and ends to Execution[T]
I think Execution is now in a usable state and is the preferred way to write library code that needs multi-step operations.
Please review these last changes.