Add typed version of RichPipe 'using' #1049
I wonder if we can do better and have a method like:
def onComplete(fn: Try[Unit] => Unit): TypedPipe[T]
which we can then pass down to cascading. Here you could use a normal lazy val and just call:
lazy val lookup = new RemoteHandle
pipe.map { t => lookup(t) }.onComplete { _ => lookup.release }
Then we get access to all the methods on TypedPipe without copying them. What do you think of this? It is more work to plumb that completion function all the way through to cascading, but I think the API is nicer and the power greater. |
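The shape of the proposed call can be tried out without Scalding. The sketch below is only a toy model under stated assumptions: `MiniPipe`, `RemoteHandle`, and `OnCompleteDemo` are hypothetical names that do not exist in Scalding, and the "pipe" is an in-memory iterator rather than a cascading flow. It shows a completion hook receiving a `Try[Unit]` once evaluation finishes, so the lazily created handle can be released.

```scala
import scala.util.Try

// Hypothetical stand-in for a remote resource; not a Scalding class.
final class RemoteHandle {
  var released = false
  def apply(s: String): Int = s.length
  def release(): Unit = { released = true }
}

// Toy pipe over an in-memory iterator; it models only the shape of the proposed API.
final class MiniPipe[T](data: () => Iterator[T], hooks: List[Try[Unit] => Unit] = Nil) {
  def map[U](f: T => U): MiniPipe[U] = new MiniPipe[U](() => data().map(f), hooks)

  // Attach a completion hook and return a pipe with all the usual methods.
  def onComplete(fn: Try[Unit] => Unit): MiniPipe[T] = new MiniPipe[T](data, fn :: hooks)

  // Evaluate the pipe, then call every hook once with the outcome.
  def run(): List[T] = {
    val result = Try(data().toList)
    val outcome: Try[Unit] = result.map(_ => ())
    hooks.foreach(h => h(outcome))
    result.get
  }
}

object OnCompleteDemo {
  lazy val lookup = new RemoteHandle

  def run(): (List[Int], Boolean) = {
    val out = new MiniPipe[String](() => Iterator("a", "bb", "ccc"))
      .map(t => lookup(t))
      .onComplete(_ => lookup.release())
      .run()
    (out, lookup.released)
  }
}
```

In this model the hook fires on failure as well, since it is handed the `Try` rather than being skipped; whether the real implementation should behave that way is a design question for the plumbing into cascading.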
Cool! I didn't think of that way of doing it. I just started looking at the code so it isn't obvious to me how to hook this in, but I'll give it a shot! |
I can give you some coaching. The first step will be to make a new instance of TypedPipe, like:
class WithOnComplete[T](pipe: TypedPipe[T], fn: Try[Unit] => Unit) extends TypedPipe[T] {
}
Next, we are going to have to alter how toPipe works to use that function if it exists. There is already some case matching on the subclass of TypedPipe. Look for TypedPipeFactory for instance. |
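The wrapper-plus-case-match idea can be illustrated in isolation. This is a minimal sketch, assuming a toy `Pipe` hierarchy and a `Planner.run` function that are both hypothetical; it is not how Scalding's `toPipe` is written, only an analogue of matching on the subclass and invoking the stored function when the wrapper is found.

```scala
import scala.util.Try

// Toy pipe hierarchy; these names are illustrative, not Scalding types.
sealed trait Pipe[T]
final case class Source[T](items: List[T]) extends Pipe[T]
final case class WithOnComplete[T](pipe: Pipe[T], fn: Try[Unit] => Unit) extends Pipe[T]

object Planner {
  // Analogue of the case match: plain nodes are evaluated directly,
  // the wrapper evaluates its inner pipe and then reports the outcome.
  def run[T](p: Pipe[T]): List[T] = p match {
    case Source(items) => items
    case WithOnComplete(inner, fn) =>
      val result = Try(run(inner))
      fn(result.map(_ => ()))
      result.get
  }
}
```

Keeping the completion function as data on a dedicated subclass means the planner is the only place that needs to know about it.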
So I tried the following approach, which didn't work, and I am wondering whether it is worth the trouble. This will have me create a new "Each", and the nice part of "using" was that it gave you clear control to say that this is the setup and cleanup for that step, which we would lose here. I added the following to Operations.scala
For TypedPipe.scala added:
And I then tried to create a test which fails
with error:
|
I guess it is because it could not serialize the cleanUp. You are using the Externalizer, but then ignoring it in the cleanup method of your Operation. Can you instead try getting the function out of the externalizer? Also, to be sure, mark your cleanup function as Serialization is one of the major pains of dealing with Hadoop. |
Also, your toIteratorExecution is not going to work: it is not applying the cleanUp. I think you just want:
Since after you force it to disk, it will apply the cleanup, and the toIteratorExecution will not be on another WithOnComplete. |
Thanks for the suggestions, but those changes didn't fix it and it still errors. I am also pretty sure that my implementations of cross and flatMap are wrong, since cleanUp won't be called... do I add a forceToDiskExecution there as well? |
Ahh, no. You should have:
override def cross[U](tiny: TypedPipe[U]): TypedPipe[(T, U)] =
  new WithOnComplete(typedPipe.cross(tiny), onComplete)
override def flatMap[U](f: T => TraversableOnce[U]): TypedPipe[U] =
  new WithOnComplete(typedPipe.flatMap(f), onComplete) |
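The re-wrapping pattern in those two overrides can be checked with a small stand-alone model. The sketch below assumes a hypothetical `TinyPipe` trait with an in-memory `Plain` implementation (neither is part of Scalding), uses `Iterable` in place of `TraversableOnce`, and simplifies the hook to `() => Unit`. It shows that delegating to the inner pipe and wrapping the result again keeps exactly one hook attached at the end of the chain.

```scala
// Toy model; TinyPipe and Plain are hypothetical names for illustration only.
sealed trait TinyPipe[T] {
  def values: List[T]
  def onCompleteHooks: List[() => Unit]
  def flatMap[U](f: T => Iterable[U]): TinyPipe[U]
  def cross[U](tiny: TinyPipe[U]): TinyPipe[(T, U)]
}

final class Plain[T](val values: List[T]) extends TinyPipe[T] {
  def onCompleteHooks: List[() => Unit] = Nil
  def flatMap[U](f: T => Iterable[U]): TinyPipe[U] = new Plain[U](values.flatMap(f))
  def cross[U](tiny: TinyPipe[U]): TinyPipe[(T, U)] =
    new Plain[(T, U)](for (t <- values; u <- tiny.values) yield (t, u))
}

final class WithOnComplete[T](inner: TinyPipe[T], onComplete: () => Unit) extends TinyPipe[T] {
  def values: List[T] = inner.values
  def onCompleteHooks: List[() => Unit] = onComplete :: inner.onCompleteHooks

  // Delegate to the inner pipe, then wrap again so the hook follows the result.
  def flatMap[U](f: T => Iterable[U]): TinyPipe[U] =
    new WithOnComplete[U](inner.flatMap(f), onComplete)
  def cross[U](tiny: TinyPipe[U]): TinyPipe[(T, U)] =
    new WithOnComplete[(T, U)](inner.cross(tiny), onComplete)
}
```

Note that this toy `cross` ignores any hooks carried by the `tiny` side; it is meant only to show the delegation, not a complete treatment.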
Ah yes that makes sense. I was afraid of delaying the cleanup but it can
On Mon, Sep 15, 2014 at 2:27 PM, P. Oscar Boykin notifications@github.com
|
Can you update the pull request and post a gist of the full stack trace? On Mon, Sep 15, 2014 at 12:09 PM, Adam Poswolsky notifications@github.com
Oscar Boykin :: @posco :: http://twitter.com/posco |
done, gist is here: https://gist.github.com/aposwolsky/ed1e2e7e58130bab6fdc |
class TypedPipeWithOnCompleteTest extends Specification {
  import Dsl._
  var cleanupCalled: Boolean = false
  def cleanupFn(): Unit = { cleanupCalled = true }
This function is going to require the Specification to be serialized, and that class has a ton of stuff that cannot be serialized.
Instead do this:
class CleanUp {
var called: Boolean = false
def cleanUp(): Unit = { called = true; () }
}
// in the Specification:
class ... extends Specification {
val cc = new CleanUp
// pass this to the job.
new Job(args, cc)
// in the job, call cc.cleanUp
}
Now, use that
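The capture problem behind this suggestion can be reproduced with plain Java serialization. The following is a sketch under assumptions: `SerializationSketch`, `BigTestHarness`, and `serializes` are hypothetical names, `BigTestHarness` merely stands in for a large non-serializable test class, and `CleanUp` is marked `Serializable` here for the demonstration even though the suggestion above does not say so. It does not use Scalding's Externalizer or Kryo.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}
import scala.util.Try

object SerializationSketch {
  // True if Java serialization accepts the whole object graph reachable from x.
  def serializes(x: AnyRef): Boolean =
    Try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream)
      try out.writeObject(x) finally out.close()
    }.isSuccess

  // Stand-in for a test class that is not Serializable.
  class BigTestHarness {
    var cleanupCalled = false
    // Assigning to a field makes the closure capture `this`,
    // so serializing the function drags the whole harness along.
    val cleanupFn: () => Unit = () => { cleanupCalled = true }
  }

  // Small holder with nothing but the flag (assumed Serializable for this demo).
  class CleanUp extends Serializable {
    var called = false
    def cleanUp(): Unit = { called = true }
  }
}
```

With these definitions, `serializes(new BigTestHarness().cleanupFn)` is expected to be false while `serializes(new CleanUp)` is expected to be true. A flag on a serialized copy will not be visible to the original object, which may be one reason a counter turned out to be the more reliable check later in the thread.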
Thanks! You ended up pretty much rewriting every line :) I was still having serialization issues, so I changed it to use a counter instead, which did the trick. |
@@ -0,0 +1 @@
/Users/adamp/.ivy2/cache/com.twitter/algebird-core_2.10/jars/algebird-core_2.10-0.7.0.jar
these should not be here.
Cleaned it all up and merged into one commit. |
This all looks pretty great. One last request: can you add a few more tests that show at least a groupBy with an onComplete on the map side and on the reduce side? |
Added a more robust test. I tried to do withReducers(2) so that I can have On Tue, Sep 16, 2014 at 6:19 PM, P. Oscar Boykin notifications@github.com
|
This is great. |