-
Notifications
You must be signed in to change notification settings - Fork 707
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
scalding-core: merge flow step strategies to allow reducer estimation combined with other strategies #1094
Conversation
… combined with other strategies
@@ -231,7 +253,11 @@ class Job(val args: Args) extends FieldConversions with java.io.Serializable { | |||
listeners.foreach { flow.addListener(_) } | |||
stepListeners.foreach { flow.addStepListener(_) } | |||
skipStrategy.foreach { flow.setFlowSkipStrategy(_) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we do the same for skipStrategy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bias to laziness
API-wise, do you think it makes sense to add a method to Job I'm wondering if it's a fair assumption to always compose instead of overwriting (it seems like it is fair). What do you think? |
* The whole thing is a bit bonkers with wildcards and casting, but it works. | ||
*/ | ||
private def andThenFlowStepStrategy[A]( | ||
first: Option[FlowStepStrategy[_]], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why take _ here, and then cast. Why not make the caller cast? I mean, it should be: Option[FlowStepStrategy[A]]
right? then you don't need the cast.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, this is clearly a Monoid. Why not just make a Monoid[FlowStepStrategy[A]] and use that?
I'd rather not be the one to fix the whole strategy-chaining API. I only propose private solutions here that solve concrete problems. As for Monoid/Semigroup, I don't really see the need to generalize at this point. I pulled the plus function out into something that looks like a Semigroup, but I'll leave it to the next person that has a use case to do the "extends Semigroup" part while threading implicit references around. |
LGTM, thanks for the PR I'd rather see Outside of the scope of the PR, I think it would be nice to not rely on the mutability of cascading's flowstep variable (so anyone who sets it has to be careful to remember to compose) but this seems like a good fix w/o changing the API. |
ping @johnynek - my bias is to get something quick'n'dirty that works for our purposes, but you have a better idea how the api should work. |
scalding-core: merge flow step strategies to allow reducer estimation combined with other strategies
I agree with Alex that if a method conforms to a typeclass that we already have in scope, we should use it, but I won't force the issue here since it is private. I view this as an additional form of documentation. |
Reducer estimation works fine by itself, but if your job defines a custom flow step strategy that does something else, the reducer estimator strategy gets overwritten. This PR does not modify any public APIs but simply runs strategies in sequence if one is present from the ExecutionContext.