scalding-core: merge flow step strategies to allow reducer estimation combined with other strategies #1094

ejconlon · 2014-11-11T22:05:37Z

Reducer estimation works fine by itself, but if your job defines a custom flow step strategy that does something else, the reducer estimator strategy gets overwritten. This PR does not modify any public APIs; it simply runs the strategies in sequence when one is already present from the ExecutionContext.
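A minimal sketch of the overwrite-versus-sequence behavior described above, assuming a simplified model. `StepStrategy` and `FlowLike` are hypothetical stand-ins (Cascading's real `FlowStepStrategy` is applied to a flow and its steps, not to a plain config map), and the config keys are made up for illustration.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-in for Cascading's FlowStepStrategy: the real
// interface is applied to a flow and its steps, not to a plain config map.
trait StepStrategy {
  def apply(conf: mutable.Map[String, String]): Unit
}

// Hypothetical holder that mimics a flow's single, mutable strategy slot.
final class FlowLike {
  var strategy: Option[StepStrategy] = None

  // Overwriting behavior: setting a second strategy silently replaces the first.
  def overwrite(s: StepStrategy): Unit =
    strategy = Some(s)

  // Sequencing behavior: if a strategy is already present, keep it and run it
  // before the new one, so neither is lost.
  def compose(s: StepStrategy): Unit =
    strategy = strategy match {
      case None => Some(s)
      case Some(existing) =>
        Some(new StepStrategy {
          def apply(conf: mutable.Map[String, String]): Unit = {
            existing(conf)
            s(conf)
          }
        })
    }
}
```

With `overwrite`, a custom strategy followed by the estimator leaves only the estimator's effect; with `compose`, both run in the order they were set.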


isnotinvain · 2014-11-11T22:51:42Z

scalding-core/src/main/scala/com/twitter/scalding/Job.scala

@@ -231,7 +253,11 @@ class Job(val args: Args) extends FieldConversions with java.io.Serializable {
        listeners.foreach { flow.addListener(_) }
        stepListeners.foreach { flow.addStepListener(_) }
        skipStrategy.foreach { flow.setFlowSkipStrategy(_) }


Should we do the same for skipStrategy?

Bias to laziness.

isnotinvain · 2014-11-11T22:57:01Z

API-wise, do you think it makes sense to add an addFlowStepStrategy method to Job that does the composing, and leave a @deprecated def stepStrategy that does no composing?

I'm wondering if it's a fair assumption to always compose instead of overwriting (it seems fair). What do you think?
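A rough sketch of the API shape being proposed, assuming a simplified model. `JobLike` and `StepStrategy` are hypothetical stand-ins, not Scalding's actual `Job` or Cascading's `FlowStepStrategy`; the `@deprecated` "since" string is a placeholder.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-in for Cascading's FlowStepStrategy.
trait StepStrategy {
  def apply(conf: mutable.Map[String, String]): Unit
}

// Hypothetical sketch of the proposed API shape; not Scalding's actual Job class.
class JobLike {
  private var added: Vector[StepStrategy] = Vector.empty

  // Legacy-style overridable hook: whatever a subclass returns is used as-is,
  // with no composing of its own.
  @deprecated("use addFlowStepStrategy, which composes", "hypothetical")
  def stepStrategy: Option[StepStrategy] = None

  // Proposed composing entry point: every strategy added is kept, in order.
  def addFlowStepStrategy(s: StepStrategy): Unit =
    added = added :+ s

  // Runs the legacy hook's strategy (if any) first, then each added strategy.
  def runStrategies(conf: mutable.Map[String, String]): Unit =
    (stepStrategy.toVector ++ added).foreach(s => s(conf))
}
```

Under this shape, always composing is the default path, and only code that still overrides the deprecated hook opts into the single-strategy behavior (the compiler may emit deprecation warnings for that use).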

johnynek · 2014-11-11T23:00:03Z

scalding-core/src/main/scala/com/twitter/scalding/Job.scala

+   * The whole thing is a bit bonkers with wildcards and casting, but it works.
+   */
+  private def andThenFlowStepStrategy[A](
+    first: Option[FlowStepStrategy[_]],


Why take _ here and then cast? Why not make the caller cast? It should be Option[FlowStepStrategy[A]], right? Then you don't need the cast.
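A minimal sketch of the suggestion, assuming a simplified one-argument `StepStrategy[C]` as a hypothetical stand-in for Cascading's `FlowStepStrategy[Config]`. The helper is fully typed, and the one unchecked cast moves to a hypothetical call-site helper that starts from a wildcard-typed value like the `Option[FlowStepStrategy[_]]` parameter in the diff.

```scala
// Hypothetical, simplified stand-in for Cascading's FlowStepStrategy[Config];
// the real apply receives the flow and its steps rather than a single value.
trait StepStrategy[C] {
  def apply(conf: C): Unit
}

object StepStrategies {
  // Fully typed: both inputs share C, so this helper needs no cast.
  // Runs `first` (if present) and then `second`.
  def andThen[C](first: Option[StepStrategy[C]], second: StepStrategy[C]): StepStrategy[C] =
    first match {
      case None => second
      case Some(f) =>
        new StepStrategy[C] {
          def apply(conf: C): Unit = { f(conf); second(conf) }
        }
    }
}

object CallSite {
  // The caller, which only has a wildcard-typed strategy, does the single
  // unchecked cast before calling the typed helper.
  def fromWildcard[C](s: Option[StepStrategy[_]]): Option[StepStrategy[C]] =
    s.map(_.asInstanceOf[StepStrategy[C]])
}
```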

Also, this is clearly a Monoid. Why not just make a Monoid[FlowStepStrategy[A]] and use that?
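A sketch of what a `Monoid[FlowStepStrategy[A]]` could look like, assuming a minimal local `Monoid` trait in place of Algebird's (so the snippet compiles without dependencies) and the same hypothetical one-argument `StepStrategy[C]` stand-in. The identity is the no-op strategy and `plus` runs the left strategy, then the right.

```scala
// Minimal local typeclass standing in for Algebird's Monoid.
trait Monoid[T] {
  def zero: T
  def plus(l: T, r: T): T
}

// Hypothetical, simplified stand-in for Cascading's FlowStepStrategy[Config].
trait StepStrategy[C] {
  def apply(conf: C): Unit
}

object StepStrategy {
  // zero does nothing; plus runs l, then r. Sequencing effects is associative
  // and the no-op is an identity on both sides, so the Monoid laws hold
  // (in terms of observable effects).
  implicit def monoid[C]: Monoid[StepStrategy[C]] = new Monoid[StepStrategy[C]] {
    val zero: StepStrategy[C] = new StepStrategy[C] { def apply(conf: C): Unit = () }
    def plus(l: StepStrategy[C], r: StepStrategy[C]): StepStrategy[C] =
      new StepStrategy[C] {
        def apply(conf: C): Unit = { l(conf); r(conf) }
      }
  }

  // Combines any number of strategies, including none, into one.
  def sum[C](strategies: Seq[StepStrategy[C]])(implicit m: Monoid[StepStrategy[C]]): StepStrategy[C] =
    strategies.foldLeft(m.zero)(m.plus)
}
```

With this in scope, an optional existing strategy and a new one combine as `StepStrategy.sum(existing.toList :+ next)`, with no special case for `None`.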

ejconlon · 2014-11-12T19:58:36Z

I'd rather not be the one to fix the whole strategy-chaining API. I only propose private solutions here that solve concrete problems.

As for Monoid/Semigroup, I don't really see the need to generalize at this point. I pulled the plus function out into something that looks like a Semigroup, but I'll leave the "extends Semigroup" part, and threading the implicit references around, to the next person who has a use case.

isnotinvain · 2014-11-13T22:13:15Z

LGTM, thanks for the PR

I'd rather see the private[scalding] object FlowStepStrategies made into a public Monoid, but as you said, it's probably not a huge win (though it would be consistent).

Outside the scope of this PR, I think it would be nice not to rely on the mutability of Cascading's flow step strategy variable (which means anyone who sets it has to remember to compose), but this seems like a good fix that doesn't change the API.
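One way to picture not relying on the mutable setting, sketched under assumptions: each component contributes strategies to an immutable value, and a single combined strategy is built once when the flow is created. `StepStrategies` here is a hypothetical case class (not the PR's `FlowStepStrategies` object), and `StepStrategy[C]` is the same simplified stand-in for Cascading's `FlowStepStrategy`.

```scala
// Hypothetical, simplified stand-in for Cascading's FlowStepStrategy[Config].
trait StepStrategy[C] {
  def apply(conf: C): Unit
}

// Hypothetical immutable collection of strategies; adding returns a new value,
// so nobody can silently drop a strategy someone else contributed.
final case class StepStrategies[C](all: List[StepStrategy[C]]) {
  def add(s: StepStrategy[C]): StepStrategies[C] = StepStrategies(all :+ s)

  // Built once, at flow-creation time: runs every strategy in insertion order.
  def combined: StepStrategy[C] = new StepStrategy[C] {
    def apply(conf: C): Unit = all.foreach(s => s(conf))
  }
}

object StepStrategies {
  def empty[C]: StepStrategies[C] = StepStrategies(Nil)
}
```

The design choice is that composition is the only operation available, so "remembering to compose" stops being the caller's responsibility.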

ejconlon · 2014-11-13T22:23:56Z

Ping @johnynek: my bias is to get something quick'n'dirty that works for our purposes, but you have a better idea of how the API should work.


johnynek · 2014-11-14T06:09:40Z

I agree with Alex that if a method conforms to a typeclass that we already have in scope, we should use it, but I won't force the issue here since it is private.

I view using the typeclass as an additional form of documentation.

Commit 4e61b03: scalding-core: merge flow step strategies to allow reducer estimation combined with other strategies

isnotinvain reviewed Nov 11, 2014

johnynek reviewed Nov 11, 2014

Commit c71af42: rework

johnynek added a commit that referenced this pull request on Nov 14, 2014: 0c49072, "Merge pull request #1094 from twitter/econlon-merge-flow-step-strategies"

johnynek merged commit 0c49072 into develop Nov 14, 2014

johnynek deleted the econlon-merge-flow-step-strategies branch November 14, 2014 06:08
