Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scalding-core: merge flow step strategies to allow reducer estimation combined with other strategies #1094

Merged
merged 2 commits into from
Nov 14, 2014

Conversation

ejconlon
Copy link

Reducer estimation works fine by itself, but if your job defines a custom flow step strategy that does something else, the reducer estimator strategy gets overwritten. This PR does not modify any public APIs but simply runs strategies in sequence if one is present from the ExecutionContext.

@@ -231,7 +253,11 @@ class Job(val args: Args) extends FieldConversions with java.io.Serializable {
listeners.foreach { flow.addListener(_) }
stepListeners.foreach { flow.addStepListener(_) }
skipStrategy.foreach { flow.setFlowSkipStrategy(_) }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we do the same for skipStrategy?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bias to laziness

@isnotinvain
Copy link
Contributor

API-wise, do you think it makes sense to add a method to Job addFlowStepStrategy which does the compose logic, but leave @deprecated def stepStrategy which does no composing?

I'm wondering if it's a fair assumption to always compose instead of overwriting (it seems like it is fair). What do you think?

* The whole thing is a bit bonkers with wildcards and casting, but it works.
*/
private def andThenFlowStepStrategy[A](
first: Option[FlowStepStrategy[_]],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why take _ here, and then cast. Why not make the caller cast? I mean, it should be: Option[FlowStepStrategy[A]] right? then you don't need the cast.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, this is clearly a Monoid. Why not just make a Monoid[FlowStepStrategy[A]] and use that?

@ejconlon
Copy link
Author

I'd rather not be the one to fix the whole strategy-chaining API. I only propose private solutions here that solve concrete problems.

As for Monoid/Semigroup, I don't really see the need to generalize at this point. I pulled the plus function out into something that looks like a Semigroup, but I'll leave it to the next person that has a use case to do the "extends Semigroup" part while threading implicit references around.

@isnotinvain
Copy link
Contributor

LGTM, thanks for the PR

I'd rather see private[scalding] object FlowStepStrategies made into a public Monoid but as you said it's probably not a huge win (but it is consistent).

Outside of the scope of the PR, I think it would be nice to not rely on the mutability of cascading's flowstep variable (so anyone who sets it has to be careful to remember to compose) but this seems like a good fix w/o changing the API.

@ejconlon
Copy link
Author

ping @johnynek - my bias is to get something quick'n'dirty that works for our purposes, but you have a better idea how the api should work.

johnynek added a commit that referenced this pull request Nov 14, 2014
scalding-core: merge flow step strategies to allow reducer estimation combined with other strategies
@johnynek johnynek merged commit 0c49072 into develop Nov 14, 2014
@johnynek johnynek deleted the econlon-merge-flow-step-strategies branch November 14, 2014 06:08
@johnynek
Copy link
Collaborator

I agree with Alex that if a method conforms to a typeclass that we already have in scope, we should use it, but I won't force the issue here since it is private.

I view this as an additional form of documentation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants