Traverse reduce-side "Iterables" more than once #91
Comments
This case would show the issue:
`val xs: DList[(Int, Int)] = ...`
`xs.groupByKey.map { case (_, vs) => vs.sum / vs.size }`
Here `vs.sum` and `vs.size` each traverse `vs`, so the iterable is used twice. While a fix might not be trivial, we can at least try to throw an exception when we detect this situation (when the iterable is used twice).
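The "throw an exception when the iterable is used twice" idea could be sketched in plain Scala roughly as follows. This is only an illustration, not Scoobi code: `TraverseOnceGuard` is a hypothetical name, and the wrapper assumes the underlying values arrive as a single `Iterator`. It fails fast when a new traversal is requested after elements have already been consumed, rather than silently yielding nothing.

```scala
// Hypothetical guard, not part of Scoobi: wraps a one-shot Iterator and
// rejects any traversal that starts after elements were already consumed.
final class TraverseOnceGuard[A](underlying: Iterator[A]) extends Iterable[A] {
  // Set to true as soon as the first element has been handed out.
  private var consumed = false

  def iterator: Iterator[A] = {
    if (consumed)
      throw new IllegalStateException(
        "This Iterable can only be traversed once; materialise it (e.g. toList) first")
    new Iterator[A] {
      def hasNext: Boolean = underlying.hasNext
      def next(): A = { consumed = true; underlying.next() }
    }
  }
}
```

With this guard, `vs.sum / vs.size` would fail with an `IllegalStateException` on `vs.size` instead of dividing by a silently wrong size. Tracking consumption (rather than counting calls to `iterator`) is a deliberate choice in this sketch, since some collection methods request an iterator more than once before reading any element.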
I have spent considerable time on this issue. I cannot find any meaningful improvement in the short term. The only possible improvement (that I can imagine) would require a significant alteration to the existing API and considerable code refactoring. Some example improvements would be:
All other apparent improvements result in either meaninglessness (they do not provide any safety benefit) or bugs (improper operation of the scoobi library). This may be a limit of my imagination, but I believe I have exhausted this pursuit to the extent of my ability.
[1] Stackless Scala With Free Monads, Rúnar Óli Bjarnason, The Third Scala Workshop, London, Apr 17th 2012.
Moving out to 0.8 for now - a solution may sneak back into 0.7.
When operating over an `Iterable` value in Scoobi, it could refer to the "values" `Iterable` of a Hadoop reduce method. There are cases that demonstrate that this `Iterable` can only be traversed once from Scoobi user code (the second traversal would result in an empty `Iterable`).

Need to investigate whether the `Iterable` provided by Hadoop's `reduce` can only be traversed once. If not, fix whatever Scoobi is doing to ensure user code can also traverse it more than once.
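To make the symptom concrete without involving Hadoop or Scoobi, here is a minimal sketch in plain Scala. `OneShotValues` is a hypothetical stand-in that assumes the reduce-side values are backed by one shared `Iterator`; whether Hadoop's actual `Iterable` behaves this way is exactly what needs investigating. The sketch also shows a user-side workaround that computes the mean in a single traversal.

```scala
object SinglePassMean {
  // Hypothetical stand-in for a reduce-side Iterable (an assumption, not
  // Hadoop's class): every call to `iterator` returns the same underlying
  // Iterator, so a second traversal sees no elements.
  final class OneShotValues[A](underlying: Iterator[A]) extends Iterable[A] {
    def iterator: Iterator[A] = underlying
  }

  // Accumulates the sum and the element count together, so the values
  // are traversed only once. Assumes a non-empty group of values.
  def meanSinglePass(vs: Iterable[Int]): Int = {
    val (sum, count) = vs.foldLeft((0, 0)) { case ((s, c), v) => (s + v, c + 1) }
    sum / count
  }
}
```

On such a one-shot value, calling `sum` and then `size` leaves `size` at zero, while `meanSinglePass` returns the expected integer mean. Materialising the values first (for example with `toList`) would be another option, at the cost of holding a whole group in memory.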