[PYSPARK] Fix doc of "fold" function in rdd.py
According to the discussion in #5587, the docstring needs to point out that
the lambda function passed to “fold” takes its arguments in the opposite
order: the element first, then the “zero value.”
Alain committed Apr 20, 2015
1 parent 53b54cb commit 555731d
Showing 1 changed file with 4 additions and 1 deletion.
5 changes: 4 additions & 1 deletion python/pyspark/rdd.py
@@ -820,14 +820,17 @@ def fold(self, zeroValue, op):
         as its result value to avoid object allocation; however, it should not
         modify C{t2}.

+        Note that the provided lambda function should take the opposite order,
+        which means C{t1} needs to be elements and C{t2} be the "zero value."
+
         >>> from operator import add
         >>> sc.parallelize([1, 2, 3, 4, 5]).fold(0, add)
         15
         """
         def func(iterator):
             acc = zeroValue
             for obj in iterator:
-                acc = op(acc, obj)
+                acc = op(obj, acc)
             yield acc
         vals = self.mapPartitions(func).collect()
         return reduce(op, vals, zeroValue)
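
The effect of swapping the arguments can be checked without a Spark cluster. The following is a minimal sketch, not PySpark code: `fold_partitions` and its `element_first` flag are hypothetical names, and plain Python lists stand in for RDD partitions. It mirrors the per-partition loop and the final `reduce` from the hunk above, and shows that the two call orders agree for a commutative `op` such as `add` but can differ for a non-commutative one such as `sub`.

```python
from functools import reduce
from operator import add, sub


def fold_partitions(partitions, zero_value, op, element_first=True):
    """Pure-Python stand-in for the loop in the hunk (hypothetical helper).

    `partitions` is a list of lists playing the role of RDD partitions.
    """
    def fold_one(iterator):
        acc = zero_value
        for obj in iterator:
            if element_first:
                # Mirrors the line with the element first: op(obj, acc)
                acc = op(obj, acc)
            else:
                # Mirrors the line with the accumulator first: op(acc, obj)
                acc = op(acc, obj)
        return acc

    # One partial result per partition, then the same final merge as the
    # hunk: reduce(op, vals, zeroValue).
    vals = [fold_one(p) for p in partitions]
    return reduce(op, vals, zero_value)


# A commutative op gives the same answer with either argument order.
print(fold_partitions([[1, 2], [3, 4, 5]], 0, add))                       # 15
print(fold_partitions([[1, 2], [3, 4, 5]], 0, add, element_first=False))  # 15

# A non-commutative op exposes the difference.
print(fold_partitions([[1, 2], [3]], 0, sub))                       # -4
print(fold_partitions([[1, 2], [3]], 0, sub, element_first=False))  # 6
```

With `sub`, the result also depends on how the data is split into partitions, so operators like this are a poor fit for `fold` regardless of the argument order.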