[PYSPARK] Fix doc of "fold" function in rdd.py

As discussed in #5587, the docs need to point out that the lambda function passed to “fold” takes its arguments in the opposite order.
apache · Apr 20, 2015 · 555731d
1 parent 53b54cb
commit 555731d
Showing 1 changed file with 4 additions and 1 deletion.
diff --git a/python/pyspark/rdd.py b/python/pyspark/rdd.py
@@ -820,14 +820,17 @@ def fold(self, zeroValue, op):
         as its result value to avoid object allocation; however, it should not
         modify C{t2}.
 
+        Note that the provided lambda function should take the opposite order,
+        which means C{t1} needs to be elements and C{t2} be the "zero value."
+
         >>> from operator import add
         >>> sc.parallelize([1, 2, 3, 4, 5]).fold(0, add)
         15
         """
         def func(iterator):
             acc = zeroValue
             for obj in iterator:
-                acc = op(acc, obj)
+                acc = op(obj, acc)
             yield acc
         vals = self.mapPartitions(func).collect()
         return reduce(op, vals, zeroValue)
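
To make the argument order in this change concrete, here is a minimal plain-Python sketch of the patched logic that runs without Spark. The names `local_fold` and `recording_add` and the two-partition layout are hypothetical and chosen only for illustration; a real RDD decides its own partitioning. It records each call to the operator so the order of the two arguments is visible.

```python
from functools import reduce


def local_fold(partitions, zero_value, op):
    """Plain-Python stand-in for the patched fold (hypothetical helper, not PySpark API)."""
    def fold_partition(items):
        acc = zero_value
        for obj in items:
            # Order after this commit: element first, accumulator second.
            acc = op(obj, acc)
        return acc

    # One partial result per partition, mirroring mapPartitions(func).collect().
    partials = [fold_partition(p) for p in partitions]
    # The merge step is untouched by the diff: reduce passes the accumulator first.
    return reduce(op, partials, zero_value)


calls = []


def recording_add(t1, t2):
    # Record the (t1, t2) pair for every invocation, then add as usual.
    calls.append((t1, t2))
    return t1 + t2


# Assumed layout: two partitions, [1, 2] and [3, 4, 5].
total = local_fold([[1, 2], [3, 4, 5]], 0, recording_add)
print(total)   # 15
print(calls)   # within partitions t2 starts as the zero value; in the merge, t1 does
```

In this sketch the within-partition calls start with `(1, 0)` and `(3, 0)`, so `t2` carries the zero value and running total, while the final `reduce` still passes the accumulator as `t1`. For a commutative operator such as `add` the result is the same either way.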