[SPARK-30941][PYSPARK] Add a note to asDict to document its behavior when there are duplicate fields

### What changes were proposed in this pull request?

Adding a note to document `Row.asDict` behavior when there are duplicate fields.

### Why are the changes needed?

When a row contains duplicate fields, `asDict` and `__getitem__` behave differently. We should document this so that users know the difference explicitly.

### Does this PR introduce any user-facing change?

No. Documentation change only.

### How was this patch tested?

Existing tests.

Closes apache#27853 from viirya/SPARK-30941.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit d21aab4)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
scunniff · Nov 5, 2020 · 375e827
1 parent 435721e
commit 375e827
Showing 1 changed file with 6 additions and 0 deletions.
diff --git a/python/pyspark/sql/types.py b/python/pyspark/sql/types.py
@@ -1528,6 +1528,12 @@ def asDict(self, recursive=False):
 
         :param recursive: turns the nested Rows to dict (default: False).
 
+        .. note:: If a row contains duplicate field names, e.g., the rows of a join
+            between two :class:`DataFrame` objects that both have fields with the same
+            names, ``asDict`` will select one of the duplicate fields. ``__getitem__``
+            will also return one of the duplicate fields; however, the returned value
+            might differ from the one in ``asDict``.
+
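
The difference described in the note can be illustrated without a Spark session. The following is a minimal stand-alone sketch, not the PySpark implementation: it assumes that lookup by name resolves to the first matching field, while conversion to a dict lets later duplicates overwrite earlier ones. The names `fields`, `values`, `get_item`, and `as_dict` are hypothetical and exist only for this illustration.

```python
# Simplified stand-in for a row with duplicate field names.
# This is NOT the pyspark.sql.Row class; it only models the assumed behavior.
fields = ["id", "name", "id"]  # hypothetical columns after joining two DataFrames
values = (1, "a", 2)


def get_item(name):
    """Look up a value by field name; list.index returns the first match."""
    return values[fields.index(name)]


def as_dict():
    """Convert to a dict; for duplicate keys, the last value wins."""
    return dict(zip(fields, values))


print(get_item("id"))    # 1  (first "id" field)
print(as_dict()["id"])   # 2  (last "id" field)
```

Under these assumptions the two access paths disagree for the duplicated name, which is the situation the note warns about. Renaming or aliasing the conflicting columns before the join avoids the ambiguity altogether.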