[SPARK-30941][PYSPARK] Add a note to asDict to document its behavior when there are duplicate fields

### What changes were proposed in this pull request?

Adding a note to document `Row.asDict` behavior when there are duplicate fields.

### Why are the changes needed?

When a row contains duplicate fields, `asDict` and `__getitem__` behave differently. We should document this so that users are explicitly aware of the difference.
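
As an illustration of how such a divergence can arise, here is a minimal pure-Python sketch. `SketchRow` and `as_dict` are hypothetical stand-ins, not PySpark's `Row` class or its API; the sketch assumes that the dict conversion zips field names with values (so a later duplicate overwrites an earlier one) and that lookup by name uses the first matching field. The real implementation may differ in detail.

```python
class SketchRow(tuple):
    """Hypothetical stand-in for a row with named fields (not pyspark.sql.Row)."""

    def __new__(cls, fields, values):
        row = super().__new__(cls, values)
        row._fields = list(fields)  # field names, duplicates allowed
        return row

    def as_dict(self):
        # Assumption: building a dict from (name, value) pairs means the
        # last duplicate field overwrites the earlier ones.
        return dict(zip(self._fields, self))

    def __getitem__(self, item):
        if isinstance(item, (int, slice)):
            return super().__getitem__(item)
        # Assumption: lookup by name uses list.index, which returns the
        # position of the first matching field.
        return super().__getitem__(self._fields.index(item))


# A row such as a join might produce, with "id" present on both sides.
row = SketchRow(["id", "name", "id"], [1, "a", 2])

by_dict = row.as_dict()["id"]  # 2: the last "id" wins
by_key = row["id"]             # 1: the first "id" wins
print(by_dict, by_key)
```

Under these assumptions the two access paths disagree on the same field name, which is the kind of difference the new note warns about. Renaming or aliasing duplicate columns before collecting rows sidesteps the ambiguity.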

### Does this PR introduce any user-facing change?

No. This is a documentation-only change.

### How was this patch tested?

Existing test.

Closes apache#27853 from viirya/SPARK-30941.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit d21aab4)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
viirya authored and Sean Cunniff committed Nov 5, 2020
1 parent 435721e commit 375e827
Showing 1 changed file with 6 additions and 0 deletions.
6 changes: 6 additions & 0 deletions python/pyspark/sql/types.py
@@ -1528,6 +1528,12 @@ def asDict(self, recursive=False):
:param recursive: turns the nested Rows to dict (default: False).
.. note:: If a row contains duplicate field names, e.g., the rows of a join
between two :class:`DataFrame` that both have the fields of same names,
one of the duplicate fields will be selected by ``asDict``. ``__getitem__``
will also return one of the duplicate fields, however returned value might
be different to ``asDict``.
.. note:: If a row contains duplicate field names, e.g., the rows of a join
between two :class:`DataFrame` that both have the fields of same names,
one of the duplicate fields will be selected by ``asDict``. ``__getitem__``