Commit

[SPARK-49263][CONNECT] Spark Connect python client: Consistently handle boolean Dataframe reader options

### What changes were proposed in this pull request?

Using `spark.read.option("Foo", True)` resulted in an uppercase `'True'` string in the Python Spark Connect client, while in all other cases (Scala with and without Spark Connect, and PySpark without Spark Connect) it is normalized to `'true'`. This happened because the option value was converted with `str`, where the `to_str` helper should be used instead.
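For illustration, here is a minimal sketch of the kind of normalization involved; `to_str_sketch` below is a hypothetical stand-in for Spark's `to_str` helper, not its actual implementation:

```python
def to_str_sketch(value):
    # Hypothetical stand-in for Spark's `to_str` helper (not the real code):
    # booleans become lowercase strings, and None stays None instead of
    # becoming the string "None".
    if value is None:
        return None
    if isinstance(value, bool):
        return str(value).lower()
    return str(value)


assert str(True) == "True"            # what the Connect client used to store
assert to_str_sketch(True) == "true"  # what the other clients store
assert to_str_sketch(None) is None    # None survives, so it can be dropped later
```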

### Why are the changes needed?

The Python Spark Connect client is currently inconsistent with the other clients. Passing `"True"` as a boolean option also appears to break the Delta CDF reader (to be fixed separately, so that it handles the literal case-insensitively).

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

A unit test was added.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#47790 from juliuszsompolski/SPARK-49263.

Lead-authored-by: Hyukjin Kwon <gurwls223@apache.org>
Co-authored-by: Hyukjin Kwon <gurwls223@gmail.com>
Co-authored-by: Julek Sompolski <Juliusz Sompolski>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
HyukjinKwon and HyukjinKwon committed Aug 19, 2024
1 parent 542b24a commit a6a62e5
Showing 3 changed files with 8 additions and 3 deletions.
8 changes: 6 additions & 2 deletions python/pyspark/sql/connect/plan.py
```diff
@@ -281,9 +281,13 @@ def __init__(
         assert schema is None or isinstance(schema, str)

         if options is not None:
+            new_options = {}
             for k, v in options.items():
-                assert isinstance(k, str)
-                assert isinstance(v, str)
+                if v is not None:
+                    assert isinstance(k, str)
+                    assert isinstance(v, str)
+                    new_options[k] = v
+            options = new_options

         if paths is not None:
             assert isinstance(paths, list)
```
2 changes: 1 addition & 1 deletion python/pyspark/sql/connect/readwriter.py
```diff
@@ -94,7 +94,7 @@ def schema(self, schema: Union[StructType, str]) -> "DataFrameReader":
     schema.__doc__ = PySparkDataFrameReader.schema.__doc__

     def option(self, key: str, value: "OptionalPrimitiveType") -> "DataFrameReader":
-        self._options[key] = str(value)
+        self._options[key] = cast(str, to_str(value))
         return self

     option.__doc__ = PySparkDataFrameReader.option.__doc__
```
1 change: 1 addition & 0 deletions python/pyspark/sql/tests/test_datasources.py
```diff
@@ -212,6 +212,7 @@ def test_checking_csv_header(self):
         )
         df = (
             self.spark.read.option("header", "true")
+            .option("quote", None)
             .schema(schema)
             .csv(path, enforceSchema=False)
         )
```
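To tie the pieces together, here is a minimal self-contained sketch (assumed values and variable names, not Spark code) of the behavior the new `.option("quote", None)` line exercises: `to_str` keeps `None` as `None`, and the patched plan.py logic then drops the entry instead of failing the `isinstance(v, str)` assertion.

```python
# Illustrative only: mirrors the filtering the patched plan.py applies to the
# options dict collected by the reader.
options = {"header": "true", "quote": None}  # as if set via .option("quote", None)

new_options = {}
for k, v in options.items():
    if v is not None:  # None-valued options are dropped rather than asserted on
        assert isinstance(k, str)
        assert isinstance(v, str)
        new_options[k] = v

assert new_options == {"header": "true"}
```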
