[SPARK-47125][SQL] Return null if Univocity never triggers parsing
### What changes were proposed in this pull request?

This PR proposes to guard against a `null` return value from `tokenizer.getContext`. This is similar to apache#28029. `getContext` seemingly comes from the univocity library, and it can return `null` if `beginParsing` has not been invoked (https://github.com/uniVocity/univocity-parsers/blob/master/src/main/java/com/univocity/parsers/common/AbstractParser.java#L53). This can happen when `parseLine` is not invoked at https://github.com/apache/spark/blob/e081f06ea401a2b6b8c214a36126583d35eaf55f/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala#L300, because `parseLine` is what invokes `beginParsing`.

### Why are the changes needed?

To fix a bug: without the check, `getCurrentInput` can throw an NPE when `tokenizer.getContext` returns `null`.

### Does this PR introduce _any_ user-facing change?

Yes. In a very rare case, when `CsvToStructs` is used as the sole predicate against an empty row, it might trigger an NPE. This PR fixes it.

### How was this patch tested?

Manually tested; a test case will be added in a separate PR. We should backport this to all branches.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#45210 from HyukjinKwon/SPARK-47125.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
HyukjinKwon authored and ericm-db committed Mar 5, 2024
1 parent 7f82682 commit 4783e8d
Showing 1 changed file with 1 addition and 0 deletions.
@@ -136,6 +136,7 @@ class UnivocityParser(

  // Retrieve the raw record string.
  private def getCurrentInput: UTF8String = {
    if (tokenizer.getContext == null) return null
    val currentContent = tokenizer.getContext.currentParsedContent()
    if (currentContent == null) null else UTF8String.fromString(currentContent.stripLineEnd)
  }
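
The guard in `getCurrentInput` can be tried in isolation with the minimal sketch below. It assumes a tokenizer whose context stays `null` until `parseLine` runs, as the commit message describes. `FakeTokenizer`, `FakeContext`, and `NullGuardSketch` are hypothetical stand-ins written for this illustration, not the univocity or Spark classes, and the sketch returns a plain `String` rather than `UTF8String` so that it has no Spark dependency.

```scala
// Hypothetical stand-ins for illustration only; these are NOT the univocity classes.
final class FakeContext(content: String) {
  def currentParsedContent(): String = content
}

final class FakeTokenizer {
  // Assumption: the context stays null until parseLine is called,
  // mirroring the behavior described in the commit message.
  private var context: FakeContext = null

  def getContext: FakeContext = context

  def parseLine(line: String): Array[String] = {
    context = new FakeContext(line)
    line.split(",", -1)
  }
}

object NullGuardSketch {
  // Same shape as the patched getCurrentInput, but returning String.
  def currentInput(tokenizer: FakeTokenizer): String = {
    // Without this check, the next line dereferences a null context.
    if (tokenizer.getContext == null) return null
    val currentContent = tokenizer.getContext.currentParsedContent()
    if (currentContent == null) null else currentContent.stripLineEnd
  }

  def main(args: Array[String]): Unit = {
    val tokenizer = new FakeTokenizer
    // Parsing never started, so the guard returns null instead of throwing.
    assert(currentInput(tokenizer) == null)

    tokenizer.parseLine("a,b\n")
    // After parseLine, the raw record is available with its line ending stripped.
    assert(currentInput(tokenizer) == "a,b")
  }
}
```

Removing the first line of `currentInput` makes the first assertion fail with a `NullPointerException`, which is the failure mode the one-line patch avoids.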