[HUDI-4093] Fix NPE when insert records that partition column is null #5573

Closed
wants to merge 1 commit into from

Conversation

watermelon12138
Contributor

… into partition hudi table

Tips

What is the purpose of the pull request

(For example: This pull request adds quick-start document.)

Brief change log

(for example:)

  • Modify AnnotationLocation checkstyle rule in checkstyle.xml

Verify this pull request

(Please pick either of the following options)

This pull request is a trivial rework / code cleanup without any test coverage.

(or)

This pull request is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end.
  • Added HoodieClientWriteTest to verify the change.
  • Manually verified the change by running a job locally.

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@nsivabalan
Contributor

High-level question/thoughts:
Is this about the partition column value being null, or about the partition path field config being set to null?
Also, can we make sure we look at all write paths (i.e., spark-sql and spark-datasource as well) and confirm the behavior is consistent?

@nsivabalan
Contributor

@watermelon12138: can you respond to my question above?

@nsivabalan nsivabalan added the priority:major (degraded perf; unable to move forward; potential bugs) and writer-core (Issues relating to core transactions/write actions) labels Jun 23, 2022
@watermelon12138
Contributor Author

high level question/thoughts. is this about partition column value being null or partition path field config value set to null? also, can we ensure we look at all write paths? i.e. spark-sql, spark-datasource as well and ensure we are consistent.

@nsivabalan I am so sorry, I didn't respond in time. This problem is about the partition column value being null. The fix is to write the record to the default partition when the partition column value is null.
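
The fallback described in this reply could look roughly like the sketch below. This is not the PR's code: the class and method names are hypothetical, and the literal `default` is an assumption taken from the `$name=default` path segment that appears in the diff under review.

```java
class DefaultPartitionSketch {
    // Assumption: "default" is the fallback value, as in the "$name=default" segment in the diff.
    static final String DEFAULT_PARTITION_VALUE = "default";

    // Hypothetical helper: builds one "field=value" partition path segment.
    static String partitionSegment(String fieldName, Object fieldValue) {
        // Guard before calling toString() so a null value cannot raise an NPE.
        String value = (fieldValue == null) ? DEFAULT_PARTITION_VALUE : fieldValue.toString();
        return fieldName + "=" + value;
    }
}
```

With this guard, a record whose partition column is null lands in a `field=default` directory instead of failing the write.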

@watermelon12138 watermelon12138 force-pushed the fixInsertNPE branch 3 times, most recently from 068c83d to d6b556c Compare September 17, 2022 06:33
@hudi-bot

CI report:

Bot commands
@hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

@nsivabalan nsivabalan added the release-0.12.2 (Patches targeted for 0.12.2) label Dec 6, 2022
userSpecifiedDataTypes.keySet.foreach { name =>
  val dataType = userSpecifiedDataTypes.get(name).getOrElse("")
  if (!dataType.isInstanceOf[StringType]) {
    partitionStr = partitionStr.replace(s"$name=default", s"$name=__HIVE_DEFAULT_PARTITION__")
Member

Why is this needed only for Spark 3?

Member

Please also consider adding a test for this scenario.
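
To make the scenario under review concrete, here is a minimal Java sketch of the same rewrite: a `default` value on a non-string partition column is replaced with `__HIVE_DEFAULT_PARTITION__`. The class name, the method name, and the use of plain type-name strings (instead of Spark `DataType` objects) are assumptions for illustration. Unlike the snippet above, which does a substring `replace`, this sketch compares whole `name=value` segments so that a column whose name merely ends with another column's name is not rewritten by accident.

```java
import java.util.Map;

class HiveDefaultPartitionSketch {
    static final String HIVE_DEFAULT_PARTITION = "__HIVE_DEFAULT_PARTITION__";

    // Hypothetical helper: columnTypes maps a partition column name to a type name such as "int" or "string".
    static String rewriteDefaults(String partitionPath, Map<String, String> columnTypes) {
        String[] segments = partitionPath.split("/");
        for (int i = 0; i < segments.length; i++) {
            int eq = segments[i].indexOf('=');
            if (eq < 0) {
                continue; // not a "name=value" segment, leave it untouched
            }
            String name = segments[i].substring(0, eq);
            String value = segments[i].substring(eq + 1);
            String type = columnTypes.get(name);
            // Only non-string columns with the "default" placeholder are rewritten.
            if (type != null && !"string".equals(type) && "default".equals(value)) {
                segments[i] = name + "=" + HIVE_DEFAULT_PARTITION;
            }
        }
        return String.join("/", segments);
    }
}
```

A test for this scenario would assert that a path with one non-string and one string column only has the non-string segment rewritten.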

@@ -610,7 +610,7 @@ public static Option<String> getNullableValAsString(GenericRecord rec, String fi
    * @return field value either converted (for certain data types) or as it is.
    */
   public static Object convertValueForSpecificDataTypes(Schema fieldSchema, Object fieldValue, boolean consistentLogicalTimestampEnabled) {
-    if (fieldSchema == null) {
+    if (fieldSchema == null || fieldValue == null) {
Member

Would this be fixed by #7349?
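
For readers following the hunk above, the guard pattern it adds can be sketched in isolation. This is a simplified stand-in, not Hudi's method: the class name, the string-based `logicalType` parameter, and the date conversion are hypothetical, and it assumes the guarded branch returns the value unchanged (the hunk does not show the body of the `if`).

```java
import java.time.LocalDate;

class NullSafeConversionSketch {
    // Hypothetical stand-in for a type-specific conversion step.
    static Object convert(String logicalType, Object fieldValue) {
        // Return early when there is nothing to convert; without the second
        // check, the cast below would throw an NPE for a null value.
        if (logicalType == null || fieldValue == null) {
            return fieldValue;
        }
        if ("date".equals(logicalType)) {
            // Interpret the number as days since 1970-01-01.
            return LocalDate.ofEpochDay(((Number) fieldValue).longValue());
        }
        return fieldValue;
    }
}
```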

@codope codope changed the title from "[HUDI-4093] fix NPE when insert records that partition column is null…" to "[HUDI-4093] Fix NPE when insert records that partition column is null" Dec 7, 2022
@codope
Member

codope commented Dec 12, 2022

@watermelon12138 I suggest you try out the latest master, as #7349 has been merged. It should fix the NPE.

@codope codope removed the release-0.12.2 (Patches targeted for 0.12.2) label Dec 12, 2022
@bvaradar bvaradar self-assigned this Feb 18, 2023
@bvaradar
Contributor

bvaradar commented Apr 5, 2023

@watermelon12138: If this is not an issue on current master, could you kindly close it?

@bvaradar bvaradar closed this Oct 4, 2023
5 participants