[SPARK-49991][SQL] Make HadoopMapReduceCommitProtocol respect 'mapreduce.output.basename' to generate file names #48494

yaooqinn · 2024-10-16T08:46:00Z

What changes were proposed in this pull request?

In 'HadoopMapReduceCommitProtocol', task output file names are generated up front rather than by calling org.apache.hadoop.mapreduce.lib.output.FileOutputFormat#getDefaultWorkFile, which uses 'mapreduce.output.basename' as the prefix of output files.
This pull request modifies the HadoopMapReduceCommitProtocol.getFilename method to look up this config as well, falling back to 'part' when it is not set instead of always using the hardcoded 'part'.
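The lookup described above can be illustrated with a minimal, language-neutral sketch in Python. This is not the Spark code. The function name `get_filename`, its parameters, and the `<prefix><basename>-<split>-<job_id><suffix>` layout with a zero-padded split number are assumptions made for illustration. Only the config key 'mapreduce.output.basename' and the 'part' default come from the description.

```python
def get_filename(conf, split, job_id, prefix="", suffix=""):
    """Build a task output file name (hypothetical helper, not Spark's API).

    conf   -- dict standing in for the Hadoop configuration
    split  -- task/partition number
    job_id -- unique job identifier appended to the name
    """
    # Look up the basename from the config, falling back to 'part'
    # (previously the only possible value).
    basename = conf.get("mapreduce.output.basename", "part")
    # Assumed layout: <prefix><basename>-<zero-padded split>-<job_id><suffix>
    return f"{prefix}{basename}-{split:05d}-{job_id}{suffix}"


default_name = get_filename({}, 3, "job1", suffix=".parquet")
custom_name = get_filename(
    {"mapreduce.output.basename": "spark"}, 3, "job1", suffix=".parquet"
)
print(default_name)
print(custom_name)
```

With an empty config the sketch keeps the 'part' prefix, so existing output names are unchanged unless the user sets the config.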

Why are the changes needed?

Supporting a custom file name is a useful feature for users. They can use it to distinguish files written by different engines, on different days, etc. It also aligns Spark's behavior with other SQL-on-Hadoop engines for better Hadoop compatibility.

Does this PR introduce any user-facing change?

Yes, the Hadoop configuration 'mapreduce.output.basename' can now be used to set the prefix of file datasource output file names.

How was this patch tested?

New tests.

Was this patch authored or co-authored using generative AI tooling?

No

yaooqinn · 2024-10-17T05:26:19Z

cc @cloud-fan @dongjoon-hyun, thanks

yaooqinn added 2 commits October 16, 2024 16:30

[SPARK-49991][SQL] Make HadoopMapReduceCommitProtocol respect 'mapreduce.output.basename' to generate file names

2c9909e

[SPARK-49991][SQL] Make HadoopMapReduceCommitProtocol respect 'mapreduce.output.basename' to generate file names

007321f

github-actions bot added SQL CORE labels Oct 16, 2024
