[GLUTEN-4875][VL]Support spark sql conf sortBeforeRepartition to avoid stage partial retry causing result mismatch #4872

Merged: 10 commits, Mar 18, 2024

Contributor

@zjuwangg zjuwangg commented Mar 6, 2024

What changes were proposed in this pull request?

Spark introduced the spark.sql.execution.sortBeforeRepartition config in https://issues.apache.org/jira/browse/SPARK-23207 to keep results correct when a stage is partially retried after a repartition. The Gluten plan should honor the same config to achieve the same effect.

(Fixes: #4875)
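
For readers unfamiliar with the underlying problem, here is a minimal sketch in plain Python of why a local sort before a round-robin repartition matters. It is a simplified model, not Spark's or Gluten's implementation: the function names (`round_robin_partition`, `run_map_task`, `mixed_result`) and the two arrival orders are hypothetical and chosen only for illustration. The assumption is that a retried map task may see its input rows in a different order than the first attempt did.

```python
def round_robin_partition(rows, num_partitions):
    """Distribute rows over partitions strictly in the order they arrive."""
    partitions = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        partitions[i % num_partitions].append(row)
    return partitions


def run_map_task(arrival_order, num_partitions, sort_before_repartition):
    """Simulate one attempt of a map task (hypothetical helper).

    With sort_before_repartition enabled, rows are sorted locally first, so
    the partition assignment no longer depends on the arrival order.
    """
    rows = sorted(arrival_order) if sort_before_repartition else list(arrival_order)
    return round_robin_partition(rows, num_partitions)


# The same six rows, seen in different orders by the first attempt and the retry.
first_attempt = [0, 1, 2, 3, 4, 5]
retry_attempt = [5, 4, 3, 2, 1, 0]


def mixed_result(sort_before_repartition):
    """Reducer 0 reads the first attempt's output, reducer 1 reads the retry's."""
    a = run_map_task(first_attempt, 2, sort_before_repartition)
    b = run_map_task(retry_attempt, 2, sort_before_repartition)
    return sorted(a[0] + b[1])


print(mixed_result(False))  # [0, 0, 2, 2, 4, 4]: rows 1, 3, 5 lost, others duplicated
print(mixed_result(True))   # [0, 1, 2, 3, 4, 5]: every row appears exactly once
```

In this model the unsorted case loses and duplicates rows once downstream tasks mix output from both attempts, while the sorted case is stable because both attempts produce identical partitions.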

How was this patch tested?

  • Added a unit test to verify that the executed plan is as expected.

github-actions bot commented Mar 6, 2024

Thanks for opening a pull request!

Could you open an issue for this pull request on Github Issues?

https://github.com/oap-project/gluten/issues

Then could you also rename commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:

github-actions bot commented Mar 6, 2024

Run Gluten Clickhouse CI

@zjuwangg zjuwangg changed the title [VL]Support spark sql conf sortBeforeRepartition to avoid stage partial retry casuing result mismatch [GLUTEN-4875][VL]Support spark sql conf sortBeforeRepartition to avoid stage partial retry casuing result mismatch Mar 7, 2024

github-actions bot commented Mar 7, 2024

#4875

cpp/core/jni/JniWrapper.cc (outdated review thread, resolved)

@zjuwangg zjuwangg requested a review from marin-ma March 14, 2024 02:35
@zjuwangg
Contributor Author

@marin-ma Please help review again. There is a CI pre-check failure caused by a network problem.

@marin-ma
Contributor

LGTM. Please document the SQL query modification in the code and I will proceed with merging. Thanks!

@marin-ma marin-ma merged commit fa86e76 into apache:main Mar 18, 2024
17 checks passed
@GlutenPerfBot
Contributor

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

| query | log/native_4872_time.csv | log/native_master_03_17_2024_2ca27fb04_time.csv | difference | percentage |
|-------|--------------------------|--------------------------------------------------|------------|------------|
| q1 | 37.58 | 35.76 | -1.818 | 95.16% |
| q2 | 23.73 | 23.82 | 0.089 | 100.38% |
| q3 | 37.87 | 36.74 | -1.135 | 97.00% |
| q4 | 38.32 | 39.53 | 1.218 | 103.18% |
| q5 | 69.45 | 70.25 | 0.798 | 101.15% |
| q6 | 7.07 | 7.38 | 0.306 | 104.32% |
| q7 | 81.44 | 84.15 | 2.706 | 103.32% |
| q8 | 85.30 | 84.82 | -0.480 | 99.44% |
| q9 | 123.48 | 122.79 | -0.696 | 99.44% |
| q10 | 43.57 | 46.92 | 3.352 | 107.69% |
| q11 | 20.48 | 20.14 | -0.345 | 98.32% |
| q12 | 28.43 | 25.87 | -2.558 | 91.00% |
| q13 | 47.66 | 47.00 | -0.661 | 98.61% |
| q14 | 20.50 | 22.07 | 1.569 | 107.66% |
| q15 | 31.37 | 32.36 | 0.995 | 103.17% |
| q16 | 14.32 | 14.09 | -0.231 | 98.39% |
| q17 | 100.05 | 100.46 | 0.411 | 100.41% |
| q18 | 142.26 | 143.53 | 1.275 | 100.90% |
| q19 | 15.84 | 13.60 | -2.245 | 85.83% |
| q20 | 26.99 | 29.63 | 2.634 | 109.76% |
| q21 | 226.37 | 226.19 | -0.174 | 99.92% |
| q22 | 13.96 | 13.82 | -0.136 | 99.02% |
| total | 1236.03 | 1240.91 | 4.874 | 100.39% |
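
As a reading aid for the report above, the following sketch shows how the last two columns appear to be derived from the two timing columns. This is inferred from the reported numbers, not taken from the bot's source: the difference looks like the master time minus this PR's time, and the percentage looks like the master time divided by this PR's time. The function name `compare` is hypothetical. Because the timings shown are already rounded, recomputed values may differ from the reported ones in the last digits.

```python
def compare(pr_time, master_time):
    """Recompute the difference and percentage columns (inferred relationship)."""
    difference = master_time - pr_time        # negative: master was faster
    percentage = master_time / pr_time * 100  # below 100%: master was faster
    return round(difference, 3), round(percentage, 2)


# q1 and the total row, using the rounded timings from the report.
print(compare(37.58, 35.76))
print(compare(1236.03, 1240.91))
```

Under this reading, the total of 100.39% means the run with this PR was marginally faster than the master baseline, a gap small enough to be within normal run-to-run variation.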

@zjuwangg zjuwangg deleted the sort_before_rr branch March 18, 2024 03:13
liuneng1994 pushed a commit to loneylee/gluten that referenced this pull request Mar 18, 2024
taiyang-li pushed a commit to bigo-sg/gluten that referenced this pull request Mar 25, 2024
taiyang-li pushed a commit to bigo-sg/gluten that referenced this pull request Oct 8, 2024
taiyang-li pushed a commit to bigo-sg/gluten that referenced this pull request Oct 9, 2024
Development

Successfully merging this pull request may close these issues.

[VL] Shuffle+Repartition on an DataFrame could lead to incorrect answers
4 participants