
Added Sorting Functions #233

Merged: 6 commits merged into hablapps:main on Jul 12, 2022
Conversation

@Amalicia (Contributor) commented Jun 24, 2022

Description

This PR implements the asc and desc functions alongside the orderBy and sort functions currently present in Spark.

Related Issue

Resolves #70

Motivation and Context

Currently, when performing orderBy or sort operations on a DataFrame, you have to use Spark columns and functions. This PR allows doric columns to be used when sorting.
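A rough sketch of the usage this enables (illustrative only; the column constructors and the exact asc/desc spellings here are assumptions based on this PR, not a documented API):

import doric._

// Hypothetical usage: sort with typed doric columns instead of
// Spark's untyped Column API; df is any DataFrame with these columns.
val sorted = df.orderBy(colInt("age").desc, colString("name").asc)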

How Has This Been Tested?

  • This pull request contains appropriate tests.

@Amalicia requested a review from a team as a code owner June 24, 2022 11:39
@github-actions bot added labels spark_2.4, spark_3.0, spark_3.1, spark_3.2, spark_3.3 (PR changes to the corresponding Spark version) on Jun 24, 2022
@Amalicia marked this pull request as draft June 24, 2022 11:40
@codecov bot commented Jun 24, 2022

Codecov Report

Merging #233 (0c02a25) into main (0b3f5e7) will increase coverage by 0.03%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #233      +/-   ##
==========================================
+ Coverage   96.61%   96.64%   +0.03%     
==========================================
  Files          53       54       +1     
  Lines         943      952       +9     
  Branches       18       12       -6     
==========================================
+ Hits          911      920       +9     
  Misses         32       32              
Flag Coverage Δ
spark-2.4.x 93.00% <100.00%> (+0.07%) ⬆️
spark-3.0.x 95.82% <100.00%> (+0.04%) ⬆️
spark-3.1.x 96.78% <100.00%> (+0.03%) ⬆️
spark-3.2.x 96.78% <100.00%> (+0.03%) ⬆️
spark-3.3.x 96.78% <100.00%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
core/src/main/scala/doric/sem/SortingOps.scala 100.00% <100.00%> (ø)
...re/src/main/scala/doric/syntax/CommonColumns.scala 100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 0b3f5e7...0c02a25.

@alfonsorr (Member) commented Jun 27, 2022

Rebase your branch with the latest updates of main; it is missing the changes for Spark 3.3.

Sorry, on my phone it showed me that the changes from the last PR were reverted 😵‍💫

@eruizalo mentioned this pull request on Jun 28, 2022
eruizalo previously approved these changes Jun 28, 2022

@eruizalo (Collaborator) left a comment:


LGTM!

Just rebase your branch, because codecov is acting weird: it says the coverage will drop because of your PR, and that's not true.

I updated issue #70 as well, because we were only talking about column functions (we forgot about DataFrame ordering).

alfonsorr previously approved these changes Jun 29, 2022
@Amalicia (Contributor, Author) commented:

Apologies for the delay on this! The past few days have been a little busy for me. Looking at the updated #70, it seems all that's left on this issue is to add def sortWithinPartitions(sortExprs: Column*): Dataset[T], right? 😄
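(For context, that signature is Spark's native API. In plain Spark it is used as below; a doric counterpart would presumably take doric columns instead.)

import org.apache.spark.sql.functions.col

// Spark's sortWithinPartitions orders rows inside each partition
// without triggering a global, cross-partition sort.
val locallySorted = df.sortWithinPartitions(col("group"), col("value").desc)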

@eruizalo (Collaborator) commented:

No problem at all! We really appreciate what you are doing.

About #70, yeah, that's the only function left. I updated the issue to track more related functions, but that doesn't mean you have to implement them (you can, though; if you don't, we will do it as soon as we can).

Anyway, whenever you rebase your branch and mark your PR as "ready for review" we will merge it.

Thank you again for your help and effort!

@Amalicia (Contributor, Author) commented:

The Spark 2.4 tests are failing due to Spark's month subtraction returning a different result from the LocalDate equivalent... It would appear this is some flakiness that will occur on certain days of the year.

One option to get around this and prevent it from happening again would be to hardcode the date to a certain value instead of using LocalDate.now(). Any other thoughts or suggestions? 🤔
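(To illustrate the day-of-year dependence: java.time clamps month arithmetic to the last valid day of the target month, so an expected value computed from LocalDate.now() shifts behaviour near month ends, which is where Spark 2.4's older calendar logic disagrees. The java.time half looks like this:)

import java.time.LocalDate

// java.time clamps to the last valid day of the target month, so the
// result depends on which day of the month the test happens to run:
LocalDate.of(2022, 5, 31).minusMonths(3) // 2022-02-28 (clamped)
LocalDate.of(2022, 5, 15).minusMonths(3) // 2022-02-15 (exact)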

@Amalicia (Contributor, Author) commented Jul 8, 2022

As a follow-up to the above comment about the LocalDate flakiness, I found the PR that updated the logic to use the LocalDate API: apache/spark#25153. I'll move the existing logic to the Spark 3.x tests and try to come up with a solution for the 2.4 tests.

@Amalicia marked this pull request as ready for review July 8, 2022 09:50
@alfonsorr (Member) commented:
Hi! I'm back, sorry for the absence. I see you got it fixed, but to test it for all versions we can set the expected value according to the Spark version. Something like:

import java.sql.Date
import java.time.LocalDate
import org.apache.spark.sql.{functions => f}

val expectedResult =
  if (spark.version.take(1).toInt > 2)
    List(Date.valueOf(LocalDate.now.minusMonths(3)), null).map(Option(_))
  else
    ??? // spark 2 result (left unspecified in the original comment)

df.testColumns2("dateCol", -3)(
  (d, m) => colDate(d).addMonths(m.lit),
  (d, m) => f.add_months(f.col(d), m),
  expectedResult
)

Thanks for all the work 😄

@Amalicia (Contributor, Author) commented:

Welcome back! No worries at all!!

Sounds good! Just for my understanding: when should tests be placed into the Spark 3 folders, like I did in b2b4c38, versus handling it with version logic like this?

@alfonsorr (Member) commented:

The folder names list the Spark versions they apply to: if a folder has 3.0 in its name, its tests run only on Spark 3.0, and if it lists several versions, its tests run on all of them. The plain scala folder applies to every version.
In your case, we can add one test to the 2.4 folder and the other to the 3.0_3.1_3.2_3.3 folder, or we can simplify it by adding a single test to the scala folder with a condition on the version.

We use the version folders mainly for compile problems (different interfaces, methods, or classes that don't exist in a given Spark version, etc.). In your case the function exists and is compatible; the only problem is that it returns one value or another depending on the version.
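(Schematically, the convention reads like this; the folder names are illustrative of the pattern, not exact repository paths:)

src/test/
├── scala/                    tests compiled and run for every Spark version
├── spark_2.4/                tests that apply only to Spark 2.4
└── spark_3.0_3.1_3.2_3.3/    tests that apply to Spark 3.0 through 3.3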

@alfonsorr (Member) commented:

Codecov error, I'm going to launch it again. 😓

@alfonsorr merged commit b77d57f into hablapps:main Jul 12, 2022
@Amalicia deleted the feature/sort-function branch July 12, 2022 07:34
Labels
spark_2.4, spark_3.0, spark_3.1, spark_3.2, spark_3.3 (PR changes to the corresponding Spark version)

Development
Merging this pull request may close issue #70 (Sorting functions).

3 participants