Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Comparison Operators to Date Columns #230

Merged

Conversation

Amalicia
Copy link
Contributor

Description

I've recently been trying to migrate one of our Spark jobs from standard Spark to Doric. However, in doing this migration I noticed that comparison operators on dates currently aren't present in Doric when they are in Spark. This PR adds the comparison operators <, >, <=, >= to the Doric DateColumn.

I have tried to keep things in the same style and as similar in implementation to the NumericOperations class however if any parts don't meet a required format/style please let me know and I will happily change

Related Issue

#229

Motivation and Context

Currently it is not possible to compare dates. For example, should you want to filter a DataFrame for all dates before x date that cannot currently be done using the doric DateColumn

How Has This Been Tested?

  • This pull request contains appropriate tests?

@Amalicia Amalicia requested a review from a team as a code owner June 16, 2022 14:52
@github-actions github-actions bot added spark_2.4 PR changes to spark 2.4 spark_3.0 PR changes to spark 3.0 spark_3.1 PR changes to spark 3.1 spark_3.2 PR changes to spark 3.2 labels Jun 16, 2022
@codecov
Copy link

codecov bot commented Jun 16, 2022

Codecov Report

Merging #230 (551fdfb) into main (091998e) will increase coverage by 0.02%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #230      +/-   ##
==========================================
+ Coverage   96.37%   96.39%   +0.02%     
==========================================
  Files          53       53              
  Lines         827      831       +4     
  Branches       12       10       -2     
==========================================
+ Hits          797      801       +4     
  Misses         30       30              
Flag Coverage Δ
spark-2.4.x 93.62% <100.00%> (+0.03%) ⬆️
spark-3.0.x 95.31% <100.00%> (+0.02%) ⬆️
spark-3.1.x 96.42% <100.00%> (+0.02%) ⬆️
spark-3.2.x 96.42% <100.00%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
core/src/main/scala/doric/syntax/DateColumns.scala 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 091998e...551fdfb. Read the comment docs.

Copy link
Member

@alfonsorr alfonsorr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job, only a couple of things to make it spark more ✨

core/src/test/scala/doric/syntax/DateColumnsSpec.scala Outdated Show resolved Hide resolved
df.show()

test[Date]("date")
test[LocalDate]("localDate")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm seeing that the test is failing for spark 2.4, split this test in two, one for Date and Timestamp as you already have, and the ones for LocalDate and Instant have to go on the spark_3.0_3.1_3.2 folder. We have a few examples doing it, but any doubt let us know! Also, if you want to try against 2.4 on your local computer, launching sbt -DsparkVersion=2.4 or if you use IntelliJ, in sbt settings put the spark version you want to test against:
Captura de Pantalla 2022-06-16 a las 19 03 03

We will have to improve our documentation for new contributors 😅

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😱 I hadn't even thought I might need to run the tests differently for the different spark versions. Thanks for the heads up!

I believe I've done what you said, I just have one question on your preferences: I implemented the trait DateOperationsSpec which contains the testing logic for the comparison operators. We could reimplement this in the spark_3.0_3.1_3.2 folder, however it would look identical. This has me thinking that maybe I could extract this trait out to something along the lines of DateOperationsFixture and create a new folder called fixtures. We could then use DateOperationsFixture in both the 2.4.x tests and the 3.x tests. Or of course, we could just leave the trait DateOperationsSpec and import it in DateColumns3xSpec. Do you have any thoughts or preferences on this? 😄

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe implement the common logic in a trait, and extend it in the base scala folder(it runs for all the versions) for Date and Timestamp, and for the 3.x versions extend again the trait and only call for LocalDate and instant (yeah is a mess, but is the simplest way we found).

As you can imagine, the scala version in src and these execute in all versions and it is reachable from the rest of the specialized folders. And then we have the specialized folders that contain the name with what versions are going to be executed. Sbt checks the version and gets the folders it needs to build it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've hopefully addressed this in 551fdfb. Please let me know if you have any further comments or feedback! 😄

@alfonsorr
Copy link
Member

Lets give a chance to the test to talk, but looks very good.

@alfonsorr
Copy link
Member

alfonsorr commented Jun 17, 2022

Thanks again for the million times. If you are interested in any other collaboration we are open to all the help, ideas and suggestions.
We expect to make a release next week with this pr and a couple more 😄

@Amalicia
Copy link
Contributor Author

Of course! This has been a lot of fun so I'd definitely be interested in helping/contributing in any way I can!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
spark_2.4 PR changes to spark 2.4 spark_3.0 PR changes to spark 3.0 spark_3.1 PR changes to spark 3.1 spark_3.2 PR changes to spark 3.2
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature request]: Add Comparison Operators to DateColumn [Feature request]: Time comparators
2 participants