[SPARK-32133][SQL] Forbid time field steps for date start/end in Sequence #28926
Conversation
@@ -2612,6 +2614,9 @@ object Sequence {
    val stepDays = step.days
    val stepMicros = step.microseconds

    require(scale != MICROS_PER_DAY || stepMonths != 0 || stepDays != 0,
Add scale constraint here to DateType
Follow: #28856
@@ -2612,6 +2614,9 @@ object Sequence {
    val stepDays = step.days
    val stepMicros = step.microseconds

    require(scale != MICROS_PER_DAY || stepMonths != 0 || stepDays != 0,
      "sequence step must be a day interval if start and end values are dates")

    if (stepMonths == 0 && stepMicros == 0 && scale == MICROS_PER_DAY) {
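For readers outside the diff view, the check above can be exercised in isolation. This is a minimal standalone sketch, not the actual Spark class; the value of `MICROS_PER_DAY` and the method shape are assumptions taken from the hunk:

```scala
// Standalone sketch of the eval-time validation from the hunk above.
// MICROS_PER_DAY is assumed to be the usual 24 hours in microseconds.
val MICROS_PER_DAY: Long = 24L * 60 * 60 * 1000 * 1000

// scale == MICROS_PER_DAY means start/end are dates; a step with no month
// and no day component is then rejected with an IllegalArgumentException.
def validateStep(scale: Long, stepMonths: Int, stepDays: Int): Unit =
  require(scale != MICROS_PER_DAY || stepMonths != 0 || stepDays != 0,
    "sequence step must be a day interval if start and end values are dates")
```

A day step on dates and any step on timestamps pass; a pure time-field step on dates fails the `require`.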
we can remove this branch now?
It seems we need the require check in eval, and I removed the SPARK-32198 branch code from this PR. Is that OK?
@TJX2014. What is SPARK-32198? It's not created yet.
I remove SPARK-32198 branch code from this PR
Also, I have the same question as @cloud-fan.
@dongjoon-hyun Sorry, the correct JIRA is SPARK-31982. This PR forbids the time field step, and the other one handles crossing the timezone.
    val startMicros: Long = num.toLong(start) * scale
    val stopMicros: Long = num.toLong(stop) * scale

    // Date to timestamp is not equal from GMT and Chicago timezones
can we exclude these changes and make this PR smaller to only focus on "forbid time field steps for date start/end"?
@cloud-fan Thanks, I have followed the suggestion and made a new JIRA ticket for this.
Force-pushed from e6ccfec to 7021539.
@@ -2679,6 +2682,11 @@ object Sequence {
      |final int $stepDays = $step.days;
      |final long $stepMicros = $step.microseconds;
      |
      |if (${scale}L == ${MICROS_PER_DAY}L && $stepMonths == 0 && $stepDays == 0) {
The scale is known before generating the code. We can do better:
val check = if (scale == MICROS_PER_DAY) {
s"""
if ($stepMonths == 0 && $stepDays == 0) ...
"""
} else {
""
}
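The suggestion above can be sketched end to end. `genStepCheck` and its parameters are illustrative names, not the actual Spark codegen API:

```scala
// Sketch of the suggestion: `scale` is a plain Scala value at codegen time,
// so the guard is emitted only for the date case instead of being tested
// inside the generated Java. Names here are illustrative.
val MICROS_PER_DAY: Long = 86400000000L

def genStepCheck(scale: Long, stepMonths: String, stepDays: String): String =
  if (scale == MICROS_PER_DAY) {
    s"""
       |if ($stepMonths == 0 && $stepDays == 0) {
       |  throw new IllegalArgumentException(
       |    "sequence step must be a day interval if start and end values are dates");
       |}
       |""".stripMargin
  } else {
    "" // timestamp case: no check is needed in the generated code
  }
```

The generated Java then contains no dead branch: timestamp sequences pay nothing for the date-only check.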
@@ -2612,6 +2612,9 @@ object Sequence {
    val stepDays = step.days
    val stepMicros = step.microseconds

    require(scale != MICROS_PER_DAY || stepMonths != 0 || stepDays != 0,
      "sequence step must be a day interval if start and end values are dates")
let's use if (...) throw new IllegalArgumentException to be more consistent with the codegen version.
  test("SPARK-32133: Sequence step must be a day interval " +
    "if start and end values are dates") {
    val e = intercept[Exception](
      checkEvaluation(Sequence(
we should use checkExceptionInExpression for error tests.
    checkEvaluation(Sequence(
      Cast(Literal("2011-03-01"), DateType),
      Cast(Literal("2011-03-02"), DateType),
      Option(Literal(stringToInterval("interval 1 day")))),
      Seq(Date.valueOf("2011-03-01"), Date.valueOf("2011-03-02")))
@cloud-fan Thank you for your suggestion, I have done so. Besides, I added a positive test case; I hope this makes it better.
@@ -1854,4 +1854,18 @@ class CollectionExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper
      Literal(stringToInterval("interval 1 year"))),
      Seq(Date.valueOf("2018-01-01")))
  }

  test("SPARK-32133: Sequence step must be a day interval " +
instead of adding a new test, shall we just put the new negative test in the existing test case Sequence of dates? We can point to the JIRA number in the code comment.
Thanks, I have moved the negative test to Sequence of dates and pointed to the JIRA number in the code comment.
ok to test
retest this please
Test build #124752 has finished for PR 28926 at commit
retest this please
Test build #124759 has finished for PR 28926 at commit
Test build #5049 has finished for PR 28926 at commit
retest this please
Test build #124791 has finished for PR 28926 at commit
@@ -2612,6 +2612,11 @@ object Sequence {
    val stepDays = step.days
    val stepMicros = step.microseconds

    if(scale == MICROS_PER_DAY && stepMonths == 0 && stepDays == 0) {
    val check = if (scale == MICROS_PER_DAY) {
      s"""
        if ($stepMonths == 0 && $stepDays == 0) {
          throw new IllegalArgumentException(
indentation?
    val check = if (scale == MICROS_PER_DAY) {
      s"""
        if ($stepMonths == 0 && $stepDays == 0) {
          throw new IllegalArgumentException(
nit: for generated code, we shall use the multiline string syntax:

  s"""
    |if ...
    |""".stripMargin
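To make the nit concrete, here is a tiny self-contained example of the convention; the snippet content and variable names are illustrative:

```scala
// stripMargin removes leading whitespace up to and including the `|` on
// each line, so the generated code keeps its own indentation regardless of
// how the surrounding Scala source is indented.
val stepMonths = "stepMonths_0" // illustrative codegen variable name
val stepDays = "stepDays_0"
val generated =
  s"""
     |if ($stepMonths == 0 && $stepDays == 0) {
     |  // ...
     |}
     |""".stripMargin
```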
Test build #124896 has finished for PR 28926 at commit
      throw new IllegalArgumentException(
        "sequence step must be a day interval if start and end values are dates")
    }

    if (stepMonths == 0 && stepMicros == 0 && scale == MICROS_PER_DAY) {
We can probably add comments for each branch. For example, this branch is for adding days to date start/end.
Done.
      // Adding pure days to date start/end
      backedSequenceImpl.eval(start, stop, fromLong(stepDays))

    } else if (stepMonths == 0 && stepDays == 0 && scale == 1) {
      // Adding pure microseconds to timestamp start/end
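The branch selection these comments describe can be summarized in a small sketch; the function name and labels are illustrative, not Spark internals:

```scala
// Sketch of how eval dispatches on the interval shape and the scale.
// scale == MICROS_PER_DAY marks date inputs, scale == 1 marks timestamps.
val MICROS_PER_DAY: Long = 86400000000L

def chooseBranch(stepMonths: Int, stepDays: Int, stepMicros: Long, scale: Long): String =
  if (stepMonths == 0 && stepMicros == 0 && scale == MICROS_PER_DAY) {
    "pure-days-on-dates"        // add whole days to date start/end
  } else if (stepMonths == 0 && stepDays == 0 && scale == 1) {
    "pure-micros-on-timestamps" // add microseconds to timestamp start/end
  } else {
    "month-bearing-interval"    // fall back to month-aware arithmetic
  }
```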
@cloud-fan Thanks, I added more comments for the pure-days and pure-months branches. The earlier exception check already seems to carry enough information in its message, and the last branch already has a detailed comment.
Test build #125059 has finished for PR 28926 at commit
retest this please
Test build #125097 has finished for PR 28926 at commit
Retest this please.
Test build #125231 has finished for PR 28926 at commit
Retest this please.
Test build #125242 has finished for PR 28926 at commit
retest this please
Test build #125333 has finished for PR 28926 at commit
retest this please
Test build #125358 has finished for PR 28926 at commit
### What changes were proposed in this pull request?

This PR aims to disable SBT `unidoc` generation testing in the Jenkins environment because it's flaky there and not used for the official documentation generation. Also, GitHub Action has the correct test coverage for the official documentation generation.

- #28848 (comment) (amp-jenkins-worker-06)
- #28926 (comment) (amp-jenkins-worker-06)
- #28969 (comment) (amp-jenkins-worker-06)
- #28975 (comment) (amp-jenkins-worker-05)
- #28986 (comment) (amp-jenkins-worker-05)
- #28992 (comment) (amp-jenkins-worker-06)
- #28993 (comment) (amp-jenkins-worker-05)
- #28999 (comment) (amp-jenkins-worker-04)
- #29010 (comment) (amp-jenkins-worker-03)
- #29013 (comment) (amp-jenkins-worker-04)
- #29016 (comment) (amp-jenkins-worker-05)
- #29025 (comment) (amp-jenkins-worker-04)
- #29042 (comment) (amp-jenkins-worker-03)

### Why are the changes needed?

Apache Spark `release-build.sh` generates the official document by using the following command.

- https://github.com/apache/spark/blob/master/dev/create-release/release-build.sh#L341

```bash
PRODUCTION=1 RELEASE_VERSION="$SPARK_VERSION" jekyll build
```

And this is executed by the following `unidoc` command for the Scala/Java API doc.

- https://github.com/apache/spark/blob/master/docs/_plugins/copy_api_dirs.rb#L30

```ruby
system("build/sbt -Pkinesis-asl clean compile unidoc") || raise("Unidoc generation failed")
```

However, the PR builder disabled `Jekyll build` and instead has a different test coverage.

```python
# determine if docs were changed and if we're inside the amplab environment
# note - the below commented out until *all* Jenkins workers can get `jekyll` installed
# if "DOCS" in changed_modules and test_env == "amplab_jenkins":
#     build_spark_documentation()
```

```
Building Unidoc API Documentation
========================================================================
[info] Building Spark unidoc using SBT with these arguments: -Phadoop-3.2 -Phive-2.3 -Pspark-ganglia-lgpl -Pkubernetes -Pmesos -Phadoop-cloud -Phive -Phive-thriftserver -Pkinesis-asl -Pyarn unidoc
```

### Does this PR introduce _any_ user-facing change?

No. (This is used only for testing and not used in the official doc generation.)

### How was this patch tested?

Pass the Jenkins without doc generation invocation.

Closes #29017 from dongjoon-hyun/SPARK-DOC-GEN.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Retest this please.
Test build #125419 has finished for PR 28926 at commit
retest this please
Test build #125454 has finished for PR 28926 at commit
retest this please
Test build #125487 has started for PR 28926 at commit
Retest this please
Test build #125506 has finished for PR 28926 at commit
retest this please
Test build #125577 has finished for PR 28926 at commit
+1, LGTM. I tested PySpark/SparkR locally. Merged to master. Thank you, @TJX2014 and all!
Tests passed in 869 seconds
OK: 2306
Failed: 0
Warnings: 0
Skipped: 1
Thank you very much.
What changes were proposed in this pull request?
1. Add a time field step check for date start/end in Sequence at
org.apache.spark.sql.catalyst.expressions.Sequence.TemporalSequenceImpl
2. Add a UT:
SPARK-32133: Sequence step must be a day interval if start and end values are dates
at org.apache.spark.sql.catalyst.expressions.CollectionExpressionsSuite
Why are the changes needed?
Sequence time field steps for date start/end look strange in Spark as follows:
While this behavior in Presto makes sense:
Does this PR introduce any user-facing change?
Yes, after this patch, users will get the message
sequence step must be a day interval if start and end values are dates
when they use time field steps for date start/end in Sequence.
How was this patch tested?
Unit test.