Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-4508] [SQL] build native date type to conform behavior to Hive #3732

Closed
wants to merge 7 commits into from

Conversation

adrian-wang
Copy link
Contributor

Store daysSinceEpoch as an Int value(4 bytes) to represent DateType, instead of using java.sql.Date(8 bytes as Long) in catalyst row. This ensures the same comparison behavior of Hive and Catalyst.
Subsumes #3381
I thinks there are already some tests in JavaSQLSuite, and for python it will not affect python's datetime class.

@SparkQA
Copy link

SparkQA commented Dec 18, 2014

Test build #24576 has started for PR 3732 at commit 2e167a4.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Dec 18, 2014

Test build #24576 has finished for PR 3732 at commit 2e167a4.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • final class Date extends Ordered[Date] with Serializable

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24576/
Test FAILed.

@SparkQA
Copy link

SparkQA commented Dec 18, 2014

Test build #24579 has started for PR 3732 at commit 6ef2b1f.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Dec 18, 2014

Test build #24579 has finished for PR 3732 at commit 6ef2b1f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • final class Date extends Ordered[Date] with Serializable

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24579/
Test PASSed.

@SparkQA
Copy link

SparkQA commented Dec 29, 2014

Test build #24862 has started for PR 3732 at commit 3b4d5d8.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Dec 29, 2014

Test build #24862 has finished for PR 3732 at commit 3b4d5d8.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • final class Date extends Ordered[Date] with Serializable

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24862/
Test FAILed.

@SparkQA
Copy link

SparkQA commented Dec 30, 2014

Test build #24882 has started for PR 3732 at commit d66b01c.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Dec 30, 2014

Test build #24882 has finished for PR 3732 at commit d66b01c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • final class Date extends Ordered[Date] with Serializable

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24882/
Test PASSed.

@marmbrus
Copy link
Contributor

I'm a little confused about what semantics various systems are providing here. Does java.sql.Data provide greater precision that hive? If so, I'm not sure if we want to aim for the lowest common denominator. Can you investigate how other systems handle this?

@SparkQA
Copy link

SparkQA commented Jan 4, 2015

Test build #25022 has started for PR 3732 at commit 2c2bda4.

  • This patch merges cleanly.

@adrian-wang
Copy link
Contributor Author

Hi @marmbrus I have tried MySQL, and MySQL will treat cast('2015-01-03 18:25:04' as date) == cast('2015-01-03 18:29:02' as date), there is such a precision lost.

@adrian-wang
Copy link
Contributor Author

The code seems conflict now, I have just rebased

@SparkQA
Copy link

SparkQA commented Jan 4, 2015

Test build #25022 has finished for PR 3732 at commit 2c2bda4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • final class Date extends Ordered[Date] with Serializable

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25022/
Test PASSed.

@adrian-wang
Copy link
Contributor Author

and MySQL doesn't allow to cast date back to timestamp type

@adrian-wang
Copy link
Contributor Author

For oracle's doc about java.sql.Date, In https://docs.oracle.com/javase/6/docs/api/java/sql/Date.html
To conform with the definition of SQL DATE, the millisecond values wrapped by a java.sql.Date instance must be 'normalized' by setting the hours, minutes, seconds, and milliseconds to zero in the particular time zone with which the instance is associated.

For SQL-92 spec, http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt
In page 111,
2) For a ,

        Case:

        a) If DATE is specified, then the data type contains the <date-
          time field>s years, months, and days.

The spec also clarifies that the definition is taken from ISO-8601[http://www.w3.org/TR/NOTE-datetime], in which it says,
Complete date:
YYYY-MM-DD (eg 1997-07-16)

Hence, the time part of java.sql.Date is not necessary. I think that simply results from the inheritance of the same base type, namely java.util.Date, with java.sql.Timestamp

@marmbrus
Copy link
Contributor

Hmm, okay. Thanks for doing the research! Can you fix the conflict so we can merge please?

/cc @rxin more api considerations...

@rxin
Copy link
Contributor

rxin commented Jan 11, 2015

Actually can we merge this after we merge #3958 ?

So @adrian-wang you will likely need to do a slightly larger rebase (should still be straightforward though).

@adrian-wang
Copy link
Contributor Author

Sure, I'll wait #3958 till it is merged.

@rxin
Copy link
Contributor

rxin commented Jan 14, 2015

@adrian-wang #3958 has been merged. Can you bring this PR up to date?

@SparkQA
Copy link

SparkQA commented Jan 14, 2015

Test build #25525 has started for PR 3732 at commit 810c8c3.

  • This patch merges cleanly.

@adrian-wang
Copy link
Contributor Author

Yes, I have just done the rebasing.

@SparkQA
Copy link

SparkQA commented Jan 28, 2015

Test build #26213 has started for PR 3732 at commit c37832b.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Jan 28, 2015

Test build #26215 has started for PR 3732 at commit a2fdd4e.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Jan 28, 2015

Test build #26213 has finished for PR 3732 at commit c37832b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26213/
Test FAILed.

@SparkQA
Copy link

SparkQA commented Jan 28, 2015

Test build #26215 has finished for PR 3732 at commit a2fdd4e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26215/
Test FAILed.

@SparkQA
Copy link

SparkQA commented Jan 28, 2015

Test build #26216 has started for PR 3732 at commit 0ed0fdc.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Jan 28, 2015

Test build #26210 has finished for PR 3732 at commit f0005b1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26210/
Test PASSed.

@SparkQA
Copy link

SparkQA commented Jan 28, 2015

Test build #26216 has finished for PR 3732 at commit 0ed0fdc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26216/
Test PASSed.

@marmbrus
Copy link
Contributor

marmbrus commented Feb 2, 2015

Thanks! Merging to master.

@asfgit asfgit closed this in 1646f89 Feb 2, 2015
asfgit pushed a commit that referenced this pull request Feb 3, 2015
The previous #3732 is reverted due to some test failure.
Have fixed that.

Author: Daoyuan Wang <daoyuan.wang@intel.com>

Closes #4325 from adrian-wang/datenative and squashes the following commits:

096e20d [Daoyuan Wang] fix for mixed timezone
0ed0fdc [Daoyuan Wang] fix test data
a2fdd4e [Daoyuan Wang] getDate
c37832b [Daoyuan Wang] row to catalyst
f0005b1 [Daoyuan Wang] add date in sql parser and java type conversion
024c9a6 [Daoyuan Wang] clean some import order
d6715fc [Daoyuan Wang] refactoring Date as Primitive Int internally
374abd5 [Daoyuan Wang] spark native date type support
asfgit pushed a commit that referenced this pull request Feb 3, 2015
The previous #3732 is reverted due to some test failure.
Have fixed that.

Author: Daoyuan Wang <daoyuan.wang@intel.com>

Closes #4325 from adrian-wang/datenative and squashes the following commits:

096e20d [Daoyuan Wang] fix for mixed timezone
0ed0fdc [Daoyuan Wang] fix test data
a2fdd4e [Daoyuan Wang] getDate
c37832b [Daoyuan Wang] row to catalyst
f0005b1 [Daoyuan Wang] add date in sql parser and java type conversion
024c9a6 [Daoyuan Wang] clean some import order
d6715fc [Daoyuan Wang] refactoring Date as Primitive Int internally
374abd5 [Daoyuan Wang] spark native date type support

(cherry picked from commit db821ed)
Signed-off-by: Michael Armbrust <michael@databricks.com>
guavuslabs-builder pushed a commit to ThalesGroup/spark that referenced this pull request Mar 23, 2015
This PR might have some issues with apache#3732 ,
and this would have merge conflicts with apache#3820 so the review can be delayed till that 2 were merged.

Author: Daoyuan Wang <daoyuan.wang@intel.com>

Closes apache#3822 from adrian-wang/parquetdate and squashes the following commits:

2c5d54d [Daoyuan Wang] add a test case
faef887 [Daoyuan Wang] parquet support for primitive date
97e9080 [Daoyuan Wang] parquet support for date type
asfgit pushed a commit that referenced this pull request Mar 23, 2015
This PR might have some issues with #3732 ,
and this would have merge conflicts with #3820 so the review can be delayed till that 2 were merged.

Author: Daoyuan Wang <daoyuan.wang@intel.com>

Closes #3822 from adrian-wang/parquetdate and squashes the following commits:

2c5d54d [Daoyuan Wang] add a test case
faef887 [Daoyuan Wang] parquet support for primitive date
97e9080 [Daoyuan Wang] parquet support for date type

(cherry picked from commit 4659468)
Signed-off-by: Cheng Lian <lian@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants