Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-3923: [Java] JDBC Time Fetches Without Timezone #3066

Closed
wants to merge 10 commits into from
Closed

ARROW-3923: [Java] JDBC Time Fetches Without Timezone #3066

wants to merge 10 commits into from

Conversation

mikepigott
Copy link
Contributor

@mikepigott mikepigott commented Dec 1, 2018

https://issues.apache.org/jira/browse/ARROW-3923

Hello! I was reading through the JDBC source code and I noticed that a java.util.Calendar was required for creating an Arrow Schema and Arrow Vectors from a JDBC ResultSet, when none is required.

This change makes the Calendar optional.

Unit Tests:
The existing SureFire plugin configuration uses a UTC calendar for the database, which is the default Calendar in the existing code. Likewise, no changes to the unit tests are required to provide adequate coverage for the change.

@xhochy xhochy changed the title [ARROW-3923] JDBC Time Fetches Without Timezone ARROW-3923: JDBC Time Fetches Without Timezone Dec 1, 2018
@xhochy xhochy changed the title ARROW-3923: JDBC Time Fetches Without Timezone ARROW-3923: [Java] JDBC Time Fetches Without Timezone Dec 1, 2018
@xhochy
Copy link
Member

xhochy commented Dec 5, 2018

@jacques-n @siddharthteotia @BryanCutler Can you have a look a this?

@mikepigott
Copy link
Contributor Author

Hi, I just posted #3133 which adds a configuration class to the conversion process. If you want, I can incorporate that change here, to make this a non-breaking change.

Bring master up to date.
@BryanCutler
Copy link
Member

I'm not too familiar with this module, let's ping the original author @atuldambalkar and @laurentgo to possibly take a look

@wesm
Copy link
Member

wesm commented Dec 14, 2018

@siddharthteotia @laurentgo @atuldambalkar can you review?

@laurentgo
Copy link
Collaborator

I'll try to give a review asap, but there some danger on relying on the default timezone, especially as Arrow is fairly explicit about timezones (that said, haven't fully reviewed it yet, so might be have addressed...)

@mikepigott
Copy link
Contributor Author

Thanks @laurentgo for taking a look! I was looking through the Arrow code and it looks like there are various variations on timestamp types, and not all of them require a time zone.

The current code forces everything to UTC if a time zone isn't provided, which didn't seem right to me, because the database field itself may not have a time zone attached to it. Likewise, it didn't seem right to enforce a time zone when Arrow didn't appear to need one.

@atuldambalkar
Copy link
Contributor

atuldambalkar commented Jan 24, 2019

Hi @mikepigott
Sorry for not reverting earlier. I looked at all the JIRAs - this one, 3965 and 3966. I understand that the fixes for 3966 which is Pull-3134 pretty much supersede earlier pull requests this one 3066 and 3133. In that case I will only review Pull 3134. I hope this is okay.

@mikepigott
Copy link
Contributor Author

Hi, 3966 supercedes 3965, but this one is independent - if this PR was rejected, I didn't want to impact those two. Thanks for taking a look!

@atuldambalkar
Copy link
Contributor

Hi @mikepigott - I looked at JDBC ResultSet API and I tend to agree with your changes and use the Calendar object if explicitly provided by the caller. Basically, in case of non-availability of Calendar object from the caller, we will be falling back to the way JDBC ResultSet API implements Date/Time related API.

Hi @laurentgo - What's your opinion?

Copy link
Contributor

@praveenbingo praveenbingo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would this be backward incompatible? previously UTC would have been used and not the local calendar..

@mikepigott
Copy link
Contributor Author

@praveenbingo - yes, you are right, this is a backwards-incompatible change. I have a separate PR, #3965 that introduces a configuration object. If that PR is approved before this one, I'm happy to rework this PR to use the configuration object and prevent this from being backwards-incompatible.

Arrow Master 1/29/2019
@praveenbingo
Copy link
Contributor

@mikepigott - Done. Please go ahead with your suggestion. Thanks.

@atuldambalkar
Copy link
Contributor

Sounds good @praveenbingo @mikepigott . I will take a look at PR 3966.

@mikepigott
Copy link
Contributor Author

@praveenbingo: I have pulled in the JdbcToArrowConfig from master and reverted the breaking change. This is ready for your review!

@codecov-io
Copy link

Codecov Report

Merging #3066 into master will increase coverage by 0.77%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3066      +/-   ##
==========================================
+ Coverage   87.77%   88.55%   +0.77%     
==========================================
  Files         659      434     -225     
  Lines       82110    53263   -28847     
  Branches     1069        0    -1069     
==========================================
- Hits        72076    47165   -24911     
+ Misses       9923     6098    -3825     
+ Partials      111        0     -111
Impacted Files Coverage Δ
cpp/src/arrow/csv/reader.h 0% <0%> (-100%) ⬇️
cpp/src/arrow/array/builder_union.h 0% <0%> (-100%) ⬇️
cpp/src/arrow/array/builder_union.cc 4.16% <0%> (-95.84%) ⬇️
cpp/src/arrow/csv/reader.cc 0.53% <0%> (-90.38%) ⬇️
cpp/src/arrow/adapters/orc/adapter.cc 0.26% <0%> (-73.04%) ⬇️
cpp/src/plasma/eviction_policy.cc 54.9% <0%> (-45.1%) ⬇️
cpp/src/plasma/events.cc 52.5% <0%> (-35%) ⬇️
cpp/src/arrow/csv/options.h 66.66% <0%> (-33.34%) ⬇️
cpp/src/plasma/thirdparty/ae/ae.c 41.7% <0%> (-29.39%) ⬇️
cpp/src/parquet/column_scanner.cc 59.09% <0%> (-22.73%) ⬇️
... and 277 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f08e109...4d95da0. Read the comment docs.

Copy link
Contributor

@praveenbingo praveenbingo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good. Thanks @mikepigott

@wesm @siddharthteotia - Could you please help merging.

@xhochy
Copy link
Member

xhochy commented Feb 4, 2019

@siddharthdave If there are no objections from your side, I'll merge this tomorrow.

@siddharthdave
Copy link
Contributor

@xhochy , you probably meant @siddharthteotia :-)

@xhochy xhochy closed this in c93e2ae Feb 5, 2019
@xhochy
Copy link
Member

xhochy commented Feb 5, 2019

Thank you @mikepigott !

xhochy pushed a commit that referenced this pull request Feb 8, 2019
https://issues.apache.org/jira/browse/ARROW-3923

Hello!  I was reading through the JDBC source code and I noticed that a java.util.Calendar was required for creating an Arrow Schema and Arrow Vectors from a JDBC ResultSet, when none is required.

This change makes the Calendar optional.

Unit Tests:
The existing SureFire plugin configuration uses a UTC calendar for the database, which is the default Calendar in the existing code.  Likewise, no changes to the unit tests are required to provide adequate coverage for the change.

Author: Michael Pigott <mikepigott@users.noreply.github.com>
Author: Mike Pigott <mpigott@gmail.com>

Closes #3066 from mikepigott/jdbc-timestamp-no-calendar and squashes the following commits:

4d95da0 <Mike Pigott> ARROW-3923: Supporting a null Calendar in the config, and reverting the breaking change.
cd9a230 <Mike Pigott> Merge branch 'master' into jdbc-timestamp-no-calendar
509a1cc <Michael Pigott> Merge pull request #5 from apache/master
789c8c8 <Michael Pigott> Merge pull request #4 from apache/master
e5b19ee <Michael Pigott> Merge pull request #3 from apache/master
3b17c29 <Michael Pigott> Merge pull request #2 from apache/master
881c6c8 <Michael Pigott> Merge pull request #1 from apache/master
089cff4 <Mike Pigott> Format fixes
a58a4a5 <Mike Pigott> Fixing calendar usage.
e12832a <Mike Pigott> Allowing for timestamps without a time zone.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants