Support for R Dates and POSIXct but no support for handling timezone support #35
…of data frames now)
* requires FileIO rdata_single branch to be pulled https://github.com/jsams/FileIO.jl/tree/rdata_single/
According to this StackOverflow answer, the three-letter abbreviations are only used for printing, but
src/convert.jl
Outdated
if hasnames(rv)
if class(rv) == ["Date"]
return date2julia(rv, hasna, nas)
elseif class(rv) == ["POSIXct"; "POSIXt"]
`;` is a concatenation operator, should be just `,`.
Also, since you are reusing `["Date"]` and `["POSIXct", ...]` constants here and in `date[time]2julia()`, it makes sense to declare them as constants (`const Date_class`, `POSIXct_class`).
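For illustration, a minimal sketch of the suggestion above. The constant names follow the review comment; `is_r_date` and `is_r_datetime` are hypothetical helpers, and a plain vector of strings stands in for whatever `class(rv)` returns.

```julia
# Class attribute values R uses for Date and POSIXct vectors,
# declared once so every comparison refers to the same constant.
const Date_class = ["Date"]
const POSIXct_class = ["POSIXct", "POSIXt"]

# Hypothetical helpers; `cls` stands in for the result of class(rv).
is_r_date(cls::Vector{String}) = cls == Date_class
is_r_datetime(cls::Vector{String}) = cls == POSIXct_class

# Inside brackets `;` concatenates vertically, so for plain strings it
# builds the same vector as `,`; the comma form is the usual literal.
same_vector = ["POSIXct"; "POSIXt"] == POSIXct_class
```

Both spellings compare equal here, so the change is about idiom and about keeping the class names in one place, not about behavior.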
Thanks for your work!
You're right, it's not ideal :) For 2 reasons: 1) it's harder to review, 2) I cannot merge it since it clashes with #33.
RCall has some tests for datetime.
On timezones, it's not clear to me what the behavior should be if I try to do that. There are three scenarios:
(Despite the Stack Overflow question @nalimilan referred to, all the data I have on me has the three-letter abbreviations in tzone, though it does also generate more reliable strings when given enough information at constructor time.) Because they are so much easier to work with, I think we should prefer DateTimes over ZonedDateTimes where possible. So, in my mind, the correct behavior would be
Given my experience, I think 1. is the correct behavior, as it would allow correct round-tripping between R and Julia if Julia were to also write out in GMT but without specifying a timezone. Anything else I think would produce issues. I think 3. is about the best we can do, and the lack of a ZonedDateTime lets the user know we aren't taking a strong stance on the accuracy of the time zone. But I also see that 2 could just as easily be:
Unfortunately, anything that plays with localtime will be a pain to test, as the results will depend on what timezone the machine doing the test is in. The test would basically just duplicate the timezone-converting code in the function, which kind of defeats the point of testing.
Alternatively, we can just return ZonedDateTime in all instances, and if the user wants it in localtime as a DateTime, they can convert post-import. Here, we would still treat scenario 1 as GMT and scenario 3 as localtime with a warning. This is much easier to implement and test, and I think would make the code much less complex.
In general we should try to keep the imported data intact. What about the following plan:
The default
I'm very reluctant to discard any timezone information silently, as it can give completely incorrect results. I think a good rule would be to adopt exactly the same behavior as R. That is, we should interpret time zone codes just as it would on conversion, so that further operations in Julia will give the same behavior as in R. According to
I've pushed a commit that returns ZonedDateTimes in all instances, makes a best-effort attempt to use the available timezone information, and falls back to UTC otherwise. As TimeZones.jl improves, the guesses of the TimeZone should as well. I think kludging together our own DateTime type is less useful than warning and providing a type that already has a reasonably rich set of functions for converting between time zones.
Also, I don't have time to do all of that. I need to get back to working on my dissertation full time. I've tried to craft these PRs to be useful for you and your package and to get them to your standards. But I don't have time to chase after moving goalposts. Thanks for the guidance and for writing the package in the first place!
My goal wasn't to discourage you. It's fine to start with limited support for time zones, as long as the warning makes it clear to users that something may not work as they expect. We can always make the system more complex later when somebody needs it.
I'll see if I can update this to the latest master as well later this evening. Thanks for pulling in the rds stuff!
Thanks for resuming this PR!
Overall, it looks fine, except for a few rearrangements (`jlvec()`) and avoiding `DataArrays`.
src/convert.jl
Outdated
@@ -1,6 +1,9 @@
# converters from selected RSEXPREC to Hash
# They are used to translate SEXPREC attributes into Hash

import TimeZones: unix2zdt
import DataArrays: @data
We don't need `DataArray`. Currently `RData` doesn't require `DataArrays`.
`Vector{Union{Data, Missing}}` would do just fine.
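A rough sketch of what a `Missing`-based vector could look like for dates, using only the `Dates` standard library. `days_to_dates` and `R_EPOCH` are hypothetical names, and the sketch assumes missing values show up as `NaN` once the raw day counts are read as `Float64`; it is not the code from this PR.

```julia
using Dates

# R stores a Date as a floating-point count of days since 1970-01-01.
const R_EPOCH = Date(1970, 1, 1)

# Hypothetical helper. The typed comprehension forces the element type to
# Union{Date, Missing} even when no value happens to be missing.
function days_to_dates(days::Vector{Float64})
    Union{Date, Missing}[isnan(d) ? missing : R_EPOCH + Day(floor(Int, d)) for d in days]
end

dates = days_to_dates([17167.0, NaN])
```

No extra package is needed for this; `ismissing` and `skipmissing` from Base work on the result.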
src/convert.jl
Outdated
@@ -98,8 +101,11 @@ function sexp2julia(rv::RVEC)
# TODO dimnames?
# FIXME add force_missing option to control whether always convert to Union{T, Missing}
jv = jlvec(rv, false)
if hasnames(rv)
# if data has no NA, convert to simple Vector
if class(rv) == R_Date_Class
In the current design `jlvec()` is supposed to do the job of converting R vectors to Julia ones.
So `class(rv)` checking should go to `jlvec(rv::RVEC)`, `date2julia()` should become `jlvec(Dates.Date, rv::RNullableVector{R}, force_missing::Bool=true)` and `datetime2julia()` -> `jlvec(DateTime, ...)`.
The benefit is that you don't have to replicate the support of vector names or scalar conversion (which should only be done for top-level objects or for list elements, but not for dataframes).
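The arrangement described above might be sketched roughly as follows. `FakeRVec` is a hypothetical stand-in for RData's vector types, and the method bodies are illustrative only; the point is that class checking sits in one entry point and each target type gets its own `jlvec` method.

```julia
using Dates

# Hypothetical stand-in for an R vector: raw values plus the class attribute.
struct FakeRVec
    data::Vector{Float64}
    class::Vector{String}
end

# One method per target type; the first argument selects the conversion.
jlvec(::Type{Date}, rv::FakeRVec) = [Date(1970, 1, 1) + Day(Int(d)) for d in rv.data]
jlvec(::Type{DateTime}, rv::FakeRVec) = Dates.unix2datetime.(rv.data)
jlvec(::Type{Float64}, rv::FakeRVec) = rv.data

# Single entry point that inspects the class and dispatches.
function jlvec(rv::FakeRVec)
    if rv.class == ["Date"]
        jlvec(Date, rv)
    elseif rv.class == ["POSIXct", "POSIXt"]
        jlvec(DateTime, rv)
    else
        jlvec(Float64, rv)
    end
end

converted = jlvec(FakeRVec([0.0, 17167.0], ["Date"]))
```

With this shape, the caller can apply names or scalar unwrapping to whatever `jlvec` returns without knowing which conversion ran.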
src/convert.jl
Outdated
@@ -128,3 +134,53 @@ function sexp2julia(rl::RList)
map(sexp2julia, rl.data)
end
end

function date2julia(rv)
`jlvec(Date, ...)` as noted above
src/convert.jl
Outdated
ZonedDateTime(Dates.unix2datetime(seconds), tz, from_utc=true)
end

function datetime2julia(rv)
`jlvec(DateTime, ...)` as noted above
src/convert.jl
Outdated
else
dates = Dates.epochdays2date.(rv.data .+ epoch_conv)
end
if hasnames(rv)
Just return `dates`. Any further processing should be handled by `sexp2julia()`
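As a sketch of the conversion step on its own, the offset can be derived from `Dates` instead of being written out as a literal. The name `epoch_conv` appears in the diff, but how the PR defines it is not shown here, so the definition below is an assumption; `rdays2date` is used as a plain helper that returns dates and nothing else.

```julia
using Dates

# Offset between the day count used by Dates.epochdays2date and R's
# epoch (1970-01-01), derived rather than hard-coded (assumed definition).
const epoch_conv = Dates.date2epochdays(Date(1970, 1, 1))

# Take R day counts and return Julia Dates; names and scalar handling
# are left to the caller.
rdays2date(days::AbstractVector{<:Integer}) = Dates.epochdays2date.(days .+ epoch_conv)

dates = rdays2date([0, 17167])
```

Deriving the constant this way also makes it easy to compare against a magic number used elsewhere.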
end

function unix2zdt(seconds::Real; tz::TimeZone=tz"UTC")
ZonedDateTime(Dates.unix2datetime(seconds), tz, from_utc=true)
What if `tz != tz"UTC"`, is `from_utc` still correct?
Yes. The tests ensure that the time and timezone is preserved for a non-UTC timezone.
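One way to see why: a POSIXct value is a count of seconds since 1970-01-01 in UTC, so the decoded `DateTime` is a UTC instant regardless of the zone attached to it, and the zone only changes the wall-clock reading. The sketch below uses only `Dates`, with a fixed `Hour(-6)` offset as a hypothetical stand-in for a real time zone (real zones also have DST rules, which this ignores).

```julia
using Dates

seconds = 1483305780                      # 2017-01-01T21:23:00 UTC

# Decoding the raw seconds always yields the UTC instant.
utc_time = Dates.unix2datetime(seconds)

# Hypothetical fixed offset standing in for a zone that is 6 hours behind UTC.
offset = Hour(-6)

# The wall-clock reading in that zone is the same instant, shifted.
wall_time = utc_time + offset
```

Treating the decoded value as local time instead would shift the instant by the offset, which is the failure mode the question is guarding against.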
this method definition is a duplicate of that imported from TimeZones
test/RDS.jl
Outdated
@@ -1,7 +1,9 @@
module TestRDS
using Base.Test
using DataArrays
No need for `DataArrays`.
test/RDS.jl
Outdated
@@ -42,5 +44,59 @@ module TestRDS
@test eltypes(rdf_decomp) == eltypes(df)
@test isequal(rdf_decomp, df)
end

@testset "Test Date conversion" begin
I think it's ok to have extensive date/datetime tests in the RDS testset, but we also need to check that conversion of Date/DateTime columns is supported (I guess with the current code it would fail for 1-row dataframes). Could you please add these to `RDA.jl`?
NEWS.md
Outdated
@@ -2,9 +2,11 @@

##### Changes
* add support for `.rds` files (single object data files from R) [#22], [#33]
* add support for `Date` and `POSIXct`, though still lacking complete timezone handling [#34]
So it would fail if R has some exotic timezone?
Well, it fails for e.g. 3-letter timezones, as I documented before. And generally, I'm outsourcing all timezone handling to TimeZones.jl. Anything they don't handle, I don't. Further, anything can be put in that attribute (which is after all just a string); so, it's entirely possible that R allows things that would be difficult to know what to do with.
In short, I'm not fully replicating R's behavior.
test/RDS.jl
Outdated
@test datetimes[1] == ZonedDateTime(DateTime("2017-01-01T21:23"), tz"UTC")
# tz"CST" is invalid, but if TimeZones ever enables support for these 3
# letter codes, a test would be useful. For now, intentionally not testing
#@test_broken datetimes[2] == ZonedDateTime(DateTime("2017-01-01T13:23"), tz"CST")
I think we need to test that unsupported timezones behave as we expect
That is why I have the `@test_warn` when loading the file. It warns that things may not be working as expected. That said, I couldn't get the `@test_broken` to actually work the way I thought it was supposed to. If you have better advice here, I'll try it out. Will comment again when I have made your suggested changes.
* refactor to use jlvec
* replace try block with key lookup
* remove DataArray dependency
* factor date conversion to rdays2date
* add test for date and datetimes in data frames, including single row
Ok, made the requested changes and added dataframe checks for both single and multi-row dataframes.
Addendum: made another little refactoring that just occurred to me makes sense. It's not clear what to do when R does not specify any timezone. Right now, I'm assuming UTC and not warning. It might be reasonable to want to warn about this? I'll leave that decision to you as to how noisy and pedantic you want your library to be.
allow specifying fallback (default) timezone
@jsams Thanks for the updates! I've made some small changes, I hope you are fine with them.
Yea, I don't have any problems with those changes. Thanks!
Branched off my rds branch, which maybe isn't ideal for you, but hopefully it works out.
It seems like supporting timezones is not possible because R uses an ambiguous string format. It doesn't seem as if RCall is doing anything about timezones either. I am a little concerned that RCall's magic constant for converting dates is different from mine, but you can see where mine comes from and is producing correct results.
except for the timezone issue, should close #34