POSIXct -> nanotime -> POSIXct consistency #115
Can we back off here a little? What is [...]? Nobody claims [...].
Hi Eddel. I agree: information will always be lost in conversion, but you could make it less lossy by using the digits beyond µs precision. For example:

```r
> nanotime::as.nanotime(bit64::as.integer64(as.numeric(ts)*1E9))
[1] 2023-06-03 23:25:15.557356032
> as.POSIXct(nanotime::as.nanotime(bit64::as.integer64(as.numeric(ts)*1E9))) == ts
[1] TRUE
```

Here I generate some random timestamps and compare both conversions:

```r
> tmp <- as.POSIXct(replicate(100000, utils$time$now()), origin = '1970-01-01')
> length(unique(tmp))
[1] 100000
> mean(tmp == as.POSIXct(nanotime::as.nanotime(bit64::as.integer64(as.numeric(tmp)*1E9))))
[1] 0.9317
> mean(tmp == as.POSIXct(nanotime::as.nanotime(tmp)))
[1] 0.23126
```

As it is now, only 23 % of the timestamps round-trip consistently, while the `as.numeric(tmp)*1E9` conversion above gives 93 %.
Here is a good explanation of the issue, and a way to solve it. I didn't find an equivalent in base R, so:

```r
Rcpp::cppFunction('
List modf(double x) {
  double ipart;
  double fraction = modf(x, &ipart);
  List result = List::create(_["int"] = ipart, _["fraction"] = fraction);
  return result;
}
')

splt <- data.table::data.table(ts = tmp)
splt <- cbind(splt, data.table::rbindlist(lapply(as.numeric(splt$ts), modf)))
splt[, int64 := bit64::as.integer64(int)*1E9 + round(fraction*1E9)]
splt[, ntime := nanotime::as.nanotime(int64)]
splt[, consistent := ts == as.POSIXct(ntime)]
mean(splt$consistent) # outputs 1
```

Hence 100 % consistency, at least on the 100k samples from before. I wonder whether something like this would work in all cases? I don't understand all the details/peculiarities yet though, so take it with a big grain of salt ;).
A bit of further testing:

For timestamps close to the epoch you can't guarantee consistency, since the fractional part of POSIXct is much more precise than nanotime in that region. But from a few months onward from the epoch, when the integer part gets relatively larger, I believe you can.
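To put rough numbers on that observation (a quick sketch of my own, not from the thread): the spacing between adjacent doubles near a value `x`, i.e. the ULP in seconds, is about `2^floor(log2(x)) * .Machine$double.eps`, so close to the epoch the double grid is finer than a nanosecond, while for present-day timestamps each representable step is a couple of hundred nanoseconds wide.

```r
# Approximate spacing (in seconds) between adjacent doubles near x.
ulp <- function(x) 2^floor(log2(abs(x))) * .Machine$double.eps

ulp(60 * 60 * 24 * 30)  # ~1 month after the epoch: roughly 5e-10 s, finer than 1 ns
ulp(1686871515)         # a mid-2023 timestamp: roughly 2.4e-07 s, i.e. ~240 ns per step
```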
And here is a proposal that I tested on many current (1M) and random (between 1900 and 2100) timestamps:

```r
#' POSIXct to nanotime
#' Enforces POSIXct -> nanotime -> POSIXct consistency for timestamps
#' further than approximately +/- 3 months from the Unix epoch.
#' See: https://github.com/eddelbuettel/nanotime/issues/115
as.nanotime.POSIXct <- function(x) {
  stopifnot(inherits(x, 'POSIXct'))
  xx <- as.numeric(x)
  frac <- sign(xx) * (abs(xx) %% 1) # equivalent to C++ modf
  nanotime::as.nanotime(bit64::as.integer64(xx)*1E9L + round(frac*1E9))
}
```
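A quick round-trip check of the proposal (my own usage sketch; the timestamp is just an arbitrary 2023 instant, not one from the thread):

```r
ts <- as.POSIXct(1686871515.557356, origin = "1970-01-01", tz = "UTC")
nt <- as.nanotime.POSIXct(ts)
nt                     # nanotime value carrying the sub-microsecond digits of the double
ts == as.POSIXct(nt)   # expected TRUE for timestamps well away from the epoch
```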
That one I was aware of -- the basic limitation of a `double`. Which I had not carried 'forward' to 'much less precision for future dates' and 'backward' to 'more (but "useless") precision for past dates'. And I like your proposal and the use of `modf`.
So the original implementation is:

```r
setMethod("nanotime",
          "POSIXct",
          function(x) {
              ## force last three digits to be zero
              n <- names(x)
              res <- new("nanotime", as.integer64(as.numeric(x) * 1e6) * 1000)
              if (!is.null(n)) {
                  names(S3Part(res, strictS3=TRUE)) <- n
              }
              res
          })
```

The idea was to limit the precision to about the significant digits at our current epoch, which in retrospect is not of much use and probably annoying. I do like the idea of a round trip that is mostly lossless, but looking at performance it does come at quite a cost:

```r
library(nanotime)
library(rbenchmark)

to_nanotime_1 <- function(x) {
    n <- names(x)
    xx <- as.numeric(x)
    frac <- sign(xx) * (abs(xx) %% 1) # equivalent to C++ modf
    res <- new("nanotime", bit64::as.integer64(xx)*1E9L + round(frac*1E9))
    if (!is.null(n)) {
        names(S3Part(res, strictS3=TRUE)) <- n
    }
    res
}

to_nanotime_2 <- function(x) {
    n <- names(x)
    xx <- as.numeric(x)
    frac <- xx - trunc(xx)
    res <- new("nanotime", bit64::as.integer64(xx)*1E9 + round(frac*1E9))
    if (!is.null(n)) {
        names(S3Part(res, strictS3=TRUE)) <- n
    }
    res
}

to_nanotime_3 <- function(x) {
    n <- names(x)
    res <- new("nanotime", bit64::as.integer64(as.numeric(x) * 1e9))
    if (!is.null(n)) {
        names(S3Part(res, strictS3=TRUE)) <- n
    }
    res
}

d <- 1:1e6 + runif(1e6)
pct <- as.POSIXct(d, origin="1970-01-01")

benchmark(
    "n1" = {
        to_nanotime_1(pct)
    },
    "n2" = {
        to_nanotime_2(pct)
    },
    "n3" = {
        to_nanotime_3(pct)
    },
    replications = 100,
    columns = c("test", "replications", "elapsed",
                "relative", "user.self", "sys.self"))
```

Which on my computer gives:
So maybe I'd be in favor of changing the current behavior to not truncate to microseconds, but not to try to preserve the round trip either. Except for a region around the epoch, these floating point numbers are equivalent in terms of significant digits. What are your thoughts?
Maybe a hybrid method, adding a new argument that allows switching between 'fast(er)' and '(more) accurate'?
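A hedged sketch of what such a switch could look like (the parameter name `accurate` is taken from the commit referenced at the end of this thread; this is not the package's actual implementation):

```r
library(nanotime)
library(bit64)

to_nanotime <- function(x, accurate = TRUE) {
    xx <- as.numeric(x)
    if (accurate) {
        # split integer and fractional seconds, as in the proposal above
        frac <- sign(xx) * (abs(xx) %% 1)
        new("nanotime", as.integer64(xx) * 1e9 + round(frac * 1e9))
    } else {
        # faster single multiplication, without the round-trip guarantee
        new("nanotime", as.integer64(xx * 1e9))
    }
}
```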
@lsilvest, here are a few regression tests that may be useful:

```r
local({
  # This one should break if cutting off at µs.
  # See: https://github.com/eddelbuettel/nanotime/issues/115
  ts <- as.POSIXct(as.numeric('0x1.91ef2f6e3abb9p+30'), origin = '1970-01-01')
  stopifnot(ts == as.POSIXct(as.nanotime.POSIXct(ts)))
  # This one should break if multiplying the double by 1E9 directly
  # without considering the integer and fraction parts separately.
  ts <- as.POSIXct(as.numeric('0x1.91f18e4d0065p+30'), origin = '1970-01-01')
  stopifnot(ts == as.POSIXct(as.nanotime.POSIXct(ts)))
  # This one is negative, making sure that negative doubles are also consistent.
  ts <- as.POSIXct(as.numeric('-0x1.c6e8c4d077ae4p+30'), origin = '1970-01-01')
  stopifnot(ts == as.POSIXct(as.nanotime.POSIXct(ts)))
})
```

The timestamps are written as hexadecimal doubles so the exact values are reproduced. If my preference matters: I would also allow a switch.

PS: is `integer64` multiplication optimized? It seems rather slow:

```r
a <- bit64::as.integer64(2)
b <- bit64::as.integer64(2)
microbenchmark::microbenchmark(a*b, 2*2)
# Unit: nanoseconds
#   expr   min    lq  mean median    uq    max neval
#  a * b 56300 58200 63766  59200 62050 168700   100
#  2 * 2     0   100   250    200   300   1300   100
```
I think it's optimized, but you are measuring the overhead of the S3 dispatch; if you choose a long vector I think you will see that overhead become negligible. And thanks for the regression tests!

Back to some considerations on the problem: I'll try to characterize exactly what we would be doing if we didn't do the rounding proposed here. If I take the second number provided by @DavorJ and also look at the closest shortest decimal representations of the floating point immediately before and the floating point immediately after, we get:

So with the uncorrected version we would effectively be mapping the value onto a neighbouring double rather than the one it came from. And if we want to think about the precision that the fractional part of POSIXct offers, the other way round:

Still unsure what the best option is here, but the above shows the rounding errors happen in the zone where POSIXct cannot in any case ensure the exactness of its decimal representation. So I think it's not the precision we are bothered with here, but the element of surprise for a user who will not get a guaranteed round trip.
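As an aside on the micro-benchmark question above (my own sketch, not from the thread): with a long vector the fixed dispatch cost is amortised over many elements, so the gap between `integer64` and double arithmetic should shrink dramatically.

```r
library(bit64)
n <- 1e6
a <- as.integer64(1:n)
b <- as.integer64(1:n)
d <- as.numeric(1:n)
microbenchmark::microbenchmark(a * b, d * d, times = 10)
```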
Had the exact same thought when reading: the benchmark is measuring the forced type conversion, which is the cost and "tax" of `integer64`. I also like the follow-up. It is known that POSIXct has design deficiencies and limits. We cannot change that. Maybe the best we can do is to consider a second (optional) conversion as suggested.
Yes, that's most likely the best option. I'm slightly leaning towards having the exact conversion as default, maybe with an additional parameter `accurate`.
I actually like that -- both the new accurate conversion as the default choice, and the new opt-in switch for the faster conversion.
add parameter 'accurate' for conversions from 'POSIXct'; closes #115
Take this timestamp:
Hence the original POSIXct is not preserved.
Nanotime has 3 extra digits of precision, but it seems that the POSIXct double is cut off at µs and the 3 extra digits are not used (i.e. they are hard-coded to 0). If they were used to describe the double, then this situation would not be possible and POSIXct -> nanotime -> POSIXct consistency would be guaranteed, no?
One could also argue that the conversion, as it is now, is losing information from the POSIXct timestamp.
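For illustration (a minimal sketch of my own, assuming the µs-truncating behaviour described in this issue; the timestamp value is arbitrary):

```r
x <- 1686871515.557356                # seconds since the epoch, as a double
bit64::as.integer64(x * 1e6) * 1000   # current behaviour: nanosecond digits beyond µs forced to 0
bit64::as.integer64(x * 1e9)          # uses the sub-microsecond digits the double still carries
```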
I am using Rcpp_1.0.7, RcppCCTZ_0.2.10, zoo_1.8-12 and nanotime_0.3.7 on R v3.6.3.