POSIXct -> nanotime -> POSIXct consistency #115

Closed
DavorJ opened this issue Jun 3, 2023 · 13 comments

Comments

@DavorJ

DavorJ commented Jun 3, 2023

Take this timestamp:

ts <- as.POSIXct(as.numeric("0x1.91ef2f6e3abb9p+30"), origin = '1970-01-01')

> print(as.numeric(ts), digits = 22)
[1] 1685834715.5573561

> nanotime::as.nanotime(ts)
[1] 2023-06-03 23:25:15.557356000

> print(as.numeric(as.POSIXct(nanotime::as.nanotime(ts))), digits = 22)
[1] 1685834715.5573559

Hence the original POSIXct is not preserved.

Nanotime has extra 3 digits of precision, but it seems that POSIXct double precision is cut off at µs and the 3 extra digits are not utilized (i.e. they are hardcoded to 0). If they were used to describe the double, then this situation would not be possible, and POSIXct -> nanotime -> POSIXct consistency would be guaranteed, no?
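For illustration (this is not package code, just a quick check of the claim): the double's fractional seconds carry more than six meaningful digits, and a plain microsecond cutoff discards them.

ts <- as.POSIXct(as.numeric('0x1.91ef2f6e3abb9p+30'), origin = '1970-01-01')
frac <- as.numeric(ts) %% 1                 # fractional seconds stored in the double
sprintf("%.9f", frac)                       # sub-microsecond digits are present here
sprintf("%.9f", trunc(frac * 1e6) / 1e6)    # a microsecond cutoff zeroes them out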

One could also argue that the conversion, as it stands now, is losing information from the POSIXct timestamp.

I am using Rcpp_1.0.7, RcppCCTZ_0.2.10, zoo_1.8-12 and nanotime_0.3.7 on R v3.6.3.

@eddelbuettel
Owner

eddelbuettel commented Jun 4, 2023

Can we back off here a little? What is "0x1.91ef2f6e3abb9p+30" supposed to be, and how do you reason that it (whatever it is) fits into a POSIXct, which we know is a (64 bit) double (with a 53 bit mantissa)?

Nobody claims nanotime can "losslessly" convert from "any possible POSIXct representation". All we claim, just like Python and Julia and others do, is that time can be represented as nanosecond increments since the epoch. No more, no less.

@DavorJ
Author

DavorJ commented Jun 4, 2023

Hi Eddel. 0x1.91ef2f6e3abb9p+30 is just the sprintf('%a', number) representation of a double. The conversion is fine for most doubles, but not for all; hence, to make the example reproducible, I need a very specific double.
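As an aside (illustration only), the "%a" form is an exact, reproducible encoding of a double:

x <- as.numeric(Sys.time())
hex <- sprintf("%a", x)        # hexadecimal (exact) representation of the double
identical(as.numeric(hex), x)  # TRUE: parsing the hex string recovers the same bits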

I agree: information will always be lost in conversion, but you could make it less lossy by using the digits beyond µs precision. For example:

> nanotime::as.nanotime(bit64::as.integer64(as.numeric(ts)*1E9))
[1] 2023-06-03 23:25:15.557356032

> as.POSIXct(nanotime::as.nanotime(bit64::as.integer64(as.numeric(ts)*1E9))) == ts
[1] TRUE

Here I generate some random timestamps in tmp and then check in how many cases the conversion is consistent:

> tmp <- as.POSIXct(replicate(100000, utils$time$now()), origin = '1970-01-01')
> length(unique(tmp))
[1] 100000
> mean(tmp == as.POSIXct(nanotime::as.nanotime(bit64::as.integer64(as.numeric(tmp)*1E9))))
[1] 0.9317
> mean(tmp == as.POSIXct(nanotime::as.nanotime(tmp)))
[1] 0.23126

As it stands now, 23 % of timestamps were transformed consistently, while with the *1E9 approach it becomes 93 %. My question is: can we make it 100 %? Using *1E9 introduces rounding errors of its own. A better way would be to take just the decimal part of the double, round it to ns precision, and then convert to nanotime.

PS utils$time$now() is just an internal function based on microbenchmark::get_nanotime() that I use to generate more precise now-timestamps, since Sys.time() is not precise enough (i.e. it generates lots of duplicates).
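One possible sketch of such a helper (the name make_now() and the anchoring approach are illustrative, not the actual internal code): anchor the monotonic get_nanotime() counter to a single wall-clock reading and derive "now" from the nanosecond counter.

make_now <- function() {
  t0 <- as.numeric(Sys.time())           # wall-clock anchor, seconds since the epoch
  n0 <- microbenchmark::get_nanotime()   # monotonic anchor, nanoseconds
  function() as.POSIXct(t0 + (microbenchmark::get_nanotime() - n0) / 1e9,
                        origin = '1970-01-01')
}
now <- make_now()
length(unique(replicate(1e3, as.numeric(now()))))  # few or no duplicates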

@DavorJ
Author

DavorJ commented Jun 4, 2023

Here is a good explanation of the issue, and a way to solve it.

I didn't find this modf exposed by any package, so here is something very quick & dirty:

Rcpp::cppFunction('
  List modf(double x) {
    // split x into its integral and fractional parts (both carry the sign of x)
    double ipart;
    double fraction = modf(x, &ipart);  // resolves to the C math library modf()

    List result = List::create(_["int"] = ipart, _["fraction"] = fraction);
    return result;
  }
')

splt <- data.table::data.table(ts = tmp)
splt <- cbind(splt, data.table::rbindlist(lapply(as.numeric(splt$ts), modf)))
splt[, int64 := bit64::as.integer64(int)*1E9 + round(fraction*1E9)]
splt[, ntime := nanotime::as.nanotime(int64)]
splt[, consistent := ts == as.POSIXct(ntime)]
mean(splt$consistent) # outputs 1

Hence 100 % consistency, at least on the 100k samples above. I wonder whether something like this would work in all cases? I don't understand all the details/peculiarities yet though, so take it with a big grain of salt ;).

@DavorJ
Author

DavorJ commented Jun 4, 2023

A bit of further testing:

                            ts      int     fraction             int64                         ntime consistent
 1: 1970-01-02 13:12:18.245308   133938  0.245308399   133938245308399 1970-01-02 13:12:18.245308399      FALSE
 2: 1970-02-11 13:07:27.118945  3589647  0.118945599  3589647118945599 1970-02-11 13:07:27.118945599      FALSE
 3: 1969-12-05 00:53:03.532405 -2329617 -0.467594147 -2329617467594147 1969-12-05 00:53:02.532405853      FALSE

For timestamps close to the epoch, you can't guarantee the consistency since the fractional part of POSIXct is much more precise than nanotime in that region. But from a few months onward from the epoch, when the integer part gets relatively larger, I believe you can.
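A rough way to see this (ulp() here is just an ad-hoc helper, not from any package): the spacing between adjacent doubles grows with the magnitude of the value, so near the epoch it is finer than a nanosecond, while today it is coarser than a nanosecond.

ulp <- function(x) 2^(floor(log2(abs(x))) - 52)  # spacing of adjacent doubles near x
ulp(133938)        # ~3e-11 s near the epoch: finer than a nanosecond
ulp(1685834715)    # ~2.4e-07 s in 2023: coarser than a nanosecond, finer than a microsecond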

@DavorJ
Author

DavorJ commented Jun 4, 2023

And here is a proposal that I tested on many current (1M) and random (between 1900-2100) timestamps:

#' POSIXct to nanotime
#' Enforces POSIXct -> nanotime -> POSIXct consistency for timestamps
#' more than approximately +/- 3 months away from the Unix epoch.
#' See: https://github.com/eddelbuettel/nanotime/issues/115
as.nanotime.POSIXct <- function(x) {
  stopifnot(inherits(x, 'POSIXct'))
  xx <- as.numeric(x)
  frac <- sign(xx) * (abs(xx) %% 1) # equivalent to C++ modf
  nanotime::as.nanotime(bit64::as.integer64(xx)*1E9L + round(frac*1E9))
}
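Usage sketch, applying the proposal to the timestamp from the opening comment (the round trip should hold since it is far from the epoch):

ts <- as.POSIXct(as.numeric('0x1.91ef2f6e3abb9p+30'), origin = '1970-01-01')
as.POSIXct(as.nanotime.POSIXct(ts)) == ts  # TRUE, unlike the current conversion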

@eddelbuettel
Owner

For timestamps close to the epoch, you can't guarantee the consistency since the fractional part of POSIXct is much more precise than nanotime in that region. But from a few months onward from the epoch, when the integer part gets relatively larger, I believe you can.

That one I was aware of -- the basic limitation of a double as a timestamp is the "uneven distribution" of precision over the range of values, which is related to the fact that we can only have 'a little less than a microsecond' of precision for dates right now.

I had not, however, carried that 'forward' to 'much less precision for future dates' and 'backward' to 'more (but "useless") precision for past dates'.

And I like your proposal and the use of modf() (and it is not often that I come across an old C math library function that I had never (?) heard of, as the modulo operator dominated my use). I am sure @lsilvest and I will take a look!

@lsilvest
Collaborator

lsilvest commented Jun 5, 2023

So the original implementation is:

setMethod("nanotime",
          "POSIXct",
          function(x) {
              ## force last three digits to be zero
              n <- names(x)
              res <- new("nanotime", as.integer64(as.numeric(x) * 1e6) * 1000)
              if (!is.null(n)) {
                  names(S3Part(res, strictS3=TRUE)) <- n
              }
              res
          })

The idea was to limit the precision to roughly the significant digits a double has around the current epoch, which in retrospect is not of much use and is probably annoying.
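For reference (just re-running the arithmetic above on the example from the opening comment, not new package code), this is where the trailing zeros come from:

ts <- as.POSIXct(as.numeric("0x1.91ef2f6e3abb9p+30"), origin = "1970-01-01")
nanotime::as.nanotime(bit64::as.integer64(as.numeric(ts) * 1e6) * 1000)
# ...15.557356000 -- the last three (nanosecond) digits are forced to zero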

I do like the idea of a round trip that is mostly lossless, but looking at performance it does come at quite a cost:

library(nanotime)
library(rbenchmark)


## variant 1: round-trip preserving, sign/abs split of integer and fractional seconds
to_nanotime_1 <- function(x) {
    n <- names(x)
    xx <- as.numeric(x)
    frac <- sign(xx) * (abs(xx) %% 1) # equivalent to C++ modf
    res <- new("nanotime", bit64::as.integer64(xx)*1E9L + round(frac*1E9))
    if (!is.null(n)) {
        names(S3Part(res, strictS3=TRUE)) <- n
    }
    res
}

## variant 2: round-trip preserving, trunc()-based split
to_nanotime_2 <- function(x) {
    n <- names(x)
    xx <- as.numeric(x)
    frac <- xx - trunc(xx)
    res <- new("nanotime", bit64::as.integer64(xx)*1E9 + round(frac*1E9))
    if (!is.null(n)) {
        names(S3Part(res, strictS3=TRUE)) <- n
    }
    res
}

## variant 3: direct multiplication, fast but lossy once the product exceeds 2^53
to_nanotime_3 <- function(x) {
    n <- names(x)
    res <- new("nanotime", bit64::as.integer64(as.numeric(x) * 1e9))
    if (!is.null(n)) {
        names(S3Part(res, strictS3=TRUE)) <- n
    }
    res
}


d <- 1:1e6 + runif(1e6)
pct <- as.POSIXct(d, origin="1970-01-01")

benchmark(
    "n1" = {
        to_nanotime_1(pct)
    },
    "n2" = {
        to_nanotime_2(pct)
    },
    "n3" = {
        to_nanotime_3(pct)
    },
    replications = 100,
    columns = c("test", "replications", "elapsed",
                "relative", "user.self", "sys.self"))

Which on my computer gives:

  test replications elapsed relative user.self sys.self
1   n1          100   3.258    6.845     3.259        0
2   n2          100   2.900    6.092     2.900        0
3   n3          100   0.476    1.000     0.475        0

So maybe I'd be in favor of changing the current behavior so that it no longer truncates to microseconds, but without trying to preserve the round trip either. Except for a region around the epoch, these floating point numbers are equivalent in terms of significant digits. What are your thoughts?

@eddelbuettel
Owner

Maybe a hybrid method, by adding a new argument that allows switching between 'fast(er)' and '(more) accurate'?

@DavorJ
Author

DavorJ commented Jun 5, 2023

@lsilvest, here are a few regression tests that may be useful:

local({
  # This one should break if cutting off on µs
  # See: https://github.com/eddelbuettel/nanotime/issues/115
  ts <- as.POSIXct(as.numeric('0x1.91ef2f6e3abb9p+30'), origin = '1970-01-01')
  stopifnot(ts == as.POSIXct(as.nanotime.POSIXct(ts)))

  # This one should break if multiplying the double by 1E9 directly
  # without considering the integer and fraction parts separately.
  ts <- as.POSIXct(as.numeric('0x1.91f18e4d0065p+30'), origin = '1970-01-01')
  stopifnot(ts == as.POSIXct(as.nanotime.POSIXct(ts)))

  # This one is negative, making sure that negative doubles are also consistent.
  ts <- as.POSIXct(as.numeric('-0x1.c6e8c4d077ae4p+30'), origin = '1970-01-01')
  stopifnot(ts == as.POSIXct(as.nanotime.POSIXct(ts)))
})

The as.nanotime.POSIXct() should point to your function.

If my preference matters: I would also allow a switch.

PS The bit64 implementation is... well... not optimized? On scalars it is also incredibly slow:

a <- bit64::as.integer64(2)
b <- bit64::as.integer64(2)
microbenchmark::microbenchmark(a*b, 2*2)
# Unit: nanoseconds
#   expr   min    lq  mean median    uq    max neval
#  a * b 56300 58200 63766  59200 62050 168700   100
#  2 * 2     0   100   250    200   300   1300   100

@lsilvest
Collaborator

lsilvest commented Jun 7, 2023

I think it's optimized, but you are measuring the overhead of the S3 dispatch. If you choose a long vector I think you will see bit64 is efficient.
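For example (a quick sketch, not a rigorous benchmark), the picture changes on a long vector where the per-call dispatch overhead is amortized:

a <- bit64::as.integer64(1:1e6)
b <- bit64::as.integer64(1:1e6)
x <- as.numeric(1:1e6)
microbenchmark::microbenchmark(a * b, x * x, times = 100)  # same order of magnitude, unlike the scalar case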

And thanks for the regression tests!

Back to some considerations on the problem, and I'll try to characterize exactly what we would be doing if we didn't do the rounding proposed here.

If I take the second number provided by @DavorJ and also look at the shortest decimal representations of the floating-point value immediately before and the one immediately after, we get:

0x1.91f18e4d0064fp+30 -> 1685873555.250385
0x1.91f18e4d0065p+30  -> 1685873555.2503853
0x1.91f18e4d00651p+30 -> 1685873555.2503855

So what we would be doing with the uncorrected version is effectively mapping 0x1.91f18e4d0065p+30 to 0x1.91f18e4d00651p+30.

And if we want to think about the precision of the fractional part of POSIXct the other way round:

> sprintf("%a", 1685873555.2503851)
[1] "0x1.91f18e4d0064fp+30"
> sprintf("%a", 1685873555.2503852)
[1] "0x1.91f18e4d0065p+30"
> sprintf("%a", 1685873555.2503853)
[1] "0x1.91f18e4d0065p+30"
> sprintf("%a", 1685873555.2503854)
[1] "0x1.91f18e4d0065p+30"
> sprintf("%a", 1685873555.2503855)
[1] "0x1.91f18e4d00651p+30"
> sprintf("%a", 1685873555.2503856)
[1] "0x1.91f18e4d00651p+30"

Still unsure what the best option is here, but the above shows that the rounding errors happen in the zone where POSIXct cannot in any case ensure the exactness of its decimal representation. So I think it's not the precision we are bothered by here, but the surprise to a user who will not get a guaranteed round trip.

@eddelbuettel
Owner

I think it's optimized, but you are measuring the overhead of the S3 dispatch.

Had the exact same thought when reading: the benchmark is measuring the forced type conversion, which is the cost and "tax" bit64 pays to give us 64-bit integer semantics.

I also like the follow-up. It is known that POSIXct has design deficiencies and limits; we cannot change that. Maybe the best we can do is to consider a second (optional) conversion as suggested.

@lsilvest
Collaborator

lsilvest commented Jun 8, 2023

Yes, that's most likely the best option. I'm slightly leaning towards having the exact conversion as default, maybe with the additional parameter accurate=TRUE? Unless someone has a better name for this parameter that also conveys the notion of speed? And maybe you prefer the default to be accurate=FALSE? I don't have a strong opinion here. Once we've decided on that I can make the changes and will add the units tests provided by @DavorJ.

@eddelbuettel
Owner

I actually like that -- both the new behavior as the default choice, and the new opt-in switch, for which accurate=TRUE is a good name.

eddelbuettel added a commit that referenced this issue Jul 8, 2023
add parameter 'accurate' for conversions from 'POSIXct'; closes #115