[R] as.Date fails going from timestamp[us] to timestamp[s] #32879

Closed
asfimport opened this issue Sep 6, 2022 · 2 comments

@asfimport

Using as.Date to convert a timestamp[us] column to a date fails inside an Arrow dplyr query, even though the same call works on an in-memory tibble in R.

library(arrow)
library(dplyr)
library(lubridate)

tf <- tempfile()
dir.create(tf)
tbl <- tibble::tibble(x = as_datetime('2022-05-05T00:00:01.676632'))
write_dataset(tbl, tf)
open_dataset(tf) %>%
  mutate(date = as.Date(x)) %>%
  collect()
#> Error in `collect()`:
#> ! Invalid: Casting from timestamp[us, tz=UTC] to timestamp[s, tz=UTC] would lose data: 1651708801676632
#> /home/nic2/arrow/cpp/src/arrow/compute/exec.cc:799  kernel_->exec(kernel_ctx_, input, out)
#> /home/nic2/arrow/cpp/src/arrow/compute/exec.cc:767  ExecuteSingleSpan(input, &output)
#> /home/nic2/arrow/cpp/src/arrow/compute/exec/expression.cc:597  executor->Execute( ExecBatch(std::move(arguments), all_scalar ? 1 : input.length), &listener)
#> /home/nic2/arrow/cpp/src/arrow/compute/exec/expression.cc:579  ExecuteScalarExpression(call->arguments[i], input, exec_context)
#> /home/nic2/arrow/cpp/src/arrow/compute/exec/project_node.cc:91  ExecuteScalarExpression(simplified_expr, target, plan()->exec_context())
#> /home/nic2/arrow/cpp/src/arrow/compute/exec/exec_plan.cc:573  iterator_.Next()
#> /home/nic2/arrow/cpp/src/arrow/record_batch.cc:337  ReadNext(&batch)
#> /home/nic2/arrow/cpp/src/arrow/record_batch.cc:351  ToRecordBatches()

tbl %>%
  mutate(date = as.Date(x))
#> # A tibble: 1 × 2
#>   x                   date      
#>   <dttm>              <date>    
#> 1 2022-05-05 00:00:01 2022-05-05

Reporter: Nicola Crane / @thisisnic
Assignee: Dewey Dunnington / @paleolimbot

Note: This issue was originally created as ARROW-17637. Please see the migration documentation for further details.

@asfimport

Neal Richardson / @nealrichardson:
The naive cast to date32() works:

> Array$create(lubridate::as_datetime('2022-05-05T00:00:01.676632'))
Array
<timestamp[us, tz=UTC]>
[
  2022-05-05 00:00:01.676632
]
> Array$create(lubridate::as_datetime('2022-05-05T00:00:01.676632'))$cast(date32())
Array
<date32[day]>
[
  2022-05-05
]

The issue looks to be in this extra cast, something about handling timezones: https://github.com/apache/arrow/blob/master/r/R/dplyr-funcs-datetime.R#L329

Basically, if x is a timestamp type, we either need to keep the same unit as x (the unit is a parameter of the timestamp type and defaults to "s", hence the error), or pass the right cast option to allow truncation. (And we probably shouldn't cast at all if x is already in the same timezone.)
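
For illustration, here is a minimal sketch of the two options described above, applied to a plain Array rather than inside the dplyr binding. It assumes that passing safe = FALSE to $cast() is the "right cast option" meant here, and it hard-codes the "us" unit instead of reading it from x. It is not necessarily what the eventual fix does.

```r
library(arrow)

# Same microsecond-precision timestamp as in the report.
x <- Array$create(lubridate::as_datetime("2022-05-05T00:00:01.676632"))

# The default (safe) cast to second precision refuses to drop the
# fractional seconds, which is the error seen in the report.
res <- try(x$cast(timestamp("s", "UTC")), silent = TRUE)
inherits(res, "try-error")

# Option 1: keep the source unit ("us") in the intermediate timestamp type,
# so nothing has to be truncated before the cast to date32().
d1 <- x$cast(timestamp("us", "UTC"))$cast(date32())

# Option 2 (assumption: safe = FALSE is the intended cast option):
# explicitly allow truncation to seconds, then cast to date32().
d2 <- x$cast(timestamp("s", "UTC"), safe = FALSE)$cast(date32())

as.vector(d1)
as.vector(d2)
```

Both routes should end at the same date as the naive cast to date32() shown earlier. Option 1 avoids any lossy step, while option 2 relies on the caller accepting the truncation.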

@asfimport

Dewey Dunnington / @paleolimbot:
Issue resolved by pull request #14935.

@asfimport asfimport added this to the 11.0.0 milestone Jan 11, 2023