-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support arrays natively #44
Comments
Could you please supply minimal r code to create and populate a sample table with some array data? |
Sure. The following comes from the example linked to above. Follow-on code found there too. (Today is my first time trying out library("RPostgreSQL")
pg <- dbConnect(PostgreSQL())
# Make the data set in the form I have it.
rs <- dbGetQuery(pg, "
DROP TABLE IF EXISTS icecream ;
CREATE TABLE icecream (id text, date date, flavours text[]);
INSERT INTO icecream (id, date, flavours) VALUES
('a', '2013-01-01', ARRAY['Chocolate', 'Vanilla']),
('b', '2013-01-01', ARRAY['Strawberry', 'Vanilla']),
('b', '2013-02-01', ARRAY['Raspberry', 'Lemon']),
('c', '2013-01-01', ARRAY['Raspberry', 'Blueberry']);")
rs <- dbDisconnect(pg) |
Here is an edited version of the follow-on code referred to in the previous comment. sql <- "
SELECT id, date, UNNEST(flavours) AS flavour
FROM icecream"
library(dplyr)
library(tidyr)
tbl(src_postgres(), sql(sql)) %>%
mutate_(indicator=~TRUE) %>%
as.data.frame %>%
spread('flavour', 'indicator', fill=FALSE) I guess the utility of addressing the use case depends on whether there is much that can be done with "list-columns" in R. Here's a (quite ugly) example of using list-columns. # R version of desired data pulled in from PostgreSQL with
# arrays mapped to lists
df <- structure(list(id = c("a", "b", "c"),
date = structure(c(15706, 15706, 15737), class = "Date"),
flavours = list(list("Chocolate", "Vanilla"),
list("Strawberry", "Vanilla", "Raspberry"),
list("Raspberry", "Blueberry"))),
.Names = c("id", "date", "flavours"),
row.names = c(NA, 3L), class = "data.frame")
available_flavours <- unique(unlist(df$flavours))
has_flavour <- function(list, flavour) {
flavour %in% list
}
for (i in available_flavours) {
df[, i] <- unlist(Map(has_flavour, df$flavours, flavour=i))
} |
I think a benefit of supporting array types is that it would facilitate using PostgreSQL (and I guess other RDBMSs) as a link across languages. For example, the following PL/Python function in PostgreSQL produces arrays as the natural mapping from Python lists. CREATE OR REPLACE FUNCTION word_tokenize(string text)
RETURNS text[] AS
$BODY$
import nltk, re
text = string.decode('utf-8')
return nltk.word_tokenize(text)
$BODY$ LANGUAGE plpythonu; This produces output like this:
Of course, I recognize there may be too much overhead, etc., to make this worth pursuing. |
I've been testing out the C library 'libpqtypes' for sending Postgres arrays to R. Is there any chance we could get libpqtyes included in this package, or is it better off in its own package? That would allow us to get arrays and other weird data types not handled by libpq. |
+1 |
Not sure if this fits for this discussion, but the ability to pass an array of values would be nice when a simplistic query for selecting multiple values is needed. Would probably need to be careful about array types here:
Any items that need contribution on this? Happy to support as needed, as this functionality would be quite helpful. |
@colearendt I think what you're referring to is a different FR all together. The original request is about an array in SQL being returned to R |
Yes. @colearendt is referring to a different issue. Note that ids <- 1:4L
pg <- src_postgres()
table <- tbl(pg, sql('select * from table'))
table %>%
filter(id %in% ids) |
@krlmlr I wonder if a more-incremental step towards r-dbi/DBI/#203 would be to implement roundtrip support for one-dimensional arrays of already-understood type ( If it helps |
Thanks. In principle, we could return and accept this data as a list of integers, characters, ... . We need to be a bit careful, currently lists are always treated as blobs, we should check at least the first non- If this is supported by many other databases, we could eventually standardize. Would you like to take a stab? |
Not sure I have the skills, but I will play around with this a bit. There's already a little bit of support for this via the library(DBI)
library(dplyr, warn.conflicts = FALSE)
pg <- dbConnect(RPostgres::Postgres())
# Make a test table in PostgreSQL
icecream <-
tbl(pg, sql("
SELECT * FROM (VALUES
('a', ARRAY['2013-01-01'::date, '2014-01-02'::date],
ARRAY['Chocolate', 'Vanilla']),
('b', ARRAY['2013-01-01'::date, '2015-01-02'::date],
ARRAY['Strawberry', 'Vanilla']),
('b', ARRAY['2013-02-01'::date],
ARRAY['Raspberry']),
('c', ARRAY['2013-01-01'::date, '2016-01-02'::date],
ARRAY['Raspberry', 'Blueberry']))
AS t (id, dates, flavours)"))
# Pull data into R
icecream_local <- collect(icecream)
icecream_local
#> # A tibble: 4 x 3
#> id dates flavours
#> <chr> <pq__date> <pq__text>
#> 1 a {2013-01-01,2014-01-02} {Chocolate,Vanilla}
#> 2 b {2013-01-01,2015-01-02} {Strawberry,Vanilla}
#> 3 b {2013-02-01} {Raspberry}
#> 4 c {2013-01-01,2016-01-02} {Raspberry,Blueberry}
# Write data back to PostgreSQL without field.types
dbWriteTable(pg, "icecream_again", icecream_local, overwrite = TRUE)
icecream_again <- tbl(pg, "icecream_again")
icecream_again
#> # Source: table<icecream_again> [?? x 3]
#> # Database: postgres [iangow@localhost:5432/crsp]
#> id dates flavours
#> <chr> <chr> <chr>
#> 1 a {2013-01-01,2014-01-02} {Chocolate,Vanilla}
#> 2 b {2013-01-01,2015-01-02} {Strawberry,Vanilla}
#> 3 b {2013-02-01} {Raspberry}
#> 4 c {2013-01-01,2016-01-02} {Raspberry,Blueberry}
# Write data back to PostgreSQL with field.types
dbWriteTable(pg, "icecream_again", icecream_local, overwrite = TRUE,
field.types = c("id" = "text", "dates" = "date[]",
"flavours" = "text[]"))
# Test that data made it back OK
icecream_again <- tbl(pg, "icecream_again")
icecream_again %>%
mutate(date = unnest(dates),
flavour = unnest(flavours))
#> # Source: lazy query [?? x 5]
#> # Database: postgres [iangow@localhost:5432/crsp]
#> id dates flavours date flavour
#> <chr> <pq__date> <pq__text> <date> <chr>
#> 1 a {2013-01-01,2014-01-02} {Chocolate,Vanilla} 2013-01-01 Chocolate
#> 2 a {2013-01-01,2014-01-02} {Chocolate,Vanilla} 2014-01-02 Vanilla
#> 3 b {2013-01-01,2015-01-02} {Strawberry,Vanilla} 2013-01-01 Strawberry
#> 4 b {2013-01-01,2015-01-02} {Strawberry,Vanilla} 2015-01-02 Vanilla
#> 5 b {2013-02-01} {Raspberry} 2013-02-01 Raspberry
#> 6 c {2013-01-01,2016-01-02} {Raspberry,Blueberry} 2013-01-01 Raspberry
#> 7 c {2013-01-01,2016-01-02} {Raspberry,Blueberry} 2016-01-02 Blueberry
rs <- dbDisconnect(pg)
# This function would need to be able to handle values such as
# "{'This is, say, that', 'Hi!', 'Einstein, Albert'} which should result
# in
# c('This is, say, that', 'Hi!', 'Einstein, Albert')
convert_array <- function(x) {
x <- gsub("(^\\{|\\}$)", "", x)
strsplit(x, split = ",")
}
convert_date_array <- function(x) {
lapply(convert_array(x), as.Date)
}
temp <-
icecream_local %>%
mutate(dates_alt = convert_date_array(dates))
temp
#> # A tibble: 4 x 4
#> id dates flavours dates_alt
#> <chr> <pq__date> <pq__text> <list>
#> 1 a {2013-01-01,2014-01-02} {Chocolate,Vanilla} <date [2]>
#> 2 b {2013-01-01,2015-01-02} {Strawberry,Vanilla} <date [2]>
#> 3 b {2013-02-01} {Raspberry} <date [1]>
#> 4 c {2013-01-01,2016-01-02} {Raspberry,Blueberry} <date [2]>
temp %>%
tidyr::unnest(dates_alt)
#> # A tibble: 7 x 4
#> id dates flavours dates_alt
#> <chr> <pq__date> <pq__text> <date>
#> 1 a {2013-01-01,2014-01-02} {Chocolate,Vanilla} 2013-01-01
#> 2 a {2013-01-01,2014-01-02} {Chocolate,Vanilla} 2014-01-02
#> 3 b {2013-01-01,2015-01-02} {Strawberry,Vanilla} 2013-01-01
#> 4 b {2013-01-01,2015-01-02} {Strawberry,Vanilla} 2015-01-02
#> 5 b {2013-02-01} {Raspberry} 2013-02-01
#> 6 c {2013-01-01,2016-01-02} {Raspberry,Blueberry} 2013-01-01
#> 7 c {2013-01-01,2016-01-02} {Raspberry,Blueberry} 2016-01-02 Created on 2021-06-15 by the reprex package (v2.0.0) |
Being able to read and write array types to/from Postgres natively would be fantastic. Is this on the roadmap? |
Yes, perhaps in the very long term. What would the equivalent in Arrow look like? |
+1 |
See here for some discussion we've had on R-SIG-DB on this issue in the past.
Here are hints of an indicative use case. Looking at
? tidyr::unnest
, it seems that what would be ideal would be if arrays could be mapped to what you call "list-columns" and then operations performed on those (in the example, I am doing theunnest
in PostgreSQL).The text was updated successfully, but these errors were encountered: