Support arrays natively #44

iangow · 2015-05-22T20:15:11Z

See here for some discussion we've had on R-SIG-DB on this issue in the past.

Here are hints of an indicative use case. Looking at ? tidyr::unnest, it seems that what would be ideal would be if arrays could be mapped to what you call "list-columns" and then operations performed on those (in the example, I am doing the unnest in PostgreSQL).

The text was updated successfully, but these errors were encountered:

hadley · 2015-05-22T21:30:36Z

Could you please supply minimal r code to create and populate a sample table with some array data?

iangow · 2015-05-22T21:33:55Z

Sure. The following comes from the example linked to above. Follow-on code found there too.

(Today is my first time trying out dplyr with PostgreSQL. Very nice.)

library("RPostgreSQL")

pg <- dbConnect(PostgreSQL())

# Make the data set in the form I have it.
rs <- dbGetQuery(pg, "
     DROP TABLE IF EXISTS icecream ;

     CREATE TABLE icecream (id text, date date, flavours text[]);

     INSERT INTO icecream (id, date, flavours) VALUES 
     ('a', '2013-01-01', ARRAY['Chocolate', 'Vanilla']),
     ('b', '2013-01-01', ARRAY['Strawberry', 'Vanilla']),
     ('b', '2013-02-01', ARRAY['Raspberry', 'Lemon']),
     ('c', '2013-01-01', ARRAY['Raspberry', 'Blueberry']);")

rs <- dbDisconnect(pg)

iangow · 2015-05-22T22:01:07Z

Here is an edited version of the follow-on code referred to in the previous comment.

sql <- "
    SELECT id, date, UNNEST(flavours) AS flavour 
    FROM icecream"

library(dplyr)
library(tidyr)

tbl(src_postgres(), sql(sql)) %>%
    mutate_(indicator=~TRUE) %>%
    as.data.frame %>%
    spread('flavour', 'indicator', fill=FALSE)

I guess the utility of addressing the use case depends on whether there is much that can be done with "list-columns" in R. Here's a (quite ugly) example of using list-columns.

# R version of desired data pulled in from PostgreSQL with 
# arrays mapped to lists
df <- structure(list(id = c("a", "b", "c"), 
                     date = structure(c(15706, 15706, 15737), class = "Date"), 
                     flavours = list(list("Chocolate", "Vanilla"), 
                                 list("Strawberry", "Vanilla", "Raspberry"), 
                                 list("Raspberry", "Blueberry"))),
                .Names = c("id", "date", "flavours"), 
                row.names = c(NA, 3L), class = "data.frame")

available_flavours <-  unique(unlist(df$flavours))
has_flavour <- function(list, flavour) {
    flavour %in% list
}

for (i in available_flavours) {
    df[, i] <- unlist(Map(has_flavour, df$flavours, flavour=i))
}

iangow · 2015-05-22T22:12:12Z

I think a benefit of supporting array types is that it would facilitate using PostgreSQL (and I guess other RDBMSs) as a link across languages. For example, the following PL/Python function in PostgreSQL produces arrays as the natural mapping from Python lists.

CREATE OR REPLACE FUNCTION word_tokenize(string text)
  RETURNS text[] AS
$BODY$
    import nltk, re
    text = string.decode('utf-8')
    return nltk.word_tokenize(text)
$BODY$ LANGUAGE plpythonu;

This produces output like this:

crsp=# SELECT word_tokenize('Could you please supply minimal r code to create and populate a sample table with some array data?');
                                             word_tokenize                                             
-------------------------------------------------------------------------------------------------------
 {Could,you,please,supply,minimal,r,code,to,create,and,populate,a,sample,table,with,some,array,data,?}
(1 row)

Of course, I recognize there may be too much overhead, etc., to make this worth pursuing.

wmay · 2016-12-06T18:45:34Z

I've been testing out the C library 'libpqtypes' for sending Postgres arrays to R. Is there any chance we could get libpqtyes included in this package, or is it better off in its own package? That would allow us to get arrays and other weird data types not handled by libpq.

http://libpqtypes.esilo.com/

rsaporta · 2017-03-10T20:44:03Z

+1

colearendt · 2017-04-13T23:48:28Z

Not sure if this fits for this discussion, but the ability to pass an array of values would be nice when a simplistic query for selecting multiple values is needed. Would probably need to be careful about array types here:

ids <- c(1,2,3,4)
dbSendQuery(myconn, 'select * from table where id = ANY ($1);', ids)

Any items that need contribution on this? Happy to support as needed, as this functionality would be quite helpful.

rsaporta · 2017-04-20T13:18:15Z

@colearendt I think what you're referring to is a different FR all together. The original request is about an array in SQL being returned to R

iangow · 2017-04-20T13:24:32Z

Yes. @colearendt is referring to a different issue. Note that dplyr handles that case pretty nicely.

ids <- 1:4L
pg <- src_postgres()
table <- tbl(pg, sql('select * from table'))
table %>%
   filter(id %in% ids)

iangow · 2021-06-13T12:55:10Z

@krlmlr I wonder if a more-incremental step towards r-dbi/DBI/#203 would be to implement roundtrip support for one-dimensional arrays of already-understood type (int, text, etc.) here.

If it helps psycopg2 and sqlalchemy handle this in Python. It's quite convenient to be able to store and retrieve Python lists in/from a PostgreSQL database. (Even dictionaries are easily converted using json.dumps to JSON and stored as json or jsonb.)

krlmlr · 2021-06-13T13:14:38Z

Thanks. In principle, we could return and accept this data as a list of integers, characters, ... . We need to be a bit careful, currently lists are always treated as blobs, we should check at least the first non-NULL element to determine the underlying type (or maybe detect/accept vctrs::list_of) .

If this is supported by many other databases, we could eventually standardize.

Would you like to take a stab?

iangow · 2021-06-15T13:27:34Z

Would you like to take a stab?

Not sure I have the skills, but I will play around with this a bit.

There's already a little bit of support for this via the field.types argument to dbWriteTable.

library(DBI)
library(dplyr, warn.conflicts = FALSE)

pg <- dbConnect(RPostgres::Postgres())

# Make a test table in PostgreSQL
icecream <- 
    tbl(pg, sql("
     SELECT * FROM (VALUES 
     ('a', ARRAY['2013-01-01'::date, '2014-01-02'::date],
        ARRAY['Chocolate', 'Vanilla']),
     ('b', ARRAY['2013-01-01'::date, '2015-01-02'::date], 
        ARRAY['Strawberry', 'Vanilla']),
     ('b', ARRAY['2013-02-01'::date], 
        ARRAY['Raspberry']),
     ('c', ARRAY['2013-01-01'::date, '2016-01-02'::date], 
        ARRAY['Raspberry', 'Blueberry'])) 
            AS t (id, dates, flavours)"))

# Pull data into R
icecream_local <- collect(icecream)
icecream_local
#> # A tibble: 4 x 3
#>   id    dates                   flavours             
#>   <chr> <pq__date>              <pq__text>           
#> 1 a     {2013-01-01,2014-01-02} {Chocolate,Vanilla}  
#> 2 b     {2013-01-01,2015-01-02} {Strawberry,Vanilla} 
#> 3 b     {2013-02-01}            {Raspberry}          
#> 4 c     {2013-01-01,2016-01-02} {Raspberry,Blueberry}

# Write data back to PostgreSQL without field.types
dbWriteTable(pg, "icecream_again", icecream_local, overwrite = TRUE)
icecream_again <- tbl(pg, "icecream_again")
icecream_again
#> # Source:   table<icecream_again> [?? x 3]
#> # Database: postgres [iangow@localhost:5432/crsp]
#>   id    dates                   flavours             
#>   <chr> <chr>                   <chr>                
#> 1 a     {2013-01-01,2014-01-02} {Chocolate,Vanilla}  
#> 2 b     {2013-01-01,2015-01-02} {Strawberry,Vanilla} 
#> 3 b     {2013-02-01}            {Raspberry}          
#> 4 c     {2013-01-01,2016-01-02} {Raspberry,Blueberry}

# Write data back to PostgreSQL with field.types
dbWriteTable(pg, "icecream_again", icecream_local, overwrite = TRUE,
             field.types = c("id" = "text", "dates" = "date[]",
                             "flavours" = "text[]"))

# Test that data made it back OK
icecream_again <- tbl(pg, "icecream_again")
icecream_again %>% 
    mutate(date = unnest(dates),
           flavour = unnest(flavours))
#> # Source:   lazy query [?? x 5]
#> # Database: postgres [iangow@localhost:5432/crsp]
#>   id    dates                   flavours              date       flavour   
#>   <chr> <pq__date>              <pq__text>            <date>     <chr>     
#> 1 a     {2013-01-01,2014-01-02} {Chocolate,Vanilla}   2013-01-01 Chocolate 
#> 2 a     {2013-01-01,2014-01-02} {Chocolate,Vanilla}   2014-01-02 Vanilla   
#> 3 b     {2013-01-01,2015-01-02} {Strawberry,Vanilla}  2013-01-01 Strawberry
#> 4 b     {2013-01-01,2015-01-02} {Strawberry,Vanilla}  2015-01-02 Vanilla   
#> 5 b     {2013-02-01}            {Raspberry}           2013-02-01 Raspberry 
#> 6 c     {2013-01-01,2016-01-02} {Raspberry,Blueberry} 2013-01-01 Raspberry 
#> 7 c     {2013-01-01,2016-01-02} {Raspberry,Blueberry} 2016-01-02 Blueberry

rs <- dbDisconnect(pg)

# This function would need to be able to handle values such as 
# "{'This is, say, that', 'Hi!', 'Einstein, Albert'} which should result
# in 
# c('This is, say, that', 'Hi!', 'Einstein, Albert')
convert_array <- function(x) {
    x <- gsub("(^\\{|\\}$)", "", x)
    strsplit(x, split = ",")
}

convert_date_array <- function(x) {
    lapply(convert_array(x), as.Date)
}

temp <-
    icecream_local %>%
    mutate(dates_alt = convert_date_array(dates)) 

temp
#> # A tibble: 4 x 4
#>   id    dates                   flavours              dates_alt 
#>   <chr> <pq__date>              <pq__text>            <list>    
#> 1 a     {2013-01-01,2014-01-02} {Chocolate,Vanilla}   <date [2]>
#> 2 b     {2013-01-01,2015-01-02} {Strawberry,Vanilla}  <date [2]>
#> 3 b     {2013-02-01}            {Raspberry}           <date [1]>
#> 4 c     {2013-01-01,2016-01-02} {Raspberry,Blueberry} <date [2]>

temp %>%
    tidyr::unnest(dates_alt)
#> # A tibble: 7 x 4
#>   id    dates                   flavours              dates_alt 
#>   <chr> <pq__date>              <pq__text>            <date>    
#> 1 a     {2013-01-01,2014-01-02} {Chocolate,Vanilla}   2013-01-01
#> 2 a     {2013-01-01,2014-01-02} {Chocolate,Vanilla}   2014-01-02
#> 3 b     {2013-01-01,2015-01-02} {Strawberry,Vanilla}  2013-01-01
#> 4 b     {2013-01-01,2015-01-02} {Strawberry,Vanilla}  2015-01-02
#> 5 b     {2013-02-01}            {Raspberry}           2013-02-01
#> 6 c     {2013-01-01,2016-01-02} {Raspberry,Blueberry} 2013-01-01
#> 7 c     {2013-01-01,2016-01-02} {Raspberry,Blueberry} 2016-01-02

^{Created on 2021-06-15 by the reprex package (v2.0.0)}

januz · 2021-09-15T19:25:11Z

Being able to read and write array types to/from Postgres natively would be fantastic. Is this on the roadmap?

krlmlr · 2021-11-01T06:33:03Z

Yes, perhaps in the very long term. What would the equivalent in Arrow look like?

jjfantini · 2022-06-30T18:16:57Z

+1

krlmlr transferred this issue from r-dbi/RPostgres Nov 1, 2021

krlmlr added the interface 🔌 label Nov 1, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support arrays natively #44

Support arrays natively #44

iangow commented May 22, 2015

hadley commented May 22, 2015

iangow commented May 22, 2015

iangow commented May 22, 2015

iangow commented May 22, 2015

wmay commented Dec 6, 2016 •

edited

Loading

rsaporta commented Mar 10, 2017

colearendt commented Apr 13, 2017

rsaporta commented Apr 20, 2017

iangow commented Apr 20, 2017

iangow commented Jun 13, 2021

krlmlr commented Jun 13, 2021

iangow commented Jun 15, 2021

januz commented Sep 15, 2021

krlmlr commented Nov 1, 2021

jjfantini commented Jun 30, 2022

Support arrays natively #44

Support arrays natively #44

Comments

iangow commented May 22, 2015

hadley commented May 22, 2015

iangow commented May 22, 2015

iangow commented May 22, 2015

iangow commented May 22, 2015

wmay commented Dec 6, 2016 • edited Loading

rsaporta commented Mar 10, 2017

colearendt commented Apr 13, 2017

rsaporta commented Apr 20, 2017

iangow commented Apr 20, 2017

iangow commented Jun 13, 2021

krlmlr commented Jun 13, 2021

iangow commented Jun 15, 2021

januz commented Sep 15, 2021

krlmlr commented Nov 1, 2021

jjfantini commented Jun 30, 2022

wmay commented Dec 6, 2016 •

edited

Loading