
Better support for RedShift #215

Closed
krlmlr opened this issue Apr 4, 2019 · 4 comments · Fixed by #330
Labels
feature install Issues with custom installations
Milestone

Comments

@krlmlr
Member

krlmlr commented Apr 4, 2019

Perhaps with a separate subclass.

Existing solutions:

@kmishra9

kmishra9 commented Jan 14, 2020

Ran into #211 as well -- it seems that copy_to(), and consequently *_join(..., copy = TRUE), rely on that functionality and are broken with Redshift too. For whatever reason, the RPostgreSQL package's PostgreSQL() driver implementation does work for many of the use cases where RPostgres is broken, so for those niche use cases, swap over!
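A minimal sketch of the driver swap described above — the host, database name, and credential environment variables are placeholders, not values from this thread:

```r
library(DBI)

# Workaround sketch: connect through the legacy RPostgreSQL driver instead of
# RPostgres when copy_to()/joins with copy = TRUE fail against Redshift.
# Host, port, dbname, and credentials below are placeholders.
con <- dbConnect(
    RPostgreSQL::PostgreSQL(),
    host     = "my-cluster.example.us-east-1.redshift.amazonaws.com",
    port     = 5439,
    dbname   = "analytics",
    user     = Sys.getenv("REDSHIFT_USER"),
    password = Sys.getenv("REDSHIFT_PASSWORD")
)

# copy_to() then works via the RPostgreSQL driver for many use cases:
dplyr::copy_to(con, mtcars, name = "mtcars_tmp", temporary = TRUE)
```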

@isteves

isteves commented Apr 13, 2020

@krlmlr I currently use Redshift in case I can be of help with testing/etc.

@kmishra9

kmishra9 commented Jun 22, 2021

A teammate of mine mentioned coming across this thread, so I thought I'd leave the functional replacement for copy_to() that I wrote, until this is implemented in the driver packages. It's quite suboptimal, relying on the very old redshiftTools package, but better than nothing in a pinch!

#*******************************************************************************
# Organization - Cricket Health
# Description - Functional replacement for dplyr::copy_to(), which doesn't work w/ Redshift
#*******************************************************************************

upload_df_to_rs <-
    function(df,
             rs,
             rs_schema = 'public',
             rs_relation_name = deparse(substitute(df)),
             overwrite = TRUE) {
        #' @description A function, similar to dplyr::copy_to(), that uploads a local data frame to Redshift via files temporarily staged in S3
        #' @param df the target data frame to upload
        #' @param rs a database connection object (the result of a DBI::dbConnect() call) indicating the database the data should be uploaded to
        #' @param rs_schema a string giving the schema within rs to copy the data to; example: "lnd"
        #' @param rs_relation_name a string giving the table within rs_schema to copy the data to
        #' @param overwrite a boolean indicating whether an existing table at {rs_schema}.{rs_relation_name} should be dropped first. If one exists and overwrite is not TRUE, an error will occur
        #' @return A dbplyr table reference to the uploaded df

        require('tidyverse')
        require('glue')
        require('dbplyr')     # for in_schema()
        require('assertthat') # for assert_that()

        table_path <-
            glue::glue('{rs_schema}.{rs_relation_name}')

        table_paths <-
            rs %>%
            tbl(in_schema('information_schema', 'tables')) %>%
            collect() %>%
            transmute(table_path = glue('{table_schema}.{table_name}')) %>%
            distinct() %>%
            pull(table_path)

        rs_schemas <-
            rs %>%
            tbl('pg_namespace') %>%
            select(table_schema = nspname) %>%
            distinct() %>%
            pull(table_schema)

        if (overwrite && table_path %in% table_paths) {
            DBI::dbExecute(
                conn = rs,
                statement = glue('DROP TABLE {table_path};')
            )
        }

        if (!(rs_schema %in% rs_schemas)) {
            DBI::dbExecute(
                conn = rs,
                statement = glue('CREATE SCHEMA {rs_schema};')
            )
        }

        redshiftTools::rs_create_table(
            df = df,
            dbcon = rs,
            table_name = table_path,
            split_files = 1,
            bucket = 'cricket-data-digest/temp-uploads',
            region = 'us-east-1',
            access_key = keyring::key_get('AWS_ACCESS_KEY_ID_DPU'),
            secret_key = keyring::key_get('AWS_SECRET_ACCESS_KEY_DPU'),
            session_token = keyring::key_get('AWS_SESSION_TOKEN_DPU')
        ) %>% assert_that()

        message('^ Ignore any "Client error: (403) Forbidden" messages ^')

        table_ref <-
            rs %>% tbl(in_schema(schema = rs_schema, table = rs_relation_name))

        return(table_ref)
    }
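A hypothetical call, assuming `rs` is an open DBI connection to a Redshift cluster and the S3 bucket and keyring entries hard-coded in `upload_df_to_rs()` exist in your environment (the table name below is illustrative):

```r
# Upload mtcars and get back a lazy dbplyr reference to the new table.
staged_tbl <- upload_df_to_rs(
    df               = mtcars,
    rs               = rs,
    rs_schema        = 'public',
    rs_relation_name = 'mtcars_staging'
)

# The reference can then be used like any other dbplyr table:
staged_tbl %>% dplyr::count()
```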

@krlmlr krlmlr added install Issues with custom installations feature and removed enhancement labels Sep 6, 2021
@krlmlr krlmlr added this to the 1.4.0 milestone Sep 12, 2021
krlmlr added a commit that referenced this issue Sep 13, 2021
- `dbExistsTable()`, `dbListTables()` and `dbListObjects()` now work for Redshift, with the limitation that only the topmost tables on the search path are returned (#215, #326).
krlmlr added a commit that referenced this issue Sep 14, 2021
- `Redshift()` connections now adhere to almost all of the DBI specification when connecting to a Redshift cluster. BLOBs are not supported on Redshift, and there are limitations with enumerating temporary tables (#215).
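The `Redshift()` connector referenced in the changelog entry above is used like the regular `Postgres()` driver; a sketch with placeholder connection details:

```r
library(DBI)

# Connect with the dedicated Redshift driver class from RPostgres.
# Host, dbname, and credentials are placeholders.
con <- dbConnect(
    RPostgres::Redshift(),
    host     = "my-cluster.example.us-east-1.redshift.amazonaws.com",
    port     = 5439,
    dbname   = "analytics",
    user     = Sys.getenv("REDSHIFT_USER"),
    password = Sys.getenv("REDSHIFT_PASSWORD")
)

# Per the changelog, enumeration now works, limited to the
# topmost tables on the search path:
dbListTables(con)

dbDisconnect(con)
```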
@github-actions
Contributor

This old thread has been automatically locked. If you think you have found something related to this, please open a new issue and link to this old issue if necessary.

@github-actions github-actions bot locked and limited conversation to collaborators Sep 15, 2022
3 participants