
Better support for RedShift #215

Closed
krlmlr opened this issue Apr 4, 2019 · 4 comments · Fixed by #330
Labels
feature install Issues with custom installations
Milestone

Comments

@krlmlr
Member

krlmlr commented Apr 4, 2019

Perhaps with a separate subclass.

Existing solutions:

@kmishra9

kmishra9 commented Jan 14, 2020

Ran into #211 as well -- it seems that copy_to(), and consequently *_join(..., copy = TRUE), rely on that functionality and are broken with Redshift too. For whatever reason, the RPostgreSQL package's PostgreSQL() driver implementation does work for many of the use cases where RPostgres is broken, so for those niche use cases, swap over!
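A minimal sketch of the driver swap described above — the host, database name, and credential environment variables are placeholders, not values from this thread:

```r
library(DBI)

# Workaround sketch: connect through the legacy RPostgreSQL driver instead of
# RPostgres when copy_to()/joins with copy = TRUE fail against Redshift.
# Host, port, dbname, and credentials below are placeholders.
con <- dbConnect(
    RPostgreSQL::PostgreSQL(),
    host     = "my-cluster.example.us-east-1.redshift.amazonaws.com",
    port     = 5439,
    dbname   = "analytics",
    user     = Sys.getenv("REDSHIFT_USER"),
    password = Sys.getenv("REDSHIFT_PASSWORD")
)

# copy_to() then works via the RPostgreSQL driver for many use cases:
dplyr::copy_to(con, mtcars, name = "mtcars_tmp", temporary = TRUE)
```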

@isteves

isteves commented Apr 13, 2020

@krlmlr I currently use Redshift in case I can be of help with testing/etc.

@kmishra9

kmishra9 commented Jun 22, 2021

A teammate of mine mentioned coming across this thread, so I thought I'd leave the functional replacement for copy_to() that I wrote, until this is implemented in the driver packages. It's quite suboptimal, relying on the very old redshiftTools package, but better than nothing in a pinch!

#*******************************************************************************
# Organization - Cricket Health
# Description - Functional replacement for dplyr::copy_to(), which doesn't work w/ Redshift
#*******************************************************************************

upload_df_to_rs <-
    function(df,
             rs,
             rs_schema = 'public',
             rs_relation_name = deparse(substitute(df)),
             overwrite = TRUE) {
        #' @description A function, similar to dplyr::copy_to(), that uploads a local data frame to Redshift via files temporarily staged in S3
        #' @param df the target data frame to upload
        #' @param rs a database connection object (the result of a DBI::dbConnect() call) indicating the database the data should be uploaded to
        #' @param rs_schema a string giving the schema within rs to copy the data to; example: "lnd"
        #' @param rs_relation_name a string giving the table within rs_schema to copy the data to
        #' @param overwrite a boolean indicating whether an existing table at {rs_schema}.{rs_relation_name} should be dropped first. If one exists and overwrite is not TRUE, an error will occur
        #' @return A dbplyr table reference to the uploaded df

        require('tidyverse')
        require('glue')
        require('dbplyr')     # for in_schema()
        require('assertthat') # for assert_that()

        table_path <-
            glue::glue('{rs_schema}.{rs_relation_name}')

        table_paths <-
            rs %>%
            tbl(in_schema('information_schema', 'tables')) %>%
            collect() %>%
            transmute(table_path = glue('{table_schema}.{table_name}')) %>%
            distinct() %>%
            pull(table_path)

        rs_schemas <-
            rs %>%
            tbl('pg_namespace') %>%
            select(table_schema = nspname) %>%
            distinct() %>%
            pull(table_schema)

        if (overwrite && table_path %in% table_paths) {
            DBI::dbExecute(
                conn = rs,
                statement = glue('DROP TABLE {table_path};')
            )
        }

        if (!(rs_schema %in% rs_schemas)) {
            DBI::dbExecute(
                conn = rs,
                statement = glue('CREATE SCHEMA {rs_schema};')
            )
        }

        redshiftTools::rs_create_table(
            df = df,
            dbcon = rs,
            table_name = table_path,
            split_files = 1,
            bucket = 'cricket-data-digest/temp-uploads',
            region = 'us-east-1',
            access_key = keyring::key_get('AWS_ACCESS_KEY_ID_DPU'),
            secret_key = keyring::key_get('AWS_SECRET_ACCESS_KEY_DPU'),
            session_token = keyring::key_get('AWS_SESSION_TOKEN_DPU')
        ) %>% assert_that()

        message('^ Ignore any "Client error: (403) Forbidden" messages ^')

        table_ref <-
            rs %>% tbl(in_schema(schema = rs_schema, table = rs_relation_name))

        return(table_ref)
    }
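A hypothetical call, assuming `rs` is an open DBI connection to a Redshift cluster and the S3 bucket and keyring entries hard-coded in `upload_df_to_rs()` exist in your environment (the table name below is illustrative):

```r
# Upload mtcars and get back a lazy dbplyr reference to the new table.
staged_tbl <- upload_df_to_rs(
    df               = mtcars,
    rs               = rs,
    rs_schema        = 'public',
    rs_relation_name = 'mtcars_staging'
)

# The reference can then be used like any other dbplyr table:
staged_tbl %>% dplyr::count()
```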

@krlmlr krlmlr added install Issues with custom installations feature and removed enhancement labels Sep 6, 2021
@krlmlr krlmlr added this to the 1.4.0 milestone Sep 12, 2021
krlmlr added a commit that referenced this issue Sep 13, 2021
- `dbExistsTable()`, `dbListTables()` and `dbListObjects()` now work for Redshift, with the limitation that only the topmost tables on the search path are returned (#215, #326).
krlmlr added a commit that referenced this issue Sep 14, 2021
- `Redshift()` connections now adhere to almost all of the DBI specification when connecting to a Redshift cluster. BLOBs are not supported on Redshift, and there are limitations with enumerating temporary tables (#215).
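The `Redshift()` connector referenced in the changelog entry above is used like the regular `Postgres()` driver; a sketch with placeholder connection details:

```r
library(DBI)

# Connect with the dedicated Redshift driver class from RPostgres.
# Host, dbname, and credentials are placeholders.
con <- dbConnect(
    RPostgres::Redshift(),
    host     = "my-cluster.example.us-east-1.redshift.amazonaws.com",
    port     = 5439,
    dbname   = "analytics",
    user     = Sys.getenv("REDSHIFT_USER"),
    password = Sys.getenv("REDSHIFT_PASSWORD")
)

# Per the changelog, enumeration now works, limited to the
# topmost tables on the search path:
dbListTables(con)

dbDisconnect(con)
```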
@github-actions
Contributor

This old thread has been automatically locked. If you think you have found something related to this, please open a new issue and link to this old issue if necessary.

@github-actions github-actions bot locked and limited conversation to collaborators Sep 15, 2022
3 participants