Allow adding TRUNCATECOLUMNS option to Redshift COPY #43
@stefankeidel this seems really straightforward and practical. I think it'd make sense to have a test which creates an (insanely) large record, uses this option, and then makes sure we read a truncated version out of Redshift. Unlike some of our other configuration options, this one is pretty straightforward to actually test for. Curious if @awm33 has any opinions/thoughts here?
This test adds 100 cats with a long description and asserts that they all insert correctly (Redshift bails if the content is too long and the TRUNCATECOLUMNS option is not set) and that the longest record for that column equals the max column length.
Tested using the docker setup for this project:
    source /code/venv--target-redshift/bin/activate
    pytest tests/test_target_redshift.py -k 'test_truncate_columns'
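The shape of that check can be sketched without a database. The snippet below is only an illustration, not the test from this PR: `make_cats` and `simulate_truncatecolumns` are hypothetical helpers, and the truncation is simulated locally on the assumption that the column is a VARCHAR at Redshift's 65535-byte maximum.

```python
# Illustrative sketch only: the real test loads rows into Redshift and reads
# them back. Here the truncation is simulated so the assertions can run anywhere.
MAX_VARCHAR_BYTES = 65535  # assumed column size (Redshift's VARCHAR maximum)


def make_cats(count=100, description_chars=70000):
    """Build hypothetical cat records whose description exceeds the column size."""
    return [
        {"id": i, "name": f"cat-{i}", "description": "a" * description_chars}
        for i in range(count)
    ]


def simulate_truncatecolumns(value, max_bytes=MAX_VARCHAR_BYTES):
    """Stand-in for the database: cut the UTF-8 encoding to max_bytes bytes."""
    truncated = value.encode("utf-8")[:max_bytes]
    # Drop a trailing partial multi-byte character, if the cut produced one.
    return truncated.decode("utf-8", errors="ignore")


cats = make_cats()
loaded = [simulate_truncatecolumns(cat["description"]) for cat in cats]

# Every record "inserted", and the longest value sits exactly at the limit.
assert len(loaded) == 100
assert max(len(value.encode("utf-8")) for value in loaded) == MAX_VARCHAR_BYTES
```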
Good idea! Added a test that does roughly that. Lmk if that works
Test looks good. If we can get @awm33 to weigh in here, I think this is good to merge. Really nice work @stefankeidel!
@AlexanderMann @stefankeidel I'm wondering if we should create a subobject for Redshift COPY options to group them? @stefankeidel Did you try unselecting (
@awm33 that seems like a reasonable thing to do. I think a good enhancement for all of our config would be grouping all of the various things. Like, for psycopg2 we can make the connection object just a 1:1 mapping in a sub-object etc. You're suggesting doing it here so folks don't have a bunch of work to do in the future?
Yeah, this is for Regarding a subgrouping: Makes sense! Wdyt about something like this?
If we're going this route, I'd prefer nested values, i.e.: Also, I'm wondering if we want to simply make this an array of strings which get passed right through to the
Hmm, we already have prefixed
I like this idea! Not sure if we should do some verification, or if we can just assume that people using such an option know what they're doing?
This allows passing a list of options to Redshift's COPY command instead of only being able to set a single option.
I implemented it using the prefix
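As a rough sketch of the array-of-strings idea discussed above: the function below is hypothetical (it is not the target's actual code), and the table name, S3 URI, IAM role, and the `FORMAT AS CSV` clause are placeholder assumptions. The light validation is likewise just one possible answer to the "should we verify?" question, not something the PR settled on.

```python
def build_copy_statement(table, s3_uri, iam_role, copy_options=None):
    """Assemble a Redshift COPY statement, passing extra options straight through.

    Hypothetical helper for illustration; the real target may build its
    statement differently (for example with other credential clauses).
    """
    options = []
    for option in copy_options or []:
        # Minimal sanity check (an assumption): options must be plain strings
        # that cannot terminate the statement early.
        if not isinstance(option, str) or ";" in option:
            raise ValueError(f"Invalid COPY option: {option!r}")
        options.append(option.strip())

    lines = [
        f"COPY {table}",
        f"FROM '{s3_uri}'",
        f"IAM_ROLE '{iam_role}'",
        "FORMAT AS CSV",
    ] + options
    return "\n".join(lines) + ";"


statement = build_copy_statement(
    "public.cats",
    "s3://example-bucket/cats.csv",
    "arn:aws:iam::123456789012:role/example-role",
    copy_options=["TRUNCATECOLUMNS", "COMPUPDATE OFF"],
)
print(statement)
```

Passing strings through keeps the config small and avoids adding a keyword per option, at the cost of trusting users to supply options that Redshift accepts.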
@stefankeidel @AlexanderMann I kind of regret us prefixing everything with redshift :). Other targets have COPY commands (postgres and snowflake), and snowflake has a TRUNCATECOLUMNS option too. I propose something that we can use with other targets as well, since they offer something similar:
@awm33 I like that format, but what do you think about the copy-options-as-array-of-strings idea posted by @AlexanderMann above and implemented in this latest revision? Do we want keywords for every single option we want to support or just assume users know what they're doing?
@awm33 on that note...should this really be something we put into
Is this ready to be merged + released? Happen to be looking for exactly this config option =)
This adds a configuration parameter (defaulting to False) which triggers the TRUNCATECOLUMNS option in every Redshift COPY statement sent by the target.
The use case for us is the combination with tap-intercom, where some of the content can exceed 64k, but the content for those few records/fields where that happens can be safely ignored. I couldn't find another way to truncate the content before sending to Redshift.
It might be useful at some point to add a more flexible way to include other options as well, but this should work for now.
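A minimal sketch of how such a boolean parameter might map onto the COPY statement follows. The key name `truncate_columns` and the helper `copy_flags` are assumptions made for illustration; the actual config key used by the target is not shown in this description.

```python
def copy_flags(config):
    """Return the extra COPY keywords implied by the target config.

    "truncate_columns" is a hypothetical key name; it defaults to False,
    matching the default described above.
    """
    flags = []
    if config.get("truncate_columns", False):
        flags.append("TRUNCATECOLUMNS")
    return flags


# With the flag unset, the COPY statement is left unchanged.
default_flags = copy_flags({})

# With the flag set, every COPY statement would gain TRUNCATECOLUMNS, so
# over-long values are cut to the column size instead of failing the load.
enabled_flags = copy_flags({"truncate_columns": True})
```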