Option to disable hive partitioning wild cards #232
Comments
Understood. Could you live with the ability to turn this on and off at the function level? That is, for the whole installation of the loader, it either would or would not perform Hive wildcard transforms. This would be relatively easy to support, while doing it selectively per prefix will require a bit more thinking.
Also, how many of these event types do you have? If you could suppress specific prefixes from Hive wildcard transforms, would that be achievable, or do you have too many event types to list?
I am the user who reported this to AWS support (and they logged this issue on my behalf). Thank you for looking into this. Disabling this feature at the function level would be fine, as we currently have no plans to use Hive wildcards. Managing a list of prefixes to exclude is also fine. We currently have 13 event types, and while this might increase slightly in the future, it should remain easy to maintain a list.
Hi Ian, thank you for looking into this issue. The problem was raised on jbrew8's behalf, and we would really appreciate a quick workaround or a solution implemented in the near future.
OK. So here's my proposal. Please download version 2.8.0 from https://awslabs-code-us-east-1.s3.amazonaws.com/LambdaRedshiftLoader/AWSLambdaRedshiftLoader-2.8.0.zip, which has not yet been pushed to github. Set an environment variable
Given that you only have 13 prefixes, you could load those directly into the variable, but note that all environment variables together cannot exceed 4 KB. I have tested this within my account and it works great, but I would like to validate it with you before shipping.
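For readers following along, here is a minimal sketch of how a per-prefix suppression list held in an environment variable could be parsed and checked. It is not the loader's actual code: the variable name `HYPOTHETICAL_SUPPRESS_WILDCARD_PREFIXES`, the comma-separated format, and all three helper functions are assumptions made for illustration. The size helper only approximates how the 4 KB environment limit is counted.

```javascript
// Hypothetical variable name, used only for this sketch.
const SUPPRESS_VAR = 'HYPOTHETICAL_SUPPRESS_WILDCARD_PREFIXES';

// Parse a comma-separated list of prefixes (the format is an assumption).
function parsePrefixList(raw) {
  if (!raw) return [];
  return raw
    .split(',')
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

// True when the given S3 prefix starts with any entry in the suppression list.
function isSuppressed(prefix, suppressList) {
  return suppressList.some((p) => prefix.startsWith(p));
}

// Rough size of an environment map in bytes (keys plus values, UTF-8).
// Useful as a sanity check against the 4 KB total mentioned above.
function envSizeBytes(env) {
  return Object.entries(env).reduce(
    (total, [k, v]) =>
      total + Buffer.byteLength(k, 'utf8') + Buffer.byteLength(String(v), 'utf8'),
    0
  );
}

// Usage: read the hypothetical variable and test one prefix against it.
const suppressList = parsePrefixList(process.env[SUPPRESS_VAR]);
console.log(isSuppressed('my-bucket/events/event_type=example', suppressList));
```

With 13 event types, a list like this stays well under the limit as long as the prefixes are of ordinary length; `envSizeBytes(process.env)` gives a quick estimate.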
Thanks @IanMeyers. I'll give this a shot and let you know how it works.
@IanMeyers I had a chance to try your changes, and it looks like they solve our problem. With the new environment variable set, prefixes that contain an equals sign are treated literally, and the data is loaded into Redshift correctly. Thank you for your quick turnaround on this issue.
Wonderful – will push the changes.
Date: Wednesday, 11 August 2021 at 18:13
The Avro files we are trying to load into Redshift are stored in folders with "=" in their names, i.e.
When loading data from the following S3 prefix,
com.hoopladigital.brazecurrentsstaging/StagingCurrentFull/dataexport.prod-03.S3.integration.60d3692fcab9ca5f83919aab/event_type%3Dusers.behaviors.app.FirstSession
The lambda failed with this error:
As shown in the error message above, the "event_type=/date=" portion of the prefix was transformed on the assumption that we are using Hive partitioning wildcards (https://github.com/awslabs/aws-lambda-redshift-loader#hive-partitioning-style-wildcards), which replaces the event_type value with *.
We don't want to use this feature. I need the lambda to use the exact folder name that I provided in the prefix. Is there a way to configure the lambda not to use Hive partitioning wildcards?
line 1584 of index.js:
inputInfo.prefix = inputInfo.bucket + '/' + searchKey.transformHiveStylePrefix();
line 78 of index.js
transformHiveStylePrefix()
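To illustrate the behaviour described above, here is a minimal sketch of a Hive-style wildcard transform and of a switch that bypasses it. This is an assumption about what such a transform does, based only on the description in this issue, and is not the body of `transformHiveStylePrefix()` in index.js. The names `toHiveWildcardPrefix` and `resolvePrefix`, the `suppress` flag, and the sample date value are hypothetical.

```javascript
// Replace the value of every key=value path segment with "*",
// e.g. "event_type=foo/date=bar" becomes "event_type=*/date=*".
function toHiveWildcardPrefix(prefix) {
  return prefix.replace(/=[^/]*/g, '=*');
}

// Decode the URL-encoded key (S3 event keys carry "=" as "%3D"),
// then either keep it literally or apply the wildcard transform.
function resolvePrefix(key, suppress) {
  const decoded = decodeURIComponent(key);
  return suppress ? decoded : toHiveWildcardPrefix(decoded);
}

// Illustrative key; the date segment is made up for this example.
const key = 'export/event_type%3Dusers.behaviors.app.FirstSession/date%3D2021-01-01';

console.log(resolvePrefix(key, false)); // wildcard form
console.log(resolvePrefix(key, true)); // literal form
```

With `suppress` set to false the lookup prefix becomes `export/event_type=*/date=*`, which is the behaviour reported here. With it set to true the folder names are used exactly as given.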