-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/redshift iam auth #818
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this code looks great, but your circle build failed. just fix that and then
did you set this up using a ~/.aws/credentials
file?
dbt/contracts/connection.py
Outdated
'iam_duration_seconds': { | ||
'type': ['null', 'integer'], | ||
'minimum': 900, | ||
'maximum': 3600 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this mean if a model takes >1hr to run, that connection will close? (that's fine, just want to understand)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's right. I couldn't find any specifics in the docs, but keen to set the timeout to 900s then experiment with a time.sleep()
after acquiring the connection. I bet that once a query is issues, it will complete even if the creds are expired, but subsequent queries on the same connection will likely fail.
I wish we could make the timeout longer, but 3600s is the max per the docs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah this is pretty typical of aws federated auth. >1hr would be great.
@cmcarthur yeah, this uses the I (think) I fixed the unit tests, and I just added a bunch more in lieu of a working integration test against Redshift. Gave this a couple of spins against a Redshift cluster and it looks like the tmp credentials + validation are working really well! |
pep8 failed
fix this and then |
@drewbanin where should we put documentation issues for this release? i just want to make sure we update the docs to say, if you use in-transaction post hooks and this auth method, it can cause problems because of the credential timeouts EDIT: OH wait, what happens if you have a timeout of 60s, issue a BEGIN, run a query that takes 90s, and then try to COMMIT it? |
I'm a little confused about the timeout behavior here: I set up my IAM credentials to expire after 900 seconds, and then I configured a model to sleep for 960 seconds with some post hooks. The log output looks like: 2018-07-03 14:09:35,742: Using redshift connection "long_model". So, really interestingly, the query succeeded even though it ran for longer than the specified timeout. Also, a subsequent query using the same connection also completed successfully. Going to keep poking around to get a good feel for how this works |
@cmcarthur I can very reliably reproduce queries that continue to work after they should have expired. I'm even checking the expiration time returned by boto3. My logs effectively look like:
So, I think this must be a Redshift/AWS quirk/bug??? I dug through the boto3 source, but it's all just semantic wrappers around AWS APIs. I initially assumed that maybe they were throwing the timeout argument away and using some longer timeout instead, but I no longer believe that that's the case. So, I think that we can probably just document the intended behavior, and point to the relevant AWS docs. We can include a caveat that, at the time of writing, |
whoa, that's unexpected! yeah, document the intended behavior and . |
Reopening from #769 to integrate with new jsonschema validations.
Closes #757
@cmcarthur we're not currently running integration tests against Redshift... I think it's about time to change that! I'm going to see if I can add unit tests to at least validate the
profiles.yml
configs.Usage
This PR adds IAM authentication for Redshift to dbt. A new
method
parameter is available forredshift
connections. If themethod
parameter is not supplied, or if the valuedatabase
is given, then simple user/pass authentication will be used. If themethod
parameter is set toiam
, then dbt will use IAM Authentication to generate temporary credentials for the connection.Example profile with both auth methods shown below:
Properties:
method:
Can be one of
iam
,database
, or unsupplied. Default=databaseiam_duration_seconds:
The number of seconds after which the temporary credentials should expire. Default=900 (via boto)
cluster_id:
The name of the cluster (used to generate default credentials)
Discussion
The process of setting up IAM authentication is a little confusing. The documentation provided by Amazon is expectedly a little verbose, but I was able to get this set up with little prior information in about 15 minutes. Let's just be sure to link to the Amazon documentation when this is released.
Thanks to @danielchalef for the original code for this PR!