Can't access BigQuery tables whose original sources are spreadsheets on Google Drive with the oauth method #3040
Thanks for the detailed report, @yu-iskw, for finding the relevant diff here, and for the associated PR. I'm trying to reproduce this error locally; so far, I haven't been able to.
Does the change in #3041 resolve this issue for you locally? Even if I can't manage to reproduce the error myself, knowing that would be a helpful indication. In any case, we definitely need better automated integration testing for external data sources (including GSheets) in BigQuery.
I have reproduced this bug myself. In my case, I personally (johnflux@mycompany.com) have access to the spreadsheet, but my company ("mycompany") as a group does not. If I use 'bq' from the command line, I get a permission failure; from the web UI it works. If I change the spreadsheet to grant access to "mycompany", then both work. I'm not a sysadmin for my company, and I have no idea how to test the provided patch, sorry.
@johnflux To clarify, were you able to reproduce this error using dbt? Or only when using the `bq` command-line tool?
If you're able to reproduce this error with dbt==0.19.0, then you can try installing dbt (in a virtual environment) from the branch with the proposed fix. Then re-run the model that depends on the GSheet, and see whether it works where it previously did not.
We're using BigQuery with dbt version 0.18.1. We're getting that same Database Error:
@jtcohen6 We installed that patch branch and got the exact same error. We couldn't locate anything in the logs either.
We also see a failure that may be related to this change; it only occurs on one version for us. We get the same error as has been previously mentioned:
Hey @walshie4! Could you try installing the fix from #3041 in a virtualenv (installing dbt from that branch, as described above), and see if that removes the error?
I haven't yet been able to replicate this error myself, so hearing that the proposed fix resolves the issue would be a good indication that it's the right change.
I've got a project that reproduces the failure message with dbt 0.17, and confirmed that PR #3041 does not resolve the issue for me.
As another data point, I reproduced the same error after directly running the gcloud auth application-default login command. However, even after running that gcloud command, dbt is still giving me the same error. The BigQuery documentation (https://cloud.google.com/bigquery/external-data-drive#python) does suggest that this is an issue with not providing the appropriate scope, so I don't know why it's still failing even after patching in PR #3041.

Looking at that page for guidance, I'm able to reproduce the Drive error with just a small Python snippet:

```python
from google.cloud import bigquery
import google.auth

credentials, project = google.auth.default(
    scopes=[
        "https://www.googleapis.com/auth/drive",
        "https://www.googleapis.com/auth/bigquery",
    ]
)
client = bigquery.Client(credentials=credentials, project=project)
query_job = client.query('SELECT count(some_field) as c FROM my_dataset.some_external_table_linked_to_a_google_sheet')
print(list(query_job))
```

So this issue seems to me like an issue with the BigQuery client code itself. Still digging...
Okay, I think I found the fix. It's necessary to run this command to add the Drive scope to the application default credentials:

```
gcloud auth application-default login --scopes=https://www.googleapis.com/auth/drive,https://www.googleapis.com/auth/bigquery
```

(Note that you also have to add the BigQuery scope and anything else needed by dbt.)
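As a quick sanity check, the earlier query can then be re-run against the refreshed application default credentials without passing any scopes in code. This is only a sketch: it reuses the hypothetical table name from the snippet above and assumes a default project is already configured.

```python
from google.cloud import bigquery

# After re-running `gcloud auth application-default login --scopes=...`,
# the application default credentials already carry the Drive scope,
# so no scopes need to be attached here.
client = bigquery.Client()
query_job = client.query(
    "SELECT count(some_field) AS c "
    "FROM my_dataset.some_external_table_linked_to_a_google_sheet"
)
print(list(query_job))
```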
And as expected, this is already documented on the dbt site: https://docs.getdbt.com/reference/warehouse-profiles/bigquery-profile#local-oauth-gcloud-setup

That documentation recommends running this command:

```
gcloud auth application-default login \
  --scopes=https://www.googleapis.com/auth/userinfo.email,\
https://www.googleapis.com/auth/cloud-platform,\
https://www.googleapis.com/auth/drive.readonly
```

However, when I run that command I get this error:
If I add
@heisencoder Thanks for digging! I wonder whether the readonly Drive scope is enough here, given the related findings over in #2953. @yu-iskw @dalebradman @walshie4 Could you try again after running the gcloud auth application-default login command with the Drive scope included?
@jtcohen6 This is a good point! With my recent test cases, I've just been using the Python BigQuery client directly, but as you mentioned, we may need to expand this to read/write access based on the findings of #2953. All that said, the dbt documentation has been specifying the drive.readonly scope for the gcloud command, so I'm curious whether the readonly scope actually works now.
To summarize from the conversation over in #2953:
Given the confusion this issue has stirred up in the meantime, there's definitely appetite to clarify this behavior, by better documenting it or by making it more intuitive. Right now, the Drive scope is "optional" for users who authenticate and develop locally via gcloud. Is making that distinction still desirable? I think there are two valid answers:
Let's use this issue to discuss which of these two options makes sense, and plan for a code change that will help us avoid this confusion in the future.
I'm not optimistic that #3041 will actually change anything, since the OAUTH method is just using the scopes that were passed into the application default credentials when they were created.
Finally got around to testing #3041, and it appears not to solve the problem in my local reproduction setup. I added a print statement there to let me know what scopes are being passed. Still errors out:
After running the gcloud command with the additional scopes, this fixes the problem with both versions. I'm still looking into why this isn't working when running as a service account that has been given access to the document and to BQ. Will report back if I find anything of value.
Okay, I'm back, and I have good news. After digging through here and trying out some of the changes, I think I understand how this happened and how we can resolve the issue.

To start, the issue was first introduced in PR #2908, where the line that passed scopes when using OAuth was replaced with a line that relies on the defaults configured with gcloud. This is why, even after this change, you won't hit the problem if you authenticate with the scopes added (using the gcloud command above).

However, if you have a setup like we do over at Honey 😉 you will see the problem, because we use a service account in an environment with no default credentials available and authenticate by activating the service account with gcloud. Looking back now, this is actually not a mentioned path on the BQ profile setup page (I can't remember whether it was back when we set this up a year or so ago), which may be why this path got hit by the bug.

I tested our setup with #3041 and it does resolve the issue (manually adding the 4 lines of changes causes a previous failure to then work in the same environment, back-to-back). I think we should merge that change, and consider updating the docs to cover the service-account-via-gcloud path. Let me know if you have any questions. I also support making these scopes optional in some form for those who don't need them.
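To make the difference concrete, here is a simplified sketch of the two behaviors being described. This is not dbt's actual connection code, just the public google-auth calls involved, and the scope list is illustrative rather than dbt's exact set:

```python
import google.auth

# Illustrative scope set; the exact list dbt requests may differ.
SCOPES = (
    "https://www.googleapis.com/auth/bigquery",
    "https://www.googleapis.com/auth/cloud-platform",
    "https://www.googleapis.com/auth/drive",
)

# Post-#2908 style (sketch): rely entirely on whatever scopes the
# application default credentials already carry.
credentials, project = google.auth.default()

# Pre-#2908 / #3041 style (sketch): ask google-auth to attach the scopes
# when the credentials are resolved. Per the discussion above, this is
# what made the activated-service-account setup work again.
scoped_credentials, project = google.auth.default(scopes=SCOPES)
```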
@walshie4 Heroic! Thanks for the detailed investigative work.
It all becomes clear. If using a service account to authenticate via the gcloud route rather than a keyfile, the credentials don't carry the Drive scope unless dbt requests it explicitly. Given that, I'm happy merging #3041 and sneaking this into v0.19.1. We'll likely want to cut a second release candidate, and ask you folks to test it out. I'd also welcome a PR here that updates "Local OAuth gcloud setup" to include mention of service account OAuth gcloud setup as well.
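For contrast, a minimal sketch of the key-file pattern, where scopes are bound to the credentials object at creation time rather than inherited from gcloud defaults. The key path and scope list here are placeholders, not dbt's configuration:

```python
from google.cloud import bigquery
from google.oauth2 import service_account

# Hypothetical path to a service-account key file.
KEYFILE = "/path/to/service-account.json"

# Scopes are attached when the credentials object is created.
credentials = service_account.Credentials.from_service_account_file(
    KEYFILE,
    scopes=[
        "https://www.googleapis.com/auth/bigquery",
        "https://www.googleapis.com/auth/drive",
    ],
)
client = bigquery.Client(credentials=credentials, project=credentials.project_id)
```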
Just in case this helps anyone who may be trying to use the OAuth path within dbt from a GCP context: even without this regression (i.e. the credentials you create are requesting the https://www.googleapis.com/auth/drive scope, as in 0.18 and 0.19.1), if the scopes you enable on your cluster (e.g. GKE) are a subset of https://www.googleapis.com/auth/drive, such as https://www.googleapis.com/auth/spreadsheets.readonly, the request made by dbt may fail with the error described in this bug, even if you only practically use that subset. I didn't dig much further than to expand the scope on GKE from https://www.googleapis.com/auth/spreadsheets.readonly to the broader https://www.googleapis.com/auth/drive scope.
I think dbt should consider making the requested scopes configurable, so that narrower scopes such as https://www.googleapis.com/auth/spreadsheets.readonly could be used instead.
This seems reasonable to me! @kconvey Could you drop that proposal over in https://github.com/fishtown-analytics/dbt/issues/2953? It sounds like, in addition to tightening the scopes requested by default, we should give users the option to request even tighter ones, e.g. if dbt doesn't need read access to all of Drive.
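As a rough illustration of what that proposal could look like (a hypothetical configuration, not an existing dbt option), the scopes handed to google-auth could come from user configuration rather than a hard-coded tuple:

```python
import google.auth

# Hypothetical user-supplied scope list, e.g. something a profile setting
# could provide; dbt does not currently expose an option like this.
configured_scopes = [
    "https://www.googleapis.com/auth/bigquery",
    "https://www.googleapis.com/auth/drive.readonly",  # tighter than full Drive access
]

credentials, project = google.auth.default(scopes=configured_scopes)
```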
@walshie4 We just merged the fix in #3041 and included it in v0.19.1rc2. When you get a chance, could you try installing the release candidate and authenticating via the magical service account + gcloud setup you described above?
Just tried it out. Can confirm that the release candidate resolves the issue for our setup.
Can also confirm that this works once you re-authenticate with the required scopes! 🥳 |
Describe the bug

It is impossible to access BigQuery tables whose original sources are spreadsheets on Google Drive with the oauth method, because we don't pass any scopes parameters to the method that creates the BigQuery credentials. We need the https://www.googleapis.com/auth/drive scope to access such tables. Moreover, there is no way to grant scopes (as opposed to permissions) to an existing Google Cloud authentication; we can only pass the scope information when creating the credentials.

Steps To Reproduce

Use the oauth method for BigQuery.

Expected behavior
Even when we use the oauth method in the BigQuery profile, we should be able to access BigQuery tables whose original data sources are spreadsheets on Google Drive, via the https://www.googleapis.com/auth/drive scope.

Screenshots and log output
System information

Which database are you using dbt with? BigQuery

The output of dbt --version:

The operating system you're using: Mac OS 10.15.7 (19H114)

The output of python --version: Python 3.8.5