Tighten OAuth Scopes for BigQuery #23
Comments
@heisencoder I totally buy it. Let's use the tightest scopes that still support reading from external data sources: Cloud Storage, Drive, etc. I don't think this is something we have integration tests for today, but I imagine I could create a random private GSheet and/or data in a private GCS bucket. (I already have a public bucket.)
Thanks Jeremy! Let me know if you want any help.
I tried this out locally by revoking scopes and retesting. After looking a bit more online, BigQuery's docs and related posts all list full `drive` access as required. @heisencoder Separately from dbt, have you been able to query GSheet / GCS data through BigQuery with less-than-full scopes?
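The kind of local check described above could be sketched like this (hypothetical: the table name is a placeholder, and whether `drive.readonly` actually suffices is exactly the open question):

```python
# Sketch: try a query with a reduced scope list to see whether
# Sheets-backed tables still work. Table name is a placeholder.
REDUCED_SCOPES = [
    "https://www.googleapis.com/auth/bigquery",
    "https://www.googleapis.com/auth/drive.readonly",
]

def can_query(table_ref: str) -> bool:
    """Return True if a trivial query against table_ref succeeds."""
    # Imports kept inside so the sketch can be read without the libraries.
    import google.auth
    from google.cloud import bigquery

    credentials, project = google.auth.default(scopes=REDUCED_SCOPES)
    client = bigquery.Client(credentials=credentials, project=project)
    try:
        client.query(f"SELECT COUNT(*) FROM `{table_ref}`").result()
        return True
    except Exception:
        return False
```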
Thanks Jeremy for looking into this more, and I apologize for not having done this research in depth myself. From the sources you've quoted, it does appear that full Drive access is needed for reading from Sheets. I'll open an issue with the cloud team to see whether it's possible to update the Drive API to allow a more restrictive scope for BigQuery access.

The other open question is whether the full cloud-platform scope is needed for reading from GCS, or whether a read-only scope is sufficient. From looking at https://developers.google.com/identity/protocols/oauth2/scopes, I'm a little more optimistic that a read-only scope would work in this context. More specific scopes that I've seen on each of the pages are:
The Cloud SQL documentation does say that the full `cloud-platform` scope is required.
I gave a go at more testing here. In my previous comment, I think I was confusing two different things. (We should better document in the codebase that these scopes do not apply to all connection methods.) Now that I've had a chance to test this for real, and informed by what we've long had in the docs, I do think we could manage to cut back both of these scopes to just:
In my testing, I'm able to query data living in Drive with the reduced scopes. I haven't had a chance to test external queries against Bigtable or Cloud SQL, as those aren't things I have ready access to. @heisencoder Any chance you could test those as well?
Thanks Jeremy for digging into this some more! @kconvey as FYI.

We're currently running into some issues with querying Sheets-backed external tables via a service account, so we don't know yet whether it's a scope issue or something else. We should have more insight soon.

With regard to running a local Python script that accesses an external table, I've actually had luck with just specifying the `bigquery` scope.

With regard to external sheets, this URL provides sample Python code that creates a new external table in a GCP project that you own (as defined by your default GCP project), and then queries it: https://cloud.google.com/bigquery/external-data-drive#creating_and_querying_a_temporary_table The Google Sheet in that example is publicly available.
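The linked sample boils down to something like the following sketch (untested here; the scope list reflects this thread's findings, and the sheet URL would be supplied by the caller):

```python
# Sketch of querying a temporary Sheets-backed external table, modeled on
# https://cloud.google.com/bigquery/external-data-drive. Untested here.
SCOPES = [
    "https://www.googleapis.com/auth/bigquery",
    "https://www.googleapis.com/auth/drive",  # full drive, per the docs
]

def query_sheet(sheet_url: str, sql: str = "SELECT * FROM sheet LIMIT 10"):
    # Imports kept inside so the sketch can be read without the libraries.
    import google.auth
    from google.cloud import bigquery

    credentials, project = google.auth.default(scopes=SCOPES)
    client = bigquery.Client(credentials=credentials, project=project)

    external_config = bigquery.ExternalConfig("GOOGLE_SHEETS")
    external_config.source_uris = [sheet_url]
    external_config.options.skip_leading_rows = 1

    # Attach the definition only for this query; no permanent table is created.
    job_config = bigquery.QueryJobConfig(
        table_definitions={"sheet": external_config}
    )
    return list(client.query(sql, job_config=job_config))
```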
Had some thoughts over in dbt-labs/dbt-core#3040 which I'll partially repeat here:
I would even add that the dbt docs could explain which scope combinations are needed to enable common setups, e.g. "Reading from Sheets requires the full `drive` scope."
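For example, such docs might include a profile sketch like this (assuming a dbt-bigquery version that supports a `scopes` profile key; profile, project, and dataset names are placeholders):

```yaml
# profiles.yml (sketch): request only the scopes this project needs.
my-bigquery-profile:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth
      project: my-gcp-project
      dataset: analytics
      # Reading from Sheets-backed tables appears to require the
      # full drive scope (see discussion above).
      scopes:
        - https://www.googleapis.com/auth/bigquery
        - https://www.googleapis.com/auth/drive
```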
Describe the feature
Reduce the BigQuery OAuth scopes down to the minimal set needed, in both dbt and dbt Cloud.
Additional context
Currently, the dbt BigQuery connector requests these three OAuth scopes:

- https://www.googleapis.com/auth/bigquery
- https://www.googleapis.com/auth/cloud-platform
- https://www.googleapis.com/auth/drive

The `bigquery` scope is needed to access the database, but the `cloud-platform` and `drive` scopes are probably too broad. These scopes were originally added to address issue dbt-labs/dbt-core#502, primarily to allow reading from Google Sheets. However, I don't immediately see a need for the `cloud-platform` scope, which grants access to a wide range of GCP resources. Similarly, the `drive` scope grants full access to all of a user's Drive files. I would think that, minimally, these scopes could be reduced to their 'read-only' variants, and could probably be reduced further depending on the access needed for external tables. Maybe something like:

- https://www.googleapis.com/auth/bigquery
- https://www.googleapis.com/auth/cloud-platform.read-only
- https://www.googleapis.com/auth/drive.readonly
But I don't know yet whether these scopes are too restrictive.
Also note that dbt Cloud has the same list of OAuth scopes, so whatever is changed in dbt should also be changed in dbt Cloud.