Tighten OAuth Scopes for BigQuery #23

Closed
heisencoder opened this issue Dec 15, 2020 · 7 comments · Fixed by #63
Labels
good_first_issue (Good for newcomers) · type:enhancement (New feature or request)

Comments

@heisencoder

Describe the feature

Reduce the BigQuery OAuth scopes down to the minimal set needed, in both dbt and dbt Cloud.

Additional context

Currently, the dbt BigQuery connector requests these three OAuth scopes:

  • https://www.googleapis.com/auth/bigquery
  • https://www.googleapis.com/auth/cloud-platform
  • https://www.googleapis.com/auth/drive

The BigQuery scope is needed to access the database, but the cloud-platform and drive scopes are probably too broad. These scopes were originally added to address issue dbt-labs/dbt-core#502, primarily to allow for access for reading from Google Sheets. However, I don't immediately see a need for the cloud-platform scope, which gives access to a wide range of GCP resources, such as the following:

  • View and manage your data across all Google Cloud Platform services, such as:
      • View and manage your tables, datasets, and jobs in Google BigQuery
      • View and manage your data in Google Cloud Storage
      • View and manage your instances in Google Cloud SQL
Similarly, the drive scope has this access:

This app wants permission to access everything in your Google Drive. It will be able to do the same things you can do, including:

  • See your files
  • Upload and download your files
  • Delete your files
  • See the names and emails of people you share files with
  • Share and stop sharing your files with others
  • Remove people from your files
  • Organize your Drive

There may be private information in your Google Drive, like financial records, medical reports, photos or tax info.

I would think that, minimally, these scopes could be reduced to their read-only variants, and could probably be reduced further depending on the access needed for external tables. Maybe something like:

  • https://www.googleapis.com/auth/bigquery
  • https://www.googleapis.com/auth/drive.readonly
  • https://www.googleapis.com/auth/devstorage.read_only

But I don't know yet whether these scopes are too restrictive.

Also note that dbt Cloud has the same list of OAuth scopes, so whatever is changed in dbt should also be changed in dbt Cloud.

@jtcohen6
Contributor

@heisencoder I totally buy it. Let's use the tightest scopes that still support reading from external data sources: Cloud Storage, Drive, etc.

I don't think this is something we have integration tests for today, but I imagine I could create a random private GSheet and/or data in a private GCS bucket. (I already have a public bucket gs://dbt-external-tables-testing/ that I used for testing create external table over in dbt-labs/dbt-external-tables#49.)

@heisencoder
Author

Thanks Jeremy! Let me know if you want any help.

@jtcohen6
Contributor

I tried this out locally (revoking all my gcloud user's scopes and re-adding them one by one), and I'm surprised that the combination of drive.readonly + drive.metadata.readonly is not enough for querying Google Sheets registered as external tables in BigQuery. I'm getting either "Failed to read the spreadsheet. Error code: PERMISSION_DENIED" or "Request had insufficient authentication scopes".

After looking a bit more online, BigQuery's docs and related posts all list full drive and cloud-platform scopes.

@heisencoder Separately from dbt, have you been able to query GSheet / GCS data through BigQuery with less-than-full scopes?

@heisencoder
Author

Thanks Jeremy for looking into this more, and I apologize for not having done this research in depth myself. From the sources you've quoted, it does appear that full Drive access is needed for reading from Sheets. I'll open an issue with the Cloud team to see whether it's possible to update the Drive API to allow a more restrictive scope for BigQuery access.

The other open question is whether the full cloud-platform scope is needed for reading from GCP, or whether the read-only scope is sufficient. From looking at https://developers.google.com/identity/protocols/oauth2/scopes, I'm a little more optimistic that a read-only scope would work in this context. More specific scopes that I've seen on each of the pages are:

  • Cloud Storage: https://www.googleapis.com/auth/devstorage.read_only
  • Bigtable: https://www.googleapis.com/auth/bigtable.data.readonly
  • Cloud SQL: https://www.googleapis.com/auth/sqlservice.admin

The Cloud SQL documentation does say that the cloud-platform scope is needed for calling Instances.export, so I don't know offhand whether BigQuery is using that feature and requires a bigger scope, or whether the BigQuery admin scope itself is sufficient for reading from Cloud SQL.

@jtcohen6
Contributor

jtcohen6 commented Mar 10, 2021

I gave a go at more testing here. In my previous comment, I think I was conflating two different things: (1) the application-default scopes (the ones used with the oauth method, and which we've documented here) with (2) the scopes requested by dbt here for use with non-gcloud-based connection methods (service accounts and oauth tokens):

https://github.com/fishtown-analytics/dbt/blob/b70fb543f5825744b7705009aef815b8071ab516/plugins/bigquery/dbt/adapters/bigquery/connections.py#L129-L131
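
For reference, those lines define the connection manager's scope tuple, roughly as follows (an approximate reconstruction rather than an exact copy of the linked revision; the constant name may differ):

```python
# Approximate reconstruction of the scopes requested at the linked lines.
SCOPE = (
    'https://www.googleapis.com/auth/bigquery',
    'https://www.googleapis.com/auth/cloud-platform',
    'https://www.googleapis.com/auth/drive',
)
```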

(We should better document in the codebase that these scopes do not apply to oauth-based connections, which instead rely on the scopes passed to gcloud auth application-default login; that would help avoid confusing issues like dbt-labs/dbt-core#3040.)

Now that I've had a chance to test this for real, and informed by what we've long had in the docs, I do think we could manage to cut back on both these scopes to just:

  • https://www.googleapis.com/auth/bigquery
  • https://www.googleapis.com/auth/drive.readonly

In my testing, I'm able to query data living in Drive with readonly access, and data living in Google Cloud Storage with just the bigquery scope (despite no explicit mention of a cloud-platform or devstorage scope).

I haven't had a chance to test with external queries against Bigtable or Cloud SQL, as those aren't things I have ready access to. @heisencoder Any chance you could test those as well?

@heisencoder
Author

Thanks Jeremy for digging into this some more!

@kconvey as FYI

We're currently running into some issues with querying Sheets-backed external tables via a service account, so we don't know yet whether it's a scope issue or something else. We should have more insight soon.

With regard to running a local Python script that accesses an external table, I've actually had luck with specifying just the bigquery scope in the Python code, while adding the drive.readonly scope to the gcloud command (i.e., gcloud auth application-default login --scopes=https://www.googleapis.com/auth/bigquery,https://www.googleapis.com/auth/drive.readonly).

With regard to external sheets, this URL provides some sample Python code that creates a new external table in a GCP project that you own (as defined by your default GCP project), and then queries it:

https://cloud.google.com/bigquery/external-data-drive#creating_and_querying_a_temporary_table

The Google Sheet in that example is publicly available.
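
For anyone following along, here is a minimal sketch in the same spirit as that sample, with the scopes requested explicitly in the Python code rather than via gcloud. It assumes the google-auth and google-cloud-bigquery packages are installed; the sheet URI and table alias are placeholders, not values from the linked example.

```python
import google.auth
from google.cloud import bigquery

# Request only the scopes under discussion: bigquery for queries,
# drive.readonly for the Sheets-backed external table.
credentials, project = google.auth.default(
    scopes=[
        "https://www.googleapis.com/auth/bigquery",
        "https://www.googleapis.com/auth/drive.readonly",
    ]
)
client = bigquery.Client(credentials=credentials, project=project)

# Define a temporary external table backed by a Google Sheet (placeholder URL).
external_config = bigquery.ExternalConfig("GOOGLE_SHEETS")
external_config.source_uris = [
    "https://docs.google.com/spreadsheets/d/EXAMPLE_SHEET_ID"
]
external_config.options.skip_leading_rows = 1

job_config = bigquery.QueryJobConfig(
    table_definitions={"sheet_data": external_config}
)
query_job = client.query("SELECT * FROM sheet_data LIMIT 10", job_config=job_config)

for row in query_job:  # iterating the job waits for and returns the results
    print(dict(row))
```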

@kconvey
Contributor

kconvey commented Mar 23, 2021

Had some thoughts over in dbt-labs/dbt-core#3040 which I'll partially repeat here:

I think dbt should consider making scopes a configurable profile / project config option for BigQuery. If you went down the local gcloud path and set your default scopes to include https://www.googleapis.com/auth/drive, etc., you wouldn't need to set this for local runs, so perhaps it could default to just bigquery. If you were running somewhere remote, as a service account, you can't set default scopes, so you might need to set this option; making it configurable would also mean you could specify something narrower, like https://www.googleapis.com/auth/spreadsheets.readonly, if that was all you needed. And if a request in that context failed with insufficient scopes, a pointer to the dbt docs describing the config option would be extremely helpful.

I would even add that the dbt docs could explain which scope combinations are needed for common setups, e.g. "Reading from sheets requires scopes: ['https://www.googleapis.com/auth/spreadsheets.readonly', 'https://www.googleapis.com/auth/bigquery']".
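
To make the idea concrete, here is a hypothetical sketch of how a profile-supplied scopes list could feed into service-account credential creation. The function name, parameter, and default below are illustrative only, not an actual dbt config option.

```python
from typing import Optional, Sequence
from google.oauth2 import service_account

# Hypothetical default if the profile specifies nothing: BigQuery only.
DEFAULT_SCOPES = ("https://www.googleapis.com/auth/bigquery",)

def scoped_credentials(keyfile: str, scopes: Optional[Sequence[str]] = None):
    """Build service-account credentials using profile-supplied scopes,
    falling back to the BigQuery-only default."""
    return service_account.Credentials.from_service_account_file(
        keyfile, scopes=list(scopes or DEFAULT_SCOPES)
    )

# A profile that only needs Sheets-backed tables might then pass:
# scoped_credentials(
#     "keyfile.json",
#     ["https://www.googleapis.com/auth/bigquery",
#      "https://www.googleapis.com/auth/spreadsheets.readonly"],
# )
```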

@jtcohen6 jtcohen6 transferred this issue from dbt-labs/dbt-core Oct 12, 2021
@jtcohen6 jtcohen6 added type:enhancement New feature or request good_first_issue Good for newcomers labels Oct 12, 2021