-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow BigQuery to default on project name #2908
Allow BigQuery to default on project name #2908
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice start @max-sixty!
I ran into an issue testing this out locally.
what's the easiest way to test this?
Fair question. We don't use oauth for our integration tests, since both CircleCI + Azure DevOps connect via service account. I'll think about if we have any way of mocking this.
def get_bigquery_defaults(cls): | ||
""" Returns (credentials, project_id) """ | ||
# Cached, because the underlying implementation shells out. | ||
return google.auth.default(scopes=cls.SCOPE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I test this locally, cls.get_bigquery_defaults()
seems to return
(<google.oauth2.credentials.Credentials object at 0x112630ca0>, None)
resulting in the database remaining None
below, and returning the following error from the google cloud client:
OSError: Project was not passed and could not be determined from the environment.
I don't think this is an issue on my end, but curious to hear what might be different, if you've managed to get this working locally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What auth are you using? I'm using ADC — i.e. running gcloud auth login --update-adc
.
What do you get from:
In [1]: from google.auth import default
In [2]: default()
Out[2]: (<google.oauth2.credentials.Credentials at 0x10a19e850>, '{foo_project}')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ok, this was definitely on me:
In [1]: from google.auth import default
In [2]: default()
Out[2]: (<google.oauth2.credentials.Credentials at 0x10a19e850>, None)
I originally set up local gcloud oauth a while ago, so I updated the auth as you recommended, and the second arg now returns my default project.
Even so, I'm still running into this issue when dbt tries to get information about the None
database by running list_None
:
google.api_core.exceptions.BadRequest: 400 GET https://bigquery.googleapis.com/bigquery/v2/projects/None/datasets?maxResults=10000: Invalid project ID 'None'. Project IDs must contain 6-63 lowercase letters, digits, or dashes. Some project IDs also include domain name separated by a colon. IDs must start with a letter and may not end with a dash.
I think we may need to fill in the missing database/project earlier on—ideally as a __post_init__
on BigQueryCredentials
—because dbt needs the database value in more places than just its connection client. This would also be relevant in case users want to make use of the Jinja context variable target.database
/target.project
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, good point. I pushed a change. Generally I would have that sort of function at the module level; but put them as methods as it seems to be fairly class-focused atm. Let me know which you prefer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's list_None
? What's the test you're running locally to test this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the first things dbt does at the start of a compile/run is grab metadata to populate its relational cache. It will log something to the effect of:
Acquiring new bigquery connection "list_[project_name]".
This was showing up for me as:
Acquiring new bigquery connection "list_None".
But now it looks good! target.database
/target.project
are also working as expected
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great — thanks for checking @jtcohen6 !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking great! Could you add a changelog entry, and add yourself to the list of contributors?
Looking at #2805, we added some unit tests to mock the BQ connection and ensure that the types and required fields all lined up. We might do well to add the same here, to ensure that dbt is happy with a profile that's missing a database
/project
.
I don't think we have a way to integration-test this, given our current CI setup. Although it's hardly a substitute for automated testing, I'll start using this default-project profile setup for my local connections to BigQuery.
Also — though I appreciate that you separated out the one-line fix in #2907 — I'd be fine with your pulling that in here, since it more or less touches the same code.
def get_bigquery_defaults(cls): | ||
""" Returns (credentials, project_id) """ | ||
# Cached, because the underlying implementation shells out. | ||
return google.auth.default(scopes=cls.SCOPE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the first things dbt does at the start of a compile/run is grab metadata to populate its relational cache. It will log something to the effect of:
Acquiring new bigquery connection "list_[project_name]".
This was showing up for me as:
Acquiring new bigquery connection "list_None".
But now it looks good! target.database
/target.project
are also working as expected
core/dbt/contracts/connection.py
Outdated
database: str | ||
# Most DBs have this as required, but BigQuery is Optional, and mypy | ||
# doesn't seem to allow overriding the type in `BigQueryCredentials` | ||
database: Optional[str] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also set database
to Optional[str]
in dbt-spark, which inherits/replaces the base Credentials
dataclass, too. I wonder why mypy doesn't mind it there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great point, thanks @jtcohen6 . I think I'm confusing errors — the error it issued was actually Attributes without a default cannot follow attributes with one
. Let's try the latest push
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also does repo actually run mypy?
I see a lot of type annotations in the code, but also a lot of errors when running mypy? Mostly in the test
path
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we run mypy, but only on the core/dbt
path:
https://github.com/fishtown-analytics/dbt/blob/5ba5271da99bc1ef7fbad2f7d0b45634087300fa/tox.ini#L15
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW — if these were in the mypy configs explicitly, then mypy wouldn't raise errors in an editor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
""" Returns (credentials, project_id) """ | ||
# This method is copied from ` BigQueryConnectionManager`, because it's | ||
# required in both classes. | ||
# We could move this & the scopes to the module level. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes sense to move this to module level, there's no state involved and it's shared 👍 go for it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool done!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like there's just a few small pep8 errors:
plugins/bigquery/dbt/adapters/bigquery/connections.py:48:1: E302 expected 2 blank lines, found 1
plugins/bigquery/dbt/adapters/bigquery/connections.py:50:4: W291 trailing whitespace
plugins/bigquery/dbt/adapters/bigquery/connections.py:52:1: W293 blank line contains whitespace
plugins/bigquery/dbt/adapters/bigquery/connections.py:53:52: W291 trailing whitespace
Co-authored-by: Kyle Wigley <kwigley44@gmail.com>
Is this failure a flake or causal? https://app.circleci.com/pipelines/github/fishtown-analytics/dbt/1600/workflows/12269ee7-f010-40a0-bcb5-ba2a957d19fe/jobs/33567 |
Looks like this was just a fluke. Re-running from failed now |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work @max-sixty! Thanks for delving into our testing flows as well. Just needs a changelog entry and then this is ready to ship
Great, done! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very cool! thanks for the contribution :)
Cheers for the guidance @jtcohen6 ! |
resolves #2828
Description
As discussed in the issue.
I couldn't quite get my head around the tests; seems like there's lots of nesting there — what's the easiest way to test this?
Checklist
CHANGELOG.md
and added information about my change to the "dbt next" section.