-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add validate_sql method to base adapter with implementation for SQLAdapters #8001
Changes from 1 commit
3a99352
d71a162
3b74488
32bf27a
2b8592b
83950d7
7d2a6f6
58b82d6
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
kind: Features | ||
body: Add dry_run method to BaseAdapter with implementation for SQLAdapter | ||
time: 2023-06-29T17:57:12.599313-07:00 | ||
custom: | ||
Author: tlento | ||
Issue: "7839" |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -289,6 +289,15 @@ def execute( | |
""" | ||
return self.connections.execute(sql=sql, auto_begin=auto_begin, fetch=fetch, limit=limit) | ||
|
||
def dry_run(self, sql: str) -> AdapterResponse: | ||
"""Submit the given SQL for validation, but not execution. | ||
|
||
This is a thin wrapper around ConnectionManager.dry_run. | ||
|
||
:param str sql: The sql to validate | ||
""" | ||
return self.connections.dry_run(sql=sql) | ||
|
||
@available.parse(lambda *a, **k: []) | ||
def get_column_schema_from_query(self, sql: str) -> List[BaseColumn]: | ||
"""Get a list of the Columns with names and data types from the given sql.""" | ||
|
@@ -785,7 +794,6 @@ def _make_match( | |
schema: str, | ||
identifier: str, | ||
) -> List[BaseRelation]: | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Weird, must be some auto-formatter somewhere. :/ |
||
matches = [] | ||
|
||
search = self._make_match_kwargs(database, schema, identifier) | ||
|
@@ -1063,7 +1071,6 @@ def _get_one_catalog( | |
schemas: Set[str], | ||
manifest: Manifest, | ||
) -> agate.Table: | ||
|
||
kwargs = {"information_schema": information_schema, "schemas": schemas} | ||
table = self.execute_macro( | ||
GET_CATALOG_MACRO_NAME, | ||
|
@@ -1453,7 +1460,6 @@ def render_model_constraint(cls, constraint: ModelLevelConstraint) -> Optional[s | |
def catch_as_completed( | ||
futures, # typing: List[Future[agate.Table]] | ||
) -> Tuple[agate.Table, List[Exception]]: | ||
|
||
# catalogs: agate.Table = agate.Table(rows=[]) | ||
tables: List[agate.Table] = [] | ||
exceptions: List[Exception] = [] | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -52,7 +52,6 @@ def add_query( | |
bindings: Optional[Any] = None, | ||
abridge_sql_log: bool = False, | ||
) -> Tuple[Connection, Any]: | ||
|
||
connection = self.get_thread_connection() | ||
if auto_begin and connection.transaction_open is False: | ||
self.begin() | ||
|
@@ -152,6 +151,18 @@ def execute( | |
table = dbt.clients.agate_helper.empty_table() | ||
return response, table | ||
|
||
def dry_run(self, sql: str) -> AdapterResponse: | ||
"""Submit the given SQL to the engine for validation, but not execution. | ||
|
||
By default we simply prefix the query with the explain keyword and allow the | ||
exceptions thrown by the underlying engine on invalid SQL inputs to bubble up | ||
to the exception handler. | ||
|
||
:param sql str: The sql to validate | ||
""" | ||
explain_sql = f"explain {sql}" | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Generally we would wrap this in a Jinja macro, and call that macro here (via {% macro get_explain_sql(query) %}
explain {{ query }}
{% endmacro %} That way it's "user space" code that someone could could override if they wanted to. Including an adapter maintainer, if the only thing that's different on their data platform is (e.g.) a synonym for There are legitimate cases for using Python templating over Jinja templating: Easier to write, to unit-test. In cases where we believe it's tricky-to-write & functionally uncontroversial, such that no user would realistically want/need to override, that approach can make more sense. Not saying we need to go one way or the other — just offering my perspective & context on how we've done this in the past. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I've added a commit to switch this to macros instead. It seems like it works. The downside of macros is user-land code gets coupled into our dry run logic, and people could conceivably override the macro to execute queries instead of doing dry run or explain or whatever. Whether or not that matters really depends on how we think about this method. If it's a low level primitive we probably want to assert more control (which is how I was approaching it originally). If it's an accessor that we want to be closely tied in to the rest of dbt's workings then I suspect the current macro-based approach is the better one. You all have far more experience with dbt-core and knowledge of where it might be heading, so I'll defer to you on which approach we should take here. |
||
return self.execute(explain_sql, auto_begin=True)[0] | ||
|
||
def add_begin_query(self): | ||
return self.add_query("BEGIN", auto_begin=False) | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,66 @@ | ||
from typing import Type | ||
|
||
import pytest | ||
|
||
from dbt.exceptions import DbtRuntimeError | ||
from dbt.adapters.base.impl import BaseAdapter | ||
|
||
|
||
class BaseDryRunMethod: | ||
"""Tests the behavior of the dry run method for the relevant adapters. | ||
|
||
The valid and invalid SQL should work with most engines by default, but | ||
both inputs can be overridden as needed for a given engine to get the correct | ||
behavior. | ||
|
||
The base method is meant to throw the appropriate custom exception when dry_run | ||
fails. | ||
""" | ||
|
||
@pytest.fixture(scope="class") | ||
def valid_sql(self) -> str: | ||
"""Returns a valid statement for issuing as a dry run query. | ||
|
||
Ideally this would be checkable for non-execution. For example, we could use a | ||
CREATE TABLE statement with an assertion that no table was created. However, | ||
for most adapter types this is unnecessary - the EXPLAIN keyword has exactly the | ||
behavior we want, and here we are essentially testing to make sure it is | ||
supported. As such, we return a simple SELECT query, and leave it to | ||
engine-specific test overrides to specify more detailed behavior as appropriate. | ||
""" | ||
|
||
return "select 1" | ||
|
||
@pytest.fixture(scope="class") | ||
def invalid_sql(self) -> str: | ||
"""Returns an invalid statement for issuing a bad dry run query.""" | ||
|
||
return "Let's run some invalid SQL and see if we get an error!" | ||
|
||
@pytest.fixture(scope="class") | ||
def expected_exception(self) -> Type[Exception]: | ||
"""Returns the Exception type thrown by a failed query. | ||
|
||
Defaults to dbt.exceptions.DbtRuntimeError because that is the most common | ||
base exception for adapters to throw.""" | ||
return DbtRuntimeError | ||
|
||
def test_valid_dry_run(self, adapter: BaseAdapter, valid_sql: str) -> None: | ||
"""Executes a dry run query on valid SQL. No news is good news.""" | ||
with adapter.connection_named("test_valid_dry_run"): | ||
adapter.dry_run(valid_sql) | ||
|
||
def test_invalid_dry_run( | ||
self, | ||
adapter: BaseAdapter, | ||
invalid_sql: str, | ||
expected_exception: Type[Exception], | ||
) -> None: | ||
"""Executes a dry run query on invalid SQL, expecting the exception.""" | ||
with pytest.raises(expected_exception): | ||
with adapter.connection_named("test_invalid_dry_run"): | ||
adapter.dry_run(invalid_sql) | ||
|
||
|
||
class TestDryRunMethod(BaseDryRunMethod): | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since 'run' generally refers to a
dbt run
command (imagine if we develop a--dry-run
flag for dbt at some point) should we just call this 'explain' or 'validate_sql'?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dry_run
andvalidate_sql
both seem fine to me - the former references flag settings in BigQuery in ways that set user expectations in accordance with what we're planning to provide (i.e., theBigQueryResponse
which appears to have all of the cost estimate parameters in place), while the latter is better differentiated from dbt core commands.explain
implies that we'll return an explain plan, which isn't currently the case. It seemed simplest to not use explain for that reason and instead get something in place we could use and evolve, but I'm open to calling thisexplain
and adding explain result fetching later. We could make this an ExplainResponse return that is a subtype of AdapterResponse for backwards compatibility.@jtcohen6 any opinions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
validate_sql
makes sense to me! That captures the narrower goal of this method for now