Implement relation filtering on get_catalog macro #964
Closed
23 commits
bbac669 changelog (mikealfare)
28dd554 add _get_one_catalog_by_relations method, redirect _get_one_catalog t… (mikealfare)
b9cfd81 override get_catalog_by_relations to align with how get_catalog is ov… (mikealfare)
eab3067 turn off get_catalog_by_relations to test (mikealfare)
73979ee call get_catalog by relation (mikealfare)
17c245f guard against multiple info schemas in get_catalog_by_relations (mikealfare)
9b0bd5f reuse get_catalog logic, add ability to pass relations into new method (mikealfare)
5236d40 redirect get_catalog_relations to get_catalog to check plumbing (mikealfare)
bf6e910 manually create schema_map so that it's limited to the relations prov… (mikealfare)
18d30f7 add check to guarantee relations is populated (mikealfare)
844cfbf catch exception when info_schemas is empty (mikealfare)
2d4d3c2 catch exception when info_schemas is empty (mikealfare)
c3e608d update the connection name (mikealfare)
5221389 mimic get_catalog behavior (mikealfare)
ccfeebd mimic get_catalog behavior (mikealfare)
d609da9 redirect to new method (mikealfare)
3afd659 error on empty list of relations (mikealfare)
f28c181 use the cached version of the relation to ensure we have column metadata (mikealfare)
52b9dc5 move cache logic into the cache (mikealfare)
642bd12 fix typo (mikealfare)
534cd0f remove whitespace fixes to reduce PR confusion (mikealfare)
0efc5a4 remove whitespace fixes to reduce PR confusion (mikealfare)
0e5c614 remove whitespace fixes to reduce PR confusion (mikealfare)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
kind: Features | ||
body: Support limiting get_catalog by object name | ||
time: 2023-12-15T19:11:54.536441-05:00 | ||
custom: | ||
Author: mikealfare | ||
Issue: "900" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,29 @@ | ||
from dbt.adapters.base import BaseRelation | ||
from dbt.adapters.cache import RelationsCache | ||
from dbt.exceptions import MissingRelationError | ||
from dbt.utils import lowercase | ||
|
||
|
||
class SparkRelationsCache(RelationsCache): | ||
def get_relation_from_stub(self, relation_stub: BaseRelation) -> BaseRelation: | ||
""" | ||
Case-insensitively yield all relations matching the given schema. | ||
|
||
:param BaseRelation relation_stub: The relation to look for | ||
:return BaseRelation: The cached version of the relation | ||
""" | ||
with self.lock: | ||
results = [ | ||
relation.inner | ||
for relation in self.relations.values() | ||
if all( | ||
{ | ||
lowercase(relation.database) == lowercase(relation_stub.database), | ||
lowercase(relation.schema) == lowercase(relation_stub.schema), | ||
lowercase(relation.identifier) == lowercase(relation_stub.identifier), | ||
} | ||
) | ||
] | ||
if len(results) == 0: | ||
raise MissingRelationError(relation_stub) | ||
return results[0] |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -4,10 +4,10 @@ | |
from typing import Any, Dict, Iterable, List, Optional, Union, Type, Tuple, Callable, Set | ||
|
||
from dbt.adapters.base.relation import InformationSchema | ||
from dbt.adapters.capability import CapabilityDict, CapabilitySupport, Support, Capability | ||
from dbt.contracts.graph.manifest import Manifest | ||
|
||
from typing_extensions import TypeAlias | ||
|
||
import agate | ||
|
||
import dbt | ||
|
@@ -19,6 +19,7 @@ | |
from dbt.adapters.spark import SparkConnectionManager | ||
from dbt.adapters.spark import SparkRelation | ||
from dbt.adapters.spark import SparkColumn | ||
from dbt.adapters.spark.cache import SparkRelationsCache | ||
from dbt.adapters.spark.python_submissions import ( | ||
JobClusterPythonJobHelper, | ||
AllPurposeClusterPythonJobHelper, | ||
|
@@ -101,12 +102,20 @@ class SparkAdapter(SQLAdapter): | |
ConstraintType.foreign_key: ConstraintSupport.NOT_ENFORCED, | ||
} | ||
|
||
_capabilities = CapabilityDict( | ||
{Capability.SchemaMetadataByRelations: CapabilitySupport(support=Support.Full)} | ||
) | ||
|
||
Relation: TypeAlias = SparkRelation | ||
RelationInfo = Tuple[str, str, str] | ||
Column: TypeAlias = SparkColumn | ||
ConnectionManager: TypeAlias = SparkConnectionManager | ||
AdapterSpecificConfigs: TypeAlias = SparkConfig | ||
|
||
def __init__(self, config) -> None: # type: ignore | ||
super().__init__(config) | ||
self.cache: SparkRelationsCache = SparkRelationsCache() | ||
|
||
@classmethod | ||
def date_function(cls) -> str: | ||
return "current_timestamp()" | ||
|
@@ -377,6 +386,25 @@ def get_catalog( | |
catalogs, exceptions = catch_as_completed(futures) | ||
return catalogs, exceptions | ||
|
||
def get_catalog_by_relations( | ||
self, manifest: Manifest, relations: Set[BaseRelation] | ||
) -> Tuple[agate.Table, List[Exception]]: | ||
with executor(self.config) as tpe: | ||
futures: List[Future[agate.Table]] = [] | ||
for relation in relations: | ||
futures.append( | ||
tpe.submit_connected( | ||
self, | ||
str(relation), | ||
self._get_one_catalog_by_relations, | ||
relation.information_schema_only(), | ||
[relation], | ||
manifest, | ||
) | ||
) | ||
catalogs, exceptions = catch_as_completed(futures) | ||
return catalogs, exceptions | ||
|
||
def _get_one_catalog( | ||
self, | ||
information_schema: InformationSchema, | ||
|
@@ -390,13 +418,27 @@ def _get_one_catalog( | |
|
||
database = information_schema.database | ||
schema = list(schemas)[0] | ||
relations = self.list_relations(database, schema) | ||
return self._get_relation_metadata_at_column_level(relations) | ||
|
||
def _get_relation_metadata_at_column_level(self, relations: List[BaseRelation]) -> agate.Table: | ||
Review comment: This was the second half of |
||
columns: List[Dict[str, Any]] = [] | ||
- for relation in self.list_relations(database, schema): | ||
+ for relation in relations: | ||
logger.debug("Getting table schema for relation {}", str(relation)) | ||
columns.extend(self._get_columns_for_catalog(relation)) | ||
return agate.Table.from_object(columns, column_types=DEFAULT_TYPE_TESTER) | ||
|
||
def _get_one_catalog_by_relations( | ||
self, | ||
information_schema: InformationSchema, | ||
relations: List[BaseRelation], | ||
manifest: Manifest, | ||
) -> agate.Table: | ||
cached_relations = [ | ||
self.cache.get_relation_from_stub(relation_stub) for relation_stub in relations | ||
] | ||
return self._get_relation_metadata_at_column_level(cached_relations) | ||
|
||
def check_schema_exists(self, database: str, schema: str) -> bool: | ||
results = self.execute_macro(LIST_SCHEMAS_MACRO_NAME, kwargs={"database": database}) | ||
|
||
|
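The fan-out in `get_catalog_by_relations` above (one submitted task per relation, with results and exceptions gathered separately) can be illustrated without dbt. The following is a minimal sketch using only the standard library: `catalog_by_relations`, `fake_fetch`, and the relation names are hypothetical stand-ins, and `ThreadPoolExecutor` with `as_completed` stands in for the adapter's `executor` and `catch_as_completed` helpers, whose exact behavior is not shown in this diff.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, List, Tuple

Row = Dict[str, str]


def catalog_by_relations(
    relations: Iterable[str],
    fetch_columns: Callable[[str], List[Row]],
    max_workers: int = 4,
) -> Tuple[List[Row], List[Exception]]:
    """Fetch column metadata for each relation in its own task.

    Hypothetical stand-in for the adapter method: a failure on one
    relation is collected instead of aborting the whole catalog build.
    """
    rows: List[Row] = []
    exceptions: List[Exception] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # One future per relation, mirroring the loop in the diff.
        futures = [pool.submit(fetch_columns, relation) for relation in relations]
        for future in as_completed(futures):
            try:
                rows.extend(future.result())
            except Exception as exc:
                exceptions.append(exc)
    return rows, exceptions


def fake_fetch(relation: str) -> List[Row]:
    """Hypothetical metadata lookup; fails for one relation on purpose."""
    if relation == "dev.missing":
        raise LookupError(relation)
    return [{"relation": relation, "column": "id"}]


rows, errors = catalog_by_relations(["dev.orders", "dev.missing", "dev.users"], fake_fetch)
found = sorted(row["relation"] for row in rows)
```

Here `found` holds the two relations that resolved and `errors` holds the single `LookupError`, so one bad relation does not hide the metadata for the others.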
There is no method on `RelationsCache` that returns a specific `BaseRelation`. The next best option is to constantly return the set of relations in a schema (`RelationsCache.get_relations`) and then look for the target relation in that set. This method duplicates the logic in `get_relations` and adds the third check on `identifier`.
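The trade-off described in this comment can be sketched with a toy cache. This is an illustration only, not dbt's `RelationsCache`: `Rel`, `TinyCache`, and `_lower` are hypothetical names, and the `columns` field stands in for the column metadata that, per the commit messages, only the cached copy of a relation carries.

```python
import threading
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Rel:
    """Hypothetical stand-in for a relation; `columns` is the cached metadata."""
    database: Optional[str]
    schema: Optional[str]
    identifier: Optional[str]
    columns: Tuple[str, ...] = ()


def _lower(value: Optional[str]) -> Optional[str]:
    """Lowercase a value, passing None through unchanged."""
    return None if value is None else value.lower()


class TinyCache:
    """Toy cache (assumption: not the real RelationsCache API)."""

    def __init__(self) -> None:
        self.lock = threading.RLock()
        self.relations: List[Rel] = []

    def add(self, rel: Rel) -> None:
        with self.lock:
            self.relations.append(rel)

    def get_relations(self, database: Optional[str], schema: Optional[str]) -> List[Rel]:
        """Schema-level lookup: two case-insensitive checks."""
        with self.lock:
            return [
                rel
                for rel in self.relations
                if _lower(rel.database) == _lower(database)
                and _lower(rel.schema) == _lower(schema)
            ]

    def get_relation_from_stub(self, stub: Rel) -> Rel:
        """Same two checks plus a third check on identifier."""
        for rel in self.get_relations(stub.database, stub.schema):
            if _lower(rel.identifier) == _lower(stub.identifier):
                return rel
        raise LookupError(f"relation not cached: {stub}")


cache = TinyCache()
cache.add(Rel(None, "Analytics", "Orders", columns=("id", "amount")))
cache.add(Rel(None, "Analytics", "Users", columns=("id",)))

# The stub carries no column metadata and differs in case from the cached entry.
stub = Rel(None, "analytics", "orders")
cached = cache.get_relation_from_stub(stub)
```

Unlike the method in the diff, which repeats the database and schema comparisons inline, this sketch builds the stub lookup on top of the schema-level one. Either way, the caller receives the cached relation rather than the stub, so the column metadata comes along.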