Implement relation filtering on get_catalog macro #964
…o this method to reduce code duplication
def _get_relation_metadata_at_column_level(self, relations: List[BaseRelation]) -> agate.Table:
This was the second half of `_get_one_catalog` above. It is broken out as its own method now so it can be reused by the new method `_get_one_catalog_by_relations` below.
class SparkRelationsCache(RelationsCache):
    def get_relation_from_stub(self, relation_stub: BaseRelation) -> BaseRelation:
There is no method on `RelationsCache` that returns a specific `BaseRelation`. The next best option is to return the full set of relations in a schema each time (`RelationsCache.get_relations`) and then look for the target relation in that set. This method duplicates the logic in `get_relations` and adds a third check on `identifier`.
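To illustrate the lookup described in this comment, here is a minimal, self-contained sketch. It does not use the real dbt classes: `StubRelation` and `find_relation` are hypothetical stand-ins, and both the case-insensitive comparison and returning `None` on a miss are assumptions rather than the behavior of the PR's `get_relation_from_stub`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class StubRelation:
    """Hypothetical stand-in for BaseRelation: only the three name parts."""

    database: Optional[str]
    schema: Optional[str]
    identifier: Optional[str]


def _lower(value: Optional[str]) -> Optional[str]:
    # Assumption: names are compared case-insensitively.
    return value.lower() if value is not None else None


def find_relation(
    cached: List[StubRelation], stub: StubRelation
) -> Optional[StubRelation]:
    """Return the cached relation matching the stub, or None if absent."""
    for relation in cached:
        if (
            _lower(relation.database) == _lower(stub.database)
            and _lower(relation.schema) == _lower(stub.schema)
            # The third check: a schema-level lookup stops at the two
            # checks above; matching on identifier narrows it to one relation.
            and _lower(relation.identifier) == _lower(stub.identifier)
        ):
            return relation
    return None
```

The first two comparisons correspond to what a schema-level lookup would already do; the identifier comparison is the extra step that narrows the result to a single relation.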
resolves #900
Problem
When `get_catalog` runs, it needs to return all relations within a schema, so the query cannot be parallelized and does not scale. It also means that all relations will be returned, not just those managed by dbt.
Solution
Allow for a set of relations to be passed in to limit the query against the database.
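As a rough illustration of why passing in a set of relations helps, the sketch below fetches metadata per relation instead of per schema, which lets the work be spread across threads. It is not the code in this PR: `get_catalog_for_relations`, the `describe` callable, and the row layout are all hypothetical, and the thread pool is only one way a caller might take advantage of relation-level filtering.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Tuple

RelationKey = Tuple[str, str]  # hypothetical key: (schema, identifier)
ColumnRow = Dict[str, str]     # hypothetical catalog row, one per column


def get_catalog_for_relations(
    relations: Iterable[RelationKey],
    describe: Callable[[RelationKey], List[ColumnRow]],
    max_workers: int = 4,
) -> List[ColumnRow]:
    """Collect column metadata for the requested relations only."""
    # De-duplicate and sort so each relation is described once,
    # in a deterministic order.
    unique = sorted(set(relations))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each relation is described independently, so calls can overlap.
        per_relation = list(pool.map(describe, unique))
    rows: List[ColumnRow] = []
    for result in per_relation:
        rows.extend(result)
    return rows
```

Relations that are not in the requested set are never queried, so objects in the schema that dbt does not manage do not appear in the result.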
Checklist