Implement relation filtering on get_catalog macro #964

Closed
wants to merge 23 commits into from

@mikealfare mikealfare commented Dec 16, 2023

resolves #900

Problem

When get_catalog runs, it has to return every relation within a schema, so the query can be neither parallelized nor scaled. It also means that all relations are returned, not just those managed by dbt.

Solution

Allow a set of relations to be passed in so that the query against the database is limited to those relations.

  • add _get_one_catalog_by_relations method
  • point _get_one_catalog to this method since the second half is the same
  • register that this capability is available
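
To illustrate the difference between a schema-wide catalog query and a relation-filtered one, here is a minimal, self-contained sketch. It does not use dbt's adapter API: `Relation`, `METADATA_ROWS`, `catalog_for_schema`, and `catalog_for_relations` are hypothetical names, and the in-memory list stands in for the warehouse's metadata tables.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Relation:
    """Hypothetical stand-in for a relation; not dbt's BaseRelation."""
    schema: str
    identifier: str


# Stand-in for warehouse metadata: one row per column.
METADATA_ROWS: List[Dict[str, str]] = [
    {"schema": "analytics", "table": "orders", "column": "id"},
    {"schema": "analytics", "table": "orders", "column": "amount"},
    {"schema": "analytics", "table": "scratch_tmp", "column": "x"},
    {"schema": "analytics", "table": "customers", "column": "id"},
]


def catalog_for_schema(schema: str) -> List[Dict[str, str]]:
    """Schema-wide approach: every relation in the schema comes back."""
    return [row for row in METADATA_ROWS if row["schema"] == schema]


def catalog_for_relations(relations: Iterable[Relation]) -> List[Dict[str, str]]:
    """Relation-filtered approach: only the requested relations come back."""
    wanted = {(r.schema.lower(), r.identifier.lower()) for r in relations}
    return [
        row
        for row in METADATA_ROWS
        if (row["schema"].lower(), row["table"].lower()) in wanted
    ]


# Relations assumed to be managed by dbt in this example.
managed = [Relation("analytics", "orders"), Relation("analytics", "customers")]
schema_wide = catalog_for_schema("analytics")
filtered = catalog_for_relations(managed)
```

In this sketch the schema-wide call also returns `scratch_tmp`, which is not in the managed set, while the filtered call leaves it out. Because the filtered call takes an explicit list, the list could also be split into batches and queried independently.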

Checklist

  • I have read the contributing guide and understand what's expected of me
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • This PR has no interface changes (e.g. macros, cli, logs, json artifacts, config files, adapter interface, etc) or this PR has already received feedback and approval from Product or DX

@mikealfare mikealfare self-assigned this Dec 16, 2023
@cla-bot cla-bot bot added the cla:yes label Dec 16, 2023
@mikealfare mikealfare added the backport 1.7.latest Tag for PR to be backported to the 1.7.latest branch label Dec 16, 2023
@mikealfare mikealfare marked this pull request as ready for review December 16, 2023 00:42
@mikealfare mikealfare requested a review from a team as a code owner December 16, 2023 00:42
@mikealfare mikealfare marked this pull request as draft December 18, 2023 18:16

def _get_relation_metadata_at_column_level(self, relations: List[BaseRelation]) -> agate.Table:

@mikealfare mikealfare Dec 20, 2023

This was the second half of _get_one_catalog above. It is now broken out as its own method so that it can be reused by the new method _get_one_catalog_by_relations below.
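
The refactor described in this comment can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: the function names mirror the methods mentioned above but are plain module-level functions, and `COLUMNS_BY_RELATION` and `list_relations_in_schema` are hypothetical stand-ins for the metadata source and the first half of the original method.

```python
from typing import Dict, List, Tuple

RelationKey = Tuple[str, str]  # (schema, identifier); hypothetical simplification

# Hypothetical column metadata keyed by relation.
COLUMNS_BY_RELATION: Dict[RelationKey, List[str]] = {
    ("analytics", "orders"): ["id", "amount"],
    ("analytics", "customers"): ["id"],
    ("staging", "raw_events"): ["payload"],
}


def list_relations_in_schema(schema: str) -> List[RelationKey]:
    """First half of the original flow: discover every relation in a schema."""
    return [key for key in COLUMNS_BY_RELATION if key[0] == schema]


def relation_metadata_at_column_level(
    relations: List[RelationKey],
) -> List[Dict[str, str]]:
    """Shared second half: build one catalog row per column of each relation."""
    rows: List[Dict[str, str]] = []
    for schema, identifier in relations:
        for column in COLUMNS_BY_RELATION.get((schema, identifier), []):
            rows.append({"schema": schema, "table": identifier, "column": column})
    return rows


def get_one_catalog(schema: str) -> List[Dict[str, str]]:
    """Schema-wide entry point: discover relations, then reuse the helper."""
    return relation_metadata_at_column_level(list_relations_in_schema(schema))


def get_one_catalog_by_relations(
    relations: List[RelationKey],
) -> List[Dict[str, str]]:
    """Relation-filtered entry point: skip discovery, reuse the same helper."""
    return relation_metadata_at_column_level(relations)
```

Both entry points differ only in how they obtain the list of relations; the column-level work lives in one place.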



class SparkRelationsCache(RelationsCache):
def get_relation_from_stub(self, relation_stub: BaseRelation) -> BaseRelation:

There is no method on RelationsCache that returns a specific BaseRelation. The next best option would be to fetch the full set of relations in a schema (RelationsCache.get_relations) on every lookup and then search that set for the target relation. Instead, this method duplicates the logic in get_relations and adds a third check, on identifier.
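
A rough sketch of the lookup described here, using a hypothetical `SimpleRelationsCache` rather than dbt's actual `RelationsCache`. The case-insensitive comparison and the `None` return on a cache miss are assumptions made for this illustration; the signature in the PR returns a `BaseRelation`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Relation:
    """Hypothetical stand-in for a relation; not dbt's BaseRelation."""
    database: Optional[str]
    schema: str
    identifier: str
    relation_type: Optional[str] = None


def _lower(value: Optional[str]) -> Optional[str]:
    """Lowercase a value for comparison, passing None through unchanged."""
    return value.lower() if value is not None else None


class SimpleRelationsCache:
    """Hypothetical in-memory cache holding fully described relations."""

    def __init__(self) -> None:
        self._relations: List[Relation] = []

    def add(self, relation: Relation) -> None:
        self._relations.append(relation)

    def get_relations(self, database: Optional[str], schema: str) -> List[Relation]:
        """Two checks: database and schema."""
        return [
            r
            for r in self._relations
            if _lower(r.database) == _lower(database)
            and _lower(r.schema) == _lower(schema)
        ]

    def get_relation_from_stub(self, stub: Relation) -> Optional[Relation]:
        """Same two checks plus a third on identifier; None on a miss (assumption)."""
        for r in self._relations:
            if (
                _lower(r.database) == _lower(stub.database)
                and _lower(r.schema) == _lower(stub.schema)
                and _lower(r.identifier) == _lower(stub.identifier)
            ):
                return r
        return None


cache = SimpleRelationsCache()
cache.add(Relation(None, "analytics", "orders", "table"))
cache.add(Relation(None, "analytics", "customers", "view"))
cache.add(Relation(None, "staging", "orders", "table"))

# A stub carries only the name parts; the cache returns the full relation.
found = cache.get_relation_from_stub(Relation(None, "Analytics", "Customers"))
missing = cache.get_relation_from_stub(Relation(None, "analytics", "payments"))
```

The direct lookup avoids building the schema-wide list on every call, at the cost of repeating the database and schema comparison in two methods.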

@mikealfare mikealfare marked this pull request as ready for review December 20, 2023 01:01
@mikealfare mikealfare marked this pull request as draft February 13, 2024 23:53
@mikealfare mikealfare closed this Nov 9, 2024
@mikealfare mikealfare deleted the applied-state/get-catalog-by-object branch November 9, 2024 02:13
Labels
backport 1.7.latest Tag for PR to be backported to the 1.7.latest branch cla:yes
Development

Successfully merging this pull request may close these issues.

[ADAP-930] [Feature] Implement relation filtering on get_catalog macro
1 participant