Add new get_catalog_relations macro, Supporting Changes #8648

peterallenwebb · 2023-09-13T23:43:13Z

resolves #8521

Checklist

I have read the contributing guide and understand what's expected of me
I have run this code in development and it appears to resolve the stated issue
This PR includes tests, or tests are not required/relevant for this PR
This PR has no interface changes (e.g. macros, cli, logs, json artifacts, config files, adapter interface, etc) or this PR has already received feedback and approval from Product or DX

…lations in a schema the adapter should return data about

github-actions · 2023-09-13T23:43:28Z

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.

codecov · 2023-09-13T23:45:37Z

Codecov Report

Attention: 14 lines in your changes are missing coverage. Please review.

Comparison is base (f17c1f3) 86.65% compared to head (d10e802) 86.50%.
Report is 9 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #8648      +/-   ##
==========================================
- Coverage   86.65%   86.50%   -0.15%     
==========================================
  Files         176      176              
  Lines       25712    25804      +92     
==========================================
+ Hits        22280    22323      +43     
- Misses       3432     3481      +49

Flag	Coverage Δ
integration	`83.27% <69.56%> (-0.14%)`	⬇️
unit	`65.11% <52.17%> (-0.04%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files	Coverage Δ
core/dbt/adapters/base/relation.py	`86.02% <33.33%> (-7.87%)`	⬇️
core/dbt/adapters/base/impl.py	`81.94% <72.09%> (-1.64%)`	⬇️

... and 15 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

mikealfare

I read through your draft and have some comments. Most of them are just thoughts that I'm throwing against the wall, so take them with a grain of salt; I'm not strongly for or against anything that I said.

core/dbt/adapters/base/impl.py

core/dbt/include/global_project/macros/adapters/metadata.sql

…queries

…ions

…support

* Use profile specified in --profile with dbt init * Update .changes/unreleased/Fixes-20230424-161642.yaml Co-authored-by: Doug Beatty <44704949+dbeatty10@users.noreply.github.com> * Refactor run() method into functions, replace exit() calls with exceptions * Update help text for profile option --------- Co-authored-by: Doug Beatty <44704949+dbeatty10@users.noreply.github.com>

…relations

…ions

gshank

My only concern is the lines that show up as untested, which looks like it mainly happens because none of our test cases contain more than 100 relations. I assume that part was hand tested? I wonder if we should create a large test case... but I don't think we need to hold this up for that.

Otherwise looks good.

mikealfare

I left a bunch of comments; let me know if you'd like to discuss any of them. Also, this PR lacks a changelog, do we need one?

mikealfare · 2023-09-26T20:49:52Z

core/dbt/adapters/base/impl.py

@@ -415,6 +418,29 @@ def _get_catalog_schemas(self, manifest: Manifest) -> SchemaSearchMap:
        lowercase strings.
        """
        info_schema_name_map = SchemaSearchMap()
+        relations = self._get_catalog_relations(manifest)
+        for relation in relations:


I can't put my finger on why, but I feel like this piece should be a method on SchemaSearchMap, something like SchemaSearchMap.extend(relations). Then this becomes:

info_schema_name_map = SchemaSearchMap() info_schema_name_map.extend(self._get_catalog_relations(manifest)) return info_schema_name_map

Hey, @mikealfare, following this up, since I didn't incorporate every one of your requested changes.

My thinking at the time was that the code I had was tested and working, so I didn't want continue tweaking relatively small details after we had already done a lot of collaboration and gone multiple rounds of review. I should have at least left a note to that effect.

However, if you think this or any of the other remaining issues ought to be addressed before 1.7.0, let me know and I am happy to open a follow up issue.

core/dbt/adapters/base/relation.py

mikealfare · 2023-09-26T21:24:13Z

plugins/postgres/dbt/include/postgres/macros/catalog.sql


+
+{% macro postgres__get_catalog(information_schema, schemas) -%}


For some reason I thought schemas here was a list of strings (the schema names), and not a list of BaseRelation objects that act as schemas (by not defining identifier). If that's the case, the where clause produced below should return nothing (relation.identifier is "falsy" because of jinja things, upper(sch.nspname) = upper('{{ relation.schema }}') fails, I think because it becomes upper(sch.nspname) = upper('')).

The schemas parameter to this macro is indeed a list of strings. The loop below converts it to relations: List[obj] where the obj is acting like a BaseRelation (specifically a "schema relation") for the limited purpose of calling get_catalog_where_clause.

So when this macro is called with schemas = [ "schema_a", "schema_b", "schema_c" ], it will in turn call get_catalog_where_clause with relations = [ { "schema": "schema_a" }, { "schema": "schema_b" }, { "schema": "schema_c" } ].

This is just to make it easier to reuse get_catalog_where_clause.

Oh, does jinja treat relation.schema the same as relation.get("schema") or relation["schema"]?

mikealfare · 2023-09-26T21:28:59Z

plugins/postgres/dbt/include/postgres/macros/catalog.sql

-          upper(sch.nspname) = upper('{{ schema }}'){%- if not loop.last %} or {% endif -%}
-        {%- endfor -%}
-      )
+    {{ postgres__get_catalog_where_clause(relations) }}


If the where clause is not reused, I would keep it in line here.

I made it a separate macro because I needed it twice (old implementation an new implementation). However, I think I like your approach at the adapter level of simply rerouting the existing macro to the new macro, avoiding the need for maintaining two versions of the query.

mikealfare · 2023-09-26T21:29:55Z

core/dbt/adapters/base/impl.py

@@ -222,6 +223,8 @@ class BaseAdapter(metaclass=AdapterMeta):
        ConstraintType.foreign_key: ConstraintSupport.ENFORCED,
    }

+    CATALOG_BY_RELATION_SUPPORT = False


Putting a placeholder here to discuss "feature flag-like things".

mikealfare · 2023-09-26T21:46:06Z

core/dbt/adapters/base/impl.py

+        # databases
+        return info_schema_name_map
+
+    def _get_catalog_relations_by_info_schema(


This might be more readable with a defaultdict. Untested pseudo code:

from collections import defaultdict def _get_catalog_relations_by_info_schema(self, manifest): relations_by_info_schema = defaultdict(list) for relation in self._get_catalog_relations(manifext): relations_by_info_schema[relation.information_schema_only()].append(relation) return dict(relations_by_info_schema)

mikealfare · 2023-09-26T22:26:19Z

core/dbt/adapters/base/impl.py

-                futures.append(fut)
+            relation_count = len(self._get_catalog_relations(manifest))
+            if relation_count <= 100 and self.CATALOG_BY_RELATION_SUPPORT:
+                relations_by_schema = self._get_catalog_relations_by_info_schema(manifest)


A few observations:

We're calling self._get_catalog_relations() twice, once for the length and again inside of self._get_catalog_relations_by_info_schema

self._get_catalog_relations_by_info_schema is doing two things, getting the relations and reformatting the resulting list into a dictionary

The new method self._get_catalog_relations_by_info_schema is only used once, here

relations_by_schema is grouping by the information_schema, not relation.schema, which is misleading

relations_by_schema is created to filter a list of relations

Perhaps we could simplify this by reusing the results of self._get_catalog_relations(manifest) (let's refer to this as relations) and filtering relations by info_schema while looping through the info_schema values?

Some pseudo code to add context (and to organize my own thoughts):

def get_catalog(self, manifest): with executor(self.config) as tpe: futures = [] relations = self._get_catalog_relations(manifest) if self.CATALOG_BY_RELATION_SUPPORT and len(relations) <= 100: info_schemas = {relation.information_schema_only() for relation in relations} for info_schema in info_schemas: name = ".".join([str(info_schema.database), "information_schema"]) relations_in_this_info_schema = [ r for r in relations if r.information_schema_only() == info_schema ], fut = tpe.submit_connected( self, name, self._get_one_catalog_by_relations, info_schema, relations_in_this_info_schema, manifest, ) futures.append(fut) else: # old way

For this function, it may be worth reviewing the state of the code today if you still have concerns. Gerda refactored it a bit, and made changes along the lines you are suggesting.

mikealfare · 2023-09-26T22:37:59Z

core/dbt/adapters/base/impl.py

@@ -1093,20 +1114,57 @@ def _get_one_catalog(
        results = self._catalog_filter_table(table, manifest)  # type: ignore[arg-type]
        return results

+    def _get_one_catalog_by_relations(


What does the name of this macro mean? Put another way, how do I know what this should return? It looks like this runs the new macro, but also limits it to objects in the manifest; is that right? This only gets called once, and in the branch of code where we're passing in an explicit list of relations that we got from the manifest. Is the filter still necessary?

I think there's value in wrapping the macro itself in an adapter method. But if we do need to filter it for some reason, then we should do that after this is called.

…king

* Add new get_catalog_relations macro, allowing dbt to specify which relations in a schema the adapter should return data about * Implement postgres adapter support for relation filtering on catalog queries * Code review changes adding feature flag for catalog-by-relation-list support * Use profile specified in --profile with dbt init (#7450) * Use profile specified in --profile with dbt init * Update .changes/unreleased/Fixes-20230424-161642.yaml Co-authored-by: Doug Beatty <44704949+dbeatty10@users.noreply.github.com> * Refactor run() method into functions, replace exit() calls with exceptions * Update help text for profile option --------- Co-authored-by: Doug Beatty <44704949+dbeatty10@users.noreply.github.com> * add TestLargeEphemeralCompilation (#8376) * Fix a couple of issues in the postgres implementation of get_catalog_relations * Add relation count limit at which to fall back to batch retrieval * Better feature detection mechanism for adapters. * Code review changes to get_catalog_relations and adapter feature checking * Add changelog entry --------- Co-authored-by: ezraerb <ezraerb@alum.mit.edu> Co-authored-by: Doug Beatty <44704949+dbeatty10@users.noreply.github.com> Co-authored-by: Michelle Ark <MichelleArk@users.noreply.github.com>

Add new get_catalog_relations macro, allowing dbt to specify which re…

6b17896

…lations in a schema the adapter should return data about

cla-bot bot added the cla:yes label Sep 13, 2023

mikealfare reviewed Sep 14, 2023

View reviewed changes

core/dbt/adapters/base/impl.py Show resolved Hide resolved

core/dbt/adapters/base/impl.py Outdated Show resolved Hide resolved

core/dbt/include/global_project/macros/adapters/metadata.sql Outdated Show resolved Hide resolved

graciegoheen mentioned this pull request Sep 14, 2023

[CT-3121] [Implementation] get schema of inputs prior to executing unit test command #8649

Closed

1 task

peterallenwebb added 3 commits September 14, 2023 17:06

Implement postgres adapter support for relation filtering on catalog …

22edf72

…queries

Merge remote-tracking branch 'origin/main' into paw/get-catalog-relat…

604df7e

…ions

Code review changes adding feature flag for catalog-by-relation-list …

4489822

…support

peterallenwebb marked this pull request as ready for review September 19, 2023 17:57

peterallenwebb requested review from a team as code owners September 19, 2023 17:57

peterallenwebb requested review from Fleid and MichelleArk September 19, 2023 17:57

ezraerb and others added 5 commits September 25, 2023 09:22

add TestLargeEphemeralCompilation (#8376)

9b8e15d

Fix a couple of issues in the postgres implementation of get_catalog_…

aedc0bc

…relations

Add relation count limit at which to fall back to batch retrieval

38b356d

Merge remote-tracking branch 'origin/main' into paw/get-catalog-relat…

471d29d

…ions

gshank approved these changes Sep 26, 2023

View reviewed changes

mikealfare reviewed Sep 26, 2023

View reviewed changes

Better feature detection mechanism for adapters.

a9addc0

dataders mentioned this pull request Sep 29, 2023

[CT-1664] [Feature] Add a new UnsupportedException exception #6448

Open

peterallenwebb added 2 commits September 29, 2023 15:38

Code review changes to get_catalog_relations and adapter feature chec…

703faff

…king

Add changelog entry

d10e802

peterallenwebb merged commit c4f09b1 into main Sep 29, 2023
49 of 50 checks passed

peterallenwebb deleted the paw/get-catalog-relations branch September 29, 2023 20:13

dataders mentioned this pull request Oct 11, 2023

[ADAP-943] [ADAP-928] [Feature] Implement relation filtering on get_catalog macro dbt-labs/dbt-snowflake#801

Closed

3 tasks

tlento mentioned this pull request Nov 6, 2023

Update typing-extensions version to >=4.4 #9012

Merged

5 tasks

jeremyyeo mentioned this pull request Jan 17, 2024

[CT-3562] [Feature] Catalog queries filters for > 100 nodes #9394

Closed

2 tasks

dbeatty10 mentioned this pull request Feb 1, 2024

[SPIKE] Experiment with performance of catalog queries #9506

Open

1 task

tlento mentioned this pull request Feb 26, 2024

Update dbt-semantic-interfaces dependency to compatible range #9671

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add new get_catalog_relations macro, Supporting Changes #8648

Add new get_catalog_relations macro, Supporting Changes #8648

peterallenwebb commented Sep 13, 2023 •

edited

Loading

github-actions bot commented Sep 13, 2023

codecov bot commented Sep 13, 2023 •

edited

Loading

mikealfare left a comment

gshank left a comment

mikealfare left a comment

mikealfare Sep 26, 2023

peterallenwebb Oct 6, 2023

mikealfare Sep 26, 2023

peterallenwebb Sep 27, 2023

mikealfare Sep 27, 2023

mikealfare Sep 26, 2023

mikealfare Sep 26, 2023

mikealfare Sep 26, 2023

mikealfare Sep 26, 2023

peterallenwebb Oct 6, 2023

mikealfare Sep 26, 2023



		{% macro postgres__get_catalog(information_schema, schemas) -%}

Add new get_catalog_relations macro, Supporting Changes #8648

Add new get_catalog_relations macro, Supporting Changes #8648

Conversation

peterallenwebb commented Sep 13, 2023 • edited Loading

Checklist

github-actions bot commented Sep 13, 2023

codecov bot commented Sep 13, 2023 • edited Loading

Codecov Report

mikealfare left a comment

Choose a reason for hiding this comment

gshank left a comment

Choose a reason for hiding this comment

mikealfare left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

peterallenwebb commented Sep 13, 2023 •

edited

Loading

codecov bot commented Sep 13, 2023 •

edited

Loading