Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove usage of __TABLES__ #1213

Closed
wants to merge 18 commits into from

Conversation

mikealfare
Copy link
Contributor

@mikealfare mikealfare commented May 1, 2024

resolves #113

rebase of #364

@mikealfare mikealfare self-assigned this May 1, 2024
@cla-bot cla-bot bot added the cla:yes label May 1, 2024
@dbeatty10
Copy link
Contributor

Key comments from description of #364 that we'll want to consider sufficiently prior to merging this PR.

Trade-offs and outstanding questions

The problem is that it doesn't contain row_count or size_bytes, so it is not a perfect drop-in replacement. This could be possibly overcome by using the INFORMATION_SCHEMA.TABLE_STORAGE view to get the number of rows + the size in logical bytes, but that has its own complications (see below).

  1. Is it okay to go from reporting the row count and size in bytes and then degrade to 0 for both?
    • Is there a way to make this work without an environment variable? e.g., is there a way for the adapter to know the relevant region(s)?
  2. Are there any region-specific effects that would could cause a table/view/external to not be reflected?
  3. Is this method slower, faster, or the same?

@amychen1776
Copy link

Given the tradeoffs, I'm quite weary of removing displaying row count and size given how many folks have found that to be helpful.

on materialized_views.table_catalog = tables.project_id
and materialized_views.table_schema = tables.dataset_id
and materialized_views.table_name = tables.table_id
partitions.total_logical_bytes as size_bytes,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dbeatty10 @amychen1776 It looks like you're saying that we're not getting the size nor the row count. Are these two fields (partitions.total_logical_bytes and partitions.total_rows) coming through as 0?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't tried out this PR, so I'm not sure how partitions.total_logical_bytes and partitions.total_rows are coming through. How are they coming through for you @mikealfare ?

It would be awesome if they successfully preserve non-null and non-zero values for size_bytes and total_rows!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like we need an integration test or two. I know I worked on this when I thought this was the source of a bug a few weeks ago. It turned out to not be the case though, so I left it there. @jtcohen6 @amychen1776, would you consider prioritizing this for the upcoming sprint so that we could capture the value here? We would get two main things:

  • get off of __TABLES__, which is an undocumented dependency (we should be using TABLES
  • reduce permissions required to run dbt-bigquery

@mikealfare mikealfare requested a review from a team as a code owner May 21, 2024 16:18
@dbeatty10 dbeatty10 mentioned this pull request Jun 6, 2024
6 tasks
@mikealfare mikealfare closed this Nov 9, 2024
@mikealfare mikealfare deleted the dbeatty/using-information-schema-rebase branch November 9, 2024 00:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[CT-162] Upgrade from the __tables__ construct to the information_schema.tables construct
5 participants