Claires suggestions #2

clrcrl · 2020-02-14T19:49:38Z

Did these in separate commits so you can take or leave what you'd like!

Add a gitignore
- As soon as I tried to run dbt, I had a ton of files I didn't want to commit, so I added a .gitignore
Specify a project (database) for sources
- Adding the database: parameter (== project on BQ) means that I can query things in your project! However, it breaks dbt docs generate since I don't have permission to read your information schema:

$ dbt docs generate
Running with dbt=0.15.2
Found 6 models, 0 tests, 0 snapshots, 0 analyses, 138 macros, 0 operations, 0 seed files, 2 sources

14:48:19 | Concurrency: 8 threads (target='dev_bigquery')
14:48:19 |
14:48:20 | Done.
14:48:20 | Building catalog
Encountered an error:
Database Error
  Access Denied: Table fh-bigquery:INFORMATION_SCHEMA.SCHEMATA: User does not have permission to query table fh-bigquery:INFORMATION_SCHEMA.SCHEMATA.

Fix errant SQL
- Once I got this to run, I found this error.
Use simpler incremental logic
- This works as expected. Not sure if there's a reason you went the other way here!

fhoffa · 2020-02-14T23:21:50Z

reddit_aita/models/aita_comments.sql

@@ -6,12 +6,6 @@ FROM {{ source('reddit_comments', '20*') }}
 WHERE subreddit = 'AmItheAsshole'
 AND _table_suffix > '19_'

-{%- if is_incremental() -%}
-{%- if execute -%}


So I'm using this weird complex logic because one of these queries is much better for BigQuery:

SELECT * FROM * WHERE _table_suffix < '2019_01'

SELECT * FROM * WHERE _table_suffix < (SELECT date FROM ...)

The first compares against a constant, so BigQuery can prune how much data it reads. The second compares against a subquery whose value is only known at run time, and BigQuery doesn't optimize it as well.

I can get that value as a constant by using run_query(). I have another option, but it doesn't work right now (dbt-labs/dbt-core#2136).
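
For readers following along, here is a minimal sketch of the run_query() approach, not the exact code removed in this diff. It assumes the model keeps the wildcard suffix in a column called `table_suffix` (a hypothetical name) so that the latest loaded suffix can be looked up from the existing table and rendered into the SQL as a literal.

```sql
{# Sketch only: `table_suffix` is a hypothetical column, not taken from the original model. #}
{{ config(materialized='incremental') }}

SELECT *, _table_suffix AS table_suffix
FROM {{ source('reddit_comments', '20*') }}
WHERE subreddit = 'AmItheAsshole'
AND _table_suffix > '19_'

{% if is_incremental() and execute %}
  {# Look up the latest suffix already loaded, before the main query is compiled. #}
  {% set max_suffix_query %}
    SELECT MAX(table_suffix) FROM {{ this }}
  {% endset %}
  {% set max_suffix = run_query(max_suffix_query).columns[0].values()[0] %}
  {# The value is rendered as a string literal, so BigQuery sees a constant filter. #}
  AND _table_suffix >= '{{ max_suffix }}'
{% endif %}
```

The `execute` guard matters because run_query() only returns results at run time, not while dbt is parsing the project. If the existing table is empty, the lookup returns no value, so a real model would need to handle that case.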

clrcrl added 4 commits February 14, 2020 14:11

- Add a gitignore (b973071)
- Specify a project (database) for sources (0b3ddc6)
- Fix errant SQL (5535e15)
- Use simpler incremental logic (10a7dd9)

fhoffa reviewed Feb 14, 2020

fhoffa mentioned this pull request Mar 12, 2020

BQ clustering can improve merge performance dbt-labs/dbt-core#2196

Closed
