This repository has been archived by the owner on Jul 3, 2023. It is now read-only.

DBT + Hamilton Example #236

Merged
merged 3 commits into from
Nov 27, 2022

@elijahbenizzy (Collaborator) commented Nov 26, 2022

Shows how Hamilton + DBT work together

Changes

Adds a dbt/ directory to the examples directory.

How I tested this

Ran locally.

Notes

It's simple, but it's a good start and will work for a write-up.

Checklist

  • PR has an informative and human-readable title (this will be pulled into the release notes)
  • Changes are limited to a single goal (no scope creep)
  • Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future TODOs are captured in comments
  • Project documentation has been updated if adding/changing functionality.

One task to train + infer, the other to load the data.
Some caveats due to the earliness of the dbt API.
@elijahbenizzy changed the title from "WIP for dbt example" to "DBT + Hamilton Example" on Nov 27, 2022
@elijahbenizzy force-pushed the dbt-example branch 2 times, most recently from e1f6056 to 096132f, on November 27, 2022 01:59
@skrawcz (Collaborator) left a comment

LGTM.

Will need to publish a new example docker image once we merge this.

examples/dbt/README.md (outdated, resolved)
Comment on lines 26 to 40
    titanic_dag = driver.Driver(
        {
            "random_state": 5,
            "test_size": 0.2,
            "model_to_use": "create_new",
        },
        data_loader,
        feature_transforms,
        model_pipeline,
        adapter=adapter,
    )
    # gather results
    results = titanic_dag.execute(
        final_vars=["model_predict"], inputs={"raw_passengers_df": raw_passengers_df}
    )

Note: the inference set is just all the data, i.e. the same data used for training, not a separate set.
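
For readers unfamiliar with the call under review: Hamilton builds a DAG from plain functions, where each function name is a node and each parameter name is an upstream dependency, and `execute` computes what `final_vars` needs from the supplied config and inputs. The toy resolver below illustrates that idea using only the standard library. It is not Hamilton's implementation, and `features`, `model_predict` (as defined here), and `resolve` are hypothetical stand-ins, not the example's real code.

```python
import inspect


# Hypothetical node functions: the name is the node, the parameters are its dependencies.
def features(raw_passengers_df, test_size):
    # Stand-in "feature" step: drop a test_size share of the rows.
    keep = len(raw_passengers_df) - round(len(raw_passengers_df) * test_size)
    return raw_passengers_df[:keep]


def model_predict(features, random_state):
    # Stand-in "prediction": pair each row with the seed it was produced under.
    return [(row, random_state) for row in features]


def resolve(name, functions, values):
    """Compute node `name`, recursively resolving its parameters by name."""
    if name in values:  # supplied via config/inputs, or already computed
        return values[name]
    fn = functions[name]
    params = inspect.signature(fn).parameters
    kwargs = {p: resolve(p, functions, values) for p in params}
    values[name] = fn(**kwargs)
    return values[name]


functions = {f.__name__: f for f in (features, model_predict)}
config = {"random_state": 5, "test_size": 0.2}
inputs = {"raw_passengers_df": ["a", "b", "c", "d", "e"]}

result = resolve("model_predict", functions, {**config, **inputs})
print(result)
```

Requesting only `model_predict` pulls in `features` automatically, which mirrors why the reviewed call passes a single entry in `final_vars`.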

@skrawcz (Collaborator) left a comment

sorry, was too :shipit: happy earlier -- a few corrections required given what is actually being returned/predicted over.

@@ -0,0 +1,44 @@
def model(dbt, session):

yeah so predict probably isn't the right name of this file.
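
For context on the file under discussion: a dbt Python model is a function named `model` that receives `dbt` and `session`, pulls upstream models with `dbt.ref(...)`, and returns the data to materialize. The sketch below shows only that shape. `FakeDbt`, the sample rows, and the `survived_guess` column are hypothetical stand-ins so the snippet runs without dbt installed; in a real project dbt supplies the `dbt` object, and `dbt.ref` returns a DataFrame-like object rather than a list of dicts.

```python
def model(dbt, session):
    # Tell dbt how to materialize whatever this function returns.
    dbt.config(materialized="table")
    # Pull the upstream SQL model by name.
    passengers = dbt.ref("raw_passengers")
    # Placeholder for the train + infer step: add a dummy column to each row.
    return [{**row, "survived_guess": 0} for row in passengers]


class FakeDbt:
    """Hypothetical stand-in for the object dbt passes in (illustration only)."""

    def __init__(self, tables):
        self.tables = tables
        self.settings = {}

    def config(self, **kwargs):
        # Record configuration instead of acting on it.
        self.settings.update(kwargs)

    def ref(self, name):
        # Return the rows registered under the upstream model's name.
        return self.tables[name]


fake = FakeDbt({"raw_passengers": [{"pid": 1}, {"pid": 2}]})
output = model(fake, session=None)
print(output)
```

Because the function both trains and predicts in the real example, a name describing the whole step may fit better than `predict`, which is the point of the comment above.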

We've organized the code into two separate DBT models:
1. [raw_passengers](models/raw_passengers.sql): a simple select and join using duckdb and DBT. Due to the simplicity of DBT, it's just what you would write if the SQL were embedded within a Python program, or if you were executing the SQL on your own! It does, however, automatically get materialized.
2. [predict](models/predict.py)

update the name of this.


- feature engineering to extract a test/train set
- train a model using the train set
- run inference over an inference set

@skrawcz (Collaborator), Nov 27, 2022

Suggested change
- - run inference over an inference set
+ - run inference over the entire data set...
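
To make the distinction behind this suggestion concrete, here is a minimal standard-library sketch. The `split` helper and the sample ids are hypothetical and unrelated to the example's real pipeline; the sketch only shows that predicting over every row necessarily scores rows the model was trained on, whereas a held-out test set does not.

```python
import random


def split(rows, test_size, random_state):
    """Shuffle deterministically, then carve off a test share."""
    shuffled = rows[:]
    random.Random(random_state).shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


rows = list(range(10))  # stand-in passenger ids
train, test = split(rows, test_size=0.2, random_state=5)

# Rows shared with the training set under each inference choice.
overlap_full = set(rows) & set(train)  # inference over the entire data set
overlap_test = set(test) & set(train)  # inference over a held-out set

print(len(train), len(test))                 # 8 2
print(len(overlap_full), len(overlap_test))  # 8 0
```

Metrics computed on the full-data predictions would therefore be optimistic, which is why the README wording matters.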

@elijahbenizzy merged commit bea1e92 into main on Nov 27, 2022
@elijahbenizzy deleted the dbt-example branch on November 27, 2022 21:28