This repository contains real, working examples of Twirl code. They show what the developer experience is like and suggest patterns for working with data inside an orchestrator like Twirl. The examples are set up as a single working Twirl project, showing that multiple workstreams can coexist in one place.
This example uses GitHub pull request data to predict how quickly a pull request will close. It demonstrates how to train a model on data living in BigQuery, store the model results in GCS, and use the most recently trained model to make predictions for new pull requests.
- Tag: github_predictions
- Included assets:
❯ twirl list @github_predictions
bigquery/raw_github/pull_requests
bigquery/clean/pull_requests
bigquery/github_models/pr_closing_time_outcomes
bigquery/github_models/pull_request_features
bigquery/github_models/repository_features
bigquery/github_models/user_features
gcs/github_models/pr_closing_time_models
bigquery/github_models/pr_closing_time_model_stats
bigquery/github_models/pr_closing_time_predictions
- Visualization:
- Features demonstrated:
- Machine learning
- File collections
- Schemas
- Job specific CPU/Mem resources
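
The "use the most recently trained model" step above can be sketched in plain Python. This is a minimal illustration, not Twirl's API: the file naming convention (`model_<timestamp>.pkl`) and the helper `latest_model_path` are hypothetical, chosen only to show how the newest entry in a collection of model files might be selected.

```python
from datetime import datetime
from typing import Iterable, Optional


def latest_model_path(paths: Iterable[str]) -> Optional[str]:
    """Return the path whose file name carries the newest timestamp, or None.

    Assumes (hypothetically) that files are named model_<YYYYmmddTHHMMSS>.pkl.
    """

    def trained_at(path: str) -> datetime:
        # Take the file name, drop the extension, parse the trailing timestamp.
        stem = path.rsplit("/", 1)[-1].removesuffix(".pkl")
        return datetime.strptime(stem.split("_")[-1], "%Y%m%dT%H%M%S")

    return max(paths, key=trained_at, default=None)


# Hypothetical file names under the model collection listed above.
models = [
    "gcs/github_models/pr_closing_time_models/model_20240101T000000.pkl",
    "gcs/github_models/pr_closing_time_models/model_20240301T120000.pkl",
    "gcs/github_models/pr_closing_time_models/model_20240215T083000.pkl",
]
newest = latest_model_path(models)
```

Parsing the timestamp rather than sorting file names as strings keeps the selection correct even if the naming prefix changes.
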
This is a realistic example of combining dbt modeling with Python-based machine learning, using Prophet to forecast the performance of various ecommerce products based on the recent past. The example begins by pulling data from a Postgres database, showing how Twirl state makes this easy.
- Tag: ecommerce
- Included assets:
❯ twirl list @ecommerce
bigquery/raw_ecommerce/events
bigquery/raw_ecommerce/pageviews
bigquery/raw_ecommerce/products
bigquery/raw_ecommerce/purchases
bigquery/raw_ecommerce/users
stg_ecommerce__events
stg_ecommerce__pageviews
stg_ecommerce__products
stg_ecommerce__purchases
stg_ecommerce__users
dim_customers
dim_products
fct_orders
fct_product_performance
- Visualization:
- Features demonstrated:
- Data ingestion from Postgres
- dbt integration
- Schemas
- Job State
- Append update method
- Merge update method
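
To make the difference between the append and merge update methods concrete, here is a small sketch of their general semantics using plain Python rows. It does not use Twirl's API or reflect how Twirl implements these methods; the function names and the `id` key are hypothetical.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def append_rows(existing: List[Row], new: List[Row]) -> List[Row]:
    """Append: new rows are added after the existing ones, duplicates included."""
    return existing + new


def merge_rows(existing: List[Row], new: List[Row], key: str) -> List[Row]:
    """Merge (upsert): rows with a matching key are replaced, others are added."""
    merged = {row[key]: row for row in existing}
    for row in new:
        merged[row[key]] = row
    return list(merged.values())


table = [{"id": 1, "price": 10}, {"id": 2, "price": 20}]
batch = [{"id": 2, "price": 25}, {"id": 3, "price": 30}]

appended = append_rows(table, batch)    # 4 rows, id 2 appears twice
merged = merge_rows(table, batch, "id")  # 3 rows, id 2 carries the new price
```

Append suits immutable event-style data such as pageviews, while merge suits records that change over time, such as products or users.
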
This example uses Python and Dataflow to process a series of contracts saved as PDF files. It generates embeddings for every page of each file and scans the pages for any mention of GDPR, recording the page numbers where those references occur, showcasing how Twirl can be used in a legal tech setting.
- Tag: legal
- Included assets:
❯ twirl list @legal
gcs/contracts
bigquery/contracts/contract_text
bigquery/contracts/contract_embedding
bigquery/contracts/gdpr_data
- Visualization:
- Features demonstrated:
- Job specific requirements.txt
- Combining BigQuery and GCS to process and structure non-tabular data
- Using Google Cloud Dataflow for tasks that require more distributed processing power
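
The idea of recording the page numbers where GDPR is mentioned can be illustrated with a simplified sketch. Note that this swaps in plain keyword matching rather than the embedding-based approach used in the example, and it assumes the page text has already been extracted from the PDFs; `find_gdpr_pages` and the sample pages are hypothetical.

```python
import re
from typing import List

# Matches the acronym or the full name of the regulation, case-insensitively.
GDPR_PATTERN = re.compile(
    r"\bGDPR\b|General Data Protection Regulation", re.IGNORECASE
)


def find_gdpr_pages(pages: List[str]) -> List[int]:
    """Return the 1-based page numbers whose text mentions GDPR."""
    return [
        number
        for number, text in enumerate(pages, start=1)
        if GDPR_PATTERN.search(text)
    ]


# Hypothetical page texts standing in for an extracted contract.
contract_pages = [
    "This agreement is entered into by the parties named below.",
    "The processor shall comply with GDPR at all times.",
    "Obligations under the General Data Protection Regulation survive termination.",
]
gdpr_pages = find_gdpr_pages(contract_pages)
```

Keyword matching only catches explicit mentions; an embedding-based search can also surface pages that discuss the same obligations without naming the regulation.
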
Please explore the bigquery, dbt, and gcs directories to see example code. Feel free to also check out project_config.py, which details Twirl's configuration. Note how little is needed for the dbt integration!