The Semantic Model Generator
The Semantic Model Generator is an open-source tool for generating and curating semantic models for Snowflake Cortex Analyst.
The tool runs as a Streamlit application, which can be started in Streamlit in Snowflake or locally with open-source Streamlit. Setup instructions for the two methods are provided separately below. Please proceed to your preferred deployment method: SiS deployment or local deployment.
If you want to see what a semantic model looks like, skip to Examples.
Table of Contents
- Streamlit in Snowflake Deployment
- Local Deployment
- Partner Tool Translation
- Usage
- Examples
- Release
Note: Deploying this app in Streamlit in Snowflake requires support for Python 3.9+ in Streamlit in Snowflake. This feature is part of the 2024_08 BCR (Behavior Change Bundle), which is currently disabled by default. A user with ACCOUNTADMIN privileges can enable this feature in your account by running the following code in Snowsight:
SELECT SYSTEM$ENABLE_BEHAVIOR_CHANGE_BUNDLE('2024_08');
You can check to see if this BCR is already enabled by running:
SELECT SYSTEM$BEHAVIOR_CHANGE_BUNDLE_STATUS('2024_08');
A typical error you might see if you do not have this feature enabled looks like this:
For more information on this bundle please see our BCR Documentation.
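If you prefer a terminal, the same checks can be run through Snowflake CLI (introduced in the next section). This is only a sketch; it assumes an already-configured connection, and the backslashes keep the shell from expanding the $ in the system function names.

# Check whether the 2024_08 bundle is already enabled.
snow sql -q "SELECT SYSTEM\$BEHAVIOR_CHANGE_BUNDLE_STATUS('2024_08');"
# Enable it if needed (requires ACCOUNTADMIN).
snow sql -q "SELECT SYSTEM\$ENABLE_BEHAVIOR_CHANGE_BUNDLE('2024_08');"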
Snowflake CLI is recommended for deploying the app in Streamlit in Snowflake. Please see Snowflake CLI installation instructions to install. Snowflake CLI version 3.0+ is recommended. Follow the below instructions to install the Semantic Model Generator in Streamlit in Snowflake.
If you do not have Snowflake CLI installed, the setup can be replicated manually with the VS Code Snowflake extension, Snowsight or Snowflake Native Git Integration:
- If using Snowsight, you may use the file upload wizard to upload files. Please pay close attention to maintaining the directory structure referenced in sis_setup/app_setup.sql.
- If using the Native Git Integration, copy and paste the code from this setup file and run it in Snowsight.
- Configure Snowflake CLI
Follow instructions for your preferred connection method to connect Snowflake CLI to your Snowflake account. Please test your connection. Depending on your connection configuration, you may need to continue passing credentials for subsequent Snowflake CLI commands.
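A minimal sketch of creating and testing a named connection is shown below. The connection name semantic_model_gen is just an example, and exact prompts and flags can differ between Snowflake CLI versions; check snow connection --help if in doubt.

# Add a named connection; the command prompts for account, user, and auth details.
snow connection add --connection-name semantic_model_gen
# Verify the connection works before deploying.
snow connection test --connection semantic_model_gen
# List configured connections and see which one is the default.
snow connection list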
- Deploy app in Streamlit in Snowflake
Run the below command from the project root directory to create all necessary objects in Snowflake.
Hint: The owner of the app will be the role specified in your Snowflake CLI connection. To use a different role, append --role <DESIRED_ROLE> to the end of the command, replacing <DESIRED_ROLE> with the desired role.
snow sql -f sis_setup/app_setup.sql
The app, titled Semantic Model Generator, can be opened directly in Snowsight. Alternatively, you may run the below command in your terminal to open it.
snow streamlit get-url SEMANTIC_MODEL_GENERATOR --open --database cortex_analyst_semantics --schema semantic_model_generator
The Semantic Model Generator supports translating metadata from a Looker Explore. To add this functionality to the Streamlit in Snowflake app, we need to create an external access integration to allow Snowflake to reach your Looker instance.
Have the following information available:
- <LOOKER_URL>: Looker base URL. Pass the domain excluding https://, such as snowflakedemo.looker.com.
- <CLIENT_SECRET>: Looker client secret.
- <APP_ROLE>: the owning role used in step #2 above.
If using the CLI method, run the below command to create the external access integration. Before running, replace the parameters.
Hint: Creating external access integrations may require increased privileges. To use a different role from the one specified in your Snowflake CLI connection, append --role <DESIRED_ROLE> to the end of the command, replacing <DESIRED_ROLE> with the desired role.
snow sql -f sis_setup/looker_integration.sql -D "looker_url=<LOOKER_URL>" -D "client_secret=<CLIENT_SECRET>" -D "streamlit_role=<APP_ROLE>"
If using any other method (e.g., Git Integration), you will need to update and run looker_integration.sql in Snowsight with these variables replaced.
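For illustration only, a filled-in invocation might look like the following; all three values are made up and should be replaced with your own.

snow sql -f sis_setup/looker_integration.sql -D "looker_url=snowflakedemo.looker.com" -D "client_secret=abc123exampleSecret" -D "streamlit_role=APP_ADMIN"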
Local usage of the Streamlit app supports Snowflake connections via a connections.toml file OR environment variables. The app will first check for a connections.toml before falling back to environment variables. Please follow the setup for your desired method and continue to Start Local Streamlit App once completed.
The Snowflake Python connector lets you add connection definitions to a connections.toml configuration file. A connection definition refers to a collection of connection-related parameters. Snowflake Python libraries currently support TOML version 1.0.0.
Please follow these instructions to create the appropriate connections.toml file. For convenience, we recommend setting the desired connection as your default.
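As a minimal sketch (every value below is a placeholder), you could create the file from a terminal as follows. The connector reads ~/.snowflake/connections.toml and treats the [default] section as the default connection; omit password if you authenticate another way.

mkdir -p ~/.snowflake
cat > ~/.snowflake/connections.toml <<'EOF'
[default]
account = "xy12345"
user = "jdoe"
password = "********"
role = "ANALYST"
warehouse = "COMPUTE_WH"
EOF
chmod 0600 ~/.snowflake/connections.toml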
At minimum, the below environment variables are required if using this method to connect to Snowflake.
SNOWFLAKE_ROLE
SNOWFLAKE_WAREHOUSE
SNOWFLAKE_USER
SNOWFLAKE_ACCOUNT_LOCATOR
SNOWFLAKE_HOST
For your convenience, we have created examples of setting these environment variables in .env.example. If using environment variables, we recommend creating a .env file in the root directory of this repo. Feel free to set your environment variables directly in the terminal if preferred.
To find your account locator, please execute the following SQL command in your account.
SELECT CURRENT_ACCOUNT_LOCATOR();
To find the SNOWFLAKE_HOST for your account, follow these instructions. The easiest way to find your account URL is to click the Copy account URL button from the Account panel in Snowsight:
However, if you have trouble authenticating with this URL, you can try building the URL manually:
- We currently recommend looking under the Account locator (legacy) connection method for better API compatibility.
- The host typically follows the format <accountlocator>.<region>.<cloud>.snowflakecomputing.com. Ensure that you omit the https:// prefix.
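Putting the pieces together, here is a sketch of the required variables with placeholder values. Set them in your terminal as shown, or place the same assignments (without the export keyword) in your .env file.

# Illustrative values only; replace with your own account details.
export SNOWFLAKE_ROLE="ANALYST"
export SNOWFLAKE_WAREHOUSE="COMPUTE_WH"
export SNOWFLAKE_USER="jdoe"
export SNOWFLAKE_ACCOUNT_LOCATOR="xy12345"
export SNOWFLAKE_HOST="xy12345.us-east-1.aws.snowflakecomputing.com"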
Our semantic model generator currently supports three types of authentication. If no SNOWFLAKE_AUTHENTICATOR environment variable is set, the default is snowflake, which uses standard username/password authentication (#1 below).
- Username and Password
Set the below environment variables:
SNOWFLAKE_USER="<your-snowflake-username>"
SNOWFLAKE_PASSWORD="<your-snowflake-password>"
If you have MFA enabled, using this default authenticator should send a push notification to your device.
- Username/Password with MFA passcode
Using a passcode from your authenticator app:
SNOWFLAKE_AUTHENTICATOR="username_password_mfa"
SNOWFLAKE_PASSWORD="<your-snowflake-password>"
SNOWFLAKE_MFA_PASSCODE="<your-snowflake-mfa-passcode>" # if your authenticator app reads "123 456", fill in "123456" (No spaces)
Using a passcode embedded in the password:
SNOWFLAKE_AUTHENTICATOR="username_password_mfa"
SNOWFLAKE_PASSWORD="<your-snowflake-password>"
SNOWFLAKE_MFA_PASSCODE_IN_PASSWORD="true"
- Single Sign-On (SSO) with Okta
# no SNOWFLAKE_PASSWORD needed
SNOWFLAKE_AUTHENTICATOR="externalbrowser"
Once you have completed one of the two setup options above, you're ready to start the local Streamlit app.
- If you have Make on your machine, run the below command to install dependencies for the Streamlit app.
make setup_admin_app
This uses pip to install dependencies from the root pyproject.toml. Feel free to use conda or any package manager you prefer. Below is an alternative to the Make command.
pip install .
The generator supports merging data from your existing semantic models built with partners such as dbt and Looker. To use the Looker-specific features, please use the below command to install the extras. This is only required for Looker translation.
pip install -e ".[looker]"
- You can run the app using the provided Makefile target if you have Make on your machine.
make run_admin_app
Alternatively, you can use your current version of Python to start the app manually. Please use Python 3.8-3.11.
python3 -m streamlit run app.py
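If you want to pin a supported interpreter explicitly, one possible approach (assuming python3.11 is available on your PATH) is to create a virtual environment before installing and running the app:

# Create and activate a Python 3.11 virtual environment, then install and run.
python3.11 -m venv .venv
source .venv/bin/activate
pip install .
python -m streamlit run app.py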
We continue to add support for partner semantic and metric layers. Our aim is to expedite the creation of Cortex Analyst semantic files using logic and metadata from partner tools. Please see below for details about current partner support.
IMPORTANT: Instructions for each of the below sources are provided in the Streamlit application.
Source | Method | Requirements |
---|---|---|
DBT Semantic Model | We extract and translate metadata from semantic_models in uploaded DBT yaml file(s) and merge with a generated Cortex Analyst semantic file table-by-table. | DBT models and sources leading up to the semantic model layer(s) must be tables/views in Snowflake. |
DBT SQL Model | DBT SQL Models should be materialized in Snowflake with persist docs to capture comments. A new semantic file can be generated for these newly materialized tables/views in Snowflake directly. | |
Looker Explore | We materialize your Explore dataset in Looker as Snowflake table(s) and generate a Cortex Analyst semantic file. Metadata from your Explore fields can be merged with the generated Cortex Analyst semantic file. | Looker Views referenced in the Looker Explores must be tables/views in Snowflake. Looker SDK credentials are required. Visit Looker Authentication SDK Docs for more information. Install Looker's API Explorer extension from the Looker Marketplace to view API credentials directly. |
Due to context window and quality constraints, we currently limit the size of the generated semantic model to < 30,980 tokens (~123,920 characters).
Please note that sample values and verified queries do not count toward this token limit. You can include as many sample values or verified queries as you'd like while keeping the overall file under 1 MB.
If your Snowflake tables and columns do not have comments, we currently leverage Cortex LLM functions to auto-generate description suggestions. These generated descriptions are suffixed with '__' and an additional comment to remind you to confirm/modify them.
IMPORTANT: After generation, your YAML files will have a series of lines containing # <FILL-OUT>. Please take the time to fill these out with your business context, or else subsequent validation of your model will fail.
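A quick way to confirm nothing was missed before running validation is a plain text search; the file path below is hypothetical.

# List any placeholders still left in the generated model.
grep -n "<FILL-OUT>" output/my_model.yaml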
By default, the generated semantic model will contain all columns from the provided tables/views. However, it's highly encouraged to only keep relevant columns and drop any unwanted columns from the generated semantic model.
In addition, consider adding the following elements to your semantic model:
- Logical columns for a given table/view that are expressions over physical columns.
  - Example: col1 - col2 could be the expr for a logical column.
- Synonyms. Any additional synonyms for column names.
- Filters. Additional filters with their relevant expr.
Suppose you have an example table in your account created with the following DDL statement:
CREATE TABLE sales.public.sd_data
(
id INT AUTOINCREMENT PRIMARY KEY,
dt DATETIME,
cat VARCHAR(255),
loc VARCHAR(255),
cntry VARCHAR(255),
chn VARCHAR(50),
amt DECIMAL(10, 2),
unts INT,
cst DECIMAL(10, 2)
);
Here is an example semantic model, with data elements automatically generated from this repo and filled out by a user.
# Name and description of the semantic model.
name: Sales Data
description: This semantic model can be used for asking questions over the sales data.
# A semantic model can contain one or more tables.
tables:
# A logical table on top of the 'sd_data' base table.
- name: sales_data
description: A logical table capturing daily sales information across different store locations and product categories.
# The fully qualified name of the base table.
base_table:
database: sales
schema: public
table: sd_data
# Dimension columns in the logical table.
dimensions:
- name: product_category
synonyms:
- "item_category"
- "product_type"
description: The category of the product sold.
expr: cat
data_type: NUMBER
unique: false
sample_values:
- "501"
- "544"
- name: store_country
description: The country where the sale took place.
expr: cntry
data_type: TEXT
unique: false
sample_values:
- "USA"
- "GBR"
- name: sales_channel
synonyms:
- "channel"
- "distribution_channel"
description: The channel through which the sale was made.
expr: chn
data_type: TEXT
unique: false
sample_values:
- "FB"
- "GOOGLE"
# Time dimension columns in the logical table.
time_dimensions:
- name: sale_timestamp
synonyms:
- "time_of_sale"
- "transaction_time"
description: The time when the sale occurred. In UTC.
expr: dt
data_type: TIMESTAMP
unique: false
# Measure columns in the logical table.
measures:
- name: sales_amount
synonyms:
- "revenue"
- "total_sales"
description: The total amount of money generated from the sale.
expr: amt
data_type: NUMBER
default_aggregation: sum
- name: sales_tax
description: The sales tax paid for this sale.
expr: amt * 0.0975
data_type: NUMBER
default_aggregation: sum
- name: units_sold
synonyms:
- "quantity_sold"
- "number_of_units"
description: The number of units sold in the transaction.
expr: unts
data_type: NUMBER
default_aggregation: sum
- name: cost
description: The cost of the product sold.
expr: cst
data_type: NUMBER
default_aggregation: sum
- name: profit
synonyms:
- "earnings"
- "net income"
description: The profit generated from a sale.
expr: amt - cst
data_type: NUMBER
default_aggregation: sum
# A table can define commonly used filters over it. These filters can then be referenced in user questions directly.
filters:
- name: north_america
synonyms:
- "North America"
- "N.A."
- "NA"
description: "A filter to restrict only to north american countries"
expr: cntry IN ('canada', 'mexico', 'usa')
In order to push a new build and release, follow the steps below. Note: only admins are allowed to push release/v tags.
You should follow the setup commands from usage-cli to install poetry and create your environment.
- Check out a new branch from main. You should name this branch release/vYYYY-MM-DD.
- Bump the poetry version:
  - poetry version patch - increments 0.1.x to 0.1.(x+1)
  - poetry version minor - increments 0.x.0 to 0.(x+1).0
  - poetry version major - increments x.0.0 to (x+1).0.0
- Update the CHANGELOG.md, adding a relevant header for your version number along with a description of the changes made.
- Run make build to create a new .whl file.
- Push your files for approval.
- After approval, run make release, which will cut a new release and attach the .whl file.
- Merge in your PR.
- Note: If make release does not trigger the GH action, please delete the tag and push again.
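Put together, the flow above might look roughly like this from a terminal; the branch date and commit message are examples only.

git checkout main && git pull
git checkout -b release/v2025-01-15        # name the branch release/vYYYY-MM-DD
poetry version patch                       # or: minor / major
# update CHANGELOG.md with a header for the new version, then:
make build
git add -A && git commit -m "release: bump version"
git push origin release/v2025-01-15
# after the PR is approved:
make release
# then merge the PR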