diff --git a/README.md b/README.md index 83aa4f7e5bc..4dfd8a8be9e 100644 --- a/README.md +++ b/README.md @@ -56,9 +56,9 @@ You can click a link available in a netlify bot PR comment to see and review you 2. Clone this repo: `git clone https://github.com/dbt-labs/docs.getdbt.com.git` 3. `cd` into the repo: `cd docs.getdbt.com` 4. `cd` into the `website` subdirectory: `cd website` -5. Install the required node packages: `npm install` (optional — install any updates) -6. Build the website: `npm start` -7. Before pushing your changes to a branch, check that all links work by using the `make build` script. +5. Install the required node packages: `make install` or `npm install` (optional — install any updates) +6. Build the website: `make run` or `npm start` +7. Before pushing your changes to a branch, run `make build` or `npm run build` and check that all links work Advisory: - If you run into an `fatal error: 'vips/vips8' file not found` error when you run `npm install`, you may need to run `brew install vips`. Warning: this one will take a while -- go ahead and grab some coffee! diff --git a/website/Makefile b/website/Makefile new file mode 100644 index 00000000000..250b23e35bb --- /dev/null +++ b/website/Makefile @@ -0,0 +1,10 @@ +.PHONY: run install build + +run: + npm start + +install: + npm install + +build: + DOCS_ENV=build npm run build diff --git a/website/blog/2022-05-03-making-dbt-cloud-api-calls-using-dbt-cloud-cli.md b/website/blog/2022-05-03-making-dbt-cloud-api-calls-using-dbt-cloud-cli.md index 5f293ac077b..91ad1080ce6 100644 --- a/website/blog/2022-05-03-making-dbt-cloud-api-calls-using-dbt-cloud-cli.md +++ b/website/blog/2022-05-03-making-dbt-cloud-api-calls-using-dbt-cloud-cli.md @@ -109,7 +109,7 @@ After the initial release I started to expand to cover the rest of the dbt Cloud In this example we’ll download a `catalog.json` artifact from the latest run of a dbt Cloud job using `dbt-cloud run list` and `dbt-cloud get-artifact` and then write a simple Data Catalog CLI application using the same tools that are used in `dbt-cloud-cli` (i.e., `click` and `pydantic`). Let’s dive right in! -The first command we need is the `dbt-cloud run list` which uses an [API endpoint](https://docs.getdbt.com/dbt-cloud/api-v2#/operations/List%20Runs) that returns runs sorted by creation date, with the most recent run appearing first. The command returns a JSON response that has one top-level attribute `data` that contains a list of runs. We’ll need to extract the `id` attribute of the first one and to do that we use [jq](https://stedolan.github.io/jq/): +The first command we need is the `dbt-cloud run list` which uses an [API endpoint](https://docs.getdbt.com/dbt-cloud/api-v2-legacy#/operations/List%20Runs) that returns runs sorted by creation date, with the most recent run appearing first. The command returns a JSON response that has one top-level attribute `data` that contains a list of runs. 
We’ll need to extract the `id` attribute of the first one and to do that we use [jq](https://stedolan.github.io/jq/):

```
latest_run_id=$(dbt-cloud run list --job-id $DBT_CLOUD_JOB_ID | jq .data[0].id -r)
```

diff --git a/website/blog/2023-07-03-data-vault-2-0-with-dbt-cloud.md b/website/blog/2023-07-03-data-vault-2-0-with-dbt-cloud.md
new file mode 100644
index 00000000000..e1351034f66
--- /dev/null
+++ b/website/blog/2023-07-03-data-vault-2-0-with-dbt-cloud.md
@@ -0,0 +1,150 @@
+---
+title: "Data Vault 2.0 with dbt Cloud"
+description: "When to use, and when not to use Data Vault 2.0 data modeling, and why dbt Cloud is a great choice"
+slug: data-vault-with-dbt-cloud
+
+authors: [rastislav_zdechovan, sean_mcintyre]
+
+tags: [analytics craft, data ecosystem]
+hide_table_of_contents: true
+
+date: 2023-07-03
+is_featured: true
+---
+
+Data Vault 2.0 is a data modeling technique designed to help scale large data warehousing projects. It is a rigid, prescriptive system detailed rigorously in [a book](https://www.amazon.com/Building-Scalable-Data-Warehouse-Vault/dp/0128025107) that has become the bible for this technique.
+
+So why Data Vault? Have you experienced a data warehousing project with 50+ data sources, with 25+ data developers working on the same data platform, or data spanning 5+ years with two or more generations of source systems? If not, it might be hard to initially understand the benefits of Data Vault, and maybe [Kimball modelling](https://docs.getdbt.com/blog/kimball-dimensional-model) is better for you. But if you are in _any_ of the situations listed, then this is the article for you!
+
+Here’s an analogy to help illustrate Data Vault:
+
+Think of a city’s water supply. Each house does not have a pipe directly from the local river: there is a dam and a reservoir to collect water for the city from all of the sources – the lakes, streams, creeks, and glaciers – before the water is redirected into each neighborhood and finally into each home’s taps.
+
+A new development in the city? No problem! Just hook up the new pipes to the reservoir! Not enough water? Just find another water source and fill up the reservoir.
+
+Data Vault is the dam and reservoir: it is the well-engineered data model to structure an organization’s data from source systems for use by downstream data projects – rather than each team collecting data straight from the source. The Data Vault data model is designed using a few well-applied principles, and in practice, pools source data so it is available for use by all downstream consumers. This promotes a scalable data warehouse through reusability and modularity.
+
+## Data Vault components
+
+Loading your data directly from source systems without applying any business rules means storing it in a so-called **Raw Vault**. This is usually the first step in the journey of transforming your data. There are situations where you’d want to apply business logic before loading the data into your presentation layer; that’s where the **Business Vault** comes into play. Performance enhancement and centralized business logic are among the reasons for doing so.
+
+The core components of Data Vault are hubs, links, and satellites. They allow for more flexibility and extensibility and can be used to model complex processes in an agile way.
+
+Here is what you need to know about the main components:
+
+* **Hubs**: A hub is the central repository of all business keys identifying the same business entity. By separating data into hubs, we ensure each business concept is represented as accurately and consistently as possible while avoiding redundancy and ensuring referential integrity;
+* **Links**: Links connect your hubs in Data Vault. The relationship is stored as data, which makes it auditable and flexible to change. There are several special types of links, but in most cases, links are bidirectional, meaning you can easily navigate back and forth between business entities. This allows you to analyze complex relationships via connections created by hubs and links in your data model;
+* **Satellites**: Satellites store contextual, descriptive, and historical information about the hubs and links they are attached to, depending on whether the data is related to a business object or a relationship. Each satellite in the Data Vault provides additional, valuable information about the main entity.
+
+You can think of these Raw Vault components as LEGO bricks: they are modular and you can combine them in many different ways to build a wide variety of different, cohesive structures.
+
+Given its modular structure that requires many joins to get to specific information, Data Vault is not intended as a final presentation layer in your data warehouse. Instead, due to the wide variety of use cases, the framework works brilliantly as the middle, integration layer of your business, serving any form of presentation layer you might have, such as wide tables, star schema, feature stores, you name it.
+
+To further accelerate the creation of these layers and prevent the repetition of the same business logic, you can make use of the Business Vault as a complementary layer of your data warehouse.
+
+The Business Vault is there to fill the gaps of the generic, source-data-generated Raw Vault, which often does not cover all of the business processes of your organization. You can easily address such gaps by applying soft business rules in this layer.
+
+The Business Vault can also help with performance issues that arise when presentation layer transformations have to do lots of joins on the fly. In such scenarios, the Business Vault becomes a central piece of your business logic, populating all of the information marts.
+
+### Should you consider Data Vault for your data warehouse?
+
+Data Vault is a powerful modelling technique for mid-size to enterprise-level data warehouses with the following attributes:
+
+* Integration of multiple dynamic source systems;
+* Long-term project with agile delivery requirements;
+* Auditability and compliance needs;
+* Preference for a template-based project that allows for automation;
+* High flexibility of the data model with minimal reengineering;
+* Load performance is important and parallel loading is a must.
+
+Due to its complexity, Data Vault is not a go-to choice for:
+
+* Simple and stable systems;
+* Quick one-off solutions for experiments or short-term data warehouse projects;
+* Data warehouse layers needed for direct reporting.
+
+## dbt Cloud: the operating system for Data Vault
+
+There are many tools that can be used to implement your Data Vault project, but dbt Cloud, with its rich set of features, provides the functionality that makes the difference by accelerating your project end to end and saving you the trouble of jumping from one tool to another.
+
+Let’s take a look at the most impactful features and explore how you can leverage them when implementing your Data Vault project.
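+
+Before diving in, here is a minimal, illustrative sketch of what a raw vault hub could look like as an insert-only, incremental dbt model. The table and column names are hypothetical, and real projects typically generate this logic from macros rather than writing it by hand:
+
+```sql
+-- models/raw_vault/hub_customer.sql (illustrative sketch only)
+{{ config(materialized='incremental') }}
+
+select
+    customer_hk,      -- hash key derived from the business key in staging
+    customer_id,      -- business key
+    load_datetime,
+    record_source
+from {{ ref('stg_customers') }}
+
+{% if is_incremental() %}
+-- insert-only loading: only bring in hash keys that are not in the hub yet
+where customer_hk not in (select customer_hk from {{ this }})
+{% endif %}
+```
+
+Links and satellites follow the same pattern, differing mainly in the keys and descriptive attributes they carry.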
+
+### Scalable schema
+
+The Don’t Repeat Yourself (DRY) principle of software engineering can help you sleep better when you are dealing with complex projects, which Data Vault projects most often are.
+
+dbt's [**macros**](https://docs.getdbt.com/docs/build/jinja-macros) feature is a lifesaver in terms of templating your code. It saves you the headaches caused by manual errors and lets you define transformation logic in one place, so it is easy to change when you need to.
+
+Data Vault follows the insert-only principle with an incremental loading strategy. Built-in [**Jinja**](https://docs.getdbt.com/docs/build/jinja-macros) functionality allows you to create one version of the dbt model for both incremental and full loads of a table. The easy dependency management that this feature helps you achieve is a huge benefit for highly complex projects.
+
+If you are new to the framework, taking a look at already-built Data Vault macros can be crucial, and even if you are an expert, it can still be beneficial. dbt’s rich set of community [**packages**](https://docs.getdbt.com/docs/build/packages) can be applied directly to your project or used as inspiration for your own transformation templates.
+
+Building your transformation templates with reusable macros and the flexible Jinja language can enhance your project development in a scalable way. When things get more complex, you are able to go back and change your templates in one place, either completely or by using parameters, to ensure you don’t mess with what already works well.
+
+If you have practiced Data Vault data modeling in another tool, you might appreciate dbt [**model contracts**](https://docs.getdbt.com/docs/collaborate/govern/model-contracts) as a way to guarantee to your data end-users the exact shape of a dbt transformation. This is a similar practice to writing DDL.
+
+Scalability also happens at the database layer. With [**materializations**](https://docs.getdbt.com/docs/build/materializations), you have fine-grained control over whether a database object built by dbt is persisted as a view, a table, or built incrementally, which gives you control over the performance and cost characteristics of each transformation. So if your data platform bill is growing, it’s easy to identify which Data Vault components are the most expensive and make optimizations to reduce cost.
+
+With the active dbt open source community, there is a good chance you are facing a problem that was already solved by someone else. There are plenty of amazing packages available in the dbt [package hub](https://hub.getdbt.com/), which you can utilise to speed up your development even further.
+
+### Agile development
+
+dbt Cloud includes **built-in Git** with accessible features directly from its IDE, which simplifies development immensely. Once a developer is happy with their additions or changes to the Data Vault codebase, they can commit the code within the IDE and open a Pull Request, triggering a code review process. Then, with [continuous integration with dbt Cloud](https://docs.getdbt.com/docs/deploy/continuous-integration), automated checks are run to ensure data quality standards and Data Vault conventions are met, automatically preventing any bad changes from reaching production.
+
+The biggest boons to Data Vault developer productivity in dbt Cloud are its **DataOps** and **Data Warehouse Automation** features. Each Data Vault developer gets their own development environment to work in, and there is no complicated setup process to go through.
+
+Commit your work, create a pull request, and have automated code review enabled by dbt Cloud [**jobs**](https://docs.getdbt.com/docs/deploy/dbt-cloud-job) that can be defined for each environment separately (e.g., testing, QA, production). Together with dbt [**tags**](https://docs.getdbt.com/reference/resource-configs/tags), these features allow you to orchestrate your project in an efficient and powerful way.
+
+### Auditable data
+
+One of the main selling points of Data Vault is its auditability. In addition to Data Vault's own capabilities, dbt Cloud features enhance this advantage even further. Each job execution leaves an [**audit log**](https://docs.getdbt.com/docs/cloud/manage-access/audit-log), which can be leveraged to analyze trends in job performance, among other things, allowing you to identify bottlenecks in your system. dbt Cloud also stores [**artifact**](https://docs.getdbt.com/docs/deploy/artifacts) files after each execution for further processing and analysis, and exposes them programmatically via the [Discovery API](https://www.getdbt.com/blog/introducing-the-discovery-api/).
+
+dbt has built-in **data lineage**, which helps both developers and data consumers understand just how the data assets in the data warehouse are created. And with the self-serve and automatically generated [**dbt docs**](https://docs.getdbt.com/reference/commands/cmd-docs), you can spend less time answering questions about your data from across the organization and more time building your Data Vault.
+
+Last but not least, the built-in [**dbt testing framework**](https://docs.getdbt.com/guides/dbt-ecosystem/dbt-python-snowpark/13-testing) allows Data Vault developers to test their assumptions about the data in their database. Not only are primary key checks and foreign key checks easy to add and simple to run, but more complex checks like integer range checks, anomaly detection, and highly sophisticated data quality checks are also possible when expressed as SQL statements. Infinite Lambda have created two dbt packages for data quality, [dq_tools](https://hub.getdbt.com/infinitelambda/dq_tools/latest/) and [dq_vault](https://hub.getdbt.com/infinitelambda/dq_vault/latest/), which are described later in this post.
+
+## How to get started with dbt Cloud and Data Vault
+
+There are many decisions to make before you roll up your sleeves and start implementing your Data Vault data warehouse. Apart from the data modelling work, you need to agree on naming conventions, a hashing algorithm, a staging strategy, and data types for standard metadata attributes, and make sure these are all well documented. Here, to save yourself some headaches in the long run, we recommend starting your own **decision log**.
+
+In terms of the implementation of the Data Vault itself, we recommend familiarizing yourself with the best practices well in advance, especially if you have no previous experience with the framework. There are two well-known dbt packages focusing on Data Vault implementation, which you can take inspiration from to build your own templating system, or they can be used directly if they fit your use case.
+
+### AutomateDV (formerly known as dbtvault)
+
+AutomateDV is the most popular open source Data Vault package for dbt, with some users having over 5,000 Data Vault components in their project. Here at Infinite Lambda, we’ve been using this package for quite some time now, even building on top of it (depending on the specifics of the project).
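+
+For a flavour of what the package provides, a hub model built with AutomateDV is typically just a thin wrapper around one of its macros. The sketch below follows the parameter names used in the package documentation at the time of writing; check the version you install before copying it:
+
+```sql
+-- models/raw_vault/hub_customer.sql (sketch based on the AutomateDV docs)
+{{ config(materialized='incremental') }}
+
+{%- set source_model = "stg_customers" -%}
+{%- set src_pk = "CUSTOMER_HK" -%}
+{%- set src_nk = "CUSTOMER_ID" -%}
+{%- set src_ldts = "LOAD_DATETIME" -%}
+{%- set src_source = "RECORD_SOURCE" -%}
+
+{{ automate_dv.hub(src_pk=src_pk, src_nk=src_nk, src_ldts=src_ldts,
+                   src_source=src_source, source_model=source_model) }}
+```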
+This mature system provides a great way to start your Data Vault with dbt Cloud journey: the learning curve is quite manageable, it is well documented, and it even comes with tutorials and working examples built on top of Snowflake’s TPCH standard dataset. There is one limitation to using the package: _AutomateDV_ expects your source data to contain only one delta load. To work around this issue, the owners of the package came up with custom dbt materializations to help with the initial load of your system; however, in our experience the performance of such a load is not acceptable.
+
+### datavault4dbt
+
+At first glance, this fairly new open source package works in a similar fashion, especially since the usage of the macros provides the same experience (apart from the names of some of the parameters). Diving deeper into the documentation, however, we can see it provides a higher level of customization thanks to many global variables which alter the behavior of the macros. It also supports any type of source data, whether CDC, transient, or persistent. We suggest looking into this package if you have a deeper understanding of Data Vault and need a complex, customizable system. Be aware, though, that the package is new, so there is a risk of hidden unresolved issues.
+
+### Customizing the existing packages
+
+These two packages, AutomateDV and datavault4dbt, are the most popular approaches to building a Data Vault on dbt. However, sometimes these packages don’t quite match an organization’s pre-existing Data Vault practices built with a different tool. On the surface, dbt looks quite simple, but deep down it is extremely customizable: it’s possible to make minor modifications to the packages within your project using Jinja, which is a powerful templating language.
+
+For example, some organizations choose different hashing algorithms to generate their Data Vault hash keys than what comes out-of-the-box with AutomateDV. To change that, you can add a [dbt macro](https://docs.getdbt.com/docs/build/jinja-macros#macros) called [default__hash_alg_md5](https://github.com/Datavault-UK/automate-dv/blob/3db7cc285e110ae6976d0afe7a93adf9b776b449/macros/supporting/hash_components/select_hash_alg.sql#L32C1-L36) to your project with the custom logic you want. Much of the package logic can be overridden in this way to suit your needs.
+
+### Build your own system
+
+Every project is different and needs its own set of features, special treatments tailored to your data, or performance tuning mechanisms. Because of this, for any long-term, high-priority data warehouse solution, we at [Infinite Lambda](https://infinitelambda.com/) recommend building your own templating system. It requires significant engineering effort before the actual implementation (and bug fixing during it), but you’ll save time later because you will know where to look when an issue appears. If you are not comfortable creating such a system from scratch, you can always start with one of the above open-source packages and build on it once you hit its limits.
+
+We at Infinite Lambda treat data quality very seriously and we push for high test coverage as well as overall data governance in every project. With the experience from multiple projects, we developed two data quality dbt packages, which can help business users raise trust in your data.
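+
+Packages like these build on dbt's standard testing framework. As a rough illustration of the kind of check involved (the table and column names here are hypothetical), a singular test that flags duplicate hash keys in a hub might look like this:
+
+```sql
+-- tests/assert_hub_customer_hashkey_unique.sql (illustrative singular test)
+-- dbt treats any rows returned by this query as test failures
+select
+    customer_hk,
+    count(*) as occurrences
+from {{ ref('hub_customer') }}
+group by customer_hk
+having count(*) > 1
+```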
+
+The [dq_tools](https://hub.getdbt.com/infinitelambda/dq_tools/latest/) package aims to make it simple to store test results and visualize them in a BI dashboard. Leveraging this tool can help you make sure your system behaves in the expected way, all in a visual dashboard built in your favorite BI tool. The [dq_vault](https://hub.getdbt.com/infinitelambda/dq_vault/latest/) package provides an overview of data quality for all Data Vault models in your dbt project. Complex as they are, Data Vault projects need detailed test coverage to make sure there are no holes in the system. This tool helps you govern your testing strategy and identify issues very quickly.
+
+To help you get started, [we have created a template GitHub project](https://github.com/infinitelambda/dbt-data-vault-template) you can use to understand the basic principles of building Data Vault with dbt Cloud using one of the packages mentioned above. But if you need help building your Data Vault, get in touch.
+
+### Entity Relationship Diagrams (ERDs) and dbt
+
+Data lineage is dbt's strength, but sometimes it's not enough to help you understand the relationships between Data Vault components the way a classic ERD would. There are a few open source packages that visualize the entities in a Data Vault built with dbt. I recommend checking out [dbterd](https://dbterd.datnguyen.de/1.2/index.html), which turns your [dbt relationship data quality checks](https://docs.getdbt.com/docs/build/tests#generic-tests) into an ERD.
+
+## Summary
+
+By leveraging the building blocks of Data Vault, organizations can build data warehouses that are adaptable to changing business requirements, promote data quality and integrity, and enable efficient data management and analytics. This in turn drives better decision-making, competitive advantage, and business growth.
+
+Choosing the right methodology for building your data warehouse is crucial for your system’s capabilities in the long run. If you are exploring Data Vault and want to learn more, Infinite Lambda can help you make the right call for your organization.
diff --git a/website/blog/authors.yml b/website/blog/authors.yml index 76ccf5d77a8..72e747cc577 100644 --- a/website/blog/authors.yml +++ b/website/blog/authors.yml @@ -373,6 +373,15 @@ pat_kearns: name: Pat Kearns organization: dbt Labs +rastislav_zdechovan: + image_url: /img/blog/authors/rastislav-zdechovan.png + job_title: Analytics Engineer + links: + - icon: fa-linkedin + url: https://www.linkedin.com/in/rastislav-zdechovan/ + name: Rastislav Zdechovan + organization: Infinite Lambda + ross_turk: image_url: /img/blog/authors/ross-turk.png job_title: VP Marketing diff --git a/website/dbt-versions.js b/website/dbt-versions.js index f611e17b650..52b68d6c084 100644 --- a/website/dbt-versions.js +++ b/website/dbt-versions.js @@ -31,6 +31,14 @@ exports.versions = [ ] exports.versionedPages = [ + { + "page": "reference/resource-properties/deprecation_date", + "firstVersion": "1.6", + }, + { + "page": "reference/commands/retry", + "firstVersion": "1.6", + }, { "page": "docs/build/groups", "firstVersion": "1.5", @@ -115,11 +123,67 @@ exports.versionedPages = [ "page": "reference/dbt-jinja-functions/print", "firstVersion": "1.1", }, + { + "page": "docs/build/build-metrics-intro", + "firstVersion": "1.6", + }, + { + "page": "docs/build/sl-getting-started", + "firstVersion": "1.6", + }, + { + "page": "docs/build/about-metricflow", + "firstVersion": "1.6", + }, + { + "page": "docs/build/join-logic", + "firstVersion": "1.6", + }, + { + "page": "docs/build/validation", + "firstVersion": "1.6", + }, + { + "page": "docs/build/semantic-models", + "firstVersion": "1.6", + }, + { + "page": "docs/build/group-by", + "firstVersion": "1.6", + }, + { + "page": "docs/build/entities", + "firstVersion": "1.6", + }, + { + "page": "docs/build/metrics-overview", + "firstVersion": "1.6", + }, + { + "page": "docs/build/cumulative", + "firstVersion": "1.6", + }, + { + "page": "docs/build/derived", + "firstVersion": "1.6", + }, + { + "page": "docs/build/measure-proxy", + "firstVersion": "1.6", + }, + { + "page": "docs/build/ratio", + "firstVersion": "1.6", + }, ] exports.versionedCategories = [ { "category": "Model governance", "firstVersion": "1.5", + }, + { + "category": "Build your metrics", + "firstVersion": "1.6", } ] diff --git a/website/docs/community/resources/code-of-conduct.md b/website/docs/community/resources/code-of-conduct.md index 6788f3ae39f..22159b36cc9 100644 --- a/website/docs/community/resources/code-of-conduct.md +++ b/website/docs/community/resources/code-of-conduct.md @@ -1,22 +1,16 @@ --- title: "Code of Conduct" id: "code-of-conduct" +description: "Learn about the community values that shape our rules, and review our anti-harassment policy." --- # dbt Community Code of Conduct -dbt has a supportive, active community of thousands of smart, kind, and helpful people who share a commitment to elevating the analytics profession. +This Code of Conduct applies to all dbt Community spaces, both online and offline. This includes Slack, Discourse, code repositories (dbt Core, dbt packages, etc.), Office Hours, and Meetups. Participants are responsible for knowing and abiding by this Code of Conduct. -You can get involved in the dbt community by connecting at [events](/community/events), getting or giving help in any of our many channels, contributing to dbt or a dbt package, and many other ways. - -People genuinely love this community, and we are committed to maintaining the spirit of it. 
As such have written this Code of Conduct to help all participants understand how to best participate in our community. - -The Code of Conduct applies to all dbt Community spaces both online and off. This includes: Slack, Discourse, code repositories (dbt Core, dbt packages etc), Office Hours and Meetups. There are some guidelines specific to particular forums (listed below). Participants are responsible for knowing and abiding by this Code of Conduct. - -This Code of Conduct has three sections: +This Code of Conduct has two sections: - **dbt Community Values:** These values apply to all of our community spaces, and all of our guidelines are based on these values. -- **Forum-specific guidelines**: These guidelines explain some of the cultural norms that apply to specific forums. - **Anti-harassment policy:** We are dedicated to providing a harassment-free experience for everyone in our community — here, we outline exactly what that means. We appreciate your support in continuing to build a community we’re all proud of. @@ -24,19 +18,16 @@ We appreciate your support in continuing to build a community we’re all proud — The dbt Community Admin Team. ## dbt Community Values +### Create more value than you capture. -### Be respectful. - -We want everyone to have a fulfilling and positive experience in the dbt Community and we are continuously grateful in your help ensuring that this is the case. - -Be courteous, respectful, and polite to fellow community members. Generally, don’t be a jerk. - -Be considerate of others’ time — many people in the community generously give their time for free. +Each community member should strive to create more value in the community than they capture. This is foundational to being a community. Ways to demonstrate this value: -- Take the time to write bug reports well ([example](https://github.com/fishtown-analytics/dbt/issues/2370)) -- Thank people if they help solve a problem. +- [Coding contributions](/community/contributing/contributing-coding): Contribute to dbt Core, a package, or an adapter. Beyond implementing new functionality, you can also open issues or participate in discussions. +- [Writing contributions](/community/contributing/contributing-writing): You can suggest edits to every page of the dbt documentation, or suggest a topic for the dbt Developer Blog. +- [Join in online](/community/contributing/contributing-online-community): Ask and answer questions on the Discourse forum, kick off a lively discussion in Slack, or even maintain a Slack channel of your own. +- [Participate in events](/community/contributing/contributing-realtime-events): Organise a community Meetup, speak at an event, or provide office space/sponsorship for an existing event. ### Be you. @@ -44,7 +35,8 @@ Some developer communities allow and even encourage anonymity — we prefer it w Ways to demonstrate this value: -- Update your profile on any dbt Community forums to include your name, and a clear picture. On Slack, use the “what I do” section to add your role title and current company +- Update your profile on dbt Community platforms to include your name and a clear picture of yourself. Where available, use the "what I do" section to add your role, title and current company. +- Join your `#local-` channel in Slack, or if it doesn't exist then propose a new one. - Write in your own voice, and offer your own advice, rather than speaking in your company’s marketing or support voice. ### Encourage diversity and participation. 
@@ -57,73 +49,19 @@ Ways to demonstrate this value: - Demonstrate empathy for a community member’s experience — not everyone comes from the same career background, so adjust answers accordingly. - If you are sourcing speakers for events, put in additional effort to find speakers from underrepresented groups. -### Create more value than you capture. - -Each community member should strive to create more value in the community than they capture. This is foundational to being a community. - -Ways to demonstrate this value: - -- Contribute to dbt or a dbt package -- Participate in discussions on Slack and Discourse -- Share things you have learned on Discourse -- Host events - -Be mindful that others may not want their image or name on social media, and when attending or hosting an in-person event, ask permission prior to posting about another person. - ### Be curious. -Always ask yourself “why?” and strive to be continually learning. +Always ask yourself "why?" and strive to be continually learning. Ways to demonstrate this value: -- Try solving a problem yourself before asking for help, e.g. rather than asking “what happens when I do X”, experiment and observe the results! -- When asking questions, explain the “why” behind your decisions, e.g. “I’m trying to solve X problem, by writing Y code. I’m getting Z problem” -- When helping someone else, explain why you chose that solution, or if no solution exists, elaborate on the reason for that, e.g. “That’s not possible in dbt today — but here’s a workaround / check out this GitHub issue for a relevant discussion” - -## Guidelines - -### Participating in Slack - -dbt Slack is where the dbt community hangs out, discusses issues, and troubleshoots problems together. It is not a support service — please do not treat it like one. - -We also have a number of cultural norms in our Slack community. You must read and agree to the rules before joining Slack, but you can also find them [here](/community/resources/slack-rules-of-the-road/). - -As a short summary: - -- [Rule 1: Be respectful](/community/resources/slack-rules-of-the-road/#rule-1-be-respectful) -- [Rule 2: Use the right channel](/community/resources/slack-rules-of-the-road/#rule-2-use-the-right-channel) -- [Rule 3: Put effort into your question](/community/resources/slack-rules-of-the-road/#rule-3-put-effort-into-your-question) -- [Rule 4: Do not double-post](/community/resources/slack-rules-of-the-road/#rule-4-do-not-double-post) -- [Rule 5: Keep it in public channels](/community/resources/slack-rules-of-the-road/#rule-5-keep-it-in-public-channels) -- [Rule 6: Do not solicit members of our Slack](/community/resources/slack-rules-of-the-road/#rule-6-do-not-solicit-members-of-our-slack) -- [Rule 7: Do not demand attention with @channel and @here, or by tagging individuals](/community/resources/slack-rules-of-the-road/#rule-7-do-not-demand-attention-with-channel-and-here-or-by-tagging-individuals) -- [Rule 8: Use threads](/community/resources/slack-rules-of-the-road/#rule-8-use-threads) - -### Vendor guidelines - -If you are a vendor (i.e. you represent an organization that sells a product or service relevant to our community), then there are additional guidelines you should be aware of. - -Most importantly — do not solicit members of our community as lead generation. You can find the rest of these [here](/community/resources/vendor-guidelines). 
- -### Guideline violations — 3 strikes method - -The point of our guidelines is not to find opportunities to punish people, but we do need a fair way to deal with people who do harm to our community. Violations related to our anti-harassment policy (below) will be addressed immediately and are not subject to 3 strikes. - -1. First occurrence: We’ll give you a friendly, but public, reminder that the behavior is inappropriate according to our guidelines. -2. Second occurrence: We’ll send you a private message with a warning that any additional violations will result in removal from the community. -3. Third occurrence: Depending on the violation, we might need to delete or ban your account. - -Notes: - -- Obvious spammers are banned on first occurrence. -- Participation in the dbt Community is a privilege — we reserve the right to remove people from the community. -- Violations are forgiven after 6 months of good behavior, and we won’t hold a grudge. -- People who are committing minor formatting / style infractions will get some education, rather than hammering them in the 3 strikes process. -- Contact conduct@getdbt.com to report abuse or appeal violations. In the case of appeals, we know that mistakes happen, and we’ll work with you to come up with a fair solution if there has been a misunderstanding. +- Try solving a problem yourself before asking for help, e.g. rather than asking "what happens when I do X", experiment and observe the results! +- When asking questions, explain the "why" behind your decisions, e.g. "I’m trying to solve X problem, by writing Y code. I’m getting Z problem" +- When helping someone else, explain why you chose that solution, or if no solution exists, elaborate on the reason for that, e.g. "That’s not possible in dbt today — but here’s a workaround / check out this GitHub issue for a relevant discussion" ## Anti-harassment policy -Further to our guidelines for participating in the community in a positive manner, we are also dedicated to providing a harassment-free experience for everyone. We do not tolerate harassment of participants in any form. +We are dedicated to providing a harassment-free experience for everyone. We do not tolerate harassment of participants in any form. Harassment includes: @@ -131,7 +69,7 @@ Harassment includes: - Unwelcome comments regarding a person’s lifestyle choices and practices, including those related to food, health, parenting, drugs, and employment. - Deliberate misgendering or use of ‘dead’ or rejected names. - Gratuitous or off-topic sexual images or behaviour in spaces where they’re not appropriate. -- Physical contact and simulated physical contact (eg, textual descriptions like “*hug*” or “*backrub*”) without consent or after a request to stop. +- Physical contact and simulated physical contact (eg, textual descriptions like "*hug*" or "*backrub*") without consent or after a request to stop. - Threats of violence. - Incitement of violence towards any individual, including encouraging a person to commit suicide or to engage in self-harm. - Deliberate intimidation. @@ -141,19 +79,21 @@ Harassment includes: - Unwelcome sexual attention. - Pattern of inappropriate social contact, such as requesting/assuming inappropriate levels of intimacy with others - Continued one-on-one communication after requests to cease. -- Deliberate “outing” of any aspect of a person’s identity without their consent except as necessary to protect vulnerable people from intentional abuse. 
+- Deliberate "outing" of any aspect of a person’s identity without their consent except as necessary to protect vulnerable people from intentional abuse. - Publication of non-harassing private communication. +Be mindful that others may not want their image or name on social media. Ask permission prior to posting about another person at in-person events. + The dbt Community prioritizes marginalized people’s safety over privileged people’s comfort. The dbt Community Admin team reserves the right not to act on complaints regarding: - ‘Reverse’ -isms, including ‘reverse racism,’ ‘reverse sexism,’ and ‘cisphobia’ -- Reasonable communication of boundaries, such as “leave me alone,” “go away,” or “I’m not discussing this with you.” +- Reasonable communication of boundaries, such as "leave me alone," "go away," or "I’m not discussing this with you." - Communicating in a ‘tone’ you don’t find congenial - Criticizing racist, sexist, cissexist, or otherwise oppressive behavior or assumptions ### Reporting harassment -If you are being harassed by a member of the dbt Community, notice that someone else is being harassed, or have any other concerns, please contact us at [community@dbtlabs.com](mailto:community@dbtlabs.com). +If you are being harassed by a member of the dbt Community, notice that someone else is being harassed, or have any other concerns, please contact us at [community@dbtlabs.com](mailto:community@dbtlabs.com) or use the workflows in [#moderation-and-administration](https://getdbt.slack.com/archives/C02JJ8N822H) on Slack. We will respect confidentiality requests for the purpose of protecting victims of abuse. At our discretion, we may publicly name a person about whom we’ve received harassment complaints, or privately warn third parties about them, if we believe that doing so will increase the safety of dbt community members or the general public. We will not name harassment victims without their affirmative consent. diff --git a/website/docs/community/resources/community-rules-of-the-road.md b/website/docs/community/resources/community-rules-of-the-road.md new file mode 100644 index 00000000000..12711b64c06 --- /dev/null +++ b/website/docs/community/resources/community-rules-of-the-road.md @@ -0,0 +1,70 @@ +--- +title: "dbt Community Rules of the Road" +id: "community-rules-of-the-road" +description: "This community is filled with smart, kind, and helpful people who share our commitment to elevating the analytics profession. These rules help everyone understand how to best participate." +--- + +As of June 2023, the dbt Community includes over 50,000 data professionals and is still growing. People genuinely love this community. It's filled with smart, kind, and helpful people who share our commitment to elevating the analytics profession. + +We are committed to maintaining the spirit of this community, and have written these rules alongside its members to help everyone understand how to best participate. We appreciate your support in continuing to build a community we're all proud of. + +## Expectations for all members +### Rule 1: Be respectful +We want everyone in this community to have a fulfilling and positive experience. Therefore, this first rule is serious and straightforward; we simply will not tolerate disrespectful behavior of any kind. + +Everyone interacting on a dbt platform – including Slack, the forum, codebase, issue trackers, and mailing lists – is expected to follow the [Community Code of Conduct](/community/resources/code-of-conduct). 
If you are unable to abide by the code of conduct set forth here, we encourage you not to participate in the community. + +### Rule 2: Keep it in public spaces +Unless you have someone's express permission to contact them directly, do not directly message other community members, whether on a dbt Community platform or other spaces like LinkedIn. + +We highly value the time community members put into helping each other, and we have precisely zero tolerance for people who abuse their access to experienced professionals. If you are being directly messaged with requests for assistance without your consent, let us know in the [#moderation-and-administration](https://getdbt.slack.com/archives/C02JJ8N822H) Slack channel. We will remove that person from the community. Your time and attention is valuable. + +### Rule 3: Follow messaging etiquette +In short: put effort into your question, use threads, post in the right channel, and do not seek extra attention by tagging individuals or double-posting. For more information, see our [guide on getting help](/community/resources/getting-help). + +### Rule 4: Do not solicit community members +This community is built for data practitioners to discuss the work that they do, the ideas that they have, and the things that they are learning. It is decidedly not intended to be lead generation for vendors or recruiters. + +Vendors and recruiters are subject to additional rules to ensure this space remains welcoming to everyone. These requirements are detailed below and are enforced vigorously. + +## Vendor expectations + +As a vendor/dbt partner, you are also a member of this community, and we encourage you to participate fully in the space. We have seen folks grow fantastic user relationships for their products when they come in with the mindset to share rather than pushing a pitch. At the same time, active community members have a finely honed sense of when they are being reduced to an audience or a resource to be monetized, and their response is reliably negative. + +:::info Who is a vendor? +Vendors are generally individuals belonging to companies that are creating products or services primarily targeted at data professionals, but this title also includes recruiters, investors, open source maintainers (with or without a paid offering), consultants and freelancers. If in doubt, err on the side of caution. +::: + +### Rule 1: Identify yourself +Include your company in your display name, e.g. "Alice (DataCo)". When joining a discussion about your product (after the waiting period below), be sure to note your business interests. + +### Rule 2: Let others speak first +If a community member asks a question about your product directly, or mentions that they have a problem that your product could help with, wait 1 business day before responding to allow other members to share their experiences and recommendations. (This doesn't apply to unambiguously support-style questions from existing users, or in your `#tools-` channel if you have one). + +### Rule 3: Keep promotional content to specified spaces +As a space for professional practice, the dbt Community is primarily a non-commercial space. 
However, as a service to community members who want to be able to keep up to date with the data industry, there are several areas available on the Community Slack for vendors to share promotional material: +- [#vendor-content](https://getdbt.slack.com/archives/C03B0Q4EBL3) +- [#events](https://getdbt.slack.com/archives/C80RCAZ5E) +- #tools-* (post in [#moderation-and-administration](https://getdbt.slack.com/archives/C02JJ8N822H) to request a channel for your tool/product) + +Recruiters may also post in [#jobs](https://getdbt.slack.com/archives/C7A7BARGT)/[#jobs-eu](https://getdbt.slack.com/archives/C04JMHHK6CD) but may not solicit applications in DMs. + +The definition of "vendor content" can be blurry at the edges, and we defer to members' instincts in these scenarios. As a rule, if something is hosted on a site controlled by that company or its employees (including platforms like Substack and Medium), or contains a CTA such as signing up for a mailing list or trial account, it will likely be considered promotional. + +### One more tip: Be yourself +Speak in your own voice, and join in any or all of the conversations that interest you. Share your expertise as a data professional. Make a meme if you're so inclined. Get in a (friendly) debate. You are not limited to only your company's products and services, and making yourself known as a familiar face outside of commercial contexts is one of the most effective ways of building trust with the community. Put another way, [create more value than you capture](/community/resources/code-of-conduct#create-more-value-than-you-capture). + +Because unaffiliated community members are able to share links in any channel, the most effective way to have your work reach a wider audience is to create things that are genuinely useful to the community. + + +## Handling violations + +The point of these rules is not to find opportunities to punish people, but to ensure the longevity of the community. Participation in this community is a privilege, and we reserve the right to remove people from it. + +To report an issue or appeal a judgement, email [community@dbtlabs.com](mailto:community@dbtlabs.com) or use the workflows in [#moderation-and-administration](https://getdbt.slack.com/archives/C02JJ8N822H) on Slack. + +Violations related to our anti-harassment policy will result in immediate removal. Other issues are handled in proportion to their impact, and may include: + +- a friendly, but public, reminder that the behavior is inappropriate according to our guidelines. +- a private message with a warning that any additional violations will result in removal from the community. +- temporary or permanent suspension of your account. diff --git a/website/docs/guides/legacy/getting-help.md b/website/docs/community/resources/getting-help.md similarity index 77% rename from website/docs/guides/legacy/getting-help.md rename to website/docs/community/resources/getting-help.md index 67421c71aae..5f423683014 100644 --- a/website/docs/guides/legacy/getting-help.md +++ b/website/docs/community/resources/getting-help.md @@ -12,15 +12,13 @@ The docs site you're on is highly searchable, make sure to explore for the answe We have a handy guide on [debugging errors](/guides/best-practices/debugging-errors) to help out! This guide also helps explain why errors occur, and which docs you might need to search for help. 
#### Search for answers using your favorite search engine -We're committed to making more errors searchable, so it's worth checking if there's a solution already out there! Further, some errors related to installing dbt, the SQL in your models, or getting YAML right, are errors that are not-specific to dbt, so there may be other resources to cehck. +We're committed to making more errors searchable, so it's worth checking if there's a solution already out there! Further, some errors related to installing dbt, the SQL in your models, or getting YAML right, are errors that are not-specific to dbt, so there may be other resources to check. #### Experiment! If the question you have is "What happens when I do `X`", try doing `X` and see what happens! Assuming you have a solid dev environment set up, making mistakes in development won't affect your end users ### 2. Take a few minutes to formulate your question well Explaining the problems you are facing clearly will help others help you. - #### Include relevant details in your question Include exactly what's going wrong! When asking your question, you should: @@ -37,19 +35,21 @@ In general, people are much more willing to help when they know you've already g #### Share the context of the problem you're trying to solve Sometimes you might hit a boundary of dbt because you're trying to use it in a way that doesn't align with the opinions we've built into dbt. By sharing the context of the problem you're trying to solve, we might be able to share insight into whether there's an alternative way to think about it. +#### Post a single message and use threads +The dbt Slack's culture revolves around threads. When posting a message, try drafting it to yourself first to make sure you have included all the context. Include big code blocks in a thread to avoid overwhelming the channel. + +#### Don't tag individuals to demand help +If someone feels inclined to answer your question, they will do so. We are a community of volunteers, and we're generally pretty responsive and helpful! If nobody has replied to your question, consider if you've asked a question that helps us understand your problem. If you require in-depth, ongoing assistance, we have a wonderful group of experienced dbt consultants in our ecosystem. You can find a full list [below](#receiving-dedicated-support). + + ### 3. Choose the right medium for your question We use a number of different mediums to share information - If your question is roughly "I've hit this error and am stuck", please ask it on [the dbt Community Forum](https://discourse.getdbt.com). - If you think you've found a bug, please report it on the relevant GitHub repo (e.g. [dbt repo](https://github.com/dbt-labs/dbt), [dbt-utils repo](https://github.com/dbt-labs/dbt-utils)) -- If you are looking for an opinionated answer (e.g. "What's the best approach to X?", "Why is Y done this way?"), then, feel free to join our [Slack community](https://community.getdbt.com/) and ask it in the correct channel: - * **#advice-dbt-for-beginners:** A great channel if you're getting started with dbt and want to understand how it works. - * **#advice-dbt-for-power-users:** If you’re hitting an error in dbt that you don’t understand, let us know here. - * **#advice-data-modeling:** This channel is most useful when wanting to ask questions about data model design, SQL patterns, and testing. - * **#dbt-suggestions:** Got an idea for dbt? This is the place! 
- * Other channels: We're adding new channels all the time — please take a moment to browse the channels to see if there is a better fit
+- If you are looking for a more wide-ranging conversation (e.g. "What's the best approach to X?", "Why is Y done this way?"), join our [Slack community](https://getdbt.com/community). Channels are consistently named with prefixes to aid discoverability.

## Receiving dedicated support
-If you need dedicated support to build your dbt project, consider reaching out regarding [professional services](https://www.getdbt.com/contact/), or engaging one of our [consulting partners](https://www.getdbt.com/ecosystem/).
+If you need dedicated support to build your dbt project, consider reaching out regarding [professional services](https://www.getdbt.com/contact/), or engaging one of our [consulting partners](https://partners.getdbt.com/english/directory/).

## dbt Training
If you want to receive dbt training, check out our [dbt Learn](https://learn.getdbt.com/) program.
@@ -60,14 +60,4 @@ If you want to receive dbt training, check out our [dbt Learn](https://learn.get
- Billing
- Bug reports related to the web interface

-As a rule of thumb, if you are using dbt Cloud, but your problem is related to code within your dbt project, then please follow the above process rather than reaching out to support.
-
-
+As a rule of thumb, if you are using dbt Cloud, but your problem is related to code within your dbt project, then please follow the above process rather than reaching out to support.
\ No newline at end of file
diff --git a/website/docs/community/resources/maintaining-a-channel.md b/website/docs/community/resources/maintaining-a-channel.md
index af1cd46bf84..289fa389e80 100644
--- a/website/docs/community/resources/maintaining-a-channel.md
+++ b/website/docs/community/resources/maintaining-a-channel.md
@@ -18,8 +18,8 @@ A maintainer can be a dbt Labs employee, but does not have to be.

## Initial instructions

-1. Review the [Rules of the Road](community/resources/slack-rules-of-the-road) and [Code of Conduct](community/resources/code-of-conduct) and please let the the folks who created the channel know that you read both documents and you agree to be mindful of them.
-2. If you are a vendor, review the [Vendor Guidelines](https://www.getdbt.com/community/vendor-guidelines).
+1. Review the [Rules of the Road](community/resources/community-rules-of-the-road) and [Code of Conduct](community/resources/code-of-conduct) and please let the folks who created the channel know that you read both documents and you agree to be mindful of them.
+2. If you are a vendor, review the [Vendor Expectations](community/resources/community-rules-of-the-road#vendor-expectations).
3. Add the Topic and Description to the channel. @Mention your name in the channel Description, identifying yourself as the maintainer. Ex: *Maintainer: First Last (pronouns).* If you are a vendor, make sure your Handle contains your affiliation.
4. Complete or update your Slack profile by making sure your Company (in the ‘What I do’ field), Pronouns, and Handle, if you’re a vendor, are up-to-date.
5. Post initial conversation topics once a few folks get in the channel to help folks get to know each other. Check out this [example introductory post](https://getdbt.slack.com/archives/C02FXAZRRDW/p1632407767005000).
@@ -32,7 +32,7 @@ A maintainer can be a dbt Labs employee, but does not have to be.
*Slack channel - If the channel is an industry channel, it’s helpful to monitor [#introductions](https://getdbt.slack.com/archives/CETJLH1V3) and invite people. Keep an eye out for folks who might benefit from being in the new channel if they mention they are working in the space, or are thinking about some of these problems. - Make sure folks follow the [Rules of the Road](https://docs.getdbt.com/docs/contributing/slack-rules-of-the-road). For example, if you notice someone is not following one, gently remind them of the rule in thread, and, ideally, provide an example of how they can rephrase their message or where they can redirect it. If you have a question about how to proceed, just post about it in #moderation-and-administration with a link to the thread or screenshot and someone will give you advice. - In tools channels, sharing customer stories and product updates is very okay in this channel because folks expect that when they join. However, please avoid any direct sales campaigns, pricing offers, etc. -- If you have any questions/doubts about the [Rules of the Road](/community/resources/slack-rules-of-the-road) or [Vendor Guidelines](/community/resources/vendor-guidelines), please post a question in #moderation-and-administration about what sort of things the community expects from interactions with vendors. +- If you have any questions/doubts about the [Rules of the Road and Vendor Expectations](/community/resources/community-rules-of-the-road), please post a question in #moderation-and-administration about what sort of things the community expects from interactions with vendors. - A reminder that we never DM anyone in Slack without their permission in public channel or some prior relationship. - A reminder that @ here/all/channel are disabled. - Use and encourage the use of threads 🧵 to keep conversations tidy! diff --git a/website/docs/community/resources/slack-rules-of-the-road.md b/website/docs/community/resources/slack-rules-of-the-road.md deleted file mode 100644 index 351507ecab6..00000000000 --- a/website/docs/community/resources/slack-rules-of-the-road.md +++ /dev/null @@ -1,67 +0,0 @@ ---- -title: "dbt Slack: Rules of the Road" -id: "slack-rules-of-the-road" ---- - -As of October 2022, the dbt Slack community includes 35,000+ data professionals and is growing month-over-month. People genuinely love this community. It’s filled with smart, kind, and helpful people who share our commitment to elevating the analytics profession. - -We are committed to maintaining the spirit of this community, and as such have written these rules to help new members understand how to best participate in our community. - -We appreciate your support in continuing to build a community we’re all proud of. - -## Rule 1: Be respectful -We want everyone to have a fulfilling and positive experience in dbt Slack and we are continuously grateful in your help ensuring that this is the case. - -The guidelines that follow are important, but transgressions around Slack etiquette are forgivable. This first rule, however, is serious -- we simply will not tolerate disrespectful behavior of any kind. - -Everyone interacting in dbt Slack, codebase, issue trackers, and mailing lists are expected to follow the [PyPA Code of Conduct](https://www.pypa.io/en/latest/code-of-conduct/). If you are unable to abide by the code of conduct set forth here, we encourage you not to participate in the community. 
- -## Rule 2: Use the right channel -It’s important that we make it possible for members of the community to opt-in to various types of conversations. Our different Slack channels specifically exist for this purpose. Our members do a wonderful job at making sure messages are posted in the most relevant channel, and you’ll frequently see people (respectfully!) reminding each other about where to post messages. Here's a guide to our channels: -- If you're new to dbt and unsure where something belongs, feel free to post in **#advice-dbt-for-beginners** - we'll be able to direct you to the right place -- **For job postings, use #jobs**. If you post a job description outside of #jobs, we will delete it and send you a link to this rule. -- For database-specific questions, use **#db-snowflake**, **#db-bigquery**, **#db-redshift**, or similar. -- For questions about data modeling or for SQL help, use **#modeling** -- For conversations unrelated to dbt or analytics, consider if dbt Slack is an appropriate medium for the conversation. If so, use **#memes-and-off-topic-chatter**. - -If you're hitting an error, you should post your question in [the Community Forum](https://discourse.getdbt.com) instead. - -## Rule 3: Put effort into your question -dbt Slack is a community of volunteers. These are kind, knowledgeable, helpful people who share their time and expertise for free. - -A thoughtful and well-researched post will garner far more responses than a low-effort one. See the guide on [getting help](/guides/legacy/getting-help) for more information about how to ask a good question. - -## Rule 4: Mark your questions as resolved -Were you in need of help, and received a helpful reply? Please mark your question as resolved by adding a ✅ reaction to your original post. Note that other community members may summon Slackbot to remind you to do this, by posting the words `resolved bot` as a reply to your message. - -## Rule 5: Do not double-post -Our best members are respectful of peoples’ time. We understand that even though a question feels urgent, dbt Slack is not a customer service platform, it is a community of volunteers. - -The majority of questions in dbt Slack get answered, though you may need to wait a bit. If you’re not getting a response, please do not post the same question to multiple channels (we’ll delete your messages and send you a link to this page). Instead, review your question and see if you can rewrite it better to make it easier for someone to answer quickly. - -## Rule 6: Keep it in public channels -Unless you have someone’s express permission to contact them directly, **do not directly message members of this community to solicit help, sell a product, or recruit for a role**. - -We highly value the time community members put into helping each other, and we have precisely zero tolerance for people who abuse their access to experienced professionals. If you are being directly messaged by members of the community asking for assistance without your consent, let us know. We will remove that person from the community. Your time and attention is valuable. - -## Rule 7: Do not solicit members of our Slack -This community is built for data practitioners to discuss the work that they do, the ideas that they have, and the things that they are learning. It is decidedly not intended to be lead generation for vendors or recruiters. - -**Do not pitch your products or services in dbt Slack**: this isn't the right place for that. 
Vendors can add enormous value to the community by being there to answer questions about their products when questions arise. - -Further, **do not use our Slack community for outbound recruitment for a role**. Recruiters should feel free to post opportunities in the #jobs channel, but should not directly contact members about an opportunity. - -We appreciate when vendors and recruiters identify themselves clearly in their Slack username. If you see someone pitching products and services in dbt Slack, or contact you directly about an open role, let us know. We’ll delete the message and remind that person about this rule. - -## Rule 8: Do not demand attention with @channel and @here, or by tagging individuals -The @channel and @here keywords in Slack are disabled for everyone except admins. If you make a post containing @channel or @here, nothing will happen. Still, we'll send you a link to this rule to help you better understand how dbt Slack operates. - -Do not tag individuals for in-depth assistance in your questions. If someone feels inclined to answer your question, they will do so. We are a community of volunteers, and we're generally pretty responsive and helpful! If nobody has replied to your question, consider if you've asked a question that helps us understand your problem. If you require in-depth, ongoing assistance, we have a wonderful group of experienced dbt consultants in our ecosystem. You can find a full list [here](https://www.getdbt.com/ecosystem/). - -## Rule 9: Use threads -The best way to keep conversations coherent in Slack is to use threads. The dbt Slack community uses threads heavily and if you break this convention, a member of the community will let you know. - -Here are some guidelines on how to use threads effectively: -* Type your question out as one message rather than separate messages (Pro Tip: Write a first draft of your question as a direct message to yourself) -* Leverage Slack's edit functionality if you realize you forgot to add something to your question rather than adding new messages. -* If you see a conversation taking place across discrete messages, send over a link to this rule. diff --git a/website/docs/community/resources/vendor-guidelines.md b/website/docs/community/resources/vendor-guidelines.md deleted file mode 100644 index 1b6bb6c9511..00000000000 --- a/website/docs/community/resources/vendor-guidelines.md +++ /dev/null @@ -1,66 +0,0 @@ ---- -title: "Vendor guidelines" -id: "vendor-guidelines" ---- - -# Engaging in the dbt Community as a Vendor - -A key aspect that makes dbt stand out from other tools is the dbt Community. -This community was built to drive our mission statement of empowering analysts. -This includes advancing the field of analytics engineering practices. -We are creating spaces where folks can learn from each other, share best practices, -discover what it means to use software engineering workflows, and so on. - -The dbt community extends far beyond what happens in dbt Slack. There are regular meetups, -blog posts, and even a conference! Our North Star is to extend the knowledge loop; -we are a community, not an audience. - -Our community members expect a thoughtful space full of kind, curious, and bright individuals. -They contribute to the knowledge loop with their own expertise and benefit from the relevant knowledge brought to the table by other community experts (including vendors). -Along those lines, **we value diversity and inclusion**. 
-We seek to amplify underrepresented communities and have no tolerance for anyone who is disrespectful in this space. - -As a vendor/dbt partner, you are also a member of this community, one that we want -and deeply encourage to share your expertise in tooling, analytics, etc. -Our community members are truly open to discovering and discussing innovative solutions and tools. -We have seen folks grow fantastic user relationships for their products when they come in with the mindset to share rather than pushing a pitch. - -To guide you on your community journey, we have created this document for you to read and share with your coworkers. -By following these guidelines, you will help us maintain this community as well as gain -full access to all the benefits that this community can provide. - - -## Dos & Don'ts for dbt Slack - -### Dos -- **Read the Rules of The Road.** These rules are the best ways to participate in our community. -- **Fill your profile!** We want to get to know you so do upload a picture of yourself and add your company in your name (e.g. "Alice (DataCo)"). Be sure to include your company in your profile so folks know that you work for a vendor -- **Introduce Yourself in #introductions.** Tell us about yourself! -- **Be helpful.** We encourage folks to answer questions and offer their product expertise to conversations already in motion. You can even invite folks to chat in DMs if anyone wants more info about your product. But be sure you identify yourself and your business interests in thread. -- **Be yourself when posting, speak in your own voice.** -- **Participate in all the conversations that interest you.** Make a meme if you’re so inclined. Get in a (friendly) debate. You are not limited to only your company's products and services. -- **Post with intention.** If you have a link or product update that is appropriate to share, give context. - -### Don'ts -- **Do not do 1:1 outbound.** Only initiate DMs if you’ve received active confirmation in a public channel that a DM would be welcome. -- **Do not be anonymous.** Folks who identify themselves clearly are able to build empathy and form genuine relationships much easier. This is what we want for the community. -- Spam channels with Marketing material. -- **Do not post without context.** Posts that include context outside of just the pitch are the ones that add value to our community. - - -## Summary - -This community is centered around feeding into the knowledge loop. It’s a place intended for building genuine, helpful connections. We found that most vendors find success in our space by leading with this intention. - -Here are some ways you can contribute to the community: - -- contribute to the dbt core repository -- write dbt packages -- write other public content (blog posts, case studies, etc.) -- respond to questions on slack / discourse -- host events -- promote / respond to content written by community members -- Partner up with community members on blog posts/code/etc. - -For more information on the thought behind our community, especially if you are interested in creating your own, feel free to -reach out to our community managers. 
diff --git a/website/docs/community/spotlight/bruno-de-lima.md b/website/docs/community/spotlight/bruno-de-lima.md index 1aa8e1ba433..7f40f66859c 100644 --- a/website/docs/community/spotlight/bruno-de-lima.md +++ b/website/docs/community/spotlight/bruno-de-lima.md @@ -10,8 +10,8 @@ description: | image: /img/community/spotlight/bruno-de-lima.jpg pronouns: he/him location: Florianópolis, Brazil -jobTitle: Analytics Engineer -companyName: Indicium +jobTitle: Data Engineer +companyName: phData organization: "" socialLinks: - name: LinkedIn diff --git a/website/docs/docs/build/about-metricflow.md b/website/docs/docs/build/about-metricflow.md new file mode 100644 index 00000000000..f35bed24044 --- /dev/null +++ b/website/docs/docs/build/about-metricflow.md @@ -0,0 +1,288 @@ +--- +title: "About MetricFlow" +id: about-metricflow +description: "Learn more about MetricFlow and its key concepts" +sidebar_label: About MetricFlow +tags: [Metrics, Semantic Layer] +--- + +This guide introduces MetricFlow's fundamental ideas for new users. MetricFlow, which powers the dbt Semantic Layer, helps you define and manage the logic for your company's metrics. It's an opinionated set of abstractions and helps data consumers retrieve metric datasets from a data platform quickly and efficiently. + +:::info + +MetricFlow is a new way to define metrics in dbt and one of the key components of the [dbt Semantic Layer](/docs/use-dbt-semantic-layer/dbt-semantic-layer). It handles SQL query construction and defines the specification for dbt semantic models and metrics. + +To fully experience the dbt Semantic Layer, including the ability to query dbt metrics via external integrations, you'll need a [dbt Cloud Team or Enterprise account](https://www.getdbt.com/pricing/). + +::: + +There are a few key principles: + +- **Flexible, but complete** — Ability to create any metric on any data model by defining logic in flexible abstractions. +- **Don't Repeat Yourself (DRY)** — Avoid repetition by allowing metric definitions to be enabled whenever possible. +- **Simple with progressive complexity** — Make MetricFlow approachable by relying on known concepts and structures in data modeling. +- **Performant and efficient** — Allow for performance optimizations in centralized data engineering while still enabling distributed definition and ownership of logic. + +## MetricFlow + +- MetricFlow is a SQL query generation engine that helps you create metrics by constructing appropriate queries for different granularities and dimensions that are useful for various business applications. + +- It uses YAML files to define a semantic graph, which maps language to data. This graph consists of [semantic models](/docs/build/semantic-models), which serve as data entry points, and [metrics](/docs/build/metrics-overview), which are functions used to create new quantitative indicators. + +- MetricFlow is a [BSL package](https://github.com/dbt-labs/metricflow) (code is source available) and available on dbt versions 1.6 and higher. Data practitioners and enthusiasts are highly encouraged to contribute. + +- MetricFlow, as a part of the dbt Semantic Layer, allows organizations to define company metrics logic through YAML abstractions, as described in the following sections. + +- You can install MetricFlow via PyPI as an extension of your [dbt adapter](/docs/supported-data-platforms) in the CLI. To install the adapter, run `pip install "dbt-metricflow[your_adapter_name]"` and add the adapter name at the end of the command. 
For example, for a Snowflake adapter run `pip install "dbt-metricflow[snowflake]"`. + +### Semantic graph + +We're introducing a new concept: a "semantic graph". It's the relationship between semantic models and YAML configurations that creates a data landscape for building metrics. You can think of it like a map, where tables are like locations, and the connections between them (edges) are like roads. Although it's under the hood, the semantic graph is a subset of the , and you can see the semantic models as nodes on the DAG. + +The semantic graph helps us decide which information is available to use for consumption and which is not. The connections between tables in the semantic graph are more about relationships between the information. This is different from the DAG, where the connections show dependencies between tasks. + +When MetricFlow generates a metric, it uses its SQL engine to figure out the best path between tables using the framework defined in YAML files for semantic models and metrics. When these models and metrics are correctly defined, they can be used downstream with dbt Semantic Layer's integrations. + +### Semantic models + +Semantic models are the starting points of data and correspond to models in your dbt project. You can create multiple semantic models from each model. Semantic models have metadata, like a data table, that define important information such as the table name and primary keys for the graph to be navigated correctly. + +For a semantic model, there are three main pieces of metadata: + +* [Entities](/docs/build/entities) — The join keys of your semantic model (think of these as the traversal paths, or edges between semantic models). +* [Dimensions](/docs/build/dimensions) — These are the ways you want to group or slice/dice your metrics. +* [Measures](/docs/build/measures) — The aggregation functions that give you a numeric result and can be used to create your metrics. + + +### Metrics + +Metrics, which is a key concept, are functions that combine measures, constraints, or other mathematical functions to define new quantitative indicators. MetricFlow uses measures and various aggregation types, such as average, sum, and count distinct, to create metrics. Dimensions add context to metrics and without them, a metric is simply a number for all time. You can define metrics in the same YAML files as your semantic models, or create a new file. + +MetricFlow supports different metric types: + +- [Derived](/docs/build/derived) — An expression of other metrics, which allows you to do calculations on top of metrics. +- [Ratio](/docs/build/ratio) — Create a ratio out of two measures, like revenue per customer. +- [Simple](/docs/build/simple) — Metrics that refer directly to one measure. +## Use case + +In the upcoming sections, we'll show how data practitioners currently calculate metrics and compare it to how MetricFlow makes defining metrics easier and more flexible. + +The following example data schema image shows a number of different types of data tables: + +- `transactions` is a production data platform export that has been cleaned up and organized for analytical consumption +- `visits` is a raw event log +- `stores` is a cleaned-up and fully normalized dimensional table from a daily production database export +- `products` is a dimensional table that came from an external source such as a wholesale vendor of the goods this store sells. 
+- `customers` is a partially denormalized table in this case with a column derived from the transactions table through some upstream process + +![MetricFlow-SchemaExample](/img/docs/building-a-dbt-project/MetricFlow-SchemaExample.jpeg) + +To make this more concrete, consider the metric `revenue`, which is defined using the SQL expression: + +`select sum(price * quantity) as revenue from transactions` + +This expression calculates the total revenue by multiplying the price and quantity for each transaction and then adding up all the results. In business settings, the metric `revenue` is often calculated according to different categories, such as: +- Time, for example `date_trunc(created_at, 'day')` +- Product, using `product_category` from the `product` table. + +### Calculate metrics + +Next, we'll compare how data practitioners currently calculate metrics with multiple queries versus how MetricFlow simplifies and streamlines the process. + + + + +The following example displays how data practitioners typically would calculate the revenue metric aggregated. It's also likely that analysts are asked for more details on a metric, like how much revenue came from bulk purchases. + +Using the following query creates a situation where multiple analysts working on the same data, each using their own query method — this can lead to confusion, inconsistencies, and a headache for data management. + +```sql +select + date_trunc(transactions.created_at, 'day') as day + , products.category as product_category + , sum(transactions.price * transactions.quantity) as revenue +from + transactions +left join + products +on + transactions.product_id = products.product_id +group by 1, 2 +``` + + + + +> Introducing MetricFlow, a key component of the dbt Semantic Layer 🤩 - simplifying data collaboration and governance. + +In the following three example tabs, use MetricFlow to define a semantic model that uses revenue as a metric and a sample schema to create consistent and accurate results — eliminating confusion, code duplication, and streamlining your workflow. + + + + +In this example, a measure named revenue is defined based on two columns in the `schema.transactions` table. The time dimension `ds` provides daily granularity and can be aggregated to weekly or monthly time periods. Additionally, a categorical dimension called `is_bulk_transaction` is specified using a case statement to capture bulk purchases. + + +```yaml +semantic_models: + - name: transactions + description: "A record for every transaction that takes place. Carts are considered multiple transactions for each SKU." 
+ owners: support@getdbt.com + model: (ref('transactions')) + default: + agg_time_dimension: metric_time + + # --- entities --- + entities: + - name: transaction_id + type: primary + - name: customer_id + type: foreign + - name: store_id + type: foreign + - name: product_id + type: foreign + + # --- measures --- + measures: + - name: revenue + description: + expr: price * quantity + agg: sum + - name: quantity + description: Quantity of products sold + expr: quantity + agg: sum + - name: active_customers + description: A count of distinct customers completing transactions + expr: customer_id + agg: count_distinct + + # --- dimensions --- + dimensions: + - name: metric_time + type: time + expr: date_trunc('day', ts) + type_params: + time_granularity: day + - name: is_bulk_transaction + type: categorical + expr: case when quantity > 10 then true else false end + ``` + + + + +Similarly, you could then add a `products` semantic model on top of the `products` model to incorporate even more dimensions to slice and dice your revenue metric. + +Notice the identifiers present in the semantic models `products` and `transactions`. MetricFlow does the heavy-lifting for you by traversing the appropriate join keys to identify the available dimensions to slice and dice your `revenue` metric. + +```yaml +semantic_models: + - name: products + description: A record for every product available through our retail stores. + owners: support@getdbt.com + model: ref('products') + + # --- identifiers --- + entities: + - name: product_id + type: primary + + # --- dimensions --- + dimensions: + - name: category + type: categorical + - name: brand + type: categorical + - name: is_perishable + type: categorical + expr: | + category in ("vegetables", "fruits", "dairy", "deli") +``` + + + +Imagine an even more difficult metric is needed, like the amount of money earned each day by selling perishable goods per active customer. Without MetricFlow the data practitioner's original SQL might look like this: + +```sql +select + date_trunc(transactions.created_at, 'day') as day + , products.category as product_category + , sum(transactions.price * transactions.quantity) as revenue + , count(distinct customer_id) as active_customers + , sum(transactions.price * transactions.quantity)/count(distinct customer_id) as perishable_revenues_per_active_customer +from + transactions +left join + products +on + transactions.product_id = products.product_id +where + products.category in ("vegetables", "fruits", "dairy", "deli") +group by 1, 2 +``` + +MetricFlow simplifies the SQL process via metric YAML configurations as seen below. You can also commit them to your git repository to ensure everyone on the data and business teams can see and approve them as the true and only source of information. + +```yaml +metrics: + - name: perishables_revenue_per_active_customer + description: Revenue from perishable goods (vegetables, fruits, dairy, deli) for each active store. + type: ratio + type_params: + numerator: revenue + denominator: active_customers + filter: | + {{dimension('perishable_goods')}} in ('vegetables',' fruits', 'dairy', 'deli') +``` + + + + + + +## FAQs + +
+ Do my datasets need to be normalized? +
+
Not at all! While a cleaned and well-modeled data set can be extraordinarily powerful and is the ideal input, you can use any dataset from raw to fully denormalized datasets.

It's recommended that you apply data consistency measures, such as filtering out bad data, normalizing common objects, and modeling keys and tables, in upstream applications. The Semantic Layer is more efficient at denormalizing data than at normalizing it.

If you have not invested in data consistency, that is okay. The Semantic Layer can take SQL queries or expressions to define consistent datasets.
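+
+For example, a semantic model can apply light cleanup through the `expr` parameter. The following is a minimal sketch, and the model, column, and measure names are purely illustrative:
+
+```yaml
+semantic_models:
+  - name: web_events                  # hypothetical model built on a raw event table
+    model: ref('raw_web_events')
+    entities:
+      - name: event_id
+        type: primary
+    dimensions:
+      - name: event_type
+        type: categorical
+        # normalize inconsistent casing and missing values at query time
+        expr: coalesce(lower(event_type), 'unknown')
+    measures:
+      - name: valid_events
+        agg: sum
+        # exclude obviously bad rows instead of cleaning them upstream
+        expr: case when event_id is not null then 1 else 0 end
+```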
+
+
+
+ Why is normalized data the ideal input? +
+
MetricFlow is built to do denormalization efficiently. There are better tools to take raw datasets and accomplish the various tasks required to build data consistency and organized data models. On the other end, by putting in denormalized data you are potentially creating redundancy which is technically challenging to manage, and you are reducing the potential granularity that MetricFlow can use to aggregate metrics.
+
+
+
+ Why not just make metrics the same as measures? +
+
One principle of MetricFlow is to reduce the duplication of logic sometimes referred to as Don't Repeat Yourself(DRY).

Many metrics are constructed from reused measures, and in some cases from measures that live in different semantic models. This allows metrics to be built breadth-first (metrics that can stand alone) instead of depth-first (where you have multiple metrics acting as functions of each other).

Additionally, not all metrics are constructed from measures. For example, a conversion metric is likely defined by the presence or absence of an event record after some other event record.
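+
+As a rough sketch of what this looks like in practice (the measure and metric names below are hypothetical), a single `revenue` measure defined once in a semantic model can back several metrics without re-declaring its aggregation logic:
+
+```yaml
+metrics:
+  # a ratio metric that reuses two existing measures
+  - name: revenue_per_active_customer
+    type: ratio
+    type_params:
+      numerator: revenue
+      denominator: active_customers
+  # a cumulative metric that reuses the same revenue measure
+  - name: cumulative_revenue
+    type: cumulative
+    type_params:
+      measures:
+        - revenue
+```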
+
+
+
+ How does the Semantic Layer handle joins? +
+
MetricFlow builds joins based on the types of keys and parameters that are passed to entities. To better understand how joins are constructed see our documentations on join types.

Rather than capturing arbitrary join logic, MetricFlow captures the type of each identifier and then helps the user navigate to the appropriate joins. This allows us to avoid constructing fan-out and chasm joins, as well as to generate legible SQL.
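+
+As an illustrative sketch (the model and column names are made up), the entity types below are all MetricFlow needs in order to decide that `orders` can safely left join `customers` without fanning out:
+
+```yaml
+semantic_models:
+  - name: orders
+    entities:
+      - name: customer
+        type: foreign      # many order rows can share one customer_id
+        expr: customer_id
+  - name: customers
+    entities:
+      - name: customer
+        type: primary      # exactly one row per customer_id
+        expr: customer_id
+```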
+
+
+
+ Are entities and join keys the same thing? +
+
If it helps you to think of entities as join keys, that is very reasonable. Entities in MetricFlow have applications beyond joining two tables, such as acting as a dimension.
+
+
+
+ Can a table without a primary or unique entity have dimensions?
+
Yes, but because a dimension is considered an attribute of the primary or unique ent of the table, they are only usable by the metrics that are defined in that table. They cannot be joined to metrics from other tables. This is common in event logs.
+
+
+ + +## Related docs +- [Joins](/docs/build/join-logic) +- [Validations](/docs/build/validation) + diff --git a/website/docs/docs/build/build-metrics-intro.md b/website/docs/docs/build/build-metrics-intro.md new file mode 100644 index 00000000000..e98ee013d0b --- /dev/null +++ b/website/docs/docs/build/build-metrics-intro.md @@ -0,0 +1,62 @@ +--- +title: "Build your metrics" +id: build-metrics-intro +description: "Learn about MetricFlow and build your metrics with semantic models" +sidebar_label: Build your metrics +tags: [Metrics, Semantic Layer, Governance] +hide_table_of_contents: true +--- + +Use MetricFlow in dbt to centrally define your metrics. MetricFlow is a key component of the [dbt Semantic Layer](/docs/use-dbt-semantic-layer/dbt-semantic-layer) and is responsible for SQL query construction and defining specifications for dbt semantic models and metrics. + +Use familiar constructs like semantic models and metrics to avoid duplicative coding, optimize your development workflow, ensure data governance for company metrics, and guarantee consistency for data consumers. + +:::info +MetricFlow is currently available on dbt Core v1.6 beta for [command line (CLI)](/docs/core/about-the-cli) users, with support for dbt Cloud and integrations coming soon. MetricFlow, a BSL package (code is source available), is a new way to define metrics in dbt and will replace the dbt_metrics package. + +To fully experience the dbt Semantic Layer, including the ability to query dbt metrics via external integrations, you'll need a [dbt Cloud Team or Enterprise account](https://www.getdbt.com/pricing/). +::: + +Before you start, keep the following considerations in mind: +- Use the CLI to define metrics in YAML and query them using the [new metric specifications](https://github.com/dbt-labs/dbt-core/discussions/7456). +- You must be on dbt Core v1.6 beta or higher to use MetricFlow. [Upgrade your dbt version](/docs/core/pip-install#change-dbt-core-versions) to get started. + * Note: Support for dbt Cloud and querying via external integrations coming soon. +- MetricFlow currently only supports Snowflake and Postgres. + * Note: Support for BigQuery, Databricks, and Redshift coming soon. +- dbt Labs is working with [integration partners](https://www.getdbt.com/product/semantic-layer-integrations) to develop updated integrations for the new Semantic Layer, powered by MetricFlow, in addition to introducing other consumption methods like Python and JDBC.

+ +
+ + + + + + + + + +

+ + +## Related docs + +- [The dbt Semantic Layer: what's next](https://www.getdbt.com/blog/dbt-semantic-layer-whats-next/) blog +- [Get started with MetricFlow](/docs/build/sl-getting-started) + + diff --git a/website/docs/docs/build/cumulative-metrics.md b/website/docs/docs/build/cumulative-metrics.md new file mode 100644 index 00000000000..77d23d32dce --- /dev/null +++ b/website/docs/docs/build/cumulative-metrics.md @@ -0,0 +1,201 @@ +--- +title: "Cumulative metrics" +id: cumulative +description: "Use Cumulative metrics to aggregate a measure over a given window." +sidebar_label: Cumulative +tags: [Metrics, Semantic Layer] +--- + +Cumulative metrics aggregate a measure over a given window. If no window is specified, the window is considered infinite and accumulates values over all time. + +```yaml +# Cumulative metrics aggregate a measure over a given window. The window is considered infinite if no window parameter is passed (accumulate the measure over all time) +metrics: +- name: wau_rolling_7 + owners: + - support@getdbt.com + type: cumulative + type_params: + measures: + - distinct_users + #Omitting window will accumulate the measure over all time + window: 7 days +``` + +### Window options + +This section details examples for when you specify and don't specify window options. + + + + + +If a window option is specified, the MetricFlow framework applies a sliding window to the underlying measure. + +Suppose the underlying measure `distinct_users` is configured as such to reflect a count of distinct users by user_id and user_status. + +```yaml +measures: + - name: distinct_users + description: The number of distinct users creating mql queries + expr: case when user_status in ('PENDING','ACTIVE') then user_id else null end + agg: count_distinct +``` + +We can write a cumulative metric `wau_rolling_7` as such: + +``` yaml +metrics: + name: wau_rolling_7 + # Define the measure and the window. + type: cumulative + type_params: + measures: + - distinct_users + # the default window is infinity - omitting window will accumulate the measure over all time + window: 7 days +``` + +From the sample yaml above, note the following: + +* `type`: Specify cumulative to indicate the type of metric. +* `type_params`: Specify the measure you want to aggregate as a cumulative metric. You have the option of specifying a `window`, or a `grain to date`. + +For example, in the `wau_rolling_7` cumulative metric, MetricFlow takes a sliding 7-day window of relevant users and applies a count distinct function. + +If you omit the `window`, the measure will accumulate over all time. Otherwise, you can choose from granularities like day, week, quarter, or month, and describe the window using phrases like "7 days" or "1 month." + + + + + +You can use cumulative metrics without a window specified to obtain a running total. Suppose you have a log table with columns like: + +Suppose you (a subscription-based company for the sake of this example) have an event-based log table with the following columns: + +* `date`: a date column +* `user_id`: (integer) an ID specified for each user that is responsible for the event +* `subscription_plan`: (integer) a column that indicates a particular subscription plan associated with the user. +* `subscription_revenue`: (integer) a column that indicates the value associated with the subscription plan. +* `event_type`: (integer) a column that populates with +1 to indicate an added subscription, or -1 to indicate a deleted subscription. 
+* `revenue`: (integer) a column that multiplies `event_type` and `subscription_revenue` to depict the amount of revenue added or lost for a specific date. + +Using cumulative metrics without specifying a window, you can calculate running totals for metrics like the count of active subscriptions and revenue at any point in time. The following configuration YAML displays creating such cumulative metrics to obtain current revenue or total number of active subscriptions as a cumulative sum: + +```yaml +measures: + - name: revenue + description: Total revenue + agg: sum + expr: revenue + - name: subscription_count + description: Count of active subscriptions + agg: sum + expr: event_type + +metrics: +- name: current_revenue + description: Current revenue + type: cumulative + type_params: + measures: + - revenue +- name: active_subscriptions + description: Count of active subscriptions + type: cumulative + type_params: + measures: + - subscription_count +``` + + + + + +### Grain to date + +You can choose to specify a grain to date in your cumulative metric configuration to accumulate a metric from the start of a grain (such as week, month, or year). When using a window, such as a month, MetricFlow will go back one full calendar month. However, grain to date will always start accumulating from the beginning of the grain, regardless of the latest date of data. + +For example, let's consider an underlying measure of `total_revenue.` + +```yaml +measures: + - name: total_revenue + description: Total revenue (summed) + agg: sum + expr: revenue +``` + +We can compare the difference between a 1-month window and a monthly grain to date. The cumulative metric in a window approach applies a sliding window of 1 month, whereas the grain to date by month resets at the beginning of each month. + +```yaml +metrics: + name: revenue_monthly_window #For this metric, we use a window of 1 month + description: Monthly revenue using a window of 1 month (think of this as a sliding window of 30 days) + type: cumulative + type_params: + measures: + - total_revenue + window: 1 month +``` + +```yaml +metrics: + name: revenue_monthly_grain_to_date #For this metric, we use a monthly grain to date + description: Monthly revenue using a grain to date of 1 month (think of this as a monthly resetting point) + type: cumulative + type_params: + measures: + - total_revenue + grain_to_date: month +``` + +### Implementation + +The current method connects the metric table to a timespine table using the primary time dimension as the join key. We use the accumulation window in the join to decide whether a record should be included on a particular day. 
The following SQL code produced from an example cumulative metric is provided for reference: + +``` sql +select + count(distinct distinct_users) as weekly_active_users + , metric_time +from ( + select + subq_3.distinct_users as distinct_users + , subq_3.metric_time as metric_time + from ( + select + subq_2.distinct_users as distinct_users + , subq_1.metric_time as metric_time + from ( + select + metric_time + from transform_prod_schema.mf_time_spine subq_1356 + where ( + metric_time >= cast('2000-01-01' as timestamp) + ) and ( + metric_time <= cast('2040-12-31' as timestamp) + ) + ) subq_1 + inner join ( + select + distinct_users as distinct_users + , date_trunc('day', ds) as metric_time + from demo_schema.transactions transactions_src_426 + where ( + (date_trunc('day', ds)) >= cast('1999-12-26' as timestamp) + ) AND ( + (date_trunc('day', ds)) <= cast('2040-12-31' as timestamp) + ) + ) subq_2 + on + ( + subq_2.metric_time <= subq_1.metric_time + ) and ( + subq_2.metric_time > dateadd(day, -7, subq_1.metric_time) + ) + ) subq_3 +) +group by + metric_time +limit 100 +``` diff --git a/website/docs/docs/build/custom-aliases.md b/website/docs/docs/build/custom-aliases.md index 9876f534f8f..589d64f8510 100644 --- a/website/docs/docs/build/custom-aliases.md +++ b/website/docs/docs/build/custom-aliases.md @@ -114,6 +114,14 @@ The default implementation of `generate_alias_name` simply uses the supplied `al + + +### Managing different behaviors across packages + +See docs on macro `dispatch`: ["Managing different global overrides across packages"](/reference/dbt-jinja-functions/dispatch) + + + ### Caveats #### Ambiguous database identifiers diff --git a/website/docs/docs/build/custom-databases.md b/website/docs/docs/build/custom-databases.md index 3df3d705837..300fd3147f1 100644 --- a/website/docs/docs/build/custom-databases.md +++ b/website/docs/docs/build/custom-databases.md @@ -87,6 +87,14 @@ The default implementation of `generate_database_name` simply uses the supplied + + +### Managing different behaviors across packages + +See docs on macro `dispatch`: ["Managing different global overrides across packages"](/reference/dbt-jinja-functions/dispatch) + + + ## Considerations ### BigQuery diff --git a/website/docs/docs/build/custom-schemas.md b/website/docs/docs/build/custom-schemas.md index 156d2f50368..b8dbb9a0846 100644 --- a/website/docs/docs/build/custom-schemas.md +++ b/website/docs/docs/build/custom-schemas.md @@ -178,7 +178,7 @@ The following context methods _are_ available in the `generate_schema_name` macr | Other macros in your project | Macro | ✅ | | Other macros in your packages | Macro | ✅ | -#### Which vars are available in generate_schema_name? +### Which vars are available in generate_schema_name? @@ -190,6 +190,14 @@ for more information on these changes. Globally-scoped variables and variables defined on the command line with [--vars](/docs/build/project-variables) are accessible in the `generate_schema_name` context. + + +### Managing different behaviors across packages + +See docs on macro `dispatch`: ["Managing different global overrides across packages"](/reference/dbt-jinja-functions/dispatch) + + + ## Managing environments In the `generate_schema_name` macro examples shown above, the `target.name` context variable is used to change the schema name that dbt generates for models. If the `generate_schema_name` macro in your project uses the `target.name` context variable, you must additionally ensure that your different dbt environments are configured appropriately. 
While you can use any naming scheme you'd like, we typically recommend: @@ -199,4 +207,4 @@ In the `generate_schema_name` macro examples shown above, the `target.name` cont If your schema names are being generated incorrectly, double check your target name in the relevant environment. -For more information, consult the [managing environments in dbt Core](/docs/collaborate/environments/dbt-core-environments) guide. +For more information, consult the [managing environments in dbt Core](/docs/core/dbt-core-environments) guide. diff --git a/website/docs/docs/build/derived-metrics.md b/website/docs/docs/build/derived-metrics.md new file mode 100644 index 00000000000..0ca14d1c6f2 --- /dev/null +++ b/website/docs/docs/build/derived-metrics.md @@ -0,0 +1,40 @@ +--- +title: "Derived metrics" +id: derived +description: "Derived metrics is defined as an expression of other metrics.." +sidebar_label: Derived +tags: [Metrics, Semantic Layer] +--- + +Derived metrics in MetricFlow refer to metrics that are created by defining an expression using other metrics. Derived metrics allow for calculations on top of metrics. For example, you can define a metric called "Net Sales Per User" by using other metrics in the calculation. + +```yaml +metrics: + - name: net_sales_per_user + type: derived + type_params: + expr: gross_sales - cogs / active_users + metrics: + - name: gross_sales # these are all metrics (can be a derived metric, meaning building a derived metric with derived metrics) + - name: cogs + - name: users + filter: | # Optional additional constraint + {{dimension('filter')}} is_active + alias: active_users # Optional alias to use in the expr +``` + +## Derived metric offset + +You may want to use an offset value of a metric in the definition of a derived metric. For example, if you define retention rate as (active customers at the end of the month/active customers at the beginning of the month)-1 you can model this using a derived metric with an offset. + +```yaml +metrics: +- name: user_retention + type: derived + type_params: + expr: active_customers/active_customers_t1m + metrics: + - name: active_customers # these are all metrics (can be a derived metric, meaning building a derived metric with derived metrics) + - name: active_customers + offset_window: 1 month + alias: active_customers_t1m diff --git a/website/docs/docs/build/dimensions.md b/website/docs/docs/build/dimensions.md new file mode 100644 index 00000000000..ec92f7595b2 --- /dev/null +++ b/website/docs/docs/build/dimensions.md @@ -0,0 +1,360 @@ +--- +title: Dimensions +id: dimensions +description: "Dimensions determine the level of aggregation for a metric, and are non-aggregatable expressions." +sidebar_label: "Dimensions" +tags: [Metrics, Semantic Layer] +--- + +Dimensions is a way to group or filter information based on categories or time. It's like a special label that helps organize and analyze data. + +In a data platform, dimensions is part of a larger structure called a semantic model. It's created along with other elements like [entities](/docs/build/entities) and [measures](/docs/build/measures), and used to add more details to your data that can't be easily added up or combined. In SQL, dimensions is typically included in the `dimensions` clause of your SQL query. + + + +Refer to the following semantic model example: + +```yaml +semantic_models: + - name: transactions + description: A record for every transaction that takes place. Carts are considered multiple transactions for each SKU. 
+ model: {{ ref("fact_transactions") }} + default: + agg_time_dimension: metric_time +# --- entities --- + entities: + ... + +# --- measures --- + measures: + ... +# --- dimensions --- + dimensions: + - name: metric_time + type: time + expr: date_trunc('day', ts) + - name: is_bulk_transaction + type: categorical + expr: case when quantity > 10 then true else false end +``` + +All dimensions require a `name`, `type` and in most cases, an `expr` parameter. + +| Name | Parameter | Field type | +| --- | --- | --- | +| `name` | Refers to the name of the group that will be visible to the user in downstream tools. It can also serve as an alias if the column name or SQL query reference is different and provided in the `expr` parameter.

— Dimension names should be unique within a semantic model, but they can be non-unique across different models, as MetricFlow uses [joins](/docs/build/join-logic) to identify the right dimension. | Required | +| `type` | Specifies the type of group created in the semantic model. There are three types:

— Categorical: Group rows in a table by categories like geography, product type, color, and so on.
— Time: Point to a date field in the data platform; it must be of type TIMESTAMP or equivalent in the data platform engine.
— Slowly-changing dimensions: Analyze metrics over time and slice them by groups that change over time, like sales trends by a customer's country. | Required | +| `expr` | Defines the underlying column or SQL query for a dimension. If no `expr` is specified, MetricFlow will use the column with the same name as the group. You can use column name itself to input a SQL expression. | Optional | + +## Dimensions types + +Dimensions has three types. This section further explains the definitions and provides examples. + +1. [Categorical](#categorical) +1. [Time](#time) +1. [Slowly changing](#scd-type-ii) + +### Categorical + +Categorical is used to group metrics by different categories such as product type, color, or geographical area. They can refer to existing columns in your dbt model or be calculated using a SQL expression with the `expr` parameter. An example of a category dimension is `is_bulk_transaction`, which is a group created by applying a case statement to the underlying column `quantity`. This allows users to group or filter the data based on bulk transactions. + +```yaml +dimensions: + - name: is_bulk_transaction + type: categorical + expr: case when quantity > 10 then true else false end +``` + +### Time + +Time has additional parameters specified under the `type_params` section. + +:::tip use datetime data type if using BigQuery +To use BigQuery as your data platform, time dimensions columns need to be in the datetime data type. If they are stored in another type, you can cast them to datetime using the `expr` property. Time dimensions are used to group metrics by different levels of time, such as day, week, month, quarter, and year. MetricFlow supports these granularities, which can be specified using the `time_granularity` parameter. +::: + + + + + +To specify the default time dimensions for a measure or metric in MetricFlow, set the `is_primary` parameter to True. If you have multiple time dimensions in your semantic model, the non-primary ones should have `is_primary` set to False. To assign a non-primary time dimensions to a measure, use the `agg_time_dimension` parameter and refer to the time dimensions defined in the section. + +In the provided example, the semantic model has two time groups, `created_at` and `deleted_at`, with `created_at` being the primary time dimensions through `is_primary: True`. The `users_created` measure defaults to the primary time dimensions, while the `users_deleted` measure uses `deleted_at` as its time group. + +```yaml +dimensions: + - name: created_at + type: time + expr: date_trunc('day', ts_created) #ts_created is the underlying column name from the table + is_partition: True + type_params: + is_primary: True + time_granularity: day + - name: deleted_at + type: time + expr: date_trunc('day', ts_deleted) #ts_deleted is the underlying column name from the table + is_partition: True + type_params: + is_primary: False + time_granularity: day + +measures: + - name: users_deleted + expr: 1 + agg: sum + agg_time_dimension: deleted_at + - name: users_created + expr: 1 + agg: sum +``` + +When querying one or more metrics in MetricFlow using the CLI, the default time dimensions for a single metric is the primary time dimension, which can be referred to as metric_time or the dimensions's name. Multiple time groups can be used in separate metrics, such as users_created which uses created_at, and users_deleted which uses deleted_at. 
+ + ``` + mf query --metrics users_created,users_deleted --dimensions metric_time --order metric_time + ``` + + + + + +`time_granularity` specifies the smallest level of detail that a measure or metric should be reported at, such as daily, weekly, monthly, quarterly, or yearly. Different granularity options are available, and each metric must have a specified granularity. For example, a metric that is specified with weekly granularity couldn't be aggregated to a daily grain. + +The current options for time granularity are day, week, month, quarter, and year. + +Aggregation between metrics with different granularities is possible, with the Semantic Layer returning results at the highest granularity by default. For example, when querying two metrics with daily and monthly granularity, the resulting aggregation will be at the monthly level. + +```yaml +dimensions: + - name: created_at + type: time + expr: date_trunc('day', ts_created) #ts_created is the underlying column name from the table + is_partition: True + type_params: + is_primary: True + time_granularity: day + - name: deleted_at + type: time + expr: date_trunc('day', ts_deleted) #ts_deleted is the underlying column name from the table + is_partition: True + type_params: + is_primary: False + time_granularity: day + +measures: + - name: users_deleted + expr: 1 + agg: sum + agg_time_dimension: deleted_at + - name: users_created + expr: 1 + agg: sum +``` + + + + + +Use `is_partition: True` to indicate that a dimension exists over a specific time window. For example, a date-partitioned dimensional table. When you query metrics from different tables, the Semantic Layer will use this parameter to ensure that the correct dimensional values are joined to measures. + +In addition, MetricFlow allows for easy aggregation of metrics at query time. For example, you can aggregate the `messages_per_month` measure, where the original `time_granularity` of the time dimensions `metrics_time`, at a yearly granularity by specifying it in the query in the CLI. + +``` +mf query --metrics messages_per_month --dimensions metric_time --order metric_time --time-granularity year +``` + + +```yaml +dimensions: + - name: created_at + type: time + expr: date_trunc('day', ts_created) #ts_created is the underlying column name from the table + is_partition: True + type_params: + is_primary: True + time_granularity: day + - name: deleted_at + type: time + expr: date_trunc('day', ts_deleted) #ts_deleted is the underlying column name from the table + is_partition: True + type_params: + is_primary: False + time_granularity: day + +measures: + - name: users_deleted + expr: 1 + agg: sum + agg_time_dimension: deleted_at + - name: users_created + expr: 1 + agg: sum +``` + + + + + + +### SCD Type II + +:::caution +Currently, there are limitations in supporting SCD's. +::: + +MetricFlow, supports joins against dimensions values in a semantic model built on top of an SCD Type II table (slowly changing dimension) Type II table. This is useful when you need a particular metric sliced by a group that changes over time, such as the historical trends of sales by a customer's country. + +As their name suggests SCD Type II are groups that change values at a coarser time granularity. This results in a range of valid rows with different dimensions values for a given metric or measure. MetricFlow associates the metric with the first (minimum) available dimensions value within a coarser time window, such as month. 
By default, MetricFlow uses the group that is valid at the beginning of the time granularity. + +The following basic structure of an SCD Type II data platform table is supported: + +| entity_key | dimensions_1 | dimensions_2 | ... | dimensions_x | valid_from | valid_to | +|------------|-------------|-------------|-----|-------------|------------|----------| + +* `entity_key` (required): An entity_key (or some sort of identifier) must be present +* `valid_from` (required): A timestamp indicating the start of a changing dimensions value must be present +* `valid_to` (required): A timestamp indicating the end of a changing dimensions value must be present + +**Note**: The SCD dimensions table must have `valid_to` and `valid_from` columns. + +This is an example of SQL code that shows how a sample metric called `num_events` is joined with versioned dimensions data (stored in a table called `scd_dimensions`) using a natural key made up of the `entity_key` and `timestamp` columns. + + +```sql +select metric_time, dimensions_1, sum(1) as num_events +from events a +left outer join scd_dimensions b +on + a.entity_key = b.entity_key + and a.metric_time >= b.valid_from + and (a.metric_time < b. valid_to or b.valid_to is null) +group by 1, 2 +``` + + + + + +This example shows how to create slowly changing dimensions (SCD) using a semantic model. The SCD table contains information about sales persons' tier and the time length of that tier. Suppose you have the underlying SCD table: + +| sales_person_id | tier | start_date | end_date | +|-----------------|------|------------|----------| +| 111 | 1 | 2019-02-03 | 2020-01-05| +| 111 | 2 | 2020-01-05 | 2048-01-01| +| 222 | 2 | 2020-03-05 | 2048-01-01| +| 333 | 2 | 2020-08-19 | 2021-10-22| +| 333 | 3 | 2021-10-22 | 2048-01-01| + +Take note of the extra arguments under `validity_params`: `is_start` and `is_end`. These arguments indicate the columns in the SCD table that contain the start and end dates for each tier (or beginning or ending timestamp column for a dimensional value). + +```yaml +semantic_models: + - name: sales_person_tiers + description: SCD Type II table of tiers for sales people + model: {{ref(sales_person_tiers)}} + default: + agg_time_dimension: tier_start + + dimensions: + - name: tier_start + type: time + expr: start_date + type_params: + time_granularity: day + validity_params: + is_start: True + - name: tier_end + type: time + expr: end_date + type_params: + time_granularity: day + validity_params: + is_end: True + - name: tier + type: categorical + + entities: + - name: sales_person + type: natural + expr: sales_person_id +``` + +The following code represents a separate semantic model that holds a fact table for `transactions`: + +```yaml +semantic_models: + - name: transactions + description: | + Each row represents one transaction. + There is a transaction, product, sales_person, and customer id for + every transaction. There is only one transaction id per + transaction. The `metric_time` or date is reflected in UTC. 
+ model: {{ ref(fact_transactions) }} + default: + agg_time_dimension: metric_time + + entities: + - name: transaction_id + type: primary + - name: customer + type: foreign + expr: customer_id + - name: product + type: foreign + expr: product_id + - name: sales_person + type: foreign + expr: sales_person_id + + measures: + - name: transactions + expr: 1 + agg: sum + - name: gross_sales + expr: sales_price + agg: sum + - name: sales_persons_with_a_sale + expr: sales_person_id + agg: count_distinct + + dimensions: + - name: metric_time + type: time + is_partition: true + type_params: + time_granularity: day + - name: sales_geo + type: categorical +``` + +You can now access the metrics in the `transactions` semantic model organized by the slowly changing dimension of `tier`. + +In the sales tier example, For instance, if a salesperson was Tier 1 from 2022-03-01 to 2022-03-12, and gets promoted to Tier 2 from 2022-03-12 onwards, all transactions from March would be categorized under Tier 1 since the dimensions value of Tier 1 comes earlier (and is the default starting point), even though the salesperson was promoted to Tier 2 on 2022-03-12. + + + + + +This example shows how to create slowly changing dimensions (SCD) using a semantic model. The SCD table contains information about sales persons' tier and the time length of that tier. Suppose you have the underlying SCD table: + +| sales_person_id | tier | start_date | end_date | +|-----------------|------|------------|----------| +| 111 | 1 | 2019-02-03 | 2020-01-05| +| 111 | 2 | 2020-01-05 | 2048-01-01| +| 222 | 2 | 2020-03-05 | 2048-01-01| +| 333 | 2 | 2020-08-19 | 2021-10-22| +| 333 | 3 | 2021-10-22 | 2048-01-01| + +In the sales tier example, if sales_person_id 456 is Tier 2 from 2022-03-08 onwards, but there is no associated tier level dimension for this person from 2022-03-01 to 2022-03-08, then all transactions associated with sales_person_id 456 for the month of March will be grouped under 'NA' since no tier is present prior to Tier 2. + +The following command or code represents how to return the count of transactions generated by each sales tier per month: + +``` +mf query --metrics transactions --dimensions metric_time__month,sales_person__tier --order metric_time__month --order sales_person__tier + +``` + + + diff --git a/website/docs/docs/build/entities.md b/website/docs/docs/build/entities.md new file mode 100644 index 00000000000..1e7f2ff878d --- /dev/null +++ b/website/docs/docs/build/entities.md @@ -0,0 +1,45 @@ +--- +title: Entities +id: entities +description: "Entities are real-world concepts that correspond to key parts of your business, such as customers, transactions, and ad campaigns." +sidebar_label: "Entities" +tags: [Metrics, Semantic Layer] +--- + +Entities are real-world concepts in a business such as customers, transactions, and ad campaigns. We often focus our analyses around specific entities, such as customer churn or annual recurring revenue modeling. We represent entities in our semantic models using id columns that serve as join keys to other semantic models in your semantic graph. + +Within a semantic graph, the required parameters for an entity are `name` and `type`. The `name` refers to either the key column name from the underlying data table, or it may serve as an alias with the column name referenced in the `expr` parameter. + +Entities can be specified with a single column or multiple columns. Entities (join keys) in a semantic model are identified by their name. 
Each entity name must be unique within a semantic model, but it doesn't have to be unique across different semantic models. + +There are four entity types: primary, foreign, unique, or natural. + +:::tip Use entities as a dimensions +You can also use entities as a dimensions, which allows you to aggregate a metric to the granularity of that entity. +::: + + +## Entity types + +MetricFlow's join logic depends on the entity `type` you use, and it also determines how to join semantic models. Refer to [Joins](/docs/build/join-logic) for more info on how to construct joins. + +* **Primary —** A primary key has **only one** record for each row in the table, and it includes every record in the data platform. +* **Unique —** A unique key contains **only one** record per row in the table, but it may have a subset of records in the data warehouse. It can also include nulls. +* **Foreign —** A foreign key can include zero, one, or multiple instances of the same record. Null values may also be present. +* **Natural —** Natural keys are column or combination of columns in a table that uniquely identify a record based on real-world data. For instance, in a sales_person_department dimension table, the sales_person_id can serve as a natural key. + +Here's an example of how to define entities in a semantic model: + +``` yaml +entities: + - name: transaction + type: primary + expr: id_transaction + - name: order + type: foreign + expr: id_order + - name: user + type: foreign + expr: substring(id_order from 2) +``` + diff --git a/website/docs/docs/build/incremental-models.md b/website/docs/docs/build/incremental-models.md index 5059caca2e1..15b24520711 100644 --- a/website/docs/docs/build/incremental-models.md +++ b/website/docs/docs/build/incremental-models.md @@ -4,8 +4,6 @@ description: "Read this tutorial to learn how to use incremental models when bui id: "incremental-models" --- -## Overview - Incremental models are built as tables in your . The first time a model is run, the is built by transforming _all_ rows of source data. On subsequent runs, dbt transforms _only_ the rows in your source data that you tell dbt to filter for, inserting them into the target table which is the table that has already been built. Often, the rows you filter for on an incremental run will be the rows in your source data that have been created or updated since the last time dbt ran. As such, on each dbt run, your model gets built incrementally. @@ -255,15 +253,37 @@ to build incremental models. Click the name of the adapter in the below table for more information about supported incremental strategies. 
-| data platform adapter | default strategy | additional supported strategies | -| :----------------------------------------------------------------------------------------------- | -------- | ------------------------------------------------------- | -| dbt-postgres | `append` | `delete+insert` | -| dbt-redshift | `append` | `delete+insert` | -| [dbt-bigquery](/reference/resource-configs/bigquery-configs#merge-behavior-incremental-models) | `merge` | `insert_overwrite` | -| [dbt-spark](/reference/resource-configs/spark-configs#incremental-models) | `append` | `merge` (Delta only) `insert_overwrite` | -| [dbt-databricks](/reference/resource-configs/databricks-configs#incremental-models) | `append` | `merge` (Delta only) `insert_overwrite` | -| [dbt-snowflake](/reference/resource-configs/snowflake-configs#merge-behavior-incremental-models) | `merge` | `append`, `delete+insert` | -| [dbt-trino](/reference/resource-configs/trino-configs#incremental) | `append` | `merge` `delete+insert` | +The `merge` strategy is available in dbt-postgres and dbt-redshift beginning in dbt v1.6. + + + + +| data platform adapter | default strategy | additional supported strategies | +| :-------------------| ---------------- | -------------------- | +| [dbt-postgres](/reference/resource-configs/postgres-configs#incremental-materialization-strategies) | `append` | `delete+insert` | +| [dbt-redshift](/reference/resource-configs/redshift-configs#incremental-materialization-strategies) | `append` | `delete+insert` | +| [dbt-bigquery](/reference/resource-configs/bigquery-configs#merge-behavior-incremental-models) | `merge` | `insert_overwrite` | +| [dbt-spark](/reference/resource-configs/spark-configs#incremental-models) | `append` | `merge` (Delta only) `insert_overwrite` | +| [dbt-databricks](/reference/resource-configs/databricks-configs#incremental-models) | `append` | `merge` (Delta only) `insert_overwrite` | +| [dbt-snowflake](/reference/resource-configs/snowflake-configs#merge-behavior-incremental-models) | `merge` | `append`, `delete+insert` | +| [dbt-trino](/reference/resource-configs/trino-configs#incremental) | `append` | `merge` `delete+insert` | + + + + + + +| data platform adapter | default strategy | additional supported strategies | +| :----------------- | :----------------| : ---------------------------------- | +| [dbt-postgres](/reference/resource-configs/postgres-configs#incremental-materialization-strategies) | `append` | `merge` , `delete+insert` | +| [dbt-redshift](/reference/resource-configs/redshift-configs#incremental-materialization-strategies) | `append` | `merge`, `delete+insert` | +| [dbt-bigquery](/reference/resource-configs/bigquery-configs#merge-behavior-incremental-models) | `merge` | `insert_overwrite` | +| [dbt-spark](/reference/resource-configs/spark-configs#incremental-models) | `append` | `merge` (Delta only) `insert_overwrite` | +| [dbt-databricks](/reference/resource-configs/databricks-configs#incremental-models) | `append` | `merge` (Delta only) `insert_overwrite` | +| [dbt-snowflake](/reference/resource-configs/snowflake-configs#merge-behavior-incremental-models) | `merge` | `append`, `delete+insert` | +| [dbt-trino](/reference/resource-configs/trino-configs#incremental) | `append` | `merge` `delete+insert` | + + diff --git a/website/docs/docs/build/jinja-macros.md b/website/docs/docs/build/jinja-macros.md index 3c0f6f7ad71..5b0df69e898 100644 --- a/website/docs/docs/build/jinja-macros.md +++ b/website/docs/docs/build/jinja-macros.md @@ -140,7 +140,7 @@ select field_5, 
count(*) from my_table -{{ dbt_utils.group_by(5) }} +{{ dbt_utils.dimensions(5) }} ``` diff --git a/website/docs/docs/build/join-logic.md b/website/docs/docs/build/join-logic.md new file mode 100644 index 00000000000..eb4e02ed423 --- /dev/null +++ b/website/docs/docs/build/join-logic.md @@ -0,0 +1,145 @@ +--- +title: Joins +id: join-logic +description: "Joins allow you to combine data from different tables and create new metrics" +sidebar_label: "Joins" +tags: [Metrics, Semantic Layer] +--- + +Joins are a powerful part of MetricFlow and simplify the process of making all valid dimensions available for your metrics at query time, regardless of where they are defined in different semantic models. With Joins, you can also create metrics using measures from different semantic models. + +Joins use `entities` defined in your semantic model configs as the join keys between tables. Assuming entities are defined in the semantic model, MetricFlow creates a graph using the semantic models as nodes and the join paths as edges to perform joins automatically. MetricFlow chooses the appropriate join type and avoids fan-out or chasm joins with other tables based on the entity types. + +
+<details>
+  <summary>What are fan-out or chasm joins?</summary>
+
+  - Fan-out joins are when one row in a table is joined to multiple rows in another table, resulting in more output rows than input rows.
+
+  - Chasm joins are when two tables have a many-to-many relationship through an intermediate table, and the join results in duplicate or missing data.
+
+</details>
+ + +## Types of joins + +:::tip Joins are auto-generated +MetricFlow automatically generates the necessary joins to the defined semantic objects, eliminating the need for you to create new semantic models or configuration files. + +This document explains the different types of joins that can be used with entities and how to query them using the CLI. +::: + +MetricFlow primarily uses left joins for joins, and restricts the use of fan-out and chasm joins. Refer to the table below to identify which joins are or aren't allowed based on specific entity types to prevent the creation of risky joins. + +| entity type - Table A | entity type - Table B | Join type | +|---------------------------|---------------------------|----------------------| +| Primary | Primary | ✅ Left | +| Primary | Unique | ✅ Left | +| Primary | Foreign | ❌ Fan-out (Not allowed) | +| Unique | Primary | ✅ Left | +| Unique | Unique | ✅ Left | +| Unique | Foreign | ❌ Fan-out (Not allowed) | +| Foreign | Primary | ✅ Left | +| Foreign | Unique | ✅ Left | +| Foreign | Foreign | ❌ Fan-out (Not allowed) | + +### Example + +The following example uses two semantic models with a common entity and shows a MetricFlow query that requires a join between the two semantic models. + +Let's say you have two semantic models, `transactions` and `user_signup` as seen in the following example: + +```yaml +semantic_models: + - name: transactions + entities: + - name: id + type: primary + - name: user + type: foreign + expr: user_id + measures: + - name: average_purchase_price + agg: avg + expr: purchase_price + - name: user_signup + entities: + - name: user + type: primary + expr: user_id + dimensions: + - name: type + type: categorical +``` + +MetricFlow will use `user_id` as the join key to join two semantic models, `transactions` and `user_signup`. This enables you to query the `average_purchase_price` metric in `transactions`, sliced by the `type` dimension in the `user_signup` semantic model. + +Note that the `average_purchase_price` measure is defined in the `transactions` semantic model, where `user_id` is a foreign entity. However, the `user_signup` semantic model has `user_id` as a primary entity. + +Since this is a foreign-to-primary relationship, a left join is implemented where the `transactions` semantic model joins the `user_signup` semantic model, since the `average_purchase_price` measure is defined in the `transactions` semantic model. + +When querying dimensions from different semantic models using the CLI, a double underscore (or dunder) is added to the dimension name after the joining entity. In the CLI query shown below, `user_id__type` is included as a `dimension`. + +```yaml +mf query --metrics average_purchase_price --dimensions metric_time,user_id__type +``` + +## Multi-hop joins + +:::info +This feature is currently in development and not currently available. +::: + +MetricFlow allows users to join measures and dimensions across a graph of entities, which we refer to as a 'multi-hop join.' This is because users can move from one table to another like a 'hop' within a graph. + +Here's an example schema for reference: + +![Multi-Hop-Join](/img/docs/building-a-dbt-project/multihop-diagram.png) + +Notice how this schema can be translated into the three MetricFlow semantic models below to create the metric 'Average purchase price by country' using the `purchase_price` measure from the sales table and the `country_name` dimension from the `country_dim` table. 
+ +```yaml +semantic_models: + - name: sales + defaults: + agg_time_dimension: first_ordered_at + entities: + - name: id + type: primary + - name: user_id + type: foreign + measures: + - name: average_purchase_price + agg: avg + expr: purchase_price + dimensions: + - name: metric_time + type: time + type_params: + is_primary: true + - name: user_signup + entities: + - name: user_id + type: primary + - name: country_id + type: Unique + dimensions: + - name: signup_date + type: time + - name: country_dim + entities: + - name: country_id + type: primary + dimensions: + - name: country_name + type: categorical +``` + +### Query multi-hop joins + +:::info +This feature is currently in development and not currently available. +::: + +To query dimensions _without_ a multi-hop join involved, you can use the fully qualified dimension name with the syntax entity double underscore (dunder) dimension, like `entity__dimension`. + +For dimensions retrieved by a multi-hop join, you need to additionally provide the entity path as a list, like `user_id`. + diff --git a/website/docs/docs/build/measures.md b/website/docs/docs/build/measures.md new file mode 100644 index 00000000000..c96761ecb1b --- /dev/null +++ b/website/docs/docs/build/measures.md @@ -0,0 +1,221 @@ +--- +title: Measures +id: measures +description: "Measures are aggregations performed on columns in your model." +sidebar_label: "Measures" +tags: [Metrics, Semantic Layer] +--- + +Measures are aggregations performed on columns in your model. They can be used as final metrics or serve as building blocks for more complex metrics. Measures have several inputs, which are described in the following table along with their field types. + +| Parameter | Description | Field type | +| --- | --- | --- | +| [`name`](#name) | Provide a name for the measure, which must be unique and can't be repeated across all semantic models in your dbt project. | Required | +| [`description`](#description) | Describes the calculated measure. | Optional | +| [`agg`](#aggregation) | dbt supports the following aggregations: `sum`, `max`, `min`, `count_distinct`, and `sum_boolean`. | Required | +| [`expr`](#expr) | You can either reference an existing column in the table or use a SQL expression to create or derive a new one. | Optional | +| [`non_additive_dimension`](#non-additive-dimensions) | Non-additive dimensions can be specified for measures that cannot be aggregated over certain dimensions, such as bank account balances, to avoid producing incorrect results. | Optional | + +### Name + +When you create a measure, you can either give it a custom name or use the `name` of the data platform column directly. If the `name` of the measure is different from the column name, you need to add an `expr` to specify the column name. The `name` of the measure is used when creating a metric. + +Measure names must be **unique** across all semantic models in a project. + +### Description + +The description describes the calculated measure. It's strongly recommended you create verbose and human-readable descriptions in this field. + +### Aggregation + +The aggregation determines how the field will be aggregated. For example, a `sum` aggregation type over a granularity of `day` would sum the values across a given day. 
+ +Supported aggregations include: + +| Aggregation types | Description | +|-------------------|--------------------------| +| sum | Sum across the values | +| min | Minimum across the values| +| max | Maximum across the values| +| average | Average across the values | +| sum_boolean | A sum for a boolean type | +| count_distinct | Distinct count of values | +| median | Median (p50) calculation across the values | +| percentile | Percentile calculation across the values | + + +### Expr + +If the `name` you specified for a measure doesn't match a column name in your model, you can use the `expr` parameter instead. This allows you to use any valid SQL to manipulate an underlying column name into a specific output. The `name` parameter then serves as an alias for your measure. + +**Notes**: When using SQL functions in the `expr` parameter, **always use data platform-specific SQL**. This is because outputs may differ depending on your specific data platform. + +:::tip For Snowflake users +For Snowflake users, if you use a week-level function in the `expr` parameter, it'll now return Monday as the default week start day based on ISO standards. If you have any account or session level overrides for the `WEEK_START` parameter that fix it to a value other than 0 or 1, you will still see Monday as the week start. + +If you use the `dayofweek` function in the `expr` parameter with the legacy Snowflake default of `WEEK_START = 0`, it will now return ISO-standard values of 1 (Monday) through 7 (Sunday) instead of Snowflake's legacy default values of 0 (Monday) through 6 (Sunday). +::: + + +### Model with different aggregations + +```yaml +semantic_models: + - name: transactions + description: A record for every transaction that takes place. Carts are considered multiple transactions for each SKU. + model: ref('schema.transactions') + default: + agg_time_dimensions: + +# --- entities --- + entities: + - name: transaction_id + type: primary + - name: customer_id + type: foreign + - name: store_id + type: foreign + - name: product_id + type: foreign + + # --- measures --- + measures: + - name: transaction_amount_usd + description: Total USD value of transactions + expr: transaction_amount_usd + agg: sum + - name: transaction_amount_usd_avg + description: Average USD value of transactions + expr: transaction_amount_usd + agg: average + - name: transaction_amount_usd_max + description: Maximum USD value of transactions + expr: transaction_amount_usd + agg: max + - name: transaction_amount_usd_min + description: Minimum USD value of transactions + expr: transaction_amount_usd + agg: min + - name: quick_buy_transactions + description: The total transactions bought as quick buy + expr: quick_buy_flag + agg: sum_boolean + - name: distinct_transactions_count + description: Distinct count of transactions + expr: transaction_id + agg: count_distinct + - name: transactions + description: The average value of transactions + expr: transaction_amount_usd + agg: average + - name: transactions_amount_usd_valid #Notice here how we use expr to compute the aggregation based on a condition + description: The total USD value of valid transactions only + expr: CASE WHEN is_valid = True then 1 else 0 end + agg: sum + - name: transactions + description: The average value of transactions. 
+ expr: transaction_amount_usd + agg: average + - name: p99_transaction_value + description: The 99th percentile transaction value + expr: transaction_amount_usd + agg: percentile + agg_params: + percentile: .99 + use_discrete_percentile: False #False will calculate the discrete percentile and True will calculate the continuous percentile + - name: median_transaction_value + description: The median transaction value + expr: transaction_amount_usd + agg: median + +# --- dimensions --- + dimensions: + - name: metric_time + type: time + expr: date_trunc('day', ts) #expr refers to underlying column ts + type_params: + time_granularity: day + - name: is_bulk_transaction + type: categorical + expr: case when quantity > 10 then true else false end + +``` + +### Non-additive dimensions + +Some measures cannot be aggregated over certain dimensions, like time, because it could result in incorrect outcomes. Examples include bank account balances where it does not make sense to carry over balances month-to-month, and monthly recurring revenue where daily recurring revenue cannot be summed up to achieve monthly recurring revenue. You can specify non-additive dimensions to handle this, where certain dimensions are excluded from aggregation. + +To demonstrate the configuration for non-additive measures, consider a subscription table that includes one row per date of the registered user, the user's active subscription plan(s), and the plan's subscription value (revenue) with the following columns: + +- `date_transaction`: The daily date-spine. +- `user_id`: The ID pertaining to the registered user. +- `subscription_plan`: A column to indicate the subscription plan ID. +- `subscription_value`: A column to indicate the monthly subscription value (revenue) of a particular subscription plan ID. + +Parameters under the `non_additive_dimension` will specify dimensions that the measure should not be aggregated over. + +| Parameter | Description | Field type | +| --- | --- | --- | +| `name`| This will be the name of the time dimension (that has already been defined in the data source) that the measure should not be aggregated over. | Required | +| `window_choice` | Choose either `min` or `max`, where `min` reflects the beginning of the time period and `max` reflects the end of the time period. | Required | +| `window_groupings` | Provide the entities that you would like to group by. | Optional | + + +```yaml +semantic_models: + - name: subscription_table + description: A subscription table with one row per date for each active user and their subscription plans. 
+ model: ref('your_schema.subscription_table') + default: + agg_time_dimension: metric_time + + entities: + - name: user_id + type: foreign + + dimensions: + - name: metric_time + type: time + expr: date_transaction + type_params: + is_primary: True + time_granularity: day + + measures: + - name: count_users_end_of_month + description: Count of users at the end of the month + expr: 1 + agg: sum + non_additive_dimension: + name: metric_time + window_choice: min + - name: mrr_end_of_month + description: Aggregate by summing all users active subscription plans at end of month + expr: subscription_value + agg: sum + non_additive_dimension: + name: metric_time + window_choice: max + - name: mrr_by_user_end_of_month + description: Group by user_id to achieve each users MRR at the end of the month + expr: subscription_value + agg: sum + non_additive_dimension: + name: metric_time + window_choice: max + window_groupings: + - user_id +--- +metrics: + - name: mrr_end_of_month + type: simple + type_params: + measure: mrr_end_of_month +``` + +We can query the semi-additive metrics using the following syntax: + +```bash +mf query --metrics mrr_by_end_of_month --dimensions metric_time__month --order metric_time__month +mf query --metrics mrr_by_end_of_month --dimensions metric_time__week --order metric_time__week +``` diff --git a/website/docs/docs/build/metricflow-time-spine.md b/website/docs/docs/build/metricflow-time-spine.md new file mode 100644 index 00000000000..607df692bc9 --- /dev/null +++ b/website/docs/docs/build/metricflow-time-spine.md @@ -0,0 +1,32 @@ +--- +title: MetricFlow time spine +id: metricflow-time-spine +description: "MetricFlow expects a default timespine table called metricflow_time_spine" +sidebar_label: "MetricFlow time spine" +tags: [Metrics, Semantic Layer] +--- + +MetricFlow uses a timespine table to construct cumulative metrics. By default, MetricFlow expects the timespine table to be named `metricflow_time_spine` and doesn't support using a different name. + +To create this table, you need to create a model in your dbt project called `metricflow_time_spine` and add the following code: + +```sql +-- metricflow_time_spine.sql +with days as ( + {{dbt_utils.date_spine('day' + , "to_date('01/01/2000','mm/dd/yyyy')" + , "to_date('01/01/2027','mm/dd/yyyy')" + ) + }} +), + +final as ( + select cast(date_day as date) as date_day + from days +) + +select * +from final +``` + +You only need to include the `date_day` column in the table. MetricFlow can handle broader levels of detail, but it doesn't currently support finer grains. diff --git a/website/docs/docs/build/metrics-overview.md b/website/docs/docs/build/metrics-overview.md new file mode 100644 index 00000000000..e7271ecf417 --- /dev/null +++ b/website/docs/docs/build/metrics-overview.md @@ -0,0 +1,151 @@ +--- +title: Creating metrics +id: metrics-overview +description: "Metrics can be defined in the same or separate YAML files from semantic models within the same dbt project repo." +sidebar_label: "Creating metrics" +tags: [Metrics, Semantic Layer] +--- + +Once you've created your semantic models, it's time to start adding metrics! Metrics can be defined in the same YAML files as your semantic models, or split into separate YAML files into any other subdirectories (provided that these subdirectories are also within the same dbt project repo) + +The keys for metrics definitions are: + +* `name`: Provide the reference name for the metric. This name must be unique amongst all metrics. 
+* `type`: Define the type of metric, which can be a measure (`simple`) or ratio (`ratio`)). +* `type_params`: Additional parameters used to configure metrics. `type_params` are different for each metric type. +* `constraint`: For any type of metric, you may optionally include a constraint string, which applies a dimensional filter when computing the metric. You may think of this as your WHERE clause. +* `meta`: Additional metadata you want to add to your metric. + +This page explains the different supported metric types you can add to your dbt project. + + + +### Derived metrics + +[Derived metrics](/docs/build/derived) are defined as an expression of other metrics. Derived metrics allow you to do calculations on top of metrics. + +```yaml +metrics: + - name: net_sales_per_user + type: derived + type_params: + metrics: + - name: gross_sales # these are all metrics (can be a derived metric, meaning building a derived metric with derived metrics) + - name: cogs + - name: users + filter: is_active # Optional additional constraint + alias: active_users # Optional alias to use in the expr +``` + + +### Ratio metrics + +[Ratio metrics](/docs/build/ratio) involve a numerator measure and a denominator measure. A `constraint` string can be applied, to both numerator and denominator, or applied separately to the numerator or denominator. + +```yaml +# Ratio Metric +metrics: + - name: cancellation_rate + owners: + - support@getdbt.com +# Ratio metrics create a ratio out of two measures. +# Define the measures from the semantic model as numerator or denominator + type: ratio + type_params: + numerator: cancellations_usd + denominator: transaction_amount_usd + filter: | # add optional constraint string. This applies to both the numerator and denominator + {{ dimension('country', entity_path=['customer']) }} = 'MX' + + - name: enterprise_cancellation_rate + owners: + - support@getdbt.com + # Ratio metrics create a ratio out of two measures. + # Define the measures from the semantic model as numerator or denominator + type: ratio + type_params: + numerator: + name: cancellations_usd + filter: tier = 'enterprise' #constraint only applies to the numerator + denominator: transaction_amount_usd + filter: | # add optional constraint string. This applies to both the numerator and denominator + {{ dimension('country', entity_path=['customer']) }} = 'MX' + +``` +### Simple metrics + +[Simple metrics](/docs/build/simple) point directly to a measure. You may think of it as a function that takes only one measure as the input. + + +```yaml +metrics: +# Define the reference name of the metric. +# This name must be unique amongst metrics and can include lowercase letters, numbers, and underscores. +# This name is used to call the metric from the dbt Semantic Layer API. + - name: cancellations + type: simple + type_params: + # Specify the measure you are creating a proxy for. + measure: cancellations_usd + filter: | + {{dimension('value')}} > 100 and {{dimension('acquisition', entity_path=['user'])}} +``` + +### Further configuration + +You can set more metadata for your metrics, which can be used by other tools later on. The way this metadata is used will vary based on the specific integration partner + +- **Description** — Write a detailed description of the metric. 
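For example, here is a hedged sketch of adding metadata to a simple metric; the keys under `meta` (such as `owner` and `tier`) are purely illustrative rather than a fixed schema:

```yaml
metrics:
  - name: cancellations
    description: Count of cancelled orders, measured in USD.
    type: simple
    type_params:
      measure: cancellations_usd
    meta:
      owner: support@getdbt.com  # illustrative metadata key
      tier: gold                 # illustrative metadata key
```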
+ + + + +## Related docs + +- [Semantic models](/docs/build/semantic-models) +- [Derived](/docs/build/derived) + + + diff --git a/website/docs/docs/build/metrics.md b/website/docs/docs/build/metrics.md index 3f869c42bcf..c43e2a86915 100644 --- a/website/docs/docs/build/metrics.md +++ b/website/docs/docs/build/metrics.md @@ -1,15 +1,28 @@ --- -title: "Add metrics to your DAG" -sidebar_label: "Metrics" +title: "Metrics" id: "metrics" description: "When you define metrics in dbt projects, you encode crucial business logic in tested, version-controlled code. The dbt metrics layer helps you standardize metrics within your organization." keywords: - dbt metrics layer --- -:::info Coming soon -The dbt Semantic Layer is undergoing some sophisticated changes, enabling more complex metric definitions and efficient querying. As part of these changes, the dbt_metrics package will be deprecated and replaced with MetricFlow. For more info, check out the [The dbt Semantic Layer: what's next?](https://www.getdbt.com/blog/dbt-semantic-layer-whats-next/) and [dbt_metrics deprecation](https://docs.getdbt.com/blog/deprecating-dbt-metrics) blog. + + +:::info dbt Metrics isn't supported + +dbt Metrics is no longer supported in v1.6 and higher. To build your semantic layer, define and query metrics, and provide data governance - refer to [Build your Semantic Layer](/docs/build/build-metrics-intro) for updated guidance. + ::: + + + + +:::info dbt Metrics not recommended + +dbt Metrics won't be supported in v1.6 and higher, and is being replaced with MetricFlow. [Defining metrics](/docs/build/build-semantic-layer-intro) with MetricFlow will help shape the future of the dbt Semantic Layer — let us know [your thoughts and join the convo](https://github.com/dbt-labs/dbt-core/discussions/7456) to help build it! + +::: + @@ -17,14 +30,13 @@ The dbt Semantic Layer is undergoing some sophisticated changes, enabling more c * **v1.0.0**: Metrics are new and experimental - -## About Metrics + A metric is an aggregation over a that supports zero or more dimensions. Some examples of metrics include: - active users - monthly recurring revenue (mrr) -In v1.0, dbt supports metric definitions as a new node type. Like [exposures](/docs/build/exposures), metrics appear as nodes in the directed acyclic graph (DAG) and can be expressed in YAML files. Defining metrics in dbt projects encodes crucial business logic in tested, version-controlled code. Further, you can expose these metrics definitions to downstream tooling, which drives consistency and precision in metric reporting. +In v1.0, dbt supports metric definitions as a new node type. Like [exposures](exposures), metrics appear as nodes in the directed acyclic graph (DAG) and can be expressed in YAML files. Defining metrics in dbt projects encodes crucial business logic in tested, version-controlled code. Further, you can expose these metrics definitions to downstream tooling, which drives consistency and precision in metric reporting. Review the video below to learn more about metrics, why they're important, and how to get started: @@ -33,10 +45,10 @@ Review the video below to learn more about metrics, why they're important, and h ### Benefits of defining metrics **Use metric specifications in downstream tools** -dbt's compilation context can access metrics via the [`graph.metrics` variable](/reference/dbt-jinja-functions/graph). The [manifest artifact](/reference/artifacts/manifest-json) includes metrics for downstream metadata consumption. 
+dbt's compilation context can access metrics via the [`graph.metrics` variable](graph). The [manifest artifact](manifest-json) includes metrics for downstream metadata consumption. **See and select dependencies** -As with Exposures, you can see everything that rolls up into a metric (`dbt ls -s +metric:*`), and visualize them in [dbt documentation](/docs/collaborate/documentation). For more information, see "[The `metric:` selection method](/reference/node-selection/methods#the-metric-method)." +As with Exposures, you can see everything that rolls up into a metric (`dbt ls -s +metric:*`), and visualize them in [dbt documentation](documentation). For more information, see "[The `metric:` selection method](node-selection/methods#the-metric-method)." @@ -68,7 +80,7 @@ metrics: - name: rolling_new_customers label: New Customers model: ref('dim_customers') - [description](/reference/resource-properties/description): "The 14 day rolling count of paying customers using the product" + [description](description): "The 14 day rolling count of paying customers using the product" calculation_method: count_distinct expression: user_id @@ -99,11 +111,11 @@ metrics: value: "'2020-01-01'" # general properties - [config](/reference/resource-properties/config): + [config](resource-properties/config): enabled: true | false treat_null_values_as_zero: true | false - [meta](/reference/resource-configs/meta): {team: Finance} + [meta](resource-configs/meta): {team: Finance} ``` @@ -700,6 +712,7 @@ The above example will return a dataset that contains the metric provided in the **Important caveat** - You _must_ wrap the `expression` property for `derived` metrics in double quotes to render it. For example, `expression: "{{ metric('develop_metric') }} - 1 "`. +
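To make that caveat concrete, here is a hedged sketch of a derived metric under the legacy `dbt_metrics` spec; the metric names are illustrative, and the point is only that the whole `expression` is wrapped in double quotes:

```yaml
metrics:
  - name: develop_metric_minus_one
    label: Develop metric minus one
    calculation_method: derived
    # wrapping the expression in double quotes ensures the Jinja renders correctly
    expression: "{{ metric('develop_metric') }} - 1"
```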
diff --git a/website/docs/docs/build/project-variables.md b/website/docs/docs/build/project-variables.md index a84bbcb36a9..b6e1b564ec8 100644 --- a/website/docs/docs/build/project-variables.md +++ b/website/docs/docs/build/project-variables.md @@ -17,6 +17,13 @@ Variables can be defined in two ways: ### Defining variables in `dbt_project.yml` + +:::info + +Jinja is not supported within the `vars` config, and all values will be interpreted literally. + +::: + :::info New in v0.17.0 The syntax for specifying vars in the `dbt_project.yml` file has changed in @@ -86,18 +93,32 @@ You can find more information on defining dictionaries with YAML [here](https:// ### Variable precedence -Variables defined with the `--vars` command line argument override variables -defined in the `dbt_project.yml` file. They are globally scoped and will be -accessible to all packages included in the project. +Variables defined with the `--vars` command line argument override variables defined in the `dbt_project.yml` file. They are globally scoped and accessible to the root project and all installed packages. The order of precedence for variable declaration is as follows (highest priority first): + + 1. The variables defined on the command line with `--vars`. -3. The package-scoped variable declaration in the `dbt_project.yml` file -2. The global variable declaration in the `dbt_project.yml` file. +2. The package-scoped variable declaration in the root `dbt_project.yml` file +3. The global variable declaration in the root `dbt_project.yml` file +4. If this node is defined in a package: variable declarations in that package's `dbt_project.yml` file +5. The variable's default argument (if one is provided) + + + + + +1. The variables defined on the command line with `--vars` +2. The package-scoped variable declaration in the root `dbt_project.yml` file +3. The global variable declaration in the root `dbt_project.yml` file 4. The variable's default argument (if one is provided). + + If dbt is unable to find a definition for a variable after checking these four places, then a compilation error will be raised. +**Note:** Variable scope is based on the node ultimately using that variable. Imagine the case where a model defined in the root project is calling a macro defined in an installed package. That macro, in turn, uses the value of a variable. The variable will be resolved based on the _root project's_ scope, rather than the package's scope. + diff --git a/website/docs/docs/build/python-models.md b/website/docs/docs/build/python-models.md index 2211cf78fa9..5b9222ad1c5 100644 --- a/website/docs/docs/build/python-models.md +++ b/website/docs/docs/build/python-models.md @@ -679,6 +679,8 @@ models: submission_method: serverless ``` +Python models running on Dataproc Serverless can be further configured in your [BigQuery profile](/docs/core/connect-data-platform/bigquery-setup#running-python-models-on-dataproc). + Any user or service account that runs dbt Python models will need the following permissions(in addition to the required BigQuery permissions) ([docs](https://cloud.google.com/dataproc/docs/concepts/iam/iam)): ``` dataproc.batches.create diff --git a/website/docs/docs/build/ratio-metrics.md b/website/docs/docs/build/ratio-metrics.md new file mode 100644 index 00000000000..d70815f140d --- /dev/null +++ b/website/docs/docs/build/ratio-metrics.md @@ -0,0 +1,106 @@ +--- +id: ratio +title: "Ratio metrics" +description: "Use ratio metrics to create a ratio out of two measures. 
" +sidebar_label: Ratio +tags: [Metrics, Semantic Layer] +--- + +Ratio allows you to create a ratio between two measures. You simply specify a numerator and a denominator measure. Additionally, you can apply a dimensional filter to both the numerator and denominator using a constraint string when computing the metric. + +```yaml +# Ratio Metric + metrics: + - name: cancellation_rate + owners: + - support@getdbt.com + type: ratio # Ratio metrics create a ratio out of two measures. Define the measures from the semantic model as numerator or denominator + type_params: + numerator: cancellations_usd + denominator: transaction_amount_usd + filter: | # add optional constraint string. This applies to both the numerator and denominator + {{ dimension('country', entity_path=['customer']) }} = 'MX' + + - name: enterprise_cancellation_rate + owners: + - support@getdbt.com + type: ratio # Ratio metrics create a ratio out of two measures. Define the measures from the semantic model as numerator or denominator + type_params: + numerator: + name: cancellations_usd + filter: tier = 'enterprise' #constraint only applies to the numerator + denominator: transaction_amount_usd + filter: | # add optional constraint string. This applies to both the numerator and denominator + {{ dimension('country', entity_path=['customer']) }} = 'MX' + +``` +### Different semantic models + +If the numerator and denominator in a ratio metric come from different semantic models, the system will compute their values in subqueries and then join the result set based on common dimensions to calculate the final ratio. Here's an example of the generated SQL for such a ratio metric. + + +```SQL +select + subq_15577.metric_time as metric_time + , cast(subq_15577.mql_queries_created_test as double) / cast(nullif(subq_15582.distinct_query_users, 0) as double) as mql_queries_per_active_user +from ( + select + metric_time + , sum(mql_queries_created_test) as mql_queries_created_test + from ( + select + cast(query_created_at as date) as metric_time + , case when query_status in ('PENDING','MODE') then 1 else 0 end as mql_queries_created_test + from prod_dbt.mql_query_base mql_queries_test_src_2552 + ) subq_15576 + group by + metric_time +) subq_15577 +inner join ( + select + metric_time + , count(distinct distinct_query_users) as distinct_query_users + from ( + select + cast(query_created_at as date) as metric_time + , case when query_status in ('MODE','PENDING') then email else null end as distinct_query_users + from prod_dbt.mql_query_base mql_queries_src_2585 + ) subq_15581 + group by + metric_time +) subq_15582 +on + ( + ( + subq_15577.metric_time = subq_15582.metric_time + ) or ( + ( + subq_15577.metric_time is null + ) and ( + subq_15582.metric_time is null + ) + ) + ) +``` + +### Add filter + +Users can define constraints on input measures for a metric by applying a filter directly to the measure, like so: + +```yaml +metrics: + - name: frequent_purchaser_ratio + description: Fraction of active users who qualify as frequent purchasers + owners: + - support@getdbt.com + type: ratio + type_params: + numerator: + name: distinct_purchasers + filter: {{dimension('is_frequent_purchaser')}} + alias: frequent_purchasers + denominator: + name: distinct_purchasers +``` + +Note the `filter` and `alias` parameters for the measure referenced in the numerator. Use the `filter` parameter to apply a filter to the measure it's attached to. 
The `alias` parameter is used to avoid naming conflicts in the rendered SQL queries when the same measure is used with different filters. If there are no naming conflicts, the `alias` parameter can be left out. diff --git a/website/docs/docs/build/semantic-models.md b/website/docs/docs/build/semantic-models.md new file mode 100644 index 00000000000..043973ac154 --- /dev/null +++ b/website/docs/docs/build/semantic-models.md @@ -0,0 +1,169 @@ +--- +title: "Semantic models" +id: "semantic-models" +description: "Semantic models are yml abstractions on top of a dbt mode, connected via joining keys as edges" +keywords: + - dbt metrics layer +sidebar_label: Semantic models +tags: [Metrics, Semantic Layer] +--- + +Semantic models serve as the foundation for defining data in MetricFlow, which powers the dbt Semantic Layer. You can think of semantic models as nodes in your semantic graph, connected via entities as edges. MetricFlow takes semantic models defined in YAML configuration files as inputs and creates a semantic graph that can be used to query metrics. + +Each semantic model corresponds to a dbt model in your DAG. Therefore you will have one YAML config for each semantic model in your dbt project. You can create multiple semantic models out of a single dbt model, as long as you give each semantic model a unique name. + +You can configure semantic models in your dbt project directory in a `YAML` file. Depending on your project structure, you can nest semantic models under a `metrics:` folder or organize them under project sources. Semantic models have 6 components and this page explains the definitions with some examples: + +1. [Name](#name) — Unique name for the semantic model. +1. [Description](#description) — Includes important details in the description. +1. [Model](#model) — Specifies the dbt model for the semantic model using the `ref` function. +1. [Entities](#entities) — Uses the columns from entities as join keys and indicate their type as primary, foreign, or unique keys with the `type` parameter. +1. [Dimensions](#dimensions) — Different ways to group or slice data for a metric, they can be `time-based` or `categorical`. +1. [Measures](#measures) — Aggregations applied to columns in your data model. They can be the final metric or used as building blocks for more complex metrics. + + +## Semantic models components + +The following example displays a complete configuration and detailed descriptions of each field: + +```yml +semantic_models: + - name: transaction # A semantic model with the name Transactions + model: ref('fact_transactions') # References the dbt model named `fact_transactions` + description: "Transaction fact table at the transaction level. This table contains one row per transaction and includes the transaction timestamp." + default: + agg_time_dimension: transaction_date + + entities: # Entities included in the table are defined here. MetricFlow will use these columns as join keys. + - name: transaction + type: primary + expr: transaction_id + - name: customer + type: foreign + expr: customer_id + + + dimensions: # dimensions are qualitative values such as names, dates, or geographical data. They provide context to metrics and allow "metric by group" data slicing. + - name: transaction_date + type: time + type_params: + time_granularity: day + + - name: transaction_location + type: categorical + expr: order_country + + measures: # Measures are columns we perform an aggregation over. Measures are inputs to metrics. 
+ - name: transaction_total + description: "The total value of the transaction." + agg: sum + + - name: sales + description: "The total sale of the transaction." + agg: sum + expr: transaction_total + + - name: median_sales + description: "The median sale of the transaction." + agg: median + expr: transaction_total + + - name: customers # Another semantic model called customers. + model: ref('dim_customers') + description: "A customer dimension table." + + entities: + - name: customer + type: primary + expr: customer_id + + dimensions: + - name: first_name + type: categorical +``` + +### Name + +Define the name of the semantic model. You must define a unique name for the semantic model. The semantic graph will use this name to identify the model, and you can update it at any time. + +### Description + +Includes important details in the description of the semantic model. This description will primarily be used by other configuration contributors. You can use the pipe operator `(|)` to include multiple lines in the description. + +### Model + +Specify the dbt model for the semantic model using the [`ref` function](/reference/dbt-jinja-functions/ref). + +### Entities + +To specify the [entities](/docs/build/entities) in your model, use their columns as join keys and indicate their `type` as primary, foreign, or unique keys with the type parameter. + + + + + +Here are the types of keys: + +- **Primary** — Only one record per row in the table, and it includes every record in the data platform. +- **Unique** — Only one record per row in the table, but it may have a subset of records in the data platform. Null values may also be present. +- **Foreign** — Can have zero, one, or multiple instances of the same record. Null values may also be present. +- **Natural** — A column or combination of columns in a table that uniquely identifies a record based on real-world data. For example, the `sales_person_id` can serve as a natural key in a `sales_person_department` dimension table. + + + + +This example shows a semantic model with three entities and their entity types: `transaction` (primary), `order` (foreign), and `user` (foreign). + +To reference a desired column, use the actual column name from the model in the `name` parameter. You can also use `name` as an alias to rename the column, and the `expr` parameter to refer to the original column name or a SQL expression of the column. + + +```yml +entity: + - name: transaction + type: primary + - name: order + type: foreign + expr: id_order + - name: user + type: foreign + expr: substring(id_order FROM 2) +``` + +You can refer to entities (join keys) in a semantic model using the `name` parameter. Entity names must be unique within a semantic model, and identifier names can be non-unique across semantic models since MetricFlow uses them for [joins](/docs/build/join-logic). + + + + +### Dimensions + +[Dimensions](/docs/build/dimensions) are the different ways you can group or slice data for a metric. It can be time-consuming and error-prone to anticipate all possible options in a single table, such as region, country, user role, and so on. + +MetricFlow simplifies this by allowing you to query all metric groups and construct the join during the query. To specify dimensions parameters, include the `name` (either a column or SQL expression) and `type` (`categorical` or `time`). Categorical groups represent qualitative values, while time groups represent dates of varying granularity. 
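As a small sketch (the column names are placeholders), the two dimension types look like this in a semantic model:

```yaml
dimensions:
  - name: order_country   # categorical: a qualitative grouping
    type: categorical
  - name: ordered_at      # time: grouped at a declared granularity
    type: time
    type_params:
      time_granularity: day
```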
+ +dimensions are identified using the name parameter, just like identifiers. The naming of groups must be unique within a semantic model, but not across semantic models since MetricFlow, uses entities to determine the appropriate groups. + +:::info For time groups + +For semantic models with a measure, you must have a primary time group. + +::: + +### Measures + +[Measures](/docs/build/measures) are aggregations applied to columns in your data model. They can be used as the foundational building blocks for more complex metrics, or be the final metric itself. Measures have various parameters which are listed in a table along with their descriptions and types. + +| Parameter | Description | Field type | +| --- | --- | --- | +| `name`| Provide a name for the measure, which must be unique and can't be repeated across all semantic models in your dbt project. | Required | +| `description` | Describes the calculated measure. | Optional | +| `agg` | dbt supports the following aggregations: `sum`, `max`, `min`, `count_distinct`, and `sum_boolean`. | Required | +| `expr` | You can either reference an existing column in the table or use a SQL expression to create or derive a new one. | Optional | +| `non_additive_dimension` | Non-additive dimensions can be specified for measures that cannot be aggregated over certain dimensions, such as bank account balances, to avoid producing incorrect results. | Optional | +| `create_metric`* | You can create a metric directly from a measure with create_metric: True and specify its display name with create_metric_display_name. | Optional | +_*Coming soon_ +## Related docs + +- [About MetricFlow](/docs/build/about-metricflow) +- [Dimensions](/docs/build/dimensions) +- [Entities](/docs/build/entities) +- [Measures](/docs/build/measures) diff --git a/website/docs/docs/build/simple.md b/website/docs/docs/build/simple.md new file mode 100644 index 00000000000..0092427699d --- /dev/null +++ b/website/docs/docs/build/simple.md @@ -0,0 +1,26 @@ +--- +title: "Simple metrics" +id: simple +description: "Use simple metrics to directly reference a single measure." +sidebar_label: Simple +tags: [Metrics, Semantic Layer] +--- + +Simple metrics are metrics that directly reference a single measure, without any additional measures involved. + + +``` yaml +metrics: + - name: cancellations + type: simple # Pointers to a measure you created in a data source + type_params: + measure: cancellations_usd # The measure you're creating a proxy of. + # For any metric optionally include a filter string which applies a dimensional filter when computing the metric + filter: | + {{dimension('value')}} > 100 and {{dimension('acquisition', entity_path=['user'])}} +``` diff --git a/website/docs/docs/build/sl-getting-started.md b/website/docs/docs/build/sl-getting-started.md new file mode 100644 index 00000000000..a2e176016ee --- /dev/null +++ b/website/docs/docs/build/sl-getting-started.md @@ -0,0 +1,132 @@ +--- +id: sl-getting-started +title: Get started with MetricFlow +description: "Learn how to create your first semantic model and metric." +sidebar_label: Get started with MetricFlow +tags: [Metrics, Semantic Layer] +--- + +This getting started page recommends a workflow to help you get started creating your first metrics. 
Here are the following steps you'll take: + +- [Create a semantic model](#create-a-semantic-model) +- [Create your metrics](#create-your-metrics) +- [Test and query your metrics](#test-and-query-your-metrics) + +## Prerequisites + +- Use the [command line (CLI)](/docs/core/about-the-cli) and have a dbt project and repository set up. + * Note: Support for dbt Cloud and integrations coming soon. +- Your dbt production environment must be on [dbt Core v1.6](/docs/dbt-versions/core) or higher. Support for the development environment coming soon. +- Have a dbt project connected to Snowflake or Postgres. + * Note: Support for BigQuery, Databricks, and Redshift coming soon. +- Have an understanding of key concepts in [MetricFlow](/docs/build/about-metricflow), which powers the revamped dbt Semantic Layer. +- Recommended — dbt Labs recommends you install the [MetricFlow CLI package](https://github.com/dbt-labs/metricflow) to test your metrics. + +:::tip +New to dbt or metrics? Try our [Jaffle shop example project](https://github.com/dbt-labs/jaffle-sl-template) to help you get started! +::: + +## Install MetricFlow + +Before you begin, make sure you install the `metricflow` and [dbt adapter](/docs/supported-data-platforms) via PyPI in the CLI. To install them, open the command line interface (CLI) and use the pip install command `pip install "dbt-metricflow[your_adapter_name]"`. + +Note that specifying `[your_adapter_name]` is required. This is because you must install MetricFlow as an extension of a dbt adapter. For example, for a Snowflake adapter, run `pip install "dbt-metricflow[snowflake]"`. + +Currently, the supported adapters are Snowflake and Postgres (BigQuery, Databricks, and Redshift coming soon). + +## Create a semantic model + +MetricFlow, which powers the dbt Semantic Layer, has two main objects: [semantic models](/docs/build/semantic-models) and [metrics](/docs/build/metrics-overview). You can think of semantic models as nodes in your semantic graph, connected via entities as edges. MetricFlow takes semantic models defined in YAML configuration files as inputs and creates a semantic graph that you can use to query metrics. + +This step will guide you through setting up your semantic models, which consists of [entities](/docs/build/entities), [dimensions](/docs/build/dimensions), and [measures](/docs/build/measures). + +1. Name your semantic model, fill in appropriate metadata, and map it to a model in your dbt project. +```yaml +semantic_models: + - name: transactions + description: | + This table captures every transaction starting July 02, 2014. Each row represents one transaction + model: ref('fact_transactions') + ``` + +2. Define your entities. These are the keys in your table that MetricFlow will use to join other semantic models. These are usually columns like `customer_id`, `transaction_id`, and so on. + +```yaml + entities: + - name: transaction + type: primary + expr: id_transaction + - name: customer + type: foreign + expr: id_customer + ``` + +3. Define your dimensions and measures. dimensions are properties of the records in your table that are non-aggregatable. They provide categorical or time-based context to enrich metrics. Measures are the building block for creating metrics. They are numerical columns that MetricFlow aggregates to create metrics. + +```yaml +measures: + - name: transaction_amount_usd + description: The total USD value of the transaction. 
+ agg: sum + dimensions: + - name: is_large + type: categorical + expr: case when transaction_amount_usd >= 30 then true else false end +``` + +:::tip +If you're familiar with writing SQL, you can think of dimensions as the columns you would group by and measures as the columns you would aggregate. +```sql +select + metric_time_day, -- time + country, -- categorical dimension + sum(revenue_usd) -- measure +from + snowflake.fact_transactions -- sql table +group by metric_time_day, country -- dimensions + ``` +::: + +## Create your metrics + +Now that you've created your first semantic model, it's time to define your first metric. MetricFlow supports different metric types like [simple](/docs/build/simple), [ratio](/docs/build/ratio), [cumulative](/docs/build/cumulative), and [derived](/docs/build/derived). You can define metrics in the same YAML files as your semantic models, or create a new file. + +The example metric we'll create is a simple metric that refers directly to a measure, based on the `transaction_amount_usd` measure, which will be implemented as a `sum()` function in SQL. + +```yaml +--- +metrics: + - name: transaction_amount_usd + type: simple + type_params: + measure: transaction_amount_usd +``` + +Interact and test your metric using the CLI before committing it to your MetricFlow repository. + +## Test and query your metrics + +Follow these steps to test and query your metrics using MetricFlow: + +1. If you haven't done so already, make sure you [install MetricFlow](#install-metricflow). + +2. Run `mf version` to see your CLI version. If you don't have the CLI installed, run `pip install --upgrade "dbt-metricflow[your_adapter_name]"`. For example, if you have a Snowflake adapter, run `pip install --upgrade "dbt-metricflow[snowflake]"`. + +3. Save your files and run `mf validate-configs` to validate the changes before committing them + +4. Run `mf query --metrics --dimensions ` to query the metrics and dimensions you want to see in the CLI. + +5. Verify that the metric values are what you expect. You can view the generated SQL if you enter `--explain` in the CLI. + +6. Then commit your changes to push them to your git repo. + + + +## Related docs + +- [The dbt Semantic Layer: what’s next](https://www.getdbt.com/blog/dbt-semantic-layer-whats-next/) blog post +- [About MetricFlow](/docs/build/about-metricflow) +- [Semantic models](/docs/build/semantic-models) +- [Metrics](/docs/build/metrics-overview) diff --git a/website/docs/docs/build/snapshots.md b/website/docs/docs/build/snapshots.md index fabb1a243d2..01330c5f0aa 100644 --- a/website/docs/docs/build/snapshots.md +++ b/website/docs/docs/build/snapshots.md @@ -10,7 +10,6 @@ id: "snapshots" * [Snapshot properties](/reference/snapshot-properties) * [`snapshot` command](/reference/commands/snapshot) -## Overview ### What are snapshots? Analysts often need to "look back in time" at previous data states in their mutable tables. While some source data systems are built in a way that makes accessing historical data possible, this is not always the case. dbt provides a mechanism, **snapshots**, which records changes to a mutable over time. 
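For orientation, here is a minimal sketch of a snapshot that uses the timestamp strategy; the snapshot, source, schema, and column names are placeholders:

```sql
{% snapshot orders_snapshot %}

{{
    config(
        target_schema='snapshots',
        unique_key='id',
        strategy='timestamp',
        updated_at='updated_at'
    )
}}

select * from {{ source('jaffle_shop', 'orders') }}

{% endsnapshot %}
```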
@@ -415,3 +414,4 @@ Snapshot results: + diff --git a/website/docs/docs/build/validation.md b/website/docs/docs/build/validation.md new file mode 100644 index 00000000000..808d054f021 --- /dev/null +++ b/website/docs/docs/build/validation.md @@ -0,0 +1,55 @@ +--- +title: Validations +id: validation +description: "The Semantic Layer, powered by MetricFlow, has three types of built-in validations, including Parsing Validation, Semantic Validation, and Data Warehouse validation, which are performed in a sequential and blocking manner." +sidebar_label: "Validations" +tags: [Metrics, Semantic Layer] +--- + +Validations refer to the process of checking whether a system or configuration meets the expected requirements or constraints. In the case of the Semantic Layer, powered by MetricFlow, there are three built-in validations — [parsing](#parsing), [semantic](#semantic), and [data platform](#data-platform). + +These validations ensure that configuration files follow the expected schema, the semantic graph doesn't violate any constraints, and semantic definitions in the graph exist in the physical table - providing effective data governance support. These three validation steps occur sequentially and must succeed before proceeding to the next step. + +The code that handles validation [can be found here](https://github.com/dbt-labs/dbt-semantic-interfaces/tree/main/dbt_semantic_interfaces/validations) for those who want to dive deeper into this topic. + +## Prerequisites + +- You have installed the [MetricFlow CLI package](https://github.com/dbt-labs/metricflow) + +## Validations command + +You can run validations from the CLI with the following commands: + +```bash +mf validate-configs +``` + +## Parsing + +In this validation step, we ensure your config files follow the defined schema for each semantic graph object and can be parsed successfully. It validates the schema for the following core objects: + +* Semantic models +* Identifiers +* Measures +* Dimensions +* Metrics + +## Semantic + +This validation step occurs after we've built your semantic graph. The Semantic Layer, powered by MetricFlow, runs a suite of tests to ensure that your semantic graph doesn't violate any constraints. For example, we check to see if measure names are unique, or if metrics referenced in materialization exist. The current semantic rules we check for are: + +1. Check those semantic models with measures have a valid time dimension +2. Check that there is only one primary identifier defined in each semantic model +3. Dimension consistency +4. Unique measures in semantic models +5. Measures in metrics are valid +7. Cumulative metrics are configured properly + +## Data platform + +This type of validation Checks to see if the semantic definitions in your semantic graph exist in the underlying physical table. To test this, we run queries against your data platform to ensure the generated SQL for semantic models, dimensions, and metrics will execute. We run the following checks + +* Check that measures and dimensions exist +* Check that underlying tables for data sources exist +* Check that the generated SQL for metrics will execute + diff --git a/website/docs/docs/cloud/about-cloud/browsers.md b/website/docs/docs/cloud/about-cloud/browsers.md new file mode 100644 index 00000000000..4a04f70171b --- /dev/null +++ b/website/docs/docs/cloud/about-cloud/browsers.md @@ -0,0 +1,24 @@ +--- +title: "Supported browsers" +id: "browsers" +description: "dbt Cloud supports the latest browsers like Chrome and Firefox." 
+--- + +To have the best experience with dbt Cloud, we recommend using the latest versions of the following browsers: + +- [Google Chrome](https://www.google.com/chrome/) — Latest version is fully supported in dbt Cloud +- [Mozilla Firefox](https://www.mozilla.org/en-US/firefox/) — Latest version is fully supported in dbt Cloud +- [Apple Safari](https://www.apple.com/safari/) — Latest version support provided on a best-effort basis +- [Microsoft Edge](https://www.microsoft.com/en-us/edge?form=MA13FJ&exp=e00) — Latest version support provided on a best-effort basis + +dbt Cloud provides two types of browser support: + +- Fully supported — dbt Cloud is fully tested and supported on these browsers. Features display and work as intended. +- Best effort — You can access dbt Cloud on these browsers. Features may not display or work as intended. + +You may still be able to access and use dbt Cloud even without using the latest recommended browser or an unlisted browser. However, some features might not display as intended. + +:::note +To improve your experience using dbt Cloud, we suggest that you turn off ad blockers. +::: + diff --git a/website/docs/docs/cloud/about-cloud/dbt-cloud-features.md b/website/docs/docs/cloud/about-cloud/dbt-cloud-features.md index 5063305e77b..f301dfce34b 100644 --- a/website/docs/docs/cloud/about-cloud/dbt-cloud-features.md +++ b/website/docs/docs/cloud/about-cloud/dbt-cloud-features.md @@ -1,19 +1,16 @@ --- title: "dbt Cloud features" id: "dbt-cloud-features" +sidebar_label: "dbt Cloud features" +description: "Explore dbt Cloud's features and learn why dbt Cloud is the fastest way to deploy dbt" hide_table_of_contents: true --- -dbt Cloud is the fastest and most reliable way to deploy dbt. Develop, test, schedule, document, and investigate data models all in one web-based UI. +dbt Cloud is the fastest and most reliable way to deploy dbt. Develop, test, schedule, document, and investigate data models all in one browser-based UI. In addition to providing a hosted architecture for running dbt across your organization, dbt Cloud comes equipped with turnkey support for scheduling jobs, CI/CD, hosting documentation, monitoring & alerting, and an integrated development environment (IDE). -In addition to providing a hosted architecture for running dbt Core across your organization, dbt Cloud comes equipped with turnkey support for scheduling jobs, CI/CD, hosting documentation, monitoring & alerting, and an integrated developer environment (IDE). - -dbt Cloud's [flexible plans](https://www.getdbt.com/pricing/) and features make it well-suited for data teams of any size — sign up for your [free 14 day trial](https://www.getdbt.com/signup/)! - -To have the best experience using dbt Cloud, we recommend you use modern and up-to-date web browsers like Chrome, Safari, Edge, and Firefox.

- -
+dbt Cloud's [flexible plans](https://www.getdbt.com/pricing/) and features make it well-suited for data teams of any size — sign up for your [free 14-day trial](https://www.getdbt.com/signup/)! +

- ***These features are available on [selected plans](https://www.getdbt.com/pricing/).** - +*These features are available on [selected plans](https://www.getdbt.com/pricing/). ## Related docs - [dbt Cloud plans and pricing](https://www.getdbt.com/pricing/) - [Quickstart guides](/quickstarts) - [dbt Cloud IDE](/docs/cloud/dbt-cloud-ide/develop-in-the-cloud) -- [dbt Cloud support](/docs/dbt-support) -- [Become a contributor](https://docs.getdbt.com/community/contribute) + diff --git a/website/docs/docs/cloud/about-cloud/regions-ip-addresses.md b/website/docs/docs/cloud/about-cloud/regions-ip-addresses.md index 7146909f02b..bc8c180f2fd 100644 --- a/website/docs/docs/cloud/about-cloud/regions-ip-addresses.md +++ b/website/docs/docs/cloud/about-cloud/regions-ip-addresses.md @@ -14,7 +14,7 @@ dbt Cloud is [hosted](/docs/cloud/about-cloud/architecture) in multiple regions | North America [^1] | AWS us-east-1 (N. Virginia) | cloud.getdbt.com | 52.45.144.63
54.81.134.249
52.22.161.231 | ✅ | ✅ | ✅ | | EMEA [^1] | AWS eu-central-1 (Frankfurt) | emea.dbt.com | 3.123.45.39
3.126.140.248
3.72.153.148 | ❌ | ❌ | ✅ | | APAC [^1] | AWS ap-southeast-2 (Sydney)| au.dbt.com | 52.65.89.235
3.106.40.33
13.239.155.206
| ❌ | ❌ | ✅ | -| Virtual Private dbt or Single tenant | Customized | Customized | Ask [Support](/guides/legacy/getting-help#dbt-cloud-support) for your IPs | ❌ | ❌ | ✅ | +| Virtual Private dbt or Single tenant | Customized | Customized | Ask [Support](/community/resources/getting-help#dbt-cloud-support) for your IPs | ❌ | ❌ | ✅ | [^1]: These regions support [multi-tenant](/docs/cloud/about-cloud/tenancy) deployment environments hosted by dbt Labs. diff --git a/website/docs/docs/cloud/about-cloud/tenancy.md b/website/docs/docs/cloud/about-cloud/tenancy.md index 0a20bc4e253..0d312767b82 100644 --- a/website/docs/docs/cloud/about-cloud/tenancy.md +++ b/website/docs/docs/cloud/about-cloud/tenancy.md @@ -1,10 +1,12 @@ --- title: Tenancy id: tenancy -description: "Information aboute single tenant and multi-tenant dbt Cloud instances" +description: "Information about single tenant and multi-tenant dbt Cloud instances" --- -dbt Cloud is available in both single (virtual private) and multi-tenant configurations. +import AboutCloud from '/snippets/_test-tenancy.md'; + + ### Multi-tenant @@ -22,4 +24,4 @@ _To learn more about setting up a dbt Cloud single tenant deployment, [please co ### Available features - \ No newline at end of file + diff --git a/website/docs/docs/cloud/connect-data-platform/about-connections.md b/website/docs/docs/cloud/connect-data-platform/about-connections.md index 2866746eabc..65bfac3a90d 100644 --- a/website/docs/docs/cloud/connect-data-platform/about-connections.md +++ b/website/docs/docs/cloud/connect-data-platform/about-connections.md @@ -25,4 +25,4 @@ dbt Cloud will always connect to your data platform from the IP addresses specif Be sure to allow traffic from these IPs in your firewall, and include them in any database grants. -Allowing these IP addresses only enables the connection to your . However, you might want to send API requests from your restricted network to the dbt Cloud API. For example, you could use the API to send a POST request that [triggers a job to run](https://docs.getdbt.com/dbt-cloud/api-v2#operation/triggerRun). Using the dbt Cloud API requires that you allow the `cloud.getdbt.com` subdomain. For more on the dbt Cloud architecture, see [Deployment architecture](/docs/cloud/about-cloud/architecture). +Allowing these IP addresses only enables the connection to your . However, you might want to send API requests from your restricted network to the dbt Cloud API. For example, you could use the API to send a POST request that [triggers a job to run](https://docs.getdbt.com/dbt-cloud/api-v2-legacy#operation/triggerRun). Using the dbt Cloud API requires that you allow the `cloud.getdbt.com` subdomain. For more on the dbt Cloud architecture, see [Deployment architecture](/docs/cloud/about-cloud/architecture). diff --git a/website/docs/docs/cloud/dbt-cloud-ide/develop-in-the-cloud.md b/website/docs/docs/cloud/dbt-cloud-ide/develop-in-the-cloud.md index e3199984d05..c5bb2dc8b9e 100644 --- a/website/docs/docs/cloud/dbt-cloud-ide/develop-in-the-cloud.md +++ b/website/docs/docs/cloud/dbt-cloud-ide/develop-in-the-cloud.md @@ -65,7 +65,7 @@ To stay informed on IDE updates, read [dbt Cloud IDE release notes](/tags/ide), | **Lint and Format** | [Lint and format](/docs/cloud/dbt-cloud-ide/lint-format) your files with a click of a button, powered by SQLFluff, sqlfmt, Prettier, and Black. | **Git diff view** | Ability to see what has been changed in a file before you make a pull request. | **dbt autocomplete** | New autocomplete features to help you develop faster:

- Use `ref` to autocomplete your model names
- Use `source` to autocomplete your source name + table name
- Use `macro` to autocomplete your arguments
- Use `env var` to autocomplete env var
- Start typing a hyphen (-) to use in-line autocomplete in a YAML file | -| ** in the IDE** | You can see how models are used as building blocks from left to right to transform your data from raw sources into cleaned-up modular derived pieces and final outputs on the far right of the DAG. The default view is 2+model+2 (defaults to display 2 nodes away), however you can change it to +model+ (full ). | +| ** in the IDE** | You can see how models are used as building blocks from left to right to transform your data from raw sources into cleaned-up modular derived pieces and final outputs on the far right of the DAG. The default view is 2+model+2 (defaults to display 2 nodes away), however, you can change it to +model+ (full ). Note the `--exclude` flag isn't supported. | | **Status bar** | This area provides you with useful information about your IDE and project status. You also have additional options like enabling light or dark mode, restarting the IDE, or [recloning your repo](/docs/collaborate/git/version-control-basics). | **Dark mode** | From the status bar in the Cloud IDE, enable dark mode for a great viewing experience in low-light environments. @@ -92,11 +92,11 @@ The Cloud IDE needs explicit action to save your changes. There are three ways y :::info📌 -New to dbt? Check out our [quickstart guide](/quickstarts) to build your first dbt project in the Cloud IDE! +New to dbt? Check out our [quickstart guides](/quickstarts) to build your first dbt project in the Cloud IDE! ::: -In order to start experiencing the great features of the Cloud IDE, you need to first set up a [dbt Cloud development environment](/docs/collaborate/environments/dbt-cloud-environments). In the following steps, we outline how to set up developer credentials and access the IDE. If you're creating a new project, you will automatically configure this during the project setup. +In order to start experiencing the great features of the Cloud IDE, you need to first set up a [dbt Cloud development environment](/docs/dbt-cloud-environments). In the following steps, we outline how to set up developer credentials and access the IDE. If you're creating a new project, you will automatically configure this during the project setup. The IDE uses developer credentials to connect to your data platform. These developer credentials should be specific to your user and they should *not* be super user credentials or the same credentials that you use for your production deployment of dbt. @@ -141,6 +141,8 @@ The dbt Cloud IDE makes it possible to [build and view](/docs/collaborate/build- ## Related questions +
+
Is there a cost to using the Cloud IDE?
diff --git a/website/docs/docs/cloud/dbt-cloud-ide/ide-user-interface.md b/website/docs/docs/cloud/dbt-cloud-ide/ide-user-interface.md index efc106fea1f..63a4f9a0312 100644 --- a/website/docs/docs/cloud/dbt-cloud-ide/ide-user-interface.md +++ b/website/docs/docs/cloud/dbt-cloud-ide/ide-user-interface.md @@ -98,9 +98,10 @@ The console section, located below the File editor, includes various console tab 6. **Compiled Code tab —** The Compile button triggers a compile invocation that generates compiled code, which is displayed in the Compiled Code tab. -7. **Lineage tab —** The Lineage tab in the File Editor displays the active model's lineage or . By default, it shows two degrees of lineage in both directions (`2+model_name+2`), however you can change it to +model+ (full DAG). +7. **Lineage tab —** The Lineage tab in the File Editor displays the active model's lineage or . By default, it shows two degrees of lineage in both directions (`2+model_name+2`), however, you can change it to +model+ (full DAG). - Double-click a node in the DAG to open that file in a new tab - - Expand the DAG and use node selection syntax (select or exclude) to view a subset of your DAG + - Expand or shrink the DAG using node selection syntax. + - Note, the `--exclude` flag isn't supported. diff --git a/website/docs/docs/cloud/dbt-cloud-ide/lint-format.md b/website/docs/docs/cloud/dbt-cloud-ide/lint-format.md index ee72da25cfb..c486ac8b69c 100644 --- a/website/docs/docs/cloud/dbt-cloud-ide/lint-format.md +++ b/website/docs/docs/cloud/dbt-cloud-ide/lint-format.md @@ -10,7 +10,7 @@ Enhance your development workflow by integrating with popular linters and format
What are linters and formatters? -Linters analyze code for errors, bugs, and style issues, while formatters fix style and formatting rules. +Linters analyze code for errors, bugs, and style issues, while formatters apply style and formatting rules. Read more about when to use linters or formatters in the FAQs below.
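For illustration, here is a hypothetical before-and-after of what a formatter changes. The model and column names are placeholders, and the exact output depends on the tool and the rules you configure; a linter would additionally flag issues a formatter leaves alone, such as ambiguous references or inconsistent aliasing:

```sql
-- Before: inconsistent keyword casing, spacing, and indentation
SELECT id,customer_name, sum( amount ) AS total_amount
from raw_orders GROUP BY 1,2

-- After formatting: one keyword style, one column per line, consistent indentation
select
    id,
    customer_name,
    sum(amount) as total_amount
from raw_orders
group by 1, 2
```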
@@ -183,6 +183,23 @@ To format your Python code, dbt Cloud integrates with [Black](https://black.read ## FAQs +
+When should I use SQLFluff and when should I use sqlfmt? + +SQLFluff and sqlfmt can both format SQL code, but there are differences that may make one preferable to the other depending on your use case.
+ +SQLFluff is a SQL code linter and formatter. This means it analyzes your code to identify potential issues and bugs and checks that it follows coding standards. It also formats your code according to a set of [customizable](#customize-linting) rules, helping you keep your SQL well-formatted and consistent with styling best practices.
+ +sqlfmt is a SQL code formatter. This means it automatically formats your SQL code according to a set of formatting rules that aren't customizable. It focuses solely on the appearance and layout of the code, which helps ensure consistent indentation, line breaks, and spacing. sqlfmt doesn't analyze your code for errors or bugs and doesn't look at coding issues beyond code formatting.
+ +You can use either SQLFluff or sqlfmt depending on your preference and what works best for you: + +- Use SQLFluff to have your code linted and formatted (meaning it analyzes your code for errors and bugs and formats your styling). It also gives you the flexibility to customize your own rules, as sketched below. + +- Use sqlfmt to have your code formatted only, without analyzing it for errors and bugs. You can use sqlfmt out of the box, making it convenient to use right away without having to configure it. + +
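As a minimal sketch of that customization, a `.sqlfluff` file at the root of your project might look like the following. The dialect, excluded rule, and capitalisation policy shown here are only illustrative choices, and rule names vary between SQLFluff versions:

```
[sqlfluff]
# Illustrative dialect; match this to your data platform
dialect = snowflake
# Hypothetical rule to skip project-wide
exclude_rules = structure.column_order

[sqlfluff:rules:capitalisation.keywords]
# Enforce lowercase keywords instead of the default consistency check
capitalisation_policy = lower
```

sqlfmt has no equivalent rules file, which is the trade-off described above.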
+
Can I nest .sqlfluff files? diff --git a/website/docs/docs/cloud/git/authenticate-azure.md b/website/docs/docs/cloud/git/authenticate-azure.md index abac4fd1b59..f3a534ac923 100644 --- a/website/docs/docs/cloud/git/authenticate-azure.md +++ b/website/docs/docs/cloud/git/authenticate-azure.md @@ -22,3 +22,7 @@ Connect your dbt Cloud profile to Azure DevOps using OAuth: You will be directed back to dbt Cloud, and your profile should be linked. You are now ready to develop in dbt Cloud! + +## FAQs + + diff --git a/website/docs/docs/cloud/git/connect-github.md b/website/docs/docs/cloud/git/connect-github.md index 7f47636afa3..6113e3ccb42 100644 --- a/website/docs/docs/cloud/git/connect-github.md +++ b/website/docs/docs/cloud/git/connect-github.md @@ -5,7 +5,6 @@ id: "connect-github" sidebar_label: "Connect to GitHub" --- -## Overview Connecting your GitHub account to dbt Cloud provides convenience and another layer of security to dbt Cloud: - Log into dbt Cloud using OAuth through GitHub. @@ -71,3 +70,8 @@ To connect a personal GitHub account: 4. Once you approve authorization, you will be redirected to dbt Cloud, and you should now see your connected account. The next time you log into dbt Cloud, you will be able to do so via OAuth through GitHub, and if you're on the Enterprise plan, you're ready to use the dbt Cloud IDE. + + +## FAQs + + diff --git a/website/docs/docs/cloud/git/connect-gitlab.md b/website/docs/docs/cloud/git/connect-gitlab.md index 9ac7c254f11..e66fa577e5b 100644 --- a/website/docs/docs/cloud/git/connect-gitlab.md +++ b/website/docs/docs/cloud/git/connect-gitlab.md @@ -4,7 +4,6 @@ description: "Learn how connecting your GitLab account provides convenience and id: "connect-gitlab" --- -## Overview Connecting your GitLab account to dbt Cloud provides convenience and another layer of security to dbt Cloud: - Import new GitLab repos with a couple clicks during dbt Cloud project setup. @@ -117,3 +116,9 @@ If you do see your repository listed, but are unable to import the repository su - You are a maintainer of that repository. Only users with maintainer permissions can set up repository connections. If you imported a repository using the dbt Cloud native integration with GitLab, you should be able to see the clone strategy is using a `deploy_token`. If it's relying on an SSH key, this means the repository was not set up using the native GitLab integration, but rather using the generic git clone option. The repository must be reconnected in order to get the benefits described above. + +## FAQs + + + + diff --git a/website/docs/docs/cloud/manage-access/about-access.md b/website/docs/docs/cloud/manage-access/about-access.md index ce1e1c48e7d..e1cb4f65a35 100644 --- a/website/docs/docs/cloud/manage-access/about-access.md +++ b/website/docs/docs/cloud/manage-access/about-access.md @@ -31,14 +31,12 @@ user can only have one type of license at any given time. A user's license type controls the features in dbt Cloud that the user is able to access. dbt Cloud's three license types are: - - **Read Only** - - **Developer** - - **IT** + + - **Developer** — User may be granted _any_ permissions. + - **Read Only** — User has read-only permissions applied to all dbt Cloud resources regardless of the role-based permissions that the user is assigned. 
+ - **IT** — User has [Security Admin](/docs/cloud/manage-access/enterprise-permissions#security-admin) and [Billing Admin](docs/cloud/manage-access/enterprise-permissions#billing-admin) permissions applied regardless of the role-based permissions that the user is assigned. For more information on these license types, see [Seats & Users](/docs/cloud/manage-access/seats-and-users). -At a high level, Developers may be granted _any_ permissions, whereas Read Only -users will have read-only permissions applied to all dbt Cloud resources -regardless of the role-based permissions that the user is assigned. IT users will have Security Admin permissions applied regardless of the role-based permissions that the user is assigned. ## Role-based access control @@ -78,7 +76,7 @@ page in your Account Settings. /> -### SSO Mappings +### SSO mappings SSO Mappings connect Identity Provider (IdP) group membership to dbt Cloud group membership. When a user logs into dbt Cloud via a supported identity provider, @@ -96,7 +94,7 @@ groups. ::: -### Permission Sets +### Permission sets Permission sets are predefined collections of granular permissions. Permission sets combine low-level permission grants into high-level roles that can be diff --git a/website/docs/docs/cloud/manage-access/audit-log.md b/website/docs/docs/cloud/manage-access/audit-log.md index 52c27d99ab2..818ec553e7b 100644 --- a/website/docs/docs/cloud/manage-access/audit-log.md +++ b/website/docs/docs/cloud/manage-access/audit-log.md @@ -7,12 +7,6 @@ sidebar_label: "Audit log" To review actions performed by people in your organization, dbt provides logs of audited user and system events in real time. The audit log appears as events happen and includes details such as who performed the action, what the action was, and when it was performed. You can use these details to troubleshoot access issues, perform security audits, or analyze specific events. -:::note - -Single-tenant deployment environments hosted on Microsoft Azure do not currently support audit logs. For more information, refer to [Single tenant](/docs/cloud/about-cloud/tenancy). - -::: - You must be an **Account Admin** to access the audit log and this feature is only available on Enterprise plans. The dbt Cloud audit log stores all the events that occurred in your organization in real-time, including: diff --git a/website/docs/docs/cloud/manage-access/auth0-migration.md b/website/docs/docs/cloud/manage-access/auth0-migration.md index 3262c53079a..af430772ca4 100644 --- a/website/docs/docs/cloud/manage-access/auth0-migration.md +++ b/website/docs/docs/cloud/manage-access/auth0-migration.md @@ -2,7 +2,7 @@ title: "Migrating to Auth0 for SSO" id: "auth0-migration" sidebar: "SSO Auth0 Migration" -description: "Required actions for migrating to Auth0 for SSO services on dbt Cloud" +description: "Required actions for migrating to Auth0 for SSO services on dbt Cloud." --- :::warning Limited availability @@ -28,48 +28,48 @@ Alternatively, you can start the process from the **Settings** page in the **Sin -Once you have opted to begin the migration process, the following steps will vary depending on the configured identity provider. Skip to the section that's right for your environment. These steps only apply to customers going through the migration; new setups will use the existing [setup instructions](/docs/cloud/manage-access/sso-overview). +Once you have opted to begin the migration process, the following steps will vary depending on the configured identity provider. 
You can just skip to the section that's right for your environment. These steps only apply to customers going through the migration; new setups will use the existing [setup instructions](/docs/cloud/manage-access/sso-overview). :::warning Login {slug} Slugs should contain only letters, numbers, and dashes. Make sure to remove underscores (if they exist) from login slugs: * before migrating on the **Account Settings** page, or -* while migrating (but before enabling) as show in the Migrate authentication screenshots for your respective setup. +* while migrating (before enabling), as shown in the Migrate authentication screenshots for your respective setup. After changing the slug, admins must share the new login URL with their dbt Cloud users. ::: ## SAML 2.0 and Okta -SAML 2.0 users must update a few fields in the SSO app configuration to match the new Auth0 URL and URI. You can approach this by editing the existing SSO app settings or creating a new one to accommodate the Auth0 settings. One approach isn't inherently better, so choose whichever works best for your organization. +SAML 2.0 users must update a few fields in the SSO app configuration to match the new Auth0 URL and URI. You can approach this by editing the existing SSO app settings or creating a new one to accommodate the Auth0 settings. One approach isn't inherently better, so you can choose whichever works best for your organization. The fields that will be updated are: - Single sign-on URL — `https:///login/callback?connection={slug}` - Audience URI (SP Entity ID) — `urn:auth0::{slug}` -Replace `{slug}` with your organization’s login slug. It must be unique across all dbt Cloud instances and is usually something like your company name separated by dashes (for example, `dbt-labs`). +Sample steps to update (you must complete all of them to ensure uninterrupted access to dbt Cloud): + +1. Replace `{slug}` with your organization’s login slug. It must be unique across all dbt Cloud instances and is usually something like your company name separated by dashes (for example, `dbt-labs`). Here is an example of an updated SAML 2.0 setup in Okta. -After the configuration is saved, your SAML settings will look something like this: +2. Save the configuration, and your SAML settings will look something like this: -Once you have saved this information in the SSO environment, you must update the single sign-on URL fields in the dbt Cloud migration window and provide the updated x.509 certificate. - -Toggle the `Enable new SSO authentication` option to ensure the traffic is routed correctly. _The new SSO migration action is final and cannot be undone_ +3. Toggle the `Enable new SSO authentication` option to ensure the traffic is routed correctly. _The new SSO migration action is final and cannot be undone_ -Save the settings and test the new configuration using the SSO login URL provided on the settings page. +4. Save the settings and test the new configuration using the SSO login URL provided on the settings page. ## Google Workspace Google Workspace admins updating their SSO APIs with the Auth0 URL won't have to do much if it is an existing setup. This can be done as a new project or by editing an existing SSO setup. No additional scopes are needed since this is migrating from an existing setup. All scopes were defined during the initial configuration. -Steps to update: +Steps to update (you must complete all of them to ensure uninterrupted access to dbt Cloud): 1. 
Open the [Google Cloud console](https://console.cloud.google.com/) and select the project with your dbt Cloud single sign-on settings. From the project page **Quick Access**, select **APIs and Services** @@ -99,7 +99,9 @@ You must complete the domain authorization before you toggle `Enable New SSO Aut Azure Active Directory admins will need to make a slight adjustment to the existing authentication app in the Azure AD portal. This migration does not require that the entire app be deleted or recreated; you can edit the existing app. Start by opening the Azure portal and navigating to the Active Directory overview. -1. Click **App Regitstrations** on the left side menu. +Steps to update (you must complete all of them to ensure uninterrupted access to dbt Cloud): + +1. Click **App Registrations** on the left side menu. diff --git a/website/docs/docs/cloud/manage-access/cloud-seats-and-users.md b/website/docs/docs/cloud/manage-access/cloud-seats-and-users.md index dece68bf04a..62c193bb669 100644 --- a/website/docs/docs/cloud/manage-access/cloud-seats-and-users.md +++ b/website/docs/docs/cloud/manage-access/cloud-seats-and-users.md @@ -6,13 +6,14 @@ sidebar: "Users and licenses" --- In dbt Cloud, _licenses_ are used to allocate users to your account. There are three different types of licenses in dbt Cloud: -- Developer -- Read-only -- IT -The type of license a user is assigned controls which capabilities of dbt Cloud the user is permitted to access. Users with a Developer license can be granted access to the Deployment and [Development](/docs/cloud/dbt-cloud-ide/develop-in-the-cloud) functionality in dbt Cloud, whereas users with Read Only licenses are intended to view the [artifacts](/docs/deploy/artifacts) created in a dbt Cloud account. Users with an IT License can manage users, groups, and licenses, among other permissions. +- **Developer** — Granted access to the Deployment and [Development](/docs/cloud/dbt-cloud-ide/develop-in-the-cloud) functionality in dbt Cloud. +- **Read-only** — Intended to view the [artifacts](/docs/deploy/artifacts) created in a dbt Cloud account. +- **IT** — Can manage users, groups, and licenses, among other permissions. Available on Enterprise and Team plans only. -| Functionality | Developer User | Read Only Users | IT Users | +The user's assigned license determines the specific capabilities they can access in dbt Cloud. + +| Functionality | Developer User | Read Only Users | IT Users* | | ------------- | -------------- | --------------- | -------- | | Use the Developer IDE | ✅ | ❌ | ❌ | | Use Jobs | ✅ | ❌ | ❌ | @@ -20,6 +21,7 @@ The type of license a user is assigned controls which capabilities of dbt Cloud | API Access | ✅ | ❌ | ❌ | | Use [Source Freshness](/docs/deploy/source-freshness) | ✅ | ✅ | ❌ | | Use [Docs](/docs/collaborate/build-and-view-your-docs) | ✅ | ✅ | ❌ | +*Available on Enterprise and Team plans only and doesn't count toward seat usage. ## Licenses diff --git a/website/docs/docs/cloud/manage-access/enterprise-permissions.md b/website/docs/docs/cloud/manage-access/enterprise-permissions.md index f536b157233..3fb2ab93a8e 100644 --- a/website/docs/docs/cloud/manage-access/enterprise-permissions.md +++ b/website/docs/docs/cloud/manage-access/enterprise-permissions.md @@ -4,14 +4,9 @@ id: "enterprise-permissions" description: "Permission sets for Enterprise plans." --- -:::info Enterprise Feature +import SetUpPages from '/snippets/_available-enterprise-only.md'; -This guide describes a feature of the dbt Cloud Enterprise plan. 
-If you're interested in learning more about an Enterprise plan, contact us at sales@getdbt.com. - -::: - -## Overview + The dbt Cloud Enterprise plan supports a number of pre-built permission sets to help manage access controls within a dbt Cloud account. See the docs on [access @@ -61,6 +56,17 @@ Security Admins have access to modify certain account-level settings. Users with - View and export Audit Logs - Create, delete, and modify IP Restrictions +### Billing Admin + +- **Has permissions on:** Account-level settings +- **License restrictions:** must have a Developer or an IT license + +Billing Admins have access to modify certain account-level settings related to billing. Users with Billing Admin permissions can: + +- View and modify **Account Settings** such as: + - View billing information + - Modify billing information (accounts on the Team plan) + - This includes modifying Developer Seat counts for the Account ### Project Creator - **Has permissions on:** Authorized projects, account-level settings diff --git a/website/docs/docs/cloud/manage-access/licenses-and-groups.md b/website/docs/docs/cloud/manage-access/licenses-and-groups.md index 99d4acad997..51a0649b896 100644 --- a/website/docs/docs/cloud/manage-access/licenses-and-groups.md +++ b/website/docs/docs/cloud/manage-access/licenses-and-groups.md @@ -32,7 +32,7 @@ to access. dbt Cloud's three license types are: For more information on these license types, see [Seats & Users](/docs/cloud/manage-access/seats-and-users). At a high-level, Developers may be granted _any_ permissions, whereas Read Only users will have read-only permissions applied to all dbt Cloud resources -regardless of the role-based permissions that the user is assigned. IT users will have Security Admin permissions applied regardless of the role-based permissions that the user is assigned. +regardless of the role-based permissions that the user is assigned. IT users will have Security Admin and Billing Admin permissions applied regardless of the role-based permissions that the user is assigned. ## Role-based access control diff --git a/website/docs/docs/cloud/manage-access/self-service-permissions.md b/website/docs/docs/cloud/manage-access/self-service-permissions.md index ce893448e6e..7a086dd1eec 100644 --- a/website/docs/docs/cloud/manage-access/self-service-permissions.md +++ b/website/docs/docs/cloud/manage-access/self-service-permissions.md @@ -3,7 +3,6 @@ title: "Self-service permissions" description: "Learn how dbt Cloud administrators can use self-service permissions to control access in a dbt Cloud account." id: "self-service-permissions" --- -## Overview dbt Cloud supports two different permission sets to manage permissions for self-service accounts: **Member** and **Owner**. diff --git a/website/docs/docs/cloud/manage-access/set-up-sso-azure-active-directory.md b/website/docs/docs/cloud/manage-access/set-up-sso-azure-active-directory.md index c3b1f8b73da..13f49422832 100644 --- a/website/docs/docs/cloud/manage-access/set-up-sso-azure-active-directory.md +++ b/website/docs/docs/cloud/manage-access/set-up-sso-azure-active-directory.md @@ -5,11 +5,9 @@ id: "set-up-sso-azure-active-directory" sidebar_label: "Set up SSO with Azure AD" --- -:::info Enterprise Feature -This guide describes a feature of the dbt Cloud Enterprise plan. If you’re -interested in learning more about an Enterprise plan, contact us at -sales@getdbt.com. 
-::: +import SetUpPages from '/snippets/_sso-docs-mt-available.md'; + + dbt Cloud Enterprise supports single-sign on via Azure Active Directory (Azure AD). You will need permissions to create and manage a new Azure AD application. @@ -171,4 +169,4 @@ Now you have completed setting up SSO with Azure AD, the next steps will be to s Ensure that the domain name under which user accounts exist in Azure matches the domain you supplied in [Supplying credentials](#supplying-credentials) when you configured SSO. - \ No newline at end of file + diff --git a/website/docs/docs/cloud/manage-access/set-up-sso-google-workspace.md b/website/docs/docs/cloud/manage-access/set-up-sso-google-workspace.md index 2297fafc45a..314d1128cb0 100644 --- a/website/docs/docs/cloud/manage-access/set-up-sso-google-workspace.md +++ b/website/docs/docs/cloud/manage-access/set-up-sso-google-workspace.md @@ -4,11 +4,9 @@ description: "Learn how dbt Cloud administrators can use Single-Sign On (SSO) vi id: "set-up-sso-google-workspace" --- -:::info Enterprise Feature -This guide describes a feature of the dbt Cloud Enterprise plan. If you’re -interested in learning more about an Enterprise plan, contact us at -sales@getdbt.com. -::: +import SetUpPages from '/snippets/_sso-docs-mt-available.md'; + + dbt Cloud Enterprise supports Single-Sign On (SSO) via Google GSuite. You will need permissions to create and manage a new Google OAuth2 application, as well as diff --git a/website/docs/docs/cloud/manage-access/set-up-sso-okta.md b/website/docs/docs/cloud/manage-access/set-up-sso-okta.md index d5c91e15076..70de8285450 100644 --- a/website/docs/docs/cloud/manage-access/set-up-sso-okta.md +++ b/website/docs/docs/cloud/manage-access/set-up-sso-okta.md @@ -3,11 +3,9 @@ title: "Set up SSO with Okta" id: "set-up-sso-okta" --- -:::info Enterprise Feature +import SetUpPages from '/snippets/_sso-docs-mt-available.md'; -This guide describes a feature of the dbt Cloud Enterprise plan. If you’re interested in learning more about an Enterprise plan, contact us at sales@getdbt.com. - -::: + ## Okta SSO diff --git a/website/docs/docs/cloud/manage-access/set-up-sso-saml-2.0.md b/website/docs/docs/cloud/manage-access/set-up-sso-saml-2.0.md index c644d5d30fc..297e92600f7 100644 --- a/website/docs/docs/cloud/manage-access/set-up-sso-saml-2.0.md +++ b/website/docs/docs/cloud/manage-access/set-up-sso-saml-2.0.md @@ -3,12 +3,9 @@ title: "Set up SSO with SAML 2.0" id: "set-up-sso-saml-2.0" --- -:::info Enterprise Feature +import SetUpPages from '/snippets/_sso-docs-mt-available.md'; -This guide describes a feature of the dbt Cloud Enterprise plan. If you’re interested in learning -more about an Enterprise plan, contact us at sales@getdbt.com. - -::: + dbt Cloud Enterprise supports single-sign on (SSO) for any SAML 2.0-compliant identity provider (IdP). Currently supported features include: diff --git a/website/docs/docs/cloud/manage-access/sso-overview.md b/website/docs/docs/cloud/manage-access/sso-overview.md index 5af7f0d7721..b129b40c029 100644 --- a/website/docs/docs/cloud/manage-access/sso-overview.md +++ b/website/docs/docs/cloud/manage-access/sso-overview.md @@ -46,14 +46,6 @@ Then, assign all of these (and only these) to the user license. This step will a ## SSO enforcement -:::info Security Update - -Please read the following update if you've enabled SSO but still have non-admin users logging in with a password. The changes outlined here will be released after September 15, 2022. 
- -::: - -Starting September 15, 2022, we will be making these security changes to SSO to increase the security posture of your environment: - * **SSO Enforcement:** If you have SSO turned on in your organization, dbt Cloud will enforce SSO-only logins for all non-admin users. If an Account Admin already has a password, they can continue logging in with a password. * **SSO Re-Authentication:** dbt Cloud will prompt you to re-authenticate using your SSO provider every 24 hours to ensure high security. diff --git a/website/docs/docs/cloud/secure/about-privatelink.md b/website/docs/docs/cloud/secure/about-privatelink.md index d478a3437b1..7bd18f306b6 100644 --- a/website/docs/docs/cloud/secure/about-privatelink.md +++ b/website/docs/docs/cloud/secure/about-privatelink.md @@ -10,9 +10,13 @@ This feature is currently in Private Preview, and these instructions are specifi PrivateLink enables a private connection from any dbt Cloud Multi-Tenant environment to your data platform hosted on AWS using [AWS PrivateLink](https://aws.amazon.com/privatelink/) technology. PrivateLink allows dbt Cloud customers to meet security and compliance controls as it allows connectivity between dbt Cloud and your data platform without traversing the public internet. This feature is supported in most regions across NA, Europe, and Asia, but [contact us](https://www.getdbt.com/contact/) if you have questions about availability. +### Cross-region PrivateLink + +dbt Labs has a worldwide network of regional VPCs. These VPCs are specifically used to host PrivateLink VPC endpoints, which are connected to dbt Cloud instance environments. To ensure security, access to these endpoints is protected by security groups, network policies, and application connection safeguards. The connected services are also authenticated. Currently, we have multiple customers successfully connecting to their PrivateLink endpoints in different AWS regions within dbt Cloud. + ### Configuring PrivateLink -dbt Cloud supports the following data platforms for use with the PrivateLink feature. Instructions for enabling PrivateLink for the various data platform providers are unique. The following guides will walk you through the necessary steps, including working with [dbt Support](https://docs.getdbt.com/guides/legacy/getting-help#dbt-cloud-support) to complete the connection in the dbt private network and setting up the endpoint in dbt Cloud. +dbt Cloud supports the following data platforms for use with the PrivateLink feature. Instructions for enabling PrivateLink for the various data platform providers are unique. The following guides will walk you through the necessary steps, including working with [dbt Support](https://docs.getdbt.com/community/resources/getting-help#dbt-cloud-support) to complete the connection in the dbt private network and setting up the endpoint in dbt Cloud. - [Redshift](/docs/cloud/secure/redshift-privatelink) - [Snowflake](/docs/cloud/secure/snowflake-privatelink) diff --git a/website/docs/docs/cloud/secure/databricks-privatelink.md b/website/docs/docs/cloud/secure/databricks-privatelink.md index 7082b6bff54..c136cd8a0f9 100644 --- a/website/docs/docs/cloud/secure/databricks-privatelink.md +++ b/website/docs/docs/cloud/secure/databricks-privatelink.md @@ -10,7 +10,7 @@ The following steps will walk you through the setup of a Databricks AWS PrivateL ## Configure PrivateLink 1. 
Locate your [Databricks Workspace ID](https://kb.databricks.com/en_US/administration/find-your-workspace-id#:~:text=When%20viewing%20a%20Databricks%20workspace,make%20up%20the%20workspace%20ID) -2. Add the required information to the template below, and submit your request to [dbt Support](https://docs.getdbt.com/guides/legacy/getting-help#dbt-cloud-support): +2. Add the required information to the template below, and submit your request to [dbt Support](https://docs.getdbt.com/community/resources/getting-help#dbt-cloud-support): ``` Subject: New Multi-Tenant PrivateLink Request - Type: Databricks diff --git a/website/docs/docs/cloud/secure/ip-restrictions.md b/website/docs/docs/cloud/secure/ip-restrictions.md index 730cf0a04b2..dacd0c885c4 100644 --- a/website/docs/docs/cloud/secure/ip-restrictions.md +++ b/website/docs/docs/cloud/secure/ip-restrictions.md @@ -1,11 +1,13 @@ --- title: "Configuring IP restrictions" id: ip-restrictions -description: "Configuring IP retrictions to outside traffic from accessing your dbt Cloud environment" +description: "Configuring IP restrictions to outside traffic from accessing your dbt Cloud environment" sidebar_label: "IP restrictions" --- -## About IP Restrictions +import SetUpPages from '/snippets/_available-tiers-iprestrictions.md'; + + IP Restrictions help control which IP addresses are allowed to connect to dbt Cloud. IP restrictions allow dbt Cloud customers to meet security and compliance controls by only allowing approved IPs to connect to their dbt Cloud environment. This feature is supported in all regions across NA, Europe, and Asia-Pacific, but contact us if you have questions about availability. @@ -67,4 +69,4 @@ Once enabled, when someone attempts to access dbt Cloud from a restricted IP, th - \ No newline at end of file + diff --git a/website/docs/docs/cloud/secure/redshift-privatelink.md b/website/docs/docs/cloud/secure/redshift-privatelink.md index c63c4dc8103..b8c357825f8 100644 --- a/website/docs/docs/cloud/secure/redshift-privatelink.md +++ b/website/docs/docs/cloud/secure/redshift-privatelink.md @@ -21,13 +21,13 @@ dbt Cloud supports both types of endpoints, but there are a number of [considera -3. Enter the AWS account ID: `346425330055` - _NOTE: This account ID only applies to dbt Cloud Multi-Tenant environments. For Virtual Private/Single-Tenant account IDs please contact [Support](https://docs.getdbt.com/guides/legacy/getting-help#dbt-cloud-support)._ +3. Enter the AWS account ID: `346425330055` - _NOTE: This account ID only applies to dbt Cloud Multi-Tenant environments. For Virtual Private/Single-Tenant account IDs please contact [Support](https://docs.getdbt.com/community/resources/getting-help#dbt-cloud-support)._ -4. Choose **Grant access to all VPCs** —or— (optional) contact [Support](https://docs.getdbt.com/guides/legacy/getting-help#dbt-cloud-support) for the appropriate regional VPC ID to designate in the **Grant access to specific VPCs** field. +4. Choose **Grant access to all VPCs** —or— (optional) contact [Support](https://docs.getdbt.com/community/resources/getting-help#dbt-cloud-support) for the appropriate regional VPC ID to designate in the **Grant access to specific VPCs** field. -5. Add the required information to the following template, and submit your request to [dbt Support](https://docs.getdbt.com/guides/legacy/getting-help#dbt-cloud-support): +5. 
Add the required information to the following template, and submit your request to [dbt Support](https://docs.getdbt.com/community/resources/getting-help#dbt-cloud-support): ``` Subject: New Multi-Tenant PrivateLink Request @@ -77,7 +77,7 @@ Once the VPC Endpoint Service is provisioned, you can find the service name in t -### 4. Add the required information to the template below, and submit your request to [dbt Support](https://docs.getdbt.com/guides/legacy/getting-help#dbt-cloud-support): +### 4. Add the required information to the template below, and submit your request to [dbt Support](https://docs.getdbt.com/community/resources/getting-help#dbt-cloud-support): ``` Subject: New Multi-Tenant PrivateLink Request - Type: Redshift Interface-type diff --git a/website/docs/docs/cloud/secure/snowflake-privatelink.md b/website/docs/docs/cloud/secure/snowflake-privatelink.md index 27f373bf13f..16138e7e86d 100644 --- a/website/docs/docs/cloud/secure/snowflake-privatelink.md +++ b/website/docs/docs/cloud/secure/snowflake-privatelink.md @@ -12,14 +12,14 @@ The following steps will walk you through the setup of a Snowflake AWS PrivateLi 1. Open a Support case with Snowflake to allow access from the dbt Cloud AWS account - Snowflake prefers that the account owner opens the Support case directly, rather than dbt Labs acting on their behalf. For more information, refer to [Snowflake's knowledge base article](https://community.snowflake.com/s/article/HowtosetupPrivatelinktoSnowflakefromCloudServiceVendors) - Provide them with your dbt Cloud account ID along with any other information requested in the article. - - AWS account ID: `346425330055` - _NOTE: This account ID only applies to dbt Cloud Multi-Tenant environments. For Virtual Private/Single-Tenant account IDs please contact [Support](https://docs.getdbt.com/guides/legacy/getting-help#dbt-cloud-support)._ + - AWS account ID: `346425330055` - _NOTE: This account ID only applies to dbt Cloud Multi-Tenant environments. For Virtual Private/Single-Tenant account IDs please contact [Support](https://docs.getdbt.com/community/resources/getting-help#dbt-cloud-support)._ - You will need to have `ACCOUNTADMIN` access to the Snowflake instance to submit a Support request. 2. After Snowflake has granted the requested access, run the Snowflake system function [SYSTEM$GET_PRIVATELINK_CONFIG](https://docs.snowflake.com/en/sql-reference/functions/system_get_privatelink_config.html) and copy the output. -3. Add the required information to the template below, and submit your request to [dbt Support](https://docs.getdbt.com/guides/legacy/getting-help#dbt-cloud-support): +3. Add the required information to the template below, and submit your request to [dbt Support](https://docs.getdbt.com/community/resources/getting-help#dbt-cloud-support): ``` Subject: New Multi-Tenant PrivateLink Request diff --git a/website/docs/docs/collaborate/environments/dbt-cloud-environments.md b/website/docs/docs/collaborate/environments/dbt-cloud-environments.md deleted file mode 100644 index 0157f6cc4c4..00000000000 --- a/website/docs/docs/collaborate/environments/dbt-cloud-environments.md +++ /dev/null @@ -1,221 +0,0 @@ ---- -title: "dbt Cloud environments" -id: "dbt-cloud-environments" ---- - - -An environment determines how dbt Cloud will execute your project in both the dbt Cloud IDE and scheduled jobs. Critically, in order to execute dbt, environments define three variables: - -1. The version of dbt Core that will be used to run your project -2. 
The warehouse connection information (including the target database/schema settings) -3. The version of your code to execute - -For users familiar with development on the CLI, each environment is roughly analogous to an entry in your `profiles.yml` file, with some additional information about your repository to ensure the proper version of code is executed. More info on dbt core environments [here](/docs/collaborate/environments/dbt-core-environments). - -## Types of environments - -In dbt Cloud, there are two types of environments: deployment and development. Deployment environments determine the settings used when jobs created within that environment are executed. Development environments determine the settings used in the dbt Cloud IDE for that particular dbt Cloud Project. Each dbt Cloud project can only have a single development environment but can have any number of deployment environments. - -| | Development Environments | Deployment Environments | -| --- | --- | --- | -| Determines settings for | dbt Cloud IDE | dbt Cloud Job runs | -| How many can I have in my project? | 1 | Any number | - -## Common environment settings - -Both development and deployment environments have a section called **General Settings**, which has some basic settings that all environments will define: - -| Setting | Example Value | Definition | Accepted Values | -| --- | --- | --- | --- | -| Name | Production | The environment name | Any string! | -| Environment Type | Deployment | The type of environment | [Deployment, Development] | -| dbt Version | 1.4 (latest) | The dbt version used | Any dbt version in the dropdown | -| Default to Custom Branch | ☑️ | Determines whether to use a branch other than the repository’s default | See below | -| Custom Branch | dev | Custom Branch name | See below | - -**dbt Version notes** - -- dbt Cloud allows users to select any dbt release. At this time, **environments must use a dbt version greater than or equal to v1.0.0;** [lower versions are no longer supported](/docs/dbt-versions/upgrade-core-in-cloud). -- If you select a current version with `(latest)` in the name, your environment will automatically install the latest stable version of the minor version selected. - -**Custom branch behavior** - -By default, all environments will use the default branch in your repository (usually the `main` branch) when accessing your dbt code. This is overridable within each dbt Cloud Environment using the **Default to a custom branch** option. This setting have will have slightly different behavior depending on the environment type: - -- **Development**: determines which branch in the dbt Cloud IDE developers create branches from and open PRs against -- **Deployment:** determines the branch is cloned during job executions for each environment. - -For more info, check out this [FAQ page on this topic](/faqs/Environments/custom-branch-settings)! - -## Create a development environment - -To create a new dbt Cloud development environment, navigate to **Deploy** -> **Environments** and then click **Create Environment**. Select **Development** as the environment type. - -After setting the **General Settings** as above, there’s nothing more that needs to be done on the environments page. Click **Save** to create the environment. - -### Set developer credentials - -To use the IDE, each developer will need to set up [personal development credentials](/docs/cloud/dbt-cloud-ide/develop-in-the-cloud#access-the-cloud-ide) to your warehouse connection in their **Profile Settings**. 
This allows users to set separate target information, as well as maintain individual credentials to connect to your warehouse via the dbt Cloud IDE. - - - - -## Create a deployment environment - -To create a new dbt Cloud development environment, navigate to **Deploy** -> **Environments** and then click **Create Environment**. Select **Deployment** as the environment type. - -### Semantic Layer - -For Semantic Layer-eligible customers, the next section of environment settings is the Semantic Layer configurations. [The Semantic Layer setup guide](/docs/use-dbt-semantic-layer/setup-dbt-semantic-layer) has the most up-to-date setup instructions! - -### Deployment connection - -:::info Warehouse Connections - - Warehouse connections are set at the Project level for dbt Cloud accounts, and each Project can have one connection (Snowflake account, Redshift host, Bigquery project, Databricks host, etc.). Some details of that connection (databases/schemas/etc.) can be overridden within this section of the dbt Cloud environment settings. - -::: - -This section determines the exact location in your warehouse dbt should target when building warehouse objects! This section will look a bit different depending on your warehouse provider. - - - - -
- -This section will not appear if you are using Postgres, as all values are inferred from the project's connection. - -
- -
- -This section will not appear if you are using Redshift, as all values are inferred from the project's connection. - -
- -
- - - -#### Editable fields - -- **Role**: Snowflake role -- **Database**: Target database -- **Warehouse**: Snowflake warehouse - -
- -
- -This section will not appear if you are using Bigquery, as all values are inferred from the project's connection. - -
- -
- -This section will not appear if you are using Spark, as all values are inferred from the project's connection. - -
- -
- - - -#### Editable fields - -- **Catalog** (optional): [Unity Catalog namespace](/docs/core/connect-data-platform/databricks-setup) - -
- -
- - -### Deployment credentials - -This section allows you to determine the credentials that should be used when connecting to your warehouse. The authentication methods may differ depending on the warehouse and dbt Cloud tier you are on. - - - -
- - - -#### Editable fields - -- **Username**: Postgres username to use (most likely a service account) -- **Password**: Postgres password for the listed user -- **Schema**: Target schema - -
- -
- - - -#### Editable fields - -- **Username**: Redshift username to use (most likely a service account) -- **Password**: Redshift password for the listed user -- **Schema**: Target schema - -
- -
- - - -#### Editable fields - -- **Auth Method**: This determines the way dbt connects to your warehouse - - One of: [**Username & Password**, **Key Pair**] -- If **Username & Password**: - - **Username**: username to use (most likely a service account) - - **Password**: password for the listed user -- If **Key Pair**: - - **Username**: username to use (most likely a service account) - - **Private Key**: value of the Private SSH Key (optional) - - **Private Key Passphrase**: value of the Private SSH Key Passphrase (optional, only if required) -- **Schema**: Target Schema for this environment - -
- -
- - - -#### Editable fields - -- **Dataset**: Target dataset - -
- -
- - - -#### Editable fields - -- **Token**: Access token -- **Schema**: Target schema - -
- -
- - - -#### Editable fields - -- **Token**: Access token -- **Schema**: Target schema - -
- -
- - -## Related docs - -- [Upgrade Core version in Cloud](/docs/dbt-versions/upgrade-core-in-cloud) -- [Delete a job or environment in dbt Cloud](/faqs/Environments/delete-environment-job) -- [Develop in Cloud](/docs/cloud/dbt-cloud-ide/develop-in-the-cloud) diff --git a/website/docs/docs/collaborate/environments/environments-in-dbt.md b/website/docs/docs/collaborate/environments/environments-in-dbt.md deleted file mode 100644 index a7c4296be25..00000000000 --- a/website/docs/docs/collaborate/environments/environments-in-dbt.md +++ /dev/null @@ -1,18 +0,0 @@ ---- -title: "Environments in dbt" -id: "environments-in-dbt" ---- - -In software engineering, environments are used to enable engineers to develop and test code without impacting the users of their software. - -“Production” (or _prod_) refers to the environment that end users interact with, while “development” (or _dev_) is the environment that engineers work in. This means that engineers can work iteratively when writing and testing new code in _development_, and once they are confident in these changes, deploy their code to _production_. - -In traditional software engineering, different environments often use completely separate architecture. For example, the dev and prod versions of a website may use different servers and databases. - -Data warehouses can also be designed to have separate environments – the _production_ environment refers to the relations (for example, schemas, tables, and views) that your end users query (often through a BI tool). - - -## Related docs -- [About dbt Core versions](/docs/dbt-versions/core) -- [Set Environment variables in dbt Cloud](/docs/build/environment-variables#special-environment-variables) -- [Use Environment variables in jinja](/reference/dbt-jinja-functions/env_var) diff --git a/website/docs/docs/collaborate/git/version-control-basics.md b/website/docs/docs/collaborate/git/version-control-basics.md index 332c8e3f71a..5c88d9536b4 100644 --- a/website/docs/docs/collaborate/git/version-control-basics.md +++ b/website/docs/docs/collaborate/git/version-control-basics.md @@ -57,6 +57,26 @@ Refer to [merge conflicts](/docs/collaborate/git/merge-conflicts) to learn how t ## The .gitignore file -dbt Labs recommends that you exclude files so they're not tracked by Git and won't slow down your dbt project. +To make sure dbt Cloud runs smoothly, you must exclude certain sub-folders in your git repository containing your dbt project from being tracked by git. You can achieve this by adding three lines to a special file named [.gitignore](https://github.com/dbt-labs/dbt-starter-project/blob/main/.gitignore). This file is placed in the root folder of your dbt project. -You can do this with a special file named [.gitignore](https://github.com/dbt-labs/dbt-starter-project/blob/main/.gitignore) which is automatically included in your dbt project after you initialize it in dbt Cloud. The `.gitignore` file must be placed at the root of your dbt project. +Some git providers will automatically create a 'boilerplate' `.gitignore` file when the repository is created. However, based on dbt Labs' experience, these default `.gitignore` files typically don't include the required entries for dbt Cloud to function correctly. + +The `.gitignore` file can include unrelated files and folders if the code repository requires it. 
However, the following folders must be included in the `gitignore` file to ensure dbt Cloud operates smoothly: + +``` +dbt_packages/ +logs/ +target/ +``` + +**Note** — By using a trailing slash, these lines in the `gitignore` file serve as 'folder wildcards', excluding all files and folders within those folders from being tracked by git. + + +:::note + +- **dbt Cloud projects created after Dec 1, 2022** — If you use the **Initialize dbt Project** button in the dbt Cloud IDE to setup a new and empty dbt project, dbt Cloud will automatically add a `.gitignore` file with the required entries. If a `.gitignore` file already exists, the necessary folders will be appended to the existing file. + +- **Migrating project from Core to dbt Cloud** — Make sure you check the `.gitignore` file contains the necessary entries. dbt Core doesn't interact with git so dbt Cloud doesn't automatically add or verify entries in the `.gitignore` file. Additionally, if the repository already contains dbt code and doesn't require initialization, dbt Cloud won't add any missing entries to the .gitignore file. +::: + +For additional info or troubleshooting tips please refer to the [detailed FAQ](/faqs/Git/gitignore). diff --git a/website/docs/docs/collaborate/govern/model-access.md b/website/docs/docs/collaborate/govern/model-access.md index ed806d054aa..95928110862 100644 --- a/website/docs/docs/collaborate/govern/model-access.md +++ b/website/docs/docs/collaborate/govern/model-access.md @@ -108,6 +108,47 @@ models: + + +Models with `materialized` set to `ephemeral` cannot have the access property set to public. + +For example, if you have model confg set as: + + + +```sql + +{{ config(materialized='ephemeral') }} + +``` + + + +And the model contract is defined: + + + +```yaml + +models: + - name: my_model + access: public + +``` + + + +It will lead to the following error: + +``` +❯ dbt parse +02:19:30 Encountered an error: +Parsing Error + Node model.jaffle_shop.my_model with 'ephemeral' materialization has an invalid value (public) for the access field +``` + + + ## FAQs ### How does model access relate to database permissions? diff --git a/website/docs/docs/collaborate/govern/model-versions.md b/website/docs/docs/collaborate/govern/model-versions.md index 9ee9c345d8e..64559aa70af 100644 --- a/website/docs/docs/collaborate/govern/model-versions.md +++ b/website/docs/docs/collaborate/govern/model-versions.md @@ -326,7 +326,13 @@ We intend to build this into `dbt-core` as out-of-the-box functionality. 
(Upvote -- otherwise, it's a no-op {% if model.get('version') and model.get('version') == model.get('latest_version') %} - {% set new_relation = this.incorporate(path={"identifier": model['name']}) %} + {% set new_relation = this.incorporate(path={"identifier": model['name']}) %} + + {% set existing_relation = load_relation(new_relation) %} + + {% if existing_relation and not existing_relation.is_view %} + {{ drop_relation_if_exists(existing_relation) }} + {% endif %} {% set create_view_sql -%} -- this syntax may vary by data platform diff --git a/website/docs/docs/core/connect-data-platform/bigquery-setup.md b/website/docs/docs/core/connect-data-platform/bigquery-setup.md index e27c494f500..bfb97237b29 100644 --- a/website/docs/docs/core/connect-data-platform/bigquery-setup.md +++ b/website/docs/docs/core/connect-data-platform/bigquery-setup.md @@ -501,12 +501,44 @@ my-profile: project: abc-123 dataset: my_dataset - # for dbt Python models + # for dbt Python models to be run on a Dataproc cluster gcs_bucket: dbt-python dataproc_cluster_name: dbt-python dataproc_region: us-central1 ``` +Alternatively, Dataproc Serverless can be used: + +```yaml +my-profile: + target: dev + outputs: + dev: + type: bigquery + method: oauth + project: abc-123 + dataset: my_dataset + + # for dbt Python models to be run on Dataproc Serverless + gcs_bucket: dbt-python + dataproc_region: us-central1 + submission_method: serverless + dataproc_batch: + environment_config: + execution_config: + service_account: dbt@abc-123.iam.gserviceaccount.com + subnetwork_uri: regions/us-central1/subnetworks/dataproc-dbt + labels: + project: my-project + role: dev + runtime_config: + properties: + spark.executor.instances: 3 + spark.driver.memory: 1g +``` + +For a full list of possible configuration fields that can be passed in `dataproc_batch`, refer to the [Dataproc Serverless Batch](https://cloud.google.com/dataproc-serverless/docs/reference/rpc/google.cloud.dataproc.v1#google.cloud.dataproc.v1.Batch) documentation. + ## Required permissions diff --git a/website/docs/docs/core/connect-data-platform/connection-profiles.md b/website/docs/docs/core/connect-data-platform/connection-profiles.md index da3256fec02..8088ff1dfa7 100644 --- a/website/docs/docs/core/connect-data-platform/connection-profiles.md +++ b/website/docs/docs/core/connect-data-platform/connection-profiles.md @@ -91,7 +91,7 @@ Use the [debug](/reference/dbt-jinja-functions/debug-method) command to check wh ## Understanding targets in profiles -dbt supports multiple targets within one profile to encourage the use of separate development and production environments as discussed in [dbt Core Environments](/docs/collaborate/environments/dbt-core-environments). +dbt supports multiple targets within one profile to encourage the use of separate development and production environments as discussed in [dbt Core Environments](/docs/core/dbt-core-environments). A typical profile for an analyst using dbt locally will have a target named `dev`, and have this set as the default. diff --git a/website/docs/docs/core/connect-data-platform/dremio-setup.md b/website/docs/docs/core/connect-data-platform/dremio-setup.md index 4d10464400f..fa6ca154fcd 100644 --- a/website/docs/docs/core/connect-data-platform/dremio-setup.md +++ b/website/docs/docs/core/connect-data-platform/dremio-setup.md @@ -49,7 +49,11 @@ pip is the easiest way to install the adapter:

For further info, refer to the GitHub repository: {frontMatter.meta.github_repo}

-Follow the repository's link for os dependencies. +Follow the repository's link for OS dependencies. + +:::note +[Model contracts](/docs/collaborate/govern/model-contracts) are not supported. +::: ## Prerequisites for Dremio Cloud Before connecting from project to Dremio Cloud, follow these prerequisite steps: diff --git a/website/docs/docs/collaborate/environments/dbt-core-environments.md b/website/docs/docs/core/dbt-core-environments.md similarity index 98% rename from website/docs/docs/collaborate/environments/dbt-core-environments.md rename to website/docs/docs/core/dbt-core-environments.md index f983e6f31ba..5daf17bddf9 100644 --- a/website/docs/docs/collaborate/environments/dbt-core-environments.md +++ b/website/docs/docs/core/dbt-core-environments.md @@ -1,5 +1,5 @@ --- -title: "dbt Core Environments" +title: "dbt Core environments" id: "dbt-core-environments" --- diff --git a/website/docs/docs/core/docker-install.md b/website/docs/docs/core/docker-install.md index cf0fea5fffe..dfb2a669e34 100644 --- a/website/docs/docs/core/docker-install.md +++ b/website/docs/docs/core/docker-install.md @@ -51,4 +51,4 @@ In particular, the Dockerfile supports building images: - Images that install one or more third-party adapters - Images against another system architecture -Please note that, if you go the route of building your own Docker images, we are unable to offer dedicated support for custom use cases. If you run into problems, you are welcome to [ask the community for help](/guides/legacy/getting-help) or [open an issue](/community/resources/oss-expectations#issues) in the `dbt-core` repository. If many users are requesting the same enhancement, we will tag the issue `help_wanted` and invite community contribution. +Please note that, if you go the route of building your own Docker images, we are unable to offer dedicated support for custom use cases. If you run into problems, you are welcome to [ask the community for help](/community/resources/getting-help) or [open an issue](/community/resources/oss-expectations#issues) in the `dbt-core` repository. If many users are requesting the same enhancement, we will tag the issue `help_wanted` and invite community contribution. diff --git a/website/docs/docs/dbt-cloud-apis/admin-cloud-api.md b/website/docs/docs/dbt-cloud-apis/admin-cloud-api.md index 3ff3061518c..62b13f7aeb5 100644 --- a/website/docs/docs/dbt-cloud-apis/admin-cloud-api.md +++ b/website/docs/docs/dbt-cloud-apis/admin-cloud-api.md @@ -15,14 +15,20 @@ Check out our dbt Cloud Admin API docs to help you access the API:
+ + diff --git a/website/docs/docs/dbt-cloud-apis/discovery-api.md b/website/docs/docs/dbt-cloud-apis/discovery-api.md index a9d53696890..16c9bc16ec4 100644 --- a/website/docs/docs/dbt-cloud-apis/discovery-api.md +++ b/website/docs/docs/dbt-cloud-apis/discovery-api.md @@ -13,7 +13,7 @@ You can access the Discovery API through [ad hoc queries](/docs/dbt-cloud-apis/d You can query the dbt Cloud metadata: -- At the [environment](/docs/collaborate/environments/environments-in-dbt) level for both the latest state (use the `environment` endpoint) and historical run results (use `modelByEnvironment`) of a dbt Cloud project in production. +- At the [environment](/docs/environments-in-dbt) level for both the latest state (use the `environment` endpoint) and historical run results (use `modelByEnvironment`) of a dbt Cloud project in production. - At the job level for results on a specific dbt Cloud job run for a given resource type, like `models` or `test`. :::tip Public Preview diff --git a/website/docs/docs/dbt-cloud-apis/discovery-querying.md b/website/docs/docs/dbt-cloud-apis/discovery-querying.md index 8d602e73e5f..f75369e92a8 100644 --- a/website/docs/docs/dbt-cloud-apis/discovery-querying.md +++ b/website/docs/docs/dbt-cloud-apis/discovery-querying.md @@ -8,7 +8,7 @@ The Discovery API supports ad-hoc queries and integrations.. If you are new to t Use the Discovery API to evaluate data pipeline health and project state across runs or at a moment in time. dbt Labs provide a [GraphQL explorer](https://metadata.cloud.getdbt.com/graphql) for this API, enabling you to run queries and browse the schema. -Since GraphQL provides a description of the data in the API, the schema displayed in the GraphQL explorer accurately represents the graph and fields available to query. +Since GraphQL describes the data in the API, the schema displayed in the GraphQL explorer accurately represents the graph and fields available to query. @@ -16,11 +16,11 @@ Since GraphQL provides a description of the data in the API, the schema displaye Currently, authorization of requests takes place [using a service token](/docs/dbt-cloud-apis/service-tokens). dbt Cloud admin users can generate a Metadata Only service token that is authorized to execute a specific query against the Discovery API. -Once you've created a token, you can use it in the Authorization header of requests to the dbt Cloud Discovery API. Be sure to include the Token prefix in the Authorization header, or the request will fail with a `401 Unauthorized` error. Note that `Bearer` can be used in place of `Token` in the Authorization header. Both syntaxes are equivalent. +Once you've created a token, you can use it in the Authorization header of requests to the dbt Cloud Discovery API. Be sure to include the Token prefix in the Authorization header, or the request will fail with a `401 Unauthorized` error. Note that `Bearer` can be used instead of `Token` in the Authorization header. Both syntaxes are equivalent. ## Access the Discovery API -1. Create a [service account token](/docs/dbt-cloud-apis/service-tokens) to authorize requests. dbt Cloud Admin users can generate a _Metadata Only_ service token, which can be used to execute a specific query against the Discovery API for authorization of requests. +1. Create a [service account token](/docs/dbt-cloud-apis/service-tokens) to authorize requests. dbt Cloud Admin users can generate a _Metadata Only_ service token, which can be used to execute a specific query against the Discovery API to authorize requests. 2. 
Find your API URL using the endpoint `https://metadata.{YOUR_ACCESS_URL}/graphql`. @@ -57,14 +57,15 @@ metadata = response.json()['data'][ENDPOINT] Every query will require an environment ID or job ID. You can get the ID from a dbt Cloud URL or using the Admin API. -There are several illustrative example queries in this documentation. You can see an examples in the [use case guide](/docs/dbt-cloud-apis/discovery-use-cases-and-examples). +There are several illustrative example queries on this page. For more examples, refer to [Use cases and examples for the Discovery API](/docs/dbt-cloud-apis/discovery-use-cases-and-examples). ## Reasonable use -To maintain performance and stability, and prevent abuse, Discovery (GraphQL) API usage is subject to request rate and response size limits. -- The current request rate limit is 200 requests within a minute for a given IP address. If a user exceeds this limit, they will receive an HTTP 429 response status. +Discovery (GraphQL) API usage is subject to request rate and response size limits to maintain the performance and stability of the metadata platform and prevent abuse. +- The current request rate limit is 200 requests for a given IP address within a minute. If you exceed this limit, you will receive an HTTP 429 response status. - Environment-level endpoints will be subject to response size limits in the future. The depth of the graph should not exceed three levels. A user can paginate up to 500 items per query. +- Job-level endpoints are subject to query complexity limits. Nested nodes (like parents), code (like rawCode), and catalog columns are considered as most complex. Overly complex queries should be broken up into separate queries with only necessary fields included. dbt Labs recommends using the environment endpoint instead for most use cases to get the latest descriptive and result metadata for a dbt Cloud project. ## Retention limits You can use the Discovery API to query data from the previous three months. For example, if today was April 1st, you could query data back to January 1st. diff --git a/website/docs/docs/dbt-cloud-apis/discovery-use-cases-and-examples.md b/website/docs/docs/dbt-cloud-apis/discovery-use-cases-and-examples.md index 4e00f88d563..030688d9aeb 100644 --- a/website/docs/docs/dbt-cloud-apis/discovery-use-cases-and-examples.md +++ b/website/docs/docs/dbt-cloud-apis/discovery-use-cases-and-examples.md @@ -163,52 +163,56 @@ The API returns full identifier information (`database.schema.alias`) and the `e ```graphql - environment(id: $environmentId) { - applied { - models(first: $first) { - edges { - node { - uniqueId - compiledCode - database - schema - alias - materializedType - executionInfo { - executeCompletedAt - lastJobDefinitionId - lastRunGeneratedAt - lastRunId - lastRunStatus - lastRunError - lastSuccessJobDefinitionId - runGeneratedAt - lastSuccessRunId - } - } - } - } - } - } + query($environmentId: Int!, $first: Int!){ + environment(id: $environmentId) { + applied { + models(first: $first) { + edges { + node { + uniqueId + compiledCode + database + schema + alias + materializedType + executionInfo { + executeCompletedAt + lastJobDefinitionId + lastRunGeneratedAt + lastRunId + lastRunStatus + lastRunError + lastSuccessJobDefinitionId + runGeneratedAt + lastSuccessRunId + } + } + } + } + } + } + } ```
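As a rough illustration of how these environment-level queries can be run outside the GraphQL explorer, the sketch below posts the query above to the metadata endpoint with Python's `requests` library. The North America URL, the service token placeholder, and the environment ID are stand-ins to replace with your own values: the endpoint follows `https://metadata.{YOUR_ACCESS_URL}/graphql`, the token should be a Metadata Only service token, and the environment ID comes from your dbt Cloud URL or the Admin API.

```python
# Illustrative only: swap in your own access URL, Metadata Only service token,
# and environment ID before running.
import requests

API_URL = "https://metadata.cloud.getdbt.com/graphql"  # https://metadata.{YOUR_ACCESS_URL}/graphql
HEADERS = {"Authorization": "Token <METADATA_ONLY_SERVICE_TOKEN>"}

QUERY = """
query ($environmentId: Int!, $first: Int!) {
  environment(id: $environmentId) {
    applied {
      models(first: $first) {
        edges {
          node {
            uniqueId
            materializedType
            executionInfo {
              lastRunStatus
              executeCompletedAt
            }
          }
        }
      }
    }
  }
}
"""

response = requests.post(
    API_URL,
    headers=HEADERS,
    json={"query": QUERY, "variables": {"environmentId": 123, "first": 100}},
)
response.raise_for_status()

# Walk the connection edges and print each model's latest run status.
for edge in response.json()["data"]["environment"]["applied"]["models"]["edges"]:
    node = edge["node"]
    print(node["uniqueId"], node["executionInfo"]["lastRunStatus"])
```

If the `Token` (or `Bearer`) prefix is left off the Authorization header, the request fails with the `401 Unauthorized` response described earlier.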
### What happened with my job run? -To review results for specific runs, you can query the metadata at the job level. This is helpful for historical analysis of deployment performance or optimizing particular jobs. +You can query the metadata at the job level to review results for specific runs. This is helpful for historical analysis of deployment performance or optimizing particular jobs.
Example query ```graphql -models(jobId: $jobId, runId: $runId) { - name - status - tests { - name - status - } +query($jobId: Int!, $runId: Int!){ + models(jobId: $jobId, runId: $runId) { + name + status + tests { + name + status + } + } } ``` @@ -224,39 +228,41 @@ With the API, you can compare the `rawCode` between the definition and applied s ```graphql -environment(id: $environmentId) { - applied { - models(first: $first, filter: {uniqueIds:"MODEL.PROJECT.MODEL_NAME"}) { - edges { - node { - rawCode - ancestors(types: [Source]){ - ...on SourceAppliedStateNode { - freshness { - maxLoadedAt - } - } - } - executionInfo { - runGeneratedAt - executeCompletedAt - } - materializedType - } - } - } - } - definition { - models(first: $first, filter: {uniqueIds:"MODEL.PROJECT.MODEL_NAME"}) { - edges { - node { - rawCode - runGeneratedAt - materializedType - } - } - } - } +query($environmentId: Int!, $first: Int!){ + environment(id: $environmentId) { + applied { + models(first: $first, filter: {uniqueIds:"MODEL.PROJECT.MODEL_NAME"}) { + edges { + node { + rawCode + ancestors(types: [Source]){ + ...on SourceAppliedStateNode { + freshness { + maxLoadedAt + } + } + } + executionInfo { + runGeneratedAt + executeCompletedAt + } + materializedType + } + } + } + } + definition { + models(first: $first, filter: {uniqueIds:"MODEL.PROJECT.MODEL_NAME"}) { + edges { + node { + rawCode + runGeneratedAt + materializedType + } + } + } + } + } } ``` @@ -278,27 +284,30 @@ By filtering on the latest status, you can get lists of models that failed to bu ```graphql -environment(id: $environmentId) { - applied { - models(first: $first, filter: {lastRunStatus:error}) { - edges { - node { - name - executionInfo { - lastRunId - } - } - } - } - tests(first: $first, filter: {status:"fail"}) { - edges { - node { - name - executionInfo { - lastRunId - } - } - } +query($environmentId: Int!, $first: Int!){ + environment(id: $environmentId) { + applied { + models(first: $first, filter: {lastRunStatus:error}) { + edges { + node { + name + executionInfo { + lastRunId + } + } + } + } + tests(first: $first, filter: {status:"fail"}) { + edges { + node { + name + executionInfo { + lastRunId + } + } + } + } + } } } ``` @@ -307,7 +316,7 @@ environment(id: $environmentId) { ```graphql -query ModelByEnvironment($environmentId: Int!, $uniqueId: String!, $lastRunCount: Int) { +query($environmentId: Int!, $uniqueId: String!, $lastRunCount: Int) { modelByEnvironment(environmentId: $environmentId, uniqueId: $uniqueId, lastRunCount: $lastRunCount) { name executeStartedAt @@ -335,48 +344,50 @@ You can get the metadata on the latest execution for a particular model or acros ```graphql -environment(id: $environmentId) { - applied { - models(first: $first,filter:{uniqueIds:"MODEL.PROJECT.MODEL_NAME"}) { - edges { - node { - name - ancestors(types:[Model, Source, Seed, Snapshot]) { - ... on ModelAppliedStateNode { - name - resourceType - materializedType - executionInfo { - executeCompletedAt - } - } - ... on SourceAppliedStateNode { - sourceName - name - resourceType - freshness { - maxLoadedAt - } - } - ... on SnapshotAppliedStateNode { - name - resourceType - executionInfo { - executeCompletedAt - } - } - ... 
on SeedAppliedStateNode { - name - resourceType - executionInfo { - executeCompletedAt - } - } - } - } - } - } - } +query($environmentId: Int!, $first: Int!){ + environment(id: $environmentId) { + applied { + models(first: $first,filter:{uniqueIds:"MODEL.PROJECT.MODEL_NAME"}) { + edges { + node { + name + ancestors(types:[Model, Source, Seed, Snapshot]) { + ... on ModelAppliedStateNode { + name + resourceType + materializedType + executionInfo { + executeCompletedAt + } + } + ... on SourceAppliedStateNode { + sourceName + name + resourceType + freshness { + maxLoadedAt + } + } + ... on SnapshotAppliedStateNode { + name + resourceType + executionInfo { + executeCompletedAt + } + } + ... on SeedAppliedStateNode { + name + resourceType + executionInfo { + executeCompletedAt + } + } + } + } + } + } + } + } } ``` @@ -447,39 +458,41 @@ Checking [source freshness](/docs/build/sources#snapshotting-source-data-freshne Example query ```graphql -environment(id: $environmentId) { - applied { - sources(first: $first, filters:{freshnessChecked:true, database:"production"}) { - edges { - node { - sourceName - name - identifier - loader - freshness { - freshnessJobDefinitionId - freshnessRunId - freshnessRunGeneratedAt - freshnessStatus - freshnessChecked - maxLoadedAt - maxLoadedAtTimeAgoInS - snapshottedAt - criteria { - errorAfter { - count - period - } - warnAfter { - count - period - } - } - } - } - } - } - } +query($environmentId: Int!, $first: Int!){ + environment(id: $environmentId) { + applied { + sources(first: $first, filters:{freshnessChecked:true, database:"production"}) { + edges { + node { + sourceName + name + identifier + loader + freshness { + freshnessJobDefinitionId + freshnessRunId + freshnessRunGeneratedAt + freshnessStatus + freshnessChecked + maxLoadedAt + maxLoadedAtTimeAgoInS + snapshottedAt + criteria { + errorAfter { + count + period + } + warnAfter { + count + period + } + } + } + } + } + } + } + } } ``` @@ -496,27 +509,29 @@ environment(id: $environmentId) { For the following example, the `parents` are the nodes (code) that's being tested and `executionInfo` describes the latest test results: ```graphql -environment(id: $environmentId) { - applied { - tests(first: $first) { - edges { - node { - name - columnName - parents { - name - resourceType - } - executionInfo { - lastRunStatus - lastRunError - executeCompletedAt - executionTime - } - } - } - } - } +query($environmentId: Int!, $first: Int!){ + environment(id: $environmentId) { + applied { + tests(first: $first) { + edges { + node { + name + columnName + parents { + name + resourceType + } + executionInfo { + lastRunStatus + lastRunError + executeCompletedAt + executionTime + } + } + } + } + } + } } ``` @@ -533,34 +548,36 @@ To enforce the shape of a model's definition, you can define contracts on models ```graphql -environment(id:123) { - definition { +query{ + environment(id:123) { + definition { models(first:100, filter:{access:public}) { - edges { - nodes { - name - latest_version - contract_enforced - constraints{ - name - type - expression - columns - } - catalog { - columns { - name - type - constraints { - name - type - expression - } - } - } - } - } + edges { + nodes { + name + latest_version + contract_enforced + constraints{ + name + type + expression + columns + } + catalog { + columns { + name + type + constraints { + name + type + expression + } + } + } + } + } } + } } } ``` @@ -584,26 +601,28 @@ Query the Discovery API to map a table/view in the data platform to the model in Example query ```graphql 
-environment(id: $environmentId) { - applied { - models(first: $first, filter: {database:"analytics", schema:"prod", identifier:"customers"}) { - edges { - node { - name - description - tags - meta - catalog { - columns { - name - description - type - } - } - } - } - } - } +query($environmentId: Int!, $first: Int!){ + environment(id: $environmentId) { + applied { + models(first: $first, filter: {database:"analytics", schema:"prod", identifier:"customers"}) { + edges { + node { + name + description + tags + meta + catalog { + columns { + name + description + type + } + } + } + } + } + } + } } ```
@@ -727,7 +746,7 @@ query Lineage($environmentId: Int!, $first: Int!) { } ``` -Then, extract the node definitions and create a lineage graph. You can traverse downstream from sources and seeds (adding an edge from each node with children to its children) or iterate through each node’s parents (if it has them). Keep in mind that models and snapshots can have parents and children, whereas sources and seeds have only children and exposures and metrics only have parents. +Then, extract the node definitions and create a lineage graph. You can traverse downstream from sources and seeds (adding an edge from each node with children to its children) or iterate through each node’s parents (if it has them). Keep in mind that models, snapshots, and metrics can have parents and children, whereas sources and seeds have only children and exposures only have parents. 2. Extract the node definitions, construct a lineage graph, and plot the graph. @@ -863,25 +882,27 @@ Metric definitions are coming soon to the Discovery API with dbt v1.6. You’ll Example query ```graphql -environment(id: $environmentId) { - definition { - metrics(first: $first) { - edges { - node { - name - description - type - formula - filter - tags - parents { - name - resourceType - } - } - } - } - } +query($environmentId: Int!, $first: Int!){ + environment(id: $environmentId) { + definition { + metrics(first: $first) { + edges { + node { + name + description + type + formula + filter + tags + parents { + name + resourceType + } + } + } + } + } + } } ``` @@ -905,35 +926,37 @@ You can define and surface the groups each model is associated with. Groups cont Example query ```graphql -environment(id: $environmentId) { - applied { - model(first: $first, filter:{uniqueIds:["MODEL.PROJECT.NAME"]}) { - edges { - node { - name - description - resourceType - access - group - } - } - } - } - definition { - groups(first: $first) { - edges { - node { - name - resourceType - models { - name - } - owner_name - owner_email - } - } - } - } +query($environmentId: Int!, $first: Int!){ + environment(id: $environmentId) { + applied { + model(first: $first, filter:{uniqueIds:["MODEL.PROJECT.NAME"]}) { + edges { + node { + name + description + resourceType + access + group + } + } + } + } + definition { + groups(first: $first) { + edges { + node { + name + resourceType + models { + name + } + owner_name + owner_email + } + } + } + } + } } ```
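Circling back to the lineage walkthrough above: once the `query Lineage` response is in hand, folding the node definitions into a graph is a few lines of bookkeeping. The sketch below is illustrative only and assumes each node has been parsed into a dictionary with a `uniqueId` plus an optional `parents` list whose entries carry their own `uniqueId`; adjust the keys to whichever fields you actually select.

```python
# A minimal sketch, not the documented approach; node shapes are assumed as
# described in the lead-in above.
from collections import defaultdict

def build_lineage(nodes):
    """Fold node definitions into parent -> children and child -> parents maps."""
    children_of = defaultdict(set)
    parents_of = defaultdict(set)
    for node in nodes:
        node_id = node["uniqueId"]
        for parent in node.get("parents") or []:
            parent_id = parent["uniqueId"]
            parents_of[node_id].add(parent_id)
            children_of[parent_id].add(node_id)
    return children_of, parents_of
```

Consistent with the parent/child rules above, sources and seeds should then appear only as keys of the parent-to-children map, and exposures only as keys of the child-to-parents map.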
@@ -947,31 +970,34 @@ You can enable users the ability to specify the level of access for a given mode Example query ```graphql -environment(id: $environmentId) { - definition { - models(first: $first) { - edges { - node { - name - access - } - } - } - } +query($environmentId: Int!, $first: Int!){ + environment(id: $environmentId) { + definition { + models(first: $first) { + edges { + node { + name + access + } + } + } + } + } } --- - -environment(id: $environmentId) { - definition { - models(first: $first, filters:{access:public}) { - edges { - node { - name - } - } - } - } +query($environmentId: Int!, $first: Int!){ + environment(id: $environmentId) { + definition { + models(first: $first, filters:{access:public}) { + edges { + node { + name + } + } + } + } + } } ``` @@ -996,35 +1022,37 @@ For development use cases, people typically query the historical or latest defin This example reviews an exposure and the models used in it, including when they were last executed and their test results: ```graphql -environment(id: $environmentId) { - applied { - exposures(first: $first) { - edges { - node { - name - description - owner_name - url - parents { - name - resourceType - ... on ModelAppliedStateNode { - executionInfo { - executeCompletedAt - lastRunStatus - } - tests { - executionInfo { - executeCompletedAt - lastRunStatus - } - } - } - } - } - } - } - } +query($environmentId: Int!, $first: Int!){ + environment(id: $environmentId) { + applied { + exposures(first: $first) { + edges { + node { + name + description + owner_name + url + parents { + name + resourceType + ... on ModelAppliedStateNode { + executionInfo { + executeCompletedAt + lastRunStatus + } + tests { + executionInfo { + executeCompletedAt + lastRunStatus + } + } + } + } + } + } + } + } + } } ``` @@ -1039,16 +1067,18 @@ The Discovery API provides historical information about any resource in your pro Review the differences in `compiledCode` or `columns` between runs or plot the “Approximate Size” and “Row Count” `stats` over time: ```graphql -modelByEnvironment(environmentId: $environmentId, uniqueId: $uniqueId, lastRunCount: $lastRunCount, withCatalog: $withCatalog) { - name - compiledCode - columns { - name +query(environmentId: Int!, uniqueId: String!, lastRunCount: Int!, withCatalog: Boolean!){ + modelByEnvironment(environmentId: $environmentId, uniqueId: $uniqueId, lastRunCount: $lastRunCount, withCatalog: $withCatalog) { + name + compiledCode + columns { + name + } + stats { + label + value + } } - stats { - label - value - } } ``` @@ -1061,28 +1091,30 @@ dbt lineage begins with data sources. For a given source, you can look at which Example query ```graphql -environment(id: $environmentId) { - applied { - sources(first: $first, filter:{uniqueIds:["SOURCE_NAME.TABLE_NAME"]}) { - edges { - node { - loader - children { - uniqueId - resourceType - ... on ModelAppliedStateNode { - database - schema - alias - children { - uniqueId - } - } - } - } - } - } - } +query($environmentId: Int!, $first: Int!){ + environment(id: $environmentId) { + applied { + sources(first: $first, filter:{uniqueIds:["SOURCE_NAME.TABLE_NAME"]}) { + edges { + node { + loader + children { + uniqueId + resourceType + ... 
on ModelAppliedStateNode { + database + schema + alias + children { + uniqueId + } + } + } + } + } + } + } + } } ``` diff --git a/website/docs/docs/dbt-cloud-apis/migrating-to-v2.md b/website/docs/docs/dbt-cloud-apis/migrating-to-v2.md index 1161dd159bd..3e6ac2c3577 100644 --- a/website/docs/docs/dbt-cloud-apis/migrating-to-v2.md +++ b/website/docs/docs/dbt-cloud-apis/migrating-to-v2.md @@ -10,7 +10,7 @@ In an attempt to provide an improved dbt Cloud Administrative API experience, th ## Key differences -When using the [List runs](/dbt-cloud/api-v2#tag/Runs) endpoint, you can include triggered runs and sort by ID. You can use the following request in v2 to get a similar response as v4, replacing the `{accountId}` with your own and `{YOUR_ACCESS_URL}` with the appropriate [Access URL](https://docs.getdbt.com/docs/cloud/about-cloud/regions-ip-addresses) for your region and plan: +When using the [List runs](/dbt-cloud/api-v2-legacy#tag/Runs) endpoint, you can include triggered runs and sort by ID. You can use the following request in v2 to get a similar response as v4, replacing the `{accountId}` with your own and `{YOUR_ACCESS_URL}` with the appropriate [Access URL](https://docs.getdbt.com/docs/cloud/about-cloud/regions-ip-addresses) for your region and plan: ```shell GET https://{YOUR_ACCESS_URL}/api/v2/accounts/{accountId}/runs/?include_related=[%22trigger%22]&order_by=-id diff --git a/website/docs/docs/dbt-cloud-apis/project-state.md b/website/docs/docs/dbt-cloud-apis/project-state.md index da74a7f60be..a5ee71ebb1b 100644 --- a/website/docs/docs/dbt-cloud-apis/project-state.md +++ b/website/docs/docs/dbt-cloud-apis/project-state.md @@ -6,7 +6,7 @@ dbt Cloud provides a stateful way of deploying dbt. Artifacts are accessible pro With the implementation of the `environment` endpoint in the Discovery API, we've introduced the idea of multiple states. The Discovery API provides a single API endpoint that returns the latest state of models, sources, and other nodes in the DAG. -A single [deployment environment](/docs/collaborate/environments/environments-in-dbt) should represent the production state of a given dbt Cloud project. +A single [deployment environment](/docs/environments-in-dbt) should represent the production state of a given dbt Cloud project. 
There are two states that can be queried in dbt Cloud: diff --git a/website/docs/docs/dbt-cloud-apis/schema-discovery-modelByEnv.mdx b/website/docs/docs/dbt-cloud-apis/schema-discovery-modelByEnv.mdx index 89dc57f643e..400735bdce4 100644 --- a/website/docs/docs/dbt-cloud-apis/schema-discovery-modelByEnv.mdx +++ b/website/docs/docs/dbt-cloud-apis/schema-discovery-modelByEnv.mdx @@ -25,14 +25,15 @@ You can use the `environment_id` and `model_unique_id` to return the model and i ```graphql -modelByEnvironment(environmentId: 834, uniqueId: "model.marketing.customers", lastRunCount: 20) { - runId, # Get historical results for a particular model - runGeneratedAt, - executionTime, # View build time across runs - status, - tests { name, status, executeCompletedAt } # View test results across runs +query{ + modelByEnvironment(environmentId: 834, uniqueId: "model.marketing.customers", lastRunCount: 20) { + runId, # Get historical results for a particular model + runGeneratedAt, + executionTime, # View build time across runs + status, + tests { name, status, executeCompletedAt } # View test results across runs } - +} ``` ### Fields diff --git a/website/docs/docs/dbt-cloud-apis/service-tokens.md b/website/docs/docs/dbt-cloud-apis/service-tokens.md index b2c50f6236d..139eff8fd07 100644 --- a/website/docs/docs/dbt-cloud-apis/service-tokens.md +++ b/website/docs/docs/dbt-cloud-apis/service-tokens.md @@ -57,6 +57,9 @@ Account Admin service tokens have full `read + write` access to an account, so p **Security Admin**
Security Admin service tokens have certain account-level permissions. For more on these permissions, see [Security Admin](/docs/cloud/manage-access/enterprise-permissions#security-admin). +**Billing Admin**
+Billing Admin service tokens have certain account-level permissions. For more on these permissions, see [Billing Admin](/docs/cloud/manage-access/enterprise-permissions#billing-admin). + **Metadata Only**
Metadata-only service tokens authorize requests to the Discovery API. diff --git a/website/docs/docs/dbt-cloud-environments.md b/website/docs/docs/dbt-cloud-environments.md new file mode 100644 index 00000000000..5eccf3e7400 --- /dev/null +++ b/website/docs/docs/dbt-cloud-environments.md @@ -0,0 +1,47 @@ +--- +title: "dbt Cloud environments" +id: "dbt-cloud-environments" +description: "Learn about dbt Cloud's development environment to execute your project in the IDE" +--- + +An environment determines how dbt Cloud will execute your project in both the dbt Cloud IDE (for development) and scheduled jobs (for deployment). + +Critically, in order to execute dbt, environments define three variables: + +1. The version of dbt Core that will be used to run your project +2. The warehouse connection information (including the target database/schema settings) +3. The version of your code to execute + +Each dbt Cloud project can have only one [development environment](#create-a-development-environment), but there is no limit to the number of [deployment environments](/docs/deploy/deploy-environments), providing you the flexibility and customization to tailor the execution of scheduled jobs. + +Use environments to customize settings for different stages of your project and streamline the execution process by using software engineering principles. This page will detail the different types of environments and how to intuitively configure your development environment in dbt Cloud. + + +import CloudEnvInfo from '/snippets/_cloud-environments-info.md'; + + + + +## Create a development environment + +To create a new dbt Cloud development environment: + +1. Navigate to **Deploy** -> **Environments** +2. Click **Create Environment**. +3. Select **Development** as the environment type. +4. Fill in the fields under **General Settings** and **Development Credentials**. +5. Click **Save** to create the environment. + +### Set developer credentials + +To use the IDE, each developer will need to set up [personal development credentials](/docs/cloud/dbt-cloud-ide/develop-in-the-cloud#access-the-cloud-ide) to your warehouse connection in their **Profile Settings**. This allows you to set separate target information and maintain individual credentials to connect to your warehouse via the dbt Cloud IDE. + + + + + +## Deployment environment + +Deployment environments in dbt Cloud are crucial for executing scheduled jobs. A dbt Cloud project can have multiple deployment environments, allowing for flexibility and customization. + +To learn more about dbt Cloud deployments and how to configure deployment environments, visit the [Deployment environments](/docs/deploy/deploy-environments) page. For our best practices guide, read [dbt Cloud environment best practices](https://docs.getdbt.com/guides/best-practices/environment-setup/1-env-guide-overview) for more info. diff --git a/website/docs/docs/dbt-support.md b/website/docs/docs/dbt-support.md index 23bc3164c7d..a6e9262200c 100644 --- a/website/docs/docs/dbt-support.md +++ b/website/docs/docs/dbt-support.md @@ -10,7 +10,7 @@ If you're developing in the command line (CLI) and have questions or need some h ## dbt Cloud support We want to help you work through implementing and utilizing dbt Cloud at your organization. Have a question you can't find an answer to in [our docs](https://docs.getdbt.com/) or [the Community Forum](https://discourse.getdbt.com/)? Our Support team is here to `dbt help` you! 
-Check out our guide on [getting help](/guides/legacy/getting-help) - half of the problem is often knowing where to look... and how to ask good questions! +Check out our guide on [getting help](/community/resources/getting-help) - half of the problem is often knowing where to look... and how to ask good questions! Types of dbt Cloud-related questions our Support team can assist you with, regardless of your dbt Cloud plan: - **How do I...** diff --git a/website/docs/docs/dbt-versions/core-versions.md b/website/docs/docs/dbt-versions/core-versions.md index 9e429766d9c..328b6cf4166 100644 --- a/website/docs/docs/dbt-versions/core-versions.md +++ b/website/docs/docs/dbt-versions/core-versions.md @@ -4,7 +4,12 @@ id: "core" description: "Learn about semantic versioning for dbt Core, and how long those versions are supported." --- -dbt Core releases follow [semantic versioning](https://semver.org/) guidelines. For more on how we use semantic versions, see [How dbt Core uses semantic versioning](#how-dbt-core-uses-semantic-versioning). +dbt Core releases follow [semantic versioning](https://semver.org/) guidelines. For more on how we use semantic versions, see [How dbt Core uses semantic versioning](#how-dbt-core-uses-semantic-versioning). + +dbt Labs provides different support levels for different versions, which may include new features, bug fixes, or security patches: + + + @@ -20,9 +25,12 @@ All dbt Core versions released prior to 1.0 and their version-specific documenta ## EOL version support -All dbt Core versions with an end-of-life (EOL) support level will no longer receive bug fixes. To continue receiving bug fixes, dbt Labs recommends upgrading to a newer version. +All dbt Core minor versions that have reached end-of-life (EOL) will have no new patch releases. This means they will no longer receive any fixes, including for known bugs that have been identified. Fixes for those bugs will instead be made in newer minor versions that are still under active support. + +We recommend upgrading to a newer version in [dbt Cloud](/docs/dbt-versions/upgrade-core-in-cloud) or [dbt Core](/docs/core/installation#upgrading-dbt-core) to continue receiving support. + +All dbt Core v1.0 and later are available in dbt Cloud until further notice. In the future, we intend to align dbt Cloud availability with dbt Core ongoing support. You will receive plenty of advance notice before any changes take place. -All dbt Core versions v1.0 and later are available in dbt Cloud until further notice. In the future, we intend to align dbt Cloud availability with dbt Core ongoing support. You will receive plenty of advance notice before any changes take place. ## Current version support diff --git a/website/docs/docs/dbt-versions/experimental-features.md b/website/docs/docs/dbt-versions/experimental-features.md new file mode 100644 index 00000000000..35c64146149 --- /dev/null +++ b/website/docs/docs/dbt-versions/experimental-features.md @@ -0,0 +1,23 @@ +--- +title: "Preview new and experimental features in dbt Cloud" +id: "experimental-features" +sidebar_label: "Preview new dbt Cloud features" +description: "Gain early access to many new dbt Labs experimental features by enabling this in your profile." +--- + +dbt Labs often tests experimental features before deciding to continue on the [Product lifecycle](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles#dbt-cloud). + +You can access experimental features to preview beta features that haven’t yet been released to dbt Cloud. 
You can toggle on or off all experimental features in your Profile settings. Experimental features: + +- May not be feature-complete or fully stable as we’re actively developing them. +- Could be discontinued at any time. +- May require feedback from you to understand their limitations or impact. Each experimental feature collects feedback directly in dbt Cloud, which may impact dbt Labs' decisions to implement. +- May have limited technical support and be excluded from our Support SLAs. +- May not have public documentation available. + +To enable or disable experimental features: + +1. Navigate to **Profile settings** by clicking the gear icon in the top right. +2. Find Experimental features at the bottom of Your Profile page. +3. Click **Beta** to toggle the features on or off as shown in the following image. + ![Experimental features](/img/docs/dbt-versions/experimental-feats.png) diff --git a/website/docs/docs/dbt-versions/release-notes/06-July-2023/faster-run.md b/website/docs/docs/dbt-versions/release-notes/06-July-2023/faster-run.md new file mode 100644 index 00000000000..0f88f1d2fa8 --- /dev/null +++ b/website/docs/docs/dbt-versions/release-notes/06-July-2023/faster-run.md @@ -0,0 +1,34 @@ +--- +title: "Enhancement: Faster run starts and unlimited job concurrency" +description: "We have enhanced the dbt Cloud Scheduler by reducing prep time for all accounts and provided unlimited job concurrency for Enterprise accounts." +sidebar_label: "Enhancement: Faster run starts and unlimited job concurrency" +tags: [07-2023, scheduler] +date: 2023-07-06 +sidebar_position: 10 +--- + +We’ve introduced significant improvements to the dbt Cloud Scheduler, offering improved performance, durability, and scalability. + +Read more on how you can experience faster run start execution and how enterprise users can now run as many jobs concurrently as they want to. + +## Faster run starts + +The Scheduler takes care of preparing each dbt Cloud job to run in your cloud data platform. This [prep](/docs/deploy/job-scheduler#scheduler-queue) involves readying a Kubernetes pod with the right version of dbt installed, setting environment variables, loading data platform credentials, and git provider authorization, amongst other environment-setting tasks. Only after the environment is set up, can dbt execution begin. We display this time to the user in dbt Cloud as “prep time”. + + + +For all its strengths, Kubernetes has challenges, especially with pod management impacting run execution time. We’ve rebuilt our scheduler by ensuring faster job execution with a ready pool of pods to execute customers’ jobs. This means you won't experience long prep times at the top of the hour, and we’re determined to keep runs starting near instantaneously. Don’t just take our word, review the data yourself. + + + +Jobs scheduled at the top of the hour used to take over 106 seconds to prepare because of the volume of runs the scheduler has to process. Now, even with increased runs, we have reduced prep time to 27 secs (at a maximum) — a 75% speed improvement for runs at peak traffic times! + +## Unlimited job concurrency for Enterprise accounts + +Our enhanced scheduler offers more durability and empowers users to run jobs effortlessly. + +This means Enterprise, multi-tenant accounts can now enjoy the advantages of unlimited job concurrency. Previously limited to a fixed number of run slots, Enterprise accounts now have the freedom to operate without constraints. Single-tenant support will be coming soon. 
Team plan customers will continue to have only 2 run slots. + +Something to note, each running job occupies a run slot for its duration, and if all slots are occupied, jobs will queue accordingly. + +For more feature details, refer to the [dbt Cloud pricing page](https://www.getdbt.com/pricing/). diff --git a/website/docs/docs/dbt-versions/release-notes/08-May-2023/product-docs-may.md b/website/docs/docs/dbt-versions/release-notes/08-May-2023/product-docs-may.md new file mode 100644 index 00000000000..762a6a723f8 --- /dev/null +++ b/website/docs/docs/dbt-versions/release-notes/08-May-2023/product-docs-may.md @@ -0,0 +1,43 @@ +--- +title: "May 2023 product docs updates" +id: "May-product-docs" +description: "May 2023: Find out what the product docs team has been busy doing in the month of May." +sidebar_label: "Update: Product docs changes" +sidebar_position: 1 +tags: [May-2023, product-docs] +date: 2023-06-01 +--- + +Hello from the dbt Docs team: @mirnawong1, @matthewshaver, @nghi-ly, and @runleonarun! First, we’d like to thank the 13 new community contributors to docs.getdbt.com! + +Here's what's new to [docs.getdbt.com](http://docs.getdbt.com/) in May: + +## 🔎 Discoverability + +- We made sure everyone knows that Cloud-users don’t need a [profiles.yml file](/docs/core/connect-data-platform/profiles.yml) by adding a callout on several key pages. +- Fleshed out the [model jinja variable page](/reference/dbt-jinja-functions/model), which originally lacked conceptual info and didn’t link to the schema page. +- Added a new [Quickstarts landing page](/quickstarts). This new format sets up for future iterations that will include filtering! But for now, we are excited you can step through quickstarts in a focused way. + +## ☁ Cloud projects + +- We launched [dbt Cloud IDE user interface doc](/docs/cloud/dbt-cloud-ide/ide-user-interface), which provides a thorough walkthrough of the IDE UI elements and their definitions. +- Launched a sparkling new [dbt Cloud Scheduler page](/docs/deploy/job-scheduler) ✨! We went from previously having little content around the scheduler to a subsection that breaks down the awesome scheduler features and how it works. +- Updated the [dbt Cloud user license page](/docs/cloud/manage-access/seats-and-users#licenses) to clarify how to add or remove cloud users. +- Shipped these Discovery API docs to coincide with the launch of the Discovery API: + - [About the Discovery API](/docs/dbt-cloud-apis/discovery-api) + - [Use cases and examples for the Discovery API](/docs/dbt-cloud-apis/discovery-use-cases-and-examples) + - [Query the Discovery API](/docs/dbt-cloud-apis/discovery-querying) + +## 🎯 Core projects + +- See what’s coming up [in Core v 1.6](https://github.com/dbt-labs/docs.getdbt.com/issues?q=is%3Aissue+label%3A%22dbt-core+v1.6%22)! +- We turned the `profiles.yml` [page](/docs/core/connect-data-platform/profiles.yml) into a landing page, added more context to profile.yml page, and moved the ‘About CLI’ higher up in the `Set up dbt` section. 
+ +## New 📚 Guides, ✏️ blog posts, and FAQs + +If you want to contribute to a blog post, we’re focusing on content + +- Published a blog post: [Accelerate your documentation workflow: Generate docs for whole folders at once](/blog/generating-dynamic-docs-dbt) +- Published a blog post: [Data engineers + dbt v1.5: Evolving the craft for scale](/blog/evolving-data-engineer-craft) +- Added an [FAQ](/faqs/Warehouse/db-connection-dbt-compile) to clarify the common question users have on *Why does dbt compile needs to connect to the database?* +- Published a [discourse article](https://discourse.getdbt.com/t/how-to-configure-external-user-email-notifications-in-dbt-cloud/8393) about configuring job notifications for non-dbt Cloud users diff --git a/website/docs/docs/dbt-versions/release-notes/08-May-2023/run-history-endpoint.md b/website/docs/docs/dbt-versions/release-notes/08-May-2023/run-history-endpoint.md index ddf71872846..050fd8339a2 100644 --- a/website/docs/docs/dbt-versions/release-notes/08-May-2023/run-history-endpoint.md +++ b/website/docs/docs/dbt-versions/release-notes/08-May-2023/run-history-endpoint.md @@ -12,7 +12,7 @@ dbt Labs is making a change to the metadata retrieval policy for Run History in **Beginning June 1, 2023,** developers on the dbt Cloud multi-tenant application will be able to self-serve access to their account’s run history through the dbt Cloud user interface (UI) and API for only 365 days, on a rolling basis. Older run history will be available for download by reaching out to Customer Support. We're seeking to minimize the amount of metadata we store while maximizing application performance. -Specifically, all `GET` requests to the dbt Cloud [Runs endpoint](https://docs.getdbt.com/dbt-cloud/api-v2#tag/Runs) will return information on runs, artifacts, logs, and run steps only for the past 365 days. Additionally, the run history displayed in the dbt Cloud UI will only show runs for the past 365 days. +Specifically, all `GET` requests to the dbt Cloud [Runs endpoint](https://docs.getdbt.com/dbt-cloud/api-v2-legacy#tag/Runs) will return information on runs, artifacts, logs, and run steps only for the past 365 days. Additionally, the run history displayed in the dbt Cloud UI will only show runs for the past 365 days. diff --git a/website/docs/docs/dbt-versions/release-notes/09-April-2023/api-endpoint-restriction.md b/website/docs/docs/dbt-versions/release-notes/09-April-2023/api-endpoint-restriction.md index 2959cc2f1ed..8507fe3dbbb 100644 --- a/website/docs/docs/dbt-versions/release-notes/09-April-2023/api-endpoint-restriction.md +++ b/website/docs/docs/dbt-versions/release-notes/09-April-2023/api-endpoint-restriction.md @@ -20,4 +20,4 @@ dbt Cloud is hosted in multiple regions around the world, and each region has a ::: -For more info, refer to our [documentation](/dbt-cloud/api-v2#tag/Runs/operation/listRunsForAccount). +For more info, refer to our [documentation](/dbt-cloud/api-v2-legacy#tag/Runs/operation/listRunsForAccount). 
diff --git a/website/docs/docs/dbt-versions/release-notes/10-Mar-2023/apiv2-limit.md b/website/docs/docs/dbt-versions/release-notes/10-Mar-2023/apiv2-limit.md index fca26d7a535..85c4af48b54 100644 --- a/website/docs/docs/dbt-versions/release-notes/10-Mar-2023/apiv2-limit.md +++ b/website/docs/docs/dbt-versions/release-notes/10-Mar-2023/apiv2-limit.md @@ -11,4 +11,4 @@ To make the API more scalable and reliable, we've implemented a maximum limit of This maximum limit applies to [multi-tenant instances](/docs/cloud/about-cloud/regions-ip-addresses) only, and _does not_ apply to single tenant instances. -Refer to the [Pagination](https://docs.getdbt.com/dbt-cloud/api-v2#section/Pagination) section for more information on this change. +Refer to the [Pagination](https://docs.getdbt.com/dbt-cloud/api-v2-legacy#section/Pagination) section for more information on this change. diff --git a/website/docs/docs/dbt-versions/release-notes/26-Sept-2022/liststeps-endpoint-deprecation.md b/website/docs/docs/dbt-versions/release-notes/26-Sept-2022/liststeps-endpoint-deprecation.md index 377b4a12d08..545847efd90 100644 --- a/website/docs/docs/dbt-versions/release-notes/26-Sept-2022/liststeps-endpoint-deprecation.md +++ b/website/docs/docs/dbt-versions/release-notes/26-Sept-2022/liststeps-endpoint-deprecation.md @@ -6,9 +6,9 @@ sidebar_label: "Deprecation: List Steps API endpoint" tags: [Sept-2022] --- -On October 14th, 2022 dbt Labs is deprecating the [List Steps](https://docs.getdbt.com/dbt-cloud/api-v2#tag/Runs/operation/listSteps) API endpoint. From October 14th, any GET requests to this endpoint will fail. Please prepare to stop using the List Steps endpoint as soon as possible. +On October 14th, 2022 dbt Labs is deprecating the [List Steps](https://docs.getdbt.com/dbt-cloud/api-v2-legacy#tag/Runs/operation/listSteps) API endpoint. From October 14th, any GET requests to this endpoint will fail. Please prepare to stop using the List Steps endpoint as soon as possible. -dbt Labs will continue to maintain the [Get Run](https://docs.getdbt.com/dbt-cloud/api-v2#tag/Runs/operation/getRunById) endpoint, which is a viable alternative depending on the use case. +dbt Labs will continue to maintain the [Get Run](https://docs.getdbt.com/dbt-cloud/api-v2-legacy#tag/Runs/operation/getRunById) endpoint, which is a viable alternative depending on the use case. You can fetch run steps for an individual run with a GET request to the following URL, replacing `YOUR_ACCESS_URL` with the [appropriate Access URL](/docs/cloud/about-cloud/regions-ip-addresses) for your region and plan: diff --git a/website/docs/docs/dbt-versions/upgrade-core-in-cloud.md b/website/docs/docs/dbt-versions/upgrade-core-in-cloud.md index 17c0a21fa63..6c9ffe5d60e 100644 --- a/website/docs/docs/dbt-versions/upgrade-core-in-cloud.md +++ b/website/docs/docs/dbt-versions/upgrade-core-in-cloud.md @@ -3,8 +3,6 @@ title: "Upgrade Core version in Cloud" id: "upgrade-core-in-cloud" --- -## Upgrading to the latest version of dbt in Cloud - In dbt Cloud, both jobs and environments are configured to use a specific version of dbt Core. The version can be upgraded at any time. ### Environments @@ -23,15 +21,17 @@ Each job in dbt Cloud can be configured to inherit parameters from the environme The example job seen in the screenshot above belongs to the environment "Prod". It inherits the dbt version of its environment as shown by the **Inherited from ENVIRONMENT_NAME (DBT_VERSION)** selection. 
You may also manually override the dbt version of a specific job to be any of the current Core releases supported by Cloud by selecting another option from the dropdown. -## Supported Versions +## Supported versions -We have always encouraged our customers to upgrade dbt Core versions whenever a new minor version is released. We released our first major version of dbt - `dbt 1.0` - in December 2021. Alongside this release, we updated our policy on which versions of dbt Core we will support in dbt Cloud. +dbt Labs has always encouraged users to upgrade dbt Core versions whenever a new minor version is released. We released our first major version of dbt - `dbt 1.0` - in December 2021. Alongside this release, we updated our policy on which versions of dbt Core we will support in dbt Cloud. +> **Starting with v1.0, all subsequent minor versions are available in dbt Cloud. Versions are actively supported, with patches and bug fixes, for 1 year after their initial release. At the end of the 1-year window, we encourage all users to upgrade to a newer version for better ongoing maintenance and support.** +We provide different support levels for different versions, which may include new features, bug fixes, or security patches: - > **Starting with v1.0, any subsequent minor versions will be supported in dbt Cloud for 1 year post release. At the end of the 1 year window, accounts must upgrade to a supported version of dbt or risk service disruption.** + -We will continue to update this table so that customers know when we plan to stop supporting different versions of Core in dbt Cloud. +We'll continue to update the following release table so that users know when we plan to stop supporting different versions of Core in dbt Cloud. diff --git a/website/docs/docs/deploy/dbt-cloud-job.md b/website/docs/docs/deploy/dbt-cloud-job.md index d756693b17e..fa9eead2d3b 100644 --- a/website/docs/docs/deploy/dbt-cloud-job.md +++ b/website/docs/docs/deploy/dbt-cloud-job.md @@ -1,86 +1,26 @@ --- -title: "Deploy with dbt Cloud" +title: "dbt Cloud jobs" id: "dbt-cloud-job" -description: "You can enable continuous integration (CI) to test every single change prior to deploying the code to production just like in a software development workflow." +description: "Manage, setup, and configure your dbt Cloud job using elegant job commands and triggers." hide_table_of_contents: true tags: ["scheduler"] --- -dbt Cloud offers the easiest way to run your dbt project in production. Deploying with dbt Cloud lets you: -- Keep production data fresh on a timely basis -- Ensure CI and production pipelines are efficient -- Identify the root cause of failures in deployment environments -- Maintain high quality code and data in production -- Gain visibility into the health of deployment jobs, models, and tests +Manage, set up, and automate your dbt jobs using robust custom job settings. You can use the job scheduler to configure when and how your jobs run, helping you keep production data fresh on a timely basis. -Learn more about the features you can use in dbt Cloud to help your team ship timely, quality production data more easily. +This portion of our documentation will go over dbt Cloud's various job settings using: -
+- [Job settings](/docs/deploy/job-settings) — Intuitively navigate the user interface to create new dbt jobs or edit existing ones. +- [Job commands](/docs/deploy/job-commands) — Use job commands to configure dbt commands on a schedule. +- [Job triggers](/docs/deploy/job-triggers) — You can configure when and how dbt should run your job, such as: + * Running on scheduled days or cron schedules + * Setting up continuous integration (CI) to run when someone opens a new pull request in your dbt repository + * Using the API to trigger jobs - + - + - + - - - - - - - - - - - - - - - - -
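As a quick sketch of the last trigger type in the list above (kicking off a job over the API), the snippet below posts to the Administrative API's trigger-run endpoint with a service token. The access URL, account ID, job ID, and token are placeholders, and it's worth confirming the exact path and payload against the Administrative API reference for your region and plan.

```python
# Placeholder values throughout: substitute your own access URL, account ID,
# job ID, and token before running.
import requests

ACCESS_URL = "cloud.getdbt.com"  # your region's access URL
ACCOUNT_ID = 1
JOB_ID = 2

response = requests.post(
    f"https://{ACCESS_URL}/api/v2/accounts/{ACCOUNT_ID}/jobs/{JOB_ID}/run/",
    headers={"Authorization": "Token <DBT_CLOUD_SERVICE_TOKEN>"},
    json={"cause": "Triggered via the API"},
)
response.raise_for_status()
print(response.json()["data"]["id"])  # ID of the queued run
```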

+ diff --git a/website/docs/docs/deploy/deploy-environments.md b/website/docs/docs/deploy/deploy-environments.md new file mode 100644 index 00000000000..da54b918436 --- /dev/null +++ b/website/docs/docs/deploy/deploy-environments.md @@ -0,0 +1,187 @@ +--- +title: "Deployment environments" +id: "deploy-environments" +description: "Learn about dbt Cloud's deployment environment to seamlessly schedule jobs or enable CI." +--- + +Deployment environments in dbt Cloud are crucial for deploying dbt jobs. To execute dbt, environments determine the settings used during job runs, including: + +- The version of dbt Core that will be used to run your project +- The warehouse connection information (including the target database/schema settings) +- The version of your code to execute + +A dbt Cloud project can have multiple deployment environments, providing you the flexibility and customization to tailor the execution of dbt jobs. You can use deployment environments to [create and schedule jobs](/docs/deploy/job-settings#create-and-schedule-jobs), [enable continuous integration](/docs/deploy/continuous-integration), or more based on your specific needs or requirements. + +:::tip Learn how to manage dbt Cloud environments +To learn different approaches to managing dbt Cloud environments and recommendations for your organization's unique needs, read [dbt Cloud environment best practices](https://docs.getdbt.com/guides/best-practices/environment-setup/1-env-guide-overview). +::: + +This page will go over the different types of environments and how to intuitively configure your deployment environment in dbt Cloud. + +import CloudEnvInfo from '/snippets/_cloud-environments-info.md'; + + + +## Create a deployment environment + +To create a new dbt Cloud development environment, navigate to **Deploy** -> **Environments** and then click **Create Environment**. Select **Deployment** as the environment type. + + + +### Semantic Layer + +For Semantic Layer-eligible customers, the next section of environment settings is the Semantic Layer configurations. [The Semantic Layer setup guide](/docs/use-dbt-semantic-layer/setup-dbt-semantic-layer) has the most up-to-date setup instructions! + +### Deployment connection + +:::info Warehouse Connections + + Warehouse connections are set at the Project level for dbt Cloud accounts, and each Project can have one connection (Snowflake account, Redshift host, Bigquery project, Databricks host, etc.). Some details of that connection (databases/schemas/etc.) can be overridden within this section of the dbt Cloud environment settings. + +::: + +This section determines the exact location in your warehouse dbt should target when building warehouse objects! This section will look a bit different depending on your warehouse provider. + + + + +
+ +This section will not appear if you are using Postgres, as all values are inferred from the project's connection. + +
+ +
+ +This section will not appear if you are using Redshift, as all values are inferred from the project's connection. + +
+ +
+ + + +#### Editable fields + +- **Role**: Snowflake role +- **Database**: Target database +- **Warehouse**: Snowflake warehouse + +
+ +
+ +This section will not appear if you are using Bigquery, as all values are inferred from the project's connection. + +
+ +
+ +This section will not appear if you are using Spark, as all values are inferred from the project's connection. + +
+ +
+ + + +#### Editable fields + +- **Catalog** (optional): [Unity Catalog namespace](/docs/core/connect-data-platform/databricks-setup) + +
+ +
+ + +### Deployment credentials + +This section allows you to determine the credentials that should be used when connecting to your warehouse. The authentication methods may differ depending on the warehouse and dbt Cloud tier you are on. + + + +
+ + + +#### Editable fields + +- **Username**: Postgres username to use (most likely a service account) +- **Password**: Postgres password for the listed user +- **Schema**: Target schema + +
+ +
+ + + +#### Editable fields + +- **Username**: Redshift username to use (most likely a service account) +- **Password**: Redshift password for the listed user +- **Schema**: Target schema + +
+ +
+ + + +#### Editable fields + +- **Auth Method**: This determines the way dbt connects to your warehouse + - One of: [**Username & Password**, **Key Pair**] +- If **Username & Password**: + - **Username**: username to use (most likely a service account) + - **Password**: password for the listed user +- If **Key Pair**: + - **Username**: username to use (most likely a service account) + - **Private Key**: value of the Private SSH Key (optional) + - **Private Key Passphrase**: value of the Private SSH Key Passphrase (optional, only if required) +- **Schema**: Target Schema for this environment + +
+ +
+ + + +#### Editable fields + +- **Dataset**: Target dataset + +
+ +
+ + + +#### Editable fields + +- **Token**: Access token +- **Schema**: Target schema + +
+ +
+ + + +#### Editable fields + +- **Token**: Access token +- **Schema**: Target schema + +
+ +
+ + +## Related docs + +- [dbt Cloud environment best practices](https://docs.getdbt.com/guides/best-practices/environment-setup/1-env-guide-overview) +- [Deploy dbt jobs](/docs/deploy/dbt-cloud-job) +- [Deploy CI jobs](/docs/deploy/continuous-integration) +- [Delete a job or environment in dbt Cloud](/faqs/Environments/delete-environment-job) + diff --git a/website/docs/docs/deploy/deployment-overview.md b/website/docs/docs/deploy/deployment-overview.md index 8f8937620f7..dddc252211e 100644 --- a/website/docs/docs/deploy/deployment-overview.md +++ b/website/docs/docs/deploy/deployment-overview.md @@ -1,36 +1,129 @@ --- -title: "Deploy dbt jobs" +title: "Deploy dbt" id: "deployments" -sidebar: "About job deployments" +sidebar: "Use dbt Cloud's capabilities to seamlessly run a dbt job in production." +hide_table_of_contents: true +tags: ["scheduler"] --- -Running dbt in production means setting up a system to run a _dbt job on a schedule_, rather than running dbt commands manually from the command line. Your production dbt jobs should create the tables and views that your business intelligence tools and end users query. Before continuing, make sure you understand dbt's approach to [managing environments](/docs/collaborate/environments/environments-in-dbt). +Use dbt Cloud's capabilities to seamlessly run a dbt job in production or staging environments. Rather than run dbt commands manually from the command line, you can leverage the [dbt Cloud's in-app scheduling](/docs/deploy/job-scheduler) to automate how and when you execute dbt. -In addition to setting up a schedule, there are other considerations when setting up dbt to run in production: +dbt Cloud offers the easiest and most reliable way to run your dbt project in production. Effortlessly promote high quality code from development to production and build fresh data assets that your business intelligence tools and end users query to make business decisions. Deploying with dbt Cloud lets you: +- Keep production data fresh on a timely basis +- Ensure CI and production pipelines are efficient +- Identify the root cause of failures in deployment environments +- Maintain high-quality code and data in production +- Gain visibility into the health of deployment jobs, models, and tests -* The complexity involved in creating a new dbt job or editing an existing one. -* Setting up notifications if a step within your job returns an error code (for example, a model can't be built or a test fails). -* Accessing logs to help debug any issues. -* Pulling the latest version of your git repo before running dbt (continuous deployment). -* Running and testing your dbt project before merging code into master (continuous integration). -* Allowing access for team members that need to collaborate on your dbt project. +Before continuing, make sure you understand dbt's approach to [deployment environments](/docs/deploy/deploy-environments). - +
-
+ +

+ +## dbt Cloud jobs + +
+ + title="Job settings" + body="Create and schedule jobs for the dbt Cloud scheduler to run." + link="/docs/deploy/job-settings" + icon="dbt-bit"/> + + + + + +

+ +## Monitor jobs and alerts + +
+ + + + + + + + + + + + + +

+ + + + + +## Related docs -

\ No newline at end of file +- [Integrate with other orchestration tools](/docs/deploy/deployment-tools) diff --git a/website/docs/docs/deploy/deployment-tools.md b/website/docs/docs/deploy/deployment-tools.md index e7a0d3c43c3..26e9e4ea317 100644 --- a/website/docs/docs/deploy/deployment-tools.md +++ b/website/docs/docs/deploy/deployment-tools.md @@ -1,13 +1,12 @@ --- -title: "Deploy with other tools" +title: "Integrate with other orchestration tools" id: "deployment-tools" -sidebar: "Deploy with other tools" +sidebar_label: "Integrate with other tools" --- -Discover additional ways to schedule and run your dbt jobs with the help of robust tools such as Airflow, Prefect, Dagster, automation server, Cron, and Azure Data Factory (ADF), alongside [dbt Cloud](/docs/deploy/dbt-cloud-job). - -Use these tools to automate your data workflows, trigger dbt jobs (including those hosted on dbt Cloud), and enjoy a hassle-free experience, saving time and increasing efficiency. +Alongside [dbt Cloud](/docs/deploy/dbt-cloud-job), discover other ways to schedule and run your dbt jobs with the help of tools such as Airflow, Prefect, Dagster, automation server, Cron, and Azure Data Factory (ADF), +Build and install these tools to automate your data workflows, trigger dbt jobs (including those hosted on dbt Cloud), and enjoy a hassle-free experience, saving time and increasing efficiency. ## Airflow diff --git a/website/docs/docs/deploy/job-commands.md b/website/docs/docs/deploy/job-commands.md index 04d63ecac62..acdc3a00228 100644 --- a/website/docs/docs/deploy/job-commands.md +++ b/website/docs/docs/deploy/job-commands.md @@ -19,7 +19,7 @@ Job commands are specific tasks executed by the job, and you can configure them During a job run, the commands are "chained" together and executed as run steps. When you add a dbt command in the **Commands** section, you can expect different outcomes compared to the checkbox option. - + ### Built-in commands @@ -29,7 +29,7 @@ Every job invocation automatically includes the [`dbt deps`](/reference/commands **Job outcome** — During a job run, the built-in commands are "chained" together. This means if one of the run steps in the chain fails, then the next commands aren't executed, and the entire job fails with an "Error" job status. - + ### Checkbox commands @@ -56,7 +56,7 @@ Use [selectors](/reference/node-selection/syntax) as a powerful way to select an In the following example image, the first four run steps are successful. However, if the fifth run step (`dbt run --select state:modified+ --full-refresh --fail-fast`) fails, then the next run steps aren't executed, and the entire job fails. The failed job returns a non-zero [exit code](/reference/exit-codes) and "Error" job status: - + ## Job command failures diff --git a/website/docs/docs/deploy/job-settings.md b/website/docs/docs/deploy/job-settings.md index e8808397b6a..3b53880bddf 100644 --- a/website/docs/docs/deploy/job-settings.md +++ b/website/docs/docs/deploy/job-settings.md @@ -21,7 +21,7 @@ You can create a job and configure it to run on [scheduled days and times](/docs - You must have a dbt project connected to a [data platform](/docs/cloud/connect-data-platform/about-connections). - You must [create and schedule a dbt Cloud job](#create-and-schedule-jobs). - You must have [access permission](/docs/cloud/manage-access/about-user-access) to view, create, modify, or run jobs. -- You must set up a [deployment environment](/docs/collaborate/environments/dbt-cloud-environments). 
+- You must set up a [deployment environment](/docs/deploy/deploy-environments).
## Create and schedule jobs {#create-and-schedule-jobs} @@ -55,4 +55,4 @@ You can create a job and configure it to run on [scheduled days and times](/docs
-7. Select **Save**, then click **Run Now** to run your job. Click the run and watch its progress under **Run history**. \ No newline at end of file
+7. Select **Save**, then click **Run Now** to run your job. Click the run and watch its progress under **Run history**.
diff --git a/website/docs/docs/deploy/monitor-jobs.md b/website/docs/docs/deploy/monitor-jobs.md new file mode 100644 index 00000000000..c4c5fcb73a5 --- /dev/null +++ b/website/docs/docs/deploy/monitor-jobs.md @@ -0,0 +1,28 @@ +---
+title: "Monitor jobs and alerts"
+id: "monitor-jobs"
+description: "Monitor your dbt Cloud job and set up alerts to ensure seamless orchestration and optimize your data transformations"
+tags: ["scheduler"]
+---
+
+Monitor your dbt Cloud jobs to help identify areas for improvement and set up alerts to proactively notify the right people or team.
+
+This portion of our documentation will go over dbt Cloud's various capabilities that help you monitor your jobs and set up alerts to ensure seamless orchestration, including:
+
+- [Run visibility](/docs/deploy/run-visibility) — View your run history to help identify where improvements can be made to scheduled jobs.
+- [Job notifications](/docs/deploy/job-notifications) — Receive email or Slack notifications when a job run succeeds, fails, or is canceled.
+- [Webhooks](/docs/deploy/webhooks) — Use webhooks to send events about your dbt jobs' statuses to other systems.
+- [Leverage artifacts](/docs/deploy/artifacts) — dbt Cloud generates and saves artifacts for your project, which it uses to power features like creating docs for your project and reporting freshness of your sources.
+- [Source freshness](/docs/deploy/source-freshness) — Monitor data governance by enabling snapshots to capture the freshness of your data sources.
+- [Dashboard status tiles](/docs/deploy/dashboard-status-tiles) — Set up and add status tiles to view data freshness and quality checks.
+
+ + + + + + + + + +
diff --git a/website/docs/docs/deploy/slim-ci-jobs.md b/website/docs/docs/deploy/slim-ci-jobs.md index 8f8f359c1cf..ee00d0b15ba 100644 --- a/website/docs/docs/deploy/slim-ci-jobs.md +++ b/website/docs/docs/deploy/slim-ci-jobs.md @@ -15,7 +15,7 @@ You can set up Slim [continuous integration](/docs/deploy/continuous-integration
## Set up Slim CI jobs
-dbt Labs recommends that you create your Slim CI job in a dedicated dbt Cloud [deployment environment](/docs/collaborate/environments/dbt-cloud-environments#create-a-deployment-environment) that's connected to a staging database. Having a separate environment dedicated for CI will provide better isolation between your temporary CI schemas builds and your production data builds. Additionally, sometimes teams need their Slim CI jobs to be triggered when a PR is made to a branch other than main. If your team maintains a staging branch in your release process, having a separate environment will allow you to set a [custom branch](/faqs/environments/custom-branch-settings), and accordingly the CI job in that dedicated environment will be triggered only when PRs are made to the specified, custom branch.
+dbt Labs recommends that you create your Slim CI job in a dedicated dbt Cloud [deployment environment](/docs/deploy/deploy-environments#create-a-deployment-environment) that's connected to a staging database. 
Having a separate environment dedicated to CI will provide better isolation between your temporary CI schema builds and your production data builds. Additionally, sometimes teams need their Slim CI jobs to be triggered when a PR is made to a branch other than main. If your team maintains a staging branch in your release process, having a separate environment will allow you to set a [custom branch](/faqs/environments/custom-branch-settings), and accordingly the CI job in that dedicated environment will be triggered only when PRs are made to the specified, custom branch.
1. On your deployment environment page, click **Create One** to create a new CI job.
2. In the **Execution Settings** section:
@@ -137,4 +137,4 @@ If your temporary pull request schemas aren't dropping after a merge or close of
• ❌ Database has changed from the default connection (like dev).
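Under the hood, Slim CI relies on dbt's state selection and deferral. As a rough sketch of the dbt Core equivalent of what a Slim CI run does, assuming production artifacts have already been downloaded to a local `./prod-artifacts` folder (the path is an assumption for illustration):

```bash
# Build and test only the nodes changed in the PR plus everything downstream,
# deferring unchanged upstream references to the production manifest.
dbt build --select state:modified+ --defer --state ./prod-artifacts
```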

- \ No newline at end of file + diff --git a/website/docs/docs/environments-in-dbt.md b/website/docs/docs/environments-in-dbt.md new file mode 100644 index 00000000000..54eaa68f667 --- /dev/null +++ b/website/docs/docs/environments-in-dbt.md @@ -0,0 +1,39 @@ +--- +title: "About environments" +id: "environments-in-dbt" +hide_table_of_contents: true +--- + +In software engineering, environments are used to enable engineers to develop and test code without impacting the users of their software. Typically, there are two types of environments in dbt: + +- **Deployment or Production** (or _prod_) — Refers to the environment that end users interact with. + +- **Development** (or _dev_) — Refers to the environment that engineers work in. This means that engineers can work iteratively when writing and testing new code in _development_. Once they are confident in these changes, they can deploy their code to _production_. + +In traditional software engineering, different environments often use completely separate architecture. For example, the dev and prod versions of a website may use different servers and databases. Data warehouses can also be designed to have separate environments — the _production_ environment refers to the relations (for example, schemas, tables, and views) that your end users query (often through a BI tool). + +Configure environments to tell dbt Cloud or dbt Core how to build and execute your project in development and production: + +
+ + + + + +

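In dbt Core, the same development/production separation typically shows up as two targets in one profile, selected at runtime. A small sketch, using the conventional `dev` and `prod` target names described elsewhere in these docs (the names themselves are up to you):

```bash
# Day-to-day development builds run against the dev target (often the default)
dbt run --target dev

# A scheduled production deployment explicitly selects the prod target
dbt run --target prod
```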
+
+
+## Related docs
+
+- [dbt Cloud environment best practices](https://docs.getdbt.com/guides/best-practices/environment-setup/1-env-guide-overview)
+- [Deployment environments](/docs/deploy/deploy-environments)
+- [About dbt Core versions](/docs/dbt-versions/core)
+- [Set Environment variables in dbt Cloud](/docs/build/environment-variables#special-environment-variables)
+- [Use Environment variables in jinja](/reference/dbt-jinja-functions/env_var)
diff --git a/website/docs/docs/use-dbt-semantic-layer/avail-sl-integrations.md b/website/docs/docs/use-dbt-semantic-layer/avail-sl-integrations.md index dc5fbdb429e..8c004d865bb 100644 --- a/website/docs/docs/use-dbt-semantic-layer/avail-sl-integrations.md +++ b/website/docs/docs/use-dbt-semantic-layer/avail-sl-integrations.md @@ -6,7 +6,11 @@ sidebar_label: "Available integrations"
---
:::info Coming soon
-The dbt Semantic Layer is undergoing some sophisticated changes, enabling more complex metric definitions and efficient querying. As part of these changes, the dbt_metrics package will be deprecated and replaced with MetricFlow. For more info, check out the [The dbt Semantic Layer: what's next?](https://www.getdbt.com/blog/dbt-semantic-layer-whats-next/) and [dbt_metrics deprecation](https://docs.getdbt.com/blog/deprecating-dbt-metrics) blog.
+The dbt Semantic Layer is undergoing a [significant revamp](https://www.getdbt.com/blog/dbt-semantic-layer-whats-next/), making it more efficient to define and query metrics.
+
+**What’s changing?** The dbt_metrics package will be [deprecated](https://docs.getdbt.com/blog/deprecating-dbt-metrics) and replaced with [MetricFlow](/docs/build/about-metricflow?version=1.6), a new framework for defining metrics in dbt.
+
+**What's new?** Learn how to [Build your metrics](/docs/build/build-metrics-intro?version=1.6) using MetricFlow, one of the key components that makes up the revamped dbt Semantic Layer. It handles SQL query construction and defines the specification for dbt semantic models and metrics.
:::
A wide variety of data applications across the modern data stack natively integrate with the dbt Semantic Layer and dbt metrics — from Business Intelligence tools to notebooks, data catalogs, and more.
diff --git a/website/docs/docs/use-dbt-semantic-layer/dbt-semantic-layer.md b/website/docs/docs/use-dbt-semantic-layer/dbt-semantic-layer.md index 777dadd21a1..5fe781ffeb6 100644 --- a/website/docs/docs/use-dbt-semantic-layer/dbt-semantic-layer.md +++ b/website/docs/docs/use-dbt-semantic-layer/dbt-semantic-layer.md @@ -6,16 +6,20 @@ sidebar_label: "dbt Semantic Layer"
---
:::info Coming soon
-The dbt Semantic Layer is undergoing some sophisticated changes, enabling more complex metric definitions and efficient querying. As part of these changes, the dbt_metrics package will be deprecated and replaced with MetricFlow. For more info, check out the [The dbt Semantic Layer: what's next?](https://www.getdbt.com/blog/dbt-semantic-layer-whats-next/) and [dbt_metrics deprecation](https://docs.getdbt.com/blog/deprecating-dbt-metrics) blog.
+The dbt Semantic Layer is undergoing a [significant revamp](https://www.getdbt.com/blog/dbt-semantic-layer-whats-next/), making it more efficient to define and query metrics.
+
+**What’s changing?** The dbt_metrics package will be [deprecated](https://docs.getdbt.com/blog/deprecating-dbt-metrics) and replaced with [MetricFlow](/docs/build/about-metricflow?version=1.6), a new framework for defining metrics in dbt. 
+
+**What's new?** Learn how to [Build your metrics](/docs/build/build-metrics-intro?version=1.6) using MetricFlow, one of the key components that makes up the revamped dbt Semantic Layer. It handles SQL query construction and defines the specification for dbt semantic models and metrics.
:::
The dbt Semantic Layer allows data teams to centrally define essential business metrics like `revenue`, `customer`, and `churn` in the modeling layer (your dbt project) for consistent self-service within downstream data tools like BI and metadata management solutions. The dbt Semantic Layer provides the flexibility to define metrics on top of your existing models and then query those metrics and models in your analysis tools of choice.
-The result? You have less duplicative coding for data teams and more consistency for data consumers.
+The result? You have less duplicated code for data teams and more consistency for data consumers.
The dbt Semantic Layer has four main parts:
-- Define your metrics in version-controlled dbt project code
+- Define your metrics in version-controlled dbt project code using MetricFlow
- Import your metric definitions via the [Discovery API](/docs/dbt-cloud-apis/discovery-api)
- Query your metric data via the dbt Proxy Server
- Explore and analyze dbt metrics in downstream tools
diff --git a/website/docs/docs/use-dbt-semantic-layer/quickstart-semantic-layer.md b/website/docs/docs/use-dbt-semantic-layer/quickstart-semantic-layer.md index ff61aab8f09..19a5fb15057 100644 --- a/website/docs/docs/use-dbt-semantic-layer/quickstart-semantic-layer.md +++ b/website/docs/docs/use-dbt-semantic-layer/quickstart-semantic-layer.md @@ -5,10 +5,12 @@ description: "Define metrics and set up the dbt Semantic Layer"
sidebar_label: "Quickstart"
---
-# dbt Semantic Layer quickstart
-
:::info Coming soon
-The dbt Semantic Layer is undergoing some sophisticated changes, enabling more complex metric definitions and efficient querying. As part of these changes, the dbt_metrics package will be deprecated and replaced with MetricFlow. For more info, check out the [The dbt Semantic Layer: what's next?](https://www.getdbt.com/blog/dbt-semantic-layer-whats-next/) and [dbt_metrics deprecation](https://docs.getdbt.com/blog/deprecating-dbt-metrics) blog.
+The dbt Semantic Layer is undergoing a [significant revamp](https://www.getdbt.com/blog/dbt-semantic-layer-whats-next/), making it more efficient to define and query metrics.
+
+**What’s changing?** The dbt_metrics package will be [deprecated](https://docs.getdbt.com/blog/deprecating-dbt-metrics) and replaced with [MetricFlow](/docs/build/about-metricflow?version=1.6), a new framework for defining metrics in dbt.
+
+**What's new?** Learn how to [Build your metrics](/docs/build/build-metrics-intro?version=1.6) using MetricFlow, one of the key components that makes up the revamped dbt Semantic Layer. It handles SQL query construction and defines the specification for dbt semantic models and metrics. 
:::
## Public Preview
diff --git a/website/docs/docs/use-dbt-semantic-layer/set-dbt-semantic-layer.md b/website/docs/docs/use-dbt-semantic-layer/set-dbt-semantic-layer.md index 1af2ba868db..b045725ca62 100644 --- a/website/docs/docs/use-dbt-semantic-layer/set-dbt-semantic-layer.md +++ b/website/docs/docs/use-dbt-semantic-layer/set-dbt-semantic-layer.md @@ -6,7 +6,11 @@ sidebar_label: "Set up the dbt Semantic Layer"
---
:::info Coming soon
-The dbt Semantic Layer is undergoing some sophisticated changes, enabling more complex metric definitions and efficient querying. As part of these changes, the dbt_metrics package will be deprecated and replaced with MetricFlow. For more info, check out the [The dbt Semantic Layer: what's next?](https://www.getdbt.com/blog/dbt-semantic-layer-whats-next/) and [dbt_metrics deprecation](https://docs.getdbt.com/blog/deprecating-dbt-metrics) blog.
+The dbt Semantic Layer is undergoing a [significant revamp](https://www.getdbt.com/blog/dbt-semantic-layer-whats-next/), making it more efficient to define and query metrics.
+
+**What’s changing?** The dbt_metrics package will be [deprecated](https://docs.getdbt.com/blog/deprecating-dbt-metrics) and replaced with [MetricFlow](/docs/build/about-metricflow?version=1.6), a new framework for defining metrics in dbt.
+
+**What's new?** Learn how to [Build your metrics](/docs/build/build-metrics-intro?version=1.6) using MetricFlow, one of the key components that makes up the revamped dbt Semantic Layer. It handles SQL query construction and defines the specification for dbt semantic models and metrics.
:::
With the dbt Semantic Layer, you'll be able to centrally define business metrics, reduce code duplication and inconsistency, create self-service in downstream tools, and more. Configure the dbt Semantic Layer in dbt Cloud to connect with your integrated partner tool.
diff --git a/website/docs/faqs/Environments/profile-name.md b/website/docs/faqs/Environments/profile-name.md index cdb38527a2e..aade6c252db 100644 --- a/website/docs/faqs/Environments/profile-name.md +++ b/website/docs/faqs/Environments/profile-name.md @@ -4,4 +4,4 @@ description: "Use company name for profile name"
sidebar_label: 'Naming your profile'
id: profile-name
---
-We typically use a company name for a profile name, and then use targets to differentiate between `dev` and `prod`. Check out the docs on [environments in dbt Core](/docs/collaborate/environments/dbt-core-environments) for more information.
+We typically use a company name for a profile name, and then use targets to differentiate between `dev` and `prod`. Check out the docs on [environments in dbt Core](/docs/core/dbt-core-environments) for more information.
diff --git a/website/docs/faqs/Environments/target-names.md b/website/docs/faqs/Environments/target-names.md index 5b9c3fa99fd..2619e31c2c2 100644 --- a/website/docs/faqs/Environments/target-names.md +++ b/website/docs/faqs/Environments/target-names.md @@ -5,4 +5,4 @@ sidebar_label: 'Naming your target'
id: target-names
---
-We typically use targets to differentiate between development and production runs of dbt, naming the targets `dev` and `prod`, respectively. Check out the docs on [managing environments in dbt Core](/docs/collaborate/environments/dbt-core-environments) for more information.
+We typically use targets to differentiate between development and production runs of dbt, naming the targets `dev` and `prod`, respectively. 
Check out the docs on [managing environments in dbt Core](/docs/core/dbt-core-environments) for more information. diff --git a/website/docs/faqs/Git/gitignore.md b/website/docs/faqs/Git/gitignore.md index fb097bb4043..6bda9611733 100644 --- a/website/docs/faqs/Git/gitignore.md +++ b/website/docs/faqs/Git/gitignore.md @@ -1,25 +1,233 @@ --- -title: Why can't I checkout a branch or create a new branch? -description: "Add or fill in gitignore file" -sidebar_label: 'Unable to checkout or create branch' +title: How can I fix my .gitignore file? +description: "Use these instructions to fix your gitignore file" +sidebar_label: 'How to fix your .gitignore file' id: gitignore --- -If you're finding yourself unable to revert changes, check out a branch or click commit - this is usually do to your project missing a [.gitignore](https://github.com/dbt-labs/dbt-starter-project/blob/main/.gitignore) file OR your gitignore file doesn't contain the necessary content inside the folder. +A `.gitignore` file specifies which files git should intentionally ignore or 'untrack'. dbt Cloud indicates untracked files in the project file explorer pane by putting the file or folder name in *italics*. -This is what causes that 'commit' git action button to display. No worries though - to fix this, you'll need to complete the following steps in order: +If you encounter issues like problems reverting changes, checking out or creating a new branch, or not being prompted to open a pull request after a commit in the dbt Cloud IDE — this usually indicates a problem with the [.gitignore](https://github.com/dbt-labs/dbt-starter-project/blob/main/.gitignore) file. The file may be missing or lacks the required entries for dbt Cloud to work correctly. -1. In the Cloud IDE, add the missing .gitignore file or contents to your project. You'll want to make sure the .gitignore file includes the following: +### Fix in the dbt Cloud IDE - ```shell - target/ - dbt_modules/ - dbt_packages/ - logs/ - ``` +To resolve issues with your `gitignore` file, adding the correct entries won't automatically remove (or 'untrack') files or folders that have already been tracked by git. The updated `gitignore` will only prevent new files or folders from being tracked. So you'll need to first fix the `gitignore` file, then perform some additional git operations to untrack any incorrect files or folders. -2. Once you've added that, make sure to save and commit. + -3. Navigate to the same branch in your remote repository (which can be accessed directly through your git provider's web interface) and delete the logs, target, and dbt_modules/dbt_packages folders. +1. Launch the Cloud IDE into the project that is being fixed, by selecting **Develop** on the menu bar. +2. In your **File Explorer**, check to see if a `.gitignore` file exists at the root of your dbt project folder. If it doesn't exist, create a new file. +3. Open the new or existing `gitignore` file, and add the following: -4. Go back into the Cloud IDE and reclone your repository. This can be done by clicking on the green "ready" in the bottom right corner of the IDE (next to the command bar), and then clicking the orange "reclone repo" button in the pop up. +```bash +# ✅ Correct +target/ +dbt_packages/ +logs/ +# legacy -- renamed to dbt_packages in dbt v1 +dbt_modules/ +``` + +* **Note** — You can place these lines anywhere in the file, as long as they're on separate lines. The lines shown are wildcards that will include all nested files and folders. 
Avoid adding a trailing `'*'` to the lines, such as `target/*`. + +For more info on `gitignore` syntax, refer to the [Git docs](https://git-scm.com/docs/gitignore). + +4. Save the changes but _don't commit_. +5. Restart the IDE by clicking on the three dots next to the **IDE Status button** on the lower right corner of the IDE screen and select **Restart IDE**. + + + +6. Once the IDE restarts, go to the **File Explorer** to delete the following files or folders (if they exist). No data will be lost: + * `target`, `dbt_modules`, `dbt_packages`, `logs` +7. **Save** and then **Commit and sync** the changes. +8. Restart the IDE again using the same procedure as step 5. +9. Once the IDE restarts, use the **Create a pull request** (PR) button under the **Version Control** menu to start the process of integrating the changes. +10. When the git provider's website opens to a page with the new PR, follow the necessary steps to complete and merge the PR into the main branch of that repository. + + * **Note** — The 'main' branch might also be called 'master', 'dev', 'qa', 'prod', or something else depending on the organizational naming conventions. The goal is to merge these changes into the root branch that all other development branches are created from. + +11. Return to the dbt Cloud IDE and use the **Change Branch** button, to switch to the main branch of the project. +12. Once the branch has changed, click the **Pull from remote** button to pull in all the changes. +13. Verify the changes by making sure the files/folders in the `.gitignore `file are in italics. + + + +### Fix in the git provider + +Sometimes it's necessary to use the git providers web interface to fix a broken `.gitignore` file. Although the specific steps may vary across providers, the general process remains the same. + +There are two options for this approach: editing the main branch directly if allowed, or creating a pull request to implement the changes if required: + + + + + +When permissions allow it, it's possible to edit the `.gitignore` directly on the main branch of your repo. Here are the following steps: + +1. Go to your repository's web interface. +2. Switch to the main branch and the root directory of your dbt project. +3. Find the `.gitignore` file. Create a blank one if it doesn't exist. +4. Edit the file in the web interface, adding the following entries: +```bash +target/ +dbt_packages/ +logs/ +# legacy -- renamed to dbt_packages in dbt v1 +dbt_modules/ +``` + +5. Commit (save) the file. +6. Delete the following folders from the dbt project root, if they exist. No data or code will be lost: + * `target`, `dbt_modules`, `dbt_packages`, `logs` +7. Commit (save) the deletions to the main branch. +8. Switch to the dbt Cloud IDE, and open the project that you're fixing. +9. Reclone your repo in the IDE by clicking on the three dots next to the **IDE Status** button on the lower right corner of the IDE screen, then select **Reclone Repo**. + * **Note** — Any saved but uncommitted changes will be lost, so make sure you copy any modified code that you want to keep in a temporary location outside of dbt Cloud. +10. Once you reclone the repo, open the `.gitignore` file in the branch you're working in. If the new changes aren't included, you'll need to merge the latest commits from the main branch into your working branch. +11. Go to the **File Explorer** to verify the `.gitignore` file contains the correct entries and make sure the untracked files/folders in the .gitignore file are in *italics*. +12. Great job 🎉! 
You've configured the `.gitignore` correctly and can continue with your development! + + + + + +If you can't edit the `.gitignore` directly on the main branch of your repo, follow these steps: + +1. Go to your repository's web interface. +2. Switch to an existing development branch, or create a new branch just for these changes (This is often faster and cleaner). +3. Find the `.gitignore` file. Create a blank one if it doesn't exist. +4. Edit the file in the web interface, adding the following entries: + +```bash +target/ +dbt_packages/ +logs/ +# legacy -- renamed to dbt_packages in dbt v1 +dbt_modules/ +``` +5. Commit (save) the file. +6. Delete the following folders from the dbt project root, if they exist. No data or code will be lost: + * `target`, `dbt_modules`, `dbt_packages`, `logs` +7. Commit (save) the deleted folders. +8. Open a merge request using the git provider web interface. The merge request should attempt to merge the changes into the 'main' branch that all development branches are created from. +9. Follow the necessary procedures to get the branch approved and merged into the 'main' branch. You can delete the branch after the merge is complete. +10. Once the merge is complete, go back to the dbt Cloud IDE, and open the project that you're fixing. +11. Reclone your repo in the IDE by clicking on the three dots next to the **IDE Status** button on the lower right corner of the IDE screen, then select **Reclone Repo**. + * **Note** — Any saved but uncommitted changes will be lost, so make sure you copy any modified code that you want to keep in a temporary location outside of dbt Cloud. +12. Once you reclone the repo, open the `.gitignore` file in the branch you're working in. If the new changes aren't included, you'll need to merge the latest commits from the main branch into your working branch. +13. Go to the **File Explorer** to verify the `.gitignore` file contains the correct entries and make sure the untracked files/folders in the .gitignore file are in *italics*. +14. Great job 🎉! You've configured the `.gitignore` correctly and can continue with your development! + + + + + + + + +1. Launch the Cloud IDE into the project that is being fixed, by selecting **Develop** on the menu bar. +2. In your **File Explorer**, check to see if a `.gitignore` file exists at the root of your dbt project folder. If it doesn't exist, create a new file. +3. Open the new or existing `gitignore` file, and add the following: + +```bash +target/ +dbt_packages/ +logs/ +# legacy -- renamed to dbt_packages in dbt v1 +dbt_modules/ +``` + + * **Note** — You can place these lines anywhere in the file, as long as they're on separate lines. The lines shown are wildcards that will include all nested file and folders. Avoid adding a trailing `'*'` to the lines, such as `target/*`. + +For more info on `gitignore` syntax, refer to the [Git docs](https://git-scm.com/docs/gitignore). + +4. Save the changes but _don't commit_. +5. Restart the IDE by clicking on the three dots next to the **IDE Status button** on the lower right corner of the IDE screen and select **Restart IDE**. + + + +6. Once the IDE restarts, go to the **File Explorer** to delete the following files or folders (if they exist). No data will be lost: + * `target`, `dbt_modules`, `dbt_packages`, `logs` +7. **Save** and then **Commit and sync** the changes. +8. Restart the IDE again using the same procedure as step 5. +9. 
Once the IDE restarts, use the 'Create a pull request' (PR) button under the **Version Control** menu to start the process of integrating the changes.
+10. When the git provider's website opens to a page with the new PR, follow the necessary steps to complete and merge the PR into the main branch of that repository.
+
+ * **Note** — The 'main' branch might also be called 'master', 'dev', 'qa', 'prod', or something else depending on the organizational naming conventions. The goal is to merge these changes into the root branch that all other development branches are created from.
+
+11. Return to the dbt Cloud IDE and use the **Change Branch** button to switch to the main branch of the project.
+12. Once the branch has changed, click the **Pull from remote** button to pull in all the changes.
+13. Verify the changes by making sure the files/folders in the `.gitignore `file are in italics.
+
+
+
+### Fix in the git provider
+
+Sometimes it's necessary to use the git providers web interface to fix a broken `.gitignore` file. Although the specific steps may vary across providers, the general process remains the same.
+
+There are two options for this approach: editing the main branch directly if allowed, or creating a pull request to implement the changes if required:
+
+
+
+
+When permissions allow it, it's possible to edit the `.gitignore` directly on the main branch of your repo. Here are the following steps:
+
+1. Go to your repository's web interface.
+2. Switch to the main branch, and the root directory of your dbt project.
+3. Find the `.gitignore` file. Create a blank one if it doesn't exist.
+4. Edit the file in the web interface, adding the following entries:
+```bash
+target/
+dbt_packages/
+logs/
+# legacy -- renamed to dbt_packages in dbt v1
+dbt_modules/
+```
+5. Commit (save) the file.
+6. Delete the following folders from the dbt project root, if they exist. No data or code will be lost:
+ * `target`, `dbt_modules`, `dbt_packages`, `logs`
+7. Commit (save) the deletions to the main branch.
+8. Switch to the dbt Cloud IDE, and open the project that you're fixing.
+9. Reclone your repo in the IDE by clicking on the three dots next to the **IDE Status** button on the lower right corner of the IDE screen, then select **Reclone Repo**.
+ * **Note** — Any saved but uncommitted changes will be lost, so make sure you copy any modified code that you want to keep in a temporary location outside of dbt Cloud.
+10. Once you reclone the repo, open the `.gitignore` file in the branch you're working in. If the new changes aren't included, you'll need to merge the latest commits from the main branch into your working branch.
+11. Go to the **File Explorer** to verify the `.gitignore` file contains the correct entries and make sure the untracked files/folders in the .gitignore file are in *italics*.
+12. Great job 🎉! You've configured the `.gitignore` correctly and can continue with your development!
+
+
+
+If you can't edit the `.gitignore` directly on the main branch of your repo, follow these steps:
+
+1. Go to your repository's web interface.
+2. Switch to an existing development branch, or create a new branch just for these changes (This is often faster and cleaner).
+3. Find the `.gitignore` file. Create a blank one if it doesn't exist.
+4. Edit the file in the web interface, adding the following entries:
+```bash
+target/
+dbt_packages/
+logs/
+# legacy -- renamed to dbt_packages in dbt v1
+dbt_modules/
+```
+5. Commit (save) the file.
+6. 
Delete the following folders from the dbt project root, if they exist. No data or code will be lost: + * `target`, `dbt_modules`, `dbt_packages`, `logs` +7. Commit (save) the deleted folders. +8. Open a merge request using the git provider web interface. The merge request should be attempting to merge the changes into the 'main' branch that all development branches are created from. +9. Follow the necessary procedures to get the branch approved and merged into the 'main' branch. You can delete the branch after the merge is complete. +10. Once the merge is complete, go back to the dbt Cloud IDE, and open the project that you're fixing. +11. Reclone your repo in the IDE by clicking on the three dots next to the **IDE Status** button on the lower right corner of the IDE screen, then select **Reclone Repo**. + * **Note** — Any saved but uncommitted changes will be lost, so make sure you copy any modified code that you want to keep in a temporary location outside of dbt Cloud. +12. Once you reclone the repo, open the `.gitignore` file in the branch you're working in. If the new changes aren't included, you'll need to merge the latest commits from the main branch into your working branch. +13. Go to the **File Explorer** to verify the `.gitignore` file contains the correct entries and make sure the untracked files/folders in the .gitignore file are in *italics*. +14. Great job 🎉! You've configured the `.gitignore` correctly and can continue with your development! + + + + + + +For more info, refer to this [detailed video](https://www.loom.com/share/9b3b8e2b617f41a8bad76ec7e42dd014) for additional guidance. diff --git a/website/docs/faqs/Models/unique-model-names.md b/website/docs/faqs/Models/unique-model-names.md index b1a523427c0..c721fca7c6e 100644 --- a/website/docs/faqs/Models/unique-model-names.md +++ b/website/docs/faqs/Models/unique-model-names.md @@ -6,6 +6,20 @@ id: unique-model-names --- -Yes! To build dependencies between models, you need to use the `ref` function. The `ref` function only takes one argument — the model name (i.e. the filename). As a result, these model names need to be unique, _even if they are in distinct folders_. + + +Within one project: yes! To build dependencies between models, you need to use the `ref` function, and pass in the model name as an argument. dbt uses that model name to uniquely resolve the `ref` to a specific model. As a result, these model names need to be unique, _even if they are in distinct folders_. + +A model in one project can have the same name as a model in another project (installed as a dependency). dbt uses the project name to uniquely identify each model. We call this "namespacing." If you `ref` a model with a duplicated name, it will resolve to the model within the same namespace (package or project), or raise an error because of an ambiguous reference. Use [two-argument `ref`](/reference/dbt-jinja-functions/ref#two-argument-variant) to disambiguate references by specifying the namespace. + +Those models will still need to land in distinct locations in the data warehouse. Read the docs on [custom aliases](/docs/build/custom-aliases) and [custom schemas](/docs/build/custom-schemas) for details on how to achieve this. + + + + + +Yes! To build dependencies between models, you need to use the `ref` function, and pass in the model name as an argument. dbt uses that model name to uniquely resolve the `ref` to a specific model. As a result, these model names need to be unique, _even if they are in distinct folders_. 
Often, this question comes up because users want to give two models the same name in their warehouse, splitting them across separate schemas (e.g. `stripe.users` and `app.users`). Checkout the docs on [custom aliases](/docs/build/custom-aliases) and [custom schemas](/docs/build/custom-schemas) to achieve this. + + diff --git a/website/docs/faqs/Snapshots/snapshot-target-is-not-a-snapshot-table.md b/website/docs/faqs/Snapshots/snapshot-target-is-not-a-snapshot-table.md new file mode 100644 index 00000000000..5ce8f380008 --- /dev/null +++ b/website/docs/faqs/Snapshots/snapshot-target-is-not-a-snapshot-table.md @@ -0,0 +1,29 @@ +--- +title: "Debug Snapshot target is not a snapshot table errors" +description: "Debugging Snapshot target is not a snapshot table" +sidebar_label: "Snapshot target is not a snapshot table" +id: snapshot-target-is-not-a-snapshot-table +--- + +If you see the following error when you try executing the snapshot command: + +> Snapshot target is not a snapshot table (missing `dbt_scd_id`, `dbt_valid_from`, `dbt_valid_to`) + +Double check that you haven't inadvertently caused your snapshot to behave like table materializations by setting its `materialized` config to be `table`. Prior to dbt version 1.4, it was possible to have a snapshot like this: + +```sql +{% snapshot snappy %} + {{ config(materialized = 'table', ...) }} + ... +{% endsnapshot %} +``` + +dbt is treating snapshots like tables (issuing `create or replace table ...` statements) **silently** instead of actually snapshotting data (SCD2 via `insert` / `merge` statements). When upgrading to dbt versions 1.4 and higher, dbt now raises a Parsing Error (instead of silently treating snapshots like tables) that reads: + +``` +A snapshot must have a materialized value of 'snapshot' +``` + +This tells you to change your `materialized` config to `snapshot`. But when you make that change, you might encounter an error message saying that certain fields like `dbt_scd_id` are missing. This error happens because, previously, when dbt treated snapshots as tables, it didn't include the necessary [snapshot meta-fields](/docs/build/snapshots#snapshot-meta-fields) in your target table. Since those meta-fields don't exist, dbt correctly identifies that you're trying to create a snapshot in a table that isn't actually a snapshot. + +When this happens, you have to start from scratch — re-snapshotting your source data as if it was the first time by dropping your "snapshot" which isn't a real snapshot table. Then dbt snapshot will create a new snapshot and insert the snapshot meta-fields as expected. diff --git a/website/docs/faqs/Troubleshooting/gitignore.md b/website/docs/faqs/Troubleshooting/gitignore.md index 47c7500e662..59fd4e8c866 100644 --- a/website/docs/faqs/Troubleshooting/gitignore.md +++ b/website/docs/faqs/Troubleshooting/gitignore.md @@ -1,26 +1,86 @@ --- -title: Why can't I checkout a branch or create a new branch? -description: "Add or fill in gitignore file" -sidebar_label: 'Unable to checkout or create branch' +title: How can I fix my .gitignore file? +description: "Use these instructions to fix your gitignore file" +sidebar_label: 'How to fix your .gitignore file' id: gitignore --- -If you're finding yourself unable to revert changes, check out a branch or click commit - this is usually do to your project missing a .[gitignore](https://github.com/dbt-labs/dbt-starter-project/blob/main/.gitignore) file OR your gitignore file doesn't contain the necessary content inside the folder. 
+A gitignore file specifies which files Git should intentionally ignore. You can identify these files in your project by their italics formatting.
-This is what causes that 'commit' git action button to display. No worries though - to fix this, you'll need to complete the following steps in order:
+If you can't revert changes, check out a branch, or click commit — this is usually due to your project missing a [.gitignore](https://github.com/dbt-labs/dbt-starter-project/blob/main/.gitignore) file OR your gitignore file doesn't contain the necessary content inside the folder.
-1. In the Cloud IDE, add the missing .gitignore file or contents to your project. You'll want to make sure the .gitignore file includes the following:
+To fix this, complete the following steps:
- ```shell
- target/
- dbt_modules/
- dbt_packages/
- logs/
- ```
+
-2. Once you've added that, make sure to save and commit.
+1. In the dbt Cloud IDE, add the following [.gitignore contents](https://github.com/dbt-labs/dbt-starter-project/blob/main/.gitignore) in your dbt project `.gitignore` file:
+```bash
+target/
+dbt_packages/
+logs/
+# legacy -- renamed to dbt_packages in dbt v1
+dbt_modules/
+```
+2. Save your changes but _don't commit_.
+3. Restart the IDE by clicking on the three dots next to the **IDE Status button** on the lower right of the IDE.
-3. Navigate to the same branch in your remote repository (which can be accessed directly through your git provider's web interface) and delete the logs, target, and dbt_modules/dbt_packages folders.
+
-4. Go back into the Cloud IDE and reclone your repository. This can be done by clicking on the green "ready" in the bottom right corner of the IDE (next to the command bar), and then clicking the orange "reclone repo" button in the pop up.
+4. Select **Restart IDE**.
+5. Go back to your dbt project and delete the following files or folders if you have them:
+ * `target`, `dbt_modules`, `dbt_packages`, `logs`
+6. **Save** and then **Commit and sync** your changes.
+7. Restart the IDE again.
+8. Create a pull request (PR) under the **Version Control** menu to integrate your new changes.
+9. Merge the PR on your git provider page.
+10. Switch to your main branch and click on **Pull from remote** to pull in all the changes you made to your main branch. You can verify the changes by making sure the files/folders in the .gitignore file are in italics.
+
+
+
+
+
+
+1. In the dbt Cloud IDE, add the following [.gitignore contents](https://github.com/dbt-labs/dbt-starter-project/blob/main/.gitignore) in your dbt project `.gitignore` file:
+```bash
+target/
+dbt_packages/
+logs/
+# legacy -- renamed to dbt_packages in dbt v1
+dbt_modules/
+```
+2. Go to your `dbt_project.yml` file and add `tmp/` after your `target-path:` and add `log-path: "tmp/logs"`.
+ * So it should look like: `target-path: "tmp/target"` and `log-path: "tmp/logs"`:
+
+
+3. Save your changes but _don't commit_.
+4. Restart the IDE by clicking on the three dots next to the **IDE Status button** on the lower right of the IDE.
+
+
+5. Select **Restart IDE**.
+6. Go back to your dbt project and delete the following four folders (if you have them):
+ * `target`
+ * `dbt_modules`
+ * `dbt_packages`
+ * `logs`
+7. **Save** and then **Commit and sync** your changes.
+8. Go back to your `dbt_project.yml` file and undo the modifications you made in **Step 2**.
+
+ * Remove `tmp` from your `target-path` and completely remove the `log-path: "tmp/logs"` line.
+
+
+9. Restart the IDE again.
+10. 
Delete the `tmp` folder in the **File Explorer**. +11. Create a pull request (PR) under the **Version Control** menu to integrate your new changes. +12. Merge the PR in your git provider page. +13. Switch to your main branch and click on **Pull from remote** to pull in all the changes you made to your main branch. You can verify the changes by making sure the files/folders in the .gitignore file are in italics. + + + + + +For more info, refer to this [detailed video](https://www.loom.com/share/9b3b8e2b617f41a8bad76ec7e42dd014) for additional guidance. diff --git a/website/docs/guides/best-practices/debugging-errors.md b/website/docs/guides/best-practices/debugging-errors.md index 288a079c9a2..39670820ddd 100644 --- a/website/docs/guides/best-practices/debugging-errors.md +++ b/website/docs/guides/best-practices/debugging-errors.md @@ -18,7 +18,7 @@ Learning how to debug is a skill, and one that will make you great at your role! - The `logs/dbt.log` file contains all the queries that dbt runs, and additional logging. Recent errors will be at the bottom of the file. - **dbt Cloud users**: Use the above, or the `Details` tab in the command output. - **dbt CLI users**: Note that your code editor _may_ be hiding these files from the tree [VSCode help](https://stackoverflow.com/questions/42891463/how-can-i-show-ignored-files-in-visual-studio-code)). -5. If you are really stuck, try [asking for help](/guides/legacy/getting-help). Before doing so, take the time to write your question well so that others can diagnose the problem quickly. +5. If you are really stuck, try [asking for help](/community/resources/getting-help). Before doing so, take the time to write your question well so that others can diagnose the problem quickly. ## Types of errors diff --git a/website/docs/guides/best-practices/environment-setup/1-env-guide-overview.md b/website/docs/guides/best-practices/environment-setup/1-env-guide-overview.md index 8dfd8c48d30..17811b14ca3 100644 --- a/website/docs/guides/best-practices/environment-setup/1-env-guide-overview.md +++ b/website/docs/guides/best-practices/environment-setup/1-env-guide-overview.md @@ -24,7 +24,7 @@ This guide has three main goals: - At each stage, explain *why* we recommend the approach that we do, so that you're equipped to decide when and where to deviate from these recommendations to better fit your organization’s unique needs :::info -☁️ This guide focuses on architecture for **dbt Cloud**. However, similar principles apply for developers using dbt Core. Before diving into this guide we recommend taking a look at our **[dbt Cloud environments](/docs/collaborate/environments/dbt-cloud-environments)** page for more context. +☁️ This guide focuses on architecture for **dbt Cloud**. However, similar principles apply for developers using dbt Core. Before diving into this guide we recommend taking a look at our **[dbt Cloud environments](/docs/dbt-cloud-environments)** page for more context. ::: diff --git a/website/docs/guides/best-practices/environment-setup/2-one-deployment-environment.md b/website/docs/guides/best-practices/environment-setup/2-one-deployment-environment.md index 2c5eb6029e6..d7d64eda548 100644 --- a/website/docs/guides/best-practices/environment-setup/2-one-deployment-environment.md +++ b/website/docs/guides/best-practices/environment-setup/2-one-deployment-environment.md @@ -32,10 +32,10 @@ hoverSnippet: Learn how to configure a single deployment environment setup in db 3. 
[**Slim CI Job**](/docs/deploy/continuous-integration) automatically kicks off, and tests the changes made in the PR 4. When Slim CI Job is successful and team is ready to deploy changes to Production, the PR is merged directly into the `main` branch. The next time a production job runs, these changes will be incorporated and executed. -### dbt Cloud setup +### dbt Cloud setup -1. Create your [**development environment**](/docs/collaborate/environments/dbt-cloud-environments#create-a-development-environment) to power the dbt Cloud IDE. No extra customization needed! -2. Create your **[production deployment environment](/docs/collaborate/environments/dbt-cloud-environments#create-a-deployment-environment)**. +1. Create your [**development environment**](/docs/dbt-cloud-environments) to power the dbt Cloud IDE. No extra customization needed! +2. Create your **[production deployment environment](/docs/deploy/deploy-environments)**. 3. Define your **dbt Cloud jobs** in the production deployment environment from step 2. 1. **Production job(s)**: You will need to set up **at least one scheduled job** that deploys your project to your production databases/schemas. You may create multiple jobs based on your business SLAs. 2. **Slim CI Job**: Unlike the production jobs, which are triggered via the scheduler, this job will be triggered when PRs are opened in your repository. Refer to [Slim CI jobs](/docs/deploy/slim-ci-jobs) for details. @@ -43,7 +43,7 @@ hoverSnippet: Learn how to configure a single deployment environment setup in db ### When this works well -This approach is the recommended approach for most use-cases as it allows changes to code to be quickly promoted to production, with confidence that they can be trusted. With this option, multiple developers can easily contributing to the same code base with confidence. +This approach is recommended for most use cases because it enables you to quickly and safely implement code changes in the production environment. It also gives developers the confidence to trust and rely on these changes. With this option, multiple developers can easily contribute to and collaborate on the same codebase with confidence. :::info 💡 Check out [Sunrun's Coalesce 2022 talk](https://www.youtube.com/watch?v=vmBAO2XN-fM) on Automating CI/CD in dbt Cloud, where they simplified their CI/CD process from several long-lived branches to a single long-lived main branch with feature branches. diff --git a/website/docs/guides/best-practices/environment-setup/3-many-deployment-environments.md b/website/docs/guides/best-practices/environment-setup/3-many-deployment-environments.md index 1df787eea7d..cf9f6954ca7 100644 --- a/website/docs/guides/best-practices/environment-setup/3-many-deployment-environments.md +++ b/website/docs/guides/best-practices/environment-setup/3-many-deployment-environments.md @@ -24,7 +24,7 @@ hoverSnippet: Learn how to configure a many deployment environment setup in dbt 2. When code is ready, developer opens a PR to merge feature branch into `qa`. 3. The **first Slim CI Job** automatically kicks off to test the changes introduced in the PR. This job will *defer to a regularly-scheduled job in the QA environment* and run in the QA deployment environment. 4. When **Slim CI Job is successful** and team is ready to deploy changes, the **PR is merged into `qa`.** -5. Scheduled jobs run in the QA deployment environment, running on `qa` branch to ensure the new changes work as indended. +5. 
Scheduled jobs run in the QA deployment environment, running on `qa` branch to ensure the new changes work as intended. 6. When **all feature branches** for a given release (e.g. sprint) have been **successfully merged** to `qa` and are **running without error** in the QA deployment environment, a team member opens a **PR to merge `qa` → `main`.** 7. The **second Slim CI Job** automatically kicks off to test changes in PR. This job will *defer to a regularly-scheduled job in the Production environment* and run in the Production deployment environment. 8. When **second Slim CI Job** is successful and team is ready to deploy changes, the **PR is merged into `main`**. @@ -37,11 +37,11 @@ hoverSnippet: Learn how to configure a many deployment environment setup in dbt ### dbt Cloud setup -1. Create your [**development environment**](docs/collaborate/environments/dbt-cloud-environments#create-a-development-environment) to power the dbt Cloud IDE. +1. Create your [**development environment**](/docs/dbt-cloud-environments) to power the dbt Cloud IDE. Here, we’ll set a **custom branch** so that users in the IDE create their feature branches from `qa` instead of `main`. Click **Only run on a custom branch** in **General settings**, enter `qa` into **Custom Branch.** -2. Set up your **QA [deployment environment](docs/collaborate/environments/dbt-cloud-environments#create-a-deployment-environment)** +2. Set up your **QA [deployment environment](/docs/deploy/deploy-environments)** Here, we’ll apply the same custom branch settings as the development environment in Step 1. All scheduled jobs in the QA deployment environment will use the code from the `qa` branch during execution. @@ -51,7 +51,7 @@ hoverSnippet: Learn how to configure a many deployment environment setup in dbt This job will also need to defer to one of the QA jobs created in step 3a. This enables the use of the `state` modifier in your selection syntax to only run changes introduced by your PR. -4. Set up your **Production [deployment environment](docs/collaborate/environments/dbt-cloud-environments#create-a-deployment-environment)** +4. Set up your **Production [deployment environment](/docs/deploy/deploy-environments)** Here, we’ll *also* use the same custom branch settings as the other environments, but set the custom branch as `main`. Even thought the `main` branch is the default, setting this value enables us to properly set up the CI Job in the next step. diff --git a/website/docs/guides/best-practices/how-we-style/0-how-we-style-our-dbt-projects.md b/website/docs/guides/best-practices/how-we-style/0-how-we-style-our-dbt-projects.md new file mode 100644 index 00000000000..dd695af2602 --- /dev/null +++ b/website/docs/guides/best-practices/how-we-style/0-how-we-style-our-dbt-projects.md @@ -0,0 +1,29 @@ +--- +title: How we style our dbt projects +id: 0-how-we-style-our-dbt-projects +--- + +## Why does style matter? + +Style might seem like a trivial, surface-level issue, but it's a deeply material aspect of a well-built project. A consistent, clear style enhances readability and makes your project easier to understand and maintain. Highly readable code helps build clear mental models making it easier to debug and extend your project. It's not just a favor to yourself, though; equally importantly, it makes it less effort for others to understand and contribute to your project, which is essential for peer collaboration, open-source work, and onboarding new team members. 
[A style guide lets you focus on what matters](https://mtlynch.io/human-code-reviews-1/#settle-style-arguments-with-a-style-guide), the logic and impact of your project, rather than the superficialities of how it's written. This brings harmony and pace to your team's work, and makes reviews more enjoyable and valuable. + +## What's important about style? + +There are two crucial tenets of code style: + +- Clarity +- Consistency + +Style your code in such a way that you can quickly read and understand it. It's also important to consider code review and git diffs. If you're making a change to a model, you want reviewers to see just the material changes you're making clearly. + +Once you've established a clear style, stay consistent. This is the most important thing. Everybody on your team needs to have a unified style, which is why having a style guide is so crucial. If you're writing a model, you should be able to look at other models in the project that your teammates have written and read in the same style. If you're writing a macro or a test, you should see the same style as your models. Consistency is key. + +## How should I style? + +You should style the project in a way you and your teammates or collaborators agree on. The most important thing is that you have a style guide and stick to it. This guide is just a suggestion to get you started and to give you a sense of what a style guide might look like. It covers various areas you may want to consider, with suggested rules. It emphasizes lots of whitespace, clarity, clear naming, and comments. + +We believe one of the strengths of SQL is that it reads like English, so we lean into that declarative nature throughout our projects. Even within dbt Labs, though, there are differing opinions on how to style, even a small but passionate contingent of leading comma enthusiasts! Again, the important thing is not to follow this style guide; it's to make _your_ style guide and follow it. Lastly, be sure to include rules, tools, _and_ examples in your style guide to make it as easy as possible for your team to follow. + +## Automation + +Use formatters and linters as much as possible. We're all human, we make mistakes. Not only that, but we all have different preferences and opinions while writing code. Automation is a great way to ensure that your project is styled consistently and correctly and that people can write in a way that's quick and comfortable for them, while still getting perfectly consistent output. diff --git a/website/docs/guides/best-practices/how-we-style/1-how-we-style-our-dbt-models.md b/website/docs/guides/best-practices/how-we-style/1-how-we-style-our-dbt-models.md new file mode 100644 index 00000000000..0157af63cfb --- /dev/null +++ b/website/docs/guides/best-practices/how-we-style/1-how-we-style-our-dbt-models.md @@ -0,0 +1,66 @@ +--- +title: How we style our dbt models +id: 1-how-we-style-our-dbt-models +--- + +## Fields and model names + +- 👥 Models should be pluralized, for example, `customers`, `orders`, `products`. +- 🔑 Each model should have a primary key. +- 🔑 The primary key of a model should be named `_id`, for example, `account_id`. This makes it easier to know what `id` is being referenced in downstream joined models. +- 🔑 Keys should be string data types. +- 🔑 Consistency is key! Use the same field names across models where possible. For example, a key to the `customers` table should be named `customer_id` rather than `user_id` or 'id'. +- ❌ Do not use abbreviations or aliases. 
Emphasize readability over brevity. For example, do not use `cust` for `customer` or `o` for `orders`. +- ❌ Avoid reserved words as column names. +- ➕ Booleans should be prefixed with `is_` or `has_`. +- 🕰️ Timestamp columns should be named `_at`(for example, `created_at`) and should be in UTC. If a different timezone is used, this should be indicated with a suffix (`created_at_pt`). +- 📆 Dates should be named `_date`. For example, `created_date.` +- 🔙 Events dates and times should be past tense — `created`, `updated`, or `deleted`. +- 💱 Price/revenue fields should be in decimal currency (`19.99` for $19.99; many app databases store prices as integers in cents). If a non-decimal currency is used, indicate this with a suffix (`price_in_cents`). +- 🐍 Schema, table and column names should be in `snake_case`. +- 🏦 Use names based on the _business_ terminology, rather than the source terminology. For example, if the source database uses `user_id` but the business calls them `customer_id`, use `customer_id` in the model. +- 🔢 Versions of models should use the suffix `_v1`, `_v2`, etc for consistency (`customers_v1` and `customers_v2`). +- 🗄️ Use a consistent ordering of data types and consider grouping and labeling columns by type, as in the example below. This will minimize join errors and make it easier to read the model, as well as help downstream consumers of the data understand the data types and scan models for the columns they need. We prefer to use the following order: ids, strings, numerics, booleans, dates, and timestamps. + +## Example model + +```sql +with + +source as ( + + select * from {{ source('ecom', 'raw_orders') }} + +), + +renamed as ( + + select + + ---------- ids + id as order_id, + store_id as location_id, + customer as customer_id, + + ---------- strings + status as order_status, + + ---------- numerics + (order_total / 100.0)::float as order_total, + (tax_paid / 100.0)::float as tax_paid, + + ---------- booleans + is_fulfilled, + + ---------- dates + date(order_date) as ordered_date, + + ---------- timestamps + ordered_at + + from source + +) + +select * from renamed +``` diff --git a/website/docs/guides/best-practices/how-we-style/2-how-we-style-our-sql.md b/website/docs/guides/best-practices/how-we-style/2-how-we-style-our-sql.md new file mode 100644 index 00000000000..1ea9c064d74 --- /dev/null +++ b/website/docs/guides/best-practices/how-we-style/2-how-we-style-our-sql.md @@ -0,0 +1,183 @@ +--- +title: How we style our SQL +id: 2-how-we-style-our-sql +--- + +## Basics + +- ☁️ Use [SQLFluff](https://sqlfluff.com/) to maintain these style rules automatically. + - Reference this [SQLFluff config file](https://github.com/dbt-labs/jaffle-shop-template/blob/main/.sqlfluff) for the rules we use. +- 👻 Use Jinja comments (`{# #}`) for comments that should not be included in the compiled SQL. +- ⏭️ Use trailing commas. +- 4️⃣ Indents should be four spaces. +- 📏 Lines of SQL should be no longer than 80 characters. +- ⬇️ Field names, keywords, and function names should all be lowercase. +- 🫧 The `as` keyword should be used explicitly when aliasing a field or table. + +:::info +☁️ dbt Cloud users can use the built-in [SQLFluff Cloud IDE integration](https://docs.getdbt.com/docs/cloud/dbt-cloud-ide/lint-format) to automatically lint and format their SQL. The default style sheet is based on dbt Labs style as outlined in this guide, but you can customize this to fit your needs. No need to setup any external tools, just hit `Lint`! 
Also, the more opinionated [sqlfmt](http://sqlfmt.com/) formatter is also available if you prefer that style. +::: + +## Fields, aggregations, and grouping + +- 🔙 Fields should be stated before aggregates and window functions. +- 🤏🏻 Aggregations should be executed as early as possible (on the smallest data set possible) before joining to another table to improve performance. +- 🔢 Ordering and grouping by a number (eg. group by 1, 2) is preferred over listing the column names (see [this classic rant](https://blog.getdbt.com/write-better-sql-a-defense-of-group-by-1/) for why). Note that if you are grouping by more than a few columns, it may be worth revisiting your model design. + +## Joins + +- 👭🏻 Prefer `union all` to `union` unless you explicitly want to remove duplicates. +- 👭🏻 If joining two or more tables, _always_ prefix your column names with the table name. If only selecting from one table, prefixes are not needed. +- 👭🏻 Be explicit about your join type (i.e. write `inner join` instead of `join`). +- 🥸 Avoid table aliases in join conditions (especially initialisms) — it's harder to understand what the table called "c" is as compared to "customers". +- ➡️ Always move left to right to make joins easy to reason about - `right joins` often indicate that you should change which table you select `from` and which one you `join` to. + +## 'Import' CTEs + +- 🔝 All `{{ ref('...') }}` statements should be placed in CTEs at the top of the file. +- 📦 'Import' CTEs should be named after the table they are referencing. +- 🤏🏻 Limit the data scanned by CTEs as much as possible. Where possible, only select the columns you're actually using and use `where` clauses to filter out unneeded data. +- For example: + +```sql +with + +orders as ( + + select + order_id, + customer_id, + order_total, + order_date + + from {{ ref('orders') }} + + where order_date >= '2020-01-01' + +) +``` + +## 'Functional' CTEs + +- ☝🏻 Where performance permits, CTEs should perform a single, logical unit of work. +- 📖 CTE names should be as verbose as needed to convey what they do e.g. `events_joined_to_users` instead of `user_events` (this could be a good model name, but does not describe a specific function or transformation). +- 🌉 CTEs that are duplicated across models should be pulled out into their own intermediate models. Look out for chunks of repeated logic that should be refactored into their own model. +- 🔚 The last line of a model should be a `select *` from your final output CTE. This makes it easy to materialize and audit the output from different steps in the model as you're developing it. You just change the CTE referenced in the `select` statement to see the output from that step. + +## Model configuration + +- 📝 Model-specific attributes (like sort/dist keys) should be specified in the model. +- 📂 If a particular configuration applies to all models in a directory, it should be specified in the `dbt_project.yml` file. +- 👓 In-model configurations should be specified like this for maximum readability: + +```sql +{{ + config( + materialized = 'table', + sort = 'id', + dist = 'id' + ) +}} +``` + +## Example SQL + +```sql +with + +events as ( + + ... + +), + +{# CTE comments go here #} +filtered_events as ( + + ... 
+ +) + +select * from filtered_events +``` + +### Example SQL + +```sql +with + +my_data as ( + + select + field_1, + field_2, + field_3, + cancellation_date, + expiration_date, + start_date + + from {{ ref('my_data') }} + +), + +some_cte as ( + + select + id, + field_4, + field_5 + + from {{ ref('some_cte') }} + +), + +some_cte_agg as ( + + select + id, + sum(field_4) as total_field_4, + max(field_5) as max_field_5 + + from some_cte + + group by 1 + +), + +joined as ( + + select + my_data.field_1, + my_data.field_2, + my_data.field_3, + + -- use line breaks to visually separate calculations into blocks + case + when my_data.cancellation_date is null + and my_data.expiration_date is not null + then expiration_date + when my_data.cancellation_date is null + then my_data.start_date + 7 + else my_data.cancellation_date + end as cancellation_date, + + some_cte_agg.total_field_4, + some_cte_agg.max_field_5 + + from my_data + + left join some_cte_agg + on my_data.id = some_cte_agg.id + + where my_data.field_1 = 'abc' and + ( + my_data.field_2 = 'def' or + my_data.field_2 = 'ghi' + ) + + having count(*) > 1 + +) + +select * from joined +``` diff --git a/website/docs/guides/best-practices/how-we-style/3-how-we-style-our-python.md b/website/docs/guides/best-practices/how-we-style/3-how-we-style-our-python.md new file mode 100644 index 00000000000..5443abf302d --- /dev/null +++ b/website/docs/guides/best-practices/how-we-style/3-how-we-style-our-python.md @@ -0,0 +1,44 @@ +--- +title: How we style our Python +id: 3-how-we-style-our-python +--- + +## Python tooling + +- 🐍 Python has a more mature and robust ecosystem for formatting and linting (helped by the fact that it doesn't have a million distinct dialects). We recommend using those tools to format and lint your code in the style you prefer. + +- 🛠️ Our current recommendations are + + - [black](https://pypi.org/project/black/) formatter + - [ruff](https://pypi.org/project/ruff/) linter + + :::info + ☁️ dbt Cloud comes with the [black formatter built-in](https://docs.getdbt.com/docs/cloud/dbt-cloud-ide/lint-format) to automatically lint and format their SQL. You don't need to download or configure anything, just click `Format` in a Python model and you're good to go! + ::: + +## Example Python + +```python +import pandas as pd + + +def model(dbt, session): + # set length of time considered a churn + pd.Timedelta(days=2) + + dbt.config(enabled=False, materialized="table", packages=["pandas==1.5.2"]) + + orders_relation = dbt.ref("stg_orders") + + # converting a DuckDB Python Relation into a pandas DataFrame + orders_df = orders_relation.df() + + orders_df.sort_values(by="ordered_at", inplace=True) + orders_df["previous_order_at"] = orders_df.groupby("customer_id")[ + "ordered_at" + ].shift(1) + orders_df["next_order_at"] = orders_df.groupby("customer_id")["ordered_at"].shift( + -1 + ) + return orders_df +``` diff --git a/website/docs/guides/best-practices/how-we-style/4-how-we-style-our-jinja.md b/website/docs/guides/best-practices/how-we-style/4-how-we-style-our-jinja.md new file mode 100644 index 00000000000..3a969d2bdd3 --- /dev/null +++ b/website/docs/guides/best-practices/how-we-style/4-how-we-style-our-jinja.md @@ -0,0 +1,37 @@ +--- +title: How we style our Jinja +id: 4-how-we-style-our-jinja +--- + +## Jinja style guide + +- 🫧 When using Jinja delimiters, use spaces on the inside of your delimiter, like `{{ this }}` instead of `{{this}}` +- 🆕 Use newlines to visually indicate logical blocks of Jinja. 
+- 4️⃣ Indent 4 spaces into a Jinja block to indicate visually that the code inside is wrapped by that block. +- ❌ Don't worry (too much) about Jinja whitespace control, focus on your project code being readable. The time you save by not worrying about whitespace control will far outweigh the time you spend in your compiled code where it might not be perfect. + +## Examples of Jinja style + +```jinja +{% macro make_cool(uncool_id) %} + + do_cool_thing({{ uncool_id }}) + +{% endmacro %} +``` + +```sql +select + entity_id, + entity_type, + {% if this %} + + {{ that }}, + + {% else %} + + {{ the_other_thing }}, + + {% endif %} + {{ make_cool('uncool_id') }} as cool_id +``` diff --git a/website/docs/guides/best-practices/how-we-style/5-how-we-style-our-yaml.md b/website/docs/guides/best-practices/how-we-style/5-how-we-style-our-yaml.md new file mode 100644 index 00000000000..323ed3ac11d --- /dev/null +++ b/website/docs/guides/best-practices/how-we-style/5-how-we-style-our-yaml.md @@ -0,0 +1,44 @@ +--- +title: How we style our YAML +id: 5-how-we-style-our-yaml +--- + +## YAML Style Guide + +- 2️⃣ Indents should be two spaces +- ➡️ List items should be indented +- 🆕 Use a new line to separate list items that are dictionaries where appropriate +- 📏 Lines of YAML should be no longer than 80 characters. +- 🛠️ Use the [dbt JSON schema](https://github.com/dbt-labs/dbt-jsonschema) with any compatible IDE and a YAML formatter (we recommend [Prettier](https://prettier.io/) to validate your YAML files and format them automatically. + +:::info +☁️ As with Python and SQL, the dbt Cloud IDE comes with built-in formatting for YAML files (Markdown and JSON too!), via Prettier. Just click the `Format` button and you're in perfect style. As with the other tools, you can [also customize the formatting rules](https://docs.getdbt.com/docs/cloud/dbt-cloud-ide/lint-format#format-yaml-markdown-json) to your liking to fit your company's style guide. +::: + +### Example YAML + +```yaml +version: 2 + +models: + - name: events + columns: + - name: event_id + description: This is a unique identifier for the event + tests: + - unique + - not_null + + - name: event_time + description: "When the event occurred in UTC (eg. 2018-01-01 12:00:00)" + tests: + - not_null + + - name: user_id + description: The ID of the user who recorded the event + tests: + - not_null + - relationships: + to: ref('users') + field: id +``` diff --git a/website/docs/guides/best-practices/how-we-style/6-how-we-style-conclusion.md b/website/docs/guides/best-practices/how-we-style/6-how-we-style-conclusion.md new file mode 100644 index 00000000000..22f8e36190a --- /dev/null +++ b/website/docs/guides/best-practices/how-we-style/6-how-we-style-conclusion.md @@ -0,0 +1,12 @@ +--- +title: Now it's your turn +id: 6-how-we-style-conclusion +--- + +## BYO Styles + +Now that you've seen how we style our dbt projects, it's time to build your own. Feel free to copy this guide and use it as a template for your own project. If you do, we'd love to hear about it! Reach out to us on [the Community Forum](https://discourse.getdbt.com/c/show-and-tell/22) or [Slack](https://www.getdbt.com/community) to share your style guide. We recommend co-locating your style guide with your code to make sure contributors can easily follow it. If you're using GitHub, you can add your style guide to your repository's wiki, or include it in your README. 
+ +## Pre-commit hooks + +Lastly, to ensure your style guide's automated rules are being followed without additional mental overhead to your team, you can use [pre-commit hooks](https://pre-commit.com/) to automatically check your code for style violations (and often fix them automagically) before it's committed. This is a great way to make sure your style guide is followed by all contributors. We recommend implementing this once you've settled on and published your style guide, and your codebase is conforming to it. This will ensure that all future commits follow the style guide. You can find an excellent set of open source pre-commit hooks for dbt from the community [here in the dbt-checkpoint project](https://github.com/dbt-checkpoint/dbt-checkpoint). diff --git a/website/docs/guides/dbt-ecosystem/databricks-guides/productionizing-your-dbt-databricks-project.md b/website/docs/guides/dbt-ecosystem/databricks-guides/productionizing-your-dbt-databricks-project.md index 62a9ee5e8f3..5da8cc6616b 100644 --- a/website/docs/guides/dbt-ecosystem/databricks-guides/productionizing-your-dbt-databricks-project.md +++ b/website/docs/guides/dbt-ecosystem/databricks-guides/productionizing-your-dbt-databricks-project.md @@ -16,7 +16,7 @@ If you don't have any of the following requirements, refer to the instructions i - You have [optimized your dbt models for peak performance](/guides/dbt-ecosystem/databricks-guides/how_to_optimize_dbt_models_on_databricks). - You have created two catalogs in Databricks: *dev* and *prod*. - You have created Databricks Service Principal to run your production jobs. -- You have at least one [deployment environment](docs/collaborate/environments/dbt-cloud-environments) in dbt Cloud. +- You have at least one [deployment environment](/docs/deploy/deploy-environments) in dbt Cloud. To get started, let's revisit the deployment environment created for your production data. @@ -24,7 +24,7 @@ To get started, let's revisit the deployment environment created for your produc In software engineering, environments play a crucial role in allowing engineers to develop and test code without affecting the end users of their software. Similarly, you can design [data lakehouses](https://www.databricks.com/product/data-lakehouse) with separate environments. The _production_ environment includes the relations (schemas, tables, and views) that end users query or use, typically in a BI tool or ML model. -In dbt Cloud, [environments](/docs/collaborate/environments/dbt-cloud-environments) come in two flavors: +In dbt Cloud, [environments](/docs/dbt-cloud-environments) come in two flavors: - Deployment — Defines the settings used for executing jobs created within that environment. - Development — Determine the settings used in the dbt Cloud IDE for a particular dbt Cloud project. 
diff --git a/website/docs/guides/dbt-ecosystem/dbt-python-snowpark/10-python-transformations.md b/website/docs/guides/dbt-ecosystem/dbt-python-snowpark/10-python-transformations.md index 47e09311bc6..446981214e3 100644 --- a/website/docs/guides/dbt-ecosystem/dbt-python-snowpark/10-python-transformations.md +++ b/website/docs/guides/dbt-ecosystem/dbt-python-snowpark/10-python-transformations.md @@ -144,7 +144,7 @@ Let’s take a step back before starting machine learning to both review and go ```python def model(dbt, session): - # setting configuration - dbt.config(materialized="table") + # setting configuration + dbt.config(materialized="table") ``` - - There's a limit to how complex you can get with the `dbt.config()` method. It accepts only literal values (strings, booleans, and numeric types). Passing another function or a more complex data structure is not possible. The reason is that dbt statically analyzes the arguments to `.config()` while parsing your model without executing your Python code. If you need to set a more complex configuration, we recommend you define it using the config property in a [YAML file](/reference/resource-properties/config). Learn more about configurations [here](/reference/model-configs). \ No newline at end of file + - There's a limit to how complex you can get with the `dbt.config()` method. It accepts only literal values (strings, booleans, and numeric types). Passing another function or a more complex data structure is not possible. The reason is that dbt statically analyzes the arguments to `.config()` while parsing your model without executing your Python code. If you need to set a more complex configuration, we recommend you define it using the config property in a [YAML file](/reference/resource-properties/config). Learn more about configurations [here](/reference/model-configs). diff --git a/website/docs/guides/dbt-ecosystem/dbt-python-snowpark/11-machine-learning-prep.md b/website/docs/guides/dbt-ecosystem/dbt-python-snowpark/11-machine-learning-prep.md index a6eaecce6fd..bde163b59db 100644 --- a/website/docs/guides/dbt-ecosystem/dbt-python-snowpark/11-machine-learning-prep.md +++ b/website/docs/guides/dbt-ecosystem/dbt-python-snowpark/11-machine-learning-prep.md @@ -112,43 +112,43 @@ In this next part, we’ll be performing covariate encoding. 
Breaking down this from sklearn.linear_model import LogisticRegression def model(dbt, session): - # dbt configuration - dbt.config(packages=["pandas","numpy","scikit-learn"]) + # dbt configuration + dbt.config(packages=["pandas","numpy","scikit-learn"]) - # get upstream data - data = dbt.ref("ml_data_prep").to_pandas() + # get upstream data + data = dbt.ref("ml_data_prep").to_pandas() - # list out covariates we want to use in addition to outcome variable we are modeling - position - covariates = data[['RACE_YEAR','CIRCUIT_NAME','GRID','CONSTRUCTOR_NAME','DRIVER','DRIVERS_AGE_YEARS','DRIVER_CONFIDENCE','CONSTRUCTOR_RELAIBLITY','TOTAL_PIT_STOPS_PER_RACE','ACTIVE_DRIVER','ACTIVE_CONSTRUCTOR', 'POSITION']] + # list out covariates we want to use in addition to outcome variable we are modeling - position + covariates = data[['RACE_YEAR','CIRCUIT_NAME','GRID','CONSTRUCTOR_NAME','DRIVER','DRIVERS_AGE_YEARS','DRIVER_CONFIDENCE','CONSTRUCTOR_RELAIBLITY','TOTAL_PIT_STOPS_PER_RACE','ACTIVE_DRIVER','ACTIVE_CONSTRUCTOR', 'POSITION']] - # filter covariates on active drivers and constructors - # use fil_cov as short for "filtered_covariates" - fil_cov = covariates[(covariates['ACTIVE_DRIVER']==1)&(covariates['ACTIVE_CONSTRUCTOR']==1)] - - # Encode categorical variables using LabelEncoder - # TODO: we'll update this to both ohe in the future for non-ordinal variables! - le = LabelEncoder() - fil_cov['CIRCUIT_NAME'] = le.fit_transform(fil_cov['CIRCUIT_NAME']) - fil_cov['CONSTRUCTOR_NAME'] = le.fit_transform(fil_cov['CONSTRUCTOR_NAME']) - fil_cov['DRIVER'] = le.fit_transform(fil_cov['DRIVER']) - fil_cov['TOTAL_PIT_STOPS_PER_RACE'] = le.fit_transform(fil_cov['TOTAL_PIT_STOPS_PER_RACE']) - - # Simply target variable "position" to represent 3 meaningful categories in Formula1 - # 1. Podium position 2. Points for team 3. Nothing - no podium or points! - def position_index(x): - if x<4: - return 1 - if x>10: - return 3 - else : - return 2 - - # we are dropping the columns that we filtered on in addition to our training variable - encoded_data = fil_cov.drop(['ACTIVE_DRIVER','ACTIVE_CONSTRUCTOR'],1) - encoded_data['POSITION_LABEL']= encoded_data['POSITION'].apply(lambda x: position_index(x)) - encoded_data_grouped_target = encoded_data.drop(['POSITION'],1) - - return encoded_data_grouped_target + # filter covariates on active drivers and constructors + # use fil_cov as short for "filtered_covariates" + fil_cov = covariates[(covariates['ACTIVE_DRIVER']==1)&(covariates['ACTIVE_CONSTRUCTOR']==1)] + + # Encode categorical variables using LabelEncoder + # TODO: we'll update this to both ohe in the future for non-ordinal variables! + le = LabelEncoder() + fil_cov['CIRCUIT_NAME'] = le.fit_transform(fil_cov['CIRCUIT_NAME']) + fil_cov['CONSTRUCTOR_NAME'] = le.fit_transform(fil_cov['CONSTRUCTOR_NAME']) + fil_cov['DRIVER'] = le.fit_transform(fil_cov['DRIVER']) + fil_cov['TOTAL_PIT_STOPS_PER_RACE'] = le.fit_transform(fil_cov['TOTAL_PIT_STOPS_PER_RACE']) + + # Simply target variable "position" to represent 3 meaningful categories in Formula1 + # 1. Podium position 2. Points for team 3. Nothing - no podium or points! 
+ def position_index(x): + if x<4: + return 1 + if x>10: + return 3 + else : + return 2 + + # we are dropping the columns that we filtered on in addition to our training variable + encoded_data = fil_cov.drop(['ACTIVE_DRIVER','ACTIVE_CONSTRUCTOR'],1) + encoded_data['POSITION_LABEL']= encoded_data['POSITION'].apply(lambda x: position_index(x)) + encoded_data_grouped_target = encoded_data.drop(['POSITION'],1) + + return encoded_data_grouped_target ``` 2. Execute the following in the command bar: ```bash @@ -222,4 +222,4 @@ Now that we’ve cleaned and encoded our data, we are going to further split in To run our temporal data split models, we can use this syntax in the command line to run them both at once. Make sure you use a *space* [syntax](/reference/node-selection/syntax) between the model names to indicate you want to run both! 4. **Commit and push** our changes to keep saving our work as we go using `ml data prep and splits` before moving on. -👏 Now that we’ve finished our machine learning prep work we can move onto the fun part — training and prediction! \ No newline at end of file +👏 Now that we’ve finished our machine learning prep work we can move onto the fun part — training and prediction! diff --git a/website/docs/guides/dbt-ecosystem/dbt-python-snowpark/12-machine-learning-training-testing.md b/website/docs/guides/dbt-ecosystem/dbt-python-snowpark/12-machine-learning-training-testing.md index 9381b223f56..8b353a85fa3 100644 --- a/website/docs/guides/dbt-ecosystem/dbt-python-snowpark/12-machine-learning-training-testing.md +++ b/website/docs/guides/dbt-ecosystem/dbt-python-snowpark/12-machine-learning-training-testing.md @@ -34,59 +34,59 @@ If you haven’t seen code like this before or use joblib files to save machine logger = logging.getLogger("mylog") def save_file(session, model, path, dest_filename): - input_stream = io.BytesIO() - joblib.dump(model, input_stream) - session._conn.upload_stream(input_stream, path, dest_filename) - return "successfully created file: " + path + input_stream = io.BytesIO() + joblib.dump(model, input_stream) + session._conn.upload_stream(input_stream, path, dest_filename) + return "successfully created file: " + path def model(dbt, session): - dbt.config( - packages = ['numpy','scikit-learn','pandas','numpy','joblib','cachetools'], - materialized = "table", - tags = "train" - ) - # Create a stage in Snowflake to save our model file - session.sql('create or replace stage MODELSTAGE').collect() + dbt.config( + packages = ['numpy','scikit-learn','pandas','numpy','joblib','cachetools'], + materialized = "table", + tags = "train" + ) + # Create a stage in Snowflake to save our model file + session.sql('create or replace stage MODELSTAGE').collect() - #session._use_scoped_temp_objects = False - version = "1.0" - logger.info('Model training version: ' + version) + #session._use_scoped_temp_objects = False + version = "1.0" + logger.info('Model training version: ' + version) - # read in our training and testing upstream dataset - test_train_df = dbt.ref("train_test_dataset") + # read in our training and testing upstream dataset + test_train_df = dbt.ref("train_test_dataset") - # cast snowpark df to pandas df - test_train_pd_df = test_train_df.to_pandas() - target_col = "POSITION_LABEL" + # cast snowpark df to pandas df + test_train_pd_df = test_train_df.to_pandas() + target_col = "POSITION_LABEL" - # split out covariate predictors, x, from our target column position_label, y. 
- split_X = test_train_pd_df.drop([target_col], axis=1) - split_y = test_train_pd_df[target_col] + # split out covariate predictors, x, from our target column position_label, y. + split_X = test_train_pd_df.drop([target_col], axis=1) + split_y = test_train_pd_df[target_col] - # Split out our training and test data into proportions - X_train, X_test, y_train, y_test = train_test_split(split_X, split_y, train_size=0.7, random_state=42) - train = [X_train, y_train] - test = [X_test, y_test] + # Split out our training and test data into proportions + X_train, X_test, y_train, y_test = train_test_split(split_X, split_y, train_size=0.7, random_state=42) + train = [X_train, y_train] + test = [X_test, y_test] # now we are only training our one model to deploy - # we are keeping the focus on the workflows and not algorithms for this lab! - model = LogisticRegression() + # we are keeping the focus on the workflows and not algorithms for this lab! + model = LogisticRegression() - # fit the preprocessing pipeline and the model together - model.fit(X_train, y_train) - y_pred = model.predict_proba(X_test)[:,1] - predictions = [round(value) for value in y_pred] - balanced_accuracy = balanced_accuracy_score(y_test, predictions) - - # Save the model to a stage - save_file(session, model, "@MODELSTAGE/driver_position_"+version, "driver_position_"+version+".joblib" ) - logger.info('Model artifact:' + "@MODELSTAGE/driver_position_"+version+".joblib") + # fit the preprocessing pipeline and the model together + model.fit(X_train, y_train) + y_pred = model.predict_proba(X_test)[:,1] + predictions = [round(value) for value in y_pred] + balanced_accuracy = balanced_accuracy_score(y_test, predictions) + + # Save the model to a stage + save_file(session, model, "@MODELSTAGE/driver_position_"+version, "driver_position_"+version+".joblib" ) + logger.info('Model artifact:' + "@MODELSTAGE/driver_position_"+version+".joblib") - # Take our pandas training and testing dataframes and put them back into snowpark dataframes - snowpark_train_df = session.write_pandas(pd.concat(train, axis=1, join='inner'), "train_table", auto_create_table=True, create_temp_table=True) - snowpark_test_df = session.write_pandas(pd.concat(test, axis=1, join='inner'), "test_table", auto_create_table=True, create_temp_table=True) + # Take our pandas training and testing dataframes and put them back into snowpark dataframes + snowpark_train_df = session.write_pandas(pd.concat(train, axis=1, join='inner'), "train_table", auto_create_table=True, create_temp_table=True) + snowpark_test_df = session.write_pandas(pd.concat(test, axis=1, join='inner'), "test_table", auto_create_table=True, create_temp_table=True) - # Union our training and testing data together and add a column indicating train vs test rows - return snowpark_train_df.with_column("DATASET_TYPE", F.lit("train")).union(snowpark_test_df.with_column("DATASET_TYPE", F.lit("test"))) + # Union our training and testing data together and add a column indicating train vs test rows + return snowpark_train_df.with_column("DATASET_TYPE", F.lit("train")).union(snowpark_test_df.with_column("DATASET_TYPE", F.lit("test"))) ``` 3. 
Execute the following in the command bar: @@ -160,63 +160,63 @@ If you haven’t seen code like this before or use joblib files to save machine def register_udf_for_prediction(p_predictor ,p_session ,p_dbt): - # The prediction udf + # The prediction udf - def predict_position(p_df: T.PandasDataFrame[int, int, int, int, - int, int, int, int, int]) -> T.PandasSeries[int]: - # Snowpark currently does not set the column name in the input dataframe - # The default col names are like 0,1,2,... Hence we need to reset the column - # names to the features that we initially used for training. - p_df.columns = [*FEATURE_COLS] + def predict_position(p_df: T.PandasDataFrame[int, int, int, int, + int, int, int, int, int]) -> T.PandasSeries[int]: + # Snowpark currently does not set the column name in the input dataframe + # The default col names are like 0,1,2,... Hence we need to reset the column + # names to the features that we initially used for training. + p_df.columns = [*FEATURE_COLS] - # Perform prediction. this returns an array object - pred_array = p_predictor.predict(p_df) - # Convert to series - df_predicted = pd.Series(pred_array) - return df_predicted - - # The list of packages that will be used by UDF - udf_packages = p_dbt.config.get('packages') - - predict_position_udf = p_session.udf.register( - predict_position - ,name=f'predict_position' - ,packages = udf_packages - ) - return predict_position_udf + # Perform prediction. this returns an array object + pred_array = p_predictor.predict(p_df) + # Convert to series + df_predicted = pd.Series(pred_array) + return df_predicted + + # The list of packages that will be used by UDF + udf_packages = p_dbt.config.get('packages') + + predict_position_udf = p_session.udf.register( + predict_position + ,name=f'predict_position' + ,packages = udf_packages + ) + return predict_position_udf def download_models_and_libs_from_stage(p_session): - p_session.file.get(f'@{DB_STAGE}/{model_file_path}/{model_file_packaged}', DOWNLOAD_DIR) + p_session.file.get(f'@{DB_STAGE}/{model_file_path}/{model_file_packaged}', DOWNLOAD_DIR) def load_model(p_session): - # Load the model and initialize the predictor - model_fl_path = os.path.join(DOWNLOAD_DIR, model_file_packaged) - predictor = joblib.load(model_fl_path) - return predictor + # Load the model and initialize the predictor + model_fl_path = os.path.join(DOWNLOAD_DIR, model_file_packaged) + predictor = joblib.load(model_fl_path) + return predictor # ------------------------------- def model(dbt, session): - dbt.config( - packages = ['snowflake-snowpark-python' ,'scipy','scikit-learn' ,'pandas' ,'numpy'], - materialized = "table", - tags = "predict" - ) - session._use_scoped_temp_objects = False - download_models_and_libs_from_stage(session) - predictor = load_model(session) - predict_position_udf = register_udf_for_prediction(predictor, session ,dbt) + dbt.config( + packages = ['snowflake-snowpark-python' ,'scipy','scikit-learn' ,'pandas' ,'numpy'], + materialized = "table", + tags = "predict" + ) + session._use_scoped_temp_objects = False + download_models_and_libs_from_stage(session) + predictor = load_model(session) + predict_position_udf = register_udf_for_prediction(predictor, session ,dbt) - # Retrieve the data, and perform the prediction - hold_out_df = (dbt.ref("hold_out_dataset_for_prediction") - .select(*FEATURE_COLS) - ) - - # Perform prediction. 
- new_predictions_df = hold_out_df.withColumn("position_predicted" - ,predict_position_udf(*FEATURE_COLS) - ) + # Retrieve the data, and perform the prediction + hold_out_df = (dbt.ref("hold_out_dataset_for_prediction") + .select(*FEATURE_COLS) + ) + + # Perform prediction. + new_predictions_df = hold_out_df.withColumn("position_predicted" + ,predict_position_udf(*FEATURE_COLS) + ) - return new_predictions_df + return new_predictions_df ``` 2. Execute the following in the command bar: ```bash @@ -248,4 +248,4 @@ If you haven’t seen code like this before or use joblib files to save machine ```sql select * from {{ ref('predict_position') }} order by position_predicted ``` -7. We can see that we created predictions in our final dataset, we are ready to move on to testing! \ No newline at end of file +7. We can see that we created predictions in our final dataset, we are ready to move on to testing! diff --git a/website/docs/guides/legacy/best-practices.md b/website/docs/guides/legacy/best-practices.md index 809e837e711..0aad86dd2bc 100644 --- a/website/docs/guides/legacy/best-practices.md +++ b/website/docs/guides/legacy/best-practices.md @@ -16,7 +16,7 @@ We've codified our best practices in Git, in our [Git guide](https://github.com/ ::: ### Use separate development and production environments -dbt makes it easy to maintain separate production and development environments through the use of targets within a profile. We recommend using a `dev` target when running dbt from your command line and only running against a `prod` target when running from a production deployment. You can read more [about managing environments here](/docs/collaborate/environments/environments-in-dbt). +dbt makes it easy to maintain separate production and development environments through the use of targets within a profile. We recommend using a `dev` target when running dbt from your command line and only running against a `prod` target when running from a production deployment. You can read more [about managing environments here](/docs/environments-in-dbt). ### Use a style guide for your project SQL styles, field naming conventions, and other rules for your dbt project should be codified, especially on projects where multiple dbt users are writing code. diff --git a/website/docs/guides/migration/versions/01-upgrading-to-v1.6.md b/website/docs/guides/migration/versions/01-upgrading-to-v1.6.md index 4f7ddc2fd8a..ab831e8b760 100644 --- a/website/docs/guides/migration/versions/01-upgrading-to-v1.6.md +++ b/website/docs/guides/migration/versions/01-upgrading-to-v1.6.md @@ -27,8 +27,17 @@ dbt Labs is committed to providing backward compatibility for all versions 1.x, ## New and changed documentation -**Coming Soon** +[`dbt retry`](/reference/commands/retry) is a new command that executes the previously run command from the point of failure. This convenient command enables you to continue a failed command without rebuilding all upstream dependencies. + +**Materialized view** support (for model and project configs) has been added for three data warehouses: + - [Bigquery](/reference/resource-configs/bigquery-configs#materialized-view) + - [Postgres](/reference/resource-configs/postgres-configs#materialized-view) + - [Redshift](/reference/resource-configs/redshift-configs#materialized-view) + +[**Namespacing:**](/faqs/Models/unique-model-names) Model names can be duplicated across different namespaces (packages/projects), so long as they are unique within each package/project. 
We strongly encourage using [two-argument `ref`](/reference/dbt-jinja-functions/ref#two-argument-variant) when referencing a model from a different package/project. ### Quick hits -**Coming Soon** \ No newline at end of file +More consistency and flexibility around packages! Resources defined in a package will respect variable and global macro definitions within the scope of that package. +- `vars` defined in a package's `dbt_project.yml` are now available in the resolution order when compiling nodes in that package, though CLI `--vars` and the root project's `vars` will still take precedence. See ["Variable Precedence"](/docs/build/project-variables#variable-precedence) for details. +- `generate_x_name` macros (defining custom rules for database, schema, alias naming) follow the same pattern as other "global" macros for package-scoped overrides. See [macro dispatch](/reference/dbt-jinja-functions/dispatch) for an overview of the patterns that are possible. diff --git a/website/docs/guides/orchestration/airflow-and-dbt-cloud/1-airflow-and-dbt-cloud.md b/website/docs/guides/orchestration/airflow-and-dbt-cloud/1-airflow-and-dbt-cloud.md index a9adad9e4af..a377554c317 100644 --- a/website/docs/guides/orchestration/airflow-and-dbt-cloud/1-airflow-and-dbt-cloud.md +++ b/website/docs/guides/orchestration/airflow-and-dbt-cloud/1-airflow-and-dbt-cloud.md @@ -24,7 +24,7 @@ This has served as a bridge until the fabled Astronomer + dbt Labs-built dbt Clo There are many different permutations of this over time: - [Custom Python Scripts](https://github.com/sungchun12/airflow-dbt-cloud/blob/main/archive/dbt_cloud_example.py): This is an airflow DAG based on custom python API utilities [here](https://github.com/sungchun12/airflow-dbt-cloud/blob/main/archive/dbt_cloud_utils.py) -- [Make API requests directly through the BashOperator based on the docs](https://docs.getdbt.com/dbt-cloud/api-v2#operation/triggerRun): You can make cURL requests to invoke dbt Cloud to do what you want +- [Make API requests directly through the BashOperator based on the docs](https://docs.getdbt.com/dbt-cloud/api-v2-legacy#operation/triggerRun): You can make cURL requests to invoke dbt Cloud to do what you want - [Other ways to run dbt in airflow](/docs/deploy/deployments#airflow): Official dbt Docs on how teams are running dbt in airflow ## This guide's process diff --git a/website/docs/guides/orchestration/webhooks/zapier-new-cloud-job.md b/website/docs/guides/orchestration/webhooks/zapier-new-cloud-job.md index d281f12f494..75897c30150 100644 --- a/website/docs/guides/orchestration/webhooks/zapier-new-cloud-job.md +++ b/website/docs/guides/orchestration/webhooks/zapier-new-cloud-job.md @@ -47,7 +47,7 @@ In the **Set up action** area, add two items to **Input Data**: `raw_body` and ` In the **Code** field, paste the following code, replacing `YOUR_SECRET_HERE` with the secret you created when setting up the Storage by Zapier integration. Remember that this is not your dbt Cloud secret. -The code below will validate the authenticity of the request, then send a [`trigger run` command to the dbt Cloud API](https://docs.getdbt.com/dbt-cloud/api-v2#tag/Jobs/operation/triggerRun) for the given job ID. +The code below will validate the authenticity of the request, then send a [`trigger run` command to the dbt Cloud API](https://docs.getdbt.com/dbt-cloud/api-v2-legacy#tag/Jobs/operation/triggerRun) for the given job ID. 
```python import hashlib diff --git a/website/docs/reference/artifacts/dbt-artifacts.md b/website/docs/reference/artifacts/dbt-artifacts.md index 7cd391b10fa..2fbcc329484 100644 --- a/website/docs/reference/artifacts/dbt-artifacts.md +++ b/website/docs/reference/artifacts/dbt-artifacts.md @@ -38,7 +38,7 @@ All artifacts produced by dbt include a `metadata` dictionary with these propert - [`invocation_id`](/reference/dbt-jinja-functions/invocation_id): Unique identifier for this dbt invocation In the manifest, the `metadata` may also include: -- `send_anonymous_usage_stats`: Whether this invocation sent [anonymous usage statistics](https://docs.getdbt.com/reference/profiles.yml/#send_anonymous_usage_stats) while executing. +- `send_anonymous_usage_stats`: Whether this invocation sent [anonymous usage statistics](/reference/global-configs/usage-stats) while executing. - `project_id`: Project identifier, hashed from `project_name`, sent with anonymous usage stats if enabled. - `user_id`: User identifier, stored by default in `~/dbt/.user.yml`, sent with anonymous usage stats if enabled. diff --git a/website/docs/reference/commands/clean.md b/website/docs/reference/commands/clean.md index d3a373dbb26..0185b701740 100644 --- a/website/docs/reference/commands/clean.md +++ b/website/docs/reference/commands/clean.md @@ -10,6 +10,6 @@ id: "clean" -`dbt clean` is a utility function that deletes all folders specified in the `clean-targets` list specified in `dbt_project.yml`. You can use this to delete the `dbt_packages` and `target` directories. +`dbt clean` is a utility function that deletes all folders specified in the [`clean-targets`](/reference/project-configs/clean-targets) list specified in `dbt_project.yml`. You can use this to delete the `dbt_packages` and `target` directories. To avoid complex permissions issues and potentially deleting crucial aspects of the remote file system without access to fix them, this command does not work when interfacing with the RPC server that powers the dbt Cloud IDE. Instead, when working in dbt Cloud, the `dbt deps` command cleans before it installs packages automatically. The `target` folder can be manually deleted from the sidebar file tree if needed. diff --git a/website/docs/reference/commands/retry.md b/website/docs/reference/commands/retry.md new file mode 100644 index 00000000000..0c010ede2c1 --- /dev/null +++ b/website/docs/reference/commands/retry.md @@ -0,0 +1,22 @@ +--- +title: "About dbt retry command" +sidebar_label: "retry" +id: "retry" +--- + +`dbt retry` re-executes the last `dbt` command from the node point of failure. If the previously executed `dbt` command was successful, `retry` will finish as a no-op. + +Retry works with the following commands: + +- [`build`](/reference/commands/build) +- [`compile`](/reference/commands/compile) +- [`seed`](/reference/commands/seed) +- [`snapshot`](/reference/commands/snapshot) +- [`test`](/reference/commands/test) +- [`run`](/reference/commands/run) +- [`run-operation`](/reference/commands/run-operation) + +`dbt retry` references [run_results.json](/reference/artifacts/run-results-json) to determine where to start. Executing `dbt retry` without first correcting the previous failures will only reproduce the same failures. + +`dbt retry` reuses the [selectors](/reference/node-selection/yaml-selectors) from the previously executed command. 
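+For example, here is a minimal sketch of the intended workflow (the comments describe a hypothetical failure; no output is shown):
+
+```bash
+# the initial invocation fails partway through the DAG
+dbt build
+
+# fix the failing model, then re-run only the failed node and anything
+# skipped downstream of it
+dbt retry
+```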
+ diff --git a/website/docs/reference/dbt-commands.md b/website/docs/reference/dbt-commands.md index 3ad0f1b45a7..116618e29e5 100644 --- a/website/docs/reference/dbt-commands.md +++ b/website/docs/reference/dbt-commands.md @@ -46,6 +46,7 @@ Use the following dbt commands in the [CLI](/docs/core/about-the-cli) and use th - [init](/reference/commands/init): initializes a new dbt project - [list](/reference/commands/list): lists resources defined in a dbt project - [parse](/reference/commands/parse): parses a project and writes detailed timing info +- [retry](/reference/commands/retry): retries the last run `dbt` command from the point of failure (requires dbt 1.6 or higher) - [rpc](/reference/commands/rpc): runs an RPC server that clients can submit queries to - [run](/reference/commands/run): runs the models in a project - [run-operation](/reference/commands/run-operation): invoke a macro, including running arbitrary maintenance SQL against the database diff --git a/website/docs/reference/dbt-jinja-functions/dispatch.md b/website/docs/reference/dbt-jinja-functions/dispatch.md index 60938361960..d615bbdb430 100644 --- a/website/docs/reference/dbt-jinja-functions/dispatch.md +++ b/website/docs/reference/dbt-jinja-functions/dispatch.md @@ -167,6 +167,24 @@ dispatch: + +### Managing different global overrides across packages + +You can override global behaviors in different ways for each project that is installed as a package. This holds true for all global macros: `generate_schema_name`, `create_table_as`, etc. When parsing or running a resource defined in a package, the definition of the global macro within that package takes precedence over the definition in the root project because it's more specific to those resources. + +By combining package-level overrides and `dispatch`, it is possible to achieve three different patterns: + +1. **Package always wins** — As the developer of dbt models in a project that will be deployed elsewhere as a package, you want full control over the macros used to define and materialize your models. Your macros should always take precedence for your models, and there should not be any way to override them. + + - _Mechanism:_ Each project/package fully overrides the macro by its name, for example, `generate_schema_name` or `create_table_as`. Do not use dispatch. + +2. **Conditional application (root project wins)** — As the maintainer of one dbt project in a mesh of multiple projects, your team wants conditional application of these rules. When running your project standalone (in development), you want to apply custom behavior; but when installed as a package and deployed alongside several other projects (in production), you want the root-level project's rules to apply. + + - _Mechanism:_ Each package implements its "local" override by registering a candidate for dispatch with an adapter prefix, for example, `default__generate_schema_name` or `default__create_table_as`. The root-level project can then register its own candidate for dispatch (`default__generate_schema_name`), winning the default search order, or it can explicitly override the macro by name (`generate_schema_name`). + +3. **Same rules everywhere all the time** — As a member of the data platform team responsible for consistency across teams at your organization, you want to create a "macro package" that every team can install and use. + + - _Mechanism:_ Create a standalone package of candidate macros only, for example, `default__generate_schema_name` or `default__create_table_as`. 
Add a [project-level `dispatch` configuration](/reference/project-configs/dispatch-config) in every project's `dbt_project.yml`. + ## For adapter plugin maintainers Most packages were initially designed to work on the four original dbt adapters. By using the `dispatch` macro and project config, it is possible to "shim" existing packages to work on other adapters, by way of third-party compatibility packages. diff --git a/website/docs/reference/dbt-jinja-functions/ref.md b/website/docs/reference/dbt-jinja-functions/ref.md index 9233edb3595..c500bb934ab 100644 --- a/website/docs/reference/dbt-jinja-functions/ref.md +++ b/website/docs/reference/dbt-jinja-functions/ref.md @@ -42,7 +42,7 @@ The `{{ ref }}` function returns a `Relation` object that has the same `table`, The `ref` function supports an optional keyword argument - `version` (or `v`). When a version argument is provided to the `ref` function, dbt returns the `Relation` object corresponding to the specified version of the referenced model. -This functionality is useful when referencing versioned models that make breaking changes by creating new versions, but guaruntee no breaking changes to existing versions of the model. +This functionality is useful when referencing versioned models that make breaking changes by creating new versions, but guarantee no breaking changes to existing versions of the model. If the `version` argument is not supplied to a `ref` of a versioned model, the latest version is used. This has the benefit of automatically incorporating the latest changes of a referenced model, but there is a risk of incorporating breaking changes. @@ -73,13 +73,21 @@ select * from {{ ref('model_name') }} ### Two-argument variant -There is also a two-argument variant of the `ref` function. With this variant, you can pass both a package name and model name to `ref` to avoid ambiguity. This functionality is not commonly required for typical dbt usage. +There is also a two-argument variant of the `ref` function. With this variant, you can pass both a namespace (project or package) and model name to `ref` to avoid ambiguity. ```sql -select * from {{ ref('package_name', 'model_name') }} +select * from {{ ref('project_or_package', 'model_name') }} ``` -**Note:** The `package_name` should only include the name of the package, not the maintainer. For example, if you use the [`fivetran/stripe`](https://hub.getdbt.com/fivetran/stripe/latest/) package, type `stripe` in that argument, and not `fivetran/stripe`. +We recommend using two-argument `ref` any time you are referencing a model defined in a different package or project. While not required in all cases, it's more explicit for you, for dbt, and for future readers of your code. + + + +We especially recommend using two-argument `ref` to avoid ambiguity, in cases where a model name is duplicated across multiple projects or installed packages. If you use one-argument `ref` (just the `model_name`), dbt will look for a model by that name in the same namespace (package or project); if it finds none, it will raise an error. + + + +**Note:** The `project_or_package` should match the `name` of the project/package, as defined in its `dbt_project.yml`. This might be different from the name of the repository. It never includes the repository's organization name. For example, if you use the [`fivetran/stripe`](https://hub.getdbt.com/fivetran/stripe/latest/) package, the package name is `stripe`, not `fivetran/stripe`. 
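+For instance, to reference a model from that package you would write something like the following (the model name below is illustrative; substitute a model that actually exists in the package you installed):
+
+```sql
+select * from {{ ref('stripe', 'stripe__invoice_line_items') }}
+```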
### Forcing Dependencies diff --git a/website/docs/reference/global-configs/parsing.md b/website/docs/reference/global-configs/parsing.md index c1084e9707d..b8fbf432652 100644 --- a/website/docs/reference/global-configs/parsing.md +++ b/website/docs/reference/global-configs/parsing.md @@ -6,7 +6,7 @@ sidebar: "Parsing" ### Partial Parsing -The `PARTIAL_PARSE` config can turn partial parsing on or off in your project. See [the docs on parsing](parsing#partial-parsing) for more details. +The `PARTIAL_PARSE` config can turn partial parsing on or off in your project. See [the docs on parsing](/reference/parsing#partial-parsing) for more details. @@ -29,7 +29,7 @@ dbt --no-partial-parse run ### Static parser -The `STATIC_PARSER` config can enable or disable use of the static parser. See [the docs on parsing](parsing#static-parser) for more details. +The `STATIC_PARSER` config can enable or disable the use of the static parser. See [the docs on parsing](/reference/parsing#static-parser) for more details. @@ -44,7 +44,7 @@ config: ### Experimental parser -With the `USE_EXPERIMENTAL_PARSER` config, you can opt into the latest and greatest experimental version of the static parser, which is still being sampled for 100% correctness. See [the docs on parsing](parsing#experimental-parser) for more details. +With the `USE_EXPERIMENTAL_PARSER` config, you can opt into the latest and greatest experimental version of the static parser, which is still being sampled for 100% correctness. See [the docs on parsing](/reference/parsing#experimental-parser) for more details. @@ -55,4 +55,4 @@ config: ``` - \ No newline at end of file + diff --git a/website/docs/reference/global-configs/warnings.md b/website/docs/reference/global-configs/warnings.md index 084a1f283a9..967f2209d44 100644 --- a/website/docs/reference/global-configs/warnings.md +++ b/website/docs/reference/global-configs/warnings.md @@ -32,18 +32,18 @@ dbt --warn-error-options '{"include": "all"}' run ``` ```text -dbt --warn-error-options '{"include": "all", "exclude":[NoNodesForSelectionCriteria]}' run +dbt --warn-error-options '{"include": "all", "exclude": ["NoNodesForSelectionCriteria"]}' run ... ``` ```text -dbt --warn-error-options '{"include": [NoNodesForSelectionCriteria]}' run +dbt --warn-error-options '{"include": ["NoNodesForSelectionCriteria"]}' run ... ``` ```text -dbt_WARN_ERROR_OPTIONS='{"include": [NoNodesForSelectionCriteria]}' dbt run +DBT_WARN_ERROR_OPTIONS='{"include": ["NoNodesForSelectionCriteria"]}' dbt run ... 
``` @@ -60,4 +60,4 @@ config: ``` - \ No newline at end of file + diff --git a/website/docs/reference/model-properties.md b/website/docs/reference/model-properties.md index b88f9fd6b98..730432c88af 100644 --- a/website/docs/reference/model-properties.md +++ b/website/docs/reference/model-properties.md @@ -17,6 +17,7 @@ models: [docs](/reference/resource-configs/docs): show: true | false [latest_version](/reference/resource-properties/latest_version): + [deprecation_date](/reference/resource-properties/deprecation_date): [access](/reference/resource-properties/access): private | protected | public [config](/reference/resource-properties/config): [](/reference/model-configs): diff --git a/website/docs/reference/node-selection/defer.md b/website/docs/reference/node-selection/defer.md index 37d9926f7f7..a6ef6261cf1 100644 --- a/website/docs/reference/node-selection/defer.md +++ b/website/docs/reference/node-selection/defer.md @@ -11,7 +11,7 @@ title: "Defer" Deferral is a powerful, complex feature that enables compelling workflows. As the use cases for `--defer` evolve, dbt Labs might make enhancements to the feature, but commit to providing backward compatibility for supported versions of dbt Core. For details, see [dbt#5095](https://github.com/dbt-labs/dbt-core/discussions/5095). -Defer is a powerful feature that makes it possible to run a subset of models or tests in a [sandbox environment](docs/collaborate/environments/environments-in-dbt) without having to first build their upstream parents. This can save time and computational resources when you want to test a small number of models in a large project. +Defer is a powerful feature that makes it possible to run a subset of models or tests in a [sandbox environment](/docs/environments-in-dbt) without having to first build their upstream parents. This can save time and computational resources when you want to test a small number of models in a large project. Defer requires that a manifest from a previous dbt invocation be passed to the `--state` flag or env var. Together with the `state:` selection method, these features enable "Slim CI". Read more about [state](/reference/node-selection/syntax#about-node-selection). ### Usage diff --git a/website/docs/reference/node-selection/methods.md b/website/docs/reference/node-selection/methods.md index d06abc4333c..ff86d60c06a 100644 --- a/website/docs/reference/node-selection/methods.md +++ b/website/docs/reference/node-selection/methods.md @@ -46,6 +46,13 @@ The `source` method is used to select models that select from a specified [sourc $ dbt run --select source:snowplow+ # run all models that select from Snowplow sources ``` +### The "resource_type" method +Use the `resource_type` method to select nodes of a particular type (`model`, `source`, `exposure`, etc). This is similar to the `--resource-type` flag used by the [`dbt ls` command](/reference/commands/list). + + ```bash + $ dbt build --select resource_type:exposure # build all resources upstream of exposures + $ dbt list --select resource_type:test # list all tests in your project + ``` ### The "path" method The `path` method is used to select models/sources defined at or under a specific path. 
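+For instance, a staging directory could be selected like this (the path below is illustrative):
+
+```bash
+$ dbt run --select path:models/staging  # run all models under models/staging
+```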
@@ -72,6 +79,7 @@ The `file` or `fqn` method can be used to select a model by its filename, includ ```bash # These are equivalent +dbt run --select file:some_model.sql dbt run --select some_model.sql dbt run --select some_model dbt run --select fqn:some_model # fqn is an abbreviation for "fully qualified name" @@ -294,9 +302,7 @@ Supported in v1.5 or newer. -Supported in v1.5 or newer. - -The `group` method is used to select models defined within a group. +The `group` method is used to select models defined within a [group](/reference/resource-configs/group). ```bash @@ -305,7 +311,7 @@ The `group` method is used to select models defined within a group. -### The "version" method +### The "access" method @@ -315,34 +321,34 @@ Supported in v1.5 or newer. -The `version` method selects [versioned models](/docs/collaborate/govern/model-versions) based on their [version identifier](/reference/resource-properties/versions) and [latest version](/reference/resource-properties/latest_version). +The `access` method selects models based on their [access](/reference/resource-properties/access) property. ```bash -dbt list --select version:latest # only 'latest' versions -dbt list --select version:prerelease # versions newer than the 'latest' version -dbt list --select version:old # versions older than the 'latest' version - -dbt list --select version:none # models that are *not* versioned +dbt list --select access:public # list all public models +dbt list --select access:private # list all private models +dbt list --select access:protected # list all protected models ``` -### The "access" method +### The "version" method - + -Supported in v1.6 or newer. +Supported in v1.5 or newer. - + -The `access` method selects models based on their [access](/reference/resource-properties/access) property. +The `version` method selects [versioned models](/docs/collaborate/govern/model-versions) based on their [version identifier](/reference/resource-properties/versions) and [latest version](/reference/resource-properties/latest_version). ```bash -dbt list --select access:public # list all public models -dbt list --select access:private # list all private models -dbt list --select access:protected # list all protected models +dbt list --select version:latest # only 'latest' versions +dbt list --select version:prerelease # versions newer than the 'latest' version +dbt list --select version:old # versions older than the 'latest' version + +dbt list --select version:none # models that are *not* versioned ``` - \ No newline at end of file + diff --git a/website/docs/reference/project-configs/clean-targets.md b/website/docs/reference/project-configs/clean-targets.md index 98441f7e196..119630b00b1 100644 --- a/website/docs/reference/project-configs/clean-targets.md +++ b/website/docs/reference/project-configs/clean-targets.md @@ -27,7 +27,7 @@ If this configuration is not included in your `dbt_project.yml` file, the `clean ## Examples ### Remove packages and compiled files as part of `dbt clean` :::info -This is our preferred configuration +This is our preferred configuration, but is not the default. ::: To remove packages as well as compiled files, include the value of your [packages-install-path](/reference/project-configs/packages-install-path) configuration in your `clean-targets` configuration. 
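+For instance, with the default `packages-install-path`, that might look like the following in `dbt_project.yml` (a sketch; adjust the folder names if you have customized these paths):
+
+```yaml
+clean-targets:
+  - "target"
+  - "dbt_packages"
+```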
diff --git a/website/docs/reference/resource-configs/bigquery-configs.md b/website/docs/reference/resource-configs/bigquery-configs.md index e7fd7e911ba..fcebb0befdd 100644 --- a/website/docs/reference/resource-configs/bigquery-configs.md +++ b/website/docs/reference/resource-configs/bigquery-configs.md @@ -763,3 +763,44 @@ Views with this configuration will be able to select from objects in `project_1. #### Limitations The `grant_access_to` config is not thread-safe when multiple views need to be authorized for the same dataset. The initial `dbt run` operation after a new `grant_access_to` config is added should therefore be executed in a single thread. Subsequent runs using the same configuration will not attempt to re-apply existing access grants, and can make use of multiple threads. + +## Materialized view + +The BigQuery adapter supports [materialized views](https://cloud.google.com/bigquery/docs/materialized-views-intro) and refreshes them for every subsequent `dbt run` you execute. For more information, see [Refresh Materialized Views](https://cloud.google.com/bigquery/docs/materialized-views-manage#refresh) in the Google docs. + +Materialized views support the optional configuration `on_configuration_change` with the following values: +- `apply` (default) — attempts to update the existing database object if possible, avoiding a complete rebuild. The following changes can be applied without the need to rebuild the materialized view: + - enable_refresh + - refresh_interval_minutes + - max_staleness +- `skip` — allows runs to continue while also providing a warning that the model was skipped +- `fail` — forces runs to fail if a change is detected in a materialized view + +You can create a materialized view by editing _one_ of these files: +- the SQL file for your model +- the `dbt_project.yml` configuration file + +The following examples create a materialized view: + + + +```sql +{{ + config( + materialized = 'materialized_view', + on_configuration_change = 'apply', + ) +}} +``` + + + + + + +```yaml +models: + path: + materialized: materialized_view +``` + \ No newline at end of file diff --git a/website/docs/reference/resource-configs/contract.md b/website/docs/reference/resource-configs/contract.md index 47913214d50..6c7d945032a 100644 --- a/website/docs/reference/resource-configs/contract.md +++ b/website/docs/reference/resource-configs/contract.md @@ -25,7 +25,9 @@ This is to ensure that the people querying your model downstream—both inside a The `data_type` defined in your YAML file must match a data type your data platform recognizes. dbt does not do any type aliasing itself. If your data platform recognizes both `int` and `integer` as corresponding to the same type, then they will return a match. -That said, when dbt is comparing data types, it will not compare granular details such as size, precision, or scale. We don't think you should sweat the difference between `varchar(256)` and `varchar(257)`, because it doesn't really affect the experience of downstream queriers. If you need a more-precise assertion, it's always possible to accomplish by [writing or using a custom test](/guides/best-practices/writing-custom-generic-tests). +When dbt is comparing data types, it will not compare granular details such as size, precision, or scale. We don't think you should sweat the difference between `varchar(256)` and `varchar(257)`, because it doesn't really affect the experience of downstream queriers. 
If you need a more precise assertion, you can always accomplish it by [writing or using a custom test](/guides/best-practices/writing-custom-generic-tests). + +That said, on certain data platforms, you will need to specify a varchar size or numeric scale if you do not want it to revert to the default. This is most relevant for the `numeric` type on Snowflake, which defaults to a precision of 38 and a scale of 0 (zero digits after the decimal, meaning values are rounded to integers). To avoid this implicit coercion, specify your `data_type` with a nonzero scale, like `numeric(38, 6)`. ## Example diff --git a/website/docs/reference/resource-configs/postgres-configs.md b/website/docs/reference/resource-configs/postgres-configs.md index 012f6c01ce7..cee1a9861d4 100644 --- a/website/docs/reference/resource-configs/postgres-configs.md +++ b/website/docs/reference/resource-configs/postgres-configs.md @@ -4,6 +4,14 @@ description: "Postgres Configurations - Read this in-depth guide to learn about id: "postgres-configs" --- +## Incremental materialization strategies + +In dbt-postgres, the following incremental materialization strategies are supported: + +- `append` (default) +- `merge` +- `delete+insert` + ## Performance Optimizations @@ -96,3 +104,44 @@ models: ``` + +## Materialized view + +The Postgres adapter supports [materialized views](https://www.postgresql.org/docs/current/rules-materializedviews.html) and refreshes them for every subsequent `dbt run` you execute. For more information, see [Refresh Materialized Views](https://www.postgresql.org/docs/15/sql-refreshmaterializedview.html) in the Postgres docs. + +Materialized views support the optional configuration `on_configuration_change` with the following values: +- `apply` (default) — attempts to update the existing database object if possible, avoiding a complete rebuild. The following index actions can be applied without the need to rebuild the materialized view: + - Added + - Dropped + - Updated +- `skip` — allows runs to continue while also providing a warning that the model was skipped +- `fail` — forces runs to fail if a change is detected in a materialized view + +You can create a materialized view by editing _one_ of these files: +- the SQL file for your model +- the `dbt_project.yml` configuration file + +The following examples create a materialized view: + + + +```sql +{{ + config( + materialized = 'materialized_view', + on_configuration_change = 'apply', + ) +}} +``` + + + + + + +```yaml +models: + path: + materialized: materialized_view +``` + \ No newline at end of file diff --git a/website/docs/reference/resource-configs/redshift-configs.md b/website/docs/reference/resource-configs/redshift-configs.md index 6e75b975c74..00da3130f75 100644 --- a/website/docs/reference/resource-configs/redshift-configs.md +++ b/website/docs/reference/resource-configs/redshift-configs.md @@ -10,7 +10,17 @@ To-do: - think about whether some of these should be outside of models ---> -## Performance Optimizations +## Incremental materialization strategies + +In dbt-redshift, the following incremental materialization strategies are supported: + +- `append` (default) +- `merge` +- `delete+insert` + +All of these strategies are inherited from dbt-postgres.
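As a minimal sketch, you could pick one of these strategies per model in its config block; the model, column, and ref names below are hypothetical:

```sql
{{
    config(
        materialized = 'incremental',
        incremental_strategy = 'delete+insert',
        unique_key = 'event_id'
    )
}}

select * from {{ ref('stg_events') }}  -- hypothetical upstream model

{% if is_incremental() %}
  -- only process rows newer than what already exists in this table
  where event_time > (select max(event_time) from {{ this }})
{% endif %}
```

If you omit `incremental_strategy`, the adapter falls back to its default, `append`.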
+ +## Performance optimizations @@ -85,3 +95,41 @@ models: ``` + +## Materialized view + +The Redshift adapter supports [materialized views](https://docs.aws.amazon.com/redshift/latest/dg/materialized-view-overview.html) and refreshes them for every subsequent `dbt run` that you execute. For more information, see [Refresh Materialized Views](https://docs.aws.amazon.com/redshift/latest/dg/materialized-view-refresh.html) in the Redshift docs. + +Materialized views support the optional configuration `on_configuration_change` with the following values: +- `apply` (default) — attempts to update the existing database object if possible, avoiding a complete rebuild. The `auto_refresh` action can be applied without the need to rebuild the materialized view. +- `skip` — allows runs to continue while also providing a warning that the model was skipped +- `fail` — forces runs to fail if a change is detected in a materialized view + +You can create a materialized view by editing _one_ of these files: +- the SQL file for your model +- the `dbt_project.yml` configuration file + +The following examples create a materialized view: + + + +```sql +{{ + config( + materialized = 'materialized_view', + on_configuration_change = 'apply', + ) +}} +``` + + + + + + +```yaml +models: + path: + materialized: materialized_view +``` + \ No newline at end of file diff --git a/website/docs/reference/resource-configs/snapshot_name.md b/website/docs/reference/resource-configs/snapshot_name.md index 76cd8ed8563..bb4826a116b 100644 --- a/website/docs/reference/resource-configs/snapshot_name.md +++ b/website/docs/reference/resource-configs/snapshot_name.md @@ -17,7 +17,7 @@ description: "Snapshot-name - Read this in-depth guide to learn about configurat The name of a snapshot, as defined in the `{% snapshot %}` block header. This name is used when selecting from a snapshot using the [`ref` function](/reference/dbt-jinja-functions/ref) -This name must not conflict with any other snapshot names, or any model names. +This name must not conflict with the name of any other "refable" resource (models, seeds, other snapshots) defined in this project or package. The name does not need to match the file name. As a result, snapshot filenames do not need to be unique.
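As an illustrative sketch (the file path and names here are hypothetical), a snapshot named `orders_snapshot` could live in a file called `snapshots/daily_orders.sql`, and downstream models would still select from it with `ref('orders_snapshot')`:

```sql
{% snapshot orders_snapshot %}

{{
    config(
        target_schema='snapshots',
        unique_key='order_id',
        strategy='timestamp',
        updated_at='updated_at'
    )
}}

select * from {{ ref('stg_orders') }}

{% endsnapshot %}
```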
diff --git a/website/docs/reference/resource-configs/where.md b/website/docs/reference/resource-configs/where.md index 231d7737567..3ccd96f2f35 100644 --- a/website/docs/reference/resource-configs/where.md +++ b/website/docs/reference/resource-configs/where.md @@ -154,7 +154,7 @@ models: tests: - unique: config: - where: "date_column > __last_three_days__" # placeholder string for static config + where: "date_column > __three_days_ago__" # placeholder string for static config ``` @@ -164,7 +164,7 @@ models: ```sql {% macro get_where_subquery(relation) -%} {% set where = config.get('where', '') %} - {% if "__three_days_ago__" in where %} + {% if where and "__three_days_ago__" in where %} {# replace placeholder string with result of custom macro #} {% set three_days_ago = dbt.dateadd('day', -3, current_timestamp()) %} {% set where = where | replace("__three_days_ago__", three_days_ago) %} diff --git a/website/docs/reference/resource-properties/constraints.md b/website/docs/reference/resource-properties/constraints.md index 677aaa38b52..b25893729e5 100644 --- a/website/docs/reference/resource-properties/constraints.md +++ b/website/docs/reference/resource-properties/constraints.md @@ -64,7 +64,7 @@ In transactional databases, it is possible to define "constraints" on the allowe Most analytical data platforms support and enforce a `not null` constraint, but they either do not support or do not enforce the rest. It is sometimes still desirable to add an "informational" constraint, knowing it is _not_ enforced, for the purpose of integrating with legacy data catalog or entity-relation diagram tools ([dbt-core#3295](https://github.com/dbt-labs/dbt-core/issues/3295)). -To that end, there are two optional fields you can specify on any constraint: +To that end, there are two optional fields you can specify on any constraint: - `warn_unenforced: False` to skip warning on constraints that are supported, but not enforced, by this data platform. The constraint will be included in templated DDL. - `warn_unsupported: False` to skip warning on constraints that aren't supported by this data platform, and therefore won't be included in templated DDL. diff --git a/website/docs/reference/resource-properties/deprecation_date.md b/website/docs/reference/resource-properties/deprecation_date.md new file mode 100644 index 00000000000..9fe9e2e1098 --- /dev/null +++ b/website/docs/reference/resource-properties/deprecation_date.md @@ -0,0 +1,32 @@ +--- +resource_types: [models] +datatype: deprecation_date +required: no +--- + + + +```yml +models: + - name: my_model + description: deprecated + deprecation_date: 1999-01-01 00:00:00.00+00:00 +``` + + + + +```yml +version: 2 +models: + - name: my_model + description: deprecating in the future + deprecation_date: 2999-01-01 00:00:00.00+00:00 +``` + + + +## Definition + +The deprecation date of the model in YAML DateTime format.
+ diff --git a/website/docs/reference/resource-properties/tests.md b/website/docs/reference/resource-properties/tests.md index 852796e05e7..f25e5306542 100644 --- a/website/docs/reference/resource-properties/tests.md +++ b/website/docs/reference/resource-properties/tests.md @@ -3,6 +3,7 @@ title: "About tests property" sidebar_label: "tests" resource_types: all datatype: test +keywords: [test, tests, custom tests, custom test name, test name] --- @@ -277,7 +278,7 @@ models: -### Define and use a custom generic test +### Use custom generic test If you've defined your own custom generic test, you can use that as the `test_name`: @@ -301,7 +302,7 @@ Check out the guide on writing a [custom generic test](/guides/best-practices/wr -### Define a custom name for one test +### Custom test name By default, dbt will synthesize a name for your generic test by concatenating: - test name (`not_null`, `unique`, etc) @@ -351,7 +352,7 @@ $ dbt test --select unexpected_order_status_today 12:43:41 Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1 ``` -A test's name must be unique for all tests defined on a given model-column combination. If you give the same name to tests defined on several different columns, or across several different models, then `dbt test --select ` will select them all. +A test's name must be unique for all tests defined on a given model-column combination. If you give the same name to tests defined on several different columns, or across several different models, then `dbt test --select ` will select them all. **When might you need this?** In cases where you have defined the same test twice, with only a difference in configuration, dbt will consider these tests to be duplicates: @@ -390,7 +391,7 @@ Compilation Error - test.testy.accepted_values_orders_status__placed__shipped__completed__returned.69dce9e5d5 (models/one_file.yml) ``` -By providing a custom name, you enable dbt to disambiguate them: +By providing a custom name, you help dbt differentiate tests: @@ -435,7 +436,7 @@ $ dbt test 12:48:04 Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2 ``` -**If using [`store_failures`](/reference/resource-configs/store_failures):** dbt uses each test's name as the name of the table in which to store any failing records. If you have defined a custom name for one test, that custom name will also be used for its table of failures. You may optionally configure an [`alias`](/reference/resource-configs/alias) for the test, in order to separately control both the name of the test (for metadata) and the name of its database table (for storing failures). +**If using [`store_failures`](/reference/resource-configs/store_failures):** dbt uses each test's name as the name of the table in which to store any failing records. If you have defined a custom name for one test, that custom name will also be used for its table of failures. You may optionally configure an [`alias`](/reference/resource-configs/alias) for the test, to separately control both the name of the test (for metadata) and the name of its database table (for storing failures). @@ -443,7 +444,7 @@ $ dbt test ### Alternative format for defining tests -When defining a generic test with a number of arguments and configurations, the YAML can look and feel unwieldy. If you find it easier, you can define the same test properties as top-level keys of a single dictionary, by providing the test name as `test_name` instead. It's totally up to you. +When defining a generic test with several arguments and configurations, the YAML can look and feel unwieldy. 
If you find it easier, you can define the same test properties as top-level keys of a single dictionary, by providing the test name as `test_name` instead. It's totally up to you. This example is identical to the one above: diff --git a/website/docs/reference/snapshot-configs.md b/website/docs/reference/snapshot-configs.md index dc8a81259b3..59f4e3c254e 100644 --- a/website/docs/reference/snapshot-configs.md +++ b/website/docs/reference/snapshot-configs.md @@ -46,10 +46,12 @@ snapshots: - +**Note:** Required snapshot properties _will not_ work when defined in `config` YAML blocks. We recommend that you define these in `dbt_project.yml` or a `config()` block within the snapshot `.sql` file. -**Note:** Required snapshot properties may not work when defined in `config` YAML blocks. We recommend that you define these in `dbt_project.yml` or a `config()` block within the snapshot `.sql` file. + diff --git a/website/sidebars.js b/website/sidebars.js index 657252b8409..3198d95e0f3 100644 --- a/website/sidebars.js +++ b/website/sidebars.js @@ -21,6 +21,7 @@ const sidebarSettings = { "docs/cloud/about-cloud/tenancy", "docs/cloud/about-cloud/regions-ip-addresses", "docs/cloud/about-cloud/about-cloud-ide", + "docs/cloud/about-cloud/browsers", ], }, // About dbt Cloud directory { @@ -34,12 +35,14 @@ const sidebarSettings = { collapsed: true, items: [ "docs/about-setup", + "docs/environments-in-dbt", { type: "category", label: "dbt Cloud", collapsed: true, items: [ "docs/cloud/about-cloud-setup", + "docs/dbt-cloud-environments", { type: "category", label: "Connect data platform", @@ -143,6 +146,7 @@ const sidebarSettings = { link: { type: "doc", id: "docs/core/about-core-setup" }, items: [ "docs/core/about-the-cli", + "docs/core/dbt-core-environments", { type: "category", label: "Install dbt", @@ -239,6 +243,45 @@ const sidebarSettings = { "docs/build/groups", ], }, + { + type: "category", + label: "Build your metrics", + link: { type: "doc", id: "docs/build/build-metrics-intro"}, + collapsed: true, + items: [ + { + type: "category", + label: "About MetricFlow", + link: { type: "doc", id: "docs/build/about-metricflow" }, + items: [ + "docs/build/join-logic", + "docs/build/validation", + "docs/build/metricflow-time-spine", + ] + }, + "docs/build/sl-getting-started", + { + type: "category", + label: "Semantic models", + link: { type: "doc", id: "docs/build/semantic-models" }, + items: [ + "docs/build/dimensions", + "docs/build/entities", + "docs/build/measures" + ] + }, + { + type: "category", + label: "Metrics", + link: { type: "doc", id: "docs/build/metrics-overview"}, + items: [ + "docs/build/derived", + "docs/build/ratio", + "docs/build/simple", + ] + }, + ], + }, { type: "category", label: "Enhance your models", @@ -279,51 +322,50 @@ const sidebarSettings = { }, { type: "category", - label: "Deploy dbt jobs", + label: "Deploy dbt", collapsed: true, link: { type: "doc", id: "docs/deploy/deployments" }, items: [ + "docs/deploy/job-scheduler", + "docs/deploy/deploy-environments", { type: "category", - label: "Deploy with dbt Cloud", + label: "dbt Cloud jobs", link: { type: "doc", id: "docs/deploy/dbt-cloud-job" }, items: [ - "docs/deploy/artifacts", - "docs/deploy/job-scheduler", "docs/deploy/job-settings", "docs/deploy/job-commands", "docs/deploy/job-triggers", - "docs/deploy/job-notifications", + ], + }, + { + type: "category", + label: "Continuous integration", + link: { type: "doc", id: "docs/deploy/continuous-integration" }, + items: [ + "docs/deploy/slim-ci-jobs", + ], + }, + { + type: 
"category", + label: "Monitor jobs and alerts", + link: { type: "doc", id: "docs/deploy/monitor-jobs" }, + items: [ "docs/deploy/run-visibility", - "docs/deploy/source-freshness", + "docs/deploy/job-notifications", "docs/deploy/webhooks", + "docs/deploy/artifacts", + "docs/deploy/source-freshness", "docs/deploy/dashboard-status-tiles", - { - type: "category", - label: "Continuous integration", - link: { type: "doc", id: "docs/deploy/continuous-integration" }, - items: [ - "docs/deploy/slim-ci-jobs", - ], - }, ], }, "docs/deploy/deployment-tools", ], - }, // end of "Deploy dbt jobs" + }, // end of "Deploy dbt" { type: "category", label: "Collaborate with others", items: [ - { - type: "category", - label: "Environments", - items: [ - "docs/collaborate/environments/environments-in-dbt", - "docs/collaborate/environments/dbt-cloud-environments", - "docs/collaborate/environments/dbt-core-environments", - ], - }, { type: "category", label: "Git version control", @@ -391,12 +433,17 @@ const sidebarSettings = { items: [ { type: "link", - label: "API v2", + label: "API v2 (legacy docs)", + href: "/dbt-cloud/api-v2-legacy", + }, + { + type: "link", + label: "API v2 (beta docs)", href: "/dbt-cloud/api-v2", }, { type: "link", - label: "API v3", + label: "API v3 (beta docs)", href: "/dbt-cloud/api-v3", }, ], @@ -441,6 +488,7 @@ const sidebarSettings = { "docs/dbt-versions/core", "docs/dbt-versions/upgrade-core-in-cloud", "docs/dbt-versions/product-lifecycles", + "docs/dbt-versions/experimental-features", { type: "category", label: "dbt Cloud Release Notes", @@ -546,6 +594,7 @@ const sidebarSettings = { "reference/resource-properties/columns", "reference/resource-properties/config", "reference/resource-properties/constraints", + "reference/resource-properties/deprecation_date", "reference/resource-properties/description", "reference/resource-properties/latest_version", "reference/resource-properties/include-exclude", @@ -692,6 +741,7 @@ const sidebarSettings = { "reference/commands/init", "reference/commands/list", "reference/commands/parse", + "reference/commands/retry", "reference/commands/rpc", "reference/commands/run", "reference/commands/run-operation", @@ -799,6 +849,22 @@ const sidebarSettings = { "guides/best-practices/how-we-structure/5-the-rest-of-the-project", ], }, + { + type: "category", + label: "How we style our dbt projects", + link: { + type: "doc", + id: "guides/best-practices/how-we-style/0-how-we-style-our-dbt-projects", + }, + items: [ + "guides/best-practices/how-we-style/1-how-we-style-our-dbt-models", + "guides/best-practices/how-we-style/2-how-we-style-our-sql", + "guides/best-practices/how-we-style/3-how-we-style-our-python", + "guides/best-practices/how-we-style/4-how-we-style-our-jinja", + "guides/best-practices/how-we-style/5-how-we-style-our-yaml", + "guides/best-practices/how-we-style/6-how-we-style-conclusion", + ], + }, { type: "category", label: "Materializations best practices", @@ -1021,7 +1087,6 @@ const sidebarSettings = { label: "Legacy", items: [ "guides/legacy/debugging-schema-names", - "guides/legacy/getting-help", "guides/legacy/best-practices", "guides/legacy/building-packages", "guides/legacy/videos", @@ -1068,10 +1133,10 @@ const sidebarSettings = { items: [ "community/resources/viewpoint", "community/resources/code-of-conduct", - "community/resources/slack-rules-of-the-road", + "community/resources/community-rules-of-the-road", "community/resources/maintaining-a-channel", - "community/resources/vendor-guidelines", "community/resources/forum-guidelines", + 
"community/resources/getting-help", "community/resources/organizing-inclusive-events", "community/resources/oss-expectations", "community/resources/oss-projects", diff --git a/website/snippets/_available-enterprise-only.md b/website/snippets/_available-enterprise-only.md new file mode 100644 index 00000000000..d9d5d8f4546 --- /dev/null +++ b/website/snippets/_available-enterprise-only.md @@ -0,0 +1,5 @@ +:::info Limited to Enterprise + +This feature is limited to the dbt Cloud Enterprise plan. If you're interested in learning more about an Enterprise plan, contact us at . + +::: diff --git a/website/snippets/_available-tiers-iprestrictions.md b/website/snippets/_available-tiers-iprestrictions.md new file mode 100644 index 00000000000..9d5e7ebb289 --- /dev/null +++ b/website/snippets/_available-tiers-iprestrictions.md @@ -0,0 +1,9 @@ +:::info Limited to certain Enterprise tiers + +Organizations can configure IP restrictions using the following dbt Cloud Enterprise tiers: + * Business Critical + * Virtual Private + +To learn more about these tiers, contact us at . + +::: diff --git a/website/snippets/_cloud-environments-info.md b/website/snippets/_cloud-environments-info.md new file mode 100644 index 00000000000..d8ea7e3d799 --- /dev/null +++ b/website/snippets/_cloud-environments-info.md @@ -0,0 +1,44 @@ + +## Types of environments + +In dbt Cloud, there are two types of environments: +- Deployment environment — Determines the settings used when jobs created within that environment are executed. +- Development environment — Determines the settings used in the dbt Cloud IDE for that particular dbt Cloud project. + +Each dbt Cloud project can only have a single development environment but can have any number of deployment environments. + +| | Development Environments | Deployment Environments | +| --- | --- | --- | +| Determines settings for | dbt Cloud IDE | dbt Cloud Job runs | +| How many can I have in my project? | 1 | Any number | + +:::note +For users familiar with development on the CLI, each environment is roughly analogous to an entry in your `profiles.yml` file, with some additional information about your repository to ensure the proper version of code is executed. More info on dbt core environments [here](/docs/core/dbt-core-environments). +::: + +## Common environment settings + +Both development and deployment environments have a section called **General Settings**, which has some basic settings that all environments will define: + +| Setting | Example Value | Definition | Accepted Values | +| --- | --- | --- | --- | +| Name | Production | The environment name | Any string! | +| Environment Type | Deployment | The type of environment | [Deployment, Development] | +| dbt Version | 1.4 (latest) | The dbt version used | Any dbt version in the dropdown | +| Default to Custom Branch | ☑️ | Determines whether to use a branch other than the repository’s default | See below | +| Custom Branch | dev | Custom Branch name | See below | + +:::note About dbt version + +- dbt Cloud allows users to select any dbt release. At this time, **environments must use a dbt version greater than or equal to v1.0.0;** [lower versions are no longer supported](/docs/dbt-versions/upgrade-core-in-cloud). +- If you select a current version with `(latest)` in the name, your environment will automatically install the latest stable version of the minor version selected. 
+::: + +### Custom branch behavior + +By default, all environments will use the default branch in your repository (usually the `main` branch) when accessing your dbt code. This is overridable within each dbt Cloud Environment using the **Default to a custom branch** option. This setting will have slightly different behavior depending on the environment type: + +- **Development**: determines which branch in the dbt Cloud IDE developers create branches from and open PRs against +- **Deployment**: determines which branch is cloned during job executions for each environment. + +For more information, check out the [FAQ on custom branch settings](/faqs/Environments/custom-branch-settings). diff --git a/website/snippets/_sso-docs-mt-available.md b/website/snippets/_sso-docs-mt-available.md new file mode 100644 index 00000000000..630ef13a2b0 --- /dev/null +++ b/website/snippets/_sso-docs-mt-available.md @@ -0,0 +1,7 @@ +:::info Enterprise feature + +This guide describes a feature of the dbt Cloud Enterprise plan. If you’re interested in learning more about an Enterprise plan, contact us at . + +These SSO configuration documents apply to multi-tenant Enterprise deployments only. [Single-tenant](/docs/cloud/about-cloud/tenancy#single-tenant) Virtual Private users can [email dbt Cloud Support](mailto:support@getdbt.com) to set up or update their SSO configuration. + +::: diff --git a/website/snippets/_test-tenancy.md b/website/snippets/_test-tenancy.md new file mode 100644 index 00000000000..7fa5fac97f1 --- /dev/null +++ b/website/snippets/_test-tenancy.md @@ -0,0 +1,2 @@ + +dbt Cloud is available in both single (virtual private) and multi-tenant configurations. diff --git a/website/snippets/cloud-feature-parity.md b/website/snippets/cloud-feature-parity.md index 631137476f3..bcaa2ef3784 100644 --- a/website/snippets/cloud-feature-parity.md +++ b/website/snippets/cloud-feature-parity.md @@ -4,11 +4,11 @@ The following table outlines which dbt Cloud features are supported on the diffe |-------------------------------|--------------|-----------------------|----------------------| | Scheduler | ✅ | ✅ | ✅ | | Cloud IDE | ✅ | ✅ | ✅ | -| Audit logs | ✅ | ✅ (select customers) | ❌ | -| Discovery API | ✅ | ✅ (select customers) | ❌ | +| Audit logs | ✅ | ✅ | ✅ | +| Discovery API | ✅ | ✅ (select customers) | ❌ | | Webhooks (Outbound) | ✅ | ❌ | ❌ | | Continuous Integration, including Slim CI | ✅ | ✅ | ✅ | | Semantic Layer | ✅ (North America Only) | ❌ | ❌ | -| IP Restrictions | ❌ | ✅ | ✅ | +| IP Restrictions | ✅ | ✅ | ✅ | | PrivateLink egress | ✅ | ✅ | ✅ | | PrivateLink ingress | ❌ | ✅ | ✅ | diff --git a/website/snippets/core-version-support.md b/website/snippets/core-version-support.md new file mode 100644 index 00000000000..ff9fa94ff8c --- /dev/null +++ b/website/snippets/core-version-support.md @@ -0,0 +1,5 @@ + +- **[Active](/docs/dbt-versions/core#ongoing-patches)** — We will patch regressions and new bugs, and include fixes for older bugs and quality-of-life improvements. We implement these changes when we have high confidence that they're narrowly scoped and won't cause unintended side effects. +- **[Critical](/docs/dbt-versions/core#ongoing-patches)** — Newer minor versions transition the previous minor version into "Critical Support" with limited "security" releases for critical security and installation fixes. +- **[End of Life](/docs/dbt-versions/core#eol-version-support)** — Minor versions that have reached EOL no longer receive new patch releases.
+- **Deprecated** — dbt-core versions older than v1.0 are no longer maintained by dbt Labs, nor supported in dbt Cloud. diff --git a/website/snippets/core-versions-table.md b/website/snippets/core-versions-table.md index 7da0b2b82ba..6997353545b 100644 --- a/website/snippets/core-versions-table.md +++ b/website/snippets/core-versions-table.md @@ -6,10 +6,10 @@ | [**v1.4**](/guides/migration/versions/upgrading-to-v1.4) | Jan 25, 2023 | Critical | Jan 25, 2024 | | [**v1.3**](/guides/migration/versions/upgrading-to-v1.3) | Oct 12, 2022 | Critical | Oct 12, 2023 | | [**v1.2**](/guides/migration/versions/upgrading-to-v1.2) | Jul 26, 2022 | Critical | Jul 26, 2023 | -| [**v1.1**](/guides/migration/versions/upgrading-to-v1.1) ⚠️ | Apr 28, 2022 | End of Life ⚠️ | Apr 28, 2023 | -| [**v1.0**](/guides/migration/versions/upgrading-to-v1.0) ⚠️ | Dec 3, 2021 | End of Life ⚠️ | Dec 3, 2022 ⚠️ | -| **v0.X** ⚠️ | (Various dates) | End of Life ⚠️ | Deprecated ⚠️ | - +| [**v1.1**](/guides/migration/versions/upgrading-to-v1.1) ⚠️ | Apr 28, 2022 | End of Life* ⚠️ | Apr 28, 2023 | +| [**v1.0**](/guides/migration/versions/upgrading-to-v1.0) ⚠️ | Dec 3, 2021 | End of Life* ⚠️ | Dec 3, 2022 ⚠️ | +| **v0.X** ⛔️ | (Various dates) | Deprecated ⛔️ | Deprecated ⛔️ | +_*All versions of dbt Core since v1.0 are available in dbt Cloud until further notice. Versions that are EOL do not receive any fixes. For the best support, we recommend upgrading to a version released within the past 12 months._ ### Planned future releases _Future release dates are tentative and subject to change._ diff --git a/website/snippets/quickstarts/schedule-a-job.md b/website/snippets/quickstarts/schedule-a-job.md index 620e89137c8..55504636192 100644 --- a/website/snippets/quickstarts/schedule-a-job.md +++ b/website/snippets/quickstarts/schedule-a-job.md @@ -13,9 +13,10 @@ Use dbt Cloud's Scheduler to deploy your production jobs confidently and build o 1. In the upper left, select **Deploy**, then click **Environments**. 2. Click **Create Environment**. -3. Name your deployment environment. For example, "Production." -4. Add a target dataset, for example, "Analytics." dbt will build into this dataset. For some warehouses this will be named "schema." -5. Click **Save**. +3. In the **Name** field, write the name of your deployment environment. For example, "Production." +4. In the **dbt Version** field, select the latest version from the dropdown. +5. Under **Deployment Credentials**, enter the name of the dataset you want to use as the target, such as "Analytics". This will allow dbt to build and work with that dataset. For some data warehouses, the target dataset may be referred to as a "schema". +6. Click **Save**. 
### Create and run a job diff --git a/website/src/components/docCarousel/index.js b/website/src/components/docCarousel/index.js index d7fda2c59a5..4f6a11b644c 100644 --- a/website/src/components/docCarousel/index.js +++ b/website/src/components/docCarousel/index.js @@ -1,8 +1,9 @@ import React from 'react'; import { Swiper, SwiperSlide } from 'swiper/react'; import 'swiper/css'; -import { Navigation } from 'swiper'; +import { Navigation, Pagination } from 'swiper'; import 'swiper/css/navigation'; +import 'swiper/css/pagination'; function DocCarousel({ slidesPerView = 3, children }) { if ( !children?.length > 0 ){ @@ -21,8 +22,8 @@ function DocCarousel({ slidesPerView = 3, children }) { slidesPerView={1} effect="fade" navigation - modules={[Navigation]} - + modules={[Navigation, Pagination]} + pagination={{ clickable: true }} breakpoints={{ 640: { slidesPerView: 2, diff --git a/website/src/components/stoplight/index.js b/website/src/components/stoplight/index.js index 03d27cbce76..bff43dd27c8 100644 --- a/website/src/components/stoplight/index.js +++ b/website/src/components/stoplight/index.js @@ -20,6 +20,7 @@ export default function Stoplight({ version }) { } platformUrl={useBaseUrl("/")} basePath={useBaseUrl("/dbt-cloud/api-" + version) + "#"} + hideSchemas /> ); diff --git a/website/src/css/custom.css b/website/src/css/custom.css index a092446d369..c8047407450 100644 --- a/website/src/css/custom.css +++ b/website/src/css/custom.css @@ -1815,6 +1815,24 @@ section>h2:not(.resource-section) { font-weight: 800; } +/* General Swiper Styles */ +.docswiper .swiper-pagination-bullet { + height: 10px; + width: 10px; +} + +.docswiper .swiper-pagination-bullet.swiper-pagination-bullet-active { + background: var(--ifm-color-info); +} + +[data-theme='dark'] .docswiper .swiper-pagination-bullet { + background: var(--color-off-white); +} + +[data-theme='dark'] .docswiper .swiper-pagination-bullet.swiper-pagination-bullet-active { + background: var(--color-light-teal); +} + /* Community Home styles */ .community-home section { margin: calc(5vh) auto calc(2vh); diff --git a/website/src/pages/dbt-cloud/api-v2-old.js b/website/src/pages/dbt-cloud/api-v2-legacy.js similarity index 100% rename from website/src/pages/dbt-cloud/api-v2-old.js rename to website/src/pages/dbt-cloud/api-v2-legacy.js diff --git a/website/src/theme/BlogPostItem/Header/Author/index.js b/website/src/theme/BlogPostItem/Header/Author/index.js index 99807b8dd28..a37d9e9985a 100644 --- a/website/src/theme/BlogPostItem/Header/Author/index.js +++ b/website/src/theme/BlogPostItem/Header/Author/index.js @@ -15,7 +15,7 @@ function MaybeLink(props) { */ export default function BlogPostItemHeaderAuthor({author, className}) { - const {name, title, url, imageURL, email, key} = author; + const {name, url, imageURL, email, key, job_title, organization} = author; const link = url || (email && `mailto:${email}`) || undefined; return (
@@ -36,9 +36,9 @@ export default function BlogPostItemHeaderAuthor({author, className}) { {name}
- {title && ( + {job_title && organization && ( - {title} + {job_title && job_title} {organization && `@ ${organization}`} )} diff --git a/website/static/_redirects b/website/static/_redirects index 040e604b960..3f964f66d24 100644 --- a/website/static/_redirects +++ b/website/static/_redirects @@ -1,5 +1,12 @@ -/docs/deploy/cloud-ci-job /docs/deploy/continuous-integration 301 +## refocus deploy page +/docs/collaborate/environments/environments-in-dbt /docs/environments-in-dbt 301 +/docs/collaborate/environments/dbt-cloud-environments /docs/deploy/dbt-cloud-environments 301 +/docs/collaborate/environments/dbt-core-environments /docs/core/dbt-core-environments 301 + +/docs/cloud/manage-access/licenses-and-groups /docs/cloud/manage-access/about-user-access 301 + +/docs/deploy/cloud-ci-job /docs/deploy/continuous-integration 301 ## quickstarts redirect again /docs/quickstarts/dbt-cloud/bigquery /quickstarts/bigquery 301 @@ -571,7 +578,6 @@ docs/dbt-cloud/using-dbt-cloud/cloud-model-timing-tab /docs/deploy/dbt-cloud-job /docs/setting-up-snowflake-sso /docs/dbt-cloud/dbt-cloud-enterprise/setting-up-enterprise-snowflake-oauth 301 /docs/setting-up-sso-with-google-gsuite /docs/dbt-cloud/dbt-cloud-enterprise/setting-up-sso-with-google-gsuite 301 /docs/setting-up-sso-with-okta /docs/dbt-cloud/dbt-cloud-enterprise/setting-up-sso-with-okta 301 -/docs/slack-rules-of-the-road /docs/contributing/slack-rules-of-the-road 301 /docs/snapshot /reference/commands/snapshot 301 /docs/snapshots /docs/building-a-dbt-project/snapshots 301 /docs/snowflake-configs /reference/resource-configs/snowflake-configs 301 @@ -679,6 +685,7 @@ https://tutorial.getdbt.com/* https://docs.getdbt.com/:splat 301! /reference/model-selection-syntax/#test-selection-examples /reference/node-selection/test-selection-examples 301 /docs/building-a-dbt-project/building-models/using-custom-database /docs/building-a-dbt-project/building-models/using-custom-databases 301 /dbt-cloud/api /dbt-cloud/api-v2 301 +/dbt-cloud/api-v2-old /dbt-cloud/api-v2-legacy 301 /dbt-cloud/api-v4 /docs/dbt-cloud-apis/admin-cloud-api /reference/project-configs/source-paths /reference/project-configs/model-paths 301 /reference/project-configs/data-paths /reference/project-configs/seed-paths 301 @@ -825,11 +832,14 @@ https://tutorial.getdbt.com/* https://docs.getdbt.com/:splat 301! /docs/contributing/contributor-license-agreements /community/resources/contributor-license-agreements 301 /community/maintaining-a-channel /community/resources/maintaining-a-channel 301 /docs/contributing/oss-expectations /community/resources/oss-expectations 301 -/docs/contributing/slack-rules-of-the-road /community/resources/slack-rules-of-the-road 301 +/docs/slack-rules-of-the-road /community/resources/community-rules-of-the-road 301 +/docs/contributing/slack-rules-of-the-road /community/resources/community-rules-of-the-road 301 +/community/resources/slack-rules-of-the-road /community/resources/community-rules-of-the-road 301 /blog/getting-started-with-the-dbt-semantic-layer /blog/understanding-the-components-of-the-dbt-semantic-layer 301! 
/docs/getting-started/develop-in-the-cloud#creating-a-development-environment /docs/get-started/develop-in-the-cloud#set-up-and-access-the-cloud-ide 301 /docs/cloud-developer-ide /docs/build/custom-target-names#dbt-cloud-ide 301 /website/docs/docs/contributing/building-a-new-adapter.md /guides/dbt-ecosystem/adapter-development/3-building-a-new-adapter 301 +/guides/legacy/getting-help /community/resources/getting-help 301 # Blog docs diff --git a/website/static/img/blog/2023-07-03-data-vault-2-0-with-dbt-cloud/data-dungeon-meme.jpeg b/website/static/img/blog/2023-07-03-data-vault-2-0-with-dbt-cloud/data-dungeon-meme.jpeg new file mode 100644 index 00000000000..cbedf5014d5 Binary files /dev/null and b/website/static/img/blog/2023-07-03-data-vault-2-0-with-dbt-cloud/data-dungeon-meme.jpeg differ diff --git a/website/static/img/blog/2023-07-03-data-vault-2-0-with-dbt-cloud/reservoir-dam-hallucination.png b/website/static/img/blog/2023-07-03-data-vault-2-0-with-dbt-cloud/reservoir-dam-hallucination.png new file mode 100644 index 00000000000..2a37cc567b2 Binary files /dev/null and b/website/static/img/blog/2023-07-03-data-vault-2-0-with-dbt-cloud/reservoir-dam-hallucination.png differ diff --git a/website/static/img/blog/authors/rastislav-zdechovan.png b/website/static/img/blog/authors/rastislav-zdechovan.png new file mode 100644 index 00000000000..40f8151d620 Binary files /dev/null and b/website/static/img/blog/authors/rastislav-zdechovan.png differ diff --git a/website/static/img/docs/building-a-dbt-project/MetricFlow-SchemaExample.jpeg b/website/static/img/docs/building-a-dbt-project/MetricFlow-SchemaExample.jpeg new file mode 100644 index 00000000000..9b0f0181b76 Binary files /dev/null and b/website/static/img/docs/building-a-dbt-project/MetricFlow-SchemaExample.jpeg differ diff --git a/website/static/img/docs/building-a-dbt-project/multihop-diagram.png b/website/static/img/docs/building-a-dbt-project/multihop-diagram.png new file mode 100644 index 00000000000..b6df1c12c03 Binary files /dev/null and b/website/static/img/docs/building-a-dbt-project/multihop-diagram.png differ diff --git a/website/static/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/create-deploy-env.jpg b/website/static/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/create-deploy-env.jpg new file mode 100644 index 00000000000..851ef0b60d6 Binary files /dev/null and b/website/static/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/create-deploy-env.jpg differ diff --git a/website/static/img/docs/dbt-cloud/cloud-ide/gitignore-italics.jpg b/website/static/img/docs/dbt-cloud/cloud-ide/gitignore-italics.jpg new file mode 100644 index 00000000000..b1cde629744 Binary files /dev/null and b/website/static/img/docs/dbt-cloud/cloud-ide/gitignore-italics.jpg differ diff --git a/website/static/img/docs/dbt-cloud/cloud-ide/project-yml-clean.jpg b/website/static/img/docs/dbt-cloud/cloud-ide/project-yml-clean.jpg new file mode 100644 index 00000000000..bdb3dfe757b Binary files /dev/null and b/website/static/img/docs/dbt-cloud/cloud-ide/project-yml-clean.jpg differ diff --git a/website/static/img/docs/dbt-cloud/cloud-ide/project-yml-gitignore.jpg b/website/static/img/docs/dbt-cloud/cloud-ide/project-yml-gitignore.jpg new file mode 100644 index 00000000000..782454ff3ac Binary files /dev/null and b/website/static/img/docs/dbt-cloud/cloud-ide/project-yml-gitignore.jpg differ diff --git a/website/static/img/docs/dbt-cloud/cloud-ide/restart-ide.jpg b/website/static/img/docs/dbt-cloud/cloud-ide/restart-ide.jpg new file mode 100644 index 
00000000000..084f6b33104 Binary files /dev/null and b/website/static/img/docs/dbt-cloud/cloud-ide/restart-ide.jpg differ diff --git a/website/static/img/docs/dbt-versions/experimental-feats.png b/website/static/img/docs/dbt-versions/experimental-feats.png new file mode 100644 index 00000000000..f4c353b8bb4 Binary files /dev/null and b/website/static/img/docs/dbt-versions/experimental-feats.png differ diff --git a/website/static/img/prep-start.jpg b/website/static/img/prep-start.jpg new file mode 100644 index 00000000000..6e3680354b8 Binary files /dev/null and b/website/static/img/prep-start.jpg differ diff --git a/website/static/img/run-start.jpg b/website/static/img/run-start.jpg new file mode 100644 index 00000000000..d5706ab8140 Binary files /dev/null and b/website/static/img/run-start.jpg differ