docs: add Getting Started with Databricks guide (#7050)
1 parent
e4d8c16
commit 5516f23
Showing 8 changed files with 537 additions and 1 deletion.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,6 @@ | ||
module.exports = { | ||
"core": "Cube Core", | ||
"cloud": "Cube Cloud", | ||
"databricks": "Cube Cloud and Databricks", | ||
"migrate-from-core": "Migrate from Cube Core" | ||
} | ||
} |
15 changes: 15 additions & 0 deletions
docs/docs-new/pages/product/getting-started/databricks.mdx
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
# Getting started with Cube Cloud and Databricks | ||
|
||
This getting started guide will show you how to use Cube Cloud with Databricks. | ||
You will learn how to: | ||
|
||
- Load sample data into your Databricks account | ||
- Connect Cube Cloud to Databricks | ||
- Create your first Cube data model | ||
- Connect to a BI tool to explore this model | ||
- Create a React application with the Cube REST API | ||
|
||
## Prerequisites | ||
|
||
- [Cube Cloud account](https://cubecloud.dev/auth/signup) | ||
- [Databricks account](https://www.databricks.com/try-databricks) |
7 changes: 7 additions & 0 deletions
docs/docs-new/pages/product/getting-started/databricks/_meta.js
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
module.exports = { | ||
"load-data": "Load data", | ||
"connect-to-databricks": "Connect to Databricks", | ||
"create-data-model": "Create data model", | ||
"query-from-bi": "Query from BI", | ||
"query-from-react-app": "Query from React" | ||
} |
81 changes: 81 additions & 0 deletions
docs/docs-new/pages/product/getting-started/databricks/connect-to-databricks.mdx
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,81 @@ | ||
# Connect to Databricks | ||
|
||
In this section, we’ll create a Cube Cloud deployment and connect it to | ||
Databricks. A deployment represents a data model, configuration, and managed | ||
infrastructure. | ||
|
||
To continue with this guide, you'll need to have a Cube Cloud account. If you | ||
don't have one yet, [click here to sign up][cube-cloud-signup] for free. | ||
|
||
First, [sign in to your Cube Cloud account][cube-cloud-signin]. Then, | ||
click <Btn>Create Deployment</Btn>: | ||
|
||
Give the deployment a name, select the cloud provider and region of your choice, | ||
and click <Btn>Next</Btn>: | ||
|
||
<Screenshot | ||
alt="Cube Cloud Create Deployment Screen" | ||
src="https://ucarecdn.com/2338323e-0db8-4224-8e7a-3b4daf9c60ec/" | ||
/> | ||
|
||
<SuccessBox> | ||
|
||
Microsoft Azure is available in Cube Cloud on | ||
[Premium](https://cube.dev/pricing) tier. [Contact us](https://cube.dev/contact) | ||
for details. | ||
|
||
</SuccessBox> | ||
|
||
## Set up a Cube project | ||
|
||
Next, click <Btn>Create</Btn> to create a new project from scratch: | ||
|
||
<Screenshot | ||
alt="Cube Cloud Upload Project Screen" | ||
src="https://ucarecdn.com/46b72b61-b650-4271-808d-55203f1c8d8b/" | ||
/> | ||
|
||
## Connect to your Databricks account | ||
|
||
The last step is to connect Cube Cloud to Databricks. First, select it from the | ||
grid: | ||
|
||
<Screenshot | ||
alt="Cube Cloud Setup Database Screen" | ||
src="https://ucarecdn.com/1d656ba9-dd83-4ff4-a59e-8b5f97a9ddcc/" | ||
/> | ||
|
||
Then enter your Databricks credentials: | ||
|
||
- **Access Token:** A personal access token for your Databricks account. [You | ||
can generate one][databricks-docs-pat] in your Databricks account settings. | ||
- **Databricks JDBC URL:** The JDBC URL for your Databricks SQL warehouse. [You | ||
can find it][databricks-docs-jdbc-url] in the SQL warehouse settings screen. | ||
- **Databricks Catalog:** The catalog where you uploaded the files in the | ||
previous section. If left unspecified, the `default` catalog is used. | ||
|
||
[databricks-docs-pat]: | ||
https://docs.databricks.com/en/dev-tools/auth.html#databricks-personal-access-tokens-for-workspace-users | ||
[databricks-docs-jdbc-url]: | ||
https://docs.databricks.com/en/integrations/jdbc-odbc-bi.html#get-connection-details-for-a-sql-warehouse | ||
|
||
Click <Btn>Apply</Btn>. Cube Cloud will test the connection and proceed to the | ||
next step. | ||
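
The same three settings can also be supplied as environment variables when Cube runs outside the Cloud wizard. The sketch below assumes the variable names `CUBEJS_DB_DATABRICKS_TOKEN`, `CUBEJS_DB_DATABRICKS_URL`, and `CUBEJS_DB_DATABRICKS_CATALOG`, which are not part of this guide; check them against the Databricks data source reference before relying on them. The `missingSettings` helper is hypothetical and only reports which required values are absent.

```javascript
// Assumed mapping from the form fields above to Cube environment variables.
// The variable names are an assumption, not taken from this guide.
const DATABRICKS_SETTINGS = [
  { field: "Access Token", envVar: "CUBEJS_DB_DATABRICKS_TOKEN", required: true },
  { field: "Databricks JDBC URL", envVar: "CUBEJS_DB_DATABRICKS_URL", required: true },
  { field: "Databricks Catalog", envVar: "CUBEJS_DB_DATABRICKS_CATALOG", required: false },
];

// Hypothetical helper: returns the names of required fields that have no value.
function missingSettings(env) {
  return DATABRICKS_SETTINGS
    .filter(({ envVar, required }) => required && !env[envVar])
    .map(({ field }) => field);
}

// Example: only the JDBC URL is set (placeholder value), so the token is reported.
console.log(missingSettings({ CUBEJS_DB_DATABRICKS_URL: "jdbc:databricks://<host>:443/<path>" }));
```

The catalog is marked optional here because the form falls back to the `default` catalog when it is left empty.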
|
||
## Generate data model from your Databricks schema | ||
|
||
Cube can now generate a basic data model from your data warehouse, which helps | ||
you get started with data modeling faster. Select all four tables in our catalog | ||
and click through the data model generation wizard. We'll inspect these | ||
generated files in the next section and start making changes to them. | ||
|
||
[aws-docs-sec-group]: | ||
https://docs.aws.amazon.com/vpc/latest/userguide/security-groups.html | ||
[aws-docs-sec-group-rule]: | ||
https://docs.aws.amazon.com/vpc/latest/userguide/security-group-rules.html | ||
[cube-cloud-signin]: https://cubecloud.dev/auth | ||
[cube-cloud-signup]: https://cubecloud.dev/auth/signup | ||
[ref-conf-db]: /product/configuration/data-sources | ||
[ref-getting-started-cloud-generate-models]: | ||
/getting-started/cloud/generate-models |
213 changes: 213 additions & 0 deletions
docs/docs-new/pages/product/getting-started/databricks/create-data-model.mdx
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,213 @@ | ||
# Create your first data model | ||
|
||
Cube follows a dataset-oriented data modeling approach, which is inspired by and | ||
expands upon dimensional modeling. Cube incorporates this approach and provides | ||
a practical framework for implementing dataset-oriented data modeling. | ||
|
||
When building a data model in Cube, you work with two dataset-centric objects: | ||
**cubes** and **views**. **Cubes** usually represent business entities such as | ||
customers, line items, and orders. In cubes, you define all the calculations | ||
within the measures and dimensions of these entities. Additionally, you define | ||
relationships between cubes, such as "an order has many line items" or "a user | ||
may place multiple orders." | ||
|
||
**Views** sit on top of a data graph of cubes and create a facade of your entire | ||
data model, with which data consumers can interact. You can think of views as | ||
the final data products for your data consumers - BI users, data apps, AI | ||
agents, etc. When building views, you select measures and dimensions from | ||
different connected cubes and present them as a single dataset to BI or data | ||
apps. | ||
|
||
<Diagram | ||
alt="Architecture diagram of queries being sent to cubes and views" | ||
src="https://ucarecdn.com/bfc3e04a-b690-40bc-a6f8-14a9175fb4fd/" | ||
/> | ||
|
||
## Working with cubes | ||
|
||
To begin building your data model, click on <Btn>Enter Development Mode</Btn> in | ||
Cube Cloud. This will take you to your personal developer space, where you can | ||
safely make changes to your data model without affecting the production | ||
environment. | ||
|
||
In the previous section, we generated four cubes. To see the data graph of these | ||
four cubes and how they are connected to each other, click the <Btn>Show | ||
Graph</Btn> button on the Data Model page. | ||
|
||
Let's review the `orders` cube first and update it with additional dimensions | ||
and measures. | ||
|
||
Once you are in development mode, navigate to the <Btn>Data Model</Btn> page and | ||
click on the `orders.yml` file in the `model/cubes` directory in the left sidebar | ||
to open it. | ||
|
||
You should see the following content in the `model/cubes/orders.yml` file: | ||
|
||
```yaml | ||
cubes: | ||
- name: orders | ||
sql_table: ECOM.ORDERS | ||
|
||
joins: | ||
- name: users | ||
sql: "{CUBE}.USER_ID = {users}.USER_ID" | ||
relationship: many_to_one | ||
|
||
dimensions: | ||
- name: status | ||
sql: STATUS | ||
type: string | ||
|
||
- name: id | ||
sql: ID | ||
type: number | ||
primary_key: true | ||
|
||
- name: created_at | ||
sql: CREATED_AT | ||
type: time | ||
|
||
- name: completed_at | ||
sql: COMPLETED_AT | ||
type: time | ||
|
||
measures: | ||
- name: count | ||
type: count | ||
``` | ||
As you can see, we already have a `count` measure that we can use to calculate | ||
the total count of our orders. | ||
|
||
Let's add a measure to the `orders` cube that counts only **completed orders**. | ||
The `status` dimension in the `orders` cube takes one of three values: | ||
**processing**, **shipped**, or **completed**. We will create a new measure, | ||
`completed_count`, by filtering on that dimension. To do this, we will use the | ||
[`filters` parameter](/product/data-modeling/reference/measures#filters) of the | ||
measure and | ||
[refer](/product/data-modeling/fundamentals/syntax#referring-to-objects) to the | ||
existing dimension. | ||
|
||
Add the following measure definition to your `model/cubes/orders.yml` file. It | ||
should be included within the `measures` block. | ||
|
||
```yaml | ||
- name: completed_count | ||
type: count | ||
filters: | ||
- sql: "{CUBE}.status = 'completed'" | ||
``` | ||
|
||
With these two measures in place, `count` and `completed_count`, we can create a | ||
**derived measure**. Derived measures are measures that you can create based on | ||
existing measures. Let's create the `completed_percentage` derived measure. | ||
|
||
Add the following measure definition to your `model/cubes/orders.yml` file | ||
within the `measures` block. | ||
|
||
```yaml | ||
- name: completed_percentage | ||
type: number | ||
sql: "({completed_count} / NULLIF({count}, 0)) * 100.0" | ||
format: percent | ||
``` | ||
|
||
Below you can see what your updated `orders` cube should look like with the two | ||
new measures. Feel free to copy this code and paste it into your | ||
`model/cubes/orders.yml` file. | ||
|
||
```yaml | ||
cubes: | ||
- name: orders | ||
sql_table: ECOM.ORDERS | ||
joins: | ||
- name: users | ||
sql: "{CUBE}.USER_ID = {users}.USER_ID" | ||
relationship: many_to_one | ||
dimensions: | ||
- name: status | ||
sql: STATUS | ||
type: string | ||
- name: id | ||
sql: ID | ||
type: number | ||
primary_key: true | ||
- name: created_at | ||
sql: CREATED_AT | ||
type: time | ||
- name: completed_at | ||
sql: COMPLETED_AT | ||
type: time | ||
measures: | ||
- name: count | ||
type: count | ||
- name: completed_count | ||
type: count | ||
filters: | ||
- sql: "{CUBE}.status = 'completed'" | ||
- name: completed_percentage | ||
type: number | ||
sql: "({completed_count} / NULLIF({count}, 0)) * 100.0" | ||
format: percent | ||
``` | ||
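
To make the three measures concrete, here is a minimal sketch that reproduces their arithmetic in plain JavaScript. The sample rows and the `orderMeasures` helper are made up for illustration; Cube itself compiles the measures to SQL and runs them in Databricks.

```javascript
// Made-up rows standing in for ECOM.ORDERS.
const sampleOrders = [
  { id: 1, status: "completed" },
  { id: 2, status: "shipped" },
  { id: 3, status: "completed" },
  { id: 4, status: "processing" },
];

// Hypothetical helper mirroring count, completed_count, and completed_percentage.
function orderMeasures(rows) {
  // count: every row in the cube.
  const count = rows.length;
  // completed_count: the same count, restricted by the status filter.
  const completedCount = rows.filter((row) => row.status === "completed").length;
  // NULLIF(count, 0) turns a zero denominator into NULL, so an empty
  // result set yields NULL rather than a division-by-zero error.
  const completedPercentage = count === 0 ? null : (completedCount / count) * 100.0;
  return { count, completedCount, completedPercentage };
}

console.log(orderMeasures(sampleOrders));
// { count: 4, completedCount: 2, completedPercentage: 50 }
```

Because `completed_percentage` is built from the other two measures, any filter applied to a query changes all three consistently.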
|
||
Click <Btn>Save All</Btn> in the upper corner to save changes to the data model. | ||
Now, you can navigate to Cube’s Playground. The Playground is a web-based tool | ||
that allows you to query your data without connecting any tools or writing any | ||
code. It's the fastest way to explore and test your data model. | ||
|
||
You can select measures and dimensions from different cubes in the Playground, | ||
including your newly created `completed_percentage` measure. | ||
|
||
## Working with views | ||
|
||
When building views, we recommend following entity-oriented design and | ||
structuring your views around your business entities. Usually, cubes tend to be | ||
normalized entities without duplicated or redundant members, while views are | ||
denormalized entities where you pick as many measures and dimensions from | ||
multiple cubes as needed to describe a business entity. | ||
|
||
Let's create our first view, which will provide all necessary measures and | ||
dimensions to explore orders. Views are usually located in the `views` folder | ||
and have a `_view` suffix. | ||
|
||
Create `model/views/orders_view.yml` with the following content: | ||
|
||
```yaml | ||
views: | ||
- name: orders_view | ||
cubes: | ||
- join_path: orders | ||
includes: | ||
- status | ||
- created_at | ||
- count | ||
- completed_count | ||
- completed_percentage | ||
- join_path: orders.users | ||
prefix: true | ||
includes: | ||
- city | ||
- age | ||
- state | ||
``` | ||
|
||
When building views, you can leverage the `cubes` parameter, which enables you | ||
to include measures and dimensions from other cubes in the view. You can build | ||
your view by combining multiple joined cubes and specifying the path by which | ||
they should be joined for that particular view. | ||
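
The sketch below lists the member names that `orders_view` is expected to expose. It assumes that `prefix: true` prepends the name of the last cube in the join path (here `users`) and an underscore to each included member; the `viewMemberNames` helper is hypothetical and exists only to illustrate that naming rule.

```javascript
// The view definition from above, expressed as a plain object.
const ordersView = {
  name: "orders_view",
  cubes: [
    {
      join_path: "orders",
      includes: ["status", "created_at", "count", "completed_count", "completed_percentage"],
    },
    { join_path: "orders.users", prefix: true, includes: ["city", "age", "state"] },
  ],
};

// Hypothetical helper: derives fully qualified member names for a view.
// Assumption: with prefix: true, members are named <last cube in path>_<member>.
function viewMemberNames(view) {
  return view.cubes.flatMap(({ join_path, prefix, includes }) => {
    const cubeName = join_path.split(".").pop();
    return includes.map((member) =>
      `${view.name}.${prefix ? `${cubeName}_${member}` : member}`
    );
  });
}

console.log(viewMemberNames(ordersView));
```

Prefixing keeps members from different cubes distinct inside one view, which matters once two cubes share a member name such as `count` or `created_at`.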
|
||
After saving, you can experiment with your newly created view in the Playground. | ||
In the next section, we will learn how to query our `orders_view` using a BI | ||
tool. |
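
The view can also be queried over the Cube REST API. The following is a minimal sketch that builds a request URL for the `/cubejs-api/v1/load` endpoint; the base URL `https://example.cubecloud.dev` is a placeholder, `buildLoadUrl` is a hypothetical helper, and a real request also needs an API token in the `Authorization` header.

```javascript
// Hypothetical helper: serializes a Cube query into a /load request URL.
function buildLoadUrl(baseUrl, query) {
  const encoded = encodeURIComponent(JSON.stringify(query));
  return `${baseUrl}/cubejs-api/v1/load?query=${encoded}`;
}

// Completed percentage broken down by order status.
const query = {
  measures: ["orders_view.completed_percentage"],
  dimensions: ["orders_view.status"],
};

// Placeholder deployment URL; replace it with your own deployment's API URL.
const url = buildLoadUrl("https://example.cubecloud.dev", query);
console.log(url);

// Usage sketch (not executed here); API_TOKEN is a placeholder:
// const response = await fetch(url, { headers: { Authorization: API_TOKEN } });
// const { data } = await response.json();
```

Querying the view rather than the underlying cubes keeps client code independent of how the cubes are joined.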
34 changes: 34 additions & 0 deletions
docs/docs-new/pages/product/getting-started/databricks/load-data.mdx
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,34 @@ | ||
# Load data | ||
|
||
The following steps will guide you through setting up a Databricks account and | ||
uploading the demo dataset, which is stored as CSV files in a public S3 bucket. | ||
|
||
First, download the following files to your local machine: | ||
|
||
- [`line_items.csv`](https://cube-tutorial.s3.us-east-2.amazonaws.com/line_items.csv) | ||
- [`orders.csv`](https://cube-tutorial.s3.us-east-2.amazonaws.com/orders.csv) | ||
- [`users.csv`](https://cube-tutorial.s3.us-east-2.amazonaws.com/users.csv) | ||
- [`products.csv`](https://cube-tutorial.s3.us-east-2.amazonaws.com/products.csv) | ||
|
||
Next, let's ensure we have a SQL warehouse that is active. Log in to your | ||
Databricks account, then from the sidebar, click on <Btn>SQL → SQL | ||
Warehouses</Btn>: | ||
|
||
<Screenshot | ||
alt="Databricks SQL Warehouses screen" | ||
src="https://ucarecdn.com/92e82ca3-0ca4-4064-8ed6-394e5a66e869/" | ||
/> | ||
|
||
<InfoBox> | ||
|
||
Ensure the warehouse is active by checking its status; if it is inactive, click | ||
<Btn>▶️</Btn> to start it. | ||
|
||
</InfoBox> | ||
|
||
Next, click <Btn>New → File upload</Btn> from the sidebar, and upload | ||
`line_items.csv`. The UI will show a preview of the data within the file; when | ||
ready, click <Btn>Create table</Btn>. | ||
|
||
Repeat the above steps for the three other files. |
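
If you prefer to fetch the four files from a script instead of the browser, the sketch below shows one way to do it. It assumes Node.js 18 or later (for the built-in `fetch`); `datasetFiles` and `downloadAll` are hypothetical helpers, and nothing is downloaded until `downloadAll` is called.

```javascript
// Public bucket and file names taken from the list above.
const BUCKET_URL = "https://cube-tutorial.s3.us-east-2.amazonaws.com";
const TABLES = ["line_items", "orders", "users", "products"];

// Hypothetical helper: pairs each CSV file name with its download URL.
function datasetFiles() {
  return TABLES.map((table) => ({
    file: `${table}.csv`,
    url: `${BUCKET_URL}/${table}.csv`,
  }));
}

// Hypothetical helper: downloads every file into targetDir.
async function downloadAll(targetDir = ".") {
  const fs = require("node:fs/promises");
  const path = require("node:path");
  for (const { file, url } of datasetFiles()) {
    const response = await fetch(url);
    if (!response.ok) {
      throw new Error(`Failed to download ${file}: HTTP ${response.status}`);
    }
    const body = Buffer.from(await response.arrayBuffer());
    await fs.writeFile(path.join(targetDir, file), body);
  }
}

// Usage (not executed here): downloadAll("./data").catch(console.error);
```

The downloaded files can then be uploaded one by one through <Btn>New → File upload</Btn> as described above.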