diff --git a/docs/assets/images/databricks/databricks-integration-cluster.png b/docs/assets/images/databricks/databricks-integration-cluster.png
new file mode 100644
index 0000000000..f5c0659f0a
Binary files /dev/null and b/docs/assets/images/databricks/databricks-integration-cluster.png differ
diff --git a/docs/assets/images/databricks/databricks-integration.png b/docs/assets/images/databricks/databricks-integration.png
new file mode 100644
index 0000000000..ed4f142b83
Binary files /dev/null and b/docs/assets/images/databricks/databricks-integration.png differ
diff --git a/docs/hopsworksai/azure/getting_started.md b/docs/hopsworksai/azure/getting_started.md
index 9c3c120e20..6ca1c12fd1 100644
--- a/docs/hopsworksai/azure/getting_started.md
+++ b/docs/hopsworksai/azure/getting_started.md
@@ -64,13 +64,15 @@
 At this point, you might get the following error message. This means that your Azure user does not have sufficient permissions to add the service principal. In this case, please ask your Azure administrator to add it for you or give you the required permissions.
 
-```bash
-$ az ad sp create --id d4abcc44-2c40-40bd-9bba-986df591c28f
-```
+!!! error
+
+    ```bash
+    az ad sp create --id d4abcc44-2c40-40bd-9bba-986df591c28f
+    ```
 
-!!! note
     When using this permission, the backing application of the service principal being created must be in the local tenant.
+
 
 ### Step 1.2: Creating a custom role for Hopsworks.ai
 
 Proceed to the Azure Portal and open either a *Subscription* or *Resource Group* that you want to use for Hopsworks.ai.
diff --git a/docs/integrations/databricks/api_key.md b/docs/integrations/databricks/api_key.md
index f5129b644a..56480d0f2b 100644
--- a/docs/integrations/databricks/api_key.md
+++ b/docs/integrations/databricks/api_key.md
@@ -23,6 +23,23 @@ In Hopsworks, click on your *username* in the top-right corner and select *Setti
 
 !!! info
     You are only able to retrieve the API Key once. If you fail to copy it to your clipboard, delete it and create a new one.
 
+## Quickstart API Key File
+
+!!! hint "Save API Key as File"
+    To get started quickly, without saving the Hopsworks API Key in secret storage, you can simply create a file containing the previously created Hopsworks API Key and place it in the environment from which you wish to connect to the Hopsworks Feature Store. That is, save it either on the Databricks File System (DBFS) or in your Databricks workspace.
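+
+    For example, one way to place the key file on DBFS is to write it from a notebook cell, since `dbutils` is available there. This is only a sketch: the path `dbfs:/featurestore.key` and the key value are placeholders:
+
+    ```python
+    # Write the API Key to a file on DBFS (illustrative path and value)
+    dbutils.fs.put("dbfs:/featurestore.key", "<your-api-key>", overwrite=True)
+    ```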
+
+    You can then connect by simply passing the path to the key file when instantiating a connection:
+
+    ```python hl_lines="6"
+    import hsfs
+    conn = hsfs.connection(
+        'my_instance',                      # DNS of your Feature Store instance
+        443,                                # Port to reach your Hopsworks instance, defaults to 443
+        'my_project',                       # Name of your Hopsworks Feature Store project
+        api_key_file='featurestore.key',    # The file containing the API key generated above
+        hostname_verification=True,         # Set to False for self-signed certificates
+    )
+    fs = conn.get_feature_store()           # Get the project's default feature store
+    ```
 
 ## Storing the API Key
diff --git a/docs/integrations/databricks/configuration.md b/docs/integrations/databricks/configuration.md
index caca5912f9..04605caf1a 100644
--- a/docs/integrations/databricks/configuration.md
+++ b/docs/integrations/databricks/configuration.md
@@ -32,11 +32,20 @@ Users can register a new Databricks instance by navigating to the `Integrations`
 
 The instance address should be in the format `[UUID].cloud.databricks.com` (or `adb-[UUID].19.azuredatabricks.net` for Databricks on Azure), essentially the same web address used to reach the Databricks instance from the browser.
 
+
+![Register a Databricks Instance along with a Databricks API Key](../../../assets/images/databricks/databricks-integration.png)
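+
+If you want to confirm that the instance address and API Key are valid before registering them, a quick call to the Databricks REST API can serve as a check. This is only an illustrative sketch: the instance address and token values are placeholders:
+
+```python
+import requests
+
+instance = "[UUID].cloud.databricks.com"   # Placeholder: your Databricks instance address
+token = "<databricks-api-key>"             # Placeholder: the Databricks API Key created above
+
+# Listing clusters succeeds only if the address and token are valid
+response = requests.get(
+    f"https://{instance}/api/2.0/clusters/list",
+    headers={"Authorization": f"Bearer {token}"},
+)
+response.raise_for_status()
+print([c["cluster_name"] for c in response.json().get("clusters", [])])
+```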
+
+The API Key will be stored in the Hopsworks secret store for the user and will be available only for that user. If multiple users need to configure Databricks clusters, each has to generate an API Key and register an instance. The Databricks instance registration does not have a project scope, meaning that once registered, the user can configure clusters for all projects they are part of.
 
 ## Databricks Cluster
 
-A cluster needs to exists before users can configure it using the Hopsworks UI. The cluster can be in any state prior to the configuration.
+A cluster needs to exist before users can configure it using the Hopsworks UI. The cluster can be in any state prior to the configuration.
 
 !!! warning "Runtime limitation"
 
@@ -47,6 +56,15 @@ A cluster needs to exists before users can configure it using the Hopsworks UI.
 
 Clusters are configured for a project user, which, in Hopsworks terms, means a user operating within the scope of a project. To configure a cluster, click on the `Configure` button. By default the cluster will be configured for the user making the request. If the user doesn't have `Can Manage` privilege on the cluster, they can ask a project `Data Owner` to configure it for them. Hopsworks `Data Owners` are allowed to configure clusters for other project users, as long as they have the required Databricks privileges.
+
+![Configure a Databricks Cluster from Hopsworks](../../../assets/images/databricks/databricks-integration-cluster.png)
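+
+If you later want to verify that a configured cluster can actually reach the Feature Store, a quick socket check from a notebook can help. This is only an illustrative sketch: `my_instance` is a placeholder for your Feature Store hostname, and the ports are the ones the networking guide requires to be reachable:
+
+```python
+import socket
+
+feature_store_host = "my_instance"        # Placeholder: DNS of your Feature Store instance
+ports = [443, 9083, 9085, 8020, 50010]    # Ports that must be reachable from the cluster
+
+for port in ports:
+    try:
+        # Attempt a TCP connection with a short timeout
+        with socket.create_connection((feature_store_host, port), timeout=5):
+            print(f"Port {port}: reachable")
+    except OSError as err:
+        print(f"Port {port}: NOT reachable ({err})")
+```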
+ During the cluster configuration the following steps will be taken: - Upload an archive to DBFS containing the necessary Jars for HSFS and HopsFS to be able to read and write from the Hopsworks Feature Store diff --git a/docs/integrations/databricks/networking.md b/docs/integrations/databricks/networking.md index fe03ea3c61..4e82097ae0 100644 --- a/docs/integrations/databricks/networking.md +++ b/docs/integrations/databricks/networking.md @@ -206,11 +206,11 @@ Wait for the peering to show up as *Connected*. There should now be bi-direction

-### Step 2: Configuring the Security Group
+### Step 2: Configuring the Network Security Group
 
-The Feature Store *Security Group* needs to be configured to allow traffic from your Databricks clusters to be able to connect to the Feature Store.
+The *Network Security Group* of the Feature Store on Azure needs to be configured to allow traffic from your Databricks clusters, so that they can connect to the Feature Store.
 
-Ensure that ports *443*, *9083*, *9085*, *8020* and *50010* are reachable from the Databricks Security Group.
+Ensure that ports *443*, *9083*, *9085*, *8020* and *50010* are reachable from the Databricks cluster *Network Security Group*.
 
 !!! note "Hopsworks.ai"
     If you deployed your Hopsworks Feature Store instance with Hopsworks.ai, it suffices to enable [outside access of the Feature Store and Online Feature Store services](../../hopsworksai/azure/getting_started/#step-5-outside-access-to-the-feature-store).
diff --git a/python/hsfs/connection.py b/python/hsfs/connection.py
index d53623ac13..ddfe87c865 100644
--- a/python/hsfs/connection.py
+++ b/python/hsfs/connection.py
@@ -47,6 +47,27 @@ class Connection:
     conn = hsfs.connection()
     ```
 
+    !!! hint "Save API Key as File"
+        To get started quickly, without saving the Hopsworks API Key in secret storage,
+        you can simply create a file containing the previously created Hopsworks API Key
+        and place it in the environment from which you wish to connect to the Hopsworks
+        Feature Store.
+
+        You can then connect by simply passing the path to the key file when
+        instantiating a connection:
+
+        ```python hl_lines="6"
+        import hsfs
+        conn = hsfs.connection(
+            'my_instance',                      # DNS of your Feature Store instance
+            443,                                # Port to reach your Hopsworks instance, defaults to 443
+            'my_project',                       # Name of your Hopsworks Feature Store project
+            api_key_file='featurestore.key',    # The file containing the API key generated above
+            hostname_verification=True,         # Set to False for self-signed certificates
+        )
+        fs = conn.get_feature_store()           # Get the project's default feature store
+        ```
+
     Clients in external clusters need to connect to the Hopsworks Feature Store
     using an API key. The API key is generated inside the Hopsworks platform, and
     requires at least the "project" and "featurestore" scopes to be able to
     access a feature store.