Docs improvements and fixes #164

Merged: 1 commit, Nov 19, 2020
10 changes: 6 additions & 4 deletions docs/hopsworksai/azure/getting_started.md
@@ -64,13 +64,15 @@
At this point, you might get the following error message.
This means that your Azure user does not have sufficient permissions to add the service principal.
In this case, please ask your Azure administrator to add it for you or give you the required permissions.

!!! error

    ```bash
    az ad sp create --id d4abcc44-2c40-40bd-9bba-986df591c28f
    ```

!!! note
    When using this permission, the backing application of the service principal being created must be in the local tenant.


### Step 1.2: Creating a custom role for Hopsworks.ai

Proceed to the Azure Portal and open either a *Subscription* or *Resource Group* that you want to use for Hopsworks.ai.
17 changes: 17 additions & 0 deletions docs/integrations/databricks/api_key.md
@@ -23,6 +23,23 @@
In Hopsworks, click on your *username* in the top-right corner and select *Settings*.
!!! info
    You are only able to retrieve the API Key once. If you fail to copy it to your clipboard, delete it and create a new one.

## Quickstart API Key File

!!! hint "Save API Key as File"
    To get started quickly, without storing the Hopsworks API Key in a secret storage, you can simply save the previously created API Key to a file and place it in the environment from which you wish to connect to the Hopsworks Feature Store, that is, either on the Databricks File System (DBFS) or in your Databricks workspace.

You can then connect by simply passing the path to the key file when instantiating a connection:
```python hl_lines="6"
import hsfs
conn = hsfs.connection(
    'my_instance',                    # DNS of your Feature Store instance
    443,                              # Port to reach your Hopsworks instance, defaults to 443
    'my_project',                     # Name of your Hopsworks Feature Store project
    api_key_file='featurestore.key',  # The file containing the API key generated above
    hostname_verification=True,       # Disable for self-signed certificates
)
fs = conn.get_feature_store()         # Get the project's default feature store
```
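If you go the file route, the key file can also be written out programmatically. The snippet below is a minimal sketch; the key value is a placeholder, not a real credential, and the DBFS path in the comment is only an example:

```python
from pathlib import Path

# Placeholder value; substitute the API Key copied from the Hopsworks UI.
api_key = "paste-your-api-key-here"

# Write the key to the working directory so it can be passed via api_key_file=.
# On Databricks, the equivalent would be writing to DBFS from a notebook, e.g.:
#   dbutils.fs.put("/FileStore/featurestore.key", api_key, True)
key_path = Path("featurestore.key")
key_path.write_text(api_key)
```

Whatever location you choose, make sure the path you pass to `api_key_file` is readable from the cluster that runs your notebook.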

## Storing the API Key

20 changes: 19 additions & 1 deletion docs/integrations/databricks/configuration.md
@@ -32,11 +32,20 @@
Users can register a new Databricks instance by navigating to the `Integrations` page.

The instance address should be in the format `[UUID].cloud.databricks.com` (or `adb-[UUID].19.azuredatabricks.net` for Databricks on Azure), essentially the same web address used to reach the Databricks instance from the browser.
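As a quick sanity check before registering, the two documented address shapes can be matched with a short sketch. The patterns and the helper name below are illustrative assumptions, not an official specification:

```python
import re

# Illustrative patterns for the two documented address formats; adjust if your
# deployment uses a different shape.
_AWS = re.compile(r"^[0-9a-z-]+\.cloud\.databricks\.com$")
_AZURE = re.compile(r"^adb-\d+\.\d+\.azuredatabricks\.net$")

def looks_like_databricks_address(address: str) -> bool:
    """Return True if the address matches either documented format."""
    return bool(_AWS.match(address) or _AZURE.match(address))
```

Note that the address must be given without a scheme: a full URL such as `https://...` would not match either format.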

<p align="center">
<figure>
<a href="../../../assets/images/databricks/databricks-integration.png">
<img src="../../../assets/images/databricks/databricks-integration.png" alt="Register a Databricks Instance along with a Databricks API Key">
</a>
<figcaption>Register a Databricks Instance along with a Databricks API Key</figcaption>
</figure>
</p>

The API Key will be stored in the Hopsworks secret store for the user and will be available only for that user. If multiple users need to configure Databricks clusters, each has to generate an API Key and register an instance. The Databricks instance registration does not have a project scope, meaning that once registered, the user can configure clusters for all projects they are part of.

## Databricks Cluster

A cluster needs to exist before users can configure it using the Hopsworks UI. The cluster can be in any state prior to the configuration.

!!! warning "Runtime limitation"

@@ -47,6 +56,15 @@
Clusters are configured for a project user, which, in Hopsworks terms, means a user operating within the scope of a project.
To configure a cluster, click on the `Configure` button. By default the cluster will be configured for the user making the request. If the user doesn't have `Can Manage` privilege on the cluster, they can ask a project `Data Owner` to configure it for them. Hopsworks `Data Owners` are allowed to configure clusters for other project users, as long as they have the required Databricks privileges.

<p align="center">
<figure>
<a href="../../../assets/images/databricks/databricks-integration-cluster.png">
<img src="../../../assets/images/databricks/databricks-integration-cluster.png" alt="Configure a Databricks Cluster from Hopsworks">
</a>
<figcaption>Configure a Databricks Cluster from Hopsworks</figcaption>
</figure>
</p>

During the cluster configuration the following steps will be taken:

- Upload an archive to DBFS containing the necessary Jars for HSFS and HopsFS to be able to read and write from the Hopsworks Feature Store
6 changes: 3 additions & 3 deletions docs/integrations/databricks/networking.md
@@ -206,11 +206,11 @@
Wait for the peering to show up as *Connected*. There should now be bi-directional connectivity between the two networks.
</figure>
</p>

### Step 2: Configuring the Network Security Group

The *Network Security Group* of the Feature Store on Azure needs to be configured to allow traffic from your Databricks clusters, so that they can connect to the Feature Store.

Ensure that ports *443*, *9083*, *9085*, *8020* and *50010* are reachable from the Databricks cluster *Network Security Group*.

!!! note "Hopsworks.ai"
If you deployed your Hopsworks Feature Store instance with Hopsworks.ai, it suffices to enable [outside access of the Feature Store and Online Feature Store services](../../hopsworksai/azure/getting_started/#step-5-outside-access-to-the-feature-store).
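A quick way to verify that the required ports are open is a plain TCP check run from a notebook on the Databricks cluster. This is a minimal sketch; the hostname is a placeholder for your Feature Store's private DNS name or IP:

```python
import socket

def reachable_ports(host, ports, timeout=3):
    """Return the subset of ports accepting a TCP connection from this machine."""
    open_ports = []
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                open_ports.append(port)
        except OSError:
            # Covers refused connections, timeouts, and DNS resolution failures.
            pass
    return open_ports

# "feature-store.internal" is a placeholder; use your instance's private address.
print(reachable_ports("feature-store.internal", [443, 9083, 9085, 8020, 50010]))
```

If any port is missing from the output, revisit the Network Security Group rules above before attempting to connect with `hsfs`.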
21 changes: 21 additions & 0 deletions python/hsfs/connection.py
@@ -47,6 +47,27 @@ class Connection:
conn = hsfs.connection()
```

!!! hint "Save API Key as File"
    To get started quickly, without storing the Hopsworks API Key in a secret
    storage, you can simply save the previously created API Key to a file and
    place it in the environment from which you wish to connect to the Hopsworks
    Feature Store.

You can then connect by simply passing the path to the key file when
instantiating a connection:

```python hl_lines="6"
import hsfs
conn = hsfs.connection(
    'my_instance',                    # DNS of your Feature Store instance
    443,                              # Port to reach your Hopsworks instance, defaults to 443
    'my_project',                     # Name of your Hopsworks Feature Store project
    api_key_file='featurestore.key',  # The file containing the API key generated above
    hostname_verification=True,       # Disable for self-signed certificates
)
fs = conn.get_feature_store()         # Get the project's default feature store
```

Clients in external clusters need to connect to the Hopsworks Feature Store using an
API key. The API key is generated inside the Hopsworks platform, and requires at
least the "project" and "featurestore" scopes to be able to access a feature store.